Class time

May 12 - August 9, 2021

Day	Time
Tuesday	10:30 am - 12:20 pm
Friday	10:30 - 11:20 am

Instructor and TAs

Role	Name	Email	Office hour
Instructor	Jian Pei	jpei@cs.sfu.ca	Fridays 9:00 - 10:20 am Zoom link
TA	Saghar Irandoust	sirandou@sfu.ca	Tuesdays 5:30 - 6:30 pm Zoom link

About this course

What should you do when you face a huge amount of complex data from real-life applications? This course introduces the core techniques of big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on principles, fundamental algorithms, implementations, and applications.

Prerequisites

A comprehensive understanding of, and fluent skills in, data structures such as linked data structures, B-trees, and hash functions.
Analysis of algorithms and time complexity.
Operating systems, including main memory and disk management, and file systems.
Elementary probability theory and statistics, such as random variables, distributions, probability mass functions, sampling, and statistical tests.

Textbook and references

(Official textbook) Data Mining: Concepts and Techniques (3rd ed.), Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011.
Drafts of some chapters of the 4th ed. will be made available to interested students.

Format and Zoom link

A combination of live Zoom sessions and pre-recorded lectures. Students are required to attend the live sessions. See the schedule section for details.
Live class sessions through Zoom: CMPT 741 Zoom.
Instructor's office hours through Zoom: Jian's office hour Zoom.
TAs' office hours through Zoom: TAs' office hour Zoom.
In-class exams.

Course outline

Welcome and introduction
Mining patterns and rules
Predictive analytics
Clustering analytics
Outlier detection
Advanced topics

Assignments (tentative, subject to change)

Assignment	Release date	Due date
Assignment 1 (mining patterns and rules)	May 25	June 16
Assignment 2 (predictive analytics, clustering analysis, and outlier detection)	June 21	July 14

Schedule (tentative, subject to change)

Date	Live or pre-recorded	Topic
May 14	Live	Introduction (Chapter 1) [slides] [video] [Live session recording]
May 18	Pre-recorded	Frequent pattern mining (1) (Chapters 6.1 & 6.2) [slides] [video]
May 21	Pre-recorded	Frequent pattern mining (2) (Chapters 6.2.6 & 7.3) [slides] [video]
May 25	Pre-recorded	Frequent pattern mining (3) (Chapters 6.3 & this paper) [slides] [video]
May 28	Pre-recorded	Classification (1) (Chapters 8.1 & 8.2) [slides] [video]
June 1	Pre-recorded	Classification (2) (Chapters 8.5 & 8.6) [slides] [video]
June 4	Pre-recorded	Classification (3) (Chapters 8.3, 9.2, 9.3, 9.5) [slides] [video]
June 8	Pre-recorded + Live	Trustworthy data science [slides, video]; Course project kick-off [slides, live session recording]
June 11	Live	Review for Exam 1 [slides] [Live session recording]
June 15	Pre-recorded	Clustering (1) (Chapters 10.1 & 10.2) [slides] [video]
June 18	Live	Exam 1
June 22	Pre-recorded	Clustering (2) (Chapters 10.3, 10.4 & 10.5) [slides] [video]
June 25	Pre-recorded	Clustering (3) (Chapters 11.2.3 & 10.6) [slides] [video]
June 29	Pre-recorded	Outlier detection (1) (Chapters 12.1, 12.2 & 12.3) [slides] [video]
July 2	Live	Course project check-up
July 6	Pre-recorded	Outlier detection (2) (Chapters 12.4, 12.5, 12.6, 12.7 & 12.8) [slides] [video]
July 9	Live	Review for Exam 2 [slides] [Live session recording]
July 13	Pre-recorded	Introduction to deep learning (1) [slides] [video]
July 16	Pre-recorded	Introduction to deep learning (2) [slides] [video]
July 20	Pre-recorded	Data Pricing (1) [slides] [video]
July 23	Live	Exam 2
July 27	Pre-recorded	Data Pricing (2) [slides] [video]
July 30	Live	Project presentations
August 3	Live	Project presentations
August 6	Live	Project presentations

References for course project

CMPT 741: Data Mining