Class time
May 12 - August 9, 2021
Day | Time (Pacific time) |
Tuesday | 2:30-4:20 pm |
Friday | 2:30-3:20 pm |
Instructor and TAs
Role | Name | Email | Office hour (Pacific time) |
Instructor | Jian Pei | jpei@cs.sfu.ca | Fridays 9:00 - 10:20 am Zoom link |
TA | Xuan Luo | xuan_luo@sfu.ca | Wednesdays 6:00 - 7:00 pm Zoom link |
TA | Lakshay Sethi | lakshay_sethi@sfu.ca | Mondays 5:00 - 6:00 pm Zoom link |
About this course
As a data scientist, what should you do when you face a huge amount of complicated data from real-life applications? This course introduces the core techniques in big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on the principles, essential ideas, fundamental algorithms, implementations, and applications.
Prerequisites
- A comprehensive understanding of, and fluency with, data structures such as linked data structures, B-trees, and hash functions.
- Analysis of algorithms and time complexity.
- Operating systems, main memory and disk management, file systems.
- Elementary probability theory and statistics, such as random variables, distributions, probability mass functions, sampling, and statistical tests.
Textbook and references
- (Official textbook) Data Mining: Concepts and Techniques (3rd ed.), Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011.
- Drafts of some chapters in the 4th ed. will be available for interested students.
- A combination of live Zoom meeting sessions and pre-recorded lectures. Students are required to attend the live sessions. See the schedule section for details.
- Live class sessions are held through Zoom: CMPT 459 Zoom.
- Instructor’s office hours are held through Zoom: Jian’s office hour Zoom.
- TAs’ office hours are held through Zoom: TAs’ office hour Zoom.
- In-class exams.
Course outline
- Welcome and introduction
- Business intelligence (data warehousing, OLAP and data lakes)
- Mining patterns and rules
- Predictive analytics
- Clustering analytics
- Outlier detection
Assignments (tentative, subject to changes)
Assignment | Release date | Due date |
Assignment 1 (Data warehousing and frequent pattern mining) | May 20 | June 9 |
Assignment 2 (predictive analytics) | June 14 | June 30 |
Assignment 3 (clustering analysis) | July 6 | July 21 |
Assignment 4 (Outlier detection and advanced topics) | July 21 | August 6 |
Schedule (tentative, subject to changes)
Date | Live or pre-recorded | Topic |
May 14 | Live | Introduction (Chapter 1) [slides] [video] [live session recording] |
May 18 | Pre-recorded | Data warehousing and OLAP (Chapters 4 & 5) [slides] [video] |
May 21 | Pre-recorded | Data lakes and enterprise data infrastructure [slides] [video] |
May 25 | Pre-recorded | Frequent pattern mining (1) (Chapters 6.1 & 6.2) [slides] [video] |
May 28 | Pre-recorded | Frequent pattern mining (2) (Chapters 6.2.6 & 7.3) [slides] [video] |
June 1 | Pre-recorded | Frequent pattern mining (3) (Chapters 6.3 & this paper) [slides] [video] |
June 4 | Live | Review for Exam 1 [slides] [live session recording] |
June 8 | Pre-recorded | Advanced topic: Trustworthy data science [slides] [video] |
June 11 | Live | Exam 1 |
June 15 | Pre-recorded | Classification (1) (Chapters 8.1 & 8.2) [slides] [video] |
June 18 | Pre-recorded | Classification (2) (Chapters 8.5 & 8.6) [slides] [video] |
June 22 | Pre-recorded | Classification (3) (Chapters 8.3, 9.2, 9.3, 9.5) [slides] [video] |
June 25 | Live | Review for Exam 2 [slides] [video] |
June 29 | Pre-recorded | Clustering (1) (Chapters 10.1 & 10.2) [slides] [video] |
July 2 | Live | Exam 2 |
July 6 | Pre-recorded | Clustering (2) (Chapters 10.3, 10.4 & 10.5) [slides] [video] |
July 9 | Pre-recorded | Clustering (3) (Chapters 11.2.3 & 10.6) [slides] [video] |
July 13 | Pre-recorded | Outlier detection (1) (Chapters 12.1, 12.2 & 12.3) [slides] [video] |
July 16 | Pre-recorded | Outlier detection (2) (Chapters 12.4, 12.5, 12.6, 12.7 & 12.8) [slides] [video] |
July 20 | Pre-recorded | Introduction to deep learning (1) [slides] [video] |
July 23 | Live | Review for Exam 3 [slides] [video] |
July 27 | Pre-recorded | Introduction to deep learning (2) [slides] [video] |
July 30 | Live | Exam 3 |
August 3 | Pre-recorded | Data Pricing and Data Asset Management [video] |
August 6 | Live | Summary and farewell |