Class time
May 12 - August 9, 2021
Day | Time |
Tuesday | 10:30 am - 12:20 pm |
Friday | 10:30 - 11:20 am |
Instructor and TAs
Role | Name | Email | Office hour |
Instructor | Jian Pei | jpei@cs.sfu.ca | Fridays 9:00 - 10:20 am (Zoom link) |
TA | Saghar Irandoust | sirandou@sfu.ca | Tuesdays 5:30 - 6:30 pm (Zoom link) |
About this course
What should you do when facing a huge amount of complicated data from real-life applications? This course introduces the core techniques of big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on principles, fundamental algorithms, implementations, and applications.
Prerequisites
- Comprehensive understanding of, and fluency with, data structures such as linked data structures, B-trees, and hash functions.
- Analysis of algorithms and time complexity.
- Operating systems, main memory and disk management, file systems.
- Elementary probability theory and statistics, such as random variables, distributions, probability mass functions, sampling, and statistical tests.
Textbook and references
- (Official textbook) Data Mining: Concepts and Techniques (3rd ed.), Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann, 2011.
- Drafts of some chapters in the 4th ed. will be available for interested students.
- Classes combine live Zoom sessions and pre-recorded lectures. Students are required to attend the live sessions. See the schedule section for details.
- Live class sessions through Zoom: CMPT 741 Zoom.
- Instructor’s office hours through Zoom: Jian’s office hour Zoom.
- TAs’ office hours through Zoom: TAs’ office hour Zoom.
- In-class exams.
Course outline
- Welcome and introduction
- Mining patterns and rules
- Predictive analytics
- Clustering analytics
- Outlier detection
- Advanced topics
Assignments (tentative, subject to changes)
Assignment | Release date | Due date |
Assignment 1 (mining patterns and rules) | May 25 | June 16 |
Assignment 2 (predictive analytics, clustering analysis, and outlier detection) | June 21 | July 14 |
Schedule (tentative, subject to changes)
Date | Live or pre-recorded | Topic |
May 14 | Live | Introduction (Chapter 1) [slides] [video] [Live session recording] |
May 18 | Pre-recorded | Frequent pattern mining (1) (Chapters 6.1 & 6.2) [slides] [video] |
May 21 | Pre-recorded | Frequent pattern mining (2) (Chapters 6.2.6 & 7.3) [slides] [video] |
May 25 | Pre-recorded | Frequent pattern mining (3) (Chapters 6.3 & this paper) [slides] [video] |
May 28 | Pre-recorded | Classification (1) (Chapters 8.1 & 8.2) [slides] [video] |
June 1 | Pre-recorded | Classification (2) (Chapters 8.5 & 8.6) [slides] [video] |
June 4 | Pre-recorded | Classification (3) (Chapters 8.3, 9.2, 9.3, 9.5) [slides] [video] |
June 8 | Pre-recorded + Live | Trustworthy data science [slides, video]; Course project kick-off [slides, live session recording] |
June 11 | Live | Review for Exam 1 [slides] [Live session recording] |
June 15 | Pre-recorded | Clustering (1) (Chapters 10.1 & 10.2) [slides] [video] |
June 18 | Live | Exam 1 |
June 22 | Pre-recorded | Clustering (2) (Chapters 10.3, 10.4 & 10.5) [slides] [video] |
June 25 | Pre-recorded | Clustering (3) (Chapters 11.2.3 & 10.6) [slides] [video] |
June 29 | Pre-recorded | Outlier detection (1) (Chapters 12.1, 12.2 & 12.3) [slides] [video] |
July 2 | Live | Course project check-up |
July 6 | Pre-recorded | Outlier detection (2) (Chapters 12.4, 12.5, 12.6, 12.7 & 12.8) [slides] [video] |
July 9 | Live | Review for Exam 2 [slides] [Live session recording] |
July 13 | Pre-recorded | Introduction to deep learning (1) [slides] [video] |
July 16 | Pre-recorded | Introduction to deep learning (2) [slides] [video] |
July 20 | Pre-recorded | Data Pricing (1) [slide] [video] |
July 23 | Live | Exam 2 |
July 27 | Pre-recorded | Data Pricing (2) [slide] [video] |
July 30 | Live | Project presentations |
August 3 | Live | Project presentations |
August 6 | Live | Project presentations |
References for course project
- Data/model markets & pricing
- Interpretability
  - W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, Oct 2019, 116 (44), 22071-22080. DOI: 10.1073/pnas.1900654116
  - D. V. Carvalho, E. M. Pereira, J. S. Cardoso. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics, 2019, 8, 832. https://doi.org/10.3390/electronics8080832
- Fairness/bias