Class time

May 12 - August 9, 2021

Day Time (Pacific time)
Tuesday 2:30-4:20 pm
Friday 2:30-3:20 pm

Instructor and TAs

Role Name Email Office hour (Pacific time)
Instructor Jian Pei jpei@cs.sfu.ca Fridays 9:00 - 10:20 am Zoom link
TA Xuan Luo xuan_luo@sfu.ca Wednesdays 6:00 - 7:00 pm Zoom link
TA Lakshay Sethi lakshay_sethi@sfu.ca Mondays 5:00 - 6:00 pm Zoom link

About this course

As a data scientist, what should you do when you face a huge amount of complicated data from real-life applications? This course introduces the core techniques in big data analytics, namely knowledge discovery in databases (KDD), also known as data mining (DM). It focuses on the principles, essential ideas, fundamental algorithms, implementations, and applications.

Prerequisites

Textbook and references

Format and Zoom link

Course outline

  1. Welcome and introduction
  2. Business intelligence (data warehousing, OLAP and data lakes)
  3. Mining patterns and rules
  4. Predictive analytics
  5. Clustering analytics
  6. Outlier detection

Assignments (tentative, subject to change)

Assignment Release date Due date
Assignment 1 (Data warehousing and frequent pattern mining) May 20 June 9
Assignment 2 (Predictive analytics) June 14 June 30
Assignment 3 (Clustering analysis) July 6 July 21
Assignment 4 (Outlier detection and advanced topics) July 21 August 6

Schedule (tentative, subject to change)

Date Live or pre-recorded Topic
May 14 Live Introduction (Chapter 1) [slides] [video] [live session recording]
May 18 Pre-recorded Data warehousing and OLAP (Chapters 4 & 5) [slides] [video]
May 21 Pre-recorded Data lakes and enterprise data infrastructure [slides] [video]
May 25 Pre-recorded Frequent pattern mining (1) (Chapters 6.1 & 6.2) [slides] [video]
May 28 Pre-recorded Frequent pattern mining (2) (Chapters 6.2.6 & 7.3) [slides] [video]
June 1 Pre-recorded Frequent pattern mining (3) (Chapters 6.3 & this paper) [slides] [video]
June 4 Live Review for Exam 1 [slides] [live session recording]
June 8 Pre-recorded Advanced topic: Trustworthy data science [slides] [video]
June 11 Live Exam 1
June 15 Pre-recorded Classification (1) (Chapters 8.1 & 8.2) [slides] [video]
June 18 Pre-recorded Classification (2) (Chapters 8.5 & 8.6) [slides] [video]
June 22 Pre-recorded Classification (3) (Chapters 8.3, 9.2, 9.3, 9.5) [slides] [video]
June 25 Live Review for Exam 2 [slides] [video]
June 29 Pre-recorded Clustering (1) (Chapters 10.1 & 10.2) [slides] [video]
July 2 Live Exam 2
July 6 Pre-recorded Clustering (2) (Chapters 10.3, 10.4 & 10.5) [slides] [video]
July 9 Pre-recorded Clustering (3) (Chapters 11.2.3 & 10.6) [slides] [video]
July 13 Pre-recorded Outlier detection (1) (Chapters 12.1, 12.2 & 12.3) [slides] [video]
July 16 Pre-recorded Outlier detection (2) (Chapters 12.4, 12.5, 12.6, 12.7 & 12.8) [slides] [video]
July 20 Pre-recorded Introduction to deep learning (1) [slides] [video]
July 23 Live Review for Exam 3 [slides] [video]
July 27 Pre-recorded Introduction to deep learning (2) [slides] [video]
July 30 Live Exam 3
August 3 Pre-recorded Data pricing and data asset management [video]
August 6 Live Summary and farewell