CS6220/DS5230 Unsupervised Data Mining, Summer 2024


* Schedule and materials are subject to change.
Module / Live Stream Topic / Recorded Lecture    Other Reading    Assignment
5/8
Module 1: Data Basics, Similarity, KNN
Week 1 : Intro, Data Features, Mining Rules

5/15
Slides: Distance and Similarity
Paper: Distance / Similarity Measures
[A] ch 3
5/22
Module 2: Clustering
Week 3 : KMeans
Lecture 4 Notes

Slides: Intro to Clustering
Cluster Evaluation (Aggarwal)
Cluster Evaluation (Stanford NLP)
5/29
Week 4 : soft KMeans / Gaussian Mixture EM
Lecture 5 Notes
Lecture 6 Notes


Notes: Gaussian Mixtures

Mixture Matlab code
6/5
Week 5 : Hierarchical Clustering, DBSCAN
Lecture 7 Notes
Ward distance


6/12
Module 3: Dim Reduction, Feature Selection
Week 6 : PCA, Feature Selection
Lecture 8 (PCA, Kernel PCA)


Notes: PCA

Class notes (handwritten + DHS book): PCA

PCA demo (Matlab): PCA
Kernel PCA (slides)
Kernels for ML (article)

Optional:
UMAP Dimension Reduction, UMAP paper
6/19
Week 7 : tSNE, Feature Selection
Lecture 9 Notes

Paper: Haar Features

Notes: Chi-Square Feature Selection
Wikipedia: Mutual Information

Slides: tSNE / paper / implementation

(optional) tSNE gradient calculation
StanfordNLP: ChiSquare Feature Selection
StanfordNLP: Mutual Information Feature Selection
Paper: Feature Selection for Gaussian Mixtures
6/26
Week 8 : Supervised Classification
Lecture 10 Notes
Linear Regression

Notes: Linear Regression
Notes: Logistic Regression
Notes: Regression Regularization

7/10
Module 3: Classification
Week 8 : Supervised Classification

Neural Networks

Decision Trees Lecture 12 Notes
Decision Notes (Virgil)
Boosting Notes



Notes: Decision Trees

Notes: Perceptrons, Neural Networks
Slides (Mitchell book): Neural Networks

TF Visualizer (toy data)
NN interactive tutorial

Word2Vec Tutorial
Word2Vec paper 1
Word2Vec paper 2
HW4
HW4-PB2, due 7/12
7/17
Week 11 : Summarization
Lecture 15 NMF
Lecture 16 Summarization


Paper: Text Summarization Survey
Paper: Topic Modeling Summarization
Paper: ROUGE Evaluation for Summaries
Slides: ROUGE
Older IR/Linguistics paper: Automatic Abstracts

Summarization basics
7/24
Module 4: Text Modeling
Week 9 : Topic Models, LDA
Lecture 13 Notes
Lecture 14 Notes

Lecture 15 NMF


Slides: NMF
paper: NMF
Slides: LDA
paper: LDA simplified
paper: LDA
More Slides: LDA
paper: Bayesian Parameter Estimation for text

Book on Implementing LDA with R code

paper: LDA vs NMF
7/31
Week 10 : Sampling
Lecture 17 Markov chains

Lecture 18: Sampling

Stevens Method: Non-uniform Sampling Without Repetition

Sampling Basics (Matlab)
Rejection Sampling
Inverse Transform Sampling
Book: Non-uniform Sampling Procedures

Gibbs Sampling for LDA
Sampling MC / Gibbs Demo


paper: Gibbs explained
8/7
Module 5: Graphs / Social Mining
Week 12 : Social Graphs
Lecture on PageRank, Markov Chain
Lecture 19 Graph Intro/Communities
Lecture 20 Graph Communities

Textbook: Aggarwal, Data Mining, ch 18-19
Slides: Girvan-Newman Algorithm

Python Community Visualization
Paper 1: Girvan-Newman Algorithm
Paper 2: Girvan-Newman Algorithm
Paper 3: Girvan-Newman Algorithm
HW6
Due: 8/13
8/7
Week 13 : Social Mining
Lecture 18 Collaborative Filtering
Lecture 21 KB-QA

Textbook: Aggarwal, Data Mining, ch 18-19
Notes: collaborative filtering basic formula
Slides: Netflix User Profiles

FINAL EXAM 8/14 2pm-6pm in class
You will need a computer for the exam problems, and you may be asked to explain or demo your code afterward.
Submit a copy of your code on Gradescope, together with instructions for running it.