CS 6220
Data Mining Techniques
Fall 2015
Tuesdays 11:45am - 1:25pm, Thursdays 2:50-4:30pm, Shillman Hall 210
Instructor: Olga Vitek, o.vitek@neu.edu
Office hours: WVH 310F, Tuesdays 1:30-2:30pm or by appointment
Phone: (617) 373-6305
Mailbox: WVH 202
Teaching assistant: Aida Ehyaei, ehyaei.a@husky.neu.edu
Office hours: WVH 362, 9:00-10:00am
Admin: Syllabus, Piazza, Blackboard. Project guidelines.
R: CRAN, reference, search, cookbook, graphics cookbook, bloggers.
Textbooks: Han et al., Aggarwal, James et al.
Introduction and R
Data mining: goals and tasks, data types.
R: data reading, manipulation, visualization.
Thursday, Sep 10: Hw1 out.
Tuesday, Sep 15: Updates Lecture notes. R example code and slides. Reading: Tidy data.
Simple linear regression
Simple linear regression: parameter estimation, regression vs correlation.
Thursday, Sep 17: Lecture notes and R. Reading: JWHT Ch. 3. Hw1 solution outline. Hw2 out.
Tuesday, Sep 22:
Multiple linear regression
Parameter estimation, bias-variance trade-off.
Cross-validation, bootstrap. Variable selection. Regularization.
Reading: JWHT Ch. 5, Ch. 6.1 and Ch. 6.5.
Thursday, Sep 24: Lecture notes and R. Hw2 solution outline. Hw3 out.
Tuesday, Sep 29:
Thursday, Oct 1: Lecture notes, dataset and R. Hw3 due. Hw4 out.
Tuesday, Oct 6:
Thursday, Oct 8: Lecture notes and R. Project groups due. Hw4 due. Hw5 out.
Supervised class prediction: (generalized) linear models
Logistic regression. Model evaluation: ROC curves.
Reading: JWHT Sec. 4.3
Tuesday, Oct 13:
Thursday, Oct 15: Lecture notes, R and data. Hw5 due. Hw6 out, and dataset.
Tuesday, Oct 20:
Unsupervised class discovery
Data reduction: principal component analysis (PCA) and singular value decomposition (SVD).
Similarity measures and cluster analysis
Reading: JWHT Ch. 10 and Sec. 6.3.1
Thursday, Oct 22:
Friday Oct 23: Hw6 due.
Tuesday, Oct 27:
Thursday, Oct 29: Midterm exam: solutions and grades.
Tuesday, Nov 3: Project proposals due. Lecture notes. Hw7 out, and dataset.
Supervised class prediction: non-linear models
Tree-based classifiers, random forest.
Reading: JWHT Ch. 8
Thursday, Nov 5: Guest lecture by Mr. Robert Ness, staff scientist, CCIS.
Tuesday, Nov 10: Review: unsupervised class discovery example code and data.
Wednesday Nov 11: Hw7 due.
Thursday, Nov 12: Hw8 out, and dataset. Example R code.
Tuesday, Nov 17:
Thursday, Nov 19: Guest lecture by Mr. Paul Grosu, MS candidate, CCIS. See notes on Piazza.
Tuesday, Nov 24: Guest lecture by Prof. Jan Vitek, CCIS. Lecture notes. Hw8 due.
Thursday, Nov 26: No class, Thanksgiving.
Topics on graph mining
Reading: Practical graph mining with R.
Tuesday, Dec 1: Project report due.
R example. Book Ch5 and yeast case study on Piazza.
Thursday, Dec 3: Hw9 out.
Tuesday, Dec 8:
Thursday, Dec 10: No class, reading day. Project reviews due. Hw9 due.
Tuesday, Dec 15, 11:45am-1:25pm: In-class final exam grades.
Tentative schedule and handouts