CS 6220

Data Mining Techniques

Fall 2015

Tuesdays 11:45am-1:25pm, Thursdays 2:50-4:30pm, Shillman Hall 210

Instructor: Olga Vitek, o.vitek@neu.edu

Office hours: WVH 310F, Tuesdays 1:30-2:30pm or by appointment

Phone: (617) 373-6305

Mailbox: WVH 202


Teaching assistant: Aida Ehyaei, ehyaei.a@husky.neu.edu

Office hours: WVH 362, 9:00-10:00am


Admin: Syllabus, Piazza, Blackboard. Project guidelines.

R: CRAN, reference, search, cookbook, graphics cookbook, bloggers.  

Textbooks: Han et al., Aggarwal, James et al.

Introduction and R

Data mining: goals and tasks, data types.

R: data reading, manipulation, visualization.

Thursday, Sep 10: Hw1 out.

Tuesday, Sep 15: Updated lecture notes. R example code and slides. Reading: Tidy data.


Simple linear regression

Simple linear regression: parameter estimation, regression vs correlation.

Thursday, Sep 17: Lecture notes and R. Reading: JWHT Ch. 3. Hw1 solution outline. Hw2 out.

Tuesday, Sep 22:


Multiple linear regression

Parameter estimation, bias-variance trade-off.

Cross-validation, bootstrap. Variable selection. Regularization.

Reading: JWHT Ch. 5, Ch. 6.1 and Ch. 6.5.

Thursday, Sep 24: Lecture notes and R. Hw2 solution outline. Hw3 out.

Tuesday, Sep 29:

Thursday, Oct 1: Lecture notes, dataset and R. Hw3 due. Hw4 out.

Tuesday, Oct 6:

Thursday, Oct 8: Lecture notes and R. Project groups due. Hw4 due. Hw5 out.


Supervised class prediction: (generalized) linear models

Logistic regression. Model evaluation: ROC curves.

Reading: JWHT Sec. 4.3

Tuesday, Oct 13:

Thursday, Oct 15: Lecture notes, R and data. Hw5 due. Hw6 out, and dataset.

Tuesday, Oct 20:


Unsupervised class discovery

Data reduction: principal component analysis (PCA) and singular value decomposition (SVD).

Similarity measures and cluster analysis

Reading: JWHT Ch. 10 and Sec. 6.3.1

Thursday, Oct 22:

Friday Oct 23: Hw6 due.

Tuesday, Oct 27:

Thursday, Oct 29: Midterm exam: solutions and grades.

Tuesday, Nov 3: Project proposals due. Lecture notes. Hw7 out, and dataset.


Supervised class prediction: non-linear models

Tree-based classifiers, random forest.

Reading: JWHT Ch. 8

Thursday, Nov 5: Guest lecture by Mr. Robert Ness, staff scientist, CCIS.


Tuesday, Nov 10: Review: unsupervised class discovery example code and data.

Wednesday Nov 11: Hw7 due.

Thursday, Nov 12: Hw8 out, and dataset. Example R code.


Tuesday, Nov 17:

Thursday, Nov 19: Guest lecture by Mr. Paul Grosu, MS candidate, CCIS. See notes on Piazza.


Tuesday, Nov 24: Guest lecture by Prof. Jan Vitek, CCIS. Lecture notes. Hw8 due.

Thursday, Nov 26: No class, Thanksgiving.


Topics on graph mining

Reading: Practical graph mining with R.

Tuesday, Dec 1: Project report due.

R example. Book Ch5 and yeast case study on Piazza.

Thursday, Dec 3: Hw9 out.


Tuesday, Dec 8:

Thursday, Dec 10: No class, reading day. Project reviews due. Hw9 due.


Tuesday, Dec 15, 11:45am-1:25pm: In-class final exam: grades.

Overall course grades.

Tentative schedule and handouts