**Time and Location:** Thursdays from 6:00 pm to 9:00 pm in Behrakis Health Sciences Center 325

**Instructor**: Lu Wang, Office 448 WVH

**Staff and Office Hours**:

- Prof. Lu Wang, Thursdays from 4:30pm to 5:30pm, or by appointment, 448 WVH
- Gabriel Bakiewicz (TA, email: gbakie@ccs.neu.edu), Mondays and Tuesdays from 4:00pm to 5:00pm, 362 WVH

**Discussion Forum**: Piazza, sign up at http://piazza.com/northeastern/spring2016/cs6140

- Regression: linear regression, logistic regression
- Dimensionality Reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis
- Probabilistic Models: Naive Bayes, maximum likelihood estimation, Bayesian inference
- Statistical Learning Theory: VC dimension
- Kernels: Support Vector Machines (SVMs), kernel tricks, duality
- Sequential Models and Structural Models: Hidden Markov Model (HMM), Conditional Random Fields (CRFs)
- Clustering: spectral clustering, hierarchical clustering
- Latent Variable Models: K-means, mixture models, expectation-maximization (EM) algorithms, Latent Dirichlet Allocation (LDA), representation learning
- Deep Learning: feedforward neural networks, restricted Boltzmann machines, autoencoders, recurrent neural networks, convolutional neural networks
- and others, including advanced topics for machine learning in natural language processing and text analysis

- Main Textbook: Kevin Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012.
- Other References:
  - Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  - Tom Mitchell, "Machine Learning", McGraw Hill, 1997.

This course is designed for graduate students majoring in computer science, applied math, and related areas. Students who take this course are expected to write code proficiently in at least one programming language (e.g. Python, Java, C/C++) and to have completed courses in algorithms (CS 5800 or CS 7800), multivariable calculus, probability, statistics, and linear algebra.

Each assignment or report is due, in both electronic and hard copy, at the beginning of class on the corresponding due date. Hard copies are submitted in class. An assignment or report turned in late loses 10 points (out of 100) for each late day (i.e. each 24 hours). Each student has a budget of 5 late days for the semester that can be used before any late penalty is applied. Use them wisely, e.g. save them for emergencies.
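The late policy above comes down to simple arithmetic. The following is only a minimal Python sketch of one reading of it, not an official grading script. The function name `apply_late_policy` is hypothetical, and three details are assumptions the policy does not state: a partial day counts as a full late day, remaining budget days are spent automatically before any penalty, and a score cannot drop below 0.

```python
import math

LATE_PENALTY_PER_DAY = 10  # points off (out of 100) per late day, from the policy
FREE_LATE_DAYS = 5         # semester-wide late-day budget, from the policy


def apply_late_policy(raw_score, hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (final_score, free_days_left_after) for one submission.

    Assumptions (not stated in the policy): a partial day rounds up to a
    full late day, budget days are used before any penalty applies, and
    the final score is floored at 0.
    """
    late_days = math.ceil(max(hours_late, 0) / 24)
    free_used = min(late_days, free_days_left)
    penalized_days = late_days - free_used
    final_score = max(raw_score - LATE_PENALTY_PER_DAY * penalized_days, 0)
    return final_score, free_days_left - free_used
```

Under these assumptions, a submission scored 90 and turned in 30 hours late counts as 2 late days. With the full budget available it keeps its 90 and leaves 3 budget days; with only 1 budget day left it drops to 80.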

Grades will be determined based on three assignments, one course project, one open-book exam, and participation:

- Assignments (30%): three assignments, 10% each
- Project (35%): team of 2 to 3 students, reports (5%+10%+10%), final presentation (10%)
- Exam (30%): open-book
- Participation (5%): classes, Piazza
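The weights above combine into a final grade as a weighted sum. The sketch below is illustrative only: the names `WEIGHTS` and `course_grade` are hypothetical, and it assumes each of the four components has already been scored on a 0 to 100 scale.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "assignments": 30,
    "project": 35,
    "exam": 30,
    "participation": 5,
}


def course_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a score out of 100
    (an assumption made for this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, scores of 80 on assignments, 90 on the project, 70 on the exam, and 100 for participation give (30·80 + 35·90 + 30·70 + 5·100) / 100 = 81.5.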

- Topic: Introduction, basic concepts, K-nearest neighbors, linear regression, ridge regression
- Slides: [Download] [6pp version]
- Reading: Murphy CH 1, 2, 7
- TODO: start thinking about projects and looking for teammates

- Topic: Logistic Regression, Decision Tree, Generative Models (Naive Bayes)
- Slides: [Download] [6pp version]
- Reading: Murphy CH 3, 8.1-8.3, 8.6, 16.2
- TODO: assignment 1 is released [pdf] [dataset(.zip)]

- Topic: Bayesian Statistics and Frequentist Statistics
- Project proposal report due
- Slides: [Download] [6pp version]
- Reading: Murphy CH 5.1-5.3, 6.1, 6.4

- Topic: Perceptron, Support Vector Machines, Kernels, Statistical Learning Theory
- Slides: [Download] [6pp version]
- Assignment 1 due

- Topic: Deep Learning
- Slides: [Download] [6pp version]
- TODO: assignment 2 is released [pdf] [dataset(.zip)]

- Topic: Deep Learning
- Slides: [Download] [6pp version]

- Topic: Dimensionality Reduction
- Slides: [Download] [6pp version]
- Assignment 2 (part 1) due
- Assignment 2 (part 2) due on Mar 2

- Topic: Clustering
- Slides: [Download] [6pp version]

- Project progress report due (no hard copy required)
- TODO: assignment 3 is released [pdf] [dataset(.zip)]

- Topic: Structured Output Prediction
- Slides: [Download] [6pp version]

- Topic: Mixture Models and Expectation Maximization
- Slides: [Download] [6pp version]
- Exam guideline: [Download]
- Assignment 3 due

- Topic: Exam

- Topic: Representation Learning
- Slides: [Download] [6pp version]

- Topic: Course Project Presentation
- Project final report due on April 18 (no hard copy required)

This course follows the Northeastern University Academic Integrity Policy, and all students in this course are expected to abide by it. Any work submitted by a student in this course for academic credit should be the student's own work. Collaboration is allowed only if explicitly permitted. Violations of the rules (e.g. cheating, fabrication, plagiarism) will be reported.