CS 7280

Statistics for Big Data

Spring 2015

MTh 11:45am - 1:25pm, Cargill Hall 094

Instructor: Olga Vitek

Email: o.vitek@neu.edu

Office: WVH 313, Mondays 1:30-2:30pm or by appointment

Phone: (617) 373-6305

Mailbox: 102 HT


Teaching assistant: Paul Grosu

Email: pgrosu@gmail.com

Office: Wednesdays 5:00-6:00pm WVH 164, or by appointment


Admin: Syllabus, Piazza, Blackboard

R: CRAN, reference, search. RStudio.

Statistics texts: Kutner et al., 5th Ed.; Agresti, 3rd Ed.

R texts: Venables & Ripley, 4th Ed.; James et al.; Faraway.

Introduction

Mon, Jan 12: Notes. Survey answers.


Simple linear regression

Inference basics: estimation, testing, prediction. R

Thu, Jan 15: KNNL Ch1-Ch3. Hw1 out.

Mon, Jan 19: MLK day, no class, no office hours

Thu, Jan 22: Updated notes. Hw1 due. Hw2 out.

Mon, Jan 26: Snow day.


Quality of model fit. Single-variable screening

Deviations from assumptions. Associations vs causality, confounding. A/B testing.

Thu, Jan 29: Notes. KNNL Ch4. Hw2 due. Hw3 out.

Mon, Feb 2: Snow day.

Thu, Feb 5: Notes.


Mon, Feb 9: Snow day. Practice midterm problems and solutions.

                    Hw3 due by email on Tuesday Feb 10, noon.

Thu, Feb 12: Midterm 1 solutions and grades. Hw4 out. KNNL Ch5-Ch6. Project guidelines.


Multivariate linear regression

Model interpretation. Multicollinearity. Categorical predictors.

Mon, Feb 16: Presidents’ day, no class, no office hours

Thu, Feb 19: Hw4 due. Hw5 out. KNNL Ch7-8.

Mon, Feb 23: Updated notes.


Linear model selection

Subset selection. Evaluation of predictive ability. Regularization.

Thu, Feb 26: Notes. Hw5 due. Hw6 out. KNNL Ch9-11. Project groups due.

Mon, Mar 2:


Multivariate logistic regression

Statistical inference for categorical response

Thu, Mar 5: Lecture notes. Hw6 due. Hw7 out. R code. KNNL Ch 14.


Mon, Mar 9: Spring break, no class, no office hours

Thu, Mar 12: Spring break, no class


Mon, Mar 16: Guest lecture. Jan Vitek, Professor, CCIS. Lecture notes.

Thu, Mar 19: Hw7 due. Project proposal due.


Mon, Mar 23: Practice midterm problems and solutions.

Thu, Mar 26: Midterm 2 solutions and grades.


Mon, Mar 30:


Poisson and Negative Binomial regression

Thu, Apr 2: Hw8 out. Lecture notes.

Mon, Apr 6: 


Weighted regression. Simulation-based inference

Permutations, bootstrap

Thu, Apr 9: Lecture notes. Hw8 due. Hw9 out. Homework datasets 1 and 2.

Mon, Apr 13:


Unsupervised vs supervised data exploration

PCA vs SVD. Multiple testing.

Thu, Apr 16: Lecture notes. R code. Project due.


Mon, Apr 20: Reading day.

Thu, Apr 23: Hw9 due. Reading day. Practice final exam problems and solutions.

                    Project reviews due Friday April 24.


Mon, Apr 27: Final exam during regular class hours. Solutions and grades.

Tentative schedule and handouts