Graduate course. This course covers techniques for analyzing very large data sets. We introduce the MapReduce programming model and the core technologies it relies on in practice, such as a distributed file system. Related approaches and technologies from distributed databases and Cloud Computing will also be introduced. Particular emphasis is placed on practical examples and hands-on programming experience. Both plain MapReduce and database-inspired advanced programming models running on top of a MapReduce infrastructure will be used.
|Name||Office Hours||Location||Email|
|Nat Tuck||Mondays, 3:00-4:00 PM||WVH 314||ntuck@ccs|
|Rundong Li||Wednesdays, 3:00-4:00 PM||WVH Room 472||rundong@ccs|
|Pooja Chitrakar||Thursdays, 6:30-7:30 PM||CCIS Lab||chitrap@ccs|
|Nikite Gulve||Fridays, 12:00-1:00 PM||WVH Main Lab||email@example.com|
This is a "partially flipped" class. In most weeks you only need to attend one lecture. You will be randomly assigned to either the Tuesday or the Friday session.
|Week #||Dates||Topics||Assignments Due||Split?|
|1||Sep 11||Course Intro||-||No|
|2||Sep 15, 18||Parallel Processing||-||Yes|
|3||Sep 22, 25||MapReduce Overview||HW1||Yes|
|4||Sep 29, Oct 2||Fundamental Techniques||-||Yes|
|5||Oct 6, 9||Basic Algorithms||HW2||Yes|
|6||Oct 13, 16||Applications of Basic Algorithms||-||Yes|
|7||Oct 20, 23||Pig||Project Proposal||Yes|
|8||Oct 27, 30||Databases||-||Yes|
|9||Nov 3, 6||CAP Theorem, HBase & Hive||HW3||Yes|
|10||Nov 10, 13||Midterm Exam||-||No|
|11||Nov 17, 20||Graph Algorithms||HW4||Yes|
|12||Nov 24||Intelligent Partitioning||-||No|
|13||Dec 1, 4||Data Mining, Spark||Final Project||Yes|
|14||Dec 8, 11||Project Presentations||-||No|
CS 5800 or CS 7800, or consent of instructor
To gain a deeper understanding of the material covered in this course, we recommend the following books, most of which are available online (and for free) for Northeastern University students from Safari Books Online.
For a nice compact summary of MapReduce and some design patterns, read Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, which is available for free at http://www.umiacs.umd.edu/~jimmylin/book.html.
For some topics we will work with research papers or other online resources. One important resource will be the Hadoop API.
If the Disability Resource Center has formally approved you for an academic accommodation in this class, please present the instructor with your “Professor Notification Letter” during the first week of the semester, so that we can address your specific needs as early as possible.
A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University.
For more information, please refer to the Academic Integrity web page.