CS 6240: Large-Scale Parallel Data Processing

Covers big-data analysis techniques that scale out with an increasing number of compute nodes, e.g., for cloud computing. Focuses on approaches for problem and data partitioning that distribute work effectively while keeping the total cost of computation and data transfer low. Deterministic and randomized algorithms from a variety of domains, including graphs, data mining, linear algebra, and information retrieval, are studied and analyzed in terms of their cost, scalability, and robustness against skew. Coursework emphasizes hands-on programming experience with state-of-the-art big-data processing technology. Students who do not meet course prerequisites may seek permission of the instructor.


News

We will use Blackboard for online course discussions.


This course is managed through Blackboard (northeastern.blackboard.com). If you are registered, go there to see all course material. If you are not registered but want to get an idea of the course, please take a look at the syllabus.