CS 6240: Large-Scale Parallel Data Processing
- Interested, but you do not meet the pre-reqs?
Please read this FAQ.
- Covers big-data analysis techniques that scale out with
an increasing number of compute nodes, e.g., for cloud computing. Focuses on
approaches for problem and data partitioning that distribute work
effectively while keeping total cost for computation and data transfer low.
Deterministic and randomized algorithms from a variety of domains, including
graphs, data mining, linear algebra, and information retrieval, are studied
and analyzed in terms of their cost, scalability, and robustness against
skew. Coursework emphasizes hands-on programming experience with
state-of-the-art big-data processing technology. Students who do not meet the
course prerequisites may seek the instructor's permission.
CS 7240/7280: Principles of scalable data management: theory,
algorithms and database systems
- This course provides a rigorous introduction to the
algorithms, core principles, and foundational concepts for managing data at
scale. The emphasis is on both the high-level theoretical intuitions and
principles underlying scalable data management and the technical
details. Topics include data models and query languages, query optimization,
complexity of big-data analysis, data-stream processing, parallel data
processing, and probabilistic data management. Students will gain deep
algorithmic understanding through interactive classes and a project with
regular feedback. The project is flexible, allowing students to explore
scalable data management and analysis aspects related to their PhD research.
CS 7290: Special Topics in Data Science: Foundations in
Scalable Data Management
- This course explores research topics in analysis and
management of large data, with a focus on distributed and parallel
approaches, join processing, and imprecise data/approximation. We will
discuss and analyze papers covering applications, algorithms, systems, and
theory--with a focus on the most recent developments. This course is
designed for PhD students, as well as advanced Master's students with a solid
background in algorithms and one or more data-oriented areas of computer
science, including machine learning, AI, logic, information retrieval, and
security. A desired outcome of the course project is the creation of
research results that are publishable in a peer-reviewed conference.
CS 6240: Parallel Data Processing in MapReduce
- Graduate course. This course covers techniques for
analyzing very large data sets. We introduce the MapReduce programming model
and the core technologies it relies on in practice, such as a distributed
file system. Related approaches and technologies from distributed databases
and cloud computing will also be introduced. Particular emphasis is placed
on practical examples and hands-on programming experience. Both plain
MapReduce and database-inspired advanced programming models running on top
of a MapReduce infrastructure will be used.
CS 6220: Data Mining Techniques
- Graduate course. This course covers various aspects of data mining, including data
preprocessing, classification, ensemble methods, association rule mining, sequence
mining, and cluster analysis. The class project involves hands-on practice
of mining useful knowledge from a large database.
CS 3200: Database Design
- Upper-division undergraduate course. This course studies the design of relational databases, including the
entity-relationship model, normalization, relational algebra, SQL, triggers,
stored procedures, indexing, elementary query optimization, and fundamentals
of concurrency and recovery. The class project involves working with a
commercial relational database management system and accessing it from an
CSG 339: Scalable Techniques for Massive Data
- Graduate course. We discuss influential and cutting-edge research papers from academia
and industry research groups. The course also has a project requirement
where students can choose a research project related to large-scale data