Teaching Experience

Courses Taught

CS 6240: Large-Scale Parallel Data Processing

S18, F18, S19, F19, F20, S21, F21, S22, S24, F24
Interested in CS 6240 but do not meet the prerequisites? Please read this FAQ.
Graduate course. Covers big-data analysis techniques that scale out with an increasing number of compute nodes, e.g., for cloud computing. Focuses on approaches for problem and data partitioning that distribute work effectively while keeping the total cost of computation and data transfer low. Deterministic and randomized algorithms from a variety of domains, including graphs, data mining, linear algebra, and information retrieval, are studied and analyzed in terms of their cost, scalability, and robustness against skew. Coursework emphasizes hands-on programming experience with state-of-the-art big-data processing technology. Students who do not meet the course prerequisites may seek permission of the instructor.

CS 7280: Special Topics in Databases

CS 3800: Theory of Computation

F20 (this course is managed completely on Canvas)
Upper division undergraduate course. Introduces the theory behind computers and computing, aimed at answering the question, "What are the capabilities and limitations of computers?" Covers automata theory, computability, and complexity. The automata theory portion includes finite automata, regular expressions, nondeterminism, nonregular languages, context-free languages, pushdown automata, and non-context-free languages. The computability portion includes Turing machines, the Church-Turing thesis, decidable languages, and the Halting theorem. The complexity portion includes big-O and small-o notation, the classes P and NP, the P vs. NP question, and NP-completeness.

CS 7240/7280: Principles of scalable data management: theory, algorithms, and database systems

S19
This course provides a rigorous introduction to the algorithms, core principles, and foundational concepts for managing data at scale. The emphasis is on both the high-level theoretical intuitions and principles underlying scalable data management and the technical details. Topics include data models and query languages, query optimization, complexity of big-data analysis, data-stream processing, parallel data processing, and probabilistic data management. Students will gain deep algorithmic understanding through interactive classes and a project with regular feedback. The project is flexible, allowing students to explore scalable data management and analysis aspects related to their PhD research.

CS 7290: Special Topics in Data Science: Foundations in Scalable Data Management

F17
This course explores research topics in the analysis and management of large data, with a focus on distributed and parallel approaches, join processing, and imprecise data/approximation. We will discuss and analyze papers covering applications, algorithms, systems, and theory, with an emphasis on the most recent developments. This course is designed for PhD students, as well as advanced Masters students with a solid background in algorithms and one or more data-oriented areas of computer science, including machine learning, AI, logic, information retrieval, and security. A desired outcome of the course project is the creation of research results that are publishable at a peer-reviewed conference.

CS 6240: Parallel Data Processing in MapReduce

F11, F12, S13, F13, S14, F14, S15, F16, S17, F17
Graduate course. This course covers techniques for analyzing very large data sets. We introduce the MapReduce programming model and the core technologies it relies on in practice, such as a distributed file system. Related approaches and technologies from distributed databases and cloud computing will also be introduced. Particular emphasis is placed on practical examples and hands-on programming experience. Both plain MapReduce and database-inspired advanced programming models running on top of a MapReduce infrastructure will be used.

CS 6220: Data Mining Techniques

F09, S10, S11, S12
Graduate course. This course covers various aspects of data mining including data preprocessing, classification, ensemble methods, association rule mining, sequence mining, and cluster analysis. The class project involves hands-on practice of mining useful knowledge from a large database.

CS 3200: Database Design

F10, S12, F12
Upper division undergraduate course. This course studies the design of relational databases, including the entity-relationship model, normalization, relational algebra, SQL, triggers, stored procedures, indexing, elementary query optimization, and fundamentals of concurrency and recovery. The class project involves working with a commercial relational database management system and accessing it from an application.

Graduate course. We discuss influential and cutting-edge research papers from academia and industry research groups. The course also has a project requirement, for which students can choose a research project related to large-scale data analysis.