CS 3200: Database Design

This course studies the design of a database for use in a relational database management system. The entity-relationship model and normalization will be used in example problems. Relational algebra and the SQL language will be presented. Advanced topics include triggers, stored procedures, indexing, elementary query optimization, and fundamentals of concurrency and recovery. Students will implement a database schema and short application programs on a commercial relational database management system.


News

Link to Piazza discussion forum: https://piazza.com/northeastern/fall2012/cs3200/home

[12/11/2012] Topic overview document and lecture audio updated


Lectures

(Future lectures and events are tentative.)

Date Topic Remarks and Recommended Homework
Sep 5 Introduction Read introduction chapter in textbook
Sep 10 Entity Relationship Model (ERM) Read in your favorite textbook about modeling entities and their relationships using ERM. (UML is conceptually the same, just with different pictures.) Things to focus on: What are keys? How is the primary key of an entity set selected? How is the primary key of a relationship set determined? How does the corresponding SQL code look like? And how can we enforce referential integrity (foreign key references) and control the database's reaction to a potential violation?
Sep 12 ERM continued Study the advanced ERM concepts: key and participation constraints (e.g., 1-to-1, 1-to-many, many-to-1, many-to-many), weak entities, hierarchies ("ISA"). Try to write the SQL code for the design alternatives we discussed for the ISA hierarchy example.
Sep 17 ERM completed Study the aggregation concept and explain why we need it. Go over the design alternatives we discussed and make sure you understand their tradeoffs: entity vs. attribute, entity vs. relationship, binary vs. ternary relationships. For each example we discussed, try to create the corresponding relations and write the SQL code. Make sure you specify the required primary keys, foreign key references, and NOT NULL constraints.
Sep 19 Relational Algebra Study the relational operators and make sure you could compute their results for given example relations. Take the example "queries", hide the algebra expression, and now try to find the expression yourself. Then try to modify the query to find other information in the example database. Write down the schema of intermediate results to catch mistakes like the "topic=DB AND topic=PL" selection condition we discussed. Draw the query plan and think about how pushing down selections and projections can reduce cost.
Sep 24 Relational Calculus Try to write calculus expressions, starting with the examples we discussed and then modifying the queries. Make sure you know how to express joins in DRC.
Sep 26 SQL Practice writing SQL queries. You can create the example tables in the DB server, add a few example tuples, and then try out your queries. Then modify the queries and see what happens. For a given basic SELECT-FROM-WHERE and for nested SQL queries such as the ones we discussed, you should be able to manually compute the result on a given database.
Oct 1 SQL Review the "students who reserved all books" query in SQL and compare it to the calculus expression. Go over the conceptual evaluation algorithm for queries with GROUP BY and aggregation. Pay particular attention to the following: which attributes can appear in the SELECT clause and which cannot? What is different between a condition in the WHERE clause versus the HAVING clause? Try different versions of the aggregate queries, including the incorrect ones, in the real database and see what happens.
Oct 3 SQL, NULL, integrity constraints, views Study the issues related to NULL and the various subtleties we discussed in class. Create a view in the Postgres database. Run SELECT * on the view and observe the result. Then modify some of the view's base tables, run SELECT * again, and see how the result differs. Add some CHECK constraints to your database tables. Then try to perform an update that would violate a constraint. What happens? Also, try to write a CHECK constraint that computes an aggregate on the table or refers to data in another table.
Oct 8 No class: Columbus Day  
Oct 10 Integrity constraints, outer join, SQL functions, user-defined aggregates, trigger introduction Try an outer join on the HW 2 database and see what happens compared to a "normal" (i.e., inner) join. Review SQL functions: function parameters and using them to parameterize a query, calling functions, using functions in the FROM clause, basic definition of user-defined aggregates. Review triggers: distinction between event, condition, and action. What are possible events?
Oct 15 HW 2 solution, triggers, trigger functions Review the HW 2 reference solution and compare it with your own design. For triggers, read the sections on CREATE TRIGGE and on trigger procedures in the Postgres manual. How can we specify conditions and actions? Try to write your own function or trigger and run it in Postgres. For a trigger, try to make it fire by executing the appropriate event.
Oct 17 PL/pgSQL functions; transactions Write a trigger function that uses conditionals (IF, CASE) and loops (LOOP, WHILE, FOR). Make the trigger fire and see if it behaves as expected. Run the somefunc() function we discussed in class and see if the printed value of variable 'quantity' is indeed as claimed. Experiment with the scope of variables by renaming variable 'quantity' in the local scope declaration to 'quant' and see if the function output changes. Review the discussion about transactions.
Oct 22 Midterm review; transactions Make sure you understand the meaning of the ACID properties. Go over the WR, RW, WW anomaly examples and identify which ACID properties they violate. Then explore how Strict 2PL would prevent these anomalies by going step-by-step through the examples. How can Strict 2PL result in deadlocks?
Oct 24 Midterm Exam Same time and location as the lecture.
Oct 29 No class due to hurricane emergency.  
Oct 31 Transactions completed Study the phantom problem and how to prevent it. Try creating transactions and running them on Postgres.
Nov 5 JDBC Look carefully at the JDBC examples (update and query) and try to run them from your machine. (For the update, replace the SQL part with a data update.) Make some modifications by using the API calls we discussed, e.g., try running a transaction.
Nov 7 Storage and Indexing Study heap file, sorted file, and B+ tree. Think about how they affect query and update cost. Make sure you understand how equality and inequality queries are processed by a B+ tree. What is different between storing the actual data records in the tree's leaf nodes compared to only storing data entries that point (through RIDs) to these records in a separate heap file?
Nov 12 No class: Veteran's Day  
Nov 14 Storage and Indexing Read about the indexes we discussed in your favorite DB textbook.
Nov 19 Query Optimization Read about index selection design decisions in your favorite DB textbook. Think of other queries and try to find the best indexes for them. Think about how to achieve index-only plans.
Nov 21 No class: Thanksgiving  
Nov 26 Query Optimization Review the different join algorithms and learn about the block nested loops algorithm in your favorite textbook. Read more about query optimization there as well.
Nov 28 Normal Forms Read about functional dependencies, Armstrong's axioms, and the Boyce-Codd Normal Form (BCNF) in your favorite textbook.
Dec 3 Normal Forms; Recovery Read about 3NF and the other normal forms in your favorite textbook. Focus on the difference between 3NF and BCNF, especially the tradeoffs of choosing one versus the other. Read about the details of the ARIES recovery approach in your textbook.
Dec 5 Final exam review; Beyond Relational DBMS For more information about SQL injection, try a Web search. There is a lot of quality information among the top-10 results already. For MapReduce, take a look at the original Google paper and the articles we posted on Blackboard under Course Material -> Course Documents.
Dec 14 Final Exam 1-3pm, Hurtig Hall 129 (check myNEU for the latest official information)

Course Information

Instructor: Mirek Riedewald

TA: Yijia Gu

Lecture times: MW 2:50 - 4:30pm
Lecture location: WVG 104

Prerequisites

CS 2510 (CS U213)

Grading

Recommended Textbooks

Academic Integrity Policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University.

For more information, please refer to the Academic Integrity Web page.