CS 3200: Database Design

This course studies the design of a database for use in a relational database management system. The entity-relationship model and normalization will be used in example problems. Relational algebra and the SQL language will be presented. Advanced topics include triggers, stored procedures, indexing, elementary query optimization, and fundamentals of concurrency and recovery. Students will implement a database schema and short application programs on a commercial relational database management system.


News

[04/12/2012] All material up to the 4/11 lecture, including the final exam study guide, is now online. Good luck with the exam preparation!


Lectures

(Future lectures and events are tentative.)

Date Topic Remarks and Recommended Homework
Jan 9 Introduction Read introduction chapter in textbook
Jan 11 Entity Relationship Model (ERM) Read in your favorite textbook about modeling entities and their relationships using ERM. (UML is conceptually the same, just with different pictures.) Things to focus on: What are keys? How is the primary key of an entity set selected? How is the primary key of a relationship set determined? How do we model key and participation constraints (e.g., 1-to-1, 1-to-many, many-to-1, many-to-many)? How does the corresponding SQL code look like? And how can we enforce referential integrity (foreign key references) and control the database's reaction to a potential violation?
Jan 16 No class: MLK Day  
Jan 18 ERM continued Study the advanced concepts: weak entities, hierarchies ("ISA"), and aggregation. Go over the design alternatives we discussed and make sure you understand their tradeoffs: entity vs. attribute, entity vs. relationship, binary vs. ternary relationships. For each example we discussed, create the corresponding relations and write the SQL code. Make sure you specify the required primary keys, foreign key references, and NOT NULL constraints.
Jan 20 HW 1: early submission (11pm)  
Jan 23 ERM completed; Relational algebra Study the relational operators and make sure you could compute their results for given example relations. Take the example "queries", hide the algebra expression, and now try to find the expression yourself. Then try to modify the query to find other information in the example database.
Jan 24 HW 1 due (11pm)  
Jan 25 Relational algebra, relational calculus Continue practicing how to write relational algebra expressions for a given query. Write down the schema of intermediate results to catch mistakes like the "topic=DB AND topic=PL" selection condition we discussed. Draw the query plan and think about how pushing down selections and projections can reduce cost. Review the definition of formulas and the example for queries in DRC.
Jan 30 Relational calculus, SQL Try to write calculus expressions, starting with the examples we discussed and then modifying the queries. Make sure you know how to express joins in DRC. Practice writing SQL queries. You can create the example tables in the DB server, add a few example tuples, and then try out your queries. Then modify the queries and see what happens. For a given basic SELECT-FROM-WHERE and for nested SQL queries such as the ones we discussed, you should be able to manually compute the result on a given database.
Feb 01 SQL Review the "students who reserved all books" query in SQL and compare it to the algebra and calculus expressions. Go over the conceptual evaluation algorithm for queries with GROUP BY and aggregation. Pay particular attention to the following: which attributes can appear in the SELECT clause and which cannot? What is different between a condition in the WHERE clause versus the HAVING clause? Try different versions of the aggregate queries, including the incorrect ones, in the real database and see what happens.
Feb 6 Advanced SQL Study the issues related to NULL and the various subtleties we discussed in class. Create a view in the Postgres database. Run SELECT * on the view and observe the result. Then modify some of the view's base tables, run SELECT * again, and see how the result differs. Add some CHECK constraints to your database tables. Then try to perform an update that would violate a constraint. What happens? Also, try to write a CHECK constraint that computes an aggregate on the table or refers to data in another table.
Feb 8 Advanced SQL

HW 2 due (11pm)
Review SQL functions: function parameters and using them to parameterize a query, calling functions, using functions in the FROM clause, basic definition of user-defined aggregates. Review triggers: distinction between event, condition, and action. What are possible events? How can we specify conditions and actions? Try to write your own function or trigger and run it in Postgres. For a trigger, try to make it fire by executing the appropriate event.
Feb 13 PL/pgSQL functions; Transactions Write a trigger function that uses conditionals (IF, CASE) and loops (LOOP, WHILE, FOR). Make the trigger fire and see if it behaves as expected. Run the somefunc() function we discussed in class and see if the printed value of variable 'quantity' is indeed as claimed. Experiment with the scope of variables by renaming variable 'quantity' in the local scope declaration to 'quant' and see if the function output changes. Review the discussion about transactions.
Feb 15 Transactions Make sure you understand the meaning of the ACID properties. Go over the WR, RW, WW anomaly examples and identify which ACID properties they violate. Then explore how Strict 2PL would prevent these anomalies by going step-by-step through the examples. How can Strict 2PL result in deadlocks?
Feb 20 No class: Presidents' Day  
Feb 22 Transactions; JDBC & Co. Study the phantom problem and how to prevent it. Try creating transactions and running them on Postgres. Why do we need a cursor when accessing a database from an application. Look carefully at the example of SQL embedded in C and identify how the cursor is used for accessing the Students table.
Feb 27 JDBC Look carefully at the JDBC examples (update and query) and try to run them from your machine. (For the update, replace the SQL part with a data update.) Make some modifications by using the API calls we discussed, e.g., try running a transaction.
Feb 29 Storage and Indexing Study heap file, sorted file, and B+ tree. Think about how they affect query and update cost. Make sure you understand how equality and inequality queries are processed by a B+ tree. What is different between storing the actual data records in the tree's leaf nodes compared to only storing data entries that point (through RIDs) to these records in a separate heap file?
Mar 1 HW 3 due (11pm)  
Mar 5, 7 No classes: Spring Break  
Mar 12 Midterm Exam Same time and location as lectures.
Mar 14 Storage and Indexing Study the difference between clustered and unclustered indexes. Review static hashing. Go over the approximate cost estimation results for the five different operations (scan, equality query, range query, insert, delete) and five different file organizations (heap file, sorted file, clustered tree index, unclustered tree index + heap file, unclustered hash index + heap file).
Mar 19 Storage and Indexing Review the general approach for choosing appropriate indexes. Go carefully through the various examples. For practice, only look at the query and then try to decide on the best index structure on your own.
Mar 21 Storage and Indexing; Query Optimization Continue reviewing the query examples and their index choices. Focus on composite search keys, combining index results (intersection of RID lists), and index-only plans. Look at some query examples, including those from HW 3, and try to create some corresponding query plans.
Mar 26 Query Optimization

HW 4 due (11pm)
Review the query plans and cost computation for selection, projection, and join. Read more about query optimization in the textbook.
Mar 28 Normal Forms Study functional dependencies, Armstrong's axioms, and the Boyce-Codd Normal Form (BCNF). Think about the tradeoffs of decomposition: redundancy versus query/constraint checking cost.
Apr 2 Query Optimization Go over the query plan examples and try to compute the I/O cost on your own.
Apr 4 Recovery Read about the details of ARIES in your textbook.
Apr 9 Security; SQL Injection

HW 5 due (11pm)
Read about basic access control (GRANT) in your textbook.
Apr 11 Data Warehousing and OLAP; Data Mining  
Apr 16 No class: Patriots' Day  
Apr 18 Last class  
  Final Exam (time and location TBD)  

Course Information

Instructor: Mirek Riedewald

TA: Bahar Qarabaqi

Lecture times: MW 2:50 - 4:30pm
Lecture location: WVH 108

Prerequisites

CS 2510 (CS U213)

Grading

Recommended Textbooks

Academic Integrity Policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University.

For more information, please refer to the Academic Integrity Web page.