Lab 3 Details: Building DBs using random file access -- COM1101 Winter 2001

Updated 29 December 2000

Professor Futrelle
College of Computer Science, Northeastern U., Boston, MA

In this project, covering Labs 2 and 3, you will be creating a table of records in a file on disk for a simple relational database and accessing the records.

This page explains Lab 3 in particular: How you can use or adapt programs from your textbook to created a sorted index of your database records. This allows records to be found quickly without having to read through your entire DB file. You will need to read this page carefully in order to do the lab. The Lab 3 assignment itself, what you are to do and hand in, is here.

How to retrieve source code from the textbook

I have created local copies of all the source code from the textbook, obtained through the publisher's site (textbook, pg. viii). This link gives you the access you need: http://ftp.ccs.neu.edu/pub/people/futrelle/com1101/. Open the folder savitch-cpp3e-sourcefiles/ there and open further folders until you find what you need. When you find and select a file, the source code listing will appear in your browser. Select the portions you need and copy them and paste them into your source code in VC++. Add the appropriate function prototypes and make any changes you need to the code, adding comments to explain any changes you needed to make.

Here are the source files you'll need:

For the sort() function, which you'll need to change: File 09-13.cxx. From this source you'll also need to get the functions swap_values() and index_of_smallest().
For the binary search() function: File 12-08.cxx.

Database indexes

An index is a compact data structure that allows one to search the index for field values in the database and the search will return the record number. Then the record can be retrieved from the database file. This avoids the very slow process of trying to read through the entire database file, searching for a particular field value. A field that has separate values for a field in every record can be used as a key. That is, if we search for the records that have a particular value for that key, only one record will satisfy it, at most. Thus if one field has the four distinct values {1, 7, 33, 12} for four records, it can be used as a key. Typical keys that are used in practice are Social Security numbers, part numbers, UPC codes and membership ID numbers. To keep our work simple be sure to use distinct values for all records for any fields that we index.

Building an index

You are to build your index in the following way. Assume that you have an array of keys, such as the one above, {1, 7, 33, 12}, construct another array with the corresponding record numbers {0, 1, 2, 3}. Use the sort() function you retrieve from the Savitch sources, but modify it in the following way. Add another array argument to the function, which will be for the record numbers. When you do the swapping of key elements in the sort function also swap the corresponding elements in the record array so they're kept in correspondence.

Searching an index and retrieving a record

Use the binary search function from Savitch to search the keys array. You will get an array index returned by that function (the int& location argument). Then use that index to look up the record number in its array. Using that record number, retrieve the record in the same way you did in the previous lab, Lab 2.

Extra credit: Storing the index on disk

You may have noticed that the two arrays, the integer keys and the record numbers, could easily be used to build a relational table that could be stored on disk. For extra credit, do just that. Sort the two arrays and create a database file holding the corresponding two-field records. Then read in these records and recreate the two arrays used as your index. Search it as before to prove that it's there and works. Note that for a database with many fields, the index could be a good deal smaller than the database itself. For additional extra credit, build indexes for the float and string fields of the database, either in memory or also stored on disk. You could add a boolean field of one character also, but of course it could not be used as a key!

Here are a few links with brief introductions to relational databases if you'd like to know more:

Overview of the Relational Model from U. Texas
Introduction to Relational Database Design -- probably more than you'll want.
Fundamentals of Relational Database Design from Microsoft.

Snell library has more than 1,000 books on databases! It's an important topic. And finally, it's worth mentioning that in real DB systems even an index such as the one we've used in this lab may be too large to fit into memory. So in real systems even more efficient index structures are used, either B-Trees or hash tables.