Data Organization

Data organization is one of the most important parts of software design and implementation. The way you organize, access, or modify your data plays an important factor on your program's performance, i.e., memory used and speed.

We will go over some of the data structures you have already encountered in this class and provide some high level analysis on their performance. We will analyze, in terms of operations needed, what is the best, worst and average time needed for operations on our data structure.

Find an item in a data set using a comparator

How long does it take us to retrieve an element from our data structure? We will not use a clock to measure "how long" an operation takes to complete. Instead we will measure how many operations the program has to perform. Our notion of operations will be abstract, e.g. retrieving the 5th element from our standard ConsList representation of lists takes 5 operations in total.

Up to now we have seen two types of data structures.

  1. Sequential access. The elements of this type of data structure are accessed in one particular order. The order that they are stored in the data structure, i.e. ConsList
  2. Direct access. The elements can be accessed according to an address, i.e. using ArrayList with indexes.
Data Structure Number of Items Best Case Worst Case
ConsList 1000 1 1000
ArrayList 1000 1 1000
SortedConsList 1000 1 1000
SortedArrayList 1000 1 10

Let's look at the different cases that we have when we are looking for an element inside the data structures from the preceding table.

Let's work out the worst case scenario as a game. Player A thinks of a number between 1 to 1000. Player B has to find this number by asking player A questions. Consider a list of integers from 1 to 1000.

 
              |-------------------------------------------------| 
              1                                                1000

              Player B : Is it 500? 
              Player A : NO. It's bigger.  
        
 
              |------------------------| 
              500                      1000

              Player B : Is it 750?  
              Player A : NO. Smaller 
        
 
              |-----------|
              500        750

              Player B : Is it 625? 
              Player A : NO. Bigger. 
        
 
              |-----|
              625   750

              Player B : Is it 687? 
              Player A : NO. Smaller
        
 
              |---|
              625 687

              Player B : Is it 657? 
              Player A : NO. Bigger
        
 
              |--| 
              657 687 

              Player B : Is it 672? 
              Player A : NO. Smaller
        
 
              |-|
              657 672 

              Player B : Is it 664? 
              Player A : NO. Bigger
        
 
              |-|
              664 672 

              Player B : Is is 668? 
              Player A : NO. Smaller
        
 
              |-|
              664 668  

              Player B : Is is 666? 
              Player A : Yes 
        

We found it after 10 turns.

Let's think about inserting a new element to our data structure and how many operations it takes. This time we will assume we have n elements instead of 1000.

Add an element to our data structure

Data Structure Number of Items Best Case Worst Case Average Case
ConsList n 1 n n
ArrayList n 1 n n
SortedConsList n 1 n n/2
SortedArrayList n 1 log2 n log2 n

How does a binary search tree perform? Lets build a binary search tree by adding the numbers 2, 48, 36, 20, 67, 45, 46 in order.

                                 2
                                  \ 
                                   48
                                   / \ 
                                  36 67
                                  /  /
                                 20 45 
                                     \
                                     46

To insert a new element we nee to

  1. Find the appropriate place to add our new element.
  2. Add the element.

To find the appropriate place to add our new element in a binary search tree depends on the shape of our tree. For example adding 9 to

                             1
                              \ 
                               2
                                \ 
                                 3
                                  \
                                   4
                                    \ 
                                     5
                                      \ 
                                       6
                                        \
                                         7
                                          \ 
                                           8

We have to walk down all 9 numbers to find the right position to add 10. On the other hand, with a balanced tree as below

                                    5
                                   / \
                                  2   7
                                 / \ / \
                                1  3 6  8

only takes 3 operations. So what are the best and worst time for binary search trees.

Number of nodes Operations (balanced tree) Operations (line tree)
7 3 7
15 4 15
31 5 31