Data organization is one of the most important parts of software design and implementation. The way you organize, access, or modify your data plays an important factor on your program's performance, i.e., memory used and speed.
We will go over some of the data structures you have already encountered in this class and provide some high level analysis on their performance. We will analyze, in terms of operations needed, what is the best, worst and average time needed for operations on our data structure.
How long does it take us to retrieve an element from our data
structure? We will not use a clock to measure "how long" an
operation takes to complete. Instead we will measure how many
operations the program has to perform. Our notion of
operations will be abstract, e.g. retrieving the 5th
element from our standard ConsList
representation
of lists takes 5 operations in total.
Up to now we have seen two types of data structures.
ConsList
ArrayList
with indexes.
Data Structure | Number of Items | Best Case | Worst Case |
---|---|---|---|
ConsList | 1000 | 1 | 1000 |
ArrayList | 1000 | 1 | 1000 |
SortedConsList | 1000 | 1 | 1000 |
SortedArrayList | 1000 | 1 | 10 |
Let's look at the different cases that we have when we are looking for an element inside the data structures from the preceding table.
ConsList
ConsList
. One operation is
all we need.
ConsList
(or it is not even in our
ConsList
). We have to traverse the whole list
in order to find our element (or decide that it is not in
our list). That takes 1000 operations.
ArrayList
. The elements in our
ArrayList
are not sorted. We can look in the array
in a sequential manner, i.e., starting from index 0 until the
end of the ArrayList
. We can also pick indexes
at random that are between 0 and the ArrayList
's
size. Each time we make sure we do not visit the same index
twice.
ArrayList
. One
operation is all we need.
ArrayList
(or it is not even
in our ArrayList
). We have looked at the whole
ArrayList
in order to find our element (or decide
that it is not in our list). That takes 1000 operations.
SortedConsList
. The elements in our
SortedConsList
are sorted.
SortedConsList
. One operation is
all we need.
SortedConsList
(or it is not even
in our SortedConsList
). Since we only have sequential
access to reach the last element of our list (or even figure
out that this element is not in our list) we have to traverse
the whole list. That takes 1000 operations.
SortedArrayList
. The elements in our
SortedArrayList
are sorted. In this case we'll
be a bit more clever on how we access elements. Instead of
looking randomly, or even in sequence from index 0 to the end,
we will do the following.
array.size()
/ 2. In the case were
the length of our array is odd we can take the ceiling
(or floor) after we divide by 2.
array.size()
/ 2 with
the element we are looking for. Then we check
array.size()
/ 2, then we
need to look at the last half of our array. We go back to
step 1, but we consider the start of our array to be
array.size()
/ 2 and the end of the array to
be array.size()
.
array.size()
/ 2, then
we go back to step 1, but we consider the start of
array to be index 0 and the end of our array to be
array.size()
/ 2.
array.size()
/ 2, we are done.
array.size()
/ 2. One operation is all we need.
Let's work out the worst case scenario as a game. Player A thinks of a number between 1 to 1000. Player B has to find this number by asking player A questions. Consider a list of integers from 1 to 1000.
|-------------------------------------------------| 1 1000 Player B : Is it 500? Player A : NO. It's bigger.
|------------------------| 500 1000 Player B : Is it 750? Player A : NO. Smaller
|-----------| 500 750 Player B : Is it 625? Player A : NO. Bigger.
|-----| 625 750 Player B : Is it 687? Player A : NO. Smaller
|---| 625 687 Player B : Is it 657? Player A : NO. Bigger
|--| 657 687 Player B : Is it 672? Player A : NO. Smaller
|-| 657 672 Player B : Is it 664? Player A : NO. Bigger
|-| 664 672 Player B : Is is 668? Player A : NO. Smaller
|-| 664 668 Player B : Is is 666? Player A : Yes
We found it after 10 turns.
Let's think about inserting a new element to our data structure and how many operations it takes. This time we will assume we have n elements instead of 1000.
Data Structure | Number of Items | Best Case | Worst Case | Average Case |
---|---|---|---|---|
ConsList | n | 1 | n | n |
ArrayList | n | 1 | n | n |
SortedConsList | n | 1 | n | n/2 |
SortedArrayList | n | 1 | log_{2} n | log_{2} n |
How does a binary search tree perform? Lets build a binary search tree by adding the numbers 2, 48, 36, 20, 67, 45, 46 in order.
2 \ 48 / \ 36 67 / / 20 45 \ 46
To insert a new element we nee to
To find the appropriate place to add our new element in a binary search tree depends on the shape of our tree. For example adding 9 to
1 \ 2 \ 3 \ 4 \ 5 \ 6 \ 7 \ 8
We have to walk down all 9 numbers to find the right position to add 10. On the other hand, with a balanced tree as below
5 / \ 2 7 / \ / \ 1 3 6 8
only takes 3 operations. So what are the best and worst time for binary search trees.
Number of nodes | Operations (balanced tree) | Operations (line tree) |
---|---|---|
7 | 3 | 7 |
15 | 4 | 15 |
31 | 5 | 31 |