6.2.1

#### Lab 11: Stress Tests and Big-O behavior

Related files:
tester.jar     Lab11.zip

Goals: This lab supplies six implementations of sorting algorithms. Your challenge is to identify which type algorithm is being used by each implementation. Additionally, you are to implement the heapsort algorithm, and confirm that its runtime behavior is as expected.

##### 1Getting started

Create a new project named "Lab11". Add the tester library to it. Open the Lab11.zip file, and unzip it into the same directory in your workspace where the src and bin directories for your new project are. Add the Sort.jar and javafx libraries as external JARs. Right-click on the "Lab11" project and select "Refresh". You should now see a package named student inside the src directory, with two files inside it: (Heapsort.java and SortingHeapSort.java). You will be modifying Heapsort.java eventually; leave the other file alone. Make sure citydb.txt is in the same directory as your src folder (not within your src folder), and that the javafx jar is within its folder, and finally that the bin folder contains three .dll files. .

Notice that both files in this package start with the declaration package student;. This declaration lets you relate multiple classes in multiple files and combine them into a library. The List data type seen in the classes is an interface that ArrayList implements; it has all the methods you’re used to using (add, get, set, etc. etc.).

Create a run configuration using ControllerView.Controller as the main class. Try running this configuration, to ensure you have all the pieces of this lab in place. The data sorted for this lab is found in citydb.txt, and it contains data on over 29 thousand cities — names, zip codes, states, and latitude and longitude.

##### 2StressTests - Timing Tests

Your job is now to be an algorithm detective. The program we give you allows you to run any of the six different sorting algorithms on data sets of five different sizes using three different Comparators to define the ordering of the data. When you run the program, the time that each of these algorithms took to complete the task is shown in the console.

Do Now!

How many timing results would you collect if all of these tests rans successfully?

Using the created Run Configuration, run the program. You can navigate the checkboxes using a mouse or arrow keys. Spacebar will select or deselect a box, and pressing enter/clicking on the ok button will run the tests. Should you wish to supply your own input, clicking on the file button will allow you to select which file you wish to test. Note that it must follow the same formatting as the citydb.txt. By default, the output will be printed to the console, however, by checking the To File checkbox you can redirect the output to a sortOut.csv file that is ready to be opened by Excel.

Start with just a few small tests, to see how the program behaves, before you decide to run all tests.

The last choice is heapsort, that you have yet to implement. The two files in the student package originally provided only stubs to the stress test program — the method heapsort in the class Heapsort so you can run the stress tests even if you did not implement the heapsort algorithm. The original stub just returns the original unsorted ArrayList. So, if you are having difficulties with implementing heapsort, leave it alone for now, and run the program with the original files in the student package. The running times you get for the heapsort option will be bogus, but you can at least run the program.

You can repeat this as many times as you want, with different choices for algorithms, datset sizes, and comparators.

##### 3Exploration:

Spend about fifteen minutes trying to answer some of the following questions.

Run the program a few times with small data sizes, to get familiar with what it can do. Then run experiments and try to answer the following questions:

• Which algorithms run mostly in quadratic time, i.e. $$O(n^2)$$?

• Which algorithms run mostly in $$O(n \log n)$$ time?

• Which algorithms use the functional style, using Cons lists?

• Which algorithm is the selection sort?

• Why is there a difference when the algorithms use a different Comparator?

• Copy the results into a spreadsheet. You may save the result portion in a text editor with a .csv suffix and open it in Excel (or some other spreadsheet of your choice). You can now study the data and represent the results as charts.

Do so for at least three algorithms, where there is one of each — a quadratic algorithm and a linear-logarithmic algorithm.

Note: The following algorithms are included in the choices: binary tree sort; insertion sort (2 versions); merge sort; quicksort (2 versions); selection sort. See if you can figure out which one is which.

##### 4Implementing Heapsort

Recall from lecture that a heap is a binary tree with two invariants:
• Structural: It is a full tree: every layer of the tree is full, except perhaps the bottom layer, and that one must be full from the left.

• Logical: Every node’s value is greater than the values of both of its children.

We can use these two invariants to implement a sorting algorithm very efficiently.

To represent a heap efficiently, we can store its items in an ArrayList in a particular order:
• The root of the tree is at index 0.

• The left child of the item at index $$i$$ is at index $$2i+1$$.

• The right child of the item at index $$i$$ is at index $$2i+2$$.

• The parent of the item at index $$i$$ is at index $$(i-1)/2$$, rounded down.

Now the structural invariant is even easier to state: a full tree with $$n$$ nodes simply uses indices zero through $$n-1$$, with no gaps.

Recall from lecture that heapsort takes place in two phases: first, building the heap, and second, repeatedly removing the maximum from the heap and placing it at the end of the ArrayList. We can build a heap in one of two ways; here is the easier one:
• Starting from the last item in the ArrayList, ensure that it and its children are a valid heap. It has no children, so it must be a valid heap. Go to the next-to-last item in the ArrayList and ensure that it too is a valid heap. It also has no children, so it must be valid. In fact, none of the last $$n/2$$ items (where $$n$$ is the size of the ArrayList) have any children (why?), so they must all be valid heaps.

• The $$n/2$$ item must have at least one child. If it is smaller than its children, then swap it with the larger of the two, and recursively fixup the heap at the location of that child (because now it might not be ordered properly with respect to its children). This step is called “downheap”.

• Repeat “downheap” for every item up to and including the root.

NOTE: Java does not use the IComparator<T> interface we have been using so far with the same names we have been using. Instead, it’s called a Comparator, and its method is called compare. But it behaves exactly the same as the IComparator interface we have used; it just has an inconsistent naming convention.

Next, we need to remove the maximum item from a heap. Fortunately, we know where to find it: at the root. To remove the item, we must fill the hole left at the root, without violating our invariants. So we choose the last item in the last layer of the tree and swap them. Now our structural invariant is restored, but the logical invariant is broken. Fortunately, we know how to fix that: just downheap the new value at the root!

Finally, we need to repeat this removal for every item in the heap. Once the heap is empty, the “discards” will actually be the contents of the heap in sorted order.