#### Lab 11: Stress Tests and Big-O behavior

Goals: This lab supplies six implementations of sorting algorithms. Your challenge is to identify which algorithm each implementation uses. Additionally, you are to implement the heapsort algorithm, and confirm that its runtime behavior is as expected.

##### 1 Getting started

Create a new project named "Lab11". Add the tester library to it. Open the Lab11.zip file, and unzip it into the same directory in your workspace where the src and bin directories for your new project are. Add the jpt.jar and sorting.jar libraries as external JARs. Right-click on the "Lab11" project and select "Refresh". You should now see a package named student inside the src directory, with two files inside it (Heapsort.java and StringHeapSort.java). You will be modifying Heapsort.java eventually; leave the other file alone.

Notice that both files in this package start with the declaration package student;. This declaration lets you relate multiple classes in multiple files and combine them into a library.

Create a run configuration using sorting.TimerTests as the main class.
Try running this configuration, to ensure you have all the pieces of this lab in place.
When the File Chooser Dialog comes up, select the file citydb.txt. This file contains
data on over 29 thousand cities.

##### 2 StressTests - Timing Tests

Your job is now to be an algorithm detective. The program we give you allows you to run any of the six different sorting algorithms on data sets of five different sizes using three different Comparators to define the ordering of the data. When you run the program, the time that each of these algorithms took to complete the task is shown in the console.

Do Now!

How many timing results would you collect if all of these tests ran successfully?

Create a new Run Configuration using the class sorting.Interactions as the main class. Run the program. It will display a window with several buttons. You can ignore most of them, but you’ll need to use three of them in the proper order:

Start with the File Input button. It opens the File Chooser Dialog, and when you select the file citydb.txt it reads in the data for the 29470 cities.

Next hit the TimerInput button. It lets you select which algorithms to test, which Comparators to use, and what size data should be used in the tests.

Start with just a few small tests, to see how the program behaves, before you decide to run all tests.

The last choice is heapsort, which you have yet to implement. The two files in the student package originally provided only stubs to the stress test program (the method heapsort in the class Heapsort), so you can run the stress tests even if you have not implemented the heapsort algorithm. The original stub just returns the original unsorted ArrayList. So, if you are having difficulties with implementing heapsort, leave it alone for now, and run the program with the original files in the student package. The running times you get for the heapsort option will be bogus, but you can at least run the program.

Third, once you have read in the data and have chosen which algorithms to run, for which data set sizes, and with which Comparators, you can run the actual timing tests by hitting the RunTests button. It will print its output to the console. You can either copy and paste the text from the console window in Eclipse, or you can hit the Toggle Console button before hitting RunTests. This will open a window from which you can copy text or save the whole console output to a file.

You can repeat the last two steps as many times as you want, with different choices for algorithms, dataset sizes, and comparators.

##### 3 Exploration:

Spend about fifteen minutes trying to answer some of the following questions.

Run the program a few times with small data sizes, to get familiar with what it can do. Then run experiments and try to answer the following questions:

Which algorithms run mostly in quadratic time, i.e. \(O(n^2)\)?

Which algorithms run mostly in \(O(n \log n)\) time?

Which algorithms use the functional style, using Cons lists?

Which algorithm is the selection sort?

Why is there a difference when the algorithms use a different Comparator?

Copy the results into a spreadsheet. You may save the result portion in a text editor with a .csv suffix and open it in Excel (or some other spreadsheet of your choice). You can now study the data and represent the results as charts.
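
One way to read the timings, offered as a rough sketch rather than as part of the lab code: divide each measured time by \(n^2\) and by \(n \log n\). For a quadratic algorithm the first ratio stays roughly constant as \(n\) grows; for a linear-logarithmic algorithm the second one does. The class `GrowthCheck` and its method names are hypothetical, and `Collections.sort` is used only as a stand-in, since the lab's own algorithms live in sorting.jar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the lab's sorting.jar.
class GrowthCheck {
    // Time divided by n^2: roughly constant across sizes for a quadratic algorithm.
    static double perNSquared(int n, double millis) {
        return millis / ((double) n * n);
    }

    // Time divided by n * log2(n): roughly constant for a linear-logarithmic algorithm.
    static double perNLogN(int n, double millis) {
        return millis / (n * (Math.log(n) / Math.log(2)));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // Five data sizes, each double the previous one.
        for (int n = 20_000; n <= 320_000; n *= 2) {
            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                data.add(rng.nextInt());
            }
            long start = System.nanoTime();
            Collections.sort(data); // stand-in for one of the lab's algorithms
            double millis = (System.nanoTime() - start) / 1e6;
            System.out.printf("n=%d  t=%.2f ms  t/n^2=%.3e  t/(n log n)=%.3e%n",
                    n, millis, perNSquared(n, millis), perNLogN(n, millis));
        }
    }
}
```

The same two ratios can be computed as extra columns in your spreadsheet. Single timings are noisy, so look at the trend across all five sizes rather than at any one row.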

Do so for at least three algorithms, including at least one of each kind: a quadratic algorithm and a linear-logarithmic algorithm.

Note: The following algorithms are included in the choices: binary tree sort; insertion sort (2 versions); merge sort; quicksort (2 versions); selection sort. See if you can figure out which one is which.

##### 4 Implementing Heapsort

A heap is a binary tree that obeys two invariants:

Structural: It is a complete tree: every layer of the tree is full, except perhaps the bottom layer, and that one must be filled from the left.

Logical: Every node’s value is greater than or equal to the values of both of its children.

We store the tree in an ArrayList, layer by layer from left to right:

The root of the tree is at index 0.

The left child of the item at index \(i\) is at index \(2i+1\).

The right child of the item at index \(i\) is at index \(2i+2\).

The parent of the item at index \(i\) is at index \((i-1)/2\), rounded down.
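
The index arithmetic above can be sketched as three small helpers. The class and method names here are illustrative assumptions, not names taken from the lab's code.

```java
// Hypothetical helper class; the names are illustrative, not from the lab code.
class HeapIndex {
    static int left(int i) {
        return 2 * i + 1;
    }

    static int right(int i) {
        return 2 * i + 2;
    }

    // Integer division rounds down for i >= 1; the root (i == 0) has no parent.
    static int parent(int i) {
        return (i - 1) / 2;
    }

    public static void main(String[] args) {
        // The children of index 3 are at 7 and 8, and both point back to 3.
        System.out.println(left(3) + " " + right(3));    // prints "7 8"
        System.out.println(parent(7) + " " + parent(8)); // prints "3 3"
    }
}
```

A child index is only meaningful if it is smaller than the size of the ArrayList, so any code using these helpers needs to check that bound before reading the child.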

First, we need to turn the unsorted ArrayList into a valid heap. Starting from the last item in the ArrayList, ensure that it and its children are a valid heap. It has no children, so it must be a valid heap. Go to the next-to-last item in the ArrayList and ensure that it too is a valid heap. It also has no children, so it must be valid. In fact, none of the last \(n/2\) items (where \(n\) is the size of the ArrayList) have any children (why?), so they must all be valid heaps.

The item at index \(\lfloor n/2 \rfloor - 1\) is the last one that has at least one child. If it is smaller than either of its children, swap it with the larger of the two, and recursively fix up the heap at the location of that child (because the swapped item might not be ordered properly with respect to its new children). This step is called “downheap”.

Repeat “downheap” for every item up to and including the root.
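
The heap-building steps above might look roughly like the following. This is a minimal sketch under assumptions: the class name, method names, and signatures are hypothetical, and the stub in Heapsort.java may well expect something different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

// Hypothetical sketch; the lab's Heapsort.java may use different names and signatures.
class BuildHeapSketch {
    // Restore the heap ordering for the subtree rooted at index i,
    // assuming both child subtrees are already valid heaps.
    static <T> void downheap(ArrayList<T> items, int i, Comparator<T> comp) {
        int n = items.size();
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        int largest = i;
        if (left < n && comp.compare(items.get(left), items.get(largest)) > 0) {
            largest = left;
        }
        if (right < n && comp.compare(items.get(right), items.get(largest)) > 0) {
            largest = right;
        }
        if (largest != i) {
            Collections.swap(items, i, largest);
            downheap(items, largest, comp); // the moved item may still be out of order
        }
    }

    // Turn an arbitrary list into a max-heap in place.
    static <T> void buildHeap(ArrayList<T> items, Comparator<T> comp) {
        // Indices n/2 through n-1 are leaves, so start at the last item with a child.
        for (int i = items.size() / 2 - 1; i >= 0; i--) {
            downheap(items, i, comp);
        }
    }

    public static void main(String[] args) {
        ArrayList<Integer> data = new ArrayList<>(Arrays.asList(3, 9, 2, 7, 5));
        buildHeap(data, Comparator.naturalOrder());
        System.out.println(data); // prints "[9, 7, 2, 3, 5]"
    }
}
```

Note that the result is a valid heap but not a sorted list: 9 is at the root, and every item is at least as large as its children, yet 2 appears before 3 and 5.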

Next, we need to remove the maximum item from a heap. Fortunately, we know where to find it: at the root. To remove the item, we must fill the hole left at the root, without violating our invariants. So we choose the last item in the last layer of the tree and swap it with the root, setting the old maximum aside as a “discard” just past the end of the now-shorter heap. Now our structural invariant is restored, but the logical invariant is broken. Fortunately, we know how to fix that: just downheap the new value at the root!

Finally, we need to repeat this removal for every item in the heap. Once the heap is empty, the “discards” will actually be the contents of the heap in sorted order.
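
Putting the phases together, one possible in-place version is sketched below. It is an illustration under assumptions, not the lab's reference solution: the names are hypothetical, and `downheap` is written here as a loop with an explicit `heapSize` bound (instead of the recursive form described above) so that it never touches the discards stored past the end of the heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

// Hypothetical sketch; names and signatures are illustrative, not the lab's API.
class HeapsortSketch {
    // Sift the item at index i down within items[0 .. heapSize-1].
    static <T> void downheap(ArrayList<T> items, int i, int heapSize, Comparator<T> comp) {
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int largest = i;
            if (left < heapSize && comp.compare(items.get(left), items.get(largest)) > 0) {
                largest = left;
            }
            if (right < heapSize && comp.compare(items.get(right), items.get(largest)) > 0) {
                largest = right;
            }
            if (largest == i) {
                return; // heap ordering holds at i
            }
            Collections.swap(items, i, largest);
            i = largest;
        }
    }

    static <T> ArrayList<T> heapsort(ArrayList<T> items, Comparator<T> comp) {
        int n = items.size();
        // Phase 1: build a max-heap, starting from the last item that has a child.
        for (int i = n / 2 - 1; i >= 0; i--) {
            downheap(items, i, n, comp);
        }
        // Phase 2: move the maximum just past the end of the shrinking heap,
        // then repair the heap that remains in items[0 .. end-1].
        for (int end = n - 1; end > 0; end--) {
            Collections.swap(items, 0, end);
            downheap(items, 0, end, comp);
        }
        return items;
    }

    public static void main(String[] args) {
        ArrayList<Integer> data = new ArrayList<>(Arrays.asList(3, 9, 2, 7, 5));
        System.out.println(heapsort(data, Comparator.naturalOrder())); // prints "[2, 3, 5, 7, 9]"
    }
}
```

Because the discards accumulate from the back of the ArrayList, largest first, the list ends up in ascending order according to the Comparator, with no second list needed.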