Lab 7: Heaps


Goal

To learn about the priority queue abstract data type. To implement a priority queue using the heap data structure, which uses an array to compactly and efficiently represent a complete binary tree.

Part 1: Priority Queue

We've seen the queue abstract data type, and we've seen at least three implementations: using arrays, using singly-linked lists, and using doubly-linked lists. An ordinary queue has the property that the first element inserted into the queue is the first one you can pull out of it (First In, First Out = FIFO). A priority queue supports the same enqueue and dequeue operations, but in a priority queue, the highest-priority item is the first one you can pull out of it. A more important item will cut in front of less important items, so the next item removed from the priority queue is always the most important item it currently holds.
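
To make the operations concrete, here is a minimal sketch of what such an interface might look like in Java; the names are illustrative, not something the lab prescribes.

    // A possible interface for a max-priority queue; names are illustrative.
    public interface MaxPriorityQueue<T extends Comparable<T>> {
        void enqueue(T item);    // add an item
        T dequeueMax();          // remove and return the highest-priority item
        boolean isEmpty();
    }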

Part 2: Heaps

How do we implement these efficiently? We could try representing a priority queue as a sorted list, where enqueue inserts items ranked by priority. This has lousy asymptotic complexity for enqueue, though: O(size(list)). We can do better.
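
As a point of comparison, here is a rough sketch (class and method names are assumptions) of a sorted-list priority queue. Keeping the list in ascending order makes dequeueMax cheap, but enqueue has to scan and shift items, which is where the O(n) cost comes from.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a priority queue backed by a sorted list (ascending order).
    public class SortedListPQ<T extends Comparable<T>> {
        private final List<T> items = new ArrayList<>();

        // O(n): may scan past (and shift) every existing item.
        public void enqueue(T item) {
            int i = items.size();
            while (i > 0 && items.get(i - 1).compareTo(item) > 0) {
                i--;
            }
            items.add(i, item);
        }

        // O(1): the largest item is always at the end.
        public T dequeueMax() {
            return items.remove(items.size() - 1);
        }
    }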

Instead of maintaining a sorted list, we maintain a "kind of" sorted binary tree. This data structure will maintain the heap property: every node is at least as large as each of its children. That's it. That means that the root of the tree is the largest node in the entire heap, so we have direct access to the most important item in our priority queue. How do we maintain the heap property? There are two operations to think about.
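
To make the property concrete, here is a small check (a sketch; it assumes the 1-based array layout described in Part 3) that verifies every node is at least as large as its children.

    // Sketch: verify the max-heap property for a heap stored in an array
    // with 1-based indexing (index 0 unused), as described in Part 3.
    public class HeapCheck {
        public static boolean isMaxHeap(int[] heap, int size) {
            for (int i = 2; i <= size; i++) {
                if (heap[i / 2] < heap[i]) {
                    return false;   // this node is larger than its parent
                }
            }
            return true;
        }
    }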

On insertion, we start out by adding the item to the next open spot in the bottom row of the tree (starting a new row if the bottom one is full). Then we need to check whether the heap property is satisfied at that node. If that item is larger than its parent, we've got a problem! We fix it by swapping the node with its parent, and then we check the node (now in the parent's position) to make sure the heap property is satisfied there. If the new item is the largest in the heap, we'll keep doing this all the way to the top. What is the complexity of insert, then?
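
Here is a sketch of insert and its "bubble up" step on a max-heap of ints stored in a 1-based array (the class and method names are assumptions, not the lab's required interface).

    // Minimal max-heap sketch: 1-based array, index 0 unused.
    public class MaxHeap {
        private int[] heap;
        private int size;

        public MaxHeap(int capacity) {
            heap = new int[capacity + 1];   // assumes a fixed capacity for brevity
            size = 0;
        }

        public void insert(int item) {
            size++;
            heap[size] = item;   // next open spot in the bottom row
            bubbleUp(size);      // restore the heap property
        }

        private void bubbleUp(int i) {
            // Swap with the parent as long as this node beats it;
            // the node climbs at most one level per swap.
            while (i > 1 && heap[i] > heap[i / 2]) {
                int tmp = heap[i];
                heap[i] = heap[i / 2];
                heap[i / 2] = tmp;
                i = i / 2;
            }
        }
    }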

When we take away the largest item in the heap, we've got a hole to fill at the top of the tree. So we just put the last item on the last row of the tree in its place. We clearly can't leave it that way, though. Now we check to see if the heap property is satisfied (it's almost certainly not), and swap it with the greater of its children (think: why the greater?). Then we look at the new location of the node (in one of its children's places) and do the check again... and we keep adjusting until the heap property is satisfied. What is the complexity of removeMax?
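
Continuing the MaxHeap sketch above (and again with assumed names), removeMax and its "sink down" step might look like this:

    // Methods belonging to the MaxHeap sketch above; assumes the heap is non-empty.
    public int removeMax() {
        int max = heap[1];        // the root is the largest item
        heap[1] = heap[size];     // move the last item into the hole at the top
        size--;
        sinkDown(1);              // restore the heap property
        return max;
    }

    private void sinkDown(int i) {
        while (2 * i <= size) {                            // while i has a left child
            int child = 2 * i;
            if (child < size && heap[child + 1] > heap[child]) {
                child++;                                   // pick the greater child
            }
            if (heap[i] >= heap[child]) {
                break;                                     // heap property holds here
            }
            int tmp = heap[i];
            heap[i] = heap[child];
            heap[child] = tmp;
            i = child;
        }
    }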

Part 3: Heaps as arrays

We've got a binary tree, and for each node we need direct access to both its parent and its two children for the algorithms above to work. We could create a node class that has three links plus the data it contains, but there's a trick we can use here to simplify the representation. We know that if we look at one of these heaps, all of the rows will be completely filled except possibly the last one, which fills in from the left. We can compactly represent this complete binary tree as an array. But how do we get from parent to children and children to parent?

Let's number the root node 1 (yeah, sometimes 0-based indexing isn't the easiest way to think about things). Then we number the next row, starting at the left, so node 1's children are node 2 and node 3. And so on and so on... and we notice that the rows have a certain pattern. The first nodes in each row are numbered 1, 2, 4, 8, 16, ... And what's more, the parent of 2 and 3 is 1, the parent of 4 and 5 is 2, the parent of 12 and 13 is 6... So we can get a node's parent by taking half of the index of the current node (rounding down), and we get a parent's children by doubling (for the left child) or doubling and adding one for the right child.
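
In code, that index arithmetic is just a few one-liners (1-based indexing, with index 0 left unused):

    // Index arithmetic for a heap stored in a 1-based array.
    public class HeapIndex {
        static int parent(int i)     { return i / 2; }     // integer division rounds down
        static int leftChild(int i)  { return 2 * i; }
        static int rightChild(int i) { return 2 * i + 1; }

        public static void main(String[] args) {
            // The parent of nodes 12 and 13 is node 6.
            System.out.println(parent(12) + " " + parent(13));   // prints: 6 6
        }
    }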

Part 4: Simulation

Suppose you're a freelancer, and you get job offers periodically. They pay varying amounts of money, and they take varying amounts of time. How do you choose which jobs to take? You might try taking them just as they come. Or you might try taking the jobs that pay best for the least amount of time first. Simulate this job-selection strategy using your heap implementation together with the code provided. See how it does compared to the FIFO strategy.
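
As a rough illustration only (the real job class comes from the provided code, and its name and fields will differ), a "best pay for the least time" priority could be expressed by comparing jobs on pay per unit of time:

    // Hypothetical sketch: the actual class is supplied by the provided code.
    public class Job implements Comparable<Job> {
        double pay;    // amount the job pays
        double time;   // how long the job takes

        public Job(double pay, double time) {
            this.pay = pay;
            this.time = time;
        }

        // Higher pay-per-time means higher priority in a max-heap.
        public int compareTo(Job other) {
            return Double.compare(this.pay / this.time, other.pay / other.time);
        }
    }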