Data Structures (from Chapter 4)

Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)

Depth-first search, Breadth-first search, and Best-first search

We have now seen depth-first search (DFS, Chapter 3), breadth-first search (BFS, Chapter 4), and best-first search (based on Dijkstra's algorithm in Chapter 4).

Breadth-first search

Recall that in breadth-first search, one maintains the queue of unexplored vertices (visited, but neighbors not expanded yet) as a FIFO (first-in-first-out) queue. In choosing which vertex to explore next, we always explore the "oldest" vertex on the queue.

As seen in the text, the innermost operation of breadth-first search is to:

  1. remove the "oldest" vertex from the frontier,
  2. expand its neighbors,
  3. add the neighbors to the frontier (if not already visited),
  4. and mark the neighbor as visited (if not already visited).
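
The four steps above can be sketched in Python roughly as follows. This is only an illustrative sketch, not the textbook's code: the function name bfs_order, the adjacency-list dictionary example_graph, and the vertex names are all assumptions made for the example. The FIFO queue is Python's collections.deque, and the visited set is an ordinary hash-based set.

```python
from collections import deque

def bfs_order(graph, origin):
    """Return the vertices in the order BFS removes them from the frontier.

    `graph` is assumed to map each vertex to a list of its neighbors.
    """
    frontier = deque([origin])   # FIFO queue: the oldest vertex is at the left
    visited = {origin}           # hash-based set of visited vertices
    order = []
    while frontier:
        vertex = frontier.popleft()          # 1. remove the "oldest" vertex
        order.append(vertex)
        for neighbor in graph[vertex]:       # 2. expand its neighbors
            if neighbor not in visited:
                frontier.append(neighbor)    # 3. add the neighbor to the frontier
                visited.add(neighbor)        # 4. mark the neighbor as visited
    return order

# Hypothetical example graph (adjacency lists).
example_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}
```

Calling bfs_order(example_graph, "a") explores "a" first, then its neighbors "b" and "c", and only then "d". Both the insert (append) and the removal of the oldest element (popleft) are O(1) on a deque.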

Best-first search

In best-first search, each vertex has an associated cost: the cost of the cheapest path found so far from the origin to that vertex. As we find multiple paths to a given vertex, we update the cost of that vertex to be the smallest path cost seen so far.

As seen in the text, the innermost operation of best-first search is to:

  1. remove the vertex with the smallest cost path from the frontier,
  2. expand its neighbors,
  3. add the neighbors to the frontier (if not already visited),
  4. mark each neighbor as visited (if not already visited),
  5. and update the cost of each neighbor if the new path to that neighbor has a smaller cost. (Vertices have an initial cost of infinity when they have not yet been visited, since there is no current path.)

The above algorithm is also known as Dijkstra's shortest path algorithm.
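
A minimal Python sketch of this loop is given below, under some stated assumptions. The names shortest_costs and weighted_graph are hypothetical, edge weights are assumed to be non-negative, and every vertex is assumed to appear as a key of the graph dictionary. One deliberate difference from step 5: Python's heapq module has no operation for updating an entry already in the heap, so the sketch pushes a second entry with the smaller cost and skips the outdated entry when it is eventually removed.

```python
import heapq
import math

def shortest_costs(graph, origin):
    """Return a dict mapping each vertex to its smallest path cost from origin.

    `graph` is assumed to map each vertex to a list of (neighbor, weight)
    pairs, with non-negative weights.
    """
    cost = {vertex: math.inf for vertex in graph}   # no path known yet
    cost[origin] = 0
    frontier = [(0, origin)]      # priority queue ordered by path cost
    finished = set()              # vertices whose cost is final
    while frontier:
        # 1. remove the vertex with the smallest cost path
        path_cost, vertex = heapq.heappop(frontier)
        if vertex in finished:
            continue              # outdated entry; a cheaper one was handled
        finished.add(vertex)
        # 2. expand its neighbors
        for neighbor, weight in graph[vertex]:
            new_cost = path_cost + weight
            # 3-5. record the cheaper path and (re)insert the neighbor
            if new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return cost

# Hypothetical weighted graph.
weighted_graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 5)],
    "b": [("a", 2)],
    "t": [],
}
```

In this example the direct edge from "s" to "a" costs 4, but the path through "b" costs 1 + 2 = 3, so the cost of "a" is lowered when "b" is expanded.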

FIFO queue and Priority queue

A FIFO queue of size n should have two especially efficient operations, and one really slow operation:

  1. Insert new element (efficient, O(1))
  2. Remove "oldest" element (efficient, O(1))
  3. Find and remove smallest element (slow, O(n))

A priority queue of size n should support all three operations in O(log n) time, so that none of them is slow:

  1. Insert new element (moderately efficient, O(log n))
  2. Remove "oldest" element (moderately efficient, O(log n))
  3. Find and remove smallest element (efficient, O(log n))

From this, clearly a FIFO queue is the right match for a breadth-first search algorithm, since it only needs the first two operations to process a vertex: insert new element, and remove "oldest" element. In contrast, a priority queue is the right match for a best-first search algorithm, since it needs all three operations.

In best-first search, processing each vertex must execute all three operations, which is O(log n + log n + log n) = O(log n) in the case of priority queues. If a best-first search were implemented using a FIFO queue, it would cost O(1 + 1 + n) = O(n) to process one vertex.

Implementation: data structures for insert new element, remove "oldest" element, and remove smallest element

There are several data structures that one could consider to implement a FIFO queue and a priority queue.

FIFO queue

  1. linked list
  2. circular buffer (circular array)

(Note that if we know in advance the maximum number of vertices, a circular buffer is often the best choice. For large linked lists, we may have a cache miss at almost every step, since consecutive list nodes need not be adjacent in memory. A cache miss costs around 100 CPU cycles on current CPUs.)
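
As an illustration of the circular buffer, here is a small Python sketch. The class name CircularQueue and its method names are made up for this example, and the capacity is assumed to be fixed in advance (for instance, the maximum number of vertices). Python lists are not the contiguous arrays of C, so the sketch shows the index arithmetic rather than the cache behavior.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue stored in a circular array (sketch)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity   # storage allocated once, up front
        self._head = 0                    # index of the oldest element
        self._size = 0                    # number of elements stored

    def __len__(self):
        return self._size

    def insert(self, item):
        """Insert a new element at the tail in O(1)."""
        if self._size == len(self._slots):
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._slots)  # wrap around
        self._slots[tail] = item
        self._size += 1

    def remove_oldest(self):
        """Remove and return the oldest element in O(1)."""
        if self._size == 0:
            raise IndexError("queue is empty")
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)     # wrap around
        self._size -= 1
        return item
```

Neither operation ever shifts elements: the head and tail positions simply advance modulo the capacity, which is why both run in O(1).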

Priority queue

  1. Binary heap [always works]
  2. (Balanced) binary search tree [But note that if the binary search tree becomes unbalanced, then the cost of finding the smallest element could grow to O(n).]

Other data structures that weren't competitive for FIFO or priority queue

  1. Sorted array [O(n) steps to remove an element and shift the remaining elements over]
  2. Unsorted array [O(n) steps to remove an element and shift the remaining elements over]
  3. Hash array [O(n) steps to find the smallest element]

Sorted arrays and hash arrays have other nice features, however. Hash arrays are just what we want to maintain the visited status of a vertex. (Given the name of the vertex, hash it to find where the vertex is stored, and then look up its visited status.) So, we always want to also use a hash array along with our FIFO queue (breadth-first search) or along with our priority queue (best-first search).

Sorted arrays often end up competing with hash arrays for efficiency. For small arrays, sorted arrays are sometimes better, because they avoid the higher overhead of a hash array.

Implementation of binary heap

The textbook (exercise 4.16 at end of Chapter 4) has an implementation of a binary heap. (Add further details later??)
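
Until those details are filled in, the following Python sketch shows one common array-based layout for a binary min-heap. It is not the textbook's implementation from exercise 4.16; the class name MinHeap and the method names insert and remove_smallest are assumptions chosen to match the operation names used in these notes. The children of the element at index i are stored at indices 2i+1 and 2i+2.

```python
class MinHeap:
    """Array-based binary min-heap (illustrative sketch)."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def insert(self, item):
        """Insert a new element in O(log n) by sifting it up."""
        items = self._items
        items.append(item)
        i = len(items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if items[parent] <= items[i]:
                break                       # heap order restored
            items[parent], items[i] = items[i], items[parent]
            i = parent

    def remove_smallest(self):
        """Remove and return the smallest element in O(log n)."""
        items = self._items
        if not items:
            raise IndexError("heap is empty")
        smallest = items[0]
        last = items.pop()                  # take the last leaf
        if items:
            items[0] = last                 # move it to the root, sift down
            i, n = 0, len(items)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                child = i
                if left < n and items[left] < items[child]:
                    child = left
                if right < n and items[right] < items[child]:
                    child = right
                if child == i:
                    break                   # both children are no smaller
                items[i], items[child] = items[child], items[i]
                i = child
        return smallest
```

Each operation follows a single root-to-leaf path in a tree of height about log n, which is where the O(log n) bounds come from. Python's standard heapq module provides the same two operations (heappush and heappop) on a plain list.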