Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)
(For the 2009 year, this topic is not core material. If it is on any exam, the background of the algorithm will first be reviewed. Regardless, it is a beautiful and elegant topic for those who enjoy algorithms.)
Statement of Problem and Where It's Used (see text for now). (Also note that there is a Wikipedia article on Union-Find.)
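If Union-Find is done naively, it will take time O(n^2). Two key heuristics make it faster: (1) when combining two components, make the smaller component point to the representative of the larger one; and (2) when following pointers from a vertex to its component representative, redirect those pointers to point directly at the representative.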
The second principle is called path compression. If you have time to implement only one heuristic, implement path compression. That will make your algorithm O(n log n). In addition, for many common cases, the actual number of steps will be closer to c n log n, for some constant c.
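To make path compression concrete, here is a minimal sketch in Python. It assumes (this representation is an illustration, not taken from the text) that the pointers are stored in a list named parent, where parent[v] is the vertex that v points to and a component representative points to itself.

```python
def find(parent, x):
    """Return the representative of x, compressing the path along the way."""
    # First pass: walk up to the representative (the vertex pointing to itself).
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: make every vertex on the path point directly to the root.
    while parent[x] != root:
        next_x = parent[x]
        parent[x] = root
        x = next_x
    return root


# A chain 4 -> 3 -> 2 -> 1 -> 0, where 0 is the representative.
parent = [0, 0, 1, 2, 3]
representative = find(parent, 4)
# After the call, every vertex on the chain points directly to 0.
```

After one call on the deepest vertex, any later find on a vertex of that chain takes a single step.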
The first principle is still too vague to have a name. When we say the "larger" component, we could mean based on the number of vertices in that component. But that would be inefficient to compute. (Because we have some indirect pointers that eventually lead to the component representative, it's difficult to update in the representative the total number of vertices in that component.)
There are two ways to get around the problem. One way is to do path compression along each new edge as soon as it is added, guaranteeing that the two vertices on either side of the edge immediately point to the component representative. If we do that, then we will have an O(n log n) union-find algorithm.
The reason that the previous solution to the first principle is O(n log n) is that we could start with 8 vertices. Then we add 4 edges to create 4 components of 2 elements each. Then we add 2 edges to create 2 components of 4 elements each, and do path compression to make all pointers point to the new representative. Then we add 1 edge to create 1 component of 8 elements, and do path compression on all indirect pointers. Each of the log n rounds redirects on the order of n pointers. Generalizing this, we find that the extra path compressions force us to do O(n log n) total work.
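The doubling example above can be checked by counting pointer updates directly. The sketch below is only an illustration under stated assumptions: n is a power of two, components are merged pairwise round by round, and the function name eager_merge_updates is hypothetical.

```python
def eager_merge_updates(n):
    """Count pointer updates when n singletons are merged pairwise, round by
    round, and every absorbed vertex is immediately re-pointed to its new
    representative. Assumes n is a power of two."""
    rep = list(range(n))   # rep[v] is the representative that v points to
    updates = 0
    size = 1
    while size < n:
        # Merge block [start, start+size) with block [start+size, start+2*size).
        for start in range(0, n, 2 * size):
            new_rep = rep[start]
            for v in range(start + size, start + 2 * size):
                rep[v] = new_rep   # one pointer update per absorbed vertex
                updates += 1
        size *= 2
    return updates


updates_for_8 = eager_merge_updates(8)   # three rounds of 4 updates each
```

Each round performs n/2 updates and there are log2(n) rounds, so under these assumptions the total is (n/2) log2(n), which is 12 for n = 8.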
So, people use a less accurate (but more efficient to compute) method for deciding which component is larger. The rank of a component is the longest path (following pointers) in that component. Each component representative stores the rank of its component. When we add a single vertex to a component, we immediately do path compression (or else we would have a hard time efficiently updating the rank of that component, which is stored within the component representative). When we combine two components A and B, the rank of the new component, rank(C), will be at most max(rank(A), rank(B)) + 1. If the representative of the lower-rank component is made to point to the representative of the higher-rank one, then rank(C) = max(rank(A), rank(B)) when the two ranks differ, and rank(C) = rank(A) + 1 when they are equal. Test yourself by showing why.
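As an illustration, here is a minimal sketch of the standard union-by-rank linking rule: the representative of lower rank is attached under the representative of higher rank, so the stored rank grows only when the two ranks are equal. The list names parent and rank are assumptions of this sketch, and path compression is deliberately left out so that the rank is exactly the longest pointer path.

```python
def find(parent, x):
    """Follow pointers to the representative (no compression in this sketch)."""
    while parent[x] != x:
        x = parent[x]
    return x


def union(parent, rank, a, b):
    """Merge the components containing a and b using union by rank."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return ra              # already in the same component
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra        # make ra the representative of higher rank
    parent[rb] = ra            # attach the lower-rank tree under ra
    if rank[ra] == rank[rb]:
        rank[ra] += 1          # the rank grows only on a tie
    return ra


n = 8
parent = list(range(n))
rank = [0] * n
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    union(parent, rank, a, b)
final_rank = rank[find(parent, 7)]
```

In this run every merge joins two components of equal rank, which is the worst case for the rank: 8 vertices end with rank 3, that is, log2(8).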
The rank heuristic is called Union by Rank. The combination of the two heuristics leads to an algorithm whose complexity is O(n Ack^{-1}(n)), where Ack^{-1}(n) is the inverse Ackermann function. This is one of the slowest growing functions known to mathematics. We (and the text) do not prove the complexity O(n Ack^{-1}(n)), but it can be found in books like Cormen, Leiserson, Rivest, and Stein, for those who are interested.
So, in summary, an almost linear algorithm for Union-Find exists. It works by combining the two heuristics: union by rank and path compression.
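A minimal sketch of how the two heuristics might be combined is shown below. The class name UnionFind, its method names, and the helper count_components are assumptions made for illustration; they are not taken from the text. Note that once paths are compressed, the stored rank is only an upper bound on the longest pointer path.

```python
class UnionFind:
    """Union-Find over vertices 0..n-1 with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))   # each vertex starts as its own representative
        self.rank = [0] * n            # upper bound on the longest pointer path

    def find(self, x):
        # Locate the representative, then compress the path to it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Merge the components of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra            # ra is the representative of higher rank
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def count_components(n, edges):
    """Count connected components of an undirected graph on n vertices."""
    uf = UnionFind(n)
    components = n
    for a, b in edges:
        if uf.union(a, b):
            components -= 1            # each successful merge removes one component
    return components


num_components = count_components(6, [(0, 1), (1, 2), (3, 4), (0, 2)])
```

In this usage example the edge (0, 2) joins two vertices that are already connected, so it does not reduce the count; the components are {0, 1, 2}, {3, 4}, and {5}.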
Note: Path compression is a simple form of dynamic programming or memoizing. We will discuss that in the next set of notes.