*Notes by Gene Cooperman, © 2009
(may be freely copied as long as this copyright notice remains)*

(For the 2009 year, this topic is not core material. If it is on any exam, the background of the algorithm will first be reviewed. Regardless, it is a beautiful and elegant topic for those who enjoy algorithms.)

**Statement of Problem and Where It's Used (see text for now).**
(Also note that there is a Wikipedia article on Union-Find.)

If Union-Find is done naively, it will have time O(n^{2}).
Two key heuristics make it faster:

- When connecting two components, add a pointer from the representative of the smaller component to the representative of the larger component.
- *Path Compression:* If you have to follow several pointers to find the representative of your component, then do it twice! The second time, for each vertex that you visit, change the pointer to point directly to the component representative. Now, if you are asked *a third time* for the representative of any vertex on this path, you will do it in O(1) steps.
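The "do it twice" idea can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the name `find` and the convention that a representative is a vertex pointing to itself in a `parent` list are assumptions made for the example.

```python
def find(parent, v):
    """Return the representative of v's component, compressing the path."""
    # First pass: follow pointers until we reach the representative,
    # assumed here to be the vertex that points to itself.
    root = v
    while parent[root] != root:
        root = parent[root]
    # Second pass: walk the same path again and repoint every vertex
    # on it directly at the representative.
    while parent[v] != root:
        next_v = parent[v]
        parent[v] = root
        v = next_v
    return root


# A chain 4 -> 3 -> 2 -> 1 -> 0, where 0 is the representative.
parent = [0, 0, 1, 2, 3]
print(find(parent, 4))  # 0
print(parent)           # [0, 0, 0, 0, 0]
```

After the first call, every vertex on the path points straight at vertex 0, so a later `find` on any of them follows a single pointer.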

The second principle is called path compression. If you have time to
implement only one heuristic, implement path compression. That will
make your algorithm O(n log n). In addition, for many common cases,
the actual number of steps will be closer to `c n` than to `c n log n`,
for some constant `c`.

The first principle is still too vague to have a name. When we say the
"larger" component, we could mean based on the number of vertices in
that component. But that would be inefficient to compute. (Because we
have some indirect pointers that *eventually* lead to the component
representative, it's difficult to update in the representative the total
number of vertices in that component.)

There are two ways to get around the problem. In one way, every time
we add a new edge, we could immediately do path compression along that
edge to guarantee that the two vertices on either side of the edge
immediately point to the component representative. If we do that,
then we will have an O(n log n) union-find algorithm.

The reason that the previous solution to the first principle is O(n log n)
is that we could start with 8 vertices. Then we add 4 edges
to create 4 components of 2 elements each. Then we add
2 edges to create 2 components of 4 elements each,
*and do path compression to make all pointers point to the new
representative*.
Then we add 1 edge to create 1 component of 8 elements,
*and do path compression on all indirect pointers*. Generalizing this
to n vertices, each of the log n rounds of merging repoints about half
of the vertices, so the extra path compressions force us to do O(n log n)
total work.
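The count in this example can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `count_pointer_updates` is made up for this example, n is assumed to be a power of two, and components are merged pairwise in rounds exactly as in the 8-vertex example, with every vertex of the absorbed component repointed directly at the new representative.

```python
def count_pointer_updates(n):
    """Merge n singletons pairwise, keeping every pointer direct.

    Returns the total number of pointer updates. Assumes n is a power of two.
    """
    rep = list(range(n))                  # rep[v] is v's representative
    members = {v: [v] for v in range(n)}  # representative -> its vertices
    reps = list(range(n))                 # current representatives
    updates = 0
    while len(reps) > 1:
        next_reps = []
        # Pair up equal-sized components, as in the 8-vertex example.
        for a, b in zip(reps[0::2], reps[1::2]):
            for v in members[b]:          # repoint all of b's component at a
                rep[v] = a
                updates += 1
            members[a].extend(members.pop(b))
            next_reps.append(a)
        reps = next_reps
    return updates


print(count_pointer_updates(8))   # 12
print(count_pointer_updates(16))  # 32
```

For n = 8 there are 3 rounds with 4 updates each, and in general this merging order gives (n/2) log n updates (log base 2), which is where the O(n log n) bound comes from.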

So, people use a less accurate (but more efficient to compute)
method for deciding which component is larger. The *rank of a component*
is the longest path (following pointers) in that component. Each
component representative stores the rank of its component.
When we add a single vertex to a component, we immediately do path compression
(or else we would have a hard time efficiently updating the rank
of that component, which is stored within the component representative).
When we combine two components A and B by pointing the representative of
the lower-rank component at the representative of the higher-rank one,
the rank of the new component, rank(C), will be at most
max(rank(A), rank(B)) + 1, with equality when rank(A) = rank(B).
**Test yourself by showing why.**

The rank heuristic is called *Union by Rank*. The combination
of the two heuristics leads to an algorithm whose complexity is
O(n Ack^{-1}(n)), where Ack^{-1}(n) is the inverse Ackermann
function.
This is one of the slowest growing functions known to mathematics.
We (and the text) do not prove the complexity O(n Ack^{-1}(n)),
but it can be found in books like
Cormen, Leiserson, Rivest, and Stein,
for those who are interested.
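As a rough illustration of how the two heuristics fit together, here is a minimal Python sketch. The class and method names are assumptions made for this example. It follows the common textbook convention (as in Cormen et al.): the stored rank is an upper bound on the longest path, since path compression may shorten paths without the rank being lowered, and it grows by one only when two components of equal rank are merged.

```python
class UnionFind:
    """Sketch of union-find with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # each vertex starts as its own representative
        self.rank = [0] * n           # upper bound on the longest path to each root

    def find(self, v):
        # First pass: locate the representative.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression.
        while self.parent[v] != root:
            next_v = self.parent[v]
            self.parent[v] = root
            v = next_v
        return root

    def union(self, a, b):
        """Merge the components of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: point the lower-rank root at the higher-rank root.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


uf = UnionFind(8)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(a, b)
print(uf.find(7) == uf.find(1))  # True
```

Returning a boolean from `union` is a design choice for this sketch; it makes it easy for a caller to tell whether an edge joined two different components.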

So, in summary, an almost linear algorithm for Union-Find exists. It works
by combining the two heuristics:

- Union by Rank
- Path Compression

**Note:** Path compression is a simple form of dynamic programming
or memoizing. We will discuss that in the next set of notes.