Read the next two papers:

Sethi and Ullman
Kurlander, Proebsting, and Fischer


Global optimization using SSA form

The Rosen, Wegman, and Zadeck paper presents an efficient and effective global algorithm for recognizing common, available, redundant, and loop-invariant computations and for hoisting redundant or loop-invariant computations to a common pre-dominator. The algorithm works for reducible flow graphs that have been converted into static single assignment (SSA) form.

Phase 1.


Phase 2. For each rank R = 0, 1, ... do

for each basic block B (in reverse topsort order) do

Eliminate globally redundant assignments of rank R.

Phase 3.


Phase 1.

Although the following example is from the paper, I have changed the numbering of variables so that each variable has the same number as the basic block that assigns it. Unfortunately, some of the numbered variables derived from the original variable L have the same name as a basic block. I didn't notice this until it was too late to fix it easily.

L1:
    read (A, B, C, D, L, M, S, T)
    goto L2a

L2a:
    goto L2

L2:
    if ___ then goto L3 else goto L4

L3:
    L := C * B;
    M := L + 4;
    A := C;
    goto L5

L4:
    D := C;
    L := D * B;
    S := A * B;
    T := S + 1;
    goto L5

L5:
    X := A * B;
    Y := X + 1;
    if ___ then goto L2 else return

L2a has been inserted as a pre-header for the loop consisting of blocks L2, L3, L4, and L5. There is one back edge, from L5 to L2. This example is already in topsort order. The next step is to convert it into SSA form.

Conversion to SSA form.

The variables that occur within this program are

    A  B  C  D  L  M  S  T  X  Y
We can form the initial name for each variable by suffixing it with a 0. We can then traverse the spanning tree for the CFG, keeping track of the current names for each variable.
  1. When a variable is assigned, we give it a new name.
  2. At a loop header (which is a kind of join point), we create a new assignment for every variable. If the old name of the variable is V, then the right hand side of the assignment is phi (V, ??) where ?? stands for the name of the variable that will be transmitted by the back edge. This name will be filled in later when we get to the back edge.
  3. At a join point, we create a new assignment for every variable whose names differ depending on which edge was used to arrive at the join point. If the names for the variable are V1 if arriving via the left edge, and V2 if arriving via the right edge, then the right hand side of the assignment is phi (V1, V2).

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);
    B2 := phi (B1, B2);
    C2 := phi (C1, C2);
    D2 := phi (D1, D5);
    L2 := phi (L1, L5);
    M2 := phi (M1, M5);
    S2 := phi (S1, S5);
    T2 := phi (T1, T5);
    X2 := phi (X0, X5);
    Y2 := phi (Y0, Y5);
    if ___ then goto L3 else goto L4

L3:
    L3 := C2 * B2;
    M3 := L3 + 4;
    A3 := C2;
    goto L5

L4:
    D4 := C2;
    L4 := D4 * B2;
    S4 := A2 * B2;
    T4 := S4 + 1;
    goto L5

L5:
    A5 := phi (A3, A2);
    D5 := phi (D2, D4);
    L5 := phi (L3, L4);
    M5 := phi (M3, M2);
    S5 := phi (S2, S4);
    T5 := phi (T2, T4);
    X5 := A5 * B2;
    Y5 := X5 + 1;
    if ___ then goto L2 else return

Removal of trivial assignments.

A trivial assignment is an assignment that has one of the forms

[Although the third case above is from the paper, it appears incorrect to me. With their conventions, I think this case should be A := phi (B, A) instead. Yes, I think it matters.]

Now that the program is in SSA form, a trivial assignment can be removed by replacing all mentions of A by B (or vice versa). As we remove trivial assignments, we look for local redundancies: assignments within a single basic block whose right hand sides are the same. The removal of trivial assignments tends to create local redundancies. A local redundancy can be removed by introducing a trivial assignment. So we repeatedly remove trivial assignments and local redundancies until all are gone.

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);
#   B2 := B1;
#   C2 := C1;
    D2 := phi (D1, D5);
    L2 := phi (L1, L5);
    M2 := phi (M1, M5);
    S2 := phi (S1, S5);
    T2 := phi (T1, T5);
    X2 := phi (X0, X5);
    Y2 := phi (Y0, Y5);
    if ___ then goto L3 else goto L4

L3:
    L3 := C1 * B1;
    M3 := L3 + 4;
#   A3 := C1;
    goto L5

L4:
#   D4 := C1;
    L4 := C1 * B1;
    S4 := A2 * B1;
    T4 := S4 + 1;
    goto L5

L5:
    A5 := phi (C1, A2);
    D5 := phi (D2, C1);
    L5 := phi (L3, L4);
    M5 := phi (M3, M2);
    S5 := phi (S2, S4);
    T5 := phi (T2, T4);
    X5 := A5 * B1;
    Y5 := X5 + 1;
    if ___ then goto L2 else return

Available ranks.

Intuitively, the available rank of a variable or expression is a bound on the (static) number of operators used to compute its value without traversing any back edges. Formally

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L5);                                 [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    if ___ then goto L3 else goto L4

L3:
    L3 := C1 * B1;                                      [1]
    M3 := L3 + 4;                                       [2]
    goto L5

L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    L5 := phi (L3, L4);                                 [1]
    M5 := phi (M3, M2);                                 [2]
    S5 := phi (S2, S4);                                 [1]
    T5 := phi (T2, T4);                                 [2]
    X5 := A5 * B1;                                      [1]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return

If the assignments within a basic block are sorted by rank, then the sorted order will be equivalent to the original order. If two assignments within a basic block have the same rank, then it doesn't matter which one comes first.


Phase 2.

As in the PL.8 compiler, code motion is implemented by inserting new code and then deleting redundant code.

For each edge in the SSA form there is a movable computation table (MCT) that lists all assignments that are candidates for movement out of the basic block that is the destination of that edge.

For each rank and for each basic block, this phase

When all basic blocks have been processed, globally redundant computations of the current rank are eliminated.

Ordinary basic blocks

An ordinary basic block is a basic block that is not a loop header or pre-header. For example, L3, L4, and L5 are ordinary blocks.

Loop headers

Assignments are moved into a loop header exactly as for an ordinary basic block, but the identification of candidates for movement into the pre-header is complicated by the need to ensure that the computation is available along the back edge.

Loop pre-headers

Loop pre-headers are treated the same as ordinary blocks, except that an attempt is also made to move fully redundant computations from the edges that exit the loop to the loop pre-header.

Question propagation

Question propagation searches the SSA form of the control-flow graph to determine whether a computation is available on entry to some basic block. Local question propagation is local to a single loop. Global question propagation can search the entire SSA form.


At rank 0 there is nothing to be done for any of the blocks in the example.

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L5);                                 [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    if ___ then goto L3 else goto L4

L3:
    L3 := C1 * B1;                                      [1]
    M3 := L3 + 4;                                       [2]
    goto L5

L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    L5 := phi (L3, L4);                                 [1]
    S5 := phi (S2, S4);                                 [1]
    X5 := A5 * B1;                                      [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return

Global redundancy elimination: Is the value of A5 available at entry to L5? Only if C1 is available at exit from L3, and A2 is available at exit from L4.

Is C1 available at exit from L3? Only if C1 is available at exit from L2.

Is A2 available at exit from L4? Only if A2 is available at exit from L2.

We have now propagated the question to L2 along both of its outedges, and the question is not the same along both outedges. So we give up.

For this example there are no global redundancies of rank 0.


At rank 1 the assignment to X5 becomes a candidate for movement into the predecessors of L5. Along the edge from L3 to L5 this assignment becomes

    X3 := C1 * B1;
Along the edge from L4 to L5 this assignment becomes
    X4 := A2 * B1;

At rank 1 the processing of block L4 begins by moving the assignment to X4 into L4.

L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    X4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5
This creates a local redundancy, which is removed by introducing a trivial assignment:
L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    X4 := S4;                                           [1]
    T4 := S4 + 1;                                       [2]
    goto L5
The trivial assignment to X4 is then removed by renaming all uses of X4. Since there are no uses, there is no net effect, and block L4 reverts to its previous form.

The assignments to L4 and S4 then become candidates for movement into L4's predecessor block L2.

Similarly block L3 becomes

L3:
    L3 := C1 * B1;                                      [1]
    X3 := C1 * B1;                                      [1]
    M3 := L3 + 4;                                       [2]
    goto L5
which becomes
L3:
    L3 := C1 * B1;                                      [1]
    X3 := L3;                                           [1]
    M3 := L3 + 4;                                       [2]
    goto L5
which reverts to the original form of L3 when the trivial assignment to X3 is eliminated.

The assignment to L3 becomes a candidate for movement into L3's predecessor block L2.

The assignments to L3 and L4 are then merged into block L2. The entire loop has become

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L5);                                 [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    L2b := C1 * B1;                                     [1]
    if ___ then goto L3 else goto L4

L3:
    L3 := C1 * B1;                                      [1]
    M3 := L3 + 4;                                       [2]
    goto L5

L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    L5 := phi (L3, L4);                                 [1]
    S5 := phi (S2, S4);                                 [1]
    X5 := A5 * B1;                                      [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return

To determine whether the assignment to L2b should be a candidate for movement into the loop pre-header L2a, we need to determine whether C1 * B1 is available along the back edge from L5. We use local question propagation determine that it is.

The assignment to L2 is then copied to the pre-header L2a, and finally to block L1:

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    L1b := C1 * B1;                                     [1]
    goto L2a

L2a:
    L2a := C1 * B1;                                     [1]
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L5);                                 [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    L2b := C1 * B1;                                     [1]
    if ___ then goto L3 else goto L4

L3:
    L3 := C1 * B1;                                      [1]
    M3 := L3 + 4;                                       [2]
    goto L5

L4:
    L4 := C1 * B1;                                      [1]
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    L5 := phi (L3, L4);                                 [1]
    S5 := phi (S2, S4);                                 [1]
    X5 := A5 * B1;                                      [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return

We now end the processing for rank 1 by using global question propagation to perform global redundancy elimination on assignments of rank 1. We begin with the assignment to X5:

Is A5 * B1 available at entry at the assignment to X5?
  Is C1 * B1 available at exit from block L3?
    Yes, as L3.
  Is A2 * B1 available at exit from block L4?
    Yes, as S4.
Similarly we find that the values computed by the assignments to L4, L3, L2b, and L2a are available at those assignments. The program becomes
L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    L1b := C1 * B1;                                     [1]
    goto L2a

L2a:
    L2a := L1b;                                         [1]
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L5);                                 [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    L2b := L2a;                                         [1]
    if ___ then goto L3 else goto L4

L3:
    L3 := L2b;                                          [1]
    M3 := L3 + 4;                                       [2]
    goto L5

L4:
    L4 := L2b;                                          [1]
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    L5 := phi (L3, L4);                                 [1]
    S5 := phi (S2, S4);                                 [1]
    X5 := phi (L3, S4);                                 [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return

Eliminating trivial assignments yields

L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    L1b := C1 * B1;                                     [1]
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L1b);                                [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    if ___ then goto L3 else goto L4

L3:
    M3 := L1b + 4;                                      [2]
    goto L5

L4:
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    S5 := phi (S2, S4);                                 [1]
    X5 := phi (L1b, S4);                                [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := X5 + 1;                                       [2]
    if ___ then goto L2 else return


At rank 2 we still have the assignment to S4 as a candidate for movement out of block L4.

The assignment to Y5 becomes a candidate for movement out of block L5. Along the edge from L3 to L5 this assignment becomes

    Y3 := L1b + 1;
while along the edge from L4 to L5 it becomes
    Y4 := S4 + 1;
These assignments will be copied into L3 and L4, creating a local redundancy within block L4:
L3:
    M3 := L1b + 4;                                      [2]
    Y3 := L1b + 1;                                      [2]
    goto L5

L4:
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    Y4 := S4 + 1;                                       [2]
    goto L5
When this local redundancy is replaced by a trivial assignment, the trivial assignment soon disappears without a trace because Y4 is never used.

The assignments to M3, Y3, and T4 become candidates for movement into block L2, but they will not be copied into block L2 because there is no redundancy.

We now end the processing for rank 2 by using global question propagation to perform global redundancy elimination on assignments of rank 2. We begin with the assignment to Y5:

Is X5 + 1 available at Y5?
  Is L1b + 1 available at exit from block L3?
    Yes, as Y3.
  Is S4 + 1 available at exit from block L4?
    Yes, as T4.
So the program becomes
L1:
    read (A1, B1, C1, D1, L1, M1, S1, T1)               [0]
    L1b := C1 * B1;                                     [1]
    goto L2a

L2a:
    goto L2

L2:
    A2 := phi (A1, A5);                                 [0]
    D2 := phi (D1, D5);                                 [0]
    L2 := phi (L1, L1b);                                [0]
    M2 := phi (M1, M5);                                 [0]
    S2 := phi (S1, S5);                                 [0]
    T2 := phi (T1, T5);                                 [0]
    X2 := phi (X0, X5);                                 [0]
    Y2 := phi (Y0, Y5);                                 [0]
    if ___ then goto L3 else goto L4

L3:
    M3 := L1b + 4;                                      [2]
    Y3 := L1b + 1;                                      [2]
    goto L5

L4:
    S4 := A2 * B1;                                      [1]
    T4 := S4 + 1;                                       [2]
    goto L5

L5:
    A5 := phi (C1, A2);                                 [0]
    D5 := phi (D2, C1);                                 [0]
    S5 := phi (S2, S4);                                 [1]
    X5 := phi (L1b, S4);                                [1]
    M5 := phi (M3, M2);                                 [2]
    T5 := phi (T2, T4);                                 [2]
    Y5 := phi (Y3, T4);                                 [2]
    if ___ then goto L2 else return


Phase 3.

Converting from SSA form back into a conventional control flow graph is a matter of dropping subscripts and converting phi nodes into assignments on exit from a predecessor block. We then eliminate assignments of the form A := A and also eliminate empty basic blocks. The optimized program is

L1:
    read (A, B, C, D, L, M, S, T)
    L := C * B;
    goto L2

L2:
    if ___ then goto L3 else goto L4

L3:
    M := L + 4;
    Y := L + 1;
    A := C;
    X := L;
    goto L5

L4:
    S := A * B;
    T := S + 1;
    D := C;
    X := S;
    Y := T;
    goto L5

L5:
    if ___ then goto L2 else return