Read the next two papers:
Sethi and Ullman
Kurlander, Proebsting, and Fischer
The Rosen, Wegman, and Zadeck paper presents an efficient and effective global algorithm for recognizing common, available, redundant, and loop-invariant computations and for hoisting redundant or loop-invariant computations to a common pre-dominator. The algorithm works for reducible flow graphs that have been converted into static single assignment (SSA) form.
for each basic block B (in reverse topsort order) do
Although the following example is from the paper, I have changed the numbering of variables so that each variable has the same number as the basic block that assigns it. Unfortunately, some of the numbered variables derived from the original variable L have the same name as a basic block. I didn't notice this until it was too late to fix it easily.
L1:
read (A, B, C, D, L, M, S, T)
goto L2a
L2a:
goto L2
L2:
if ___ then goto L3 else goto L4
L3:
L := C * B;
M := L + 4;
A := C;
goto L5
L4:
D := C;
L := D * B;
S := A * B;
T := S + 1;
goto L5
L5:
X := A * B;
Y := X + 1;
if ___ then goto L2 else return
L2a has been inserted as a pre-header for the loop consisting of blocks L2, L3, L4, and L5. There is one back edge, from L5 to L2. This example is already in topsort order. The next step is to convert it into SSA form.
The variables that occur within this program are
A B C D L M S T X Y
We can form the initial name for each variable by suffixing it
with a 0. We can then traverse the spanning tree for the CFG,
keeping track of the current names for each variable.
phi (V, ??) where ??
stands for the name of the variable that will be transmitted
by the back edge. This name will be filled in later when we
get to the back edge.
phi (V1, V2).
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1)
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5);
B2 := phi (B1, B2);
C2 := phi (C1, C2);
D2 := phi (D1, D5);
L2 := phi (L1, L5);
M2 := phi (M1, M5);
S2 := phi (S1, S5);
T2 := phi (T1, T5);
X2 := phi (X0, X5);
Y2 := phi (Y0, Y5);
if ___ then goto L3 else goto L4
L3:
L3 := C2 * B2;
M3 := L3 + 4;
A3 := C2;
goto L5
L4:
D4 := C2;
L4 := D4 * B2;
S4 := A2 * B2;
T4 := S4 + 1;
goto L5
L5:
A5 := phi (A3, A2);
D5 := phi (D2, D4);
L5 := phi (L3, L4);
M5 := phi (M3, M2);
S5 := phi (S2, S4);
T5 := phi (T2, T4);
X5 := A5 * B2;
Y5 := X5 + 1;
if ___ then goto L2 else return
A trivial assignment is an assignment that has one of the forms
A := B;
A := phi (B, B);
A := phi (A, B); [see below]
[Although the third case above is from the paper, it appears
incorrect to me. With their conventions, I think this case
should be A := phi (B, A) instead.
Yes, I think it matters.]
Now that the program is in SSA form, a trivial assignment can be
removed by replacing all mentions of A by B
(or vice versa). As we remove trivial assignments, we look for
local redundancies: assignments within a single basic block whose
right hand sides are the same. The removal of trivial assignments
tends to create local redundancies. A local redundancy can be
removed by introducing a trivial assignment. So we repeatedly
remove trivial assignments and local redundancies until all are
gone.
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1)
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5);
# B2 := B1;
# C2 := C1;
D2 := phi (D1, D5);
L2 := phi (L1, L5);
M2 := phi (M1, M5);
S2 := phi (S1, S5);
T2 := phi (T1, T5);
X2 := phi (X0, X5);
Y2 := phi (Y0, Y5);
if ___ then goto L3 else goto L4
L3:
L3 := C1 * B1;
M3 := L3 + 4;
# A3 := C1;
goto L5
L4:
# D4 := C1;
L4 := C1 * B1;
S4 := A2 * B1;
T4 := S4 + 1;
goto L5
L5:
A5 := phi (C1, A2);
D5 := phi (D2, C1);
L5 := phi (L3, L4);
M5 := phi (M3, M2);
S5 := phi (S2, S4);
T5 := phi (T2, T4);
X5 := A5 * B1;
Y5 := X5 + 1;
if ___ then goto L2 else return
Intuitively, the available rank of a variable or expression is a bound on the (static) number of operators used to compute its value without traversing any back edges. Formally
phi (A, B) is the maximum
of the available ranks of A and B.
$ is a unary operation, then the available
rank of $ V is 1 + the available rank of V.
$ is a binary operation, then the available rank of
V1 $ V2 is 1 + the maximum of the available ranks of
V1 and V2.
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L5); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
if ___ then goto L3 else goto L4
L3:
L3 := C1 * B1; [1]
M3 := L3 + 4; [2]
goto L5
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
L5 := phi (L3, L4); [1]
M5 := phi (M3, M2); [2]
S5 := phi (S2, S4); [1]
T5 := phi (T2, T4); [2]
X5 := A5 * B1; [1]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
If the assignments within a basic block are sorted by rank, then the sorted order will be equivalent to the original order. If two assignments within a basic block have the same rank, then it doesn't matter which one comes first.
As in the PL.8 compiler, code motion is implemented by inserting new code and then deleting redundant code.
For each edge in the SSA form there is a movable computation table (MCT) that lists all assignments that are candidates for movement out of the basic block that is the destination of that edge.
For each rank and for each basic block, this phase
An ordinary basic block is a basic block that is not a loop header or pre-header. For example, L3, L4, and L5 are ordinary blocks.
Assignments are moved into a loop header exactly as for an ordinary basic block, but the identification of candidates for movement into the pre-header is complicated by the need to ensure that the computation is available along the back edge.
Loop pre-headers are treated the same as ordinary blocks, except that an attempt is also made to move fully redundant computations from the edges that exit the loop to the loop pre-header.
Question propagation searches the SSA form of the control-flow graph to determine whether a computation is available on entry to some basic block. Local question propagation is local to a single loop. Global question propagation can search the entire SSA form.
At rank 0 there is nothing to be done for any of the blocks in the example.
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L5); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
if ___ then goto L3 else goto L4
L3:
L3 := C1 * B1; [1]
M3 := L3 + 4; [2]
goto L5
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
L5 := phi (L3, L4); [1]
S5 := phi (S2, S4); [1]
X5 := A5 * B1; [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
Global redundancy elimination: Is the value of A5 available at entry to L5? Only if C1 is available at exit from L3, and A2 is available at exit from L4.
Is C1 available at exit from L3? Only if C1 is available at exit from L2.
Is A2 available at exit from L4? Only if A2 is available at exit from L2.
We have now propagated the question to L2 along both of its outedges, and the question is not the same along both outedges. So we give up.
For this example there are no global redundancies of rank 0.
At rank 1 the assignment to X5 becomes a candidate for movement into the predecessors of L5. Along the edge from L3 to L5 this assignment becomes
X3 := C1 * B1;
Along the edge from L4 to L5 this assignment becomes
X4 := A2 * B1;
At rank 1 the processing of block L4 begins by moving the assignment to X4 into L4.
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
X4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
This creates a local redundancy, which is removed by
introducing a trivial assignment:
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
X4 := S4; [1]
T4 := S4 + 1; [2]
goto L5
The trivial assignment to X4 is then removed by renaming
all uses of X4. Since there are no uses, there is no net
effect, and block L4 reverts to its previous form.
The assignments to L4 and S4 then become candidates for movement into L4's predecessor block L2.
Similarly block L3 becomes
L3:
L3 := C1 * B1; [1]
X3 := C1 * B1; [1]
M3 := L3 + 4; [2]
goto L5
which becomes
L3:
L3 := C1 * B1; [1]
X3 := L3; [1]
M3 := L3 + 4; [2]
goto L5
which reverts to the original form of L3 when the trivial
assignment to X3 is eliminated.
The assignment to L3 becomes a candidate for movement into L3's predecessor block L2.
The assignments to L3 and L4 are then merged into block L2. The entire loop has become
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L5); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
L2b := C1 * B1; [1]
if ___ then goto L3 else goto L4
L3:
L3 := C1 * B1; [1]
M3 := L3 + 4; [2]
goto L5
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
L5 := phi (L3, L4); [1]
S5 := phi (S2, S4); [1]
X5 := A5 * B1; [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
To determine whether the assignment to L2b should be a candidate
for movement into the loop pre-header L2a, we need to determine
whether C1 * B1 is available along the back edge
from L5. We use local question propagation determine that it
is.
The assignment to L2 is then copied to the pre-header L2a, and finally to block L1:
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
L1b := C1 * B1; [1]
goto L2a
L2a:
L2a := C1 * B1; [1]
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L5); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
L2b := C1 * B1; [1]
if ___ then goto L3 else goto L4
L3:
L3 := C1 * B1; [1]
M3 := L3 + 4; [2]
goto L5
L4:
L4 := C1 * B1; [1]
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
L5 := phi (L3, L4); [1]
S5 := phi (S2, S4); [1]
X5 := A5 * B1; [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
We now end the processing for rank 1 by using global question propagation to perform global redundancy elimination on assignments of rank 1. We begin with the assignment to X5:
Is A5 * B1 available at entry at the assignment to X5?
Is C1 * B1 available at exit from block L3?
Yes, as L3.
Is A2 * B1 available at exit from block L4?
Yes, as S4.
Similarly we find that the values computed by the assignments
to L4, L3, L2b, and L2a are available at those assignments.
The program becomes
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
L1b := C1 * B1; [1]
goto L2a
L2a:
L2a := L1b; [1]
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L5); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
L2b := L2a; [1]
if ___ then goto L3 else goto L4
L3:
L3 := L2b; [1]
M3 := L3 + 4; [2]
goto L5
L4:
L4 := L2b; [1]
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
L5 := phi (L3, L4); [1]
S5 := phi (S2, S4); [1]
X5 := phi (L3, S4); [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
Eliminating trivial assignments yields
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
L1b := C1 * B1; [1]
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L1b); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
if ___ then goto L3 else goto L4
L3:
M3 := L1b + 4; [2]
goto L5
L4:
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
S5 := phi (S2, S4); [1]
X5 := phi (L1b, S4); [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := X5 + 1; [2]
if ___ then goto L2 else return
At rank 2 we still have the assignment to S4 as a candidate for movement out of block L4.
The assignment to Y5 becomes a candidate for movement out of block L5. Along the edge from L3 to L5 this assignment becomes
Y3 := L1b + 1;
while along the edge from L4 to L5 it becomes
Y4 := S4 + 1;
These assignments will be copied into L3 and L4, creating a
local redundancy within block L4:
L3:
M3 := L1b + 4; [2]
Y3 := L1b + 1; [2]
goto L5
L4:
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
Y4 := S4 + 1; [2]
goto L5
When this local redundancy is replaced by a trivial assignment,
the trivial assignment soon disappears without a trace because
Y4 is never used.
The assignments to M3, Y3, and T4 become candidates for movement into block L2, but they will not be copied into block L2 because there is no redundancy.
We now end the processing for rank 2 by using global question propagation to perform global redundancy elimination on assignments of rank 2. We begin with the assignment to Y5:
Is X5 + 1 available at Y5?
Is L1b + 1 available at exit from block L3?
Yes, as Y3.
Is S4 + 1 available at exit from block L4?
Yes, as T4.
So the program becomes
L1:
read (A1, B1, C1, D1, L1, M1, S1, T1) [0]
L1b := C1 * B1; [1]
goto L2a
L2a:
goto L2
L2:
A2 := phi (A1, A5); [0]
D2 := phi (D1, D5); [0]
L2 := phi (L1, L1b); [0]
M2 := phi (M1, M5); [0]
S2 := phi (S1, S5); [0]
T2 := phi (T1, T5); [0]
X2 := phi (X0, X5); [0]
Y2 := phi (Y0, Y5); [0]
if ___ then goto L3 else goto L4
L3:
M3 := L1b + 4; [2]
Y3 := L1b + 1; [2]
goto L5
L4:
S4 := A2 * B1; [1]
T4 := S4 + 1; [2]
goto L5
L5:
A5 := phi (C1, A2); [0]
D5 := phi (D2, C1); [0]
S5 := phi (S2, S4); [1]
X5 := phi (L1b, S4); [1]
M5 := phi (M3, M2); [2]
T5 := phi (T2, T4); [2]
Y5 := phi (Y3, T4); [2]
if ___ then goto L2 else return
Converting from SSA form back into a conventional control flow
graph is a matter of dropping subscripts and converting phi
nodes into assignments on exit from a predecessor block.
We then eliminate assignments of the form A := A
and also eliminate empty basic blocks.
The optimized program is
L1:
read (A, B, C, D, L, M, S, T)
L := C * B;
goto L2
L2:
if ___ then goto L3 else goto L4
L3:
M := L + 4;
Y := L + 1;
A := C;
X := L;
goto L5
L4:
S := A * B;
T := S + 1;
D := C;
X := S;
Y := T;
goto L5
L5:
if ___ then goto L2 else return