Read the Chow and Hennessy paper on priority-based coloring.

We will have a midterm 1 May. Open book, open notes.


Global optimizations


The control flow graph for

void ip (array of ref float x, array of ref float y) {
    int n = #x;
    ref int i = new 0;
    ref float sum = new 0.0;
    while i < n do {
        sum := sum + x[i] * y[i];
        i := i + 1
    };
    sum
};

is

L0:
    goto L1

L1:
    n := #x;
    i := new 0;
    sum := new 0.0;
    goto L2

L2:
    goto L3

L3:
    t1 := @i;
    t2 := t1 < n;
    if t2 then goto L4 else goto L5

L4:
    t3 := @sum;
    t4 := @i;
    t5 := x[t4];
    t6 := t3 +% t5;
    t7 := @i;
    t8 := y[t7];
    t9 := t6 +% t8;
    @sum := t9;
    t10 := @i;
    t11 := t10 + 1;
    @i := t11;
    goto L3

L5:
    t12 := @sum;
    return t12


Aliasing

Escape analysis for reference values: i and sum are initialized by local declarations of the form

    ref T I = new E
and they do not escape: They are not accessible from any function expressions that escape, and are mentioned only in load and store (@) contexts.

Hence they are not aliased to anything, so @i and @sum can be treated as scalar variables instead of load or store operations. Replacing i and sum by their R-values v_i and v_sum, we get:

L0:
    goto L1

L1:
    n := #x;
    v_i := 0;
    v_sum := 0.0;
    goto L2

L2:
    goto L3

L3:
    t1 := v_i;
    t2 := t1 < n;
    if t2 then goto L4 else goto L5

L4:
    t3 := v_sum;
    t4 := v_i;
    t5 := x[t4];
    t6 := @t5;
    t7 := t3 +% t6;
    t8 := v_i;
    t9 := y[t8];
    t10 := @t9;
    t11 := t7 +% t10;
    v_sum := t11;
    t12 := v_i;
    t13 := t12 + 1;
    v_i := t13;
    goto L3

L5:
    t14 := v_sum;
    return t14


L0    uses { }                 defines { }

L1    uses {x}                 defines {n, v_i, v_sum}

L2    uses { }                 defines { }

L3    uses {v_i, n}            defines {t1, t2}

L4    uses {v_sum, v_i, x, y}  defines {t3, t4, t5, t6, t7,
                                        t8, t9, v_sum, t10,
                                        t11, v_i}

L5    uses {v_sum}             defines {t14}


Liveness

LiveIn(i)  =  Uses(i)  union  ( LiveOut(i) - Defines(i) )

LiveOut(i)  =  Union { LiveIn(j) | i goesto j}


At the return point (on exit from L5), no local variables are live except t14.

LiveIn(0) = LiveOut(0)

LiveIn(1) = {x}  union  (LiveOut(1) - {n, v_i, v_sum})

LiveIn(2) = LiveOut(2)

LiveIn(3) = {v_i, n}  union  (LiveOut(3) - {t1, t2})

LiveIn(4) = {v_sum, v_i, x, y}
            union  (LiveOut(4) - {t3, t4, t5, t6, t7,
                                  t8, t9, v_sum, t10,
                                  t11, v_i})

LiveIn(5) = {v_sum}  union  (LiveOut(5) - {t14})

LiveOut(0) = LiveIn(1)

LiveOut(1) = LiveIn(2)

LiveOut(2) = LiveIn(3)

LiveOut(3) = LiveIn(4)  union  LiveIn(5)

LiveOut(4) = LiveIn(3)

LiveOut(5) = {t14}

LiveIn(0) = {x, y}

LiveIn(1) = {x, y}

LiveIn(2) = {x, y, n, v_i, v_sum}

LiveIn(3) = {x, y, n, v_i, v_sum}

LiveIn(4) = {x, y, n, v_i, v_sum}

LiveIn(5) = {v_sum}

LiveOut(0) = {x, y}

LiveOut(1) = {x, y, n, v_i, v_sum}

LiveOut(2) = {x, y, n, v_i, v_sum}

LiveOut(3) = {x, y, n, v_i, v_sum}

LiveOut(4) = {x, y, n, v_i, v_sum}

LiveOut(5) = {t14}

We can use the information provided by liveness analysis to perform global register allocation.


A hidden cost of garbage collection

There isn't much else we can do with this example by way of global optimization, because our copying garbage collector requires the address arithmetic for loads and stores involving heap objects to be delayed until run time.

If not for the garbage collector, we could expose some of this addressing arithmetic to optimization:

L0:
    goto L1

L1:
    n := #x;
    v_i := 0;
    v_sum := 0.0;
    goto L2

L2:
    goto L3

L3:
    t1 := v_i;
    t2 := t1 < n;
    if t2 then goto L4 else goto L5

L4:
    t3 := v_sum;
    t4 := v_i;
    u1 := 4 * t4;
    t5 := x + u1;
    t6 := @t5;
    t7 := t3 +% t6;
    t8 := v_i;
    u2 := 4 * t8;
    t9 := y + u2;
    t10 := @t9;
    t11 := t7 +% t10;
    v_sum := t11;
    t12 := v_i;
    t13 := t12 + 1;
    v_i := t13;
    goto L3

L5:
    t14 := v_sum;
    return t14

Local optimization on L3 and L4 yields

L3:
    t2 := v_i < n;
    if t2 then goto L4 else goto L5

L4:
    u1 := 4 * v_i;
    t5 := x + u1;
    t6 := @t5;
    t7 := v_sum +% t6;
    t9 := y + u1;
    t10 := @t9;
    v_sum := t7 +% t10;
    v_i := v_i + 1;
    goto L3

Loop optimizations

The L3-L4 loop contains no loop-invariant computations except n, x, and y, but it does have several induction variables.

An induction variable is a variable whose value changes by a constant amount on each iteration of the loop. Any variable whose value is a linear function of an induction variable is also an induction variable.
Proof: Let v be an induction variable, and let w be a variable such that

    w = c1 * v + c2
Let
    v' = v + c
    w' = c1 * v' + c2
Then
    w' = c1 * (v + c) + c2
       = c1 * v + c2 + c * c1
       = w + (c * c1)
so the new value of w differs from the old by (c * c1).

We can find induction variables by searching for variables that are assigned exactly once on every path through the loop body by an assignment whose right hand side is the sum or difference of the variable and a constant. We can then find related induction variables by searching for variables that are assigned exactly once on every path by an assignment whose right hand side is a linear function of an induction variable.

For example, v_i is an induction variable. Therefore u1 is an induction variable, as are t5 and t9:

    v_i := v_i + 1;        ; + 1
    u1 := 4 * v_i;         ; + 4
    t5 := x + u1;          ; + 4
    t9 := y + u1;          ; + 4

It may be advantageous to create a new induction variable with the same value as the old that is initialized in the loop pre-header and is updated by an assignment of the form w := w + c.

L2:
    vii := v_i;
    u1i := 4 * vii;
    t5i := x + u1i;
    t9i := y + u1i;
    goto L3

L3:
    t2 := v_i < n;
    if t2 then goto L4 else goto L5

L4:
    u1 := 4 * v_i;
    t5 := x + u1;
    t6 := @t5;
    t7 := v_sum +% t6;
    t9 := y + u1;
    t10 := @t9;
    v_sum := t7 +% t10;
    v_i := v_i + 1;
    vii := vii + 1;
    u1i := u1i + 4;
    t5i := t5i + 4;
    t9i := t9i + 4;
    goto L3

The new induction variable can be used in place of the old. To prevent an old induction variable that is live on exit from the loop from being updated each time through the loop, it can be assigned at loop exit, using a new block for that purpose if necessary.

L2:
    vii := v_i;
    u1i := 4 * vii;
    t5i := x + u1i;
    t9i := y + u1i;
    goto L3

L3:
    t2 := vii < n;
    if t2 then goto L4 else goto L5a

L4:
    u1 := 4 * vii;            ; now dead
    t5 := x + u1i;            ; now dead
    t6 := @t5i;
    t7 := v_sum +% t6;
    t9 := y + u1i;            ; now dead
    t10 := @t9i;
    v_sum := t7 +% t10;
    v_i := vii + 1;           ; now dead
    vii := vii + 1;
    u1i := u1i + 4;
    t5i := t5i + 4;
    t9i := t9i + 4;
    goto L3

L5a:
    v_i := vii;              ; dead
    u1 := u1i;               ; dead
    t5 := t5i;               ; dead
    t9 := t9i;               ; dead
    goto L5

Uses of an induction variable can be replaced by uses of a linearly related induction variable. The induction variables t5i and t9i are used as addresses, so they are not good candidates for replacement. On the other hand, u1i isn't doing anything useful within the loop itself, and the only use of vii is to calculate a boolean, so their uses can be replaced by uses of t5i.

L2:
    vii := v_i;               ; now dead
    u1i := 4 * vii;           ; now dead
    t5i := x + u1i;
    t9i := y + u1i;
    goto L3

L3:
    t20 := 4 * n;             ; t20 = 4 * n
    t21 := t20 + x;           ; t21 = x + 4 * n
    t2 := t5i < t21;          ; (t5i < t21) = (vii < n)
    if t2 then goto L4 else goto L5a

L4:
    u1 := 4 * vii;            ; dead
    t5 := x + u1i;            ; dead
    t6 := @t5i;
    t7 := v_sum +% t6;
    t9 := y + u1i;            ; dead
    t10 := @t9i;
    v_sum := t7 +% t10;
    v_i := vii + 1;           ; dead
    vii := vii + 1;           ; now dead
    t22 := t5i - x;           ; now dead
    u1i := t22 + 4;           ; now dead
    t5i := t5i + 4;
    t9i := t9i + 4;
    goto L3

L5a:
    v_i := vii;              ; dead
    u1 := u1i;               ; dead
    t5 := t5i;               ; dead
    t9 := t9i;               ; dead
    goto L5

Reduction in strength then falls out as a byproduct of dead code elimination and hoisting of computations whose values are constant within the loop.

L2:
    t5i := x + u1i;
    t9i := y + u1i;
    t20 := 4 * n;
    t21 := t20 + x;
    goto L3

L3:
    t2 := t5i < t21;
    if t2 then goto L4 else goto L5a

L4:
    t6 := @t5i;
    t7 := v_sum +% t6;
    t10 := @t9i;
    v_sum := t7 +% t10;
    t5i := t5i + 4;
    t9i := t9i + 4;
    goto L3

L5a:
    goto L5