Twobit Pass 4: frame optimization

Twobit's code generator requires a stack frame whenever:

When should these stack frames be allocated and deallocated?

Some compilers allocate a stack frame upon entry to every procedure. This is inefficient for procedures that may contain non-tail-recursive calls but seldom perform them.

Some compilers allocate a separate stack frame for every non-tail-recursive call. This is inefficient for procedures that perform many such calls.

Twobit performs a simple optimization to reduce the overhead of creating and initializing stack frames. The goals of this optimization are:

  1. To create no more than one stack frame per procedure activation.
  2. To create a stack frame only when necessary.

It is not always possible to achieve both of these goals. Twobit always achieves the first goal, but it sometimes creates an unnecessary stack frame. Chez Scheme always achieves the second goal but does not always achieve the first.

Control flow graphs

In the subset of Scheme that Twobit uses as its intermediate form, an expression represents straight-line code unless it contains a conditional expression or a non-tail-recursive procedure call. If non-tail calls are regarded as basic blocks with unknown effects, then the body of a lambda expression becomes an acyclic control flow graph with multiple exit points. The number of exit points is equal to 1 plus the number of conditional expressions that occur in a tail-recursive position. Fork points arise from conditional expressions, and come right after the test.

For example, the doubly recursive Fibonacci procedure

  (define fib
    (lambda (n)
      (if (< n 2)
          n
          (+ (fib (- n 1))
             (fib (- n 2))))))

contains one fork point and two exit points. Its control flow graph is

                     ----------
                     |  entry |
                     ----------
                          |
                          |
                          *
                        *   *
                      *       *
                        *   *
                      /   *   \
                    /           \
                  /               \
                /                   \
            *********        *****************
            *   n   *        * (fib (- n 1)) *
            *********        *****************
                                     |
                             *****************
                             * (fib (- n 2)) *
                             *****************
                                     |
                             *****************
                             *       +       *
                             *****************

There are only two kinds of positions within a control flow graph at which Twobit might allocate a stack frame:

Let's call these positions the save positions. If a stack frame is needed anywhere in the control flow graph, then at least one save position will dominate all places where a stack frame is needed. These dominating save positions are totally ordered by the control flow graph. Twobit will allocate a stack frame at the last save position that dominates all places where a stack frame is needed.

If no stack frame is needed, then Twobit will not allocate one.

For the Fibonacci example, Twobit will not allocate a stack frame for the base case. In the recursive case, Twobit will allocate a stack frame immediately after the test has determined that the recursive branch will be taken.

Frame optimization

Twobit generates a save instruction, which allocates a stack frame, at entry. The internal representation of the operand for this save instruction is a mutable data structure that the code generator uses to keep track of the maximum number of data slots within the frame that are live at any time. This maximum is initialized to -1.

Twobit generates code as though a stack frame has been allocated. Any use of the stack frame will ensure a non-negative value for the operand of the nearest save instruction that dominates the use. When the code generator reaches a new save position, it checks the mutable data structure to see whether a stack frame has been required to that point. If so, then it does not generate a save instruction at that save position, but continues to use the data structure for the previously generated save instruction. If no stack frame has been required, however, then it generates another save instruction, with a new mutable data structure as its operand.

A pop instruction is generated for returns and tail-recursive calls. Its operand matches the operand of the nearest save instruction that dominates it. The local optimizer and assembler ignore save and pop instructions whose operand is -1.

Comparison with Chez Scheme

Twobit's frame optimization is conservative in the sense that no procedure activation ever allocates more than one frame. Sometimes a stack frame will be allocated when none is actually needed, however. Consider

  (define f
    (lambda (x y z)
      (if (> x 0)
          (show x))
      (if (> y 0)
          (show y))
      (if (> z 0)
          (show z))
      (list x y z)))

Twobit will allocate a stack frame for f even if all of its arguments are zero. Chez Scheme would not. On the other hand, Chez Scheme would allocate three separate stack frames for f if all of its arguments are positive. Twobit would allocate only one.

The algorithm used in Chez Scheme is a form of total redundacy elimination for register saves. For more information see

Robert G Burger, Oscar Waddell, and R. Kent Dybvig. Register allocation using lazy saves, eager restores, and greedy shuffling. In 1995 ACM Conference on Programming Language Design and Implementation, June 1995, pages 130-138.

Frame optimization in Twobit is similar to partial redundancy elimination, but is not quite the same. For example, Twobit will generate code to allocate a stack frame within each arm of the conditional expression in

  (define f
    (lambda (n)
      (if (< n 2)
          (begin (show n)
                 n)
          (+ (f (- n 1))
             (f (- n 2))))))

Partial redundancy elimination would reduce code space by hoisting this code to a common dominator.