Most of the optimizations in Pass 2 are independent of the target architecture. Those that are sensitive to the code generator or target are controlled by parameters, customizable policies, and compiler switches:

- Register set
- Inlining of primitive operations
- Inlining of local procedures
- Lambda lifting
- Interprocedural register targeting
- Compiler switches

The number of

The front end of Twobit currently has no knowledge of specialized
representations or registers. In particular, Twobit boxes all
numbers and uses subroutine calls for all

The IEEE standard for Scheme allows any procedure to be redefined. Since most operations are procedures in Scheme, this prevents the compiler from generating inline code for most operations.

A compiler switch allows the programmer to promise that none of the usual primitive procedures will be redefined, which allows Twobit to compile them specially. When this switch is true:

- Pass 1 standardizes the number of arguments passed to procedures
such as
`+`

and performs some constant folding. - Pass 2 performs certain local source transformations that could not otherwise be performed.
- Pass 4 generates inline code for calls to primitive procedures.

A table of primitive operations is supplied as a parameter. For each primitive operation, this table specifies:

- The name of the operation.
- Its arity.
- Its MacScheme machine assembler mnemonic or encoding.
- A predicate that recognizes constant operands for which an immediate form of the operation is implemented by the assembler.

Except for the assembler, which is target-dependent, Twobit has no further knowledge of primitive data types and operations.

It is always possible to lift all known local procedures to the outermost lambda expression. This is not always desirable, however, because the formal parameters that must be added represent extra registers that must be saved across a non-tail-recursive call. This is a target-dependent tradeoff, which should be evaluated by the interprocedural register allocator.

At each stage of incremental lambda lifting,
after the flow equations have been
solved but before the source code is transformed, Twobit calls
`POLICY:LIFT?`

, a predicate that determines whether
lifting will be performed.
`POLICY:LIFT?`

is given three arguments to examine:

- The lambda expression that contains the internal definitions to be lifted.
- The enclosing lambda expression to which the internal definitions will be lifted.
- The solutions to the flow equations, which list the formal parameters that will be added if lifting is performed.

Many factors could be taken into account when deciding whether to lift a group of known local procedures. Here are just a few:

- If the body of the source lambda expression contains a lambda expression for which a closure must be created anyway, then its internal definitions could share that closure without lifting.
- If the destination lambda expression already contains internal definitions, then adding more will not add to the cost of creating a closure at the destination.
- If lifting would add too many new formal parameters, then it should not be done.
- If a procedure contains non-tail-recursive calls or a large number of local variables, then adding new local variables is likely to increase the cost of register spilling within that procedure.
- If many procedures are available for lifting, then it is likely that the programmer defined them for use in a computation whose cost will dominate the cost of creating a closure.

For example, Twobit currently transforms

(define (sieve n) (define (new-generator next-prime prime) (let ((psquare (* prime prime))) (lambda () (define (loop t) (if (or (< t psquare) (not (zero? (remainder t prime)))) t (loop (next-prime)))) (loop (next-prime))))) ...)into

((lambda () (define .sieve_3 (lambda (.n_5) ...)) (define .new-generator_6 (lambda (.next-prime_10 .prime_10) ((lambda (.psquare_11) (define .loop_13 (lambda (.t_15) (if ((lambda (.T93_16) (if .T93_16 .T93_16 (not (zero? (remainder .t_15 .prime_10))))) (< .t_15 .psquare_11)) .t_15 (.loop_13 (.next-prime_10))))) (lambda () (.loop_13 (.next-prime_10)))) (* .prime_10 .prime_10)))) ...))Here the known local procedure

`.loop_13`

has been
lifted out of only one lambda expression. Lifting to the enclosing
lambda expression would add `.psquare_11`

as another
argument, and there would be no benefit at all: A closure must be
created anyway for the body of the lambda expression that defines
`.loop_13`

, and `.loop_13`

can share that
closure at no cost.
The policy used for incremental lambda lifting does not appear to be critical in practice. It is usually advantageous to lift all known local procedures to the outermost level, as is done by conventional lambda lifting.

Since the compiler can by definition locate all calls to a known local procedure, it can reorder the procedure's formal parameters and rewrite each call to match the new order. In Twobit, the formal parameters are just names for the registers that are live at entry to the procedure, so reordering the parameters amounts to interprocedural register targeting.

Register targeting is easier than interprocedural register allocation, because register allocation involves decisions concerning which values should be kept in registers and how long they should be kept there. Twobit simply assumes that the arguments to known procedures will be kept in registers for the duration of each call, so all there is to do is to select a particular register for each argument. The goal of register targeting is to minimize register shuffling when one procedure calls another.

Twobit performs interprocedural register targeting by reordering
the formal parameters that it adds to known local procedures during
lambda lifting.
The ordering is chosen by by a

If the set A of formal parameters to be added to `f`

is equal to
the set of formal parameters for the lambda expression L
that defines `f`

, then Twobit adds the new parameters to the
beginning of the parameter list for `f`

, after sorting them
so that, for a call to `f`

from the body of L, these parameters
will already reside in the correct registers.

If A is a proper subset of the formal parameters of L, and this is
the first lifting of `f`

, then the
parameters in A are allocated to the positions they occupy in
the formal parameter list of L, and the original formal
parameters of `f`

are
`x`

of L is not an element of A but
occurs as an actual parameter in some call to `f`

, then the
corresponding formal parameter of `f`

should be placed in the
position left vacant by `x`

.
These heuristics reduce register shuffling for calls
between mutually recursive procedures, including loop nests.

Several

`(integrate-usual-procedures #t)`

tells Twobit to assume that Scheme's usual procedures will not be redefined, which allows Twobit to generate inline code for them.`(benchmark-mode #t)`

tells Twobit to compile the`(define (f ...) ...)`

syntax for a top-level definition as if it were(define f (let () (define (f ...) ...) f))

This compiler switch has no effect on the`(define f ...)`

syntax for top-level definitions, nor does it affect the interpretation of internal definitions.`(local-optimizations #t)`

activates Twobit's Pass 5.

The first two switches are taken from MacScheme. They allow a programmer to assert that preserving the precise semantics of Scheme when certain top-level procedures are dynamically redefined is less important that generating efficient code.

The benefits of
`benchmark-mode`

are explained by

Andrew W Appel. Loop headers in lambda-calculus or CPS. Princeton University CS-TR-460-94, June 1994.