TIL and TAL

Presented by Vasileios Koutavas.
Transcribed by Carl Eastlund.

Typed Intermediate Languages

Typed intermediate languages (TIL) were introduced in part to alleviate the problems which make high level languages such as ML unsuitable for systems programming. Tardity, Morrisett, et al. in 1996 cited runtime performace and data representation as the major obstacles to be overcome.

Matthias: Performance is too broad. Garbage Collection!
Vasileios: Yes, that is a part of it.

Polymorphism requires functions such as the identity function to operate on values of all types. Because various types of data are represented with different sizes in memory, polymorphic functions must operate on boxed data. The data itself is allocated on the heap and each function processes a pointer to it. Language features such as garbage collection may add additional information to the data in the heap; this is called tagging. Data that is boxed must be accessed at an extra level of indirection; this leads to poor runtime performance. Because it cannot be decided statically which data flows into which functions, all data must be boxed. Even monomorphic functions suffer performance slowdown due to boxing. Boxed and tagged data is also inappropriate for communications with hardware devices. For instance, an array of boxed integers cannot be passed directly to a network card for transmission. These were the issues TIL set out to address.

A TIL compiler performs a series of transformations on the input language. The first set of steps is called type-directed translation, in which each intermediate language is typed. TIL is best illustrated by the first such language in Tardity and Morrisett's compiler, which was called λiML. This is the lambda calculus with ML-style polymorphism and intensional type analysis.

Greg: This has the Hindley-Milner type system?
Vasileios: Yes, and intensional type analysis.
Greg: Why have type inference, and not just explicit types?
Vasileios: I will show you.

Intensional Type Analysis

Harper and Morrisett in 1995 considered the improvement of polymorphic arrays with efficient representations for base types. For instance, the polymorphic array subscripting operation normally has type sub : ∀α. α array × int → α. This operation can be updated for the efficient implementations of BoolArray and FloatArray with 1-bit and 64-bit unboxed elements respectively, yielding the following code.

sub = Λα. typecase α of
      Bool ⇒ Boolsub
    | Float ⇒ Floatsub
    | τ ⇒ Boxedsub
    | β × γ ⇒ λ (<x,t>,i). <sub[β](x,i), sub[γ](y,i)>
    

This code also treats arrays of pairs as pairs of arrays; the efficient array types are used wherever their elements are part of the whole type.

Sam: This seems like C++ templates, where there are separate functions for every type.
Vasileios: Yes; one function sub will dispatch to many functions to perform its operations.

What is the type of sub?

Matthias: You gave the type of sub before.
Vasileios: That was in the source language. This is in λiML.

The arrays used are not polymorphic, therefore we need a new type constructor to represent our array type. The operation's new type is sub : ∀α. RecArray[α] × int ← α. The definition of RecArray mirrors that of sub.

RecArray = Λα. typecase α of
           Bool ⇒ BoolArray
         | Float ⇒ FloatArray
         | τ ⇒ BoxedArray
         | β × γ ⇒ RecArray[β] × RecArray[γ]
    
Peter: Is this a dependent type?
Vasileios: No, it is a recursive type.
Vasilieos: This type system has recursion. Is it decidable?
Sam: It follows the structure of its inputs.
Vasileios: Correct. The type system is based on well-founded induction, and therefore will always terminate.
Scribe: The following question was misunderstood at the time. The answer was obtained afterwards, by email.
Felix: What about the destructuring of recursive types such as List[α] = (α × List[&alpha]) + unit?
Vasileios: That definition of the list type is not expressible in λiML.
Vasileios: What effect does this type system have on performance?
Sam: That depends on whether intensional type analysis takes place at runtime or compile time.
Ryna: How do we dispatch on types without using tags?

This type system introduces several costs that did not exist before. All types must be constructed and kept at runtime. Multiple versions of every function must be compiled for all types. There is an extra level of indirection on polymorphic functions in choosing which version to run.

Sam: So this is like C++, with separate copies of all the functions.
Matthias: Hold on.

But these concerns do not exist in ML. When compiling an entire ML program, a monomorphic type can be inferred for every instantiation of a polymorphic function. All choices of function calls can be made statically, and the list of function instantiations to compile can be reduced.

Matthias: What about modules and separate compilation?
Vasileios: Yes, that can still be a problem. But some of these issues can still be solved at link time.

TIL and Safety

Morrisett and Tardity in 1996 benchmarked a TIL compiler and found it to run with a smaller memory footprint than non-TIL compilers. This benchmark was performed without any modules or separate compilation; all static analysis optimizations were performed.

Unlike proof carrying code, TIL does not provide any proofs about the final assembly code generated. There is a safety condition in type-directed translation, however. If e1 : t1 in L1 then e2 : t2 in L2 where expression e1 translates to e2 and type t1 translates to t2 when translating from language L1 to L2.

L1 => e1 : t1
      |    |
      |    |
L2 => e2 : t2
    

This can be used to verify the correctness of each step of typed translation.

Felix: Does every t1 translate to the same t2? I could imagine the target language having a more expressive type language than the source.
Vasileios: Yes, they always translate to the same type.
Matthias: There is some prehistory to TIL. Earlier, there was a broken version that used copying of types such as arrays, and effects could then be lost. Harper had Morrisett and Tardity work out intensional type analysis to fix this. There was also some work in the late '80s in which type systems such as C's had to work out what bit patterns would be used for types. In the case of separate compilation, they had to make sure that similar bit patterns for differing types never passed into each other's locations.

Typed Assembly Language

Morrisett, Walker, Crary, and Clew in 1999 wrote a compiler from System F to TAL. TAL was an assembly language conceived of as an instance of Necula's proof carrying code.

Proof carrying code raises two major issues.

  • What properties should a security policy require?
  • How can producers of the code construct the necessary proofs?
  • The answer to the first question is largely context-dependent. The second question is undecidable in general; it is impossible to automatically prove even simple properties for arbitrary code. TAL uses type safety as its security policy, and leaves responsibility for proof generation with the programmer. The cycle of programming, compiling, and programming again to fix type errors can be seen as a form of interactive theorem proving.

    The TAL compiler begins with its high level typed language and compiles down to TAL; each intermediate language must be able to express the desired security policy in its type languag. The initial implementation used control flow safety and memory safety as its policy.

    TAL's operations are based on an abstract machine modeled by a three-tuple (H,R,I). I is the set of instruction sequences; each sequence is some number of instructions terminated by an unconditional jump. R is a mapping from registers to values. H is the heap; it maps labels to instruction sequences. A sample transition in this abstract machine is:

    (H, R, jmp v) → (H, R, I′) where R(v) = l and H(l) = I′

    Control safety is present whenever there is always a valid transition in the abstract machine. This is guaranteed by the type system. Each instruction set I has type code(Γ), where Γ is a typing of all registers. Labels have the same types as their instruction set in the heap; individual instructions have types of the form Γ1→Γ2. A sample typing rule in this system is as follows:

    i : Γ1→Γ2   I : code1)
    i ; I : code2)
    Matthias: What if the type system cannot prove a program is control safe?
    Vasileios: Then it is not type safe.
    Matthias: Then it won't run?
    Vasileios: Yes. The program will not pass the type checker, which is the first stage of compilation, and will not be compiled.
    Loops in these programs are encoded as jumps. The invariants of the loops are the types of all labels within the loops.
    Felix: So all labels need a single type.
    Vasileios: Yes, the Γ terms are requirements for the types of all registers for each label.
    Sam: We must have a subtyping relation on Γs.
    Vasileios: In the case of unrestricted registers, the Γ term will contain a type variable and there is subtyping.
    Kenichi: Is it possible for registers to hold different types at different times?
    Vasileios: Yes, each instruction maps some Γ1 to some other Γ2.
    Kenichi: What about the flow of values within a loop?
    Vasileios: The types of labels and subtyping rules require only correct loops.