An Introduction to Compiler Verification

January 28, 2013

Last semester I started working on a compiler verification project with Amal Ahmed. We've been investigating whether the CPS transformation presented in [1] continues to preserve equivalence when the source and target languages are extended with recursive types, and hence, non-termination. On a high level, it looks like the equivalence preservation property should hold, yet after some initial progress we've become stuck on proving a continuation shuffling lemma. I plan to write more on that soon, but for now I'd like to give an introduction to compiler verification.

What are we verifying about compilers?

Compilers are software that transform programs from a source language into programs of a target language. For instance, GHC compiler transforms Haskell programs into native machine code. Of course the difference between a language such as Haskell and assembly is very large, so compilers break up the work into multiple transformation passes between several intermediate languages.

How do we know that compilers, such as the GHC compiler, are “correct”? That is, how do we know that the meaning we encode in our source program is preserved in the compiler-emitted target program?

This is where compiler verification comes into play. The goal of compiler verification is to prove that a compiler preserves key properties of the program it is transforming. In specific, we are concerned with preserving the semantics, or behavior of a program, as well as the equivalence between two programs.

Semantics preservation

A semantics preserving transformation from a program $s$ , of source language $S$ , to a program $t$ , of target language $T$ , ensures that $t$ behaves as prescribed by the semantics of $s$ . This means that we expect the same observable behavior from running $t$ that we get by running $s$ . Formally, the transformation $⇝$ is semantics preserving when $\forall B . s ⇓ B \Leftrightarrow t ⇓ B$ , where $B$ is an observable behaviour and $s ⇝ t$ .

Semantics preservation is what one would intuitively expect from a compiler: to transform a program while maintaining its observable behaviours. The need for equivalence preservation is more subtle, and becomes apparent when we start thinking about module compilation and linking target-level code that has been compiled from various source languages.

Equivalence preservation

An equivalence preserving transformation ensures that if two source components are in some way equivalent in the source language, then their target language translations are also equivalent. The canonical notion of program equivalence for many applications is observational, or contextual equivalence. Two programs are contextually equivalent if no well-typed context can make the programs exhibit observably different behaviors.

A non-equivalence preserving transformation would mean that there exists a target context $C_{T}$ where $C_{T} [t_{1}] ≉_{T} C_{T} [t_{2}]$ given that $s_{1} \approx_{S} s_{2}$ , $s_{1} ⇝ t_{1}$ and $s_{2} ⇝ t_{2}$ . Ensuring that such a context can't arise is important for programmers, enabling them to work at the level of abstraction provided by $S$ and not worry about the mechanics of $T$ .

Let's take an example out of Kennedy's overview of how compiling C# to the .NET Intermediate Language is not equivalence preserving [13]: At the C# level, true and false are the only values that inhabit the type bool. On the other hand, bool and byte are interchangeable at the IL level. Hence, while the WriteLine statement below would never execute at C# level, some tricky IL code could pass NotNot a byte value not representing true (1) or false (0) to execute the WriteLine statement.

public void NotNot(bool b) {
  bool c = !b;
  if (!c != b) { Console.WriteLine("This cannot happen"); }
}

This means that equivalence of the following C# code is not preserved after compilation to the IL.

public bool Eq1(bool x, bool y) { return x==y; }
public bool Eq2(bool x, bool y) { return (x==true)==(y==true); }

Full abstraction

Traditionally researchers were concerned whether the denotational model of Plotkin's PCF was fully abstract. Here “a model is called fully abstract if two terms are semantically equal precisely when they are observationally equal, i.e., when they may interchange in all programs with no difference in observable behaviour.” [6]

In the context of compiler verification, the full abstraction of a transformation means that it preserves both semantics and equivalence, or rather it is equivalence reflecting and preserving.

To show that $⇝$ is fully abstract, we will need to show it is both equivalence preserving and reflecting. Consider $s_{1} ⇝ t_{1}$ and $s_{2} ⇝ s_{2}$ :

Equivalence reflection means that $t_{1} \approx_{T} t_{2}$ implies $s_{1} \approx_{S} s_{2}$ . Reflection captures the notion of compiler correctness, or the preservation of semantics. That is, a compiler translation is incorrect if it maps non-equivalent source terms to equivalent target terms.

Equivalence preservation means that $s_{1} \approx_{S} s_{2}$ implies $t_{1} \approx_{T} t_{2}$ . Preservation is harder to prove than reflection, hence it is often proven indirectly by assuming $t_{1}$ and $t_{2}$ are not equivalent and deriving a contradiction.

Note that if the set of possible contexts at the target level is restricted to those that can be obtained by translating source contexts, then proving full abstraction becomes easy since equivalence preservation falls out of proving equivalence reflection. Such a restriction would rule out the ability to reason about most modern languages, for instance compiled Java code is often linked with code from other source languages at the JVM byte-code level.

How do you verify compilers?

There are two main approaches to compiler verification: logical relations and bisimulation.

Logical relations are a tool for generalizing, or strengthening an induction hypothesis as a means to make a proof go through. They can be based on either denotational or operational semantics, and the basic idea behind them is to define relations on well-typed terms via structural induction on the syntax of types. Logical relations were originally introduced to prove strong normalization of the simply typed $λ$ -calculus but have since been adapted to work with various exotic language features. For instance, step-indexing allows for reasoning about non-terminating languages [5] and indexing with Kripke worlds allows for reasoning about mutable references [7].

Bisimulations, roughly put, are relations over the operational semantics of source and target languages. They “define a relation R between pairs of configurations, such that if two configurations are related by R, and each takes a step, then the resulting configurations are related by R, and if two final configurations are related by R, then they represent the same answer” [2]. According to Dreyer et al. [9] there are two main types of bisimulations used for equivalence related compiler verification: environmental bisimulations [2, 3, 12] and normal form bisimulations [4].

Both techniques have their advantages and disadvantages, though I'll leave deeper explorations of these techniques to a future post.

Closed vs. Open world compilers

One important distinction when verifying a compiler is whether to assume a closed or open world.

With a closed world, compilers must have access to entire, closed programs, which includes linked libraries. Projects like CompCert utilize this closed world assumption to do compiler verification. In the case of CompCert, a simulation proof is set up, where the source and compiled programs are run and deemed semantically equivalent if they produce the same trace of observable events [10]. Often while dealing with compilers in the real world, we cannot assume a closed world. For instance, it would be extremely useful to be able to verify that the compilation of an unlinked library is correct, which does not fit in a closed world.

On the other hand, an open world enables compilers to work with open programs that can then later be linked together. In order to work in an open world, Benton and Hurr [8] set up a cross-language logical relation to prove semantics preservation of a transformation from a purely functional language to the SECD machine. Using their technique, given $s_{1} ⇝ t_{1}$ , in order to prove the correctness of the linking of $t_{1}$ with $t_{2}$ , one must provide a corresponding $s_{2}$ and show $s_{2} \approx t_{2}$ . Given a large $t_{2}$ , the requirement to provide a corresponding $s_{2}$ such that $s_{2} ⇝ t_{2}$ becomes too onerous for this proof technique to be practically applicable.

This brings us to exciting new research by Perconti and Ahmed that attempts to remedy the need to provide a corresponding $s_{2}$ by utilizing interoperability semantics (originally introduced by Matthews and Findler [11]). As presented by Perconti at HOPE 12, they are attempting to prove semantics preservation of a ML to assembly compiler by embedding source and target languages, $S$ and $T$ , into a larger language, $S T$ (in reality they define multiple such languages for each compiler pass). Corresponding semantic boundary wrappings ( $ST$ and $TS$ ) are provided, enabling a wrapped $TS (t)$ term to be used as a $S$ term in proofs. Their technique for proving semantics preservation is then to use a logical relation over the $S T$ language to prove contextual equivalence between $s$ and $ST (t)$ terms.

Conclusion

In this post we've covered what properties one looks for when verifying a compiler. Semantics preservation, also known as compiler correctness, refers to a compiler's ability to preserve the meaning of a program throughout a transformation. Equivalence preservation ensures that programmers can think in terms of source level abstractions, and not worry about how modules and libraries interact at the target level. A compiler that exhibits both properties can be said to be fully abstract.

From there, I briefly introduced to common techniques for compiler verification: logical relations and bisimulations.

Lastly, I covered the difference between closed and open world compiler verifications and alluded to some ongoing research on open world compilers.

All of this brings many more questions and things to explore:

I'd like to better understand bisimulation techniques and am curious whether they scale to open world compilers?
Logical relations can have some pretty crazy formulations, for instance the logical relation presented in [9] is indexed by a state transition systems. How else can logical relations be indexed?
The interoperability semantics technique used by Perconti and Ahmed, as well as in previous work [1], seems too good to be true. How do interoperability wrappers eliminate the need for the decompilation of a $T$ term into an $S$ term?
Are there other useful notions of equivalence for compiler verification other than observational equivalence?

Phillip Mates