Proof Carrying Code

Presented by Carl Eastlund
Transcribed by Vasileios Koutavas

Yuan Yu '92, Automated Proofs of object code for a microprocessor

This paper, although it does not introduces the idea of Proof Carrying Code, it sets the background for it. Its purpose is to make possible to construct proofs about machine code.

Proving properties about machine code is important because of the following reasons:

For the above reasons, we can't be sure about the validity of some properties of the code, unless we verify the actual machine code.

In order to accomplish that, Yu defines a formal specification of the machine and constructs predicates about its properties of the form:

	Pre ==> Requirements(Code)

If pre is false then we can't guarantee the conformance of the Code with the requirements.

Using this approach he was able to prove certain properties for the Berkeley C string library. These properties involved safety and correctness of the functions in the library. [From the 369 newsgroup: "the proofs showed that each function produced correct output in the proper location as well as that it observed all conventions for accessing memory and registers. They caught some existing bugs in the code in the course of their work, and were able to get corrections applied to subsequent versions. --Carl"]

Necula and Lee '96, Safe Kernel Extensions Without Run-time Checks

In this paper Necula and Lee's goal was to enable the integration of user code for packet filtering in the kernel. In order to assure that user code won't do anything "bad" to the rest of the kernel, they extended Yu's idea into the Proof Carrying Code framework.

In PCC there is some code producer and code consumer. The main steps of the PCC process are:

If the code passes the above steps successfully, then it is safe to be integrated in the kernel.

In this paper Necula and Lee worked mainly on the process of generating Safety Predicates. For this they defined an Abstract Machine that provides a necessary abstraction level over the actual DecAlpha machine (e.g. providing conditional transitions, based on the value of a predicate -- see example below)

The states of the abstract machine were of the form (\ro, pc), where \ro is the state of all registers, and pc the program counter.

Each of the instructions of the Abstract Machine corresponds to a (possibly conditional) transition to a state. e.g.:

ADDQ r_s, op, r_d       ~~~> (\ro[r_d <-- r_s(+)op], pc+1)
LDQ r_d, n(r_s)
    and saferd(r_s(+)n) ~~~> (\ro[r_d <-- sel(r_m, r_s(+)n)], pc+1)

The main point of using this Abstract Machine is that in order to prove the Safety Policy for a given program, it is sufficient to prove that the A.M. will not halt for this program.

The actual Safety Policy consists of:

Some of the VCGen rules are:

ADDQ r_s, op, r_d       ~~~> VC_pc = VC_(pc+1)[r_d <-- r_s(+)op]
LDQ r_d, n(r_s)
    and saferd(r_s(+)n) ~~~> VC_pc = saferd(r_s(+)n) /\
                                       VC_(pc+1)[R_d <-- sel(r_m,
RET                     ~~~> VC_pc = Post

This form of verification condition generation does not allow jumps to previous instructions (loops). It turned out though that even this is useful since the packet filtering algorithms don't contain loops.

As an example consider the following piece of code: it assumes that the machine has 1 word of data memory and 1 word of tag for access. If the tag is 0 then the code is not allowed to access the data word.

                       ;; tag address is in r_0
L1: ADDQ r_0, 8, r_1   ;; calc data addr from r_0 and put it in r_1
L2: LDQ r_0, 8(r_0)    ;; put data in r_0 using offset from tag addr
L3: LDQ r_2, -8(r_1)   ;; put tag in r_2 using offset from data addr
L4: ADDQ r_0, 1, r_0   ;; incr r_0
L5: BEQ r_2, L7        ;; test if data is writable
L6: STQ r_0, 0(r)      ;; store data from r to r_0

This code uses strange offset arithmetic to access the tag and data, and also the test of whether the data is writable is not done in the beginning, where one would expect. This suggests that the code is the output of an optimizer; but it is still verifiable.

The verification condition for this code is:

forall (r_0, r_m). Pre ==> saferd(r_0(+)8) /\
                           saferd(r_0(+)8(-)8) /\
			   (sel(r_m, r_0(+)8(-)8)=0 => true) /\
			   (sel(r_m, r_0(=)8(-)8)=/=0 => 
In order to show that the code is safe, all we have to do is to prove this predicate.

Necula and Lee used a theorem prover (that they did not actually name) in order to generate the predicates and their proofs. For checking the proofs they used the Edinburgh Logical Framework, a relatively simple theorem prover.

They were able to verify the safety of packet filter code, before execution, and therefore remove all run-time checks in them. This introduced a startup latency, but it speed up the code a lot, making the overall performance much better.

An other point that they were concerned abount, was that the size of the proof was 3 times larger than the code itself, and in the worst case was exponential.

The contribution of this paper was that it provided a framework of running safely code, without runtime checks. The weak points of this approach was the size of the produced code, and the difficulty of the Safety Condition and Proof generation.

Necula and Lee '98, Design and Implementation of a Certified Compiler

This paper involved the creation of a compiler for a type-safe subset of C that automatically produces proofs of memory safety and type safety.

The compiler is composed from the following components:

  1. An actual compiler that takes as input a program written in the source language and produces annotated target code and the appropriate types, and
  2. A certifier that consists of:
    1. A Verification Condition Generator, that takes the annotated code and the types and produces a Safety Predicate,
    2. A theorem prover that takes the Safety Predicate and produces a proof of safety or a counter-example, and
    3. A proof checker that checks that the proofs that are produced are correct.

The use of the proof checker seems redundant, since the theorem prover should produce correct proofs. But by using the proof checker, we reduce the trustable code of the entire compiler to just the code of the checker. This is very important, since the proof checker is a small program (comparable for example to the theorem prover) and it feasible to make sure that it is correct.

In this particular implementation, all the programs that are going to pass the first compilation stage, are going to be certified. This is because the properties which are verified and checked are ensured at the source level by the semantics of the language (type-safe subset of C). What is basically checked is the correctness of the compilation process. If for instance the compiler takes out the bound checks for arrays, then it must prove the safety of this optimization.

For the properties that this implementation of PCC guarantees, the only necessary annotations that the code must contain are the types of loops.

The positive results of this work are:

  1. Reliable proof generation,
  2. Reliable Safety Predicate generation,
  3. The ability to automatically catch compiler bugs,
  4. The fact that the trusted code is very small, and
  5. The fact that the produced code is very robust, since the compiler allows optimizations. Proving some of those optimizations was difficult, while for others was trivial. For example register allocation optimization did not affect the Verification Condition at all, and therefore proving its correctness, was just showing that the V.C before the optimization and after was the same.

The drawbacks of this work was:

  1. The produced Safety Predicates were restricted (e.g they could not express runtime properties), and
  2. This implementation cannot be extended to arbitrary languages, like the entire C language, since it relies on the type-safety of the language.