Modeling Programming Languages in Theorem Provers

Presented by Dale Vaillancourt
Transcribed by Dave Herman


PL researchers working on automated theorem proving have tried to tackle a number of limitations of pencil-and-paper proofs. Humans are not good at mentally managing complexity or working out tedious details; furthermore, as we scale up languages and models, hand-written proofs are difficult to extend.

J Strother Moore ’99: Proving Theorems About Java-Like Byte Code

As we recall from Dale’s last talk, Moore’s ACL2 theorem prover is a pure, applicative Common Lisp with a special defthm form. The theorem prover has an embedded account of itself, and reflectively extends its collection of knowledge with each theorem that it successfully proves. Thus once ACL2 proves a theorem, it can use that theorem as a primitive to prove further theorems.

Moore describes a “standard style” for reasoning about programming languages with ACL2. He describes an abstract machine, the Toy Java Virtual Machine (TJVM), and models each instruction of the machine with a semantic function:

instruction x state → state
state = frame x heap x class-list
frame = PC x bytecode x ...

He defines a transition function “step”, which examines the PC and dispatches on the instruction.

step : state → state

He also defines an iterated step function, which executes a finite number of steps of the abstract machine.

tjvm : nat x state → state
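This shape of model can be sketched concretely. The following is our own illustrative Python (Moore’s actual model is ACL2, and the TJVM has many more instructions); it shows a single-frame state, a step function dispatching on the instruction at the PC, and the iterated tjvm function:

```python
# Illustrative sketch (not Moore's ACL2 code): a TJVM-style machine.
# A state here is just (pc, operand-stack, code); the real model also
# carries locals, a heap, and a class table.

def step(state):
    """Execute the instruction at the current PC; return a new state."""
    pc, stack, code = state
    op = code[pc]
    if op[0] == "push":                  # push a constant
        return (pc + 1, [op[1]] + stack, code)
    if op[0] == "sub":                   # pop two values, push (second - top)
        a, b, *rest = stack
        return (pc + 1, [b - a] + rest, code)
    raise ValueError("unknown instruction: %r" % (op,))

def tjvm(n, state):
    """Iterated step function: run the machine for exactly n steps."""
    for _ in range(n):
        state = step(state)
    return state
```

For example, running three steps of `(push 3) (push 1) (sub)` from an empty stack leaves 2 on the stack.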

Moore’s paper describes how to prove theorems about programs written in bytecode. Consider the following function Math.fact, which computes the factorial of an int:

   (def Math.fact (n)
1    (load n)                    ;; load the argument n
2    (ifle0 8)                   ;; jump forward 8 if n <= 0
3    (load n)                    ;; load n onto the stack (for step 8)
4    (load n)                    ;; load n onto the stack (as Math.fact arg)
5    (push 1)                    ;; push the constant 1
6    (sub)                       ;; subtract 1 from n
7    (invokestatic Math.fact 1)  ;; invoke Math.fact on (n-1)
8    (mul)                       ;; multiply the result by n
9    (ret)                       ;; return the product
10   (push 1)                    ;; push the constant 1
11   (ret))                      ;; return 1

The theorem we wish to prove about this method is the following:

(tjvm (fact-clock n) S0) = S1

where S0 is characterized by: a state poised to execute (invokestatic Math.fact 1) with the argument n on top of the operand stack,

and S1 is characterized by: the corresponding state in which the call has returned and n! has replaced n on the stack.

The above theorem relies on an auxiliary function that computes the exact number of steps we need to run the virtual machine to complete the execution of Math.fact(n):

(defun fact-clock (n)
  (if (zero? n)
      5                         ;; base case: invokestatic, load, ifle0, push, ret
      (++ 7
          (fact-clock (- n 1))
          2)))
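We can check that fact-clock really does count the machine steps exactly. The following Python sketch (ours, not Moore’s; the frame layout is simplified) executes the Math.fact bytecode above and runs it for exactly (fact-clock n) steps:

```python
# A runnable sketch of the TJVM executing the Math.fact bytecode above,
# used to check that fact-clock counts machine steps exactly.
# A frame is (pc, locals, operand-stack, code); the machine state is a
# list of frames, innermost call last.

MATH_FACT = {                       # the bytecode listing, PC-indexed as above
    1: ("load", "n"), 2: ("ifle0", 8), 3: ("load", "n"), 4: ("load", "n"),
    5: ("push", 1), 6: ("sub",), 7: ("invokestatic", "Math.fact", 1),
    8: ("mul",), 9: ("ret",), 10: ("push", 1), 11: ("ret",),
}

def step(frames):
    """One TJVM step over a stack of frames."""
    pc, env, stack, code = frames[-1]
    name = code[pc][0]
    op = code[pc]
    if name == "load":
        frames[-1] = (pc + 1, env, [env[op[1]]] + stack, code)
    elif name == "push":
        frames[-1] = (pc + 1, env, [op[1]] + stack, code)
    elif name == "ifle0":                       # pop; jump forward op[1] if <= 0
        top, *rest = stack
        frames[-1] = (pc + op[1] if top <= 0 else pc + 1, env, rest, code)
    elif name == "sub":                         # second - top
        a, b, *rest = stack
        frames[-1] = (pc + 1, env, [b - a] + rest, code)
    elif name == "mul":
        a, b, *rest = stack
        frames[-1] = (pc + 1, env, [a * b] + rest, code)
    elif name == "invokestatic":                # pop the arg, push a new frame
        arg, *rest = stack
        frames[-1] = (pc + 1, env, rest, code)
        frames.append((1, {"n": arg}, [], MATH_FACT))
    elif name == "ret":                         # return top of stack to caller
        frames.pop()
        cpc, cenv, cstack, ccode = frames[-1]
        frames[-1] = (cpc, cenv, [stack[0]] + cstack, ccode)
    elif name == "halt":
        pass                                    # do nothing
    return frames

def tjvm(n, frames):
    for _ in range(n):
        frames = step(frames)
    return frames

def fact_clock(n):                  # the clock function from the text
    return 5 if n == 0 else 7 + fact_clock(n - 1) + 2
```

Starting from a caller frame about to execute `(invokestatic Math.fact 1)` with n on its stack, running exactly `fact_clock(n)` steps leaves the caller poised at the next instruction with n! on its stack.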

This gives ACL2 a blueprint for proving correctness: it matches the execution of Math.fact to the shape of the definition of fact-clock. The theorem follows via induction on n. “5” tells ACL2 to run for 5 steps to prove the base case, and the inductive case says to run for 7 steps, apply the inductive hypothesis, and run for 2 more steps. Note that if we inlined the final if-clause to (++ 9 (fact-clock (- n 1))), ACL2 would no longer be able to prove the theorem.

Moore describes this trick and a few accompanying techniques as “the Method”. The Method first involves proving a few arithmetic lemmas, for example about the ++ function, in order to give ACL2 a hint about the proof structure. Next, ACL2 needs to know facts about the ADTs involved, such as the frame stack. For example, by proving theorems such as:

(top (push x s)) = x

ACL2 can learn the algebraic properties of the ADTs without getting lost trying to prove properties about the representation (often lists). There are a couple more lemmas as well: one about memory management rules, and another about the step function that makes the output proofs more readable.
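The idea is that once the laws are proved, later proofs never unfold the representation. A minimal sketch in Python, assuming the usual list representation of stacks:

```python
# Sketch of the ADT idea: a stack represented as a list, with algebraic
# laws (like (top (push x s)) = x) stated over push/top/pop so later
# reasoning never inspects the list representation itself.

def push(x, s): return [x] + s
def top(s):     return s[0]
def pop(s):     return s[1:]
```

The laws `top(push(x, s)) == x` and `pop(push(x, s)) == s` hold for every `x` and `s`, which is exactly what ACL2 records and reuses.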

The theorem states total correctness, in the Hoare sense, since fact-clock serves as a variant function. Moore claims that “verification of a system of methods is no harder than the combination of the verification of each method individually.”

Unfortunately, when it comes to verifying the correctness of methods that mutate the heap, we have to reason about the entire heap: there is no local reasoning. ACL2 is purely functional, so the heap is modeled as an explicit value in the usual way. There are limitations: for example, modeling cyclic structures in the heap is clumsy, and the extra complication in the model leaks into the proofs.
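The “usual way” can be sketched as follows (our Python, not ACL2): the heap is an immutable finite map from addresses to objects, and a “mutation” builds a whole new heap, so every correctness statement about a mutating method necessarily quantifies over the entire heap.

```python
# Sketch: a pure model of the heap as an immutable map from addresses
# to objects (themselves field maps). A "mutating" operation takes the
# whole heap and returns a new one; nothing is updated in place.

def heap_set(heap, addr, field, value):
    obj = dict(heap[addr])           # copy the object being "mutated"
    obj[field] = value
    new_heap = dict(heap)            # copy the heap with the new object
    new_heap[addr] = obj
    return new_heap                  # the old heap is unchanged
```

Because each method takes and returns the whole heap, even a method touching one object forces its specification to describe the rest of the heap as unchanged.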

In a 2003 paper with Liu, Moore extends his method of modeling programming languages in ACL2 to a nearly complete JVM, including threads, scheduling, synchronization, exceptions and exception handling, and class loaders. (A notable omission is I/O.) They attempt to verify a class initialization protocol and claim to have partially accomplished this task. They note that scheduling properties such as fairness are particularly difficult to express without some kind of temporal logic.

Nipkow et al ’99 make a similar attempt with μJava and the μJVM. The authors are interested in issues like type soundness (including the well-known class-loader complications) and bytecode verification. Since they want to relate Java code to compiled bytecode, they must model two different languages. However, the two languages have almost everything in common except the language of method bodies, so the authors parameterize the definition of the language over the representation type of the method bodies:

α prog = (α cdecl) list
α cdecl = cname ... (α mdecl) list
α mdecl = sig x ty x α

ty ::= Void | Bool | Int | NullType | Class cname
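The parameterization trick can be illustrated with ordinary generics. The following is a Python sketch with hypothetical names (the original is Isabelle/HOL): declarations are generic in the body type α, so one program structure serves both the source language and the bytecode.

```python
# Sketch (hypothetical names; the original is Isabelle/HOL): program
# declarations parameterized over the method-body type A, so the same
# structure describes both source-level and bytecode-level programs.
from dataclasses import dataclass
from typing import Generic, List, TypeVar

A = TypeVar("A")

@dataclass
class MDecl(Generic[A]):
    sig: str            # method signature
    ret_ty: str         # return type
    body: A             # the body, in whatever language A represents

@dataclass
class CDecl(Generic[A]):
    cname: str
    methods: List[MDecl]

# mu-Java instantiates A with source expressions (strings here);
# the mu-JVM instantiates A with instruction lists.
src  = CDecl("Math", [MDecl("fact(int)", "Int",
                            "if (n <= 0) 1 else n * fact(n - 1)")])
byte = CDecl("Math", [MDecl("fact(int)", "Int",
                            [("load", "n"), ("ifle0", 8)])])
```

Everything outside the method bodies (class names, signatures, types) is shared between the two instantiations.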

They define μJava with a small-step reduction semantics and prove type soundness in the style of Wright and Felleisen. The μJVM semantics is almost identical to that of Moore’s TJVM. The bytecode verifier for the μJVM guarantees that a verified program is type-safe.

Nipkow et al have very recently (in the last few weeks) released a tech report with very ambitious scope. They define the Java-like Jinja language (“Jinja Is Not JAva”) with both a big-step and a small-step semantics, as well as a corresponding virtual machine and bytecode verifier. The authors are the first to model “definite assignment,” Java’s restriction that every variable must be written before its first read. The virtual machine has two flavors: an aggressive VM with no safety checks, and a defensive VM with run-time checks. They then prove that any program that passes the bytecode verifier runs the same in either version of the VM. Finally, they formalize a compiler from the high-level language to the bytecode, and prove that the compiler preserves types and the big-step semantics. The proof involves 20,000 lines of Isabelle and over 1000 theorems.
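The two VM flavors relate in a simple way, which the following toy Python sketch illustrates (ours, not Jinja’s Isabelle formalization): the defensive step performs run-time checks and then delegates to the aggressive step, so on programs that always pass the checks, as bytecode verification guarantees, the two machines agree.

```python
# Toy sketch of the two VM flavors: the aggressive step applies an
# instruction with no checks; the defensive step first checks that the
# operand stack is deep enough. On check-passing programs they agree.

ARITY = {"push": 0, "add": 2}        # how many operands each instruction pops

def aggressive_step(op, stack):
    if op[0] == "push":
        return [op[1]] + stack
    if op[0] == "add":
        a, b, *rest = stack
        return [a + b] + rest

def defensive_step(op, stack):
    if len(stack) < ARITY[op[0]]:    # run-time safety check
        return "type-error"
    return aggressive_step(op, stack)
```

The Jinja theorem is the analogous (much harder) statement for the full instruction set: bytecode-verified programs never trigger the defensive checks, so the defensive and aggressive runs coincide.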

