Presented by Dale Vaillancourt

Transcribed by Dave Herman

There are a number of limitations of pencil-and-paper proofs that PL researchers involved in the development of automatic theorem proving have attempted to tackle. Humans are not good at mentally managing complexity or working out tedious details; furthermore, as we scale up languages and models, hand-written proofs are difficult to extend.

As we recall from Dale’s last talk, Moore’s ACL2 theorem prover is a pure, applicative Common Lisp with a special deftheorem form. The theorem prover has an embedded account of itself, and reflectively extends its collection of knowledge with each theorem that it successfully proves. Thus once ACL2 proves a theorem, it can use that theorem as a primitive to prove further theorems.

Moore describes a “standard style” for reasoning about programming languages with ACL2. He desribes an abstract machine for a Toy Java Virtual Machine (TJVM). Moore models each instruction of the machine with a semantic function:

instruction x state → state state = frame x heap x class-list frame = PC x bytecode x ...

He defines a transition function “step”, which examines the PC and dispatches on the instruction.

step : state → state

He also defines an iterated step function, which executes a finite number of steps of the abstract machine.

tjvm : nat x state → state

Moore’s paper describes how to prove theorems about programs written in
bytecode. Consider the following function `Math.fact`

, which computes the
factorial of an int:

(def Math.fact (n) 1 (load n) ;; load the argument n 2 (ifle0 8) ;; jump forward 8 if n <= 0 3 (load n) ;; load n onto the stack (for step 8) 4 (load n) ;; load n onto the stack (as Math.fact arg) 5 (push 1) ;; push the constant 1 6 (sub) ;; subtract 1 from n 7 (invokestatic Math.fact 1) ;; invoke Math.fact on (n-1) 8 (mul) ;; multiply the result by n 9 (ret) ;; return the product 10 (push 1) ;; push the constant 1 11 (ret)) ;; return 1

The theorem we wish to prove about this method is the following:

(tjvm (fact-clock n) S_{0}) = S_{1}

where S_{0} is characterized by:

- the next instruction is a call to
`Math.fact`

`Math.fact`

is defined- the top of the stack is
`n`

in nat

and S_{1} is characterized by:

- the top of the stack contains
`n!`

- the heap and class-list remain unchanged from S
_{0}

The above theorem relies on an auxiliary function that computes the exact number
of steps we need to run the virtual machine to complete the execution of
`Math.fact(n)`

:

(defun fact-clock (n) (if (zero? n) 5 ;; should this be 4? we weren't sure (++ 7 (fact-clock (- n 1)) 2)))

This gives ACL2 a blueprint for proving correctness: it matches the execution of
`Math.fact`

to the shape of the definition of fact-clock. The theorem
follows via induction on `n`

. “5” tells ACL2 to run for 5
steps to prove the base case, and the inductive case says to run for 7 steps,
apply the inductive hypothesis, and run for 2 more steps. Note that if we
inlined the final `if`

-clause to ```
(++ 9 (fact-clock (- n
1)))
```

, ACL2 would no longer be able to prove the theorem.

Moore describes this trick and a few accompanying techniques as “the
Method”. The Method first involves proving a few arithmetic lemmas, for
example about the `++`

function, in order to give ACL2 a hint about
the proof structure. Next ACL2 needs to know facts about the ADT’s
involved, such as a the frame stack. For example, by proving theorems such as:

(top (push x s)) = x

ACL2 can learn about the algebraic properties of the ADT’s without getting lost trying to prove properties about the representation (often lists). There are a couple more lemmas as well, one about memory management rules, and another about the step function that makes the output proofs more readable.

The theorem states total correctness, in the Hoare sense, since
`fact-clock`

serves as a variant function. Moore claims that
“verification of a system of methods is no harder than the combination of
the verification of each method individually.”

Unfortunately, when it comes to verifying the correctness of methods that mutate the heap, we have to reason about the entire heap: there is no local reasoning. ACL2 is pure, so the heap is modeled purely in the usual way. There are limitations: for example, modeling cyclic structures in the heap is clumsy, and the extra complication in the model leaks into the proofs.

In a 2003 paper with Liu, Moore extends his method of modeling programming languages in ACL2 to a nearly complete JVM, including threads, scheduling, synchronization, exceptions, exception handling, and class loaders. (A notable omission is I/O.) They attempt to verify a class initialization protocol and they claim to have this task partially accomplished. They note that properties of scheduling such as fairness are particularly difficult to express without some kind of temporal logic.

Nipkow et al ’99 make a similar attempt with μJava and the μJVM. The authors are interested in issues like type soundness (and the well-known relevant classloader issues) and bytecode verification. Since the authors are interested in relating Java code to compiled bytecode, they have to model two different languages. However, the two different languages have almost everything in common except for the language of the body of methods. The authors parameterize the definition of the language over the representation type for the method body language:

α prog = (α cdecl) list α cdecl = cname ... (α mdecl) list α mdecl = sig x ty x α ty ::= Void | Bool | Int | NullType | Class cname

They define μJava with a small-step reduction semantics and prove type soundness in the style of Wright and Felleisen. The μJVM semantics is almost identical to that of Moore’s TJVM. The bytecode verifier for the μJVM guarantees that a verified program is type-safe.

Nipkow et al have very recently (in the last few weeks) released a tech report with very ambitious scope. They define the Java-like Jinja language (“Jinja Is Not JAva”) with both a big-step and a small-step semantics, as well as a corresponding virtual machine and bytecode verifier. The authors are the first to model “definite assignment,” Java’s restriction that every variable must be written before its first read. The virtual machine has two flavors: an aggressive VM with no safety checks, and a defensive VM with run-time checks. They then prove that any program that passes the bytecode verifier runs the same in either version of the VM. Finally, they formalize a compiler from the high-level language to the bytecode, and prove that the compiler preserves types and the big-step semantics. The proof involves 20,000 lines of Isabelle and over 1000 theorems.