Origins of Theorem Provers

Presented by Dale Vaillancourt
Transcribed by Dave Herman


We have already seen some simple theorems about programs in the form of types. Types are assertions about the sets of values that can be produced by expressions in a program. Type checkers, according to the Curry-Howard isomorphism, are proof checkers (not theorem provers): the types are the propositions, the expressions are the proofs of the propositions, and the type checker verifies that the proofs are valid. This presentation focuses on programs that prove more general kinds of theorems.
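
To make the slogan concrete, here is a small Typed Racket illustration (not from the talk): the polymorphic type plays the role of the proposition “for all A and B, A implies (B implies A)”, the definition is its proof, and the type checker validates it.

    #lang typed/racket

    ;; The type, read as a proposition, says "for all A and B,
    ;; A implies (B implies A)"; the definition below is a proof of it,
    ;; and the type checker acts as the proof checker.
    (: const (All (A B) (-> A (-> B A))))
    (define (const x)
      (lambda (y) x))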

Mathematicians in the 1950s first started implementing general theorem provers using naive, brute-force search methods such as constructing truth tables for statements in propositional logic. Martin Davis and Hilary Putnam approached the problem of recognizing the inconsistency of a set of first-order formulae by constructing the Herbrand universe: by repeatedly applying the constructors of the algebra to the set of constants in lexicographic order, their algorithm would, in principle, find an inconsistency in finite time whenever one existed. However, this universe could quickly grow enormous.
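
A rough Racket sketch of such an enumeration (the signature of one constant, one unary symbol, and one binary symbol is invented for illustration, and terms are generated level by level rather than lexicographically):

    #lang racket

    ;; Herbrand-universe enumeration: start from the constants and
    ;; repeatedly apply every function symbol to the terms built so far.
    (define constants '(a))
    (define functions '((f 1) (g 2)))   ; (name arity)

    ;; all ground terms of depth at most n
    (define (universe n)
      (if (zero? n)
          constants
          (let ([smaller (universe (sub1 n))])
            (remove-duplicates
             (append smaller
                     (for*/list ([fa functions]
                                 [args (apply cartesian-product
                                              (make-list (second fa) smaller))])
                       (cons (first fa) args)))))))

    ;; > (universe 1)
    ;; '(a (f a) (g a a))
    ;; and each further level grows explosively.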

Robinson (1965) introduced the technique of resolution to improve the naive search algorithm by searching for inconsistencies incrementally while constructing the universe. This allowed him to add optimizations to prune the search space, such as eliminating redundant clauses from a proposition. He could also apply heuristics that took clauses likely to make the algorithm diverge and transformed them into equivalent, provable clauses, effectively increasing the number of cases for which the algorithm would terminate.
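
A toy propositional version of the idea, sketched in Racket: clauses are lists of literals, resolvents are formed repeatedly, and an inconsistency is reported as soon as the empty clause appears. The sketch omits the unification step of first-order resolution and all of Robinson’s pruning.

    #lang racket

    ;; literals are symbols or (not symbol); clauses are lists of literals
    (define (negate lit)
      (if (and (pair? lit) (eq? (car lit) 'not)) (cadr lit) (list 'not lit)))

    ;; all resolvents of two clauses
    (define (resolvents c1 c2)
      (for/list ([lit c1] #:when (member (negate lit) c2))
        (remove-duplicates
         (append (remove lit c1) (remove (negate lit) c2)))))

    ;; saturate the clause set; #t means the empty clause (a contradiction)
    ;; was derived, so the input clauses are inconsistent
    (define (inconsistent? clauses)
      (let loop ([cs (map remove-duplicates clauses)])
        (define new
          (remove-duplicates
           (for*/list ([c1 cs] [c2 cs] [r (resolvents c1 c2)]
                       #:unless (member r cs))
             r)))
        (cond [(member '() (append cs new)) #t]
              [(null? new) #f]
              [else (loop (append cs new))])))

    ;; { P,  P -> Q,  not Q } is inconsistent:
    ;; > (inconsistent? '((p) ((not p) q) ((not q))))
    ;; #t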

Woodrow Bledsoe was an avid proponent of resolution until he discovered that the proofs it generated were nearly unreadable. The results were so unnatural that they gave humans little insight into the nature of a proof. So he jumped ship and developed his own technique of theorem proving using term rewriting. Bledsoe exploited the insight that when humans perform algebraic proofs, we take an equational theory and implicitly use it as a reduction relation: if the theory is made up of rules of the form A = B, then we can treat these as reduction relations A → B and repeatedly rewrite propositions of the form A into propositions of the form B until discovering a result.
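
A small Racket sketch of proving by rewriting: orient each equation left to right and rewrite until no rule applies. The Peano-style addition rules and the ?-prefixed pattern variables are conventions invented for the example.

    #lang racket

    ;; rules are oriented equations: (left-hand-side right-hand-side)
    (define rules
      '(((add zero ?y)      ?y)
        ((add (succ ?x) ?y) (succ (add ?x ?y)))))

    (define (var? t)
      (and (symbol? t) (char=? #\? (string-ref (symbol->string t) 0))))

    ;; match pattern `pat` against term `t`, extending substitution `s` (or #f)
    (define (match-term pat t s)
      (cond [(not s) #f]
            [(var? pat)
             (cond [(assq pat s) => (lambda (b) (and (equal? (cdr b) t) s))]
                   [else (cons (cons pat t) s)])]
            [(and (pair? pat) (pair? t) (= (length pat) (length t)))
             (foldl match-term s pat t)]
            [else (and (equal? pat t) s)]))

    (define (instantiate t s)
      (cond [(var? t) (cdr (assq t s))]
            [(pair? t) (map (lambda (u) (instantiate u s)) t)]
            [else t]))

    ;; rewrite one step, at the root if possible, otherwise in some subterm
    (define (step t)
      (or (for/or ([r rules])
            (let ([s (match-term (first r) t '())])
              (and s (instantiate (second r) s))))
          (and (pair? t)
               (let loop ([done '()] [rest t])
                 (cond [(null? rest) #f]
                       [(step (car rest))
                        => (lambda (u) (append (reverse done) (cons u (cdr rest))))]
                       [else (loop (cons (car rest) done) (cdr rest))])))))

    (define (normalize t)
      (cond [(step t) => normalize] [else t]))

    ;; 2 + 1 = 3:
    ;; > (normalize '(add (succ (succ zero)) (succ zero)))
    ;; '(succ (succ (succ zero)))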

Not all equational theories are “well-behaved” as reduction relations, however. In particular, a proof must be finite to be valid. So theorem provers based on term rewriting must use well-founded, or Noetherian, induction. That is, there must be some measure of terms that shows that the right-hand sides of rewrite rules are always smaller than the left-hand sides. For such well-behaved equational theories, Knuth and Bendix (1970) introduced an algorithm that automatically generates a rewrite system from the equations.
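
One crude way to state the “right-hand sides are smaller” requirement in code: if every rule’s right-hand side has strictly smaller size than its left-hand side and repeats no variable more often, then every rewrite step shrinks the term, so term size serves as a decreasing, well-founded measure. This is only a simple sufficient condition, far weaker than the term orderings Knuth and Bendix actually worked with.

    #lang racket

    (define (var? t)
      (and (symbol? t) (char=? #\? (string-ref (symbol->string t) 0))))

    (define (size t) (if (pair? t) (apply + (map size t)) 1))

    (define (occurrences x t)
      (cond [(equal? x t) 1]
            [(pair? t) (apply + (map (lambda (u) (occurrences x u)) t))]
            [else 0]))

    (define (vars t)
      (cond [(var? t) (list t)]
            [(pair? t) (remove-duplicates (append-map vars t))]
            [else '()]))

    ;; every rule's right-hand side is strictly smaller and repeats no
    ;; variable more often than the left-hand side
    (define (obviously-terminating? rules)
      (for/and ([r rules])
        (and (< (size (second r)) (size (first r)))
             (for/and ([x (vars (second r))])
               (<= (occurrences x (second r)) (occurrences x (first r)))))))

    ;; simplification rules pass the check:
    ;; > (obviously-terminating? '(((and true ?x) ?x) ((not (not ?x)) ?x)))
    ;; #t
    ;; the commutativity "rule" fails it (and indeed loops as a rewrite rule):
    ;; > (obviously-terminating? '(((add ?x ?y) (add ?y ?x))))
    ;; #f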

Some equational rules do not decrease the size of the term, and can consequently cause a reduction sequence to diverge. The most famous culprit is commutativity. As a solution, Plotkin (1972) introduced a technique, known as T-unification, for incorporating arbitrary equational theories into unification itself. So rather than treating commutativity as one of the rewrite rules, the algorithm could unify terms modulo commutativity. For example, the expressions (x + 3) and (3 + 7) would fail to unify in the traditional algorithm, but Plotkin’s extension would succeed with the substitution {7/x}. Furthermore, Plotkin invented higher-order unification, which performs matching modulo the α, β, and η rules of lambda-calculus. Of course, this relies on comparison of arbitrary lambda terms for β-equivalence, and is therefore undecidable. (Matthias mentioned that the undecidability of higher-order unification was published in a criticism by Huet.) However, other forms of T-unification have turned out to be useful: to assist with his work on compiler correctness proofs, Mitch Wand published a paper with Greg Sullivan on the MSTP program (“Mitch’s Simple Theorem Prover”), which used unification modulo α-equivalence.
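
A Racket sketch of that example: ordinary syntactic unification fails on (x + 3) and (3 + 7), but retrying a + application with its arguments swapped, a brute-force stand-in for Plotkin’s systematic treatment, succeeds with {7/x}. In the sketch any symbol other than + counts as a variable, and the swap is tried only at the root, which is all the example needs.

    #lang racket

    ;; terms are numbers, variables (symbols), or (+ t1 t2)
    (define (var? t) (and (symbol? t) (not (eq? t '+))))

    ;; follow a variable's binding in substitution s
    (define (walk t s)
      (cond [(and (var? t) (assq t s)) => (lambda (b) (walk (cdr b) s))]
            [else t]))

    ;; plain first-order unification (no occurs check; enough for the example)
    (define (unify t1 t2 s)
      (cond [(not s) #f]
            [else
             (let ([t1 (walk t1 s)] [t2 (walk t2 s)])
               (cond [(equal? t1 t2) s]
                     [(var? t1) (cons (cons t1 t2) s)]
                     [(var? t2) (cons (cons t2 t1) s)]
                     [(and (pair? t1) (pair? t2) (= (length t1) (length t2)))
                      (foldl unify s t1 t2)]
                     [else #f]))]))

    ;; unification modulo commutativity of +: if the ordinary attempt fails
    ;; on two + terms, retry with the second term's arguments swapped
    (define (unify/c t1 t2 s)
      (cond [(not s) #f]
            [else
             (let ([t1 (walk t1 s)] [t2 (walk t2 s)])
               (or (unify t1 t2 s)
                   (and (pair? t1) (pair? t2)
                        (eq? (first t1) '+) (eq? (first t2) '+)
                        (= (length t2) 3)
                        (unify t1 (list '+ (third t2) (second t2)) s))))]))

    ;; > (unify '(+ x 3) '(+ 3 7) '())      ; syntactic unification fails
    ;; #f
    ;; > (unify/c '(+ x 3) '(+ 3 7) '())    ; but modulo commutativity, x = 7
    ;; '((x . 7))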

One of the important applications of automatic theorem provers has been in proofs about programs. Burstall (1968) advocated structural induction as a proof technique that has turned out to be essential for proving properties of programs. Burstall proposed the idea that the structure of a proof should be isomorphic to the structure of a program. In order to achieve this, he invented the concept of the data definition (and effectively inspired the HTDP-style design recipe): inductive data definitions led to recursive functions, which in turn led to structural induction proofs.
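
For instance (an invented example, not Burstall’s), an HTDP-style data definition and a function whose shape follows it; a structural-induction proof of a property such as (sum (append xs ys)) = (+ (sum xs) (sum ys)) has the same two cases, with the induction hypothesis standing in for the recursive call.

    #lang racket

    ;; A ListOfNumbers is either
    ;;   '()
    ;;   (cons Number ListOfNumbers)

    ;; sum : ListOfNumbers -> Number
    ;; The '() case of a proof about sum is the base case; in the
    ;; (cons n rest) case the induction hypothesis speaks about rest,
    ;; exactly where the function recurs.
    (define (sum ls)
      (cond [(null? ls) 0]
            [else (+ (car ls) (sum (cdr ls)))]))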

In their work on the NQTHM theorem prover (1975) and subsequent work on the ACL family of theorem provers, Boyer and Moore have used structural induction as one of their principal proof techniques. By a combination of term rewriting, unification, structural induction, and a number of heuristics, ACL attempts to prove equations about Lisp programs by rewriting the two sides of the equation to match one another. One of the more clever tricks employed by ACL is known as generalization: if a complicated subterm occurs on both sides of an equation, the theorem prover may replace the subterm with a universally quantified variable and attempt to prove the more general theorem instead. Even though the resulting theorem is stronger than the original, it sometimes turns out to be easier to prove since the term itself has been simplified. Generally, the hardest part of a proof for ACL is generating the right induction hypothesis. For example, while it succeeds in proving that (reverse (reverse ls)) = ls for a simple definition of reverse, it fails to discover the right induction hypothesis for an accumulator-passing-style definition of reverse.
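
The two definitions in question look roughly like this in Racket (ACL’s actual syntax and heuristics are not reproduced here). For rev, the equation (rev (rev ls)) = ls goes through by structural induction on ls; for the accumulator-passing rev-acc, the needed induction hypothesis is the generalized claim that (rev-acc ls acc) = (append (rev ls) acc) for every acc, exactly the kind of lemma the prover fails to invent on its own.

    #lang racket

    ;; the simple definition
    (define (rev ls)
      (cond [(null? ls) '()]
            [else (append (rev (cdr ls)) (list (car ls)))]))

    ;; the accumulator-passing definition; (rev-acc ls '()) reverses ls
    (define (rev-acc ls acc)
      (cond [(null? ls) acc]
            [else (rev-acc (cdr ls) (cons (car ls) acc))]))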

