Presented by Dale Vaillancourt

Transcribed by Dave Herman

We have already seen some simple theorems about programs in the form of types. Types are assertions about the sets of values that can be produced by expressions in a program. Type checkers, according to the Curry-Howard isomorphism, are proof checkers (not theorem provers): the types are the propositions, the expressions are the proofs of the propositions, and the type checker verifies that the proofs are valid. This presentation focuses on programs that prove more general kinds of theorems.
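
The correspondence can be sketched in miniature. Below is an illustrative example (in Python with type annotations; the function names are ours): each type reads as a proposition, each well-typed definition as a proof of it, and a type checker such as mypy plays the role of proof checker, verifying the proofs without searching for them.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

# Under Curry-Howard, Callable[[A], B] reads as "A implies B"
# and Tuple[A, B] reads as "A and B".

def fst(p: Tuple[A, B]) -> A:
    """Proof of (A ∧ B) ⇒ A: conjunction elimination."""
    return p[0]

def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Proof of hypothetical syllogism: from A ⇒ B and B ⇒ C, infer A ⇒ C."""
    return lambda a: g(f(a))

# The checker verifies that these bodies inhabit the stated types;
# it does not discover them.
inc_then_show = compose(lambda n: n + 1, str)
assert inc_then_show(41) == "42"
```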

Mathematicians in the 1950s first started implementing general
theorem provers using naive, brute-force search methods such as
constructing truth tables for statements in propositional
logic. Martin Davis and Hilary Putnam approached the problem of
recognizing the inconsistency of a set of first-order formulae by
constructing the *Herbrand universe*: by repeatedly applying the
constructors of the algebra to the set of constants in lexicographic
order, their algorithm would, in principle, find any inconsistency in
finite time. However, this universe could quickly grow enormous.
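
The construction can be sketched as a level-by-level enumeration of ground terms (a minimal sketch; the tuple encoding of terms and the function name are ours, not Davis and Putnam's):

```python
from itertools import product

def herbrand_levels(constants, functions, depth):
    """Enumerate the Herbrand universe up to a given constructor depth.
    `functions` maps each function symbol to its arity; a compound term
    is a tuple (symbol, arg1, ..., argN)."""
    universe = set(constants)
    for _ in range(depth):
        new_terms = set(universe)
        for f, arity in functions.items():
            # apply every constructor to every tuple of existing terms
            for args in product(universe, repeat=arity):
                new_terms.add((f,) + args)
        universe = new_terms
    return universe

# With one constant 0 and one unary constructor s, three levels yield
# the four terms 0, s(0), s(s(0)), s(s(s(0))); richer signatures blow
# up combinatorially, which is exactly the problem noted above.
terms = herbrand_levels({"0"}, {"s": 1}, 3)
```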

Robinson (1965) introduced the technique of resolution to improve the naive search algorithm by searching for inconsistencies incrementally while constructing the universe. This allowed him to add optimizations to prune the search space, such as eliminating redundant clauses from a proposition. He could also apply heuristics to transform clauses that might cause the algorithm to diverge into equivalent, provable clauses, effectively increasing the number of cases for which the algorithm would terminate.
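
The core rule can be illustrated in a toy propositional setting (clauses as sets of signed literals; this sketch omits first-order unification and all of Robinson's pruning optimizations):

```python
def resolve(c1, c2):
    """All resolvents of two clauses: for each complementary pair of
    literals, combine the remainders of both clauses."""
    out = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("-") else "-" + lit
        if neg in c2:
            out.append((c1 - {lit}) | (c2 - {neg}))
    return out

def refute(clauses):
    """Saturate under resolution; True iff the empty clause (an explicit
    inconsistency) is derivable."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:          # empty clause: contradiction found
                        return True
                    new.add(frozenset(r))
        if new <= clauses:             # no progress: saturated, consistent
            return False
        clauses |= new

# {P}, {¬P ∨ Q}, {¬Q} is inconsistent: resolution derives Q, then the
# empty clause.
assert refute([{"P"}, {"-P", "Q"}, {"-Q"}])
```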

Woodrow Bledsoe was an avid proponent of resolution until he
discovered that the proofs it generated were nearly unreadable. The
results were so unnatural that they gave humans little insight into
the nature of a proof. So he jumped ship and developed his own
technique of theorem proving using term rewriting. Bledsoe exploited
the insight that when humans perform algebraic proofs, we take an
equational theory and implicitly use it as a *reduction*
relation: if the theory is made up of rules of the form **A = B**,
then we can orient each as a reduction rule **A** →
**B** and repeatedly rewrite terms matching **A** into the
corresponding instances of **B** until reaching a result.
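
This style of rewriting can be sketched with the usual Peano equations for addition, oriented left to right (the rules and the tuple encoding of terms are our illustrative choices, not Bledsoe's):

```python
# Oriented rules: plus(0, y) -> y  and  plus(s(x), y) -> s(plus(x, y)).
# Terms are "0", ("s", t), or ("plus", t1, t2).

def rewrite_once(t):
    """Apply the first matching rule, outermost-first.
    Returns the rewritten term, or None if t is in normal form."""
    if isinstance(t, tuple):
        if t[0] == "plus":
            a, b = t[1], t[2]
            if a == "0":
                return b
            if isinstance(a, tuple) and a[0] == "s":
                return ("s", ("plus", a[1], b))
        # no rule applies at the root: try the subterms
        for i, sub in enumerate(t[1:], 1):
            r = rewrite_once(sub)
            if r is not None:
                return t[:i] + (r,) + t[i + 1:]
    return None

def normalize(t):
    """Rewrite until no rule applies."""
    while True:
        r = rewrite_once(t)
        if r is None:
            return t
        t = r

# plus(s(s(0)), s(0)) reduces step by step to s(s(s(0))), i.e. 2 + 1 = 3.
assert normalize(("plus", ("s", ("s", "0")), ("s", "0"))) == \
    ("s", ("s", ("s", "0")))
```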

Not all equational theories are “well-behaved” as
reduction relations, however. In particular, a proof must be finite to
be valid. So theorem provers based on term rewriting must use
well-founded, or *Noetherian*, induction. That is, there must be
some measure of terms that shows that the right-hand sides of rewrite
rules are always smaller than the left-hand sides. For such
well-behaved equational theories, Knuth and Bendix (1970) introduced
an algorithm to automatically generate a rewrite system from such an
equational theory.
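
One standard way to exhibit such a measure is a polynomial interpretation. For the Peano addition rules `plus(0,y) = y` and `plus(s(x),y) = s(plus(x,y))`, the weights below (an illustrative choice of ours) make every rewrite strictly decrease the measure, so every reduction sequence is finite:

```python
# Interpretation: [0] = 2, [s](x) = x + 1, [plus](x, y) = 2*x + y.
# Terms are "0", ("s", t), or ("plus", t1, t2), as above.

def measure(t):
    if t == "0":
        return 2
    if t[0] == "s":
        return measure(t[1]) + 1
    if t[0] == "plus":
        return 2 * measure(t[1]) + measure(t[2])

# Rule plus(0, y) -> y drops the measure from 4 + [y] to [y]:
assert measure(("plus", "0", "0")) > measure("0")
# Rule plus(s(x), y) -> s(plus(x, y)) drops it from 2[x] + [y] + 2
# to 2[x] + [y] + 1:
assert measure(("plus", ("s", "0"), "0")) > measure(("s", ("plus", "0", "0")))
```

Note that plain term size would not work here: both sides of the second rule have the same number of symbols, which is why a more refined measure is needed.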

Some equational rules do not decrease the size of a term, and can
consequently cause a reduction sequence to diverge. The most famous
culprit is commutativity. As a solution, Plotkin (1972) introduced a
technique, known as *T-unification*, for incorporating arbitrary
equational theories into unification itself. So rather than treating
commutativity as one of the rewrite rules, the algorithm could unify
terms modulo commutativity. For example, the expressions **(x +
3)** and **(3 + 7)** would fail to unify in the traditional
algorithm, but Plotkin’s extension would succeed with the
substitution **{7/x}**. Furthermore, Plotkin invented
*higher-order unification*, which performs matching modulo the
α, β, and η rules of lambda-calculus. Of course, this
relies on comparison of arbitrary lambda terms for β-equivalence,
and is therefore undecidable. (Matthias mentioned that the
undecidability of higher-order unification was published in a
criticism by Huet.) However, other forms of T-unification have turned
out to be useful: to assist with his work on compiler
correctness proofs, Mitch Wand published a paper with Greg Sullivan on
the MSTP program (“Mitch’s Simple Theorem Prover”),
which used unification modulo α-equivalence.
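
The commutativity case can be sketched by letting the unifier retry a commutative symbol's arguments in swapped order (a toy version of ours: variables are alphabetic strings, constants are numbers, and the occurs check and deep substitution are omitted):

```python
def unify(t1, t2, subst=None):
    """Unification where the symbol '+' is treated as commutative:
    if the arguments fail to unify in order, try them swapped."""
    if subst is None:
        subst = {}
    t1, t2 = subst.get(t1, t1), subst.get(t2, t2)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.isalpha():      # t1 is a variable
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.isalpha():      # t2 is a variable
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        orders = [(t2[1], t2[2])]
        if t1[0] == "+":                          # commutative: also swapped
            orders.append((t2[2], t2[1]))
        for b1, b2 in orders:
            s = unify(t1[1], b1, dict(subst))
            if s is not None:
                s = unify(t1[2], b2, s)
                if s is not None:
                    return s
    return None

# (x + 3) and (3 + 7) clash syntactically (3 vs. 7 in the second
# position), but unify modulo commutativity with {7/x}:
assert unify(("+", "x", 3), ("+", 3, 7)) == {"x": 7}
```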

One of the important applications of automatic theorem provers has been in proofs about programs. Burstall (1968) advocated structural induction as a proof technique that has turned out to be essential for proving properties of programs. Burstall proposed the idea that the structure of a proof should be isomorphic to the structure of a program. In order to achieve this, he invented the concept of the data definition (and effectively inspired the HTDP-style design recipe): inductive data definitions led to recursive functions, which in turn led to structural induction proofs.
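
Burstall's correspondence can be seen in miniature (a sketch in the HTDP spirit, using Python pairs; the data definition and function are our own example): the data definition has two clauses, the function has one branch per clause, and a proof about the function proceeds by structural induction over the same two cases.

```python
# Data definition: a List-of-Numbers is either
#   - None (the empty list), or
#   - a pair (first, rest), where rest is a List-of-Numbers.

def total(lon):
    """Sum a List-of-Numbers. The case split mirrors the data definition."""
    if lon is None:                   # base clause of the definition
        return 0
    first, rest = lon
    return first + total(rest)        # inductive clause: recur on rest

# A claim such as "total(lon) >= 0 for non-negative entries" would be
# proved by structural induction: a base case for None and an inductive
# case for (first, rest), exactly matching the branches above.
assert total((1, (2, (3, None)))) == 6
```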

In their work on the NQTHM theorem prover (1975) and subsequent work
on the ACL family of theorem provers, Boyer and Moore have used
structural induction as one of their principal proof techniques. By a
combination of term rewriting, unification, structural induction, and
a number of heuristics, ACL attempts to prove equations about Lisp
programs by rewriting the two sides of the equation to match one
another. One of the more clever tricks employed by ACL is known as
*generalization*: if a complicated subterm occurs on both sides
of an equation, the theorem prover may replace the subterm with a
universally quantified term variable and attempt to prove the more
general theorem instead. Even though the resulting theorem is stronger
than the original, it sometimes turns out to be easier to prove since
the term itself has been simplified. Generally, the hardest part of a
proof for ACL is generating the right induction hypothesis. For
example, while it succeeds in proving that
`(reverse (reverse ls)) = ls` for a simple definition of
`reverse`, it fails to discover the right induction hypothesis for an
accumulator-passing-style definition of `reverse`.
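
For concreteness, here are plausible versions of the two definitions (in Python rather than Lisp), together with the kind of generalized hypothesis the accumulator version needs; the particular lemma shown is one standard generalization, not necessarily the one the prover would produce.

```python
def reverse_simple(ls):
    """Direct structural recursion. (reverse (reverse ls)) = ls follows
    by structural induction, with the auxiliary lemma
    reverse(xs + [y]) = [y] + reverse(xs)."""
    if not ls:
        return []
    return reverse_simple(ls[1:]) + [ls[0]]

def rev_acc(ls, acc):
    """Accumulator-passing reverse: rev_acc(ls, []) reverses ls. Proving
    rev_acc(rev_acc(ls, []), []) = ls directly gets stuck, because the
    accumulator changes at each recursive call; one must generalize to a
    statement about an arbitrary accumulator, such as
    rev_acc(rev_acc(ls, acc), []) = rev_acc(acc, ls)."""
    if not ls:
        return acc
    return rev_acc(ls[1:], [ls[0]] + acc)

assert reverse_simple([1, 2, 3]) == [3, 2, 1]
assert rev_acc(rev_acc([1, 2, 3], []), []) == [1, 2, 3]
# The generalized lemma, checked on an instance with a non-empty accumulator:
assert rev_acc(rev_acc([1, 2], [9]), []) == rev_acc([9], [1, 2])
```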