\documentstyle[11pt,mysecs]{article}
\source{/home/wand/unif/lfp/abstract.tex}
\input{macros-for-abstract}
\title{A Little Goes a Long Way:\\
A Simple Tool to Support Denotational Compiler-Correctness Proofs}
\support{Work supported by the National Science Foundation under grant numbers CCR-9014603 and CCR-9304144.}
% \draftfalse
\begin{document}
\maketitle

\begin{abstract}
In a series of papers in the early 80's we proposed a paradigm for semantics-based compiler correctness. In this paradigm, the source and target languages are given denotational semantics in the same \l-theory, so correctness proofs can be carried out within this theory. In many cases, the proofs have a highly structured form. We show how a simple proof strategy, based on an algorithm for \a-matching, can be used to build a tool that can automate the routine cases of these proofs.
\end{abstract}

\section{Introduction}

In a series of papers in the early 80's \cite{Wand82b,Wand82c,Wand83b,Clinger84} we proposed a paradigm for semantics-based compiler correctness, and over the last several years we have begun putting this paradigm into practice {\cite{WandOliva92,OlivaRamsdellWand94}}. In this paradigm, the source and target languages are given denotational semantics in the same \l-theory, so that most of the steps of the correctness proof can be done within the theory. Furthermore, the compiler is given by a compositional (syntax-directed) translation, so properties of the compiler can be proved by structural induction on the grammar of the source language.

If the semantics of programs in the source language is given by $\sem{P}$, the semantics of the target language is given by $\sem{T}$, and the compiler on programs is given by $\sem{CP}$, then the correctness of the compiler is given by the requirement that for all programs $p$,
\begin{equation} \label{eqn:compiler-correctness}
\sembrack{T}{\sembrack{CP}{p}} = \sembrack{P}{p}
\end{equation}
%
where equality is taken in the semantic theory.\footnote{For this we can use any convenient \l-theory. For ``conventional'' semantics we could use the theory of some model of the \l-calculus \cite{OlivaRamsdellWand94}, or the theory of $\beta$-conversion \cite{Wand82b}, or the theory of $\beta$-conversion plus some fixed-point rules \cite{Wand83b}.}

Such an equation is proven by structural induction on source programs; this typically involves an induction hypothesis for each non-terminal of the source language. The reasoning is higher-order equational reasoning in a suitable \l-theory \cite{Meyer82}, plus the use of the induction hypotheses. A number of small-to-moderate sized compiler correctness proofs have been done in this way \cite{Clinger84,GuttmanSwarupRamsdell93,OlivaRamsdellWand94,Wand82b,Wand82c,Wand83b,WandOliva92}.

As proofs of congruence and soundness scale up, machine support becomes more and more important. Ideally, one would like all the simple cases to be checked by machine, perhaps leaving a few subtle cases to be done by hand. We have now built a theorem-prover capable of proving many such theorems. The key idea in the system is the use of a very simple rule-matching technique, which we call {\em {\a-matching}}, in place of more complicated techniques such as higher-order unification.

In this paper, we will describe the system and its use. Section~\ref{sec:hand-example} shows a typical manual proof step, and Section~\ref{sec:mech-proof} shows how this proof is expressed in our system.
Section~\ref{sec:gen-induction-steps} shows how the induction hypotheses and goals can be generated directly from the grammar of the source language. Section~\ref{sec:tac-prover} describes the internal organization of the system. Section~\ref{sec:experiences} describes our experience with the system, including what it can and cannot do currently. Section~\ref{sec:current-work} discusses extensions to the system. Sections~\ref{sec:related-work} and \ref{sec:conclusions} discuss related work and conclusions.

\section{Example of a Manual Proof}
\label{sec:hand-example}

A theorem like equation~\ref{eqn:compiler-correctness} is proven by structural induction on source programs; this typically involves an induction hypothesis for each non-terminal of the source language. For example, the induction hypothesis for expressions might be
$$\T{\CE{e}t}\rho\zeta = \E{e}\rho (\l v. \T{t} \r \push{v}{\zeta})$$
This induction hypothesis says that if the compilation of $e$ followed by $t$ is run on environment $\r$ and stack $\z$, the result is the same as that of running $e$ in environment $\r$ with a continuation that runs $t$ on environment \r\ and the stack obtained by pushing the result $v$ of $e$ onto \z.\footnote{The reader familiar with \cite{Clinger84} will note some differences. We have omitted the separation of the environment into compile-time and run-time components; our system can handle this as well. Also, we have eliminated both Clinger's accumulator $\epsilon$ and the continuation register $\kappa$. This entails no loss of generality, as these quantities can be cached at the top and bottom of the stack \z.}

Such proofs typically have a highly structured form. As usual for structural inductions, we have one induction step for each production in the language. Let us consider the case of addition expressions. The semantics of an addition expression might be given by
$$\sembrack{E}{e_1 + e_2} = \l \r\k. \sembrack{E}{e_1}\r(\l v_1. \sembrack{E}{e_2}\r(\l v_2. \fun{cps-add} v_1 v_2 \k))$$
This is standard continuation semantics.

A typical production in the target language might be
$$t ::= \ins{add} t$$
representing an \ins{add} instruction followed by additional target code, with semantics specified by
$$\sembrack{T}{\ins{add} t} = \l \r \zeta. \fun{cps-add} (\nth{\zeta}2) (\nth{\zeta}1) (\l v. \sembrack{T}{t} \r \push{v}{(\nthcdr{\zeta}2)})$$
Here \r\ is the environment and $\zeta$ is a stack on which the instruction acts; $\nth{\zeta}i$ denotes the $i$-th element of the stack, and $\nthcdr{\zeta}i$ represents the stack from which the first $i$ elements have been removed. Thus the \ins{add} instruction works by adding the first two elements of the stack, and invoking the following code $t$ on the stack obtained by replacing the first two elements of $\zeta$ by their sum; the environment register is left unchanged.

The compiler on addition expressions might be given by
$$\CE{e_1+e_2}t = \CE{e_1}(\CE{e_2}(\ins{add} t))$$
Here we write \sem{CE} rather than \sem{CP} since we are compiling an expression rather than a program. The equation says that to compile $e_1 + e_2$ (followed by target code $t$), compile $e_1$, followed by the compilation of $e_2$, followed by an \ins{add} instruction, followed by $t$. This is a reasonable compilation strategy for a stack machine.

To show that the induction hypothesis holds for every source language expression $e$, we will use structural induction \cite{Burstall69}. We will have one induction step for each production in the grammar.
For the production for addition expressions, we will want to show that if the induction hypothesis is true at expressions $e_1$ and $e_2$, then it is true at the expression $e_1 + e_2$ as well. To do this, we calculate as follows:
\begin{displaymath}
\begin{array}{l}
%% I wish I knew why we need the \relax on the next line.
\relax \T{\CE{e_1+ e_2}t}\rho \zeta \\
= \just{definition of \sem{CE}} \\
\T{\CE{e_1}(\CE{e_2}(\ins{add} t))}\rho \zeta \\
= \just{induction hypothesis at $e_1$} \\
\E{e_1}\rho(\l v_1 . \T{\CE{e_2}(\ins{add} t)}\r \push{v_1}{\zeta}) \\
= \just{induction hypothesis at $e_2$} \\
\E{e_1}\rho(\l v_1 . \E{e_2}\rho (\l v_2 . \T{\ins{add} t}\r \push{v_2}{\push{v_1}{\zeta}})) \\
= \just{definition of \sem{T}, properties of list functions} \\
\E{e_1}\rho(\l v_1 . \E{e_2}\rho (\l v_2 . \fun{cps-add} v_1 v_2 (\l v. \T{t} \rho \push{v}{\zeta}))) \\
= \just{definition of \sem{E}} \\
\E{e_1+e_2}\rho (\l v. \T{t} \rho \push{v}{\zeta}) \\
\end{array}
\end{displaymath}
as desired. Here we have labelled each equality with its justification.

In doing this proof, we used only ordinary \b-conversion, plus the knowledge of the induction hypotheses. We used the fact that if $M = N$ is provable in the \l-calculus, so is any substitution instance of $M = N$. This allowed us to use a complex substitution for $\zeta$ when we applied the induction hypothesis at $e_2$. The rules of inference are shown in Figure~\ref{fig:lambda}. It is easy to show that these rules are a conservative extension of the ordinary \l-calculus (or of any \l-theory) \cite{WandWang90}.

\begin{figure}
$$\G \proves (\l x. M) N = M[N/x]$$
$$ (M = N) \in \G \over \G \proves M\s = N\s $$
$$\G \proves M = M$$
$$\G \proves M = N \over \G \proves N = M$$
$$\G \proves M = N \quad \G \proves N = P \over \G \proves M = P $$
$$\G \proves M = M' \quad \G \proves N = N' \over \G \proves MN = M'N' $$
\caption{Rules for \l-calculus with additional hypotheses}
\label{fig:lambda}
\end{figure}

This proof has a very definite structure: one reduces the left-hand term, using ordinary $\beta$-reduction and the induction hypotheses, and similarly reduces the right-hand term, using $\beta$-reduction and the definition of the original semantics.

\section{A Mechanized Proof}
\label{sec:mech-proof}

Our theorem-prover was able to automatically verify this example and others like it. The input to the theorem-prover is shown in Figures~\ref{fig:input} and \ref{fig:input2} and the output is shown in Figures~\ref{fig:output}--\ref{fig:output2}.

Figure~\ref{fig:input} shows four lists of rules. The first three lists give the definitions of the relevant portions of the compiler, the target-language semantics, and the source-language semantics. Here we show only the rules relevant to addition expressions: one rule in each list. The last list, {\tt list-rules}, gives the relevant rules for evaluation of operations on lists. In these rules, constants are represented by quoted symbols ({\tt {'compiler}}, {\tt 'sum}, etc.); other symbols ({\tt e1}, {\tt e2}, etc.) are variables.

Since our system is tuned for dealing with semantics, all applications and abstractions are internally curried. The unparsing routine automatically merges consecutive {\tt lambda}'s and omits unnecessary parentheses, as shown in Figures~\ref{fig:output}--\ref{fig:output2}. We use explicit bound variables rather than de~Bruijn indices in the internal representation; this made the algorithms clearer.

\begin{figure}[tp]
\small
\begin{verbatim}
;;; rules for compiler, machine, evaluator, etc.
(define compiler-rules
  '((('compiler ('sum e1 e2) t)
     ('compiler e1 ('compiler e2 ('add-inst t))))))

(define machine-denotation-rules
  '((('target-semantics ('add-inst t) r z)
     ('add-cps ('cadr z) ('car z)
       (lambda (v)
         ('target-semantics t r ('cons v ('cddr z))))))))

(define source-semantics-rules
  '((('source-meaning ('sum e1 e2) r k)
     ('source-meaning e1 r
       (lambda (v1)
         ('source-meaning e2 r
           (lambda (v2)
             ('add-cps v1 v2 k))))))))

(define list-rules
  '((('car ('cons a d)) a)
    (('cadr ('cons a1 ('cons a2 dd))) a2)
    (('cddr ('cons a1 ('cons a2 dd))) dd)))
\end{verbatim}
\caption{Rules for the addition example}
\label{fig:input}
\end{figure}

Figure~\ref{fig:input2} shows the tactics used for this example. The procedure {\tt ih} constructs an induction hypothesis for its argument {\tt pgm}. {\tt proof-tac} constructs the main tactic for the proof. This tactic applies beta-reduction, the various rules and the induction hypotheses as rewrite rules as often as possible. Last, the procedure {\tt prove-it} applies these tactics to each side of the desired conclusion and then tests the resulting terms for congruence. Note that there is little here that is dependent on this example; Section~\ref{sec:gen-induction-steps} shows how to go directly from the grammar to an appropriate set of hypotheses, tactics, and goals.

\begin{figure}
\small
\begin{verbatim}
(define ih
  (lambda (pgm)
    (list (lhs pgm) (rhs pgm))))

(define lhs
  (lambda (pgm)
    `('target-semantics ('compiler ,pgm t) r z)))

(define rhs
  (lambda (pgm)
    `('source-meaning ,pgm r
       (lambda (v) (('target-semantics t) r ('cons v z))))))

(define proof-tac
  (lambda ()
    (repeat
      (or-else-use*
        (report-tac "beta" (leftmost beta))
        (report-tac "definition of compiler" (rules->tac compiler-rules))
        (report-tac "definition of T" (rules->tac machine-denotation-rules))
        (report-tac "definition of E" (rules->tac source-semantics-rules))
        (report-tac "list operations" (rules->tac list-rules))
        (report-tac "ih at e1" (rules->tac (list (ih 'e1))))
        (report-tac "ih at e2" (rules->tac (list (ih 'e2))))))))

(define prove-it
  (lambda (pgm)
    (let* ((dummy1 (printf "~%Reducing left-hand side...~%"))
           (d2 (pretty-print (lhs pgm)))
           (lhs (test-tacfn (lhs pgm) (proof-tac)))
           (dummy (printf "~%Reducing right-hand side...~%"))
           (d3 (pretty-print (rhs pgm)))
           (rhs (test-tacfn (rhs pgm) (proof-tac))))
      (printf "~%lhs reduces to:~%")
      (pretty-print lhs)
      (printf "~%rhs reduces to:~%")
      (pretty-print rhs)
      (printf "~%checking for alpha-congruence... ")
      (alpha-congruent? (parse lhs) (parse rhs) '() '()))))
\end{verbatim}
\caption{Tactics for the addition example}
\label{fig:input2}
\end{figure}
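To check the induction step for addition, {\tt prove-it} is applied to the abstract-syntax phrase for that production. The exact invocation is not shown in the figures; presumably it is something like the following, where the constants {\tt 'e1} and {\tt 'e2} stand for the fixed but arbitrary subexpressions of the sum:
\begin{verbatim}
(prove-it '('sum 'e1 'e2))
\end{verbatim}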
Figures~\ref{fig:output} and \ref{fig:output2} show the output from this run. We have edited the output only to improve the pretty-printing and to eliminate one intermediate step of list manipulation (to keep it all on two pages). Notice how closely the steps correspond to the human-generated proof in Section~\ref{sec:hand-example}.

\begin{figure}
\small
\begin{verbatim}
Reducing left-hand side...

('target-semantics ('compiler ('sum 'e1 'e2) t) r z)

[by definition of compiler]:
('target-semantics ('compiler 'e1 ('compiler 'e2 ('add-inst t))) r z)

[by ih at e1]:
('source-meaning 'e1 r
  (lambda (v)
    ('target-semantics ('compiler 'e2 ('add-inst t)) r ('cons v z))))

[by ih at e2]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('target-semantics ('add-inst t) r ('cons v15 ('cons v z)))))))

[by definition of T]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps ('cadr ('cons v15 ('cons v z)))
                  ('car ('cons v15 ('cons v z)))
          (lambda (v16)
            ('target-semantics t r
              ('cons v16 ('cddr ('cons v15 ('cons v z)))))))))))

[by list operations]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps v v15
          (lambda (v16)
            ('target-semantics t r ('cons v16 z))))))))
\end{verbatim}
\caption{Machine-generated proof for the addition example}
\label{fig:output}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
Reducing right-hand side...

('source-meaning ('sum 'e1 'e2) r
  (lambda (v) (('target-semantics t) r ('cons v z))))

[by definition of E]:
('source-meaning 'e1 r
  (lambda (v1)
    ('source-meaning 'e2 r
      (lambda (v2)
        ('add-cps v1 v2
          (lambda (v)
            ('target-semantics t r ('cons v z))))))))

lhs reduces to:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps v v15
          (lambda (v16)
            ('target-semantics t r ('cons v16 z))))))))

rhs reduces to:
('source-meaning 'e1 r
  (lambda (v1)
    ('source-meaning 'e2 r
      (lambda (v2)
        ('add-cps v1 v2
          (lambda (v)
            ('target-semantics t r ('cons v z))))))))

checking for alpha-congruence... #t
\end{verbatim}
\caption{Machine-generated proof for the addition example (part 2)}
\label{fig:output2}
\end{figure}

\section{From Grammars to Proofs}
\label{sec:gen-induction-steps}

We are currently working to enhance the usability of the prover. The first step in this process is to automatically generate the various induction steps for the structural induction. This can be done given a specification of the abstract syntax of the language and a specification of the relevant induction hypotheses to be proved for each non-terminal. A typical such specification is given in Figure~\ref{fig:ih-generator}; here we have specified a language with two non-terminals and three productions, and we have one induction hypothesis for the non-terminal {\tt pgm} and two for the non-terminal {\tt exp}. This would give rise to a total of 5 induction steps to be verified.

\begin{figure}
\small
\begin{verbatim}
(define abstract-syntax
  '((whole-program (exp) pgm)
    (add-exp (exp exp) exp)
    (var-exp (id) exp)))

(define induction-hypotheses
  '((pgm (p)
      (('target-semantics ('compile-pgm p) r z)
       ('source-pgm-semantics p r)))
    (exp (e)
      (('target-semantics ('compile-exp e t) r z)
       ('source-exp-semantics e
         (lambda (v) ('target-semantics t r ('cons v z)))))
      (('target-semantics ('compile-tail-exp e) r k)
       ('source-exp-semantics e r k)))))
\end{verbatim}
\caption{Specification of abstract syntax and induction hypotheses}
\label{fig:ih-generator}
\end{figure}
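The generator is not part of the prover as described here; the following sketch (our code, not the system's, with assumed helpers {\tt subst-sym} and {\tt number-args}) indicates one way the induction steps might be derived from these two tables. For each production and each induction hypothesis attached to the non-terminal it constructs, we instantiate that hypothesis at the constructed phrase to obtain the goal, and instantiate the hypotheses at the sub-phrases to obtain the additional rewrite rules available in that step.
\begin{verbatim}
;;; A sketch, not the prover's code: deriving the induction steps from
;;; the abstract-syntax and induction-hypotheses tables.  subst-sym and
;;; number-args are helpers assumed for this sketch.

(define subst-sym                  ; substitute an s-expression for a symbol
  (lambda (new old sexp)
    (cond ((eq? sexp old) new)
          ((pair? sexp) (cons (subst-sym new old (car sexp))
                              (subst-sym new old (cdr sexp))))
          (else sexp))))

(define number-args                ; (exp exp) -> (exp1 exp2)
  (lambda (nts)
    (let loop ((nts nts) (i 1))
      (if (null? nts)
          '()
          (cons (string->symbol
                  (string-append (symbol->string (car nts))
                                 (number->string i)))
                (loop (cdr nts) (+ i 1)))))))

(define generate-induction-steps
  (lambda (abstract-syntax induction-hypotheses)
    (define hyps-at                ; hypotheses of nt, instantiated at term
      (lambda (nt term)
        (let ((entry (assq nt induction-hypotheses)))
          (if entry
              (map (lambda (hyp) (subst-sym term (car (cadr entry)) hyp))
                   (cddr entry))
              '()))))
    (apply append
      (map (lambda (production)
             (let* ((constructor (car production))
                    (sub-nts (cadr production))
                    (result-nt (caddr production))
                    (sub-vars (number-args sub-nts))
                    (phrase `((quote ,constructor) ,@sub-vars)))
               (map (lambda (goal)
                      (list 'prove goal
                            'assuming
                            (apply append (map hyps-at sub-nts sub-vars))))
                    (hyps-at result-nt phrase))))
           abstract-syntax))))
\end{verbatim}
Applied to the specifications in Figure~\ref{fig:ih-generator}, such a generator would yield the 5 induction steps mentioned above, each paired with the induction hypotheses that become available as additional rewrite rules in that step.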
\section{Organization of the Prover}
\label{sec:tac-prover}

The prover is a straightforward tactically-driven theorem-prover, using {\em \a-matching} to determine the applicability of all rules other than \b-reduction. The basic operations on tactics are shown in Figure~\ref{fig:tac-ops}. A tactic is implemented as a Scheme procedure that takes two arguments: a success continuation and a failure continuation. The success continuation is a map from terms to answers, and the failure continuation is a thunk that returns an answer. Most operations involve {\tt tacfn}'s, which are maps from terms to tactics. {\tt succeed} is a tacfn that always succeeds and sends its argument to its success continuation. {\tt (fail)} produces a tactic that always fails. {\tt (and-then tac1 tacfn1)} applies {\tt tac1}; if it succeeds, its value is sent to {\tt tacfn1}. {\tt or-else} establishes a choice point. {\tt or-else-use*} tries a list of tacfns in sequence. {\tt repeat} applies its argument (a tacfn) until it fails, and then succeeds. The definitions are familiar and are shown in Figure~\ref{fig:tac-code}.

A more interesting operation on tactics is {\tt leftmost}. Given a tacfn and a term, it finds the leftmost place in the term where the tacfn applies. The definition, using our operations on tactics, is shown in Figure~\ref{fig:leftmost}.

\begin{figure}
\small
\begin{verbatim}
;;; Basic types

tac          = success-cont * fail-cont -> ans
success-cont = term * fail-cont -> ans
fail-cont    = () -> ans
tacfn        = term -> tac

;;; Types of basic operations on tactics

succeed      : term -> tac
fail         : () -> tac
and-then     : tac * (term->tac) -> tac
or-else      : tac * tac -> tac
repeat       : tacfn -> tacfn
or-else-use* : tacfn* -> tacfn
\end{verbatim}
\caption{Basic operations on tactics}
\label{fig:tac-ops}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define succeed                    ; val -> tac
  (lambda (v)
    (lambda (succ fail)
      (succ v fail))))

(define fail                       ; () -> tac
  (lambda ()
    (lambda (succ fail)
      (fail))))

(define and-then                   ; tac * (val->tac) -> tac
  (lambda (tac tacfn)
    (lambda (succ fail)
      (tac (lambda (v backtrack) ((tacfn v) succ backtrack))
           fail))))

(define or-else                    ; tac * tac -> tac
  (lambda (t1 t2)
    (lambda (succ fail)
      (t1 succ (lambda () (t2 succ fail))))))

(define or-else-use*               ; (val->tac)* -> (val->tac)
  (lambda l
    (let or-else-use*-loop ((l l))
      (if (null? l)
          (lambda (v) (fail))
          (lambda (v)
            (or-else ((car l) v)
                     ((or-else-use*-loop (cdr l)) v)))))))

(define repeat                     ; (val->tac) -> (val->tac)
  (lambda (tacfn)
    ;; keep doing (tacfn v) until it fails
    (lambda (v)
      (or-else (and-then (tacfn v) (repeat tacfn))
               (succeed v)))))
\end{verbatim}
\caption{Definitions of basic operations on tactics}
\label{fig:tac-code}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define leftmost                   ; (term->tac) -> (term->tac)
  (lambda (tacfn)
    (lambda (m)
      ;; find leftmost place in m where tac applies, and apply it,
      ;; rebuilding the term around the result
      (let leftmost-loop ((m m))
        (or-else (tacfn m)
          (record-case m
            (app (rator rand)
              (or-else
                (and-then (leftmost-loop rator)
                  (lambda (rator1) (succeed (make-app rator1 rand))))
                (and-then (leftmost-loop rand)
                  (lambda (rand1) (succeed (make-app rator rand1))))))
            (abs (bv body)
              (and-then (leftmost-loop body)
                (lambda (body) (succeed (make-abs bv body)))))
            (else (fail))))))))
\end{verbatim}
\caption{Definition of {\tt leftmost}}
\label{fig:leftmost}
\end{figure}
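One combinator used in Figure~\ref{fig:input2} but not shown here is {\tt report-tac}, which labels each successful rewriting step in the trace. A plausible definition in terms of these combinators (our sketch, assuming an {\tt unparse} routine that inverts {\tt parse}) is the following:
\begin{verbatim}
;;; A sketch, not the prover's code: a plausible definition of
;;; report-tac, which is used by proof-tac but not shown.
;;; unparse is assumed to invert parse.

(define report-tac                 ; string * tacfn -> tacfn
  (lambda (name tacfn)
    (lambda (m)
      (and-then (tacfn m)
        (lambda (m1)
          (printf "~%[by ~a]:~%" name)
          (pretty-print (unparse m1))
          (succeed m1))))))
\end{verbatim}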
The main problem in building the theorem-prover was handling the user-defined rules, which serve as the additional hypotheses $\Gamma$. In most modern higher-order theorem-provers (e.g., Isabelle \cite{Paulson89}), the applicability of a rule to a given term is determined by some powerful mechanism like higher-order unification. Humans, on the other hand, typically use much weaker strategies to determine if a rule is applicable. In the hand-generated proof above, for example, a human can easily check each step, because at every step some subterm of the term being manipulated is \a-congruent (that is, identical up to a change of bound variables) to some instance of the cited rule. We therefore fixed on the problem of {\em {\a-matching}}, which is defined as follows:

\begin{definition}[\a-matching]
Given two \l-terms, a pattern $P$ and a subject $S$, find the most general substitution \s\ such that $P\s$ and $S$ are \a-congruent.
\end{definition}

The algorithm for this problem is a fairly straightforward adaptation of a conventional unification algorithm, keeping careful track of bound variables so that no substitution is ever made on a bound variable. The main function of the algorithm is \fun{match}. \fun{match} takes four arguments: two \l-terms, $P$ (the pattern) and $S$ (the subject), and two lists of identifiers, $\Sigma_1$ and $\Sigma_2$. The output of \fun{match} is the most general substitution $\theta$ such that $(\l ~ \Sigma_1 . P)\theta =_\a (\l ~ \Sigma_2 . S)$. We can give the algorithm in pseudocode for the various cases on $P$ and $S$. For each case, we give a brief explanation.

\begin{itemize}
\item If $P$ and $S$ are the same constant, succeed with the empty substitution; if they are different constants, then fail.

\item If $P$ and $S$ are both variables, check to see if they occur in the same position in $\Sigma_1$ and $\Sigma_2$. If they do, then succeed with the empty substitution. If they are bound in different positions, or if one is free and the other is bound, then fail---the terms cannot be matched. If both are free, return the substitution that substitutes one of the variables for the other.

\item If $P$ is a variable $v$ and $S$ is not, then check that $v$ is not bound in $\Sigma_1$ and that none of the variables of $S$ are bound in $\Sigma_2$. If either check fails, then fail---no substitution is made on a bound variable, and no substitution in $P$ can have a variable captured. Otherwise, succeed with the substitution mapping the variable $v$ to $S$. [Since the substitution is applied only to the pattern, no occurs check is needed for matching.]

\item If $S$ is a variable and $P$ is not, then fail---substitutions are allowed only in the pattern. [Note that this is the only rule that would need to be changed to do unification rather than matching.]

\item If $P$ is an application $P_1 P_2$ and $S$ is an application $S_1 S_2$, let $\theta = \fun{match} (P_1, S_1, \Sigma_1, \Sigma_2)$. Then let $\theta' = \fun{match} (P_2\theta, S_2, \Sigma_1, \Sigma_2)$. If both these steps succeed, return $\theta \circ \theta'$. [We recur on applications in the usual way.]

\item If $P$ is an abstraction $\l p. P_1$ and $S$ is an abstraction $\l s. S_1$, then compute $\fun{match}(P_1, S_1, \Sigma_1:p, \Sigma_2:s)$. [We add the bound variables to the bound-variable lists, and recur on the bodies.]

\item Otherwise fail: $P$ and $S$ are in different syntactic categories.
\end{itemize}

It is easy to see, by adapting the usual proof for unification, that this function satisfies its specification. The code for this algorithm is shown in Figures~\ref{fig:match}--\ref{fig:match2}. It is coded as a procedure that takes four arguments and returns a tactic. If such a $\theta$ exists, it is sent to the success continuation; otherwise the tactic fails.

\begin{figure}
\small
\begin{verbatim}
(define match
  (lambda (pat subj s1 s2)
    (cond
      ((and (const? pat) (const? subj))
       (if (equal? (const->datum pat) (const->datum subj))
           (succeed (make-empty-subst))
           (fail)))
      ((and (var? pat) (var? subj))
       (match-two-vars pat subj s1 s2))
      ((var? pat)
       (match-one-var pat subj s1 s2))
      ((var? subj)                 ; if subj is var and pat
                                   ; isn't, then fail
       (fail))
      ((and (app? pat) (app? subj))
       (and-then (match (app->rator pat) (app->rator subj) s1 s2)
         (lambda (theta)
           (and-then (match (subst (app->rand pat) theta)
                            (app->rand subj)
                            s1 s2)
             (lambda (theta2)
               (succeed (comp-subst theta theta2)))))))
      ((and (abs? pat) (abs? subj))
       (match (abs->body pat)
              (abs->body subj)
              (cons (abs->bv pat) s1)
              (cons (abs->bv subj) s2)))
      (else (fail)))))
\end{verbatim}
\caption{Code for \a-matching}
\label{fig:match}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define match-two-vars
  (lambda (v10 v20 s10 s20)        ; v1 is pattern variable
    (let ((v1 (var->sym v10)) (v2 (var->sym v20)))
      (let loop ((s1 s10) (s2 s20))
        (cond
          ((and (null? s1) (null? s2))
           (succeed (make-unit-subst v1 v20)))
          ((or (null? s1) (null? s2))
           ;; don't fail here, raise an error immediately
           (error 'match-two-vars
             "bound-variable lists had different lengths: ~s and ~s"
             s10 s20))
          ((and (eq? v1 (car s1)) (eq? v2 (car s2)))
           (succeed (make-empty-subst)))
          ((or (eq? v1 (car s1)) (eq? v2 (car s2)))
           (fail))
          (else (loop (cdr s1) (cdr s2))))))))

(define match-one-var
  (lambda (vm1 subj s1 s2)
    (let ((v1 (var->sym vm1)))
      (if (and (not (memq v1 s1))
               (not (non-empty-intersection? s2 (fv subj))))
          (succeed (make-unit-subst v1 subj))
          (fail)))))
\end{verbatim}
\caption{Code for \a-matching, continued}
\label{fig:match2}
\end{figure}
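Figure~\ref{fig:input2} turns each list of rules into a tacfn with {\tt rules->tac}, whose definition is not shown above. The following sketch (our reconstruction, not the system's code; the helper {\tt rule->tacfn} is ours) shows one plausible definition in terms of \fun{match}, {\tt leftmost}, and the tactic combinators: each rule \a-matches its left-hand side against a candidate term and, on success, instantiates its right-hand side with the resulting substitution.
\begin{verbatim}
;;; A sketch, not the prover's code: a plausible definition of
;;; rules->tac, which is used by proof-tac but not shown.
;;; rule->tacfn is our helper name.

(define rule->tacfn                ; rule -> tacfn
  (lambda (rule)
    (let ((lhs (parse (car rule)))     ; each rule is a list (lhs rhs)
          (rhs (parse (cadr rule))))
      (lambda (m)
        (and-then (match lhs m '() '())
          (lambda (theta)
            (succeed (subst rhs theta))))))))

(define rules->tac                 ; rule* -> tacfn
  (lambda (rules)
    (leftmost (apply or-else-use* (map rule->tacfn rules)))))
\end{verbatim}
Wrapping the disjunction of the rules in {\tt leftmost} applies a rule at the leftmost subterm where one matches, which is the behavior visible in the traces of Figures~\ref{fig:output} and \ref{fig:output2}.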
The resulting system is very small: about 800 lines of Scheme, including comments and test cases. Because it searches only to find rewrite sites (as in {\tt leftmost}), it operates quickly: generation of a proof like the one in Figures~\ref{fig:output}--\ref{fig:output2} requires about 3 seconds on our Sparc-10 processor. Because the algorithms were tightly determined by the mathematics and we used some simple coding standards for data structures, the system has been remarkably reliable: only two errors have been discovered since the system passed its initial tests.

\section{Experience with the Prover}
\label{sec:experiences}

We have exercised the theorem-prover on a variety of examples of the sort sketched above. Our examples have included conditional expressions, closure creation and procedure calls with lexical addressing and multiple arguments, and correct compilation of tail-recursive code \cite{Clinger84,OlivaRamsdellWand94}.

Another set of examples proved by the theorem-prover arises from checking the operational semantics of the target machine. In our compiler-correctness framework, we require that the \l-term representing the meaning of the target code have reduction behavior that closely mimics the behavior of typical machine instructions. This means that the target-language instructions are easily implementable by translation to single instructions or short sequences of instructions on conventional machines. For the compilers we have considered, requiring the meaning of each target-machine instruction to be a tail-form supercombinator {\cite{FriedmanEtAl92,Hughes82}} turns out to be a reasonable choice.

By examining this reduction behavior, we can obtain an operational semantics for the target language. The soundness of this operational semantics must then be proved. For example, the operational semantics of the addition instruction might be given by
\[ \mach{(\ins{add} t)}{u}{\push{\int{v_2}}{\push{\int{v_1}}{z}}} \To \mach{t}{u}{\push{\int{(v_1+v_2)}}{z}} \]
We would then have to prove that the denotations of the left and right sides of this rule are the same. Our theorem-prover has been able to prove such soundness results for the examples cited above.

The theorem-prover's main current limitations stem from its dependence on something like strong normalization and Church-Rosser properties of the rules we have used. Adding rules that can be applied in both directions will lead very quickly to infinite loops. We hope to explore more complex tactics as a way around some of these difficulties.

\section{Related Work}
\label{sec:related-work}

Our initial experiments were performed in Isabelle \cite{Bolek91}. Isabelle was chosen because it seemed suited to this kind of higher-order equational reasoning, and it allowed for the introduction of new theories to manage the induction hypotheses. The results of the experiment were unsatisfactory, in part because the transcription of our problem into Isabelle was less transparent than we would have liked, and in part because it was more difficult than expected to control the search strategy. Looking back at this experience, we now believe that the biggest problem was that the built-in proof strategies in Isabelle were too powerful. Isabelle used higher-order unification to locate possible applications of a rule. Humans, on the other hand, typically use much weaker strategies to determine if a rule is applicable. This realization led us to base our system on \a-matching.

Another system we explored was Elf \cite{Pfenning91}. Elf is a powerful system for finding proofs in the LF logical framework \cite{HarperHonsellPlotkin93}. Hannan has encoded in Elf the correctness of a compiler from the \l-calculus to code for the categorical abstract machine, using the operational semantics of call-by-value as a specification of the source language \cite{Hannan92}. Although the encodings are fairly readable once one understands the principles involved, the translation still requires some effort.

While there has been considerable work on rewriting systems and unification for higher-order terms, most of it has considered unification or matching up to \b-convertibility rather than \a-congruence \cite{NadathurMiller88,Nipkow91}. Most of this work is also restricted to typed terms, a restriction that is unnecessary for us. The work closest to ours that we are aware of is the v-unification of Talcott \cite{Talcott93}. Talcott gives an algorithm for deciding \a-unification for a class of binding structures that generalize conventional \l-terms with contexts and holes. We conjecture that her algorithm specializes to ours when restricted to the class of terms we consider.

It should also be noted that the operational proofs in \cite{OlivaRamsdellWand94,WandOliva92} are much more first-order in flavor than the proofs considered here, and a powerful first-order theorem-prover like Boyer and Moore's Nqthm \cite{BoyerMoore79} may be more appropriate for these proofs. We have not yet investigated this issue.

Because successful theorem-provers and proof-checkers are large, complicated systems, we long resisted the temptation to build our own. We turned to construction only when we realized that we could take advantage of the extra structure in our problem to build a small system tuned to our needs.

\section{Current Work}
\label{sec:current-work}

Two more developments that would be useful are a type checker and a case-analysis checker. A type checker would be a useful consistency checker for proposed rules.
A case-analysis checker would allow the user to specify a case analysis of some set of terms and would check to see that the case analysis is exhaustive. This tool would be particularly useful for the soundness proofs, where coming up with a case analysis is the most error-prone part of the proof.

Another possible development is the application of the theorem-prover to other kinds of inductions over \l-terms, such as those typically found in fixed-point reasoning or in reasoning about sequences.

\section{Conclusions}
\label{sec:conclusions}

We have built a small theorem-prover to do higher-order equational reasoning in the presence of induction hypotheses and other general rules. We were pleasantly surprised to find that such a simple strategy sufficed to do large portions of denotationally-based compiler correctness proofs.

\section*{Acknowledgements}

Bolek Ciesielski struggled heroically with the Isabelle metalogic and with various systems problems to produce \cite{Bolek91}. Conversations with Tobias Nipkow helped us isolate \a-matching as the key problem.

\bibliographystyle{plain}
\bibliography{refs}
\appendix
\end{document}