\documentstyle[11pt,mysecs]{article}
\source{/home/wand/unif/lfp/abstract.tex}
\input{macros-for-abstract}
\title{A Little Goes a Long Way:\\
A Simple Tool to Support Denotational Compiler-Correctness Proofs}
\support{Work supported by the National Science Foundation under grant numbers CCR-9014603 and CCR-9304144.}
% \draftfalse
\begin{document}
\maketitle

\begin{abstract}
In a series of papers in the early 80's we proposed a paradigm for semantics-based compiler correctness. In this paradigm, the source and target languages are given denotational semantics in the same \l-theory, so correctness proofs can be carried out within this theory. In many cases, the proofs have a highly structured form. We show how a simple proof strategy, based on an algorithm for \a-matching, can be used to build a tool that can automate the routine cases of these proofs.
\end{abstract}

\section{Introduction}

In a series of papers in the early 80's \cite{Wand82b,Wand82c,Wand83b,Clinger84} we proposed a paradigm for semantics-based compiler correctness, and over the last several years we have begun putting this paradigm into practice {\cite{WandOliva92,OlivaRamsdellWand94}}. In this paradigm, the source and target languages are given denotational semantics in the same \l-theory, so that most of the steps of the correctness proof can be done within the theory. Furthermore, the compiler is given by a compositional (syntax-directed) translation, so properties of the compiler can be proved by structural induction on the grammar of the source language.

If the semantics of programs in the source language is given by $\sem{P}$, the semantics of the target language is given by $\sem{T}$, and the compiler on programs is given by $\sem{CP}$, then the correctness of the compiler is given by the requirement that for all programs $p$,
\begin{equation} \label{eqn:compiler-correctness}
\sembrack{T}{\sembrack{CP}{p}} = \sembrack{P}{p}
\end{equation}
%
where equality is taken in the semantic theory.\footnote{For this we can use any convenient \l-theory. For ``conventional'' semantics we could use the theory of some model of the \l-calculus \cite{OlivaRamsdellWand94}, or the theory of $\beta$-conversion \cite{Wand82b}, or the theory of $\beta$-conversion plus some fixed-point rules \cite{Wand83b}.}

Such an equation is proven by structural induction on source programs; this typically involves an induction hypothesis for each non-terminal of the source language. The reasoning is higher-order equational reasoning in a suitable \l-theory \cite{Meyer82}, plus the use of the induction hypotheses. A number of small-to-moderate sized compiler correctness proofs have been done in this way \cite{Clinger84,GuttmanSwarupRamsdell93,OlivaRamsdellWand94,Wand82b,Wand82c,Wand83b,WandOliva92}.

As proofs of congruence and soundness scale up, machine support becomes more and more important. Ideally, one would like all the simple cases to be checked by machine, perhaps leaving a few subtle cases to be done by hand. We have now built a theorem-prover capable of proving many such theorems. The key idea in the system is the use of a very simple rule-matching technique, which we call {\em {\a-matching}}, in place of more complicated techniques such as higher-order unification.

In this paper, we will describe the system and its use. Section~\ref{sec:hand-example} shows a typical manual proof step, and Section~\ref{sec:mech-proof} shows how this proof is expressed in our system.
Section~\ref{sec:gen-induction-steps} shows how the induction hypotheses and goals can be generated directly from the grammar of the source language. Section~\ref{sec:tac-prover} describes the internal organization of the system. Section~\ref{sec:experiences} describes our experience with the system, including what it can and cannot do currently. Section~\ref{sec:current-work} discusses extensions to the system. Sections~\ref{sec:related-work} and \ref{sec:conclusions} discuss related work and conclusions.

\section{Example of a Manual Proof}
\label{sec:hand-example}

A theorem like equation~\ref{eqn:compiler-correctness} is proven by structural induction on source programs; this typically involves an induction hypothesis for each non-terminal of the source language. For example, the induction hypothesis for expressions might be
$$\T{\CE{e}t}\rho\zeta = \E{e}\rho (\l v. \T{t} \r \push{v}{\zeta})$$
This induction hypothesis says that if the compilation of $e$ followed by $t$ is run on environment $\r$ and stack $\z$, the result is the same as that of running $e$ in environment $\r$ with a continuation that runs $t$ on environment \r\ and the stack obtained by pushing the result $v$ of $e$ onto \z.\footnote{The reader familiar with \cite{Clinger84} will note some differences. We have omitted the separation of the environment into compile-time and run-time components; our system can handle this as well. Also, we have eliminated both Clinger's accumulator $\epsilon$ and the continuation register $\kappa$. This entails no loss of generality, as these quantities can be cached at the top and bottom of the stack \z.}

Such proofs typically have a highly structured form. As usual for structural inductions, we have one induction step for each production in the language. Let us consider the case of addition expressions. The semantics of an addition expression might be given by
$$\sembrack{E}{e_1 + e_2} = \l \r\k. \sembrack{E}{e_1}\r(\l v_1. \sembrack{E}{e_2}\r(\l v_2. \fun{cps-add} v_1 v_2 \k))$$
This is standard continuation semantics.

A typical production in the target language might be
$$t ::= \ins{add} t$$
representing an \ins{add} instruction followed by additional target code, with semantics specified by
$$\sembrack{T}{\ins{add} t} = \l \r \zeta. \fun{cps-add} (\nth{\zeta}2) (\nth{\zeta}1) (\l v. \sembrack{T}{t} \r \push{v}{(\nthcdr{\zeta}2)})$$
Here \r\ is the environment and $\zeta$ is a stack on which the instruction acts; $\nth{\zeta}i$ denotes the $i$-th element of the stack, and $\nthcdr{\zeta}i$ represents the stack from which the first $i$ elements have been removed. Thus the \ins{add} instruction works by adding the first two elements of the stack, and invoking the following code $t$ on the stack obtained by replacing the first two elements of $\zeta$ by their sum; the environment register is left unchanged.

The compiler on addition expressions might be given by
$$\CE{e_1+e_2}t = \CE{e_1}(\CE{e_2}(\ins{add} t))$$
Here we write \sem{CE} rather than \sem{CP} since we are compiling an expression rather than a program. The equation says that to compile $e_1 + e_2$ (followed by target code $t$), compile $e_1$, followed by the compilation of $e_2$, followed by an \ins{add} instruction, followed by $t$. This is a reasonable compilation strategy for a stack machine.

To show that the induction hypothesis holds for every source language expression $e$, we will use structural induction \cite{Burstall69}. We will have one induction step for each production in the grammar.
For the production for addition expressions, we will want to show that if the induction hypothesis is true at expressions $e_1$ and $e_2$, then it is true at the expression $e_1 + e_2$ as well. To do this, we calculate as follows:
\begin{displaymath}
\begin{array}{l}
%% I wish I knew why we need the \relax on the next line.
\relax \T{\CE{e_1+ e_2}t}\rho \zeta \\
= \just{definition of \sem{CE}} \\
\T{\CE{e_1}(\CE{e_2}(\ins{add} t))}\rho \zeta \\
= \just{induction hypothesis at $e_1$} \\
\E{e_1}\rho(\l v_1 . \T{\CE{e_2}(\ins{add} t)}\r \push{v_1}{\zeta}) \\
= \just{induction hypothesis at $e_2$} \\
\E{e_1}\rho(\l v_1 . \E{e_2}\rho (\l v_2 . \T{\ins{add} t}\r \push{v_2}{\push{v_1}{\zeta}})) \\
= \just{definition of \sem{T}, properties of list functions} \\
\E{e_1}\rho(\l v_1 . \E{e_2}\rho (\l v_2 . \fun{cps-add} v_1 v_2 (\l v. \T{t} \rho \push{v}{\zeta}))) \\
= \just{definition of \sem{E}} \\
\E{e_1+e_2}\rho (\l v. \T{t} \rho \push{v}{\zeta}) \\
\end{array}
\end{displaymath}
as desired. Here we have labelled each equality with its justification.

In doing this proof, we used only ordinary \b-conversion, plus the knowledge of the induction hypotheses. We used the fact that if $M = N$ is provable in the \l-calculus, so is any substitution instance of $M = N$. This allowed us to use a complex substitution for $\zeta$ when we applied the induction hypothesis at $e_2$. The rules of inference are shown in Figure~\ref{fig:lambda}. It is easy to show that these rules are a conservative extension of the ordinary \l-calculus (or of any \l-theory) \cite{WandWang90}.

\begin{figure}
$$\G \proves (\l x. M) N = M[N/x]$$
$$ (M = N) \in \G \over \G \proves M\s = N\s $$
$$\G \proves M = M$$
$$\G \proves M = N \over \G \proves N = M$$
$$\G \proves M = N \quad \G \proves N = P \over \G \proves M = P $$
$$\G \proves M = M' \quad \G \proves N = N' \over \G \proves MN = M'N' $$
\caption{Rules for \l-calculus with additional hypotheses}
\label{fig:lambda}
\end{figure}

This proof has a very definite structure: one reduces the left-hand term, using ordinary $\beta$-reduction and the induction hypotheses, and similarly reduces the right-hand term, using $\beta$-reduction and the definition of the original semantics.

\section{A Mechanized Proof}
\label{sec:mech-proof}

Our theorem-prover was able to automatically verify this example and others like it. The input to the theorem-prover is shown in Figures~\ref{fig:input} and \ref{fig:input2} and the output is shown in Figures~\ref{fig:output}--\ref{fig:output2}.

Figure~\ref{fig:input} shows four lists of rules. The first three lists give the definitions of the relevant portions of the compiler, the target-language semantics, and the source-language semantics. Here we show only the rules relevant to addition expressions: one rule in each list. The last list, {\tt list-rules}, gives the relevant rules for evaluation of operations on lists. In these rules, constants are represented by quoted symbols ({\tt {'compiler}}, {\tt 'sum}, etc.); other symbols ({\tt e1}, {\tt e2}, etc.) are variables.

Since our system is tuned for dealing with semantics, all applications and abstractions are internally curried. The unparsing routine automatically merges consecutive {\tt lambda}'s and omits unnecessary parentheses, as shown in Figures~\ref{fig:output}--\ref{fig:output2}. We use explicit bound variables rather than de~Bruijn indices in the internal representation; this made the algorithms clearer.

\begin{figure}[tp]
\small
\begin{verbatim}
;;; rules for compiler, machine, evaluator, etc.
(define compiler-rules
  '((('compiler ('sum e1 e2) t)
     ('compiler e1 ('compiler e2 ('add-inst t))))))

(define machine-denotation-rules
  '((('target-semantics ('add-inst t) r z)
     ('add-cps ('cadr z) ('car z)
       (lambda (v)
         ('target-semantics t r ('cons v ('cddr z))))))))

(define source-semantics-rules
  '((('source-meaning ('sum e1 e2) r k)
     ('source-meaning e1 r
       (lambda (v1)
         ('source-meaning e2 r
           (lambda (v2)
             ('add-cps v1 v2 k))))))))

(define list-rules
  '((('car ('cons a d)) a)
    (('cadr ('cons a1 ('cons a2 dd))) a2)
    (('cddr ('cons a1 ('cons a2 dd))) dd)))
\end{verbatim}
\caption{Rules for the addition example}
\label{fig:input}
\end{figure}

Figure~\ref{fig:input2} shows the tactics used for this example. The procedure {\tt ih} constructs an induction hypothesis for its argument {\tt pgm}. {\tt proof-tac} constructs the main tactic for the proof. This tactic applies beta-reduction, the various rules and the induction hypotheses as rewrite rules as often as possible. Last, the procedure {\tt prove-it} applies these tactics to each side of the desired conclusion and then tests the resulting terms for congruence. Note that there is little here that is dependent on this example; Section~\ref{sec:gen-induction-steps} shows how to go directly from the grammar to an appropriate set of hypotheses, tactics, and goals.

\begin{figure}
\small
\begin{verbatim}
(define ih
  (lambda (pgm)
    (list (lhs pgm) (rhs pgm))))

(define lhs
  (lambda (pgm)
    `('target-semantics ('compiler ,pgm t) r z)))

(define rhs
  (lambda (pgm)
    `('source-meaning ,pgm r
       (lambda (v) (('target-semantics t) r ('cons v z))))))

(define proof-tac
  (lambda ()
    (repeat
      (or-else-use*
        (report-tac "beta" (leftmost beta))
        (report-tac "definition of compiler" (rules->tac compiler-rules))
        (report-tac "definition of T" (rules->tac machine-denotation-rules))
        (report-tac "definition of E" (rules->tac source-semantics-rules))
        (report-tac "list operations" (rules->tac list-rules))
        (report-tac "ih at e1" (rules->tac (list (ih 'e1))))
        (report-tac "ih at e2" (rules->tac (list (ih 'e2))))))))

(define prove-it
  (lambda (pgm)
    (let* ((dummy1 (printf "~%Reducing left-hand side...~%"))
           (d2 (pretty-print (lhs pgm)))
           (lhs (test-tacfn (lhs pgm) (proof-tac)))
           (dummy (printf "~%Reducing right-hand side...~%"))
           (d3 (pretty-print (rhs pgm)))
           (rhs (test-tacfn (rhs pgm) (proof-tac))))
      (printf "~%lhs reduces to:~%")
      (pretty-print lhs)
      (printf "~%rhs reduces to:~%")
      (pretty-print rhs)
      (printf "~%checking for alpha-congruence... ")
      (alpha-congruent? (parse lhs) (parse rhs) '() '()))))
\end{verbatim}
\caption{Tactics for the addition example}
\label{fig:input2}
\end{figure}
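To check the induction step for addition, {\tt prove-it} is applied to the abstract-syntax phrase for that production. The exact invocation is not shown in the figures; presumably it is something like the following, where the constants {\tt 'e1} and {\tt 'e2} stand for the fixed but arbitrary subexpressions of the sum:
\begin{verbatim}
(prove-it '('sum 'e1 'e2))
\end{verbatim}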
Figures~\ref{fig:output} and \ref{fig:output2} show the output from this run. We have edited the output only to improve the pretty-printing and to eliminate one intermediate step of list manipulation (to keep it all on two pages). Notice how closely the steps correspond to the human-generated proof in Section~\ref{sec:hand-example}.

\begin{figure}
\small
\begin{verbatim}
Reducing left-hand side...

('target-semantics ('compiler ('sum 'e1 'e2) t) r z)

[by definition of compiler]:
('target-semantics ('compiler 'e1 ('compiler 'e2 ('add-inst t))) r z)

[by ih at e1]:
('source-meaning 'e1 r
  (lambda (v)
    ('target-semantics ('compiler 'e2 ('add-inst t)) r ('cons v z))))

[by ih at e2]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('target-semantics ('add-inst t) r ('cons v15 ('cons v z)))))))

[by definition of T]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps ('cadr ('cons v15 ('cons v z)))
                  ('car ('cons v15 ('cons v z)))
          (lambda (v16)
            ('target-semantics t r
              ('cons v16 ('cddr ('cons v15 ('cons v z)))))))))))

[by list operations]:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps v v15
          (lambda (v16)
            ('target-semantics t r ('cons v16 z))))))))
\end{verbatim}
\caption{Machine-generated proof for the addition example}
\label{fig:output}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
Reducing right-hand side...

('source-meaning ('sum 'e1 'e2) r
  (lambda (v) (('target-semantics t) r ('cons v z))))

[by definition of E]:
('source-meaning 'e1 r
  (lambda (v1)
    ('source-meaning 'e2 r
      (lambda (v2)
        ('add-cps v1 v2
          (lambda (v)
            ('target-semantics t r ('cons v z))))))))

lhs reduces to:
('source-meaning 'e1 r
  (lambda (v)
    ('source-meaning 'e2 r
      (lambda (v15)
        ('add-cps v v15
          (lambda (v16)
            ('target-semantics t r ('cons v16 z))))))))

rhs reduces to:
('source-meaning 'e1 r
  (lambda (v1)
    ('source-meaning 'e2 r
      (lambda (v2)
        ('add-cps v1 v2
          (lambda (v)
            ('target-semantics t r ('cons v z))))))))

checking for alpha-congruence... #t
\end{verbatim}
\caption{Machine-generated proof for the addition example (part 2)}
\label{fig:output2}
\end{figure}

\section{From Grammars to Proofs}
\label{sec:gen-induction-steps}

We are currently working to enhance the usability of the prover. The first step in this process is to automatically generate the various induction steps for the structural induction. This can be done given a specification of the abstract syntax of the language and a specification of the relevant induction hypotheses to be proved for each non-terminal. A typical such specification is given in Figure~\ref{fig:ih-generator}; here we have specified a language with two non-terminals and three productions, and we have one induction hypothesis for the non-terminal {\tt pgm} and two for the non-terminal {\tt exp}. This would give rise to a total of 5 induction steps to be verified.

\begin{figure}
\small
\begin{verbatim}
(define abstract-syntax
  '((whole-program (exp) pgm)
    (add-exp (exp exp) exp)
    (var-exp (id) exp)))

(define induction-hypotheses
  '((pgm (p)
      (('target-semantics ('compile-pgm p) r z)
       ('source-pgm-semantics p r)))
    (exp (e)
      (('target-semantics ('compile-exp e t) r z)
       ('source-exp-semantics e
         (lambda (v) ('target-semantics t r ('cons v z)))))
      (('target-semantics ('compile-tail-exp e) r k)
       ('source-exp-semantics e r k)))))
\end{verbatim}
\caption{Specification of abstract syntax and induction hypotheses}
\label{fig:ih-generator}
\end{figure}
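The generator is not part of the prover as described here; the following sketch (our code, not the system's, with assumed helpers {\tt subst-sym} and {\tt number-args}) indicates one way the induction steps might be derived from these two tables. For each production and each induction hypothesis attached to the non-terminal it constructs, we instantiate that hypothesis at the constructed phrase to obtain the goal, and instantiate the hypotheses at the sub-phrases to obtain the additional rewrite rules available in that step.
\begin{verbatim}
;;; A sketch, not the prover's code: deriving the induction steps from
;;; the abstract-syntax and induction-hypotheses tables.  subst-sym and
;;; number-args are helpers assumed for this sketch.

(define subst-sym                  ; substitute an s-expression for a symbol
  (lambda (new old sexp)
    (cond ((eq? sexp old) new)
          ((pair? sexp) (cons (subst-sym new old (car sexp))
                              (subst-sym new old (cdr sexp))))
          (else sexp))))

(define number-args                ; (exp exp) -> (exp1 exp2)
  (lambda (nts)
    (let loop ((nts nts) (i 1))
      (if (null? nts)
          '()
          (cons (string->symbol
                  (string-append (symbol->string (car nts))
                                 (number->string i)))
                (loop (cdr nts) (+ i 1)))))))

(define generate-induction-steps
  (lambda (abstract-syntax induction-hypotheses)
    (define hyps-at                ; hypotheses of nt, instantiated at term
      (lambda (nt term)
        (let ((entry (assq nt induction-hypotheses)))
          (if entry
              (map (lambda (hyp) (subst-sym term (car (cadr entry)) hyp))
                   (cddr entry))
              '()))))
    (apply append
      (map (lambda (production)
             (let* ((constructor (car production))
                    (sub-nts (cadr production))
                    (result-nt (caddr production))
                    (sub-vars (number-args sub-nts))
                    (phrase `((quote ,constructor) ,@sub-vars)))
               (map (lambda (goal)
                      (list 'prove goal
                            'assuming
                            (apply append (map hyps-at sub-nts sub-vars))))
                    (hyps-at result-nt phrase))))
           abstract-syntax))))
\end{verbatim}
Applied to the specifications in Figure~\ref{fig:ih-generator}, such a generator would yield the 5 induction steps mentioned above, each paired with the induction hypotheses that become available as additional rewrite rules in that step.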
\section{Organization of the Prover}
\label{sec:tac-prover}

The prover is a straightforward tactically-driven theorem-prover, using {\em \a-matching} to determine the applicability of all rules other than \b-reduction. The basic operations on tactics are shown in Figure~\ref{fig:tac-ops}. A tactic is implemented as a Scheme procedure that takes two arguments: a success continuation and a failure continuation. The success continuation is a map from terms to answers, and the failure continuation is a thunk that returns an answer. Most operations involve {\tt tacfn}'s, which are maps from terms to tactics. {\tt succeed} is a tacfn that always succeeds and sends its argument to its success continuation. {\tt (fail)} produces a tactic that always fails. {\tt (and-then tac1 tacfn1)} applies {\tt tac1}; if it succeeds, its value is sent to {\tt tacfn1}. {\tt or-else} establishes a choice point. {\tt or-else-use*} tries a list of tacfns in sequence. {\tt repeat} applies its argument (a tacfn) until it fails, and then succeeds. The definitions are familiar and are shown in Figure~\ref{fig:tac-code}.

A more interesting operation on tactics is {\tt leftmost}. Given a tacfn and a term, it finds the leftmost place in the term where the tacfn applies. The definition, using our operations on tactics, is shown in Figure~\ref{fig:leftmost}.

\begin{figure}
\small
\begin{verbatim}
;;; Basic types

tac          = success-cont * fail-cont -> ans
success-cont = term * fail-cont -> ans
fail-cont    = () -> ans
tacfn        = term -> tac

;;; Types of basic operations on tactics

succeed      : term -> tac
fail         : () -> tac
and-then     : tac * (term->tac) -> tac
or-else      : tac * tac -> tac
repeat       : tacfn -> tacfn
or-else-use* : tacfn* -> tacfn
\end{verbatim}
\caption{Basic operations on tactics}
\label{fig:tac-ops}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define succeed                    ; val -> tac
  (lambda (v)
    (lambda (succ fail)
      (succ v fail))))

(define fail                       ; () -> tac
  (lambda ()
    (lambda (succ fail)
      (fail))))

(define and-then                   ; tac * (val->tac) -> tac
  (lambda (tac tacfn)
    (lambda (succ fail)
      (tac (lambda (v backtrack) ((tacfn v) succ backtrack))
           fail))))

(define or-else                    ; tac * tac -> tac
  (lambda (t1 t2)
    (lambda (succ fail)
      (t1 succ (lambda () (t2 succ fail))))))

(define or-else-use*               ; (val->tac)* -> (val->tac)
  (lambda l
    (let or-else-use*-loop ((l l))
      (if (null? l)
          (lambda (v) (fail))
          (lambda (v)
            (or-else ((car l) v)
                     ((or-else-use*-loop (cdr l)) v)))))))

(define repeat                     ; (val->tac) -> (val->tac)
  (lambda (tacfn)
    ;; keep doing (tacfn v) until it fails
    (lambda (v)
      (or-else (and-then (tacfn v) (repeat tacfn))
               (succeed v)))))
\end{verbatim}
\caption{Definitions of basic operations on tactics}
\label{fig:tac-code}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define leftmost                   ; (term->tac) -> (term->tac)
  (lambda (tacfn)
    (lambda (m)
      ;; find leftmost place in m where tac applies, and apply it,
      ;; rebuilding the term around the result
      (let leftmost-loop ((m m))
        (or-else (tacfn m)
          (record-case m
            (app (rator rand)
              (or-else
                (and-then (leftmost-loop rator)
                  (lambda (rator1) (succeed (make-app rator1 rand))))
                (and-then (leftmost-loop rand)
                  (lambda (rand1) (succeed (make-app rator rand1))))))
            (abs (bv body)
              (and-then (leftmost-loop body)
                (lambda (body) (succeed (make-abs bv body)))))
            (else (fail))))))))
\end{verbatim}
\caption{Definition of {\tt leftmost}}
\label{fig:leftmost}
\end{figure}
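One combinator used in Figure~\ref{fig:input2} but not shown here is {\tt report-tac}, which labels each successful rewriting step in the trace. A plausible definition in terms of these combinators (our sketch, assuming an {\tt unparse} routine that inverts {\tt parse}) is the following:
\begin{verbatim}
;;; A sketch, not the prover's code: a plausible definition of
;;; report-tac, which is used by proof-tac but not shown.
;;; unparse is assumed to invert parse.

(define report-tac                 ; string * tacfn -> tacfn
  (lambda (name tacfn)
    (lambda (m)
      (and-then (tacfn m)
        (lambda (m1)
          (printf "~%[by ~a]:~%" name)
          (pretty-print (unparse m1))
          (succeed m1))))))
\end{verbatim}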
The main problem in building the theorem-prover was handling the user-defined rules, which serve as the additional hypotheses $\Gamma$. In most modern higher-order theorem-provers (e.g., Isabelle \cite{Paulson89}), the applicability of a rule to a given term is determined by some powerful mechanism like higher-order unification. Humans, on the other hand, typically use much weaker strategies to determine if a rule is applicable. In the hand-generated proof above, for example, a human can easily check each step, because at every step some subterm of the term being manipulated is \a-congruent (that is, identical up to a change of bound variables) to some instance of the cited rule. We therefore fixed on the problem of {\em {\a-matching}}, which is defined as follows:

\begin{definition}[\a-matching]
Given two \l-terms, a pattern $P$ and a subject $S$, find the most general substitution \s\ such that $P\s$ and $S$ are \a-congruent.
\end{definition}

The algorithm for this problem is a fairly straightforward adaptation of a conventional unification algorithm, keeping careful track of bound variables so that no substitution is ever made on a bound variable. The main function of the algorithm is \fun{match}. \fun{match} takes four arguments: two \l-terms, $P$ (the pattern) and $S$ (the subject), and two lists of identifiers, $\Sigma_1$ and $\Sigma_2$. The output of \fun{match} is the most general substitution $\theta$ such that $(\l ~ \Sigma_1 . P)\theta =_\a (\l ~ \Sigma_2 . S)$. We can give the algorithm in pseudocode for the various cases on $P$ and $S$. For each case, we give a brief explanation.

\begin{itemize}
\item If $P$ and $S$ are the same constant, succeed with the empty substitution; if they are different constants, then fail.

\item If $P$ and $S$ are both variables, check to see if they occur in the same position in $\Sigma_1$ and $\Sigma_2$. If they do, then succeed with the empty substitution. If they are bound in different positions, or if one is free and the other is bound, then fail---the terms cannot be matched. If both are free, return the substitution that substitutes one of the variables for the other.

\item If $P$ is a variable $v$ and $S$ is not, then check that $v$ is not bound in $\Sigma_1$ and that none of the variables of $S$ are bound in $\Sigma_2$. If either check fails, then fail---no substitution is made on a bound variable, and no substitution in $P$ can have a variable captured. Otherwise, succeed with the substitution mapping the variable $v$ to $S$. [Since the substitution is applied only to the pattern, no occurs check is needed for matching.]

\item If $S$ is a variable and $P$ is not, then fail---substitutions are allowed only in the pattern. [Note that this is the only rule that would need to be changed to do unification rather than matching.]

\item If $P$ is an application $P_1 P_2$ and $S$ is an application $S_1 S_2$, let $\theta = \fun{match} (P_1, S_1, \Sigma_1, \Sigma_2)$. Then let $\theta' = \fun{match} (P_2\theta, S_2, \Sigma_1, \Sigma_2)$. If both these steps succeed, return $\theta \circ \theta'$. [We recur on applications in the usual way.]

\item If $P$ is an abstraction $\l p. P_1$ and $S$ is an abstraction $\l s. S_1$, then compute $\fun{match}(P_1, S_1, \Sigma_1:p, \Sigma_2:s)$. [We add the bound variables to the bound-variable lists, and recur on the bodies.]

\item Otherwise fail: $P$ and $S$ are in different syntactic categories.
\end{itemize}

It is easy to see, by adapting the usual proof for unification, that this function satisfies its specification. The code for this algorithm is shown in Figures~\ref{fig:match}--\ref{fig:match2}. It is coded as a procedure that takes four arguments and returns a tactic. If such a $\theta$ exists, it is sent to the success continuation; otherwise the tactic fails.

\begin{figure}
\small
\begin{verbatim}
(define match
  (lambda (pat subj s1 s2)
    (cond
      ((and (const? pat) (const? subj))
       (if (equal? (const->datum pat) (const->datum subj))
           (succeed (make-empty-subst))
           (fail)))
      ((and (var? pat) (var? subj))
       (match-two-vars pat subj s1 s2))
      ((var? pat)
       (match-one-var pat subj s1 s2))
      ((var? subj)                 ; if subj is var and pat
                                   ; isn't, then fail
       (fail))
      ((and (app? pat) (app? subj))
       (and-then (match (app->rator pat) (app->rator subj) s1 s2)
         (lambda (theta)
           (and-then (match (subst (app->rand pat) theta)
                            (app->rand subj)
                            s1 s2)
             (lambda (theta2)
               (succeed (comp-subst theta theta2)))))))
      ((and (abs? pat) (abs? subj))
       (match (abs->body pat)
              (abs->body subj)
              (cons (abs->bv pat) s1)
              (cons (abs->bv subj) s2)))
      (else (fail)))))
\end{verbatim}
\caption{Code for \a-matching}
\label{fig:match}
\end{figure}

\begin{figure}
\small
\begin{verbatim}
(define match-two-vars
  (lambda (v10 v20 s10 s20)        ; v1 is pattern variable
    (let ((v1 (var->sym v10)) (v2 (var->sym v20)))
      (let loop ((s1 s10) (s2 s20))
        (cond
          ((and (null? s1) (null? s2))
           (succeed (make-unit-subst v1 v20)))
          ((or (null? s1) (null? s2))
           ;; don't fail here, raise an error immediately
           (error 'match-two-vars
             "bound-variable lists had different lengths: ~s and ~s"
             s10 s20))
          ((and (eq? v1 (car s1)) (eq? v2 (car s2)))
           (succeed (make-empty-subst)))
          ((or (eq? v1 (car s1)) (eq? v2 (car s2)))
           (fail))
          (else (loop (cdr s1) (cdr s2))))))))

(define match-one-var
  (lambda (vm1 subj s1 s2)
    (let ((v1 (var->sym vm1)))
      (if (and (not (memq v1 s1))
               (not (non-empty-intersection? s2 (fv subj))))
          (succeed (make-unit-subst v1 subj))
          (fail)))))
\end{verbatim}
\caption{Code for \a-matching, continued}
\label{fig:match2}
\end{figure}
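Figure~\ref{fig:input2} turns each list of rules into a tacfn with {\tt rules->tac}, whose definition is not shown above. The following sketch (our reconstruction, not the system's code; the helper {\tt rule->tacfn} is ours) shows one plausible definition in terms of \fun{match}, {\tt leftmost}, and the tactic combinators: each rule \a-matches its left-hand side against a candidate term and, on success, instantiates its right-hand side with the resulting substitution.
\begin{verbatim}
;;; A sketch, not the prover's code: a plausible definition of
;;; rules->tac, which is used by proof-tac but not shown.
;;; rule->tacfn is our helper name.

(define rule->tacfn                ; rule -> tacfn
  (lambda (rule)
    (let ((lhs (parse (car rule)))     ; each rule is a list (lhs rhs)
          (rhs (parse (cadr rule))))
      (lambda (m)
        (and-then (match lhs m '() '())
          (lambda (theta)
            (succeed (subst rhs theta))))))))

(define rules->tac                 ; rule* -> tacfn
  (lambda (rules)
    (leftmost (apply or-else-use* (map rule->tacfn rules)))))
\end{verbatim}
Wrapping the disjunction of the rules in {\tt leftmost} applies a rule at the leftmost subterm where one matches, which is the behavior visible in the traces of Figures~\ref{fig:output} and \ref{fig:output2}.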
The resulting system is very small: about 800 lines of Scheme, including comments and test cases. Because it searches only to find rewrite sites (as in {\tt leftmost}), it operates quickly: generation of a proof like the one in Figures~\ref{fig:output}--\ref{fig:output2} requires about 3 seconds on our Sparc-10 processor. Because the algorithms were tightly determined by the mathematics and we used some simple coding standards for data structures, the system has been remarkably reliable: only two errors have been discovered since the system passed its initial tests.

\section{Experience with the Prover}
\label{sec:experiences}

We have exercised the theorem-prover on a variety of examples of the sort sketched above. Our examples have included conditional expressions, closure creation and procedure calls with lexical addressing and multiple arguments, and correct compilation of tail-recursive code \cite{Clinger84,OlivaRamsdellWand94}.

Another set of examples proved by the theorem-prover arises from checking the operational semantics of the target machine. In our compiler-correctness framework, we require that the \l-term representing the meaning of the target code have reduction behavior that closely mimics the behavior of typical machine instructions. This means that the target-language instructions are easily implementable by translation to single instructions or short sequences of instructions on conventional machines. For the compilers we have considered, requiring the meaning of each target-machine instruction to be a tail-form supercombinator {\cite{FriedmanEtAl92,Hughes82}} turns out to be a reasonable choice.

By examining this reduction behavior, we can obtain an operational semantics for the target language. The soundness of this operational semantics must then be proved. For example, the operational semantics of the addition instruction might be given by
\[ \mach{(\ins{add} t)}{u}{\push{\int{v_2}}{\push{\int{v_1}}{z}}} \To \mach{t}{u}{\push{\int{(v_1+v_2)}}{z}} \]
We would then have to prove that the denotations of the left and right sides of this rule are the same. Our theorem-prover has been able to prove such soundness results for the examples cited above.

The theorem-prover's main current limitations stem from its dependence on something like strong normalization and Church-Rosser properties of the rules we have used. Adding rules that can be applied in both directions will lead very quickly to infinite loops. We hope to explore more complex tactics as a way around some of these difficulties.

\section{Related Work}
\label{sec:related-work}

Our initial experiments were performed in Isabelle \cite{Bolek91}. Isabelle was chosen because it seemed suited to this kind of higher-order equational reasoning, and it allowed for the introduction of new theories to manage the induction hypotheses. The results of the experiment were unsatisfactory, in part because the transcription of our problem into Isabelle was less transparent than we would have liked, and in part because it was more difficult than expected to control the search strategy. Looking back at this experience, we now believe that the biggest problem was that the built-in proof strategies in Isabelle were too powerful. Isabelle used higher-order unification to locate possible applications of a rule. Humans, on the other hand, typically use much weaker strategies to determine if a rule is applicable. This realization led us to base our system on \a-matching.

Another system we explored was Elf \cite{Pfenning91}. Elf is a powerful system for finding proofs in the LF logical framework \cite{HarperHonsellPlotkin93}. Hannan has encoded in Elf the correctness of a compiler from the \l-calculus to code for the categorical abstract machine, using the operational semantics of call-by-value as a specification of the source language \cite{Hannan92}. Although the encodings are fairly readable once one understands the principles involved, the translation still requires some effort.

While there has been considerable work on rewriting systems and unification for higher-order terms, most of it has considered unification or matching up to \b-convertibility rather than \a-congruence \cite{NadathurMiller88,Nipkow91}. Most of this work is also restricted to typed terms, a restriction that is unnecessary for us. The work closest to ours that we are aware of is the v-unification of Talcott \cite{Talcott93}. Talcott gives an algorithm for deciding \a-unification for a class of binding structures that generalize conventional \l-terms with contexts and holes. We conjecture that her algorithm specializes to ours when restricted to the class of terms we consider.

It should also be noted that the operational proofs in \cite{OlivaRamsdellWand94,WandOliva92} are much more first-order in flavor than the proofs considered here, and a powerful first-order theorem-prover like Boyer and Moore's Nqthm \cite{BoyerMoore79} may be more appropriate for these proofs. We have not yet investigated this issue.

Because successful theorem-provers and proof-checkers are large, complicated systems, we long resisted the temptation to build our own. We turned to construction only when we realized that we could take advantage of the extra structure in our problem to build a small system tuned to our needs.

\section{Current Work}
\label{sec:current-work}

Two more developments that would be useful are a type checker and a case-analysis checker. A type checker would be a useful consistency checker for proposed rules.
A case-analysis checker would allow the user to specify a case analysis of some set of terms and would check to see that the case analysis is exhaustive. This tool would be particularly useful for the soundness proofs, where coming up with a case analysis is the most error-prone part of the proof.

Another possible development is the application of the theorem-prover to other kinds of inductions over \l-terms, such as those typically found in fixed-point reasoning or in reasoning about sequences.

\section{Conclusions}
\label{sec:conclusions}

We have built a small theorem-prover to do higher-order equational reasoning in the presence of induction hypotheses and other general rules. We were pleasantly surprised to find that such a simple strategy sufficed to do large portions of denotationally-based compiler correctness proofs.

\section*{Acknowledgements}

Bolek Ciesielski struggled heroically with the Isabelle metalogic and with various systems problems to produce \cite{Bolek91}. Conversations with Tobias Nipkow helped us isolate \a-matching as the key problem.

\bibliographystyle{plain}
\bibliography{refs}
\appendix
\end{document}