INDUCTION PRINCIPLES ARE THEOREMS TOO (2/23/2007) ================================================= Let's dredge up one of our simple data definitions: ------- (nil-Loz) nil:LoN L:LoN -------------- (cons-Loz) (cons 'z L):LoN If we want to prove a subset S of Loz is actually equal to Loz: nil:S x:S -> (cons 'z x):S ------------------------------ (SIP-LoN) S = LoN according to the structural induction principle. But who gave the SIP any authority on the matter? Well, it turns out that we can actually prove the SIP: it is a theorem. CORRECTNESS OF SIP ================== The path to showing the correctness of the structural induction principle begins with an inquiry regarding the meaning of the data definition: Loz ::= nil | (cons 'z Loz) We call that a definition of Loz, but `Loz' appears on both sides of the equation. Why is okay to define Loz in terms of itself? The answer is simple: the data definition is okay because we can transform it into an equivalent one that does not exhibit any self-reference. We begin by defining a so-called generating function L: L : Set[Loz] -> Set[Loz] L[S] = {nil} U {(cons 'z L) | L:S} Rougly speaking, we obtain L by parameterizing the data definition over a set S. Then we eliminate the self-reference by replacing it with S. How does L behave? Let's look at some examples: L[{}] = {nil} L(L[{}]) = {nil, (cons 'z nil)} L(L(L[{}])) = {nil, (cons 'z nil), (cons 'z (cons 'z nil))} ... Notice that {} : Set[Loz] and that since L has the same domain and range we can apply L again to the output of another call to L. We'll make a lot of use of this, so let's define a notation for repeated applications of a function: Define F^(k)[S] ("apply F k times starting at S") F^(0)[S] = S F^(n+1)[S] = F(F^(n)[S]) i.e., F^(1)[S] = F[S] F^(2)[S] = F(F[S]) F^(3)[S] = F(F(F[S])) ... For example, L^(0)[{}] = {} L^(1)[{}] = L[{}] = {nil} L^(2)[{}] = {nil, (cons 'z nil)} L^(3)[{}] = {nil, (cons 'z nil), (cons 'z (cons 'z nil))} L^(4)[{}] = {nil, (cons 'z nil), (cons 'z (cons 'z nil)), (cons 'z(cons 'z (cons 'z nil)))} ... It looks like each time we apply L to the output of another call to L, we wind up with bigger and bigger subsets of Loz. Suppose, hypothetically, that we compute the whole list above. That leaves us with a sequence of subsets of Loz. Let's take the union of all of these subsets and call it D: D = (Union k:N L^(k)[{}]) I claim that D contains every instance of Loz. Further, there is no self-reference involved in the definition of D. How can we be sure that D is not missing any Loz's? Well, if it were missing any, then we could just apply L to D to get a larger set of Loz's. But, if we look at L(D) we see that it is actually equal to D. In other words, D is "full". Theorem ["D is full of L"]: L(D) = D. Proof: We must show that D and L(D) are subsets of each other. (1) D <= L(D): Choose any x:D. Then it must have been that x is in L^k[{}] for some k:N. We have that L^k({}) = L(L^(k-1)({})) = {nil} U { (cons 'z y) | y : L^(k-1)({}) } We see that x must either be nil or (cons 'z y) for y in L^(k-1)({}). If x = nil, then obviously x:L(D). Now suppose x = (cons 'z y) for some y in L^(k-1)({}). But L^(k-1)({}) <= D because it's part of the big union, and so y:D. Thus x = (cons 'z y) : L(D). (2) L(D) <= D: ("D is L-closed") Must show {nil} U {(cons 'z l) | l:D} <= D. Obviously nil:D. Take any l:D. Then l:L^k({}) for some k. Therefore (cons 'z l):L^(k+1)({}). But L^(k+1)[{}] <= D and so (cons 'z l):D. Q.E.D. [Note that I slipped a quick definition into case (2): _S is F-closed_ iff F(S) <= S.] That shows that D is not missing any Loz's. But that big union puts a lot of stuff in D; did any extra junk manage to leak in? We must check that D doesn't contain more than Loz's. We will do this by showing that D is the smallest L-closed set. We need a quick definition and lemma first: Def'n: We say that _F is monotonic_ iff: if X <= Y, then F(X) <= F(Y). Lemma: L is monotonic. Proof: Suppose X <= Y. L(X) = {nil} U {(cons 'z l) | l:X} L(Y) = {nil} U {(cons 'z l) | l:Y} Must show L(X) <= L(Y). Clearly nil:L(Y). Take (cons 'z l):L(X). Since l:X, l:Y and so (cons 'z l):L(Y). Q.E.D. Theorem: D is the smallest L-closed set. Proof: Suppose E is any other L-closed set [ L(E) <= E ]. We can show by induction on n that all n:N. L^n({}) <= E. Base: L^0({}) = {} <= E. Induction step: Assume L^k({}) <= E. L^(k+1)({}) = L(L^k({})). Using the induction hypothesis and monotonicity of L, we have the result. Then, D = (Union n:N L^n({})) <= E because all the components of the union are contained in E. Q.E.D. I call these last two theorems the Goldilocks theorems: D is not too small, and D is not too big. It's just right. Now that we have expressed the data definition of Loz rigorously without using self-reference, we can turn to the correctness of the structural induction principle: Theorem [SIP for D]: Suppose S <= D and S is L-closed. Then S = D. Proof: Since D is the smallest L-closed set, D <= S. The other inclusion is given, and so S = D. Q.E.D. Wait, where's the induction principle that we were expecting? It's hiding in the definition of _S is L-closed_. Suppose we have some S <= D. To show that L(S) <= S, we expand the definition of L: L(S) = {nil} U {(cons 'z l) | l:S} <= S To show that that is contained in S, we simply need - nil:S and - {(cons 'z l) | l:S} <= S. It's easy to see that the latter condition is equivalent to "if l:S, then (cons 'z l):S". That's the end of the story for Loz. The moral is twofold: (1) We can get rid of the self-reference in data definitions using the big-union construction on the generating function. (2) The SIP comes directly from the closure condition on the big-union. It is just another theorem that we can prove after we eliminate the self-references. We could repeat this exercise using other data definitions like LoN, BTN, Tree, BST, etc. The trick is to read the generating function off of the data definition, and then the rest of the development proceeds similarly (using the new generating function in place of L). In fact, we could give one proof that abstracts over the data definition in question to get the result once and for all, but we will forego that development for now.