================================================================
1-1. Main Questions in Semantics
================================================================

Goal: Build _models_ of the behavior of programming languages.  We have
good tools for doing this.

First part of course: familiarity with the tools.  Key is understanding
relationships between different models of the same thing.  Second part:
applications.

Key questions:

    Syntax: What _is_ a program?
                 |
                 |
    Operational Semantics: What does a program compute?
                / \
               /   \
              /     \
    Equational Theories:            Denotational Semantics:
    When are two expressions        What does an expression mean?
    equivalent?

Operational Semantics:

  Big Step:   p => v                  Evaluation Semantics
                                      (often p \Downarrow v)
  Small Step: p1 -> p2 -> ... -> v    Reduction Semantics

  p a program, v a "value" (Winskel calls them "canonical forms")

Would like to show equivalence: p ->* v iff p => v.
Note: these could be non-deterministic.

================================================================
1-2. Case Study: PCF.
================================================================

We'll try to stay close to Gunter, chapter 4.

Syntax:
======

Types: t,u ::= num | bool | t -> u

  num and bool are called "ground types" or "scalar types".
  Write t1 -> t2 -> t3 for t1 -> (t2 -> t3).

Expressions or Terms:

  e ::= c_n (for each non-negative integer n)
      | succ(e) | pred(e)
      | true | false | zero?(e)
      | if e then e else e
      | x | \x:t. e | e e
      | Y(e)

When in doubt, we always interpret grammars as defining a set of finite
trees ("abstract syntax trees").  We'll put in plenty of parens to help
you parse.

Write: e1 e2 e3         for (e1 e2) e3
       \x1:t1,x2:t2.e   for \x1:t1.(\x2:t2.e)

In general, dots extend as far to the right as possible, unless they
obviously don't :)

Typing Rules:
============

Typing Environments H:
---------------------

  H ::= [] | H,x:t

Interpret H as a partial function from variables x to types t by

  H,x:t (x') = t       if x' is the same variable as x
             = H(x')   otherwise

Notice this makes [](x) undefined for all x.

Typing Judgements: H |- e : t
-----------------

  "Under typing assumptions H, the expression e has type t"

[T-const]    H |- c_n : num

[T-Succ]     H |- e : num
             ------------------
             H |- succ(e) : num

[T-Pred]     H |- e : num
             ------------------
             H |- pred(e) : num

[T-True]     H |- true : bool

[T-False]    H |- false : bool

[T-zero?]    H |- e : num
             --------------------
             H |- zero?(e) : bool

[T-if]       H |- e0 : bool   H |- e1 : t   H |- e2 : t
             ------------------------------------------
             H |- if e0 then e1 else e2 : t

[T-var]      H(x) = t
             ----------
             H |- x : t

[T-lambda]   H,x:t1 |- e : t2
             ------------------------
             H |- \x:t1. e : t1 -> t2

[T-app]      H |- e0 : (t1 -> t2)   H |- e1 : t1
             -----------------------------------
             H |- (e0 e1) : t2

[T-fix]      H |- e : (t->t)
             ---------------
             H |- Y(e) : t

{The rules T-var, T-lambda, and T-app are in Table 2.1, rather than in
Table 4.1, in my copy of Gunter.}

We assert "H |- e : t" iff H |- e:t is derivable from these axioms and
rules in a finite number of steps.

If you want to get technical, a typing judgement is just a triple
(H, e, t), and these rules are just inductive rules putting a triple
(H, e, t) in a set "|-".  But we don't usually need this level of
gymnastics.

====
Exercise #1:

Prove: If H |- e:t and H |- e:t', then t = t'.

Write a Scheme procedure

  type-of : H * e -> (maybe t)

which, given an H and an e, returns the unique t such that H |- e:t,
or #f otherwise.
====

There are lots of different ways to set this up.
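Aside, before we continue with the syntax: here is one possible shape of
the type-of procedure from Exercise #1, as a Scheme sketch.  The term and
type representations below (tagged S-expressions, with H as an
association list) are my own choices for illustration; any representation
carrying the same information would do.

    ;; Assumed representations (not fixed by these notes):
    ;;   types:  num | bool | (-> t u)
    ;;   terms:  (const n) | (succ e) | (pred e) | (zero? e) | true | false
    ;;         | (if e0 e1 e2) | x | (lambda (x t) e) | (app e1 e2) | (Y e)
    ;;   H:      an association list ((x . t) ...)

    (define (type-of H e)
      (cond
        ((symbol? e)
         (cond ((memq e '(true false)) 'bool)
               ((assq e H) => cdr)
               (else #f)))
        ((pair? e)
         (case (car e)
           ((const) 'num)
           ((succ pred) (and (eq? (type-of H (cadr e)) 'num) 'num))
           ((zero?)     (and (eq? (type-of H (cadr e)) 'num) 'bool))
           ((if)
            (let ((t0 (type-of H (cadr e)))
                  (t1 (type-of H (caddr e)))
                  (t2 (type-of H (cadddr e))))
              (and (eq? t0 'bool) (equal? t1 t2) t1)))
           ((lambda)
            (let* ((x  (car (cadr e)))
                   (t1 (cadr (cadr e)))
                   (t2 (type-of (cons (cons x t1) H) (caddr e))))
              (and t2 (list '-> t1 t2))))
           ((app)
            (let ((t0 (type-of H (cadr e)))
                  (t1 (type-of H (caddr e))))
              (and (pair? t0) (eq? (car t0) '->)
                   (equal? (cadr t0) t1)
                   (caddr t0))))
           ((Y)
            (let ((t0 (type-of H (cadr e))))
              (and (pair? t0) (eq? (car t0) '->)
                   (equal? (cadr t0) (caddr t0))
                   (cadr t0))))
           (else #f)))
        (else #f)))

    ;; Example: (type-of '() '(lambda (x num) (succ x)))  =>  (-> num num)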
For example, Gunter doesn't have c_k for every k; he just has 0, succ(0), succ(succ(0)), etc. Standard definition of free variables. An expression e is closed iff it has no free variables. Definition: A program is a closed term of ground type. <> ================================================================ 1-3. Operational Semantics (Small-Step) ================================================================ Reduction rules succ(c_m) -> c_{m+1} pred(c_{m+1}) -> c_m (m >= 0. Note: (-1 c_0) "sticks"). zero? c_0 -> true zero? c_n -> false (n != 0) if true e1 e2 -> e1 if false e1 e2 -> e2 (\x.e1) e2 -> e1[e2/x] (beta-reduction) Y(e1) e2 -> (e1 Y(e1)) e2 [Red fix] Example: consider double = Y(M) where M = \f:num->num.\x:num.if zero?(x) 0 (succ(succ(f(pred x))))) We'll write 2 for c_2, etc. So the reduction rule [Red Fix] says (double e) = (Y(M) e) -> ((M Y(M)) e) -> ((M double) e) -> if zero?(e) then 0 else succ(succ(double(pred e))) So we have: (double 2) -> (M[double/f] 2) -> if zero?(2) then 0 else succ(succ(double(pred 2))) -> succ(succ(double(pred 2))) -> succ(succ(double 1)) -> succ(succ(if zero?(1) then 0 else succ(succ(double(pred 1))))) -> succ(succ(succ(succ(double(pred 1))))) -> succ(succ(succ(succ(double 0)))) -> succ(succ(succ(succ(if zero?(0) then 0 else succ(succ(double(pred 0))))))) -> succ(succ(succ(succ(0)))) -> (succ(succ(succ(1)))) -> (succ(succ(2))) -> (succ(3)) -> 4 ;; whew!! <> We had lots of choices about which reductions to apply. If we had done it differently: Could we have gotten a different answer? Could we have gotten an infinite reduction sequence? This tells us the next thing we need to do is to specify the contexts in which we can apply these reductions. We'll make some design choices here, but the framework is very general. The setup is due to Felleisen. Define: Redexes l ::= the left-hand sides of the reduction rules Reduction Contexts K ::= [] | succ(K) | pred(K) | zero?(K) | if K then e else e | (K e) | (v K) | Y(K) Values v ::= c_n | \x.e | Y(v) Stuck terms s ::= pred(0) [only 1 stuck term in this language] Would-be Redexes: q ::= l | s Reduction contexts tell us where in the term we can apply a reduction rule. (People often use E to range over reduction contexts). Values are terms that are "done". Stuck terms are terms with "run-time errors"-- ie those that are not detected by the type-checker. Now we can define the reduction semantics: if (L, R) is a reduction rule, then K[L] -> K[R]. To be completely rigorous about this, we need to define plugging a term into a context -[-]. This is done by induction on the construction of contexts K: [][e] = e succ(K)[e] = succ(K[e]) etc! Notice that in our language, reduction contexts don't bind, so K[e] is the same as K[e/[]] . We will likely come back to this later. <> Note: this is a _substitution semantics_, not an environment semantics. When we restrict the reduction contexts, we'll also restrict some of the reduction rules so that they don't fire "prematurely". So instead of (\x.e1) e2 -> e1[e2/x] (beta) we'll have: ((\x.e) v) -> e[v/x] (beta-value, \beta_v) and instead of Y(e1) e2 -> (e1 Y(e1)) e2 [rec] we'll have Y(v1) v2 -> (v1 Y(v1)) v2 [rec-val] {Note that Y(v1) is automatically a value} Note: this is a call-by-value (or "eager") semantics, because it requires that we reduce the argument to a function to a value before we can substitute it in the function. In "call-by-name" or "lazy" evaluation, we don't require this. We require instead that the argument be substituted BEFORE evaluation. 
What are the minimal changes to get to this? //work out in class// Note: in reduction rules, we (almost) always consider only closed terms. Substitution is subtle when the substituend can be open: x[e/x] = e y[e/x] = y (y != x) (e1 e2)[e/x] = (e1[e/x] e2[e/x]) (\x.e1)[e/x] = (\x.e1) [x is never free in \x.e1 !] (\y.e1)[e/x] = \z.((e1(z/y))[e/x]), z fresh -- need to worry abt variable capture if x in fv(e1) and y in fv(e). -- e1(z/y) means switch ALL occurrences of z and y, whether free, binding, or bound. If z is truly fresh, there won't be any occurrences of z in e1. -- when e is closed, no need to rename. For all this to make sense, need two basic facts: Thm ("Progress" aka "Unique Decomposition": If e is a closed term and |-e:t for some t, then exactly one of the following holds: (a) e is a value v, or (b) there exists a unique reduction context K and redex L such that e = K[L], or (c) there exists a unique reduction context K and stuck term S such that e = K[S] (In this case we also say e is stuck). Proof: By induction on the structure of e. Cases: (1) e is a constant. Then it is a value, and it can't be a K[e'] for any e', since those terms are never constants. (2) e is a variable. But this contradicts e is closed. (3) e is an abstraction. Then it is a value, and it can't be either of the other two cases. (4) e is an application (e1 e2). Then e can't be a value, so we have to show that it fits exactly one of the two other possibilities. Apply the IH to e1 and proceed by cases: 4.1) e1 is a value v1 (option a) In that case, let's look at the type rules. If (e1 e2):t, then by [T-app] there must be a t2 such that e1:(t2->t) and e2:t2. The c_n all have type num, so this means that e1 must either be a lambda-abstraction or a Y-value. Now let's look at e2. By IH, apply cases to e2. 4.1.1) e2 is a value v2. Then e is either ((\x:t.e1') v2) or (Y(e1'') v2). In either case e = (v1 v2) is a redex, so set K = [] and L = e = (v1 v2). 4.1.2) e2 is of the form K[L]. Then e = (v1 K[L]). So then e = K'[L], where K' = (v1 K). Could e = (e1 e2) be an instance of option c? If it were, then e2 would have to be of the form K[S], but it isn't, since the induction hypothesis tells us the e2 is exactly one of (a), (b), or (c), and we've assumed it's an (a). 4.2) e1 is of the form K[L]. Then e = (e1 e2) = (K[L] e2) = K'[L], where K' = (K e2). And this is the only possibility, since e1 is not a value, nor is it stuck (same argument as above). 4.3) e1 is K[S]. Similar to 4.2. 5) e is one of the other syntactic forms. These are all similar to either case (3) or case (4). This theorem says that well-typed terms don't stick (unless they run across one of the well-documented "non-type" errors). ==== Exercise 2: a) Write a Scheme procedure redex-of : e -> maybe(K, L) which, given a closed expression e, either returns a pair consisting of a reduction context K and a redex L such that e = K[L], or #f if no such pair exists. Represent K however you want. b) Write a Scheme procedure reduce : e -> maybe(e) which, given a closed expression e, returns an expression e' such that e -> e' (as defined above), or #f if no such expression exists. ==== <> Thm (Subject Reduction, aka Safety, aka "Reduction Preserves Well-Typedness"): If e is a closed term and e:t, and e -> e', then e' is a closed term, and e':t. Proof: Must show this holds for individual reduction rules L->R. Then it works on terms K[L] -> K[R], by induction on K. 
Note that for the case of beta-reduction, we need a lemma to say that if
e:t under the assumption x:s, and e':s, then e[e'/x]:t.
(Substitution Lemma)

This tells us that the reduction of a closed well-typed term e:t looks
like:

  e1 = K1[L1] -> K1[R1] = e2 = K2[L2] -> K2[R2] = e3 = K3[L3] -> ...

By subject reduction, e_i : t, so the progress theorem applies.  Hence
there are exactly 3 possibilities:

  1. The reduction continues indefinitely.
  2. The reduction reaches a value v of type t.
  3. The reduction reaches a stuck term K[S].

================================================================
Operational Semantics (Big-Step)
================================================================

What does a reduction look like?  Key fact: once you get into a
reduction context, you stay there until you are done, eg

  (e1 e2) -> (e11 e2) -> (e12 e2) -> ... -> (v e2).

Then we have to work on e2 (the next redex lies inside e2):

  (v e2) -> (v e21) -> (v e22) -> ... -> (v w)

and then what happens next depends on v and w.

We can build this search strategy into a single relation that goes
directly from e to v.  We'll have a bunch of rules, one for each
possibility of v and w.

e => v   "e evaluates to v":
===========================

  ------
  v => v

  e1 => \x.e    e2 => v2    e[v2/x] => w
  --------------------------------------
             (e1 e2) => w

  e1 => (fix v1)    e2 => v2    v1 (fix v1) v2 => w
  -------------------------------------------------
                  (e1 e2) => w

  e1 => c_n
  -------------------
  succ(e1) => c_{n+1}

  etc.

  e1 => c_0
  -----------------
  zero?(e1) => true

  e1 => c_{n+1}
  ------------------
  zero?(e1) => false

  e0 => true    e1 => w
  --------------------------
  if e0 then e1 else e2 => w

  e0 => false    e2 => w
  --------------------------
  if e0 then e1 else e2 => w

================

Would like to prove: if e is a closed, well-typed term, then for any v

  (e => v)  iff  (e ->* v)

[Notice that => talks only about successful computations.  It doesn't
distinguish an infinite loop from a stuck term!]

Forward direction is easy; straightforward induction on definition of
e => v.  //work out in class//

Reverse takes more effort.

%% The next section is bogus?
%% We'll need some lemmas that summarize the discussion above
%% Definition: K' extends K iff t
%% is the smallest relation on reduction
%% contexts with the following properties:
%% K extends []
%% if K' extends K, then succ(K') extends succ(K)
%% succ(K) extends K
%% (if K then e1 else e2) extends K
%% (K e) extends K
%% (v K) extends K
%% etc.
%% If K1 extends K2 and K2 extends K3, then K1 extends K3.

We'll state what follows for PCF, but it holds for any language set up
with redexes and reduction contexts that satisfies Unique Decomposition
(ie Progress).

<>

Lemma 1 (Locality): If K[e] ->* e' then either
------------------
  (a) There exists a K' such that e' is K[K'[q]]
      (remember q ranges over would-be redexes), or
  (b) there exists a v such that e ->* v and K[v] ->* e'

Proof:  We'll do this by induction on the length of the derivation
K[e] ->* e'.

// Actually, all we need is:
//   (a) There exists an e'' such that e ->* e'' and K[e''] ->* e'

Base case:  If e' is K[e], then by progress, either e is of the form
K'[q] or it is a value v.  Either way, the conditions of the theorem are
satisfied.

Otherwise:  Apply Progress to e.

  1) if e is a value v, then v is our value for case (b).

  2) if e = K'[L], then we have e -> e'' for some e'', and therefore
     K[e] -> K[e''] ->* e', where the second derivation is shorter than
     the first.
Applying the induction hypothesis to K[e''] ->* e' we have two possibilities: 2.1) e' is K[K'[q]] for some K' and q, in which case we've got what we want or 2.2) there is some value v such that e'' ->* v and K[v] ->* e'. So we have e -> e'' ->* v and K[v] ->* e', as desired. Puzzle: what are the weakest assumptions under which this theorem holds? Lemma 2 (interpolation lemma): --------------------------------- If (e1 e2) ->* v, then there exist v1 and v2 such that e1 ->* v1, e2 ->* v2, and (v1 v2) ->* v where the first two reduction sequences are shorter than the original and the third is no longer than the original //* need to fix proof to match *// Proof. ----- If e = (e1 e2) ->* v, then either e = (K[l] e2) or e = (v1 e2). Case 1: e = (v1 e2). Then v1 ->* v1 with measure 0. If (v1 e2) ->* v, then we have K[e2] ->* v, where K = (v1 []). Applying the preceding lemma, we conclude that there must be some v2 such that e2 ->* v, and this derivation has a smaller measure than that of (v1 e2) ->* v. Case 2: e = (e1 e2) and e1 is not a value. Then we have e = K[e1], where K = ([] e2). By the lemma, there must exist a v1 such that e1 ->* v1, and this reduction is smaller than e ->* v. Now we are in a position to prove the other half of our theorem: Theorem: If (e ->* v), then (e => v). ======= Proof: By induction on the measure of the reduction e ->* v. If the reduction of size 0, it must be that e = v. Hence e=>v by the rule ------ v => v Otherwise, assume wlog that e = (e1 e2). Hence by the interpolation lemma, there must be v1, v2 such that e1->*v1 and e2->*v2 by smaller reductions. Hence, by the IH, we have e1 => v1 and e2 => v2. As noted above, v1 can't be a constant, otherwise the term would be stuck. So v1 must either be a lambda-abstraction or a Y-value. If v1 a lambda-abstraction (\x:t2.e'), we have (v1 v2) -> e'[v2/x] ->* v The interpolation lemma tells us that the reduction (v1 v2) ->* v is no longer than the original. Therefore the reduction e'[v2/x] ->* v is smaller than the original reduction, so we can apply the IH to deduce that e'[v2/x] => w. Hence e => v by the rule e1 => \x.e' e2 => v2 e'[v2/x] => w ----------------- (e1 e2) => w If it's a Y-value Y(v'), Corollary 2 tells us that (v' Y(v') v2) ->* v by a smaller derivation, so by the IH, (v' Y(v') v2) => v. Hence e=>v by the rule e1 => (fix v') e2 => v2 v' (fix v') v2 => w ---------------- (e1 e2) => w So now we have our first real theorem: Theorem (Equivalence of Big and Small-Step Semantics): ===================================================== (e ->* v) iff (e => v) But of course remember this only tells us about terminating terms. ================================================================ Applications of Operational Semantics ================================================================ Let's consider our old friend Double from lecture 1, which we'll abbreviate as D = Y(M) where M = (lambda (f : int -> int) (lambda (x : int) (if (Z? x) 0 (+1 (+1 (f (-1 x))))))) Proposition: for all n, (D c_n) ->* c_2n. Proof: By induction on n. Base step: Let n=0 (D c_0) ->* c_0 (easy!) Induction Step: (n > 0) (D c_n) = YM c_n -> M Y(M) c_n ->* if (zero? c_n) c_0 (+1 (+1 (YM (-1 c_n)))) ->* if false c_0 (+1 (+1 (YM (-1 c_n)))) -> (+1 (+1 (YM (-1 c_n)))) -> (+1 (+1 (YM c_{n-1}))) -> (+1 (+1 (M YM c_{n-1}))) = (+1 (+1 (D c_{n-1}))) ->* (+1 (+1 (c_2{n-1}))) (**) ->* c_2n The reduction marked (**) is justified as follows: (+1 (+1 (D c_{n-1}))) is K[(D c_{n-1})] for the obvious reduction context K. 
By the induction hypothesis, (D c_{n-1}) ->* c_{2(n-1)}.

//Easier in big-step??  Let's try it in class//

================
Problem Set 3
================

Exercise: Consider the terms

  P1 = Y(\f:int->int->int.\x:int.\y:int.
          if (zero? x) y (+1 (f (-1 x) y)))

  P2 = Y(\f:int->int->int.\x:int.\y:int.
          if (zero? x) y (f (-1 x) (+1 y)))

Prove that for any n and m

  (P1 c_n c_m) ->* c_{n+m}   and   (P2 c_n c_m) ->* c_{n+m}

Exercise (McCarthy's 91-function):  For this exercise, let's extend our
language by adding constants c_k for all integers k (not just the
non-negative integers), and by adding operators for addition and
comparison that do the obvious things.  Now consider the term

  P = Y(\f.\x.if (> x 100) (- x 10) (f (f (+ x 11))))

Prove: For any integer n, (P c_n) ->* c_k, where

  (a) if n > 100, then k = n-10
  (b) otherwise, k = 91

================================================================
Context Machines (Felleisen's CK machines).
================================================================

In the small-step semantics, we don't want to be constantly decomposing
and reassembling terms from scratch.  Instead we want to use a
representation for contexts E that helps.

Trick: consider _composition_ of contexts: E1 o E2 = E1[E2].  Then
composition is associative:

  (E1 o E2) o E3 = E1 o (E2 o E3) = E1[E2[E3]]

Furthermore, [] is a left and right identity for o.  So we have a
monoid!

So instead of writing

  E ::= [] | (E e) | (op E)

we can write

  E ::= [] | ([] e) o E | (op []) o E     (outside-in form)

or

  E ::= [] | E o ([] e) | E o (op [])     (inside-out form)

These define the same set of contexts, namely E ::= F* where

  F ::= ([] e) | (op [])

and concatenation denotes composition.

<>

//Hmm, we lied here: we never defined E1[E2] !!  Fix this in class//

We'll use this trick to represent terms in a way that facilitates
searching: We'll represent E[e] as a pair <e, E>.  This representation
captures the idea that we are _focussing_ on the e in E[e].

Final states:

  <v, []>    successful termination
  <s, E>     stuck term

Reduction rule:

  <L, E> -> <R, E>     L -> R a reduction rule

Focus-down rules (one per F):

  <(e1 e2), E> -> <e1, E o ([] e2)>    if e1 is not a value
  <(op e1), E> -> <e1, E o (op [])>    if e1 is not a value

Focus-up rules (one per F):

  <v, E o ([] e2)> -> <(v e2), E>
  <v, E o (op [])> -> <(op v), E>

================================================================
Generalizing the CK machine
================================================================

We'd like to generalize the construction of the CK machine to any
small-step semantics specified using evaluation contexts.  For this,
Unique Decomposition is not quite enough (I think).  To make it work
smoothly, you need the additional property:

Theorem (Strong Decomposition).  For each reduction context K, there
exists a unique sequence of frames F1,..,Fn such that

  K = F1 o F2 o ... o Fn

Unique Decomposition follows from Strong Decomposition by induction on
the size of e (assuming that values, stuck terms, and redexes are
disjoint).  I believe the converse is false, but I haven't been able to
come up with a truly convincing counter-example.  (Brownie points to
anybody who can do this.)

If we have Strong Decomposition, then we can give the rules for the
context machine as:

  Reduction rule:   <L, E> -> <R, E>        if L -> R is a reduction rule

  Focus-up rule:    <v, E o F> -> <F[v], E>     if v is a value

  Focus-down rule:  <F[e], E> -> <e, E o F>     if F[e] is not a value
                                                and e is not a value.

(A Scheme sketch of such a machine appears below.)
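Here is a sketch of such a machine for our call-by-value PCF, written in
Scheme with the same tagged-list term representation as the type-of
sketch above.  The concrete choices (contexts as lists of frames with the
innermost frame first, so that consing a frame plays the role of E o F;
the particular frame tags) are mine, not fixed by the notes; variable
names are assumed not to collide with the tag symbols.

    ;; Frames F:  (arg e) for ([] e),  (fun v) for (v []),
    ;;            (succ) (pred) (zero?) (Y) for (op []),
    ;;            (if e1 e2) for (if [] e1 e2).
    ;; A machine state is (list e E); E is a list of frames, innermost first.

    (define (value? e)
      (or (eq? e 'true) (eq? e 'false)
          (and (pair? e) (memq (car e) '(const lambda)))
          (and (pair? e) (eq? (car e) 'Y) (value? (cadr e)))))

    (define (const? e) (and (pair? e) (eq? (car e) 'const)))

    ;; Substitution; safe without renaming because we only ever
    ;; substitute closed values (this is the beta-value rule).
    (define (subst e v x)
      (cond ((eq? e x) v)
            ((and (pair? e) (eq? (car e) 'lambda))
             (if (eq? (car (cadr e)) x)
                 e
                 (list 'lambda (cadr e) (subst (caddr e) v x))))
            ((pair? e) (map (lambda (e1) (subst e1 v x)) e))
            (else e)))

    ;; plug: the term F[v]
    (define (plug F v)
      (case (car F)
        ((arg) (list 'app v (cadr F)))
        ((fun) (list 'app (cadr F) v))
        ((succ pred zero? Y) (list (car F) v))
        ((if)  (list 'if v (cadr F) (caddr F)))))

    ;; contract: if e is a redex L, return its contractum R, else #f.
    (define (contract e)
      (and (pair? e)
           (case (car e)
             ((succ)  (and (const? (cadr e))
                           (list 'const (+ (cadr (cadr e)) 1))))
             ((pred)  (and (const? (cadr e)) (> (cadr (cadr e)) 0)
                           (list 'const (- (cadr (cadr e)) 1))))
             ((zero?) (and (const? (cadr e))
                           (if (= (cadr (cadr e)) 0) 'true 'false)))
             ((if)    (cond ((eq? (cadr e) 'true)  (caddr e))
                            ((eq? (cadr e) 'false) (cadddr e))
                            (else #f)))
             ((app)   (let ((f (cadr e)) (a (caddr e)))
                        (and (value? f) (value? a)
                             (cond ((and (pair? f) (eq? (car f) 'lambda))
                                    (subst (caddr f) a (car (cadr f))))
                                   ((and (pair? f) (eq? (car f) 'Y))
                                    (list 'app (list 'app (cadr f) f) a))
                                   (else #f)))))
             (else #f))))

    ;; focus-down: if e = F[e'] with e' not a value, return (list e' F), else #f.
    (define (focus-down e)
      (and (pair? e)
           (case (car e)
             ((succ pred zero? Y)
              (and (not (value? (cadr e))) (list (cadr e) (list (car e)))))
             ((if)  (and (not (value? (cadr e)))
                         (list (cadr e) (list 'if (caddr e) (cadddr e)))))
             ((app) (cond ((not (value? (cadr e)))
                           (list (cadr e) (list 'arg (caddr e))))
                          ((not (value? (caddr e)))
                           (list (caddr e) (list 'fun (cadr e))))
                          (else #f)))
             (else #f))))

    ;; One transition of the machine; #f means the state is final or stuck.
    (define (ck-step state)
      (let ((e (car state)) (E (cadr state)))
        (cond
          ((and (value? e) (pair? E))                      ; focus-up
           (list (plug (car E) e) (cdr E)))
          ((contract e) => (lambda (r) (list r E)))        ; reduction
          ((focus-down e) => (lambda (p)                   ; focus-down
                               (list (car p) (cons (cadr p) E))))
          (else #f))))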
================================================================
1-6: CK machine vs. small-step semantics
================================================================

We'd like to show that e ->* v iff <e, []> ->* <v, []>.

Of course, we'll need a stronger induction hypothesis, since the context
machine involves non-empty contexts!

Theorem: For any small-step semantics that satisfies Strong
Decomposition:

  (i)   <e, E> ->* <v, []>   iff  E[e] ->* v
  (ii)  <e, E> ->* <s, E'>   iff  E[e] ->* E'[s]
  (iii) <e, E> diverges      iff  E[e] diverges

{Note: we also assume that E = F* as above}

Lemma 1 (Safety): If <e, E> -> <e', E'>, then either E[e] = E'[e'] or
E[e] -> E'[e'] (in one step!).

Proof.  By inspection of the CK rules.  The refocus rules leave E[e]
unchanged, and the reduction rule corresponds to a single small-step
reduction.

Note: Lemma 1 is enough to establish the => part of the theorem.

The next lemma says that if our focus is above a redex or a stuck term,
the CK machine will find it.  Remember q ::= l | s .

Lemma 2 (Refocus-Down): <E1[q], E2> ->* <q, E2 o E1> .

Proof: By induction on the size of E1, using the top-down grammar.
(Homework 3)

The next lemma describes what happens when we are focussed on a value.
It says that if we are focussed on a value, then if there is a next
redex or stuck term, the CK machine will find it, and if there is no
next redex, then the machine will pop all the contexts.

Lemma 3 (Refocus-Up):

  (a) If E[v] = E'[q], then <v, E> ->* <q, E'>
  (b) If E[v] is a value, then <v, E> ->* <E[v], []>

Proof: We prove each of these by induction on E, using the bottom-up
grammar.

(1) E = [].  Trivial.

(2) E = E'' o F.  Then the focus-up rule says <v, E'' o F> -> <F[v], E''>.
    Unique decomposition tells us that there are 3 possibilities for
    F[v]:

    (2.1) If F[v] is a value, then apply the induction hypothesis to E'',
          which is smaller than E.

    (2.2) If F[v] = E_3[q], then apply Refocus-Down.
          {This is also impossible for (b): If F[v] = E_3[q], then
          (E''oF)[v] = (E''oE_3)[q], which would contradict unique
          decomposition}

Now we are ready to prove the theorem:

Theorem:
  (i)   <e, E> ->* <v, []>   iff  E[e] ->* v
  (ii)  <e, E> ->* <s, E'>   iff  E[e] ->* E'[s]
  (iii) <e, E> diverges      iff  E[e] diverges

Proof: As noted above, the => direction for (i) and (ii) follows
immediately from Lemma 1.  For the reverse direction, we'll use the
following induction hypothesis:

  IH(k): If E[e] ->* v in < k steps, then <e, E> ->* <v, []>, and
         if E[e] ->* E'[s] in < k steps, then <e, E> ->* <s, E'>

{Note the use of "< k", which makes the base step trivial, and
simplifies the induction step}.

Once we've proved (i) and (ii), (iii) follows immediately.

Let r ::= v | E[s] .  These correspond to the two possible ways in which
the small-step semantics can halt.  Let Rep(r) be defined as

  Rep(r) = <v, []>   if r = v
         = <s, E>    if r = E[s]

Then the induction hypothesis can be rephrased as:

  IH(k): If E[e] ->* r in < k steps, then <e, E> ->* Rep(r) .

We'll need a lemma:

Lemma 4: If IH(k) holds and E[q] ->* r in < k+1 steps, then
<q, E> ->* Rep(r).

Proof: If q is s, then r is E[s], so <q, E> is already Rep(r).  If q is
a redex L, then E[L] -> E[R] ->* r in < k steps, so
<L, E> -> <R, E> ->* Rep(r) by IH(k).  QED Lemma 4.

Now we can proceed to the induction:

Base step (k = 0): Vacuously true: you can't have a reduction with fewer
than 0 steps.

Induction step: Assume IH(k), and E[e] ->* r in fewer than k+1 steps.
We proceed by cases on the unique decomposition of e.

(1) e is a value.  We proceed by cases on the unique decomposition of
    E[e].

    (1.1) E[e] is a value.  Then <e, E> ->* <E[e], []> by Lemma 3b
          (and r must be E[e] itself, so this is Rep(r)).

    (1.2) E[e] = E1[q].  Then

            <e, E> ->* <q, E1>    by Lemma 3a
                   ->* Rep(r)     by Lemma 4.

(2) e is E1[q].  Then E[e] = E[E1[q]], and

      <e, E> = <E1[q], E> ->* <q, E o E1>    by Lemma 2
                          ->* Rep(r)         by Lemma 4.

QED Theorem.

The technique of writing IH(k) as (for all k' < k).P(k') is called
"course-of-values" or "strong" induction, and it's very handy.
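As a companion to the machine sketched above, here is a hypothetical
driver that just iterates ck-step from the initial state <e, []>.  The
fuel argument is only there so that a diverging program stops the loop in
tests; it is not part of the semantics.

    (define (ck-eval e fuel)
      (let loop ((state (list e '())) (n fuel))
        (cond ((= n 0) 'out-of-fuel)
              ((ck-step state) => (lambda (s) (loop s (- n 1))))
              ((and (value? (car state)) (null? (cadr state)))
               (list 'value (car state)))
              (else (list 'stuck state)))))

    ;; Example: the term double = Y(M) from earlier, applied to c_2:
    ;; (ck-eval
    ;;   '(app (Y (lambda (f (-> num num))
    ;;              (lambda (x num)
    ;;                (if (zero? x)
    ;;                    (const 0)
    ;;                    (succ (succ (app f (pred x))))))))
    ;;         (const 2))
    ;;   1000)
    ;; =>  (value (const 4))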
<>

So now we have an equivalence between the small-step, big-step, and CK
machines:

Theorem:
=======

The following are equivalent:

  (1) e ->* v
  (2) e => v
  (3) <e, []> ->* <v, []>
  (4) for any K, <e, K> ->* <v, K>

================
From Small-Step Semantics to Big-Step Semantics
================

<>

If our system satisfies Strong Decomposition, then we can define a
big-step semantics by the following rules:

  (1) for each value v,

        v => v

  (2) for each F,

        e => v    F[v] => w
        -------------------
             F[e] => w

  (3) for each elementary reduction L -> R,

        R => w
        -------
        L => w

Notice that this is not quite the same as our original big-step
semantics.  For example, our original big-step semantics has a bunch of
rules that look like

  e1 => \x.e    e[e2/x] => v
  --------------------------
        (e1 e2) => v

  e1 => fix(e')    ((e' fix(e')) e2) => v
  ---------------------------------------
              (e1 e2) => v

(one for each form of the value of e1) whereas the new one has a single
rule

  e1 => w    (w e2) => v
  ----------------------
       (e1 e2) => v

and a bunch of rules

     e1[e2/x] => w
  -------------------
  ((\x.e1) e2) => w

  (e (fix e)) => w
  ----------------
   (fix e) => w

These systems "clearly" define the same relation => .  In the theorem,
we'll use the new one.

Theorem:
  (i)  if e => v, then for any E, <e, E> ->* <v, E>.
  (ii) for any E, if <e, E> ->* <v, E>, then e => v.

Notice the difference in quantification between (i) and (ii).
(i) asserts that e => v implies <e, E> ->* <v, E> for ANY E.
(ii) asserts that if <e, E> ->* <v, E> for SOME E, then e => v.
Hence the following statements are equivalent:

  (a) e => v
  (b) for some E, <e, E> ->* <v, E>
  (c) for any E, <e, E> ->* <v, E>

Proof of Theorem:

(i) by induction on the definition of =>, using Lemma 1 where necessary.
Note that the decomposition rule works for F[e] even when e is a value.

(ii) We proceed by induction on the number of steps in
<e, E> ->* <v, E>.  As before, IH(k) is

  If E is a context and <e, E> ->* <v, E> in < k steps, then e => v.

Base step is vacuously true (no derivations of < 0 steps).

Proceed by cases on e:

If e is a value v (k=0), then v => v by defn of =>.

If e is a redex L, then <L, E> -> <R, E> ->* <v, E>.  By IH on the cdr
of the derivation, R => v.  Therefore L => v by the rule

  R => w
  -------
  L => w

If e is of the form F[e'] where e' is not a value, then we have

  <F[e'], E> -> <e', E o F> ->* <v, E>

By Lemma 2, there must exist some w such that <e', E o F> ->* <w, E o F>.
So the entire reduction sequence looks like

  <F[e'], E> -> <e', E o F> ->* <w, E o F> -> <F[w], E> ->* <v, E>

Applying the induction hypothesis to the two subsequences, we conclude
that e' => w and F[w] => v.  Hence F[e'] => v by the defn of =>.

================================================================

<>

1.9 Another example.

For this section, let's remove Y from the language (!).

We'll do one more example because it emphasizes the types (which have
been largely absent from our discussions of the CK machine, etc.), and
because it introduces an important technique (called Logical Relations)
that we will use again later.

Theorem: If e is a closed term of type int or bool, then there is a v
such that e ->* v.

As before, the question is how to generalize this proposition so we get
an induction hypothesis that goes through.  The theorem needs to be
generalized in two ways:

  1. It talks about terms of base type only, and we'll surely need to
     reason about terms of other types.

  2. It talks about closed terms, and we'll surely need to think about
     open terms as well.

There's a general recipe for this, which is called the technique of
"logical relations".  We start with some property N_b of closed terms of
each base type b.  In our case, N_b(e) will be the property that e is a
closed term of type b, and e ->* v for some v (ie "e converges").

We generalize in two steps:  1.
First we generalize to terms of higher type by defining: N_{t->u}(e) iff for all closed e':t, if N_t(e'), then N_u((e e')). That is, N_u holds on the application (e e'). This gives a definition of N on any closed term of any type t. It says that a term of function type is good iff whenever you apply it to a good term, the resulting application is good. This makes sense because the application is of smaller type than the original. Think of closed terms that satisfy N as being "good". Among other things, this means that if t = u1 -> u2 -> .. -> un -> b, then a closed term e is good iff (e e1 e2 ... en) converges for every good closed e1,...,en of types u1,..,un. 2. Then we generalize to open terms as follows: If e:t and fv(e) \subset {x1:t1 ,..., xn:tn}. Then define a good substitution for e to be any substitution [u1/x1,...u_n/xn] where each u_i is a good closed term of type t_i. Then we say an open term e is good iff for any good substitution s, e.s is a good closed term. (ie, every good instance of e is good). Now we have a definition of N that works for any well-typed term e. In this case we say N is a "logical relation". Now we'll prove that all well-typed terms are good, that is for all terms e:t (open or closed), e satisfies N_t. We do this by induction on the construction of the derivation of e:t. Theorem: If e is closed term of type int or bool, then there is a v such that e ->* v. Let N be defined as above. We show that if e:t, then N_t(e). We proceed by induction on the derivation of e:t. 1. e is a variable x:t. Then any good substitution for x is a closed term satisfying N_t (by definition of "good substitution"). So x itself satisfies N_t. 2. e is an application (e1 e2), where e1:u->t and e2:u . By induction, we can assume that e1 satisfies N_{u->t} and e2 satisfies N_u. To show that (e1 e2) satisfies N_t, we must show that any good instance of (e1 e2) satisfies N_t. So let s be an N-closing substitution for e. Then s is also an N-closing substitution for e1 and e2 as well. Since e1 satisfies N_{u->t}, so does e1.s (and it's closed!), and similarly e2.s is a closed term that satisfies N_u. Therefore, by definition of N_{u->t}, so does (e1.s e2.s) satisfies N_t. And of course (e1.s e2.s) = (e1 e2).s. We've shown that any N-closing instance of (e1 e2) satisfies N_t. Hence (e1 e2) satisfies N_t. 3. e is an abstraction (\x:t1.e'), so t = t1 -> u and e':u. As before, let s be an N-closing substitution for e. We must show that (\x:t1.e').s satisfies N_{t1->u}. To do this, we must show that ((\x:t1.e').s e1) satisfies N_u for any closed e1 satisfying N_t1. How do we do that? Let u = t2 -> t3 -> ... b . By the remark above, we must show that for any e2:t2,..en:tn that are all closed and all satisfy N, (((\x:t1.e').s e1) e2 ... en) converges. (*) Well, (((\x:t1.e').s e1) e2 ... en) -> (e'.s[e1/x] e2 ... en) But wait! If s is an N-closing substitution for \x.e, and e1 satisfies N_t1, then s[e1/x] is an N-closing substitution for e':u. BY THE INDUCTION HYPOTHESIS, e' satisfies N_u. Hence e'.(s[e1/x]) satisfies N_u. Hence for any N-closed e1,...,en of appropriate types, (e'.(s[e1/x]) e2 ...en) converges. Since the term in (*) takes one step and gets to this converging term, the term in (*) converges also, as desired (!!) 4. Now all we have to do is deal with the constants. c_k : int -- converges zero? : int -> bool. Let e be an N-closed term of type int. Then (zero? e) converges, because (by defn of N_int) e converges to some c_k. Hence (zero? e) ->* (zero? c_k) -> {true | false}. 
+1 and -1 are similar.  [WHAT ABOUT (-1 c_0) ??]

if -- must show that if e0:bool, e1:int, e2:int are all N-closed terms,
then (if e0 e1 e2) converges.  Then e0 converges by defn of N_bool, so

  (if e0 e1 e2) ->* (if {t|f} e1 e2) -> e1 | e2 ,

either of which converges by defn of N_int.

This is a very common pattern of proof.

Note that Y breaks this.  Of course it makes the theorem false, but
let's see how the proof breaks (always instructive!)

  Y : (t -> t) -> t

As before, let t = t1 -> t2 -> t3 -> ... -> tn -> b.  We must show that
for any N-closed e1,..,en of appropriate types, and any N-closed term e
of type t -> t,

  (Y(e) e1 e2 ... en) converges.

But looking at the reduction, we get:

  (Y(e) e1 e2 ... en) -> (e Y(e) e1 e2 ... en)

e is N-closed, but that only tells us what happens when e is applied to
an N-closed term.  We don't know whether Y(e) satisfies N, so we're
stuck.

<>

****************
This is as far as I got
****************

2. Denotational Semantics

2.1 Limitations of Operational Semantics

Operational Semantics is good for figuring out what any particular whole
program does, and therefore it is useful for thinking about
implementations, but it is less good for determining properties of
classes of programs.

Consider the following statement:

  - The terms P1 and P2 above are interchangeable in any context.

How might we go about proving such a thing?  If we only had to deal with
reduction contexts, we could probably proceed as we did above.  But we
also have to consider contexts like (M P1) vs (M P2) for complicated
terms M.  We would need to consider "tracking" P1 and P2 throughout
their passage through M.  We'd have to consider generalizations as they
are partially substituted-in, etc, and we'd have (+1 (f (-1 e1) e2)) in
one term, and (f (-1 e1') (+1 e2')) in the other... yuck!

Even simpler, let e be a term of function type not containing x free.
Then e and (\x.(e x)) are interchangeable in any context.  This is true,
but it is quite difficult to show in an operational framework.

What we are missing is a framework for reasoning about program
_fragments_.  What we'd like is a _compositional_ framework, in which we
can say: if e1 and e2 are equivalent program fragments, then C[e1] and
C[e2] are equivalent programs for any context C (not just evaluation
contexts!).

We say a definition is _compositional_ iff the meaning of any expression
depends only on the meaning of its immediate subexpressions.  If meaning
is defined in this way, then if e1 and e2 have the same meaning, then
C[e1] and C[e2] are guaranteed to have the same meaning, since for any
C, the meaning of C[e] depends only on the meaning of e.

We already have such a framework for the notion of "meaning" as "type",
and the notion of equivalence "has the same type".  We got a lot of
mileage out of that.  Our next step is to do this exercise for a finer
notion of equivalence.

================================================================

2.2 A Simple Language: REC

Before doing this in general, let's try a simple language: the language
REC of Winskel chapter 9.

Do: Sec 9.1, 9.2.  Note 9.2 is a big-step semantics.

Note this language is powerful enough to do the examples from PS3:

  e1 = if (zero? x) y (+1 (f (-1 x) y))
vs
  e2 = if (zero? x) y (f (-1 x) (+1 y))

Are these two equivalent in ANY context, ie, can we take

  letrec f(x,y) = e1
         (g = e)
         ...
  in e'

and replace it by

  letrec f(x,y) = e2
         (g = e)
         ...
  in e'

and be guaranteed to get the same behavior, regardless of what's in the
g's, e's, or e'?

Our operational reasoning doesn't help us much with this.  Want a
_denotational_ model.
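For reference, the equivalence we are after can be stated precisely.
This formulation (in terms of arbitrary program contexts and observation
of results at ground type) is the standard one; the notes have so far
only used it informally:

  Definition (contextual equivalence):  e1 ~ e2 iff for every context C
  such that C[e1] and C[e2] are programs (closed terms of ground type),
  and every value v,

      C[e1] ->* v   iff   C[e2] ->* v .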
Roughly speaking, we want a function that maps each term to its meaning
and is compositional.

Need to model inherited attributes: function environments p and value
environments r.  Want a valuation E so that for any e,

  E[e]pr = n   iff   e.r => n

But what about non-termination?  Model this by making the range of
E[e]pr not the integers N but N_bot -- the integers with a special
element bot adjoined.  Then we can get a theorem like the one above,
with the additional property that if e.r fails to terminate, then E[e]pr
is equal to bottom.

So a procedure will have a denotation in [N -> Nbot].  We can think of
this in two ways.

(1) as a partial function, ie a set of ordered pairs (n,m) where for
    each n there is at most one m s.t. (n,m) is in the set.  There is a
    natural order structure on partial functions defined in this way:
    the subset ordering:

      f <= g iff [for all n, if f(n) defined then g(n) is defined and
                  equal to f(n)]

(2) Consider Nbot itself as an ordered structure, with bot <= n for all
    n, and order [N -> Nbot] by

      f <= g iff [for all n, f(n) <= g(n)]

Each way gives an order structure on [N -> Nbot], and these are
equivalent.

If we declare

  f(x,y) = if (zero? x) y (+1 (f (-1 x) y))

then the right-hand side by itself defines a function

  \v0 in [N->Nbot]. \(v1 in N, v2 in N).
     E[[if (zero? x) y (+1 (f (-1 x) y))]] [f = v0] [x = v1, y = v2]

ie, a function [N -> Nbot] -> [N -> Nbot].  Let's call this function F.

If we intend this declaration to be a letrec, we would expect that the
procedure f should be bound to some value v0 such that

  v0 = F(v0)

Of course for some right-hand sides, there may be many solutions to this
equation, eg consider the declaration

  f(x) = f(x)

So we'll need to figure out _which_ solution to adopt.

For this simple language, this can all be worked out in the partial
function semantics (see, eg, Manna, Theory of Computation, 1974).  But
we'll need the whole cpo story soon enough, so we might as well bite the
bullet and do it right the first time through.

================================================================

3. Domain Theory -- Winskel Chapter 8

4. The Operational and Denotational Semantics of REC -- Winskel,
   Chapter 9, using the eager (cbv) version of REC.

================================================================

5. Denotational Semantics of PCF.

We'll do the lazy (cbn) version of PCF first, because that's what we did
the operational semantics for.  (Also, it's a little easier, and it's
what Plotkin did, though we will not use his proof).

5.1 The Standard Model of PCF

First we define the _standard model_ of PCF.  We first give the
semantics of each type:

  [[int]]    = N_bot
  [[bool]]   = B_bot    (the three-point cpo)
  [[t -> u]] = [ [[t]] -> [[u]] ]   (the set of all continuous functions
                                     from [[t]] to [[u]])

For any PCF term e, the set of environments for e, notated Env_e, is the
set of all associations {(x1, v1),..., (xn, vn)} such that
fv(e) \subset {x1,..,xn} and, for each i, if x_i is a variable of type
t_i, then v_i \in [[t_i]].

Note that Env_e has an order given by its product structure:

  r1 <= r2 iff for all x in dom(r1), r1(x) <= r2(x)

To each term e of type t, we assign a continuous function
[[e]] \in [Env_e -> [[t]]] as follows:

  to every constant c of type t, we choose [[c]] \in [[t]].  We'll make
  particular choices of these later, but for now any value in [[t]] will
  do.

  [[x]]r = r(x)

  [[(e1 e2)]]r = ([[e1]]r)([[e2]]r)

  [[\x.e]]r = f, where f is defined by: for all d in [[t]],
              f(d) = [[e]](r[x=d]),
              where x is a variable of type t.
              PROVIDED this f is continuous.

(A toy executable rendering of these equations is sketched below.)
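To make the equations above concrete, here is a toy meta-circular
rendering in Scheme, using the same term representation as the earlier
sketches.  It is only illustrative, and it makes simplifying assumptions
that are mine, not the notes': bottom is represented by divergence of the
Scheme program itself, the call-by-name character of the model is
simulated by passing arguments as thunks, and Y is interpreted by
unwinding rather than by an explicit least-fixed-point construction.

    ;; meaning : term * environment -> number | boolean | procedure
    ;; The environment r is an association list mapping variables to
    ;; thunks (zero-argument procedures), so an argument is evaluated
    ;; only if it is used (call-by-name).
    (define (meaning e r)
      (cond
        ((symbol? e)
         (case e
           ((true) #t)
           ((false) #f)
           (else ((cdr (assq e r))))))            ; [[x]]r = r(x): force the thunk
        ((pair? e)
         (case (car e)
           ((const)  (cadr e))
           ((succ)   (+ (meaning (cadr e) r) 1))
           ((pred)   (- (meaning (cadr e) r) 1))  ; note: (pred c_0) not modeled
           ((zero?)  (= (meaning (cadr e) r) 0))
           ((if)     (if (meaning (cadr e) r)
                         (meaning (caddr e) r)
                         (meaning (cadddr e) r)))
           ;; [[\x.e]]r = \d. [[e]](r[x=d])
           ((lambda) (lambda (thunk)
                       (meaning (caddr e)
                                (cons (cons (car (cadr e)) thunk) r))))
           ;; [[(e1 e2)]]r = ([[e1]]r)([[e2]]r)
           ((app)    ((meaning (cadr e) r)
                      (lambda () (meaning (caddr e) r))))
           ;; [[Y(e)]]r taken as the meaning of (e Y(e)):
           ;; unwinding in place of an explicit least fixed point.
           ((Y)      (meaning (list 'app (cadr e) e) r))))))

    ;; Example: (meaning '(app (lambda (x num) (succ x)) (const 2)) '())  =>  3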
The first thing we have to do is to check that this definition actually
gives us a continuous function for each e.  This is a little trickier
than you might expect:

First you have to check that each recursive call [[e']]r' in the
definition of [[e]] actually gives [[e']] an argument r' that is an
environment for e'.  (That's easy).

Second, in the abstraction case, we have to check both that each f is
continuous in d, and that the whole denotation [[\x.e]] is continuous in
r.

Let's check these.  In each case, let R = r_0 <= r_1 <= r_2 <= ... be an
omega-chain of environments, with limit r*.  We want to show that for
each e,

  [[e]](lub R) = lub {[[e]]r | r in R}

Variables:

  [[x]](lub R) = (lub R)(x) = lub {r(x) | r in R} = lub {[[x]]r | r in R}.

Applications:

  [[(e1 e2)]](lub R)
    = ([[e1]](lub R)) ([[e2]](lub R))
    = (lub_i ([[e1]]r_i)) ([[e2]](lub R))          [IH at e1]
    = lub_i {([[e1]]r_i)(lub_j ([[e2]]r_j))}       [IH at e2]
    = lub_i {lub_j {([[e1]]r_i)([[e2]]r_j)}}       [cont of [[e1]]r_i]
    = lub_i {([[e1]]r_i)([[e2]]r_i)}               by the diamond lemma,
                                                   with v_ij = ([[e1]]r_i)([[e2]]r_j)
    = lub_i {[[(e1 e2)]]r_i}

For the abstraction case, observe the following:  Consider the function

  ext_x : Env(\x.e) * [[t]] -> Env_e   defined by   ext_x (r, d) = r[x=d]

This is easily seen to be continuous in both r and d.  That is, for any
r and d, ext_x(r,_) and ext_x(_,d) are both continuous.

So we have [[\x.e]]r = f where f is defined by f(d) = [[e]](ext_x(r,d)).
So f = [[e]] o ext_x(r, _) is continuous for any particular r, since it
is the composition of continuous functions.

Now let's check that [[\x.e]] is continuous in r:

  [[\x.e]](lub R)
    = [[e]] o ext_x(lub R, _)
    = [[e]] o lub_i ext_x(r_i, _)         ??
    = lub_i ([[e]] o ext_x(r_i, _))       since composition is cont.
    = lub_i {\d.[[e]](ext_x(r_i,d))}

Conclusion: for each e:t, [[e]] is a continuous function from Env(e) to
[[t]].

Now all we need to do is to assign denotations to each of the constants.
But this is easy:

  [[c_n]]  = lift(n)                 \in N_bot = [[int]]
  [[+1]]   = strict(\n. lift(n+1))   \in [[int]] -> [[int]]
  [[true]] = lift(truth)             \in [[bool]]
  [[if]]   = strict(\b. if b then \xy.x else \xy.y)

and last (but not least!-)

  [[Y_t]] = fix \in [ [[t]] -> [[t]] ] -> [[t]] = [[(t->t) -> t]]

5.2 Comparing the operational and denotational semantics of CBN PCF.

We'll use the big-step operational semantics.