================================================================
1-1. Main Questions in Semantics
================================================================

Goal: Build _models_ of the behavior of programming languages.  We have
good tools for doing this.

First part of course: familiarity with the tools.  Key is understanding
relationships between different models of the same thing.  Second part:
applications.

Key questions:

    Syntax: What _is_ a program?
                 |
                 |
    Operational Semantics: What does a program compute?
                / \
               /   \
              /     \
    Equational Theories:            Denotational Semantics:
    When are two expressions        What does an expression mean?
    equivalent?

Operational Semantics:

  Big Step:   p => v                  Evaluation Semantics
                                      (often p \Downarrow v)
  Small Step: p1 -> p2 -> ... -> v    Reduction Semantics

  p a program, v a "value" (Winskel calls them "canonical forms")

Would like to show equivalence: p ->* v iff p => v.
Note: these could be non-deterministic.

================================================================
1-2. Case Study: PCF.
================================================================

We'll try to stay close to Gunter, chapter 4.

Syntax:
======

Types: t,u ::= num | bool | t -> u

  num and bool are called "ground types" or "scalar types".
  Write t1 -> t2 -> t3 for t1 -> (t2 -> t3).

Expressions or Terms:

  e ::= c_n (for each non-negative integer n)
      | succ(e) | pred(e)
      | true | false | zero?(e)
      | if e then e else e
      | x | \x:t. e | e e
      | Y(e)

When in doubt, we always interpret grammars as defining a set of finite
trees ("abstract syntax trees").  We'll put in plenty of parens to help
you parse.

Write: e1 e2 e3         for (e1 e2) e3
       \x1:t1,x2:t2.e   for \x1:t1.(\x2:t2.e)

In general, dots extend as far to the right as possible, unless they
obviously don't :)

Typing Rules:
============

Typing Environments H:
---------------------

  H ::= [] | H,x:t

Interpret H as a partial function from variables x to types t by

  H,x:t (x') = t       if x' is the same variable as x
             = H(x')   otherwise

Notice this makes [](x) undefined for all x.

Typing Judgements: H |- e : t
-----------------

  "Under typing assumptions H, the expression e has type t"

[T-const]    H |- c_n : num

[T-Succ]     H |- e : num
             ------------------
             H |- succ(e) : num

[T-Pred]     H |- e : num
             ------------------
             H |- pred(e) : num

[T-True]     H |- true : bool

[T-False]    H |- false : bool

[T-zero?]    H |- e : num
             --------------------
             H |- zero?(e) : bool

[T-if]       H |- e0 : bool   H |- e1 : t   H |- e2 : t
             ------------------------------------------
             H |- if e0 then e1 else e2 : t

[T-var]      H(x) = t
             ----------
             H |- x : t

[T-lambda]   H,x:t1 |- e : t2
             ------------------------
             H |- \x:t1. e : t1 -> t2

[T-app]      H |- e0 : (t1 -> t2)   H |- e1 : t1
             -----------------------------------
             H |- (e0 e1) : t2

[T-fix]      H |- e : (t->t)
             ---------------
             H |- Y(e) : t

{The rules T-var, T-lambda, and T-app are in Table 2.1, rather than in
Table 4.1, in my copy of Gunter.}

We assert "H |- e : t" iff H |- e:t is derivable from these axioms and
rules in a finite number of steps.

If you want to get technical, a typing judgement is just a triple
(H, e, t), and these rules are just inductive rules putting a triple
(H, e, t) in a set "|-".  But we don't usually need this level of
gymnastics.

====
Exercise #1:

Prove: If H |- e:t and H |- e:t', then t = t'.

Write a Scheme procedure

  type-of : H * e -> (maybe t)

which, given an H and an e, returns the unique t such that H |- e:t,
or #f otherwise.
====

There are lots of different ways to set this up.
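Aside, before we continue with the syntax: here is one possible shape of
the type-of procedure from Exercise #1, as a Scheme sketch.  The term and
type representations below (tagged S-expressions, with H as an
association list) are my own choices for illustration; any representation
carrying the same information would do.

    ;; Assumed representations (not fixed by these notes):
    ;;   types:  num | bool | (-> t u)
    ;;   terms:  (const n) | (succ e) | (pred e) | (zero? e) | true | false
    ;;         | (if e0 e1 e2) | x | (lambda (x t) e) | (app e1 e2) | (Y e)
    ;;   H:      an association list ((x . t) ...)

    (define (type-of H e)
      (cond
        ((symbol? e)
         (cond ((memq e '(true false)) 'bool)
               ((assq e H) => cdr)
               (else #f)))
        ((pair? e)
         (case (car e)
           ((const) 'num)
           ((succ pred) (and (eq? (type-of H (cadr e)) 'num) 'num))
           ((zero?)     (and (eq? (type-of H (cadr e)) 'num) 'bool))
           ((if)
            (let ((t0 (type-of H (cadr e)))
                  (t1 (type-of H (caddr e)))
                  (t2 (type-of H (cadddr e))))
              (and (eq? t0 'bool) (equal? t1 t2) t1)))
           ((lambda)
            (let* ((x  (car (cadr e)))
                   (t1 (cadr (cadr e)))
                   (t2 (type-of (cons (cons x t1) H) (caddr e))))
              (and t2 (list '-> t1 t2))))
           ((app)
            (let ((t0 (type-of H (cadr e)))
                  (t1 (type-of H (caddr e))))
              (and (pair? t0) (eq? (car t0) '->)
                   (equal? (cadr t0) t1)
                   (caddr t0))))
           ((Y)
            (let ((t0 (type-of H (cadr e))))
              (and (pair? t0) (eq? (car t0) '->)
                   (equal? (cadr t0) (caddr t0))
                   (cadr t0))))
           (else #f)))
        (else #f)))

    ;; Example: (type-of '() '(lambda (x num) (succ x)))  =>  (-> num num)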
For example, Gunter doesn't have c_k for every k; he just has 0, succ(0), succ(succ(0)), etc. Standard definition of free variables. An expression e is closed iff it has no free variables. Definition: A program is a closed term of ground type. <> ================================================================ 1-3. Operational Semantics (Small-Step) ================================================================ Reduction rules succ(c_m) -> c_{m+1} pred(c_{m+1}) -> c_m (m >= 0. Note: (-1 c_0) "sticks"). zero? c_0 -> true zero? c_n -> false (n != 0) if true e1 e2 -> e1 if false e1 e2 -> e2 (\x.e1) e2 -> e1[e2/x] (beta-reduction) Y(e1) e2 -> (e1 Y(e1)) e2 [Red fix] Example: consider double = Y(M) where M = \f:num->num.\x:num.if zero?(x) 0 (succ(succ(f(pred x))))) We'll write 2 for c_2, etc. So the reduction rule [Red Fix] says (double e) = (Y(M) e) -> ((M Y(M)) e) -> ((M double) e) -> if zero?(e) then 0 else succ(succ(double(pred e))) So we have: (double 2) -> (M[double/f] 2) -> if zero?(2) then 0 else succ(succ(double(pred 2))) -> succ(succ(double(pred 2))) -> succ(succ(double 1)) -> succ(succ(if zero?(1) then 0 else succ(succ(double(pred 1))))) -> succ(succ(succ(succ(double(pred 1))))) -> succ(succ(succ(succ(double 0)))) -> succ(succ(succ(succ(if zero?(0) then 0 else succ(succ(double(pred 0))))))) -> succ(succ(succ(succ(0)))) -> (succ(succ(succ(1)))) -> (succ(succ(2))) -> (succ(3)) -> 4 ;; whew!! <> We had lots of choices about which reductions to apply. If we had done it differently: Could we have gotten a different answer? Could we have gotten an infinite reduction sequence? This tells us the next thing we need to do is to specify the contexts in which we can apply these reductions. We'll make some design choices here, but the framework is very general. The setup is due to Felleisen. Define: Redexes l ::= the left-hand sides of the reduction rules Reduction Contexts K ::= [] | succ(K) | pred(K) | zero?(K) | if K then e else e | (K e) | (v K) | Y(K) Values v ::= c_n | \x.e | Y(v) Stuck terms s ::= pred(0) [only 1 stuck term in this language] Would-be Redexes: q ::= l | s Reduction contexts tell us where in the term we can apply a reduction rule. (People often use E to range over reduction contexts). Values are terms that are "done". Stuck terms are terms with "run-time errors"-- ie those that are not detected by the type-checker. Now we can define the reduction semantics: if (L, R) is a reduction rule, then K[L] -> K[R]. To be completely rigorous about this, we need to define plugging a term into a context -[-]. This is done by induction on the construction of contexts K: [][e] = e succ(K)[e] = succ(K[e]) etc! Notice that in our language, reduction contexts don't bind, so K[e] is the same as K[e/[]] . We will likely come back to this later. <> Note: this is a _substitution semantics_, not an environment semantics. When we restrict the reduction contexts, we'll also restrict some of the reduction rules so that they don't fire "prematurely". So instead of (\x.e1) e2 -> e1[e2/x] (beta) we'll have: ((\x.e) v) -> e[v/x] (beta-value, \beta_v) and instead of Y(e1) e2 -> (e1 Y(e1)) e2 [rec] we'll have Y(v1) v2 -> (v1 Y(v1)) v2 [rec-val] {Note that Y(v1) is automatically a value} Note: this is a call-by-value (or "eager") semantics, because it requires that we reduce the argument to a function to a value before we can substitute it in the function. In "call-by-name" or "lazy" evaluation, we don't require this. We require instead that the argument be substituted BEFORE evaluation. 
What are the minimal changes to get to this? //work out in class// Note: in reduction rules, we (almost) always consider only closed terms. Substitution is subtle when the substituend can be open: x[e/x] = e y[e/x] = y (y != x) (e1 e2)[e/x] = (e1[e/x] e2[e/x]) (\x.e1)[e/x] = (\x.e1) [x is never free in \x.e1 !] (\y.e1)[e/x] = \z.((e1(z/y))[e/x]), z fresh -- need to worry abt variable capture if x in fv(e1) and y in fv(e). -- e1(z/y) means switch ALL occurrences of z and y, whether free, binding, or bound. If z is truly fresh, there won't be any occurrences of z in e1. -- when e is closed, no need to rename. For all this to make sense, need two basic facts: Thm ("Progress" aka "Unique Decomposition": If e is a closed term and |-e:t for some t, then exactly one of the following holds: (a) e is a value v, or (b) there exists a unique reduction context K and redex L such that e = K[L], or (c) there exists a unique reduction context K and stuck term S such that e = K[S] (In this case we also say e is stuck). Proof: By induction on the structure of e. Cases: (1) e is a constant. Then it is a value, and it can't be a K[e'] for any e', since those terms are never constants. (2) e is a variable. But this contradicts e is closed. (3) e is an abstraction. Then it is a value, and it can't be either of the other two cases. (4) e is an application (e1 e2). Then e can't be a value, so we have to show that it fits exactly one of the two other possibilities. Apply the IH to e1 and proceed by cases: 4.1) e1 is a value v1 (option a) In that case, let's look at the type rules. If (e1 e2):t, then by [T-app] there must be a t2 such that e1:(t2->t) and e2:t2. The c_n all have type num, so this means that e1 must either be a lambda-abstraction or a Y-value. Now let's look at e2. By IH, apply cases to e2. 4.1.1) e2 is a value v2. Then e is either ((\x:t.e1') v2) or (Y(e1'') v2). In either case e = (v1 v2) is a redex, so set K = [] and L = e = (v1 v2). 4.1.2) e2 is of the form K[L]. Then e = (v1 K[L]). So then e = K'[L], where K' = (v1 K). Could e = (e1 e2) be an instance of option c? If it were, then e2 would have to be of the form K[S], but it isn't, since the induction hypothesis tells us the e2 is exactly one of (a), (b), or (c), and we've assumed it's an (a). 4.2) e1 is of the form K[L]. Then e = (e1 e2) = (K[L] e2) = K'[L], where K' = (K e2). And this is the only possibility, since e1 is not a value, nor is it stuck (same argument as above). 4.3) e1 is K[S]. Similar to 4.2. 5) e is one of the other syntactic forms. These are all similar to either case (3) or case (4). This theorem says that well-typed terms don't stick (unless they run across one of the well-documented "non-type" errors). ==== Exercise 2: a) Write a Scheme procedure redex-of : e -> maybe(K, L) which, given a closed expression e, either returns a pair consisting of a reduction context K and a redex L such that e = K[L], or #f if no such pair exists. Represent K however you want. b) Write a Scheme procedure reduce : e -> maybe(e) which, given a closed expression e, returns an expression e' such that e -> e' (as defined above), or #f if no such expression exists. ==== <> Thm (Subject Reduction, aka Safety, aka "Reduction Preserves Well-Typedness"): If e is a closed term and e:t, and e -> e', then e' is a closed term, and e':t. Proof: Must show this holds for individual reduction rules L->R. Then it works on terms K[L] -> K[R], by induction on K. 
Note that for the case of beta-reduction, we need a lemma to say that if
e:t under the assumption x:s, and e':s, then e[e'/x]:t.
(Substitution Lemma)

This tells us that the reduction of a closed well-typed term e:t looks
like:

  e1 = K1[L1] -> K1[R1] = e2 = K2[L2] -> K2[R2] = e3 = K3[L3] -> ...

By subject reduction, e_i : t, so the progress theorem applies.  Hence
there are exactly 3 possibilities:

  1. The reduction continues indefinitely.
  2. The reduction reaches a value v of type t.
  3. The reduction reaches a stuck term K[S].

================================================================
Operational Semantics (Big-Step)
================================================================

What does a reduction look like?  Key fact: once you get into a
reduction context, you stay there until you are done, eg

  (e1 e2) -> (e11 e2) -> (e12 e2) -> ... -> (v e2).

Then we have to work on e2 (the next redex lies inside e2):

  (v e2) -> (v e21) -> (v e22) -> ... -> (v w)

and then what happens next depends on v and w.

We can build this search strategy into a single relation that goes
directly from e to v.  We'll have a bunch of rules, one for each
possibility of v and w.

e => v   "e evaluates to v":
===========================

  ------
  v => v

  e1 => \x.e    e2 => v2    e[v2/x] => w
  --------------------------------------
             (e1 e2) => w

  e1 => (fix v1)    e2 => v2    v1 (fix v1) v2 => w
  -------------------------------------------------
                  (e1 e2) => w

  e1 => c_n
  -------------------
  succ(e1) => c_{n+1}

  etc.

  e1 => c_0
  -----------------
  zero?(e1) => true

  e1 => c_{n+1}
  ------------------
  zero?(e1) => false

  e0 => true    e1 => w
  --------------------------
  if e0 then e1 else e2 => w

  e0 => false    e2 => w
  --------------------------
  if e0 then e1 else e2 => w

================

Would like to prove: if e is a closed, well-typed term, then for any v

  (e => v)  iff  (e ->* v)

[Notice that => talks only about successful computations.  It doesn't
distinguish an infinite loop from a stuck term!]

Forward direction is easy; straightforward induction on definition of
e => v.  //work out in class//

Reverse takes more effort.

%% The next section is bogus?
%% We'll need some lemmas that summarize the discussion above
%% Definition: K' extends K iff t
%% is the smallest relation on reduction
%% contexts with the following properties:
%% K extends []
%% if K' extends K, then succ(K') extends succ(K)
%% succ(K) extends K
%% (if K then e1 else e2) extends K
%% (K e) extends K
%% (v K) extends K
%% etc.
%% If K1 extends K2 and K2 extends K3, then K1 extends K3.

We'll state what follows for PCF, but it holds for any language set up
with redexes and reduction contexts that satisfies Unique Decomposition
(ie Progress).

<>

Lemma 1 (Locality): If K[e] ->* e' then either
------------------
  (a) There exists a K' such that e' is K[K'[q]]
      (remember q ranges over would-be redexes), or
  (b) there exists a v such that e ->* v and K[v] ->* e'

Proof:  We'll do this by induction on the length of the derivation
K[e] ->* e'.

// Actually, all we need is:
//   (a) There exists an e'' such that e ->* e'' and K[e''] ->* e'

Base case:  If e' is K[e], then by progress, either e is of the form
K'[q] or it is a value v.  Either way, the conditions of the theorem are
satisfied.

Otherwise:  Apply Progress to e.

  1) if e is a value v, then v is our value for case (b).

  2) if e = K'[L], then we have e -> e'' for some e'', and therefore
     K[e] -> K[e''] ->* e', where the second derivation is shorter than
     the first.
Applying the induction hypothesis to K[e''] ->* e' we have two possibilities: 2.1) e' is K[K'[q]] for some K' and q, in which case we've got what we want or 2.2) there is some value v such that e'' ->* v and K[v] ->* e'. So we have e -> e'' ->* v and K[v] ->* e', as desired. Puzzle: what are the weakest assumptions under which this theorem holds? Lemma 2 (interpolation lemma): --------------------------------- If (e1 e2) ->* v, then there exist v1 and v2 such that e1 ->* v1, e2 ->* v2, and (v1 v2) ->* v where the first two reduction sequences are shorter than the original and the third is no longer than the original //* need to fix proof to match *// Proof. ----- If e = (e1 e2) ->* v, then either e = (K[l] e2) or e = (v1 e2). Case 1: e = (v1 e2). Then v1 ->* v1 with measure 0. If (v1 e2) ->* v, then we have K[e2] ->* v, where K = (v1 []). Applying the preceding lemma, we conclude that there must be some v2 such that e2 ->* v, and this derivation has a smaller measure than that of (v1 e2) ->* v. Case 2: e = (e1 e2) and e1 is not a value. Then we have e = K[e1], where K = ([] e2). By the lemma, there must exist a v1 such that e1 ->* v1, and this reduction is smaller than e ->* v. Now we are in a position to prove the other half of our theorem: Theorem: If (e ->* v), then (e => v). ======= Proof: By induction on the measure of the reduction e ->* v. If the reduction of size 0, it must be that e = v. Hence e=>v by the rule ------ v => v Otherwise, assume wlog that e = (e1 e2). Hence by the interpolation lemma, there must be v1, v2 such that e1->*v1 and e2->*v2 by smaller reductions. Hence, by the IH, we have e1 => v1 and e2 => v2. As noted above, v1 can't be a constant, otherwise the term would be stuck. So v1 must either be a lambda-abstraction or a Y-value. If v1 a lambda-abstraction (\x:t2.e'), we have (v1 v2) -> e'[v2/x] ->* v The interpolation lemma tells us that the reduction (v1 v2) ->* v is no longer than the original. Therefore the reduction e'[v2/x] ->* v is smaller than the original reduction, so we can apply the IH to deduce that e'[v2/x] => w. Hence e => v by the rule e1 => \x.e' e2 => v2 e'[v2/x] => w ----------------- (e1 e2) => w If it's a Y-value Y(v'), Corollary 2 tells us that (v' Y(v') v2) ->* v by a smaller derivation, so by the IH, (v' Y(v') v2) => v. Hence e=>v by the rule e1 => (fix v') e2 => v2 v' (fix v') v2 => w ---------------- (e1 e2) => w So now we have our first real theorem: Theorem (Equivalence of Big and Small-Step Semantics): ===================================================== (e ->* v) iff (e => v) But of course remember this only tells us about terminating terms. ================================================================ Applications of Operational Semantics ================================================================ Let's consider our old friend Double from lecture 1, which we'll abbreviate as D = Y(M) where M = (lambda (f : int -> int) (lambda (x : int) (if (Z? x) 0 (+1 (+1 (f (-1 x))))))) Proposition: for all n, (D c_n) ->* c_2n. Proof: By induction on n. Base step: Let n=0 (D c_0) ->* c_0 (easy!) Induction Step: (n > 0) (D c_n) = YM c_n -> M Y(M) c_n ->* if (zero? c_n) c_0 (+1 (+1 (YM (-1 c_n)))) ->* if false c_0 (+1 (+1 (YM (-1 c_n)))) -> (+1 (+1 (YM (-1 c_n)))) -> (+1 (+1 (YM c_{n-1}))) -> (+1 (+1 (M YM c_{n-1}))) = (+1 (+1 (D c_{n-1}))) ->* (+1 (+1 (c_2{n-1}))) (**) ->* c_2n The reduction marked (**) is justified as follows: (+1 (+1 (D c_{n-1}))) is K[(D c_{n-1})] for the obvious reduction context K. 
By the induction hypothesis, (D c_{n-1}) ->* c_{2(n-1)}.

//Easier in big-step??  Let's try it in class//

================
Problem Set 3
================

Exercise: Consider the terms

  P1 = Y(\f:int->int->int.\x:int.\y:int.
          if (zero? x) y (+1 (f (-1 x) y)))

  P2 = Y(\f:int->int->int.\x:int.\y:int.
          if (zero? x) y (f (-1 x) (+1 y)))

Prove that for any n and m

  (P1 c_n c_m) ->* c_{n+m}   and   (P2 c_n c_m) ->* c_{n+m}

Exercise (McCarthy's 91-function):  For this exercise, let's extend our
language by adding constants c_k for all integers k (not just the
non-negative integers), and by adding operators for addition and
comparison that do the obvious things.  Now consider the term

  P = Y(\f.\x.if (> x 100) (- x 10) (f (f (+ x 11))))

Prove: For any integer n, (P c_n) ->* c_k, where

  (a) if n > 100, then k = n-10
  (b) otherwise, k = 91

================================================================
Context Machines (Felleisen's CK machines).
================================================================

In the small-step semantics, we don't want to be constantly decomposing
and reassembling terms from scratch.  Instead we want to use a
representation for contexts E that helps.

Trick: consider _composition_ of contexts: E1 o E2 = E1[E2].  Then
composition is associative:

  (E1 o E2) o E3 = E1 o (E2 o E3) = E1[E2[E3]]

Furthermore, [] is a left and right identity for o.  So we have a
monoid!

So instead of writing

  E ::= [] | (E e) | (op E)

we can write

  E ::= [] | ([] e) o E | (op []) o E     (outside-in form)

or

  E ::= [] | E o ([] e) | E o (op [])     (inside-out form)

These define the same set of contexts, namely E ::= F* where

  F ::= ([] e) | (op [])

and concatenation denotes composition.

<>

//Hmm, we lied here: we never defined E1[E2] !!  Fix this in class//

We'll use this trick to represent terms in a way that facilitates
searching: We'll represent E[e] as a pair <e, E>.  This representation
captures the idea that we are _focussing_ on the e in E[e].

Final states:

  <v, []>    successful termination
  <s, E>     stuck term

Reduction rule:

  <L, E> -> <R, E>     L -> R a reduction rule

Focus-down rules (one per F):

  <(e1 e2), E> -> <e1, E o ([] e2)>    if e1 is not a value
  <(op e1), E> -> <e1, E o (op [])>    if e1 is not a value

Focus-up rules (one per F):

  <v, E o ([] e2)> -> <(v e2), E>
  <v, E o (op [])> -> <(op v), E>

================================================================
Generalizing the CK machine
================================================================

We'd like to generalize the construction of the CK machine to any
small-step semantics specified using evaluation contexts.  For this,
Unique Decomposition is not quite enough (I think).  To make it work
smoothly, you need the additional property:

Theorem (Strong Decomposition).  For each reduction context K, there
exists a unique sequence of frames F1,..,Fn such that

  K = F1 o F2 o ... o Fn

Unique Decomposition follows from Strong Decomposition by induction on
the size of e (assuming that values, stuck terms, and redexes are
disjoint).  I believe the converse is false, but I haven't been able to
come up with a truly convincing counter-example.  (Brownie points to
anybody who can do this.)

If we have Strong Decomposition, then we can give the rules for the
context machine as:

  Reduction rule:   <L, E> -> <R, E>        if L -> R is a reduction rule

  Focus-up rule:    <v, E o F> -> <F[v], E>     if v is a value

  Focus-down rule:  <F[e], E> -> <e, E o F>     if F[e] is not a value
                                                and e is not a value.

(A Scheme sketch of such a machine appears below.)
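Here is a sketch of such a machine for our call-by-value PCF, written in
Scheme with the same tagged-list term representation as the type-of
sketch above.  The concrete choices (contexts as lists of frames with the
innermost frame first, so that consing a frame plays the role of E o F;
the particular frame tags) are mine, not fixed by the notes; variable
names are assumed not to collide with the tag symbols.

    ;; Frames F:  (arg e) for ([] e),  (fun v) for (v []),
    ;;            (succ) (pred) (zero?) (Y) for (op []),
    ;;            (if e1 e2) for (if [] e1 e2).
    ;; A machine state is (list e E); E is a list of frames, innermost first.

    (define (value? e)
      (or (eq? e 'true) (eq? e 'false)
          (and (pair? e) (memq (car e) '(const lambda)))
          (and (pair? e) (eq? (car e) 'Y) (value? (cadr e)))))

    (define (const? e) (and (pair? e) (eq? (car e) 'const)))

    ;; Substitution; safe without renaming because we only ever
    ;; substitute closed values (this is the beta-value rule).
    (define (subst e v x)
      (cond ((eq? e x) v)
            ((and (pair? e) (eq? (car e) 'lambda))
             (if (eq? (car (cadr e)) x)
                 e
                 (list 'lambda (cadr e) (subst (caddr e) v x))))
            ((pair? e) (map (lambda (e1) (subst e1 v x)) e))
            (else e)))

    ;; plug: the term F[v]
    (define (plug F v)
      (case (car F)
        ((arg) (list 'app v (cadr F)))
        ((fun) (list 'app (cadr F) v))
        ((succ pred zero? Y) (list (car F) v))
        ((if)  (list 'if v (cadr F) (caddr F)))))

    ;; contract: if e is a redex L, return its contractum R, else #f.
    (define (contract e)
      (and (pair? e)
           (case (car e)
             ((succ)  (and (const? (cadr e))
                           (list 'const (+ (cadr (cadr e)) 1))))
             ((pred)  (and (const? (cadr e)) (> (cadr (cadr e)) 0)
                           (list 'const (- (cadr (cadr e)) 1))))
             ((zero?) (and (const? (cadr e))
                           (if (= (cadr (cadr e)) 0) 'true 'false)))
             ((if)    (cond ((eq? (cadr e) 'true)  (caddr e))
                            ((eq? (cadr e) 'false) (cadddr e))
                            (else #f)))
             ((app)   (let ((f (cadr e)) (a (caddr e)))
                        (and (value? f) (value? a)
                             (cond ((and (pair? f) (eq? (car f) 'lambda))
                                    (subst (caddr f) a (car (cadr f))))
                                   ((and (pair? f) (eq? (car f) 'Y))
                                    (list 'app (list 'app (cadr f) f) a))
                                   (else #f)))))
             (else #f))))

    ;; focus-down: if e = F[e'] with e' not a value, return (list e' F), else #f.
    (define (focus-down e)
      (and (pair? e)
           (case (car e)
             ((succ pred zero? Y)
              (and (not (value? (cadr e))) (list (cadr e) (list (car e)))))
             ((if)  (and (not (value? (cadr e)))
                         (list (cadr e) (list 'if (caddr e) (cadddr e)))))
             ((app) (cond ((not (value? (cadr e)))
                           (list (cadr e) (list 'arg (caddr e))))
                          ((not (value? (caddr e)))
                           (list (caddr e) (list 'fun (cadr e))))
                          (else #f)))
             (else #f))))

    ;; One transition of the machine; #f means the state is final or stuck.
    (define (ck-step state)
      (let ((e (car state)) (E (cadr state)))
        (cond
          ((and (value? e) (pair? E))                      ; focus-up
           (list (plug (car E) e) (cdr E)))
          ((contract e) => (lambda (r) (list r E)))        ; reduction
          ((focus-down e) => (lambda (p)                   ; focus-down
                               (list (car p) (cons (cadr p) E))))
          (else #f))))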
================================================================
1-6: CK machine vs. small-step semantics
================================================================

We'd like to show that e ->* v iff <e, []> ->* <v, []>.

Of course, we'll need a stronger induction hypothesis, since the context
machine involves non-empty contexts!

Theorem: For any small-step semantics that satisfies Strong
Decomposition:

  (i)   <e, E> ->* <v, []>   iff  E[e] ->* v
  (ii)  <e, E> ->* <s, E'>   iff  E[e] ->* E'[s]
  (iii) <e, E> diverges      iff  E[e] diverges

{Note: we also assume that E = F* as above}

Lemma 1 (Safety): If <e, E> -> <e', E'>, then either E[e] = E'[e'] or
E[e] -> E'[e'] (in one step!).

Proof.  By inspection of the CK rules.  The refocus rules leave E[e]
unchanged, and the reduction rule corresponds to a single small-step
reduction.

Note: Lemma 1 is enough to establish the => part of the theorem.

The next lemma says that if our focus is above a redex or a stuck term,
the CK machine will find it.  Remember q ::= l | s .

Lemma 2 (Refocus-Down): <E1[q], E2> ->* <q, E2 o E1> .

Proof: By induction on the size of E1, using the top-down grammar.
(Homework 3)

The next lemma describes what happens when we are focussed on a value.
It says that if we are focussed on a value, then if there is a next
redex or stuck term, the CK machine will find it, and if there is no
next redex, then the machine will pop all the contexts.

Lemma 3 (Refocus-Up):

  (a) If E[v] = E'[q], then <v, E> ->* <q, E'>
  (b) If E[v] is a value, then <v, E> ->* <E[v], []>

Proof: We prove each of these by induction on E, using the bottom-up
grammar.

(1) E = [].  Trivial.

(2) E = E'' o F.  Then the focus-up rule says <v, E'' o F> -> <F[v], E''>.
    Unique decomposition tells us that there are 3 possibilities for
    F[v]:

    (2.1) If F[v] is a value, then apply the induction hypothesis to E'',
          which is smaller than E.

    (2.2) If F[v] = E_3[q], then apply Refocus-Down.
          {This is also impossible for (b): If F[v] = E_3[q], then
          (E''oF)[v] = (E''oE_3)[q], which would contradict unique
          decomposition}

Now we are ready to prove the theorem:

Theorem:
  (i)   <e, E> ->* <v, []>   iff  E[e] ->* v
  (ii)  <e, E> ->* <s, E'>   iff  E[e] ->* E'[s]
  (iii) <e, E> diverges      iff  E[e] diverges

Proof: As noted above, the => direction for (i) and (ii) follows
immediately from Lemma 1.  For the reverse direction, we'll use the
following induction hypothesis:

  IH(k): If E[e] ->* v in < k steps, then <e, E> ->* <v, []>, and
         if E[e] ->* E'[s] in < k steps, then <e, E> ->* <s, E'>

{Note the use of "< k", which makes the base step trivial, and
simplifies the induction step}.

Once we've proved (i) and (ii), (iii) follows immediately.

Let r ::= v | E[s] .  These correspond to the two possible ways in which
the small-step semantics can halt.  Let Rep(r) be defined as

  Rep(r) = <v, []>   if r = v
         = <s, E>    if r = E[s]

Then the induction hypothesis can be rephrased as:

  IH(k): If E[e] ->* r in < k steps, then <e, E> ->* Rep(r) .

We'll need a lemma:

Lemma 4: If IH(k) holds and E[q] ->* r in < k+1 steps, then
<q, E> ->* Rep(r).

Proof: If q is s, then r is E[s], so <q, E> is already Rep(r).  If q is
a redex L, then E[L] -> E[R] ->* r in < k steps, so
<L, E> -> <R, E> ->* Rep(r) by IH(k).  QED Lemma 4.

Now we can proceed to the induction:

Base step (k = 0): Vacuously true: you can't have a reduction with fewer
than 0 steps.

Induction step: Assume IH(k), and E[e] ->* r in fewer than k+1 steps.
We proceed by cases on the unique decomposition of e.

(1) e is a value.  We proceed by cases on the unique decomposition of
    E[e].

    (1.1) E[e] is a value.  Then <e, E> ->* <E[e], []> by Lemma 3b
          (and r must be E[e] itself, so this is Rep(r)).

    (1.2) E[e] = E1[q].  Then

            <e, E> ->* <q, E1>    by Lemma 3a
                   ->* Rep(r)     by Lemma 4.

(2) e is E1[q].  Then E[e] = E[E1[q]], and

      <e, E> = <E1[q], E> ->* <q, E o E1>    by Lemma 2
                          ->* Rep(r)         by Lemma 4.

QED Theorem.

The technique of writing IH(k) as (for all k' < k).P(k') is called
"course-of-values" or "strong" induction, and it's very handy.
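As a companion to the machine sketched above, here is a hypothetical
driver that just iterates ck-step from the initial state <e, []>.  The
fuel argument is only there so that a diverging program stops the loop in
tests; it is not part of the semantics.

    (define (ck-eval e fuel)
      (let loop ((state (list e '())) (n fuel))
        (cond ((= n 0) 'out-of-fuel)
              ((ck-step state) => (lambda (s) (loop s (- n 1))))
              ((and (value? (car state)) (null? (cadr state)))
               (list 'value (car state)))
              (else (list 'stuck state)))))

    ;; Example: the term double = Y(M) from earlier, applied to c_2:
    ;; (ck-eval
    ;;   '(app (Y (lambda (f (-> num num))
    ;;              (lambda (x num)
    ;;                (if (zero? x)
    ;;                    (const 0)
    ;;                    (succ (succ (app f (pred x))))))))
    ;;         (const 2))
    ;;   1000)
    ;; =>  (value (const 4))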
<>

So now we have an equivalence between the small-step, big-step, and CK
machines:

Theorem:
=======

The following are equivalent:

  (1) e ->* v
  (2) e => v
  (3) <e, []> ->* <v, []>
  (4) for any K, <e, K> ->* <v, K>

================
From Small-Step Semantics to Big-Step Semantics
================

<>

If our system satisfies Strong Decomposition, then we can define a
big-step semantics by the following rules:

  (1) for each value v,

        v => v

  (2) for each F,

        e => v    F[v] => w
        -------------------
             F[e] => w

  (3) for each elementary reduction L -> R,

        R => w
        -------
        L => w

Notice that this is not quite the same as our original big-step
semantics.  For example, our original big-step semantics has a bunch of
rules that look like

  e1 => \x.e    e[e2/x] => v
  --------------------------
        (e1 e2) => v

  e1 => fix(e')    ((e' fix(e')) e2) => v
  ---------------------------------------
              (e1 e2) => v

(one for each form of the value of e1) whereas the new one has a single
rule

  e1 => w    (w e2) => v
  ----------------------
       (e1 e2) => v

and a bunch of rules

     e1[e2/x] => w
  -------------------
  ((\x.e1) e2) => w

  (e (fix e)) => w
  ----------------
   (fix e) => w

These systems "clearly" define the same relation => .  In the theorem,
we'll use the new one.

Theorem:
  (i)  if e => v, then for any E, <e, E> ->* <v, E>.
  (ii) for any E, if <e, E> ->* <v, E>, then e => v.

Notice the difference in quantification between (i) and (ii).
(i) asserts that e => v implies <e, E> ->* <v, E> for ANY E.
(ii) asserts that if <e, E> ->* <v, E> for SOME E, then e => v.
Hence the following statements are equivalent:

  (a) e => v
  (b) for some E, <e, E> ->* <v, E>
  (c) for any E, <e, E> ->* <v, E>

Proof of Theorem:

(i) by induction on the definition of =>, using Lemma 1 where necessary.
Note that the decomposition rule works for F[e] even when e is a value.

(ii) We proceed by induction on the number of steps in
<e, E> ->* <v, E>.  As before, IH(k) is

  If E is a context and <e, E> ->* <v, E> in < k steps, then e => v.

Base step is vacuously true (no derivations of < 0 steps).

Proceed by cases on e:

If e is a value v (k=0), then v => v by defn of =>.

If e is a redex L, then <L, E> -> <R, E> ->* <v, E>.  By IH on the cdr
of the derivation, R => v.  Therefore L => v by the rule

  R => w
  -------
  L => w

If e is of the form F[e'] where e' is not a value, then we have

  <F[e'], E> -> <e', E o F> ->* <v, E>

By Lemma 2, there must exist some w such that <e', E o F> ->* <w, E o F>.
So the entire reduction sequence looks like

  <F[e'], E> -> <e', E o F> ->* <w, E o F> -> <F[w], E> ->* <v, E>

Applying the induction hypothesis to the two subsequences, we conclude
that e' => w and F[w] => v.  Hence F[e'] => v by the defn of =>.

================================================================

<>

1.9 Another example.

For this section, let's remove Y from the language (!).

We'll do one more example because it emphasizes the types (which have
been largely absent from our discussions of the CK machine, etc.), and
because it introduces an important technique (called Logical Relations)
that we will use again later.

Theorem: If e is a closed term of type int or bool, then there is a v
such that e ->* v.

As before, the question is how to generalize this proposition so we get
an induction hypothesis that goes through.  The theorem needs to be
generalized in two ways:

  1. It talks about terms of base type only, and we'll surely need to
     reason about terms of other types.

  2. It talks about closed terms, and we'll surely need to think about
     open terms as well.

There's a general recipe for this, which is called the technique of
"logical relations".  We start with some property N_b of closed terms of
each base type b.  In our case, N_b(e) will be the property that e is a
closed term of type b, and e ->* v for some v (ie "e converges").

We generalize in two steps:  1.
First we generalize to terms of higher type by defining: N_{t->u}(e) iff for all closed e':t, if N_t(e'), then N_u((e e')). That is, N_u holds on the application (e e'). This gives a definition of N on any closed term of any type t. It says that a term of function type is good iff whenever you apply it to a good term, the resulting application is good. This makes sense because the application is of smaller type than the original. Think of closed terms that satisfy N as being "good". Among other things, this means that if t = u1 -> u2 -> .. -> un -> b, then a closed term e is good iff (e e1 e2 ... en) converges for every good closed e1,...,en of types u1,..,un. 2. Then we generalize to open terms as follows: If e:t and fv(e) \subset {x1:t1 ,..., xn:tn}. Then define a good substitution for e to be any substitution [u1/x1,...u_n/xn] where each u_i is a good closed term of type t_i. Then we say an open term e is good iff for any good substitution s, e.s is a good closed term. (ie, every good instance of e is good). Now we have a definition of N that works for any well-typed term e. In this case we say N is a "logical relation". Now we'll prove that all well-typed terms are good, that is for all terms e:t (open or closed), e satisfies N_t. We do this by induction on the construction of the derivation of e:t. Theorem: If e is closed term of type int or bool, then there is a v such that e ->* v. Let N be defined as above. We show that if e:t, then N_t(e). We proceed by induction on the derivation of e:t. 1. e is a variable x:t. Then any good substitution for x is a closed term satisfying N_t (by definition of "good substitution"). So x itself satisfies N_t. 2. e is an application (e1 e2), where e1:u->t and e2:u . By induction, we can assume that e1 satisfies N_{u->t} and e2 satisfies N_u. To show that (e1 e2) satisfies N_t, we must show that any good instance of (e1 e2) satisfies N_t. So let s be an N-closing substitution for e. Then s is also an N-closing substitution for e1 and e2 as well. Since e1 satisfies N_{u->t}, so does e1.s (and it's closed!), and similarly e2.s is a closed term that satisfies N_u. Therefore, by definition of N_{u->t}, so does (e1.s e2.s) satisfies N_t. And of course (e1.s e2.s) = (e1 e2).s. We've shown that any N-closing instance of (e1 e2) satisfies N_t. Hence (e1 e2) satisfies N_t. 3. e is an abstraction (\x:t1.e'), so t = t1 -> u and e':u. As before, let s be an N-closing substitution for e. We must show that (\x:t1.e').s satisfies N_{t1->u}. To do this, we must show that ((\x:t1.e').s e1) satisfies N_u for any closed e1 satisfying N_t1. How do we do that? Let u = t2 -> t3 -> ... b . By the remark above, we must show that for any e2:t2,..en:tn that are all closed and all satisfy N, (((\x:t1.e').s e1) e2 ... en) converges. (*) Well, (((\x:t1.e').s e1) e2 ... en) -> (e'.s[e1/x] e2 ... en) But wait! If s is an N-closing substitution for \x.e, and e1 satisfies N_t1, then s[e1/x] is an N-closing substitution for e':u. BY THE INDUCTION HYPOTHESIS, e' satisfies N_u. Hence e'.(s[e1/x]) satisfies N_u. Hence for any N-closed e1,...,en of appropriate types, (e'.(s[e1/x]) e2 ...en) converges. Since the term in (*) takes one step and gets to this converging term, the term in (*) converges also, as desired (!!) 4. Now all we have to do is deal with the constants. c_k : int -- converges zero? : int -> bool. Let e be an N-closed term of type int. Then (zero? e) converges, because (by defn of N_int) e converges to some c_k. Hence (zero? e) ->* (zero? c_k) -> {true | false}. 
+1 and -1 are similar.  [WHAT ABOUT (-1 c_0) ??]

if -- must show that if e0:bool, e1:int, e2:int are all N-closed terms,
then (if e0 e1 e2) converges.  Then e0 converges by defn of N_bool, so

  (if e0 e1 e2) ->* (if {t|f} e1 e2) -> e1 | e2 ,

either of which converges by defn of N_int.

This is a very common pattern of proof.

Note that Y breaks this.  Of course it makes the theorem false, but
let's see how the proof breaks (always instructive!)

  Y : (t -> t) -> t

As before, let t = t1 -> t2 -> t3 -> ... -> tn -> b.  We must show that
for any N-closed e1,..,en of appropriate types, and any N-closed term e
of type t -> t,

  (Y(e) e1 e2 ... en) converges.

But looking at the reduction, we get:

  (Y(e) e1 e2 ... en) -> (e Y(e) e1 e2 ... en)

e is N-closed, but that only tells us what happens when e is applied to
an N-closed term.  We don't know whether Y(e) satisfies N, so we're
stuck.

<>

****************
This is as far as I got
****************

2. Denotational Semantics

2.1 Limitations of Operational Semantics

Operational Semantics is good for figuring out what any particular whole
program does, and therefore it is useful for thinking about
implementations, but it is less good for determining properties of
classes of programs.

Consider the following statement:

  - The terms P1 and P2 above are interchangeable in any context.

How might we go about proving such a thing?  If we only had to deal with
reduction contexts, we could probably proceed as we did above.  But we
also have to consider contexts like (M P1) vs (M P2) for complicated
terms M.  We would need to consider "tracking" P1 and P2 throughout
their passage through M.  We'd have to consider generalizations as they
are partially substituted-in, etc, and we'd have (+1 (f (-1 e1) e2)) in
one term, and (f (-1 e1') (+1 e2')) in the other... yuck!

Even simpler, let e be a term of function type not containing x free.
Then e and (\x.(e x)) are interchangeable in any context.  This is true,
but it is quite difficult to show in an operational framework.

What we are missing is a framework for reasoning about program
_fragments_.  What we'd like is a _compositional_ framework, in which we
can say: if e1 and e2 are equivalent program fragments, then C[e1] and
C[e2] are equivalent programs for any context C (not just evaluation
contexts!).

We say a definition is _compositional_ iff the meaning of any expression
depends only on the meaning of its immediate subexpressions.  If meaning
is defined in this way, then if e1 and e2 have the same meaning, then
C[e1] and C[e2] are guaranteed to have the same meaning, since for any
C, the meaning of C[e] depends only on the meaning of e.

We already have such a framework for the notion of "meaning" as "type",
and the notion of equivalence "has the same type".  We got a lot of
mileage out of that.  Our next step is to do this exercise for a finer
notion of equivalence.

================================================================

2.2 A Simple Language: REC

Before doing this in general, let's try a simple language: the language
REC of Winskel chapter 9.

Do: Sec 9.1, 9.2.  Note 9.2 is a big-step semantics.

Note this language is powerful enough to do the examples from PS3:

  e1 = if (zero? x) y (+1 (f (-1 x) y))
vs
  e2 = if (zero? x) y (f (-1 x) (+1 y))

Are these two equivalent in ANY context, ie, can we take

  letrec f(x,y) = e1
         (g = e)
         ...
  in e'

and replace it by

  letrec f(x,y) = e2
         (g = e)
         ...
  in e'

and be guaranteed to get the same behavior, regardless of what's in the
g's, e's, or e'?

Our operational reasoning doesn't help us much with this.  Want a
_denotational_ model.
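For reference, the equivalence we are after can be stated precisely.
This formulation (in terms of arbitrary program contexts and observation
of results at ground type) is the standard one; the notes have so far
only used it informally:

  Definition (contextual equivalence):  e1 ~ e2 iff for every context C
  such that C[e1] and C[e2] are programs (closed terms of ground type),
  and every value v,

      C[e1] ->* v   iff   C[e2] ->* v .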
Roughly speaking, we want a function that maps each term to its meaning
and is compositional.

Need to model inherited attributes: function environments p and value
environments r.  Want a valuation E so that for any e,

  E[e]pr = n   iff   e.r => n

But what about non-termination?  Model this by making the range of
E[e]pr not the integers N but N_bot -- the integers with a special
element bot adjoined.  Then we can get a theorem like the one above,
with the additional property that if e.r fails to terminate, then E[e]pr
is equal to bottom.

So a procedure will have a denotation in [N -> Nbot].  We can think of
this in two ways.

(1) as a partial function, ie a set of ordered pairs (n,m) where for
    each n there is at most one m s.t. (n,m) is in the set.  There is a
    natural order structure on partial functions defined in this way:
    the subset ordering:

      f <= g iff [for all n, if f(n) defined then g(n) is defined and
                  equal to f(n)]

(2) Consider Nbot itself as an ordered structure, with bot <= n for all
    n, and order [N -> Nbot] by

      f <= g iff [for all n, f(n) <= g(n)]

Each way gives an order structure on [N -> Nbot], and these are
equivalent.

If we declare

  f(x,y) = if (zero? x) y (+1 (f (-1 x) y))

then the right-hand side by itself defines a function

  \v0 in [N->Nbot]. \(v1 in N, v2 in N).
     E[[if (zero? x) y (+1 (f (-1 x) y))]] [f = v0] [x = v1, y = v2]

ie, a function [N -> Nbot] -> [N -> Nbot].  Let's call this function F.

If we intend this declaration to be a letrec, we would expect that the
procedure f should be bound to some value v0 such that

  v0 = F(v0)

Of course for some right-hand sides, there may be many solutions to this
equation, eg consider the declaration

  f(x) = f(x)

So we'll need to figure out _which_ solution to adopt.

For this simple language, this can all be worked out in the partial
function semantics (see, eg, Manna, Theory of Computation, 1974).  But
we'll need the whole cpo story soon enough, so we might as well bite the
bullet and do it right the first time through.

================================================================

3. Domain Theory -- Winskel Chapter 8

4. The Operational and Denotational Semantics of REC -- Winskel,
   Chapter 9, using the eager (cbv) version of REC.

================================================================

5. Denotational Semantics of PCF.

We'll do the lazy (cbn) version of PCF first, because that's what we did
the operational semantics for.  (Also, it's a little easier, and it's
what Plotkin did, though we will not use his proof).

5.1 The Standard Model of PCF

First we define the _standard model_ of PCF.  We first give the
semantics of each type:

  [[int]]    = N_bot
  [[bool]]   = B_bot    (the three-point cpo)
  [[t -> u]] = [ [[t]] -> [[u]] ]   (the set of all continuous functions
                                     from [[t]] to [[u]])

For any PCF term e, the set of environments for e, notated Env_e, is the
set of all associations {(x1, v1),..., (xn, vn)} such that
fv(e) \subset {x1,..,xn} and, for each i, if x_i is a variable of type
t_i, then v_i \in [[t_i]].

Note that Env_e has an order given by its product structure:

  r1 <= r2 iff for all x in dom(r1), r1(x) <= r2(x)

To each term e of type t, we assign a continuous function
[[e]] \in [Env_e -> [[t]]] as follows:

  to every constant c of type t, we choose [[c]] \in [[t]].  We'll make
  particular choices of these later, but for now any value in [[t]] will
  do.

  [[x]]r = r(x)

  [[(e1 e2)]]r = ([[e1]]r)([[e2]]r)

  [[\x.e]]r = f, where f is defined by: for all d in [[t]],
              f(d) = [[e]](r[x=d]),
              where x is a variable of type t.
              PROVIDED this f is continuous.

(A toy executable rendering of these equations is sketched below.)
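To make the equations above concrete, here is a toy meta-circular
rendering in Scheme, using the same term representation as the earlier
sketches.  It is only illustrative, and it makes simplifying assumptions
that are mine, not the notes': bottom is represented by divergence of the
Scheme program itself, the call-by-name character of the model is
simulated by passing arguments as thunks, and Y is interpreted by
unwinding rather than by an explicit least-fixed-point construction.

    ;; meaning : term * environment -> number | boolean | procedure
    ;; The environment r is an association list mapping variables to
    ;; thunks (zero-argument procedures), so an argument is evaluated
    ;; only if it is used (call-by-name).
    (define (meaning e r)
      (cond
        ((symbol? e)
         (case e
           ((true) #t)
           ((false) #f)
           (else ((cdr (assq e r))))))            ; [[x]]r = r(x): force the thunk
        ((pair? e)
         (case (car e)
           ((const)  (cadr e))
           ((succ)   (+ (meaning (cadr e) r) 1))
           ((pred)   (- (meaning (cadr e) r) 1))  ; note: (pred c_0) not modeled
           ((zero?)  (= (meaning (cadr e) r) 0))
           ((if)     (if (meaning (cadr e) r)
                         (meaning (caddr e) r)
                         (meaning (cadddr e) r)))
           ;; [[\x.e]]r = \d. [[e]](r[x=d])
           ((lambda) (lambda (thunk)
                       (meaning (caddr e)
                                (cons (cons (car (cadr e)) thunk) r))))
           ;; [[(e1 e2)]]r = ([[e1]]r)([[e2]]r)
           ((app)    ((meaning (cadr e) r)
                      (lambda () (meaning (caddr e) r))))
           ;; [[Y(e)]]r taken as the meaning of (e Y(e)):
           ;; unwinding in place of an explicit least fixed point.
           ((Y)      (meaning (list 'app (cadr e) e) r))))))

    ;; Example: (meaning '(app (lambda (x num) (succ x)) (const 2)) '())  =>  3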
The first thing we have to do is to check that this definition actually
gives us a continuous function for each e.  This is a little trickier
than you might expect:

First you have to check that each recursive call [[e']]r' in the
definition of [[e]] actually gives [[e']] an argument r' that is an
environment for e'.  (That's easy).

Second, in the abstraction case, we have to check both that each f is
continuous in d, and that the whole denotation [[\x.e]] is continuous in
r.

Let's check these.  In each case, let R = r_0 <= r_1 <= r_2 <= ... be an
omega-chain of environments, with limit r*.  We want to show that for
each e,

  [[e]](lub R) = lub {[[e]]r | r in R}

Variables:

  [[x]](lub R) = (lub R)(x) = lub {r(x) | r in R} = lub {[[x]]r | r in R}.

Applications:

  [[(e1 e2)]](lub R)
    = ([[e1]](lub R)) ([[e2]](lub R))
    = (lub_i ([[e1]]r_i)) ([[e2]](lub R))          [IH at e1]
    = lub_i {([[e1]]r_i)(lub_j ([[e2]]r_j))}       [IH at e2]
    = lub_i {lub_j {([[e1]]r_i)([[e2]]r_j)}}       [cont of [[e1]]r_i]
    = lub_i {([[e1]]r_i)([[e2]]r_i)}               by the diamond lemma,
                                                   with v_ij = ([[e1]]r_i)([[e2]]r_j)
    = lub_i {[[(e1 e2)]]r_i}

For the abstraction case, observe the following:  Consider the function

  ext_x : Env(\x.e) * [[t]] -> Env_e   defined by   ext_x (r, d) = r[x=d]

This is easily seen to be continuous in both r and d.  That is, for any
r and d, ext_x(r,_) and ext_x(_,d) are both continuous.

So we have [[\x.e]]r = f where f is defined by f(d) = [[e]](ext_x(r,d)).
So f = [[e]] o ext_x(r, _) is continuous for any particular r, since it
is the composition of continuous functions.

Now let's check that [[\x.e]] is continuous in r:

  [[\x.e]](lub R)
    = [[e]] o ext_x(lub R, _)
    = [[e]] o lub_i ext_x(r_i, _)         ??
    = lub_i ([[e]] o ext_x(r_i, _))       since composition is cont.
    = lub_i {\d.[[e]](ext_x(r_i,d))}

Conclusion: for each e:t, [[e]] is a continuous function from Env(e) to
[[t]].

Now all we need to do is to assign denotations to each of the constants.
But this is easy:

  [[c_n]]  = lift(n)                 \in N_bot = [[int]]
  [[+1]]   = strict(\n. lift(n+1))   \in [[int]] -> [[int]]
  [[true]] = lift(truth)             \in [[bool]]
  [[if]]   = strict(\b. if b then \xy.x else \xy.y)

and last (but not least!-)

  [[Y_t]] = fix \in [ [[t]] -> [[t]] ] -> [[t]] = [[(t->t) -> t]]

5.2 Comparing the operational and denotational semantics of CBN PCF.

We'll use the big-step operational semantics.