2008-01-22 Semantics, Implementing an Evaluator, Introduction to Typed Scheme
========================================================================
A quick note about using `match':

The syntax for `match' is

  (match value
    [pattern result]
    ...)

The value is matched against each pattern in turn, possibly binding
names in the process, and if a pattern matches, `match' evaluates the
corresponding result expression.  The simplest form of a pattern is an
identifier -- it always matches and binds that identifier to the
value:

  (match (list 1 2 3)
    [x x]) ; evaluates to the list

Another simple pattern is a quoted symbol, which matches that symbol.
For example:

  (match foo
    ['x "yes"]
    [else "no"])

will evaluate to "yes" if `foo' is the symbol `x', and to "no"
otherwise.  Note that `else' is not a keyword here -- it happens to be
a pattern that always succeeds, so it behaves like an else clause
except that it binds `else' to the unmatched-so-far value.

Many patterns look like function applications -- but don't confuse
them with applications.  A `(list x y z)' pattern matches a list of
exactly three items and binds the three identifiers; this is
especially useful with nested patterns:

  (match (list 1 2 3)
    [(list x y z) (+ x y z)]) ; evaluates to 6
  (match '((1) (2) 3)
    [(list (list x) (list y) z) (+ x y z)]) ; evaluates to 6

There is also a `cons' pattern that matches a non-empty list and then
matches the first part against the head of the list and the second
part against the tail of the list.

In a `list' pattern, you can use `...' to specify that the previous
pattern is repeated zero or more times, and bound names get bound to
the list of respective matches.  One simple consequence is that the
`(list hd tl ...)' pattern is exactly the same as `(cons hd tl)', but
being able to repeat an arbitrary pattern is very useful:

  (match '((1 2) (3 4) (5 6) (7 8))
    [(list (list x y) ...)
     (append x y)]) ; evaluates to (1 3 5 7 2 4 6 8)

A few more useful patterns:

  _                     -- matches anything, but does not bind
  (or pattern pattern)  -- matches either pattern (careful with bindings)
  (number: n)           -- matches any number and binds it to `n'
  (symbol: s)           -- same for symbols
  (string: s)           -- same for strings

If no pattern matches the value, an error is raised.

========================================================================
>>> Semantics (= Evaluation)

Back to BNF -- now, meaning.

An important feature of these BNF specifications: we can use the
derivations to specify *meaning* (and meaning in our context is
"running" a program (or "interpreting", "compiling", but we will use
"evaluating")).  For example:

  <AE> ::= <num>          ; <AE> evaluates to the number
         | <AE1> + <AE2>  ; <AE> evaluates to the sum of evaluating
                          ;      <AE1> and <AE2>
         | <AE1> - <AE2>  ; ... the subtraction of <AE2> from <AE1>

To do this a little more formally:

  a. eval(<num>)         = <num>
  b. eval(<AE1> + <AE2>) = eval(<AE1>) + eval(<AE2>)
  c. eval(<AE1> - <AE2>) = eval(<AE1>) - eval(<AE2>)

Note the completely different roles of the two "+"s and "-"s: inside
eval(...) they are pieces of input syntax, while on the right-hand
side they are the actual arithmetic operations.

An alternative popular notation for eval(X) is [[X]]:

  a. [[<num>]]         = <num>
  b. [[<AE1> + <AE2>]] = [[<AE1>]] + [[<AE2>]]
  c. [[<AE1> - <AE2>]] = [[<AE1>]] - [[<AE2>]]

Is there a problem with this definition?  Ambiguity:

  eval(1 - 2 + 3) = ?

Depending on the way the expression is parsed, we get either 2 or -4:

  eval(1 - 2 + 3) = eval(1 - 2) + eval(3)           [b]
                  = eval(1) - eval(2) + eval(3)     [c]
                  = 1 - 2 + 3                       [a,a,a]
                  = 2

  eval(1 - 2 + 3) = eval(1) - eval(2 + 3)           [c]
                  = eval(1) - (eval(2) + eval(3))   [b]
                  = 1 - (2 + 3)                     [a,a,a]
                  = -4

Again, be very aware of confusing subtleties that are extremely
important.  Why do we need parens around a sub-expression in only one
of the two derivations?  Because when we write

  eval(1 - 2 + 3) = ... = 1 - 2 + 3

we have two expressions, but one stands for an *input syntax*, and one
stands for a `real' mathematical expression.  The `real' expression
follows the usual convention that "+" and "-" group to the left, so
1 - 2 + 3 on the right already means (1 - 2) + 3, while the other
grouping must be written explicitly as 1 - (2 + 3).  In the case of a
computer implementation, the syntax on the left is (as always) an AE
syntax, and the `real' expression on the right is an expression in
whatever language we use to implement our AE language.
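To make the two derivations concrete, here is a small sketch in Racket, the later name of PLT Scheme, using its built-in `match'.  It applies rules a-c to trees that are *already parsed*.  The tree encoding (a number, or a list `(op left right)') and the name `ae-eval' are hypothetical, chosen only for illustration.  The same input text 1 - 2 + 3 gives two different answers depending on which tree the parser produced.

```racket
#lang racket
;; Rules a-c as a function over already-parsed trees.
;; Hypothetical encoding: a number, or a list (op left right)
;; where op is the symbol + or -.
(define (ae-eval tree)
  (match tree
    [(? number? n) n]                            ; rule a
    [(list '+ l r) (+ (ae-eval l) (ae-eval r))]  ; rule b
    [(list '- l r) (- (ae-eval l) (ae-eval r))]  ; rule c
    [_ (error 'ae-eval "bad tree: ~s" tree)]))

;; The two parse trees of the ambiguous input 1 - 2 + 3:
(define tree-1 '(+ (- 1 2) 3))  ; grouped as (1 - 2) + 3
(define tree-2 '(- 1 (+ 2 3)))  ; grouped as 1 - (2 + 3)

(ae-eval tree-1) ; => 2
(ae-eval tree-2) ; => -4
```

Nothing in `ae-eval' is ambiguous.  The ambiguity lives entirely in the step that chooses a tree, which is why the grammar itself must not allow both derivations.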
As we said earlier, ambiguity is not a real problem until the actual
parse tree matters.  With `eval' it definitely matters, so we must not
make it possible to derive any syntax in multiple ways, or our
evaluation will be non-deterministic.

========================================================================
Quick exercise:

We can define a meaning for <digit>s and then <num>s in a similar way:

  eval(0) = 0
  eval(1) = 1
  eval(2) = 2
  ...
  eval(9) = 9
  eval(<num> <digit>) = 10*eval(<num>) + eval(<digit>)

Is this exactly what we want?  -- Depends on what we actually want...

* An example of free stuff that looks trivial: if we were to define
  the meaning of <num>s this way, would it always work?  Think of an
  average language that does not give you bignums: there the above
  rules fail when the numbers are too big.

========================================================================
>>> Semantics: Implementing an Evaluator

Now we continue to implement the semantics of our syntax -- we express
that through an `eval' function that evaluates an expression.

We use a basic programming principle -- splitting the code into two
layers, one for parsing the input, and one for doing the evaluation.
Doing this avoids the mess we'd get into otherwise, for example:

  (define (eval expr)
    (cond [(number? expr) expr]
          [(and (list? expr) (= 3 (length expr))
                (or (eq? '+ (car expr)) (eq? '- (car expr))))
           (let ([1st (eval (cadr expr))]
                 [2nd (eval (caddr expr))]
                 [op  (if (eq? '+ (car expr)) + -)])
             (op 1st 2nd))]
          [else (error 'eval "bad input!")]))

This is messy because it combines two very different things -- syntax
and semantics -- into a single lump of code.  If we split the code, we
can easily include decisions like making

  {+ 1 {- 3 "a"}}

syntactically invalid.  (Which is not, BTW, what Scheme does...)
(Also, this is like the distinction between XML syntax and well-formed
XML syntax.)
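Here is one possible two-layer version of the single-lump `eval' above, as a sketch in plain Racket rather than the course language.  The struct names `num', `add', `sub' and the function names `parse-sexpr' and `ae-eval' are hypothetical.  The point is that the parser alone rejects {+ 1 {- 3 "a"}}, before any evaluation happens.  Racket's reader treats curly braces like parentheses, so the quoted inputs can use the same braces as the AE examples.

```racket
#lang racket
;; Layer 1 (parse-sexpr): s-expression -> AST; all syntax checks live here.
;; Layer 2 (ae-eval): AST -> number; it can assume a well-formed AST.
(struct num (n))     ; a number leaf
(struct add (l r))   ; {+ l r}
(struct sub (l r))   ; {- l r}

(define (parse-sexpr sexpr)
  (match sexpr
    [(? number? n) (num n)]
    [(list '+ l r) (add (parse-sexpr l) (parse-sexpr r))]
    [(list '- l r) (sub (parse-sexpr l) (parse-sexpr r))]
    [_ (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

(define (ae-eval ast)
  (match ast
    [(num n)   n]
    [(add l r) (+ (ae-eval l) (ae-eval r))]
    [(sub l r) (- (ae-eval l) (ae-eval r))]))

(ae-eval (parse-sexpr '{+ 1 {- 3 2}}))  ; => 2
;; (parse-sexpr '{+ 1 {- 3 "a"}}) raises a parse-sexpr error: the
;; string "a" is rejected as syntax, and ae-eval never runs.
```

Because `ae-eval' only ever sees structs built by `parse-sexpr', it needs no error clause for bad input.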
An additional advantage is that by using two separate components, it
is simple to replace either one, making it possible to change the
input syntax and the semantics independently -- we only need to keep
the same interface data (the AST) and things will work fine.

Our `parse' function converts input syntax to an abstract syntax tree
(AST).  It is abstract exactly because it is independent of any actual
concrete syntax that you type in, print out, etc.

========================================================================
>>> Introduction to Typed Scheme

* Why Types?
* Why Typed Scheme?
* What's Different about Typed Scheme?
* Some Examples of Typed Scheme for 660 Programs

========================================================================
* Types

  - Who has used a typed language?
  - Who has used a typed language that's not Java?

  Typed Scheme will be both similar to, and very different from,
  anything you've seen before.

========================================================================
* Why Types?

  - Types help structure programs.
  - Types provide enforced and mandatory documentation.
  - Types help catch errors.

* Structuring Programs

  - Data definitions

      ;; An AE is one of:
      ;; (make-Num Number)
      ;; (make-Add AE AE)

      ;; Scheme:
      (define-type AE
        [Num (n number?)]
        [Add (l AE?) (r AE?)])

      ;; Typed Scheme:
      (define-type AE
        [Num (n Number)]
        [Add (l AE) (r AE)])

  - Data-first

    The structure of your program comes from the structure of your
    data.  We saw this in 211 and 213 with the design recipe and
    templates.  We see this in this class with the (cases ...) form.
    Types make this pervasive -- we have to think about our data
    before our code.

  - A language for describing data

    We already have two languages for this: the informal one in our
    contracts,

      AE -> Number

    and the more formal one using define-type:

      (define-type ...)

    Our type system will unify these, and allow us to be more precise
    about what we mean.

========================================================================
* Why Typed Scheme?
  - Scheme is the language we all know.
  - Scheme is an excellent language for experimenting with programming
    languages.
  - Typed Scheme allows us to take our Scheme programs and typecheck
    them.
  - Types are an important programming language feature, and Typed
    Scheme will help us understand them.
  - The development of Typed Scheme is happening here, and will benefit
    from your feedback.

========================================================================
* Interlude: Installing the new version of PLT Scheme

========================================================================
* How Typed Scheme Is Different from Scheme

  - Typed Scheme will reject your program if there are type errors!

  - Typed Scheme files start like this:

      #lang typed-scheme
      ;; Program goes here.

  - Typed Scheme requires you to write the contracts on your functions.

    Scheme:

      ;; f : Number -> Number
      (define (f x)
        (* x (+ x 1)))

    Typed Scheme:

      #lang typed-scheme
      (: f : (Number -> Number))
      (define (f x)
        (* x (+ x 1)))

    You can also put the type annotations in the definition itself:

      #lang typed-scheme
      (define: (f [x : Number]) : Number
        (* x (+ x 1)))

  - Typed Scheme uses types, not predicate functions, in define-type.

    Scheme:

      (define-type AE
        [Num (n number?)]
        [Add (l AE?) (r AE?)])

    Typed Scheme:

      #lang typed-scheme
      (define-type AE
        [Num (n Number)]
        [Add (l AE) (r AE)])

  - There are other differences, but these will suffice for now.

========================================================================
* Examples

  (: digit-num : (Number -> (U Number String)))
  (define (digit-num n)
    (cond [(<= n 9)    1]
          [(<= n 99)   2]
          [(<= n 999)  3]
          [(<= n 9999) 4]
          [else        "a lot"]))

  (: fact : (Number -> Number))
  (define (fact n)
    (if (zero? n)
      1
      (* n (fact (- n 1)))))

  (: helper : (Number Number -> Number))
  (define (helper n acc)
    (if (zero? n)
      acc
      (helper (- n 1) (* acc n))))
  (: fact : (Number -> Number))
  (define (fact n)
    (helper n 1))

  (: fact : (Number -> Number))
  (define (fact n)
    (define: (helper [n : Number] [acc : Number]) : Number
      (if (zero? n)
        acc
        (helper (- n 1) (* acc n))))
    (helper n 1))

  (: every? : (All (A) ((A -> Boolean) (Listof A) -> Boolean)))
  ;; Returns false if any element of lst fails the given pred, true if
  ;; all pass pred.
  (define (every? pred lst)
    (or (null? lst)
        (and (pred (car lst))
             (every? pred (cdr lst)))))

  (define-type AE
    [Num (n Number)]
    [Add (lhs AE) (rhs AE)]
    [Sub (lhs AE) (rhs AE)])

  (: parse-sexpr : (Sexpr -> AE))
  ;; to convert s-expressions into AEs
  (define (parse-sexpr sexpr)
    (cond [(number? sexpr) (Num sexpr)]
          [(and (list? sexpr) (= 3 (length sexpr)))
           (let ([make-node
                  (match (first sexpr)
                    ['+ Add]
                    ['- Sub]
                    [else (error 'parse-sexpr "don't know about ~s"
                                 (first sexpr))])])
             (make-node (parse-sexpr (second sexpr))
                        (parse-sexpr (third sexpr))))]
          [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

========================================================================
Back to our `eval' -- this will be its (obvious) type:

  (: eval : (AE -> Number))
  ;; consumes an AE and computes the corresponding number

which leads to some obvious test cases:

  (equal? 3 (eval (parse "3")))
  (equal? 7 (eval (parse "{+ 3 4}")))
  (equal? 6 (eval (parse "{+ {- 3 4} 7}")))

Like everything else, the structure of the recursive `eval' code
follows the recursive structure of its input.  The template is
therefore:

  (: eval : (AE -> Number))
  (define (eval expr)
    (cases expr
      [(Num n) ...]
      [(Add l r) ... (eval l) ... (eval r) ...]
      [(Sub l r) ... (eval l) ... (eval r) ...]))

In this case, filling in the gaps is very simple:

  (: eval : (AE -> Number))
  (define (eval expr)
    (cases expr
      [(Num n) n]
      [(Add l r) (+ (eval l) (eval r))]
      [(Sub l r) (- (eval l) (eval r))]))

We can further combine `eval' and `parse' into a single `run'
function that evaluates an AE string.
  (: run : (String -> Number))
  ;; evaluate an AE program contained in a string
  (define (run str)
    (eval (parse str)))

The resulting *full* code is:

--------------------------------------------------------------------
#| BNF for the AE language:
   <AE> ::= <num>
          | { + <AE> <AE> }
          | { - <AE> <AE> }
          | { * <AE> <AE> }
          | { / <AE> <AE> }
|#

;; AE abstract syntax trees
(define-type AE
  [Num (n Number)]
  [Add (lhs AE) (rhs AE)]
  [Sub (lhs AE) (rhs AE)]
  [Mul (lhs AE) (rhs AE)]
  [Div (lhs AE) (rhs AE)])

(: parse-sexpr : (Sexpr -> AE))
;; to convert s-expressions into AEs
(define (parse-sexpr sexpr)
  (match sexpr
    [(number: n) (Num n)]
    [(list op left right)
     (let ([make-node
            (match op
              ['+ Add]
              ['- Sub]
              ['* Mul]
              ['/ Div]
              [else (error 'parse-sexpr "don't know about ~s" op)])])
       (make-node (parse-sexpr left) (parse-sexpr right)))]
    [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

(: parse : (String -> AE))
;; parses a string containing an AE expression to an AE AST
(define (parse str)
  (parse-sexpr (string->sexpr str)))

(: eval : (AE -> Number))
;; consumes an AE and computes the corresponding number
(define (eval expr)
  (cases expr
    [(Num n) n]
    [(Add l r) (+ (eval l) (eval r))]
    [(Sub l r) (- (eval l) (eval r))]
    [(Mul l r) (* (eval l) (eval r))]
    [(Div l r) (/ (eval l) (eval r))]))

(: run : (String -> Number))
;; evaluate an AE program contained in a string
(define (run str)
  (eval (parse str)))

;; tests:
(equal? 3 (run "3"))
(equal? 7 (run "{+ 3 4}"))
(equal? 6 (run "{+ {- 3 4} 7}"))
--------------------------------------------------------------------

For anyone who thinks that Scheme is a bad choice, this is a good
point to think about how much code would be needed in some other
language to do the same as above.

========================================================================
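For comparison with the full AE code above, here is a sketch of the same program in plain, untyped Racket (the later name of PLT Scheme), runnable without the course language.  It rests on these assumptions:

- structs stand in for `define-type';
- racket/match's `match' stands in for `cases';
- `(? number? n)' replaces the course's `(number: n)' pattern;
- Racket's `read' on a string port, which treats {} like (), replaces `string->sexpr';
- `eval' is renamed `ae-eval' to avoid shadowing Racket's built-in `eval'.

```racket
#lang racket
;; AE abstract syntax trees (structs instead of define-type)
(struct Num (n))
(struct Add (lhs rhs))
(struct Sub (lhs rhs))
(struct Mul (lhs rhs))
(struct Div (lhs rhs))

;; to convert s-expressions into AEs
(define (parse-sexpr sexpr)
  (match sexpr
    [(? number? n) (Num n)]
    [(list op left right)
     (let ([make-node (match op
                        ['+ Add] ['- Sub] ['* Mul] ['/ Div]
                        [_ (error 'parse-sexpr "don't know about ~s" op)])])
       (make-node (parse-sexpr left) (parse-sexpr right)))]
    [_ (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

;; parses a string containing an AE expression to an AE AST
;; (read sees {+ 3 4} as the list (+ 3 4))
(define (parse str)
  (parse-sexpr (read (open-input-string str))))

;; consumes an AE and computes the corresponding number
(define (ae-eval expr)
  (match expr
    [(Num n)   n]
    [(Add l r) (+ (ae-eval l) (ae-eval r))]
    [(Sub l r) (- (ae-eval l) (ae-eval r))]
    [(Mul l r) (* (ae-eval l) (ae-eval r))]
    [(Div l r) (/ (ae-eval l) (ae-eval r))]))

;; evaluate an AE program contained in a string
(define (run str)
  (ae-eval (parse str)))

(run "{+ {- 3 4} 7}")  ; => 6
(run "{/ 1 {* 2 3}}")  ; => 1/6 (Racket division of exact integers is exact)
```

The untyped version is about the same size.  What it gives up is the checking: nothing stops a caller from passing `ae-eval' something that is not an AE.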