# 6  S-expressions

The class of S-expressions gives a name to the fact that lists are heterogeneous and that, in particular, lists can contain lists:

An S-expression (short: `S-exp`) is one of:
1. `(cons S L)` where `S` and `L` are `S-exp`s

2. `AnyValue` (but not a `cons`tructed value).

S-expressions differ from lists in two ways. First, any value is an S-expression. Second, a `cons`'ed S-expression may contain an arbitrary value in the second field, not just a list. Still, most people use `cons` in S-expressions so that there is really a list of S-expressions, which greatly simplifies the design of functions.

S-expressions are tree-shaped pieces of data. They are thus extremely useful for representing any structure form of information. People use them for collections of records, messages between machines, family trees, Web pages, and many more things.

Defining functions that process S-expressions is barely more complicated than defining functions that process lists. After all, we use the same fundamental constant and constructor. This function counts the number of chars inside of strings in an S-expression:

```;; `S-exp  -->  Number`
;; to count the number of characters in strings within `s`
(define (count-char s)
(cond
[(pair? s) (+ (count-char (car s)) (count-char (cdr s)))]
[else (if (string? s) (string-length s) 0)]))
```

To use the function, we construct an S-expression and apply the function to it:

```(count-char (cons (cons (cons 0 "hello") 'silly) "world"))
```

The S-expression in this example contains two strings, one number, and one symbol. The expected result is 10.

## 6.1  Quote, Backquote, Comma

While processing structured pieces of data is important, many programs must also construct large pieces of structured data. To this end, Scheme provides a quotation mechanism for S-expressions that seeks its analog in any modern programming language.4

A quotation mechanism introduces a short-hand for defining constants in a simple and readable manner. The string quotes are a primitive example of a quotation mechanism. A more interesting mechanism is the `quote` form for lists. The syntax for `quote` is `(quote d)` or even shorter `'d`. Roughly speaking, `quote` traverses its single subexpression and inserts `list` to the right of each left parentheses and turns every token into a symbol if it looks like a symbol.5 For example,

 `'(1 2 3)` is short for `(list 1 2 3)`
and
 `'((a 1) (#t "two"))` expands to `(list (list 'a 1) (list #t "two"))`

We also get a cute name for the empty list as a bonus:

 `'()` is a natural abbreviation for `null`

With `quote`, you can write down large pieces of data easily. Say you are a teaching assistant and you want to keep track of your students' grades. Defining the ``grade database'' as one large Scheme expression is straightforward: see figure 11 We simply quote the deeply nested parenthetical structure and have a collection of records. Because of this, quoted S-expressions play a huge role in Scheme programming.

Still, while `quote` alone is a powerful way of writing down large pieces of information as structured data, Scheme has even more powerful tools for this purpose. In particular, Scheme has a mechanism for building complex S-expression shapes from templates. Roughly speaking, a template is an S-expressions with holes. The holes are filled with Scheme values. The mechanism for defining S-expression templates are `quasiquote` (or backquote) and `unquote` (or comma).

At first glance, a `quasiquote` expression is just like `quote` expression. The major difference is that a `quasiquote` expression may contain subexpressions that are prefixed with `unquote` (comma) and that Scheme evaluates these subexpressions rather than quoting them.

Using `quasiquote` and `unquote` permits programmers to write down (large) nearly constant S-expressions in a concise manner. Imagine we represent binary trees with S-expressions. The empty list `()` plays the role of the empty tree. A node is a list with three items: the left branch, the information, the right branch. For readability, we may annotate each piece of a node with a symbol that explains which role the subexpression plays. Here is an example:

```'((left: ())
(information: 5)
(right:
((left: ())
(information: 7)
(right: ()))))
```

This tree has two nodes, three empty subtrees, and two pieces of information (one per node): `5` and `7`.

Naturally, we now want to define a function that consumes two trees and a piece of information to construct a new tree:

```;; `Tree Number Tree  -->  Tree`
(define (make-tree left-tree info right-tree)
`((left: ,left-tree)
(information: ,info)
(right: ,right-tree)))
```

The `quasiquote` expression contains three occurrences of `unquote`d expressions. All of them are variables and simply craft onto the tree template the subtrees and the new piece of data. To create the above tree, we just write

```(make-tree '() 5 (make-tree '() 7 '()))
```

Compare two simpler applications of the function:

 ```(make-tree '() 3 '()) ; produces '((left: ()) (information: 3) (right: ())) ``` and ```(make-tree '() 7 '()) ; produces '((left: ()) (information: 7) (right: ())) ```
The comparison shows how the function builds similar S-expressions from one template.

Of course, the `unquote`d subexpressions doesn't need to be a variable. Suppose we wish to deal with numeric trees that also contain a running sum field. The leaf looks now like this:

```'((sum: 0) (tree: ()))
```

and an interior node has this shape:

```'((sum: 12)
(left: ((sum: 0) (tree: ())))
(information: 5)
(right:
((sum: 7)
(left: ((sum: 0) (tree: ())))
(information: 7)
(right: ((sum: 0) (tree: ()))))))
```

Dealing with such trees requires functions that maintain the extra field properly:

```;; `Tree Number Tree  -->  Tree`
(define (make-sum-tree left info right)
`((sum: ,(+ (tree-sum left) info (tree-sum right)))
(left: ,left)
(information: ,info)
(right: ,right)))

;; `Tree  -->  Number`
(define (tree-number t)
(second (car t)))

;; `Tree`
(define the-empty-tree
'((sum: 0) (tree: ())))
```

In the definition of `make-sum-tree`, we have the same occurrences of `unquote`d variables as in `make-tree`, but we also have a complex `unquote`d arithmetic expression. When we apply `make-sum-tree`, this expression is evaluated and the resulting number becomes the `'sum` field of the new tree.

 ```;; `(listof (cons String (listof Number)))  -->  S-exp` (define (make-table grades) `(table ([cellspacing "10"] [border "1"]) (tr (th "Name") (th "Grades")) ,@(grades-rows grades))) ;; `(listof (cons String (listof Number)))  -->  S-exp` (define (grades-rows grades) (map entry-row grades)) ;; `(cons String (listof Number))  -->  S-exp` (define (entry-row e) `(tr (td ,(car e)) ,@(make-cells (cdr e)))) ;; `(listof Number)  -->  (listof String)` (define (make-cells grades) (map (lambda (n) `(td ,(number-string n))) grades)) ``` Figure 12:  Backquote, unquote, and splice

Finally, on some occasions, a program may have to splice a list of values into an S-expression. For example, you may wish to splice a list of grades into a header template that attaches some additional information to a grade listing:

```;; `String String (listof (cons String (listof Number)))  -->  S-exp`
`((Course: ,course)
(Teacher: ,teacher)
```

We could accomplish this insertion with `list`, `cons`, and `append` instead, but this would seriously reduce the readability of such templates.

Functions like `header` play an important role in Web programming (among many other things). S-expressions provide a simple and effective representation of HTML and XML inside of Scheme. Thus, if we wish to produce HTML pages, it is often convenient to define just such functions. Consider the specific example of converting a list of grades into an (S-expression representation of an) HTML table. That is, say we want a function `make-table` such that

```(make-table '(("fritz" 1 1 1) ("ursula" 2 2 2)))
produces
'(table ((cellspacing "10") (border "1"))
(tr (td "fritz") (td " 1 ") (td " 1 ") (td " 1 "))
(tr (td "ursula") (td " 2 ") (td " 2 ") (td " 2 ")))
```

Figure 12 shows a program fragment that accomplishes just this kind of task. The program consists of four functions. Two wrap headers around tables and rows; two process rows and cells. The latter are written with a powerful loop operator, which we get to know in the next section.

## 6.2  S-expressions and Recursion

The class of S-expressions serves as a universal data representation language in Scheme. With S-expressions, it is easy to represent all forms of information conveniently. As a matter of fact, some are so common that they are built in via primitives that work on just those subsets.

One of these is the notion of an association list. An association list is a list of `cons` cells where the `car` field is always a symbol. The latter serves as a key to the data in the `cdr` field. It is similar to a dictionary that translates symbols to meaningful information. Take a look at this phone directory:

(define PHONE-DIRECTORY
'((matthias `"Matthias Felleisen"` `"713.333.2001"`)
(matthew    `"Matthew Flatt"`    `"218.432.1119"`)
(robby    `"Robert F. Findler"`    `"713.317.2278"`)
(shriram    `"Shriram Krishnamurthi"`    `"713.289.0003"`)
...

The symbols in each list are roughly like nicknames, and the rest of each list is a formal name and a phone number.

Scheme also provides a function for looking up the data that comes with a symbol in an association list:

```;; `Symbol (Listof (cons Symbol X))  -->  (union #f (List Symbol X))`
;; find a pair in `l` whose first field is `s` (if there is one)
(define (assq s l)
(cond
[(null? l) #f]
[else (if (eq? (car (car l)) s) (car l) (assq s (cdr l)))]))

(equal? '(a 9) (assq 'a '((b 2) (c 3) (a 9) (e 11))))
(not (assq 'z '((b 2) (c 3) (a 9) (e 11))))
```

The function is another instance of our template for list-processing functions. The slightly unusual aspect is that `assq` works with lists inside of lists.

With `assq` we can compose a program for looking up phone numbers:

```;; `Symbol  -->  String`
(define (lookup nickname)
(let ([r (assq nickname PHONE-DIRECTORY)])
(if r
(string-append (second r) ": " (third r))
```

This function looks up the nickname inn `PHONE-DIRECTORY`, checks whether `r` is `#f` or some `cons` cell, and creates a string as an answer. Remember that `if`, like `cond`, distinguishes `#f` from all other values, not just `#t`.

Unfortunately, it is not only easy to encode data as S-expressions, it also easy to forget which subset of the class of S-expressions was chosen for a data representation. Because of that, we propose to use data definitions that describe the structure of the chosen data explicitly. For example, an association list is defined as

An association list is a `(Listof (cons Symbol X))`.
or, more verbose, as
An association list is one of:
1. `null`

2. `(cons (cons Symbol X) L)` where `X` is any kind of value and `L` is an association list.

We have also dealt with trees as S-expressions already:

A tree (also known as `(Treeof X)`) is one of:
1. `X`

2. `(list Tree Tree)`.

This is a generic tree definition, where proper data shows up at the leaves only.

S-expressions can also be used to encode Scheme's syntax:

An AST is one of:
1. `Symbol`

2. `Number`

3. `(list '+ AST AST)`

4. `(list '- AST AST)`.

Since this definition is somewhat unusual, let's construct some examples and let's see how they represent Scheme expressions:
 `5` `5` `'x` `x` `'(+ 5 4)` `(+ 5 4)` `'(- x 1)` `(- x 1)` `'(+ (- x 1) (+ x 1))` `(+ (- x 1) (+ x 1))`
That is,
• a number represents a Scheme number, and

• a symbol by itself represents a Scheme variable,

• an S-expression with a `'+` in the first position and two numbers is an additio expression, and

• an S-expression with a `'-` as the first symbol is a subtraction expression. If it involves a symbol, it is a so-called variable expression.

In general, we can find the represented expression by just stripping off the quote.

When data definitions become as large as the one for AST (or even larger), it is also a good practice to define selectors for the various pieces similar to the selectors that a structure definition introduces:

```(define (addition-left-operand a) (second a))

(define (subtraction-left-operand a) (second a))

(define (subtraction-right-operand a) (third a))
```

In this particular world, it may even make sense to use the same selectors for both kinds of complex expressions and additional selector for the operator:

```(define (operator a) (car a))

(define (left-operand a) (second a))

(define (right-operand a) (third a))
```

In either case, the functions that process `AST`s become more readable and less dependent on the actually chosen data representation. Switching to post-fix representations or structures later on is almost no problem, as long as we use the selectors and not the concrete representation.

To design a function template for this subclass of S-expressions, we proceed the same way we did for lists in section 4.3. First, we observe that the data definition has four clauses, which suggests that the function should have four clauses:

```(define (function-for-ast an-ast)
(cond
[(symbol? an-ast) ...]
[(number? an-ast) ...]
[(eq? (operator an-ast) '+) ...]
[(eq? (operator an-ast) '-) ...]))
```

Second, we can also see that in clauses 3 and 4 the given `AST` should have two more components:

```(define (function-for-ast an-ast)
(cond
[(symbol? an-ast) ...]
[(number? an-ast) ...]
[(eq? (operator an-ast) '+)
... (left-operand an-ast) ... (right-operand an-ast) ...]
[(eq? (operator an-ast) '-)
... (left-operand an-ast) ... (right-operand an-ast) ...]))
```

Last but not least, we see that the data definition refers to itself for the four subexpressions that we just added. This suggests that functions on `AST`s naturally recur for those four expressions:

```(define (function-for-ast an-ast)
(cond
[(symbol? an-ast) ...]
[(number? an-ast) ...]
[(eq? (operator an-ast) '+)
... (function-for-ast (left-operand an-ast)) ...
... (function-for-ast (right-operand an-ast)) ...]
[(eq? (operator an-ast) '-)
... (function-for-ast (left-operand an-ast)) ...
... (function-for-ast (right-operand an-ast)) ...]))
```

Equipped with this template we can now write some functions on `AST`s:

• The first function determines whether the given symbolic representation of a Scheme expression is free of variables, which are represented as symbols. It is straightforward instance of the template.

```;; `AST  -->  Boolean`
;; determine whether `an-ast` is free of symbols as leaves
(define (no-variables an-ast)
(cond
[(symbol? an-ast) #f]
[(number? an-ast) #t]
[(eq? (operator an-ast) '+)
(and (no-variables (left-operand an-ast))
(no-variables (right-operand an-ast)))]
[(eq? (operator an-ast) '-)
(and (no-variables (left-operand an-ast))
(no-variables (right-operand an-ast))])))
```

• The second function substitutes symbols with values, which mimics what a high school student and the Scheme evaluator do during the evaluation of function applications. Note the use of backquote and comma in the last two `cond` clauses for the re-creation of (representations of) additions and subtractions.

```;; `AST Symbol Number  -->  AST`
;; replace the symbol `x` with  `val` throughout `an-ast`
(define (substitute an-ast x val)
(cond
[(symbol? an-ast) (if (eq? an-ast x) val an-ast)]
[(number? an-ast) an-ast]
[(eq? (operator an-ast) '+)
`(+ ,(substitute (left-operand an-ast))
,(substitute (right-operand an-ast)))]
[(eq? (operator an-ast) '-)
`(- ,(substitute (left-operand an-ast))
,(substitute (right-operand an-ast))])))
```

• The last function determines the value of an arithmetic `AST`, that is, an `AST` that has no variables. Compare how this definition uses the `+` and `-` functions rather than the `'+` and `'-` symbols like the preceding definition.

```;; `AST  -->  Number`
;; replace the symbol `x` with  `val` throughout `an-ast`
(define (evaluate an-ast x val)
(cond
[(symbol? an-ast) (error 'evaluate "undefined identifier")]
[(number? an-ast) an-ast]
[(eq? (operator an-ast) '+)
(+ (evaluate (left-operand an-ast))
(evaluate (right-operand an-ast)))]
[(eq? (operator an-ast) '-)
(- (evaluate (left-operand an-ast))
(evaluate (right-operand an-ast))])))
```

As before, it is a bad idea to follow the function template blindly. On occasion, we need to compose functions on `AST`s rather than just design new ones. Here is an example:

```;; `AST (Listof (list Symbol Number))  -->  Number`
;; evaluate `an-ast` after substituting all symbols in `env`
;; with their corresponding numbers
(define (eval an-ast env)
(evaluate
(substitute* an-ast env)))

;; `AST (Listof (list Symbol Number))  -->  AST`
;; substitute all symbols in `an-ast` according to their association
;; in `env`
(define (substitute* an-ast env)
(cond
[(null? an-ast) an-ast]
[else (substitute (substitute* an-ast (cdr env))
(car (car env))
(cdr (car env)))]))

(equal?
'(+ (- 1 1) (+ 5 1))
(substitute* '(+ (- x 1) (+ y 1)) (list (cons x 1) (cons y 5))))

(equal?
6
(eval '(+ (- x 1) (+ y 1)) (list (cons x 1) (cons y 5))))
```

The function `eval` composes a new function, `substitute*`, with an ``old'' function, `evaluate`. If the given association list contains a number for every symbol that occurs in `an-ast`, `eval` will compute the same number as Scheme for the S-expression. A function like `eval` is a part of Scheme's repertoire. Look in Help Desk for details, because Scheme's `eval` is far more powerful than the one we just discussed.

In conclusion, self-referential data definitions are easy to deal with in programs. All we need to do is write functions whose organization is similar to the organization of the data definition. Furthermore, you have just seen a very powerful, and a very self-serving, application of S-expressions. In other parts of this book, you will see more practical applications of this important and old Scheme idea.

4 Scheme inherited these mechanisms from LISP, like all of its syntax, and we are grateful for that.

5 Yes, there are precise definitions.