6  S-expressions

The class of S-expressions gives a name to the fact that lists are heterogeneous and that, in particular, lists can contain lists:

An S-expression (short: S-exp) is one of:
  1. (cons S L) where S and L are S-exps

  2. AnyValue (but not a constructed value).

S-expressions differ from lists in two ways. First, any value is an S-expression. Second, a cons'ed S-expression may contain an arbitrary value in the second field, not just a list. Still, most people use cons in S-expressions so that there is really a list of S-expressions, which greatly simplifies the design of functions.

S-expressions are tree-shaped pieces of data. They are thus extremely useful for representing any structure form of information. People use them for collections of records, messages between machines, family trees, Web pages, and many more things.

Defining functions that process S-expressions is barely more complicated than defining functions that process lists. After all, we use the same fundamental constant and constructor. This function counts the number of chars inside of strings in an S-expression:

;; S-exp  -->  Number
;; to count the number of characters in strings within s
(define (count-char s)
    [(pair? s) (+ (count-char (car s)) (count-char (cdr s)))]
    [else (if (string? s) (string-length s) 0)]))

To use the function, we construct an S-expression and apply the function to it:

(count-char (cons (cons (cons 0 "hello") 'silly) "world"))

The S-expression in this example contains two strings, one number, and one symbol. The expected result is 10.

6.1  Quote, Backquote, Comma

While processing structured pieces of data is important, many programs must also construct large pieces of structured data. To this end, Scheme provides a quotation mechanism for S-expressions that seeks its analog in any modern programming language.4

A quotation mechanism introduces a short-hand for defining constants in a simple and readable manner. The string quotes are a primitive example of a quotation mechanism. A more interesting mechanism is the quote form for lists. The syntax for quote is (quote d) or even shorter 'd. Roughly speaking, quote traverses its single subexpression and inserts list to the right of each left parentheses and turns every token into a symbol if it looks like a symbol.5 For example,

'(1 2 3) is short for (list 1 2 3)
'((a 1) (#t "two")) expands to (list (list 'a 1) (list #t "two"))

We also get a cute name for the empty list as a bonus:

'() is a natural abbreviation for null

With quote, you can write down large pieces of data easily. Say you are a teaching assistant and you want to keep track of your students' grades. Defining the ``grade database'' as one large Scheme expression is straightforward: see figure 11 We simply quote the deeply nested parenthetical structure and have a collection of records. Because of this, quoted S-expressions play a huge role in Scheme programming.

Figure 11:  S-expressions for managing data

Still, while quote alone is a powerful way of writing down large pieces of information as structured data, Scheme has even more powerful tools for this purpose. In particular, Scheme has a mechanism for building complex S-expression shapes from templates. Roughly speaking, a template is an S-expressions with holes. The holes are filled with Scheme values. The mechanism for defining S-expression templates are quasiquote (or backquote) and unquote (or comma).

At first glance, a quasiquote expression is just like quote expression. The major difference is that a quasiquote expression may contain subexpressions that are prefixed with unquote (comma) and that Scheme evaluates these subexpressions rather than quoting them.

Using quasiquote and unquote permits programmers to write down (large) nearly constant S-expressions in a concise manner. Imagine we represent binary trees with S-expressions. The empty list () plays the role of the empty tree. A node is a list with three items: the left branch, the information, the right branch. For readability, we may annotate each piece of a node with a symbol that explains which role the subexpression plays. Here is an example:

'((left: ()) 
  (information: 5)
    ((left: ())
     (information: 7)
     (right: ()))))

This tree has two nodes, three empty subtrees, and two pieces of information (one per node): 5 and 7.

Naturally, we now want to define a function that consumes two trees and a piece of information to construct a new tree:

;; Tree Number Tree  -->  Tree
(define (make-tree left-tree info right-tree)
  `((left: ,left-tree)
    (information: ,info)
    (right: ,right-tree)))

The quasiquote expression contains three occurrences of unquoted expressions. All of them are variables and simply craft onto the tree template the subtrees and the new piece of data. To create the above tree, we just write

(make-tree '() 5 (make-tree '() 7 '()))

Compare two simpler applications of the function:

(make-tree '() 3 '()) ; produces 
'((left: ())
  (information: 3)
  (right: ()))
(make-tree '() 7 '()) ; produces 
'((left: ())
  (information: 7)
  (right: ()))
The comparison shows how the function builds similar S-expressions from one template.

Of course, the unquoted subexpressions doesn't need to be a variable. Suppose we wish to deal with numeric trees that also contain a running sum field. The leaf looks now like this:

'((sum: 0) (tree: ()))

and an interior node has this shape:

'((sum: 12)
  (left: ((sum: 0) (tree: ()))) 
  (information: 5)
    ((sum: 7)
     (left: ((sum: 0) (tree: ())))
     (information: 7)
     (right: ((sum: 0) (tree: ()))))))

Dealing with such trees requires functions that maintain the extra field properly:

;; Tree Number Tree  -->  Tree
(define (make-sum-tree left info right)
  `((sum: ,(+ (tree-sum left) info (tree-sum right)))
    (left: ,left) 
    (information: ,info)
    (right: ,right)))

;; Tree  -->  Number
(define (tree-number t)
  (second (car t)))

;; Tree
(define the-empty-tree
  '((sum: 0) (tree: ())))

In the definition of make-sum-tree, we have the same occurrences of unquoted variables as in make-tree, but we also have a complex unquoted arithmetic expression. When we apply make-sum-tree, this expression is evaluated and the resulting number becomes the 'sum field of the new tree.

;; (listof (cons String (listof Number)))  -->  S-exp
(define (make-table grades)
  `(table ([cellspacing "10"] [border "1"])
          (tr (th "Name") (th "Grades"))
          ,@(grades-rows grades)))

;; (listof (cons String (listof Number)))  -->  S-exp
(define (grades-rows grades)
  (map entry-row grades))

;; (cons String (listof Number))  -->  S-exp
(define (entry-row e)
  `(tr (td ,(car e)) ,@(make-cells (cdr e))))

;; (listof Number)  -->  (listof String)
(define (make-cells grades)
  (map (lambda (n) `(td ,(number-string n))) grades))

Figure 12:  Backquote, unquote, and splice

Finally, on some occasions, a program may have to splice a list of values into an S-expression. For example, you may wish to splice a list of grades into a header template that attaches some additional information to a grade listing:

;; String String (listof (cons String (listof Number)))  -->  S-exp
(define (header course teacher grades)
  `((Course: ,course)
    (Teacher: ,teacher)
    (Number of students: ,(length grades))

We could accomplish this insertion with list, cons, and append instead, but this would seriously reduce the readability of such templates.

Functions like header play an important role in Web programming (among many other things). S-expressions provide a simple and effective representation of HTML and XML inside of Scheme. Thus, if we wish to produce HTML pages, it is often convenient to define just such functions. Consider the specific example of converting a list of grades into an (S-expression representation of an) HTML table. That is, say we want a function make-table such that

(make-table '(("fritz" 1 1 1) ("ursula" 2 2 2)))
'(table ((cellspacing "10") (border "1"))
   (tr (th "Name") (th "Grades"))
   (tr (td "fritz") (td " 1 ") (td " 1 ") (td " 1 "))
   (tr (td "ursula") (td " 2 ") (td " 2 ") (td " 2 ")))

Figure 12 shows a program fragment that accomplishes just this kind of task. The program consists of four functions. Two wrap headers around tables and rows; two process rows and cells. The latter are written with a powerful loop operator, which we get to know in the next section.

6.2  S-expressions and Recursion

The class of S-expressions serves as a universal data representation language in Scheme. With S-expressions, it is easy to represent all forms of information conveniently. As a matter of fact, some are so common that they are built in via primitives that work on just those subsets.

One of these is the notion of an association list. An association list is a list of cons cells where the car field is always a symbol. The latter serves as a key to the data in the cdr field. It is similar to a dictionary that translates symbols to meaningful information. Take a look at this phone directory:

    '((matthias "Matthias Felleisen" "713.333.2001")
       (matthew    "Matthew Flatt"    "218.432.1119")
       (robby    "Robert F. Findler"    "713.317.2278")
       (shriram    "Shriram Krishnamurthi"    "713.289.0003")

The symbols in each list are roughly like nicknames, and the rest of each list is a formal name and a phone number.

Scheme also provides a function for looking up the data that comes with a symbol in an association list:

;; Symbol (Listof (cons Symbol X))  -->  (union #f (List Symbol X))
;; find a pair in l whose first field is s (if there is one)
(define (assq s l)
    [(null? l) #f]
    [else (if (eq? (car (car l)) s) (car l) (assq s (cdr l)))]))

(equal? '(a 9) (assq 'a '((b 2) (c 3) (a 9) (e 11))))
(not (assq 'z '((b 2) (c 3) (a 9) (e 11))))

The function is another instance of our template for list-processing functions. The slightly unusual aspect is that assq works with lists inside of lists.

With assq we can compose a program for looking up phone numbers:

;; Symbol  -->  String
(define (lookup nickname)
  (let ([r (assq nickname PHONE-DIRECTORY)])
    (if r 
        (string-append (second r) ": " (third r))
        (string-append ( symbol--> string nichname " not found")))))

This function looks up the nickname inn PHONE-DIRECTORY, checks whether r is #f or some cons cell, and creates a string as an answer. Remember that if, like cond, distinguishes #f from all other values, not just #t.

Unfortunately, it is not only easy to encode data as S-expressions, it also easy to forget which subset of the class of S-expressions was chosen for a data representation. Because of that, we propose to use data definitions that describe the structure of the chosen data explicitly. For example, an association list is defined as

An association list is a (Listof (cons Symbol X)).
or, more verbose, as
An association list is one of:
  1. null

  2. (cons (cons Symbol X) L) where X is any kind of value and L is an association list.

We have also dealt with trees as S-expressions already:

A tree (also known as (Treeof X)) is one of:
  1. X

  2. (list Tree Tree).

This is a generic tree definition, where proper data shows up at the leaves only.

S-expressions can also be used to encode Scheme's syntax:

An AST is one of:
  1. Symbol

  2. Number

  3. (list '+ AST AST)

  4. (list '- AST AST).

Since this definition is somewhat unusual, let's construct some examples and let's see how they represent Scheme expressions:
5                  5
'x                  x
'(+ 5 4) (+ 5 4)
'(- x 1) (- x 1)
'(+ (- x 1) (+ x 1)) (+ (- x 1) (+ x 1))
That is,

In general, we can find the represented expression by just stripping off the quote.

When data definitions become as large as the one for AST (or even larger), it is also a good practice to define selectors for the various pieces similar to the selectors that a structure definition introduces:

(define (addition-left-operand a) (second a))

(define (addition-right-operand a) (third a))

(define (subtraction-left-operand a) (second a))

(define (subtraction-right-operand a) (third a))

In this particular world, it may even make sense to use the same selectors for both kinds of complex expressions and additional selector for the operator:

(define (operator a) (car a))

(define (left-operand a) (second a))

(define (right-operand a) (third a))

In either case, the functions that process ASTs become more readable and less dependent on the actually chosen data representation. Switching to post-fix representations or structures later on is almost no problem, as long as we use the selectors and not the concrete representation.

To design a function template for this subclass of S-expressions, we proceed the same way we did for lists in section 4.3. First, we observe that the data definition has four clauses, which suggests that the function should have four clauses:

(define (function-for-ast an-ast)
    [(symbol? an-ast) ...]
    [(number? an-ast) ...]
    [(eq? (operator an-ast) '+) ...]
    [(eq? (operator an-ast) '-) ...]))

Second, we can also see that in clauses 3 and 4 the given AST should have two more components:

(define (function-for-ast an-ast)
    [(symbol? an-ast) ...]
    [(number? an-ast) ...]
    [(eq? (operator an-ast) '+)
     ... (left-operand an-ast) ... (right-operand an-ast) ...]
    [(eq? (operator an-ast) '-)
     ... (left-operand an-ast) ... (right-operand an-ast) ...]))

Last but not least, we see that the data definition refers to itself for the four subexpressions that we just added. This suggests that functions on ASTs naturally recur for those four expressions:

(define (function-for-ast an-ast)
    [(symbol? an-ast) ...]
    [(number? an-ast) ...]
    [(eq? (operator an-ast) '+) 
     ... (function-for-ast (left-operand an-ast)) ... 
     ... (function-for-ast (right-operand an-ast)) ...]
    [(eq? (operator an-ast) '-)
     ... (function-for-ast (left-operand an-ast)) ...
     ... (function-for-ast (right-operand an-ast)) ...]))

Equipped with this template we can now write some functions on ASTs:

As before, it is a bad idea to follow the function template blindly. On occasion, we need to compose functions on ASTs rather than just design new ones. Here is an example:

;; AST (Listof (list Symbol Number))  -->  Number
;; evaluate an-ast after substituting all symbols in env 
;; with their corresponding numbers 
(define (eval an-ast env)
    (substitute* an-ast env)))

;; AST (Listof (list Symbol Number))  -->  AST
;; substitute all symbols in an-ast according to their association
;; in env 
(define (substitute* an-ast env)
    [(null? an-ast) an-ast]
    [else (substitute (substitute* an-ast (cdr env))
	              (car (car env))
		      (cdr (car env)))]))

  '(+ (- 1 1) (+ 5 1))
  (substitute* '(+ (- x 1) (+ y 1)) (list (cons x 1) (cons y 5))))

  (eval '(+ (- x 1) (+ y 1)) (list (cons x 1) (cons y 5))))

The function eval composes a new function, substitute*, with an ``old'' function, evaluate. If the given association list contains a number for every symbol that occurs in an-ast, eval will compute the same number as Scheme for the S-expression. A function like eval is a part of Scheme's repertoire. Look in Help Desk for details, because Scheme's eval is far more powerful than the one we just discussed.

In conclusion, self-referential data definitions are easy to deal with in programs. All we need to do is write functions whose organization is similar to the organization of the data definition. Furthermore, you have just seen a very powerful, and a very self-serving, application of S-expressions. In other parts of this book, you will see more practical applications of this important and old Scheme idea.

4 Scheme inherited these mechanisms from LISP, like all of its syntax, and we are grateful for that.

5 Yes, there are precise definitions.