On this page:
22 The Poetry of S-expressions
22.1 Trees
22.2 Forests
22.3 S-expressions
22.4 Designing with Intertwined Data
22.5 Simplifying Functions
23 Incremental Refinement
23.1 Data Analysis
23.2 Refining Data Definitions
23.3 Refining Functions
24 Refining Interpreters
24.1 Interpreting Expressions
24.2 Interpreting Variables
24.3 Interpreting Functions
24.4 Interpreting Everything
25 The Commerce of XML
25.1 XML as S-expressions
25.2 Rendering XML Enumerations
25.3 Configuring State Machines
25.3.1 Configurations
25.3.2 Interpreting Configurations
26 Simultaneous Processing
26.1 Designing Functions for Two Complex Arguments
27 Summary

IV Intertwined Data

The data definitions for lists and natural numbers stand out due to their self-references. As it turns out, though, many interesting classes of data require even more complex data definitions than these two. Indeed, there is no end to the variations, and it therefore becomes a critical skill for a programmer to take any data definition as the starting point for a program. And that’s what the design recipe is all about.

In this part, we first work out a slightly generalized version of the design recipe that works for all forms of structural data definitions. Then we introduce the concept of iterative refinement, which is one of the reasons why all programmers are little scientists. Two more chapters are basically practice on refining data and functions: one on interpreters and one on XML processing. The last chapter expands the design recipe one more time, applying it to function that process two complex arguments at the same time.

22 The Poetry of S-expressions

Programming resembles poetry in many ways. For example, programmers—like poets—practice their skill on seemingly pointless ideas. In this spirit, this chapter introduces increasingly complex forms of data without a true purpose. Even when we provide a motivational background, the chosen kinds of data are “purist,” and it is unlikely that you will ever encounter these particular kinds of data again.

Nevertheless, this chapter shows the full power of the design recipe and introduces you to the kinds of data that real-world programs cope with. To connect this material with what you will encounter in your life as a programmer, we label each section with appropriate names: trees, forests, XML. The last one is a bit misleading, because it is really about S-expressions; the connection between S-expressions and XML is clarified in The Commerce of XML, which in contrast to this chapter, comes much closer to real-world uses of complex forms of data.

22.1 Trees

All of us have a family tree. One way to draw a family tree is to add a node every time a child is born. From the node, we can draw connections to the father and mother of the child. For those people in the tree whose parents are unknown, there is no connection to draw. The result is a so-called ancestor family tree because, given any person in the tree, we can find all of the person’s known ancestors.

Figure 54: A family tree

Figure 54 displays a three-tier family tree. Gustav is the child of Eva and Fred, while Eva is the child of Carl and Bettina. In addition to people’s names and family relationships, the tree also records years of birth and eye colors. Based on this sketch, you can easily imagine a family tree reaching back many generations and one that records other kinds of information.

Once a family tree is large, it makes sense to represent it as data and to design programs that process this kind of data. Given that a node in a family tree combines five pieces of information—the father, the mother, the name, the birth date, and the eye color—we should define a structure type:

(define-struct child (father mother name date eyes))

The structure type definition calls a data definition:
; A Child is a structure:
;   (make-child Child Child String N String)
While this data definition looks straightforward, it is also useless. It refers to itself but, because it doesn’t have any clauses, there is no way to create a proper instance Child. Roughly, we would have to write

(make-child (make-child (make-child ...) ...) ...)

without end. To avoid such pointless data definitions, we demand that a self-referential data definition has several clauses and that at least one of them does not refer back to the data definition.

Let’s postpone the data definition for a moment; let’s experiment instead. Suppose we are about to add a child to an existing family tree and that we already have representations for the parents. In that case, we can simply construct a new child structure. For example, to represent Adam in a program that already represents Carl and Bettina, it suffices to add the following child structure:

(define Adam (make-child Carl Bettina "Adam" 1950 "hazel"))

assuming Carl and Bettina stand for representations of Adam’s parents.

One problem is that we don’t always know a person’s parents, and its solution points to the correct data definitions. In the family tree of figure 54, we don’t know Bettina’s parents. Yet, even if we don’t know a person’s father or mother, we must still use some piece of data for the parent fields in a child representation of Bettina. Whatever data we choose, it must signal an absence of information. On the one hand, we could use false, "none", or empty from the pool of existing values. On the other hand, we should really say that it is missing information about a family tree, and we can achieve this objective best with the introduction of a structure type with an appropriate name:

(define-struct no-parent ())

Now, to construct a child structure for Bettina, we say

(make-child (make-no-parent) (make-no-parent) "Bettina" 1926 "green")

Of course, if only one piece of information is missing, we fill just that field with this special value.

The experiment suggests two insights. First, we are not looking for a data definition that describes how to generate instances of child structures but for a data definition that describes how to represent family trees. Second, the data definition consists of two clauses, one for the nodes describing unknown family trees and another one for known family trees:
(define-struct no-parent ())
(define-struct child (father mother name date eyes))
; A FT (family tree) is one of:
;  (make-no-parent)
;  (make-child FT FT String N String)
Since the “no parent” tree is going to show up a lot in our programs, we define MTFT as a short-hand and revise the data definition a bit:
(define MTFT (make-no-parent))
; A FT (family tree) is one of:
;  MTFT
;  (make-child FT FT String N String)
The use of a constant in the data definition should not bother you. You can replace a name by its value.

Following the design recipe from Designing with Self-Referential Data Definitions, we use the data definition to create examples of family trees. Specifically, we translate the family tree in figure 54 into our data representation. The information for Carl is easy to translate into a data:

(make-child MTFT MTFT "Carl" 1926 "green")

Bettina and Fred are represented with similar instances of child. The case for Adam calls for nested children, one for Carl and one for Bettina:
(make-child (make-child MTFT MTFT "Carl" 1926 "green")
            (make-child MTFT MTFT "Bettina" 1926 "green")
            "Adam"
            1950
            "hazel")
Since the records for Carl and Bettina are also needed to construct the records for Dave and Eva, it is better to introduce definitions that name specific instances of child and to use the variable names elsewhere. Figure 55 illustrates this approach for the complete data representation of the family tree from figure 54. Take a close look; the tree serves as our running example for the following design exercise.

; Oldest Generation:
(define Carl (make-child MTFT MTFT "Carl" 1926 "green"))
(define Bettina (make-child MTFT MTFT "Bettina" 1926 "green"))
 
; Middle Generation:
(define Adam (make-child Carl Bettina "Adam" 1950 "hazel"))
(define Dave (make-child Carl Bettina "Dave" 1955 "black"))
(define Eva (make-child Carl Bettina "Eva" 1965 "blue"))
(define Fred (make-child MTFT MTFT "Fred" 1966 "pink"))
 
; Youngest Generation:
(define Gustav (make-child Fred Eva "Gustav" 1988 "brown"))

Figure 55: A data representation of the sample family tree

Instead of designing a concrete function on family trees, let us first look at the generic organization of such a function. That is, let us work through the design recipe as much as possible without having a concrete task in mind. We start with the header material, i.e., step 2 of the recipe:
; FT -> ???
; ...
(define (fun-for-FT a-ftree) ...)
Even though we aren’t stating the purpose of the function, we do know that it consumes a family tree and that this form of data is the main input. The “???” in the signature says that we don’t know what kind of data the function produces; the “...” remind us that we don’t know its purpose.

The lack of purpose implies that we cannot make up functional examples. Nevertheless, we can exploit the organization of the data definition for FT to design a template. Since it consists of two clauses, the template must consist of a cond expression with two clauses:
; FT -> ???
(define (fun-for-FT a-ftree)
  (cond
    [(no-parent? a-ftree) ...]
    [else ; (child? a-ftree)
      ...]))
The first deals with MTFT, the second with child structures.

In case the argument to fun-for-FT satisfies no-parent?, the structure contains no additional data, so the first clause is complete. For the second clause, the input contains five pieces of data, which we indicate with five selectors in the template:
; FT -> ???
(define (fun-for-FT a-ftree)
  (cond
    [(no-parent? a-ftree) ...]
    [else
     (... (child-father a-ftree) ...
      ... (child-mother a-ftree) ...
      ... (child-name a-ftree) ...
      ... (child-date a-ftree) ...
      ... (child-eyes a-ftree) ...)]))

The last addition to templates concerns self-references. If a data definition refers to itself, the function is likely to recur and templates indicate so with suggestive natural recursions. Unlike lists, the data definition for family trees requires two self-references and the template therefore needs two recursions:
; FT -> ???
(define (fun-for-FT a-ftree)
  (cond
    [(no-parent? a-ftree) ...]
    [else
      (... (fun-for-FT (child-father a-ftree)) ...
       ... (fun-for-FT (child-mother a-ftree)) ...
       ... (child-name a-ftree) ...
       ... (child-date a-ftree) ...
       ... (child-eyes a-ftree) ...)]))
Specifically, fun-for-FT is applied to the data representation for fathers and mothers in the second cond clause, because the second clause of the data definition contains corresponding self-references.

Let us now turn to a concrete function, blue-eyed-child?. Its purpose is to determine whether any child structure in a given family tree has blue eyes. You may copy, paste, and rename fun-foor-FT to get its template; we replace “???” with Boolean and add a purpose statement;
; FT -> Boolean
; to determine whether a-ftree contains a child
; structure with "blue" in the eyes field
(define (blue-eyed-child? a-ftree)
  (cond
    [(no-parent? a-ftree) ...]
    [else
      (... (child-name a-ftree) ...
       ... (child-date a-ftree) ...
       ... (child-eyes a-ftree) ...
       ... (blue-eyed-child? (child-father a-ftree)) ...
       ... (blue-eyed-child? (child-mother a-ftree)) ...)]))
When you work in this fashion, you need to replace the template’s generic name with a specific one.

Checking with our recipe, we realize that we need to backtrack and develop some examples before we move on to the definition step. If we start with Carl, the first person in the family tree, we see that Carl’s family tree does not contain a node with a "blue" eye color. Specifically, the node representing Carl says the eye color is "green" and, given that Carl’s ancestor trees are empty, they cannot possibly contain a node with "blue" eye color:

(check-expect (blue-eyed-child? Carl) false)

In contrast, Gustav contains a node for Eva who does have blue eyes:

(check-expect (blue-eyed-child? Gustav) true)

Now we are ready to define the actual function. The function distinguishes between two cases: a no-parent node and a child node. For the first case, the answer should be obvious even though we haven’t made up any examples. Since the given family tree does not contain any child node whatsoever, it cannot contain one with "blue" as eye color. Hence the result in the first cond clause is false.

For the second cond clause, the design requires a lot more work. Again following the design recipe, we first remind ourselves what the expressions in the template accomplish:
  1. (child-name a-ftree) extracts the child’s name;

  2. (child-date a-ftree) extracts the child’s date of birth

  3. (child-eyes a-ftree), which extracts the child’s eye color

  4. according to the purpose statement for blue-eyed-child?, (blue-eyed-child? (child-father a-ftree)) determines whether some node in the father’s FT has "blue" eyes; and

  5. (blue-eyed-child? (child-mother a-ftree)) determines whether someone in the mother’s FT has blue eyes;

It is now time to combine the values of these expressions in an appropriate manner.

Clearly, if the given child structure contains "blue" in the eyes field, the function’s answer must be true. Furthermore, the expressions concerning names and birth dates are useless, which leaves us with the two recursive calls. As stated, (blue-eyed-child? (child-father a-ftree)) processes the father’s family tree and (blue-eyed-child? (child-mother a-ftree)) the mother’s. If either of these expressions returns true, a-ftree contains a child with "blue" eyes—even if we don’t know where exactly this node is located. Since the consideration of the father’s tree, the mother’s tree, and the child node itself covers all nodes in a-ftree, there are no other possibilities.

Our analysis suggests that the result should be true if one of the following three expressions is true:
  • (string=? (child-eyes a-ftree) "blue")

  • (blue-eyed-child? (child-father a-ftree))

  • (blue-eyed-child? (child-mother a-ftree))

which, in turn, means we need to combine them with or:
(or (string=? (child-eyes a-ftree) "blue")
    (blue-eyed-child? (child-father a-ftree))
    (blue-eyed-child? (child-mother a-ftree)))
Figure 56 pulls everything together in a single definition.

; FT -> Boolean
; to determine whether a-ftree contains a child
; structure with "blue" in the eyes field
 
(check-expect (blue-eyed-child? Carl) false)
(check-expect (blue-eyed-child? Gustav) true)
 
(define (blue-eyed-child? a-ftree)
  (cond
    [(no-parent? a-ftree) false]
    [else (or (string=? (child-eyes a-ftree) "blue")
              (blue-eyed-child? (child-father a-ftree))
              (blue-eyed-child? (child-mother a-ftree)))]))

Figure 56: Finding a blue-eyed child in an ancestor tree

Since this function is the very first one to use two recursions, we simulate the Stepper’s action for (blue-eyed-child? Carl) to give you an impression of how it all works:

   (blue-eyed-child? Carl)

== (blue-eyed-child? (make-child MTFT MTFT "Carl" 1926 "green"))

==

(cond

  [(no-parent? (make-child MTFT MTFT "Carl" 1926 "green")) false]

  [else

    (or (string=?

          (child-eyes (make-child MTFT MTFT "Carl" 1926 "green"))

          "blue")

        (blue-eyed-child?

          (child-father (make-child MTFT MTFT "Carl" 1926 "green")))

        (blue-eyed-child?

          (child-mother (make-child MTFT MTFT "Carl" 1926 "green"))))])

==

(or (string=?

      (child-eyes (make-child MTFT MTFT "Carl" 1926 "green"))

      "blue")

    (blue-eyed-child?

      (child-father (make-child MTFT MTFT "Carl" 1926 "green")))

    (blue-eyed-child?

      (child-mother (make-child MTFT MTFT "Carl" 1926 "green"))))

==

(or (string=? "green" "blue")

    (blue-eyed-child?

      (child-father (make-child MTFT MTFT "Carl" 1926 "green")))

    (blue-eyed-child?

      (child-mother (make-child MTFT MTFT "Carl" 1926 "green"))))

==

(or false

    (blue-eyed-child? MTFT)

    (blue-eyed-child? MTFT))

== (or false false false)

== false

The calculation uses several steps where we replace equals by equals, assuming appropriate auxiliary calculations, something the Stepper wouldn’t do. For example:

   (blue-eyed-child? MTFT)

== (blue-eyed-child? (make-no-parent))

==

(cond

  [(no-parent? (make-no-parent)) false]

  [else

    (or (string=? (child-eyes (make-no-parent)) "blue")

        (blue-eyed-child? (child-father (make-no-parent)))

        (blue-eyed-child? (child-mother (make-no-parent))))])

==

(cond

  [true false]

  [else

    (or (string=? (child-eyes (make-no-parent)) "blue")

        (blue-eyed-child? (child-father (make-no-parent)))

        (blue-eyed-child? (child-mother (make-no-parent))))])

== false

We trust you have seen such auxiliary calculations in math.

Exercise 242: Develop count-persons. The function consumes a family tree node and counts the child structures in the tree.

Exercise 243: Develop the function average-age. It consumes a family tree node and the current year. It produces the average age of all child structures in the family tree.

Exercise 244: Develop the function eye-colors, which consumes a family tree node and produces a list of all eye colors in the tree. An eye color may occur more than once in the resulting list.

Hint: Use the append operation to concatenate lists:

     (append (list 'a 'b 'c) (list 'd 'e))

  == (list 'a 'b 'c 'd 'e)

We discuss the development of functions like append in section Simultaneous Processing.

Exercise 245: Suppose we need the function blue-eyed-ancestor?. It is like blue-eyed-child? but responds with true only when an ancestor, not the given child node, has blue eyes.

Even though the functions differ, their signatures are the same:
; FT -> Boolean
; to determine whether a-ftree contains a child node with blue eyes
(define (blue-eyed-ancestor? a-ftree) ...)
To appreciate the difference, we take a look at Eva:

(check-expect (blue-eyed-child? Eva) true)

Eva is blue-eyed. Because she does not have a blue-eyed ancestor, however, we get

(check-expect (blue-eyed-ancestor? Eva) false)

Now suppose a friend comes up with this solution:
(define (blue-eyed-ancestor? a-ftree)
  (cond
    [(no-parent? a-ftree) false]
    [else (or (blue-eyed-ancestor? (child-father a-ftree))
              (blue-eyed-ancestor? (child-mother a-ftree)))]))
Explain why this function fails the tests. What would be the result of (blue-eyed-ancestor? A) no matter which A you choose? Can you fix your friend’s solution?

22.2 Forests

It is a short step from a family tree to an entire forest of information about families:
; A FF (family forest) is one of:
;  empty
;  (cons FT FF)
A family forest could represent the information a town maintains about all its families. Here are some trees excerpts from figure 54 arranged as forests:
(define ff1 (list Carl Bettina))
(define ff2 (list Fred Eva))
(define ff3 (list Fred Eva Carl))
The first two forests contain two unrelated families, and the third one illustrates that unlike in real forests, trees in family forests can overlap.

Here is a representative problem concerning family trees:

Sample Problem: Design the function blue-eyed-child-in-forest?, which determines whether a family forest contains a child with "blue" in the eyes field.

The straightforward solution is displayed in figure 57.

; FF -> Boolean
; does the forest contain any child node with "blue" eyes
 
(check-expect (blue-eyed-child-in-forest? ff1) false)
(check-expect (blue-eyed-child-in-forest? ff2) true)
(check-expect (blue-eyed-child-in-forest? ff3) true)
 
(define (blue-eyed-child-in-forest? a-forest)
  (cond
    [(empty? a-forest) false]
    [else (or (blue-eyed-child? (first a-forest))
              (blue-eyed-child-in-forest? (rest a-forest)))]))

Figure 57: Finding a blue eyed child in a family forest

We leave it to you to figure out the signature, the purpose statement, and the examples for this solution. As for the template, the design can employ the list template because blue-eyed-child-in-forest? consumes lists. If each item on the list were a structure with an eyes field and nothing else, the function would iterate over those structures using the selector function for the eyes field and a string comparison. In this case, each item is a family tree but luckily, we already know how to process family trees.

Let us step back for a moment and inspect how we explained figure 57. The starting point is a pair of data definitions where the second refers to the first and both refer to themselves. The result is a pair of functions where the second refers to the first and both refer to themselves. In other words, the function definitions refer to each other the same way the data definitions refer to each other. To some extent, this relationship is a simple generalization of what we have seen so far. The difference is that two data definitions and two functions are involved, but not counting the self-references of both data definitions, we have encountered this situation before in some of our projects; we just happen to gloss over it. Instead of dwelling on the idea in the context of trees, we move on to S-expressions and articulate a generalized design recipe afterwards.

Exercise 246: Reformulate the data definition for FF with the List-of abstraction. Now do so for the signature of blue-eyed-child-in-forest?. Finally, define blue-eyed-child-in-forest? using one of the list abstractions introduced in the previous chapter.

Exercise 247: Design the function average-age. It consumes a family forest and a year, specified as a natural number. From this data, it produces the average age of all child nodes in the forest. Note: If the trees in this forest overlap, the result isn’t a true average because some people may contribute more than others. For this exercise, act as if the trees don’t overlap.

22.3 S-expressions

While Intermezzo: Quote, Unquote introduced S-expressions on an informal basis, it is possible to describe them with a combination of three data definitions:
; An S-expr (S-expression) is one of:
;  Atom
;  SL
; An SL (S-list) is one of:
;  empty
;  (cons S-expr SL)
; An Atom is one of:
;  Number
;  String
;  Symbol
Recall that Symbols look like strings with a single quote at the beginning and with no quote at the end.

The idea of S-expressions is due to John McCarthy and his Lispers, who created S-expressions in 1958 so that they could process Lisp programs with other Lisp programs. This seemingly circular reasoning may sound esoteric but as mentioned in Intermezzo: Quote, Unquote, S-expressions are a versatile form of data that is often rediscovered, most recently with applications to the world wide web. Working with S-expressions thus prepares a discussion of how to design functions for highly intertwined data definitions.

Exercise 248: Define atom?. The function is not a function that comes with ISL+.

Up to this point in this book, no data has required a data definition as complex as the one for S-expressions. And yet, with one extra hint, you can design functions that process S-expressions if you follow the design recipe. To demonstrate this point, let us work through a specific example:

Sample Problem: Design the function count, which determines how many times some symbol sy occurs in some S-expression s.

While the first step calls for data definitions and appears to have been completed, remember that it also calls for the creation of data examples, especially when the definition is complex.

A data definition is supposed to be a prescription of how to create data, and its “test” is whether it is usable. One point that the data definition for S-expr makes is that every Atom is an element of S-expr, and you know that Atoms are easy to fabricate:
'hello
20.12
"world"
In the same vein, every SL is an S-expr, and you know how to make up lists:
empty
(cons 'hello (cons 20.12 (cons "world" empty)))
(cons (cons 'hello (cons 20.12 (cons "world" empty)))
      empty)
The first two are obvious, the third one deserves a second look. It repeats the second S-expr but nested inside (cons ... empty). What this means is that it is a list that contains a single item, namely, the second example. You can simplify the example with list:
(list (cons 'hello (cons 20.12 (cons "world" empty))))
; or
(list (list 'hello 20.12 "world"))

Indeed, with the quotation mechanism of Intermezzo: Quote, Unquote it is even easier to write down S-expressions. Here are the last three repeated in this notation:
> '()

empty

> '(hello 20.12 "world")

(list 'hello #i20.12 "world")

> '((hello 20.12 "world"))

(list (list 'hello #i20.12 "world"))

To help you out, we evaluate these examples in the interactions area of DrRacket so that you can see the result, which is closer to the above constructions than the quote notation.

With quote, it is quite easy to make up complex examples:
> '(define (f x)
     (+ (* 3 x x) (* -2 x) 55))

(list

 'define

 (list 'f 'x)

 (list '+ (list '* 3 'x 'x) (list '* -2 'x) 55))

This example may strike you as odd because it looks like a definition in BSL but, as the interaction with DrRacket shows, it is just a piece of data. Here is another one:
> '((6 f)
    (5 e)
    (4 d))

(list (list 6 'f) (list 5 'e) (list 4 'd))

This piece of data looks like a table, associating letters with numbers. The last example is a piece of art:
> '(wing (wing (wing body wing) wing) wing)

(list 'wing (list 'wing (list 'wing 'body 'wing) 'wing) 'wing)

It is now time to write down the rather obvious header for count:
; S-expr Symbol -> N
; count all occurrences of sy in sexp
(define (count sexp sy)
  0)
Since the header is obvious, we move on to functional examples. If the given S-expr is 'world and the to-be-counted symbol is 'world, the answer is obviously 1. Here are some more examples, immediately formulated as tests:
(check-expect (count 'world 'hello) 0)
(check-expect (count '(world hello) 'hello) 1)
(check-expect (count '(((world) hello) hello) 'hello) 2)
You can see how convenient quotation notation is for test cases. When it comes to templates, however, thinking in terms of quote or even list would be disastrous.

Before we move on to the template step, we need to give you one hint:

For intertwined data definitions, create one template per data definition. Create them in parallel. Make sure they refer to each other in the same way the data definitions do.

This hint sounds more complicated than it is. For our problem, it means we need three templates:
  1. one for count, which counts occurrences of symbols in S-exprs;

  2. one for a function that counts occurrences of symbols in SLs;

  3. and one for a function that counts occurrences of symbols in Atoms;

And here are three partial templates, with conditionals as suggested by the three data definitions:
(define (count sexp sy)
 (cond
   [(atom? sexp) ...]
   [else ...]))
 
(define (count-sl sl sy)
  (cond
    [(empty? sl) ...]
    [else ...]))
 
(define (count-atom at sy)
  (cond
    [(number? at) ...]
    [(string? at) ...]
    [(symbol? at) ...]))
The template for count contains a two-pronged conditional, because the data definition for S-expr has two clauses. It uses the atom? function to distinguish the case for Atoms from the case for SLs. The template named count-sl consumes an element of SL and a symbol, and because SL is basically a list, count-sl also contains a two-pronged cond. Finally, count-atom is supposed to work for Atoms and Symbols. And this means that its template checks for the three distinct forms of data mentioned in the data definition of Atom.

The next step in the template creation process is to take apart compound pieces of data:
(define (count sexp sy)
 (cond
   [(atom? sexp) ...]
   [else ...]))
 
(define (count-sl sl sy)
  (cond
    [(empty? sl) ...]
    [else (... (first sl) ... (rest sl) ...)]))
 
(define (count-atom at sy)
  (cond
    [(number? at) ...]
    [(string? at) ...]
    [(symbol? at) ...]))
Since only count-sl deals with constructed data, we add just two selector expressions.

The last step in the template creation process calls for an inspection of self-references in the data definitions. In our context, this means self-references and references from one data definition to another and (possibly) back. Let us inspect the cond lines in the three templates:
  1. The atom? line in count corresponds to the first line in the definition of S-expr, which points to Atom. To indicate this reference from one data definition to the other inside the template, we add (count-atom sexp sy), meaning we interpret sexp as an Atom and let the appropriate function deal with it.

  2. Following the same line of thought, the second cond line in count calls for the addition of (count-sl sexp sy).

  3. The empty? line in count-sl corresponds to a line in the data definition that makes no reference to another data definition.

  4. In contrast, the else line contains two selector expressions, and each of them extracts a different kind of value. Specifically, (first sl) is an element of S-expr, which means that we should wrap it in (count ...). After all, count is responsible for counting inside of arbitrary S-exprs. In the same vein, (rest sl) corresponds to a self-reference, and we know that we need to deal with those via recursive function calls.

  5. Finally, all three cases in Atom refer to atomic forms of data. Therefore the count-atom function does not need to change.

(define (count sexp sy)
 (cond
   [(atom? sexp) (count-atom sexp sy)]
   [else (count-sl sexp sy)]))
 
(define (count-sl sl sy)
  (cond
    [(empty? sl) ...]
    [else (... (count (first sl) sy)
           ... (count-sl (rest sl) sy) ...)]))
 
(define (count-atom at sy)
  (cond
    [(number? at) ...]
    [(string? at) ...]
    [(symbol? at) ...]))

Figure 58: A template for S-expressions

Figure 58 presents the three complete templates. Filling in the blanks in these templates is straightforward:
; S-expr Symbol -> N
; count all occurrences of sy in sexp
(define (count sexp sy)
 (cond
   [(atom? sexp) (count-atom sexp sy)]
   [else (count-sl sexp sy)]))
 
; SL Symbol -> N
; count all occurrences of sy in sl
(define (count-sl sl sy)
  (cond
    [(empty? sl) 0]
    [else (+ (count (first sl) sy) (count-sl (rest sl) sy))]))
 
; Atom Symbol -> N
; count all occurrences of sy in at
(define (count-atom at sy)
  (cond
    [(number? at) 0]
    [(string? at) 0]
    [(symbol? at) (if (symbol=? at sy) 1 0)]))
With this idea in mind, you can easily explain any random line in these definitions:
  • [(atom? sexp) (count-atom sexp sy)] checks that sexp is an atom. Hence, it interprets the S-expr as an Atom and calls count-atom on it. Doing so means the latter function counts how often sy occurs in sexpwhich is precisely what we want, but specialized to the appropriate kind of data.

  • [else (+ (count (first sl) sy) (count-sl (rest sl) sy))] means the given list consists of two parts: an S-expr and an SL. By using count and count-sl, the corresponding functions are used to count how often sy appears in each part, and the two numbers are added up—yielding the total number of sys in all of sexp.

  • [(symbol? at) (if (symbol=? at sy) 1 0)] tells us that if an Atom is a Symbol, sy occurs once if it is equal to sexp and otherwise it does not occur at all. Since the two pieces of data are atomic, there is no other possibility.

Voilà, you have systematically designed a function for S-expressions.

Exercise 249: In a sense, we designed one program that consists of three connected functions. To express this fact, we should use local to organize the definitions:
; S-expr Symbol -> N
; count all occurrences of sy in sexp
(define (count sexp sy)
  (local (; S-expr Symbol -> N
          ; the main function
          (define (count-sexp sexp sy)
            (cond
              [(atom? sexp) (count-atom sexp sy)]
              [else (count-sl sexp sy)]))
 
          ; SL Symbol -> N
          ; count all occurrences of sy in sl
          (define (count-sl sl sy)
            (cond
              [(empty? sl) 0]
              [else (+ (count-sexp (first sl) sy) (count-sl (rest sl) sy))]))
 
          ; Atom Symbol -> N
          ; count all occurrences of sy in at
          (define (count-atom at sy)
            (cond
              [(number? at) 0]
              [(string? at) 0]
              [(symbol? at) (if (symbol=? at sy) 1 0)])))
    ;  IN
    (count-sexp sexp sy)))
Notice that we also renamed the first function to indicate that its primary argument is an S-expr.

Copy the above definition into DrRacket and validate with the test suite for count that it works properly.

The second argument to the local functions, sy, never changes. It is always the same as the original symbol. Hence you can eliminate it from the local function definitions and function applications. Do so and re-test the revised definition. In Simplifying Functions, we show you how to simplify these kinds of definitions even more—which is easy to do when you have designed functions systematically.

Exercise 250: Design depth. The function consumes an S-expression and determines its depth. An atom has a depth of 1. The depth of a list of S-expressions is the maximum depth of its items plus 1.

Exercise 251: Design the substitute function. It consumes an S-expression s and two symbols, old and new. The result is like s with all occurrences of old replaced by new.

Exercise 252: Reformulate the data definition for S-expr so that the first clause is expanded into the three clauses of Atom and the second clause uses the List-of abstraction.

Re-design the count function for this data definition. Note: This kind of simplification is not always possible though experienced programmers tend to recognize and use such opportunities.

Exercise 253: Abstract the data definitions for S-expr and SL so that they abstract over the kinds of atoms that may appear.

22.4 Designing with Intertwined Data

The jump from self-referential data definitions to collections of data definitions with mutual references is far smaller than the one from data definitions for finite data to self-referential data definitions. Indeed, the design recipe for self-referential data definitions—see Designing with Self-Referential Data Definitionsneeds only minor adjustments to apply to this seemingly complex situation:
  1. The need for “nests” of mutually related data definitions is similar to the one for the need for self-referential data definitions. The problem statement deals with many distinct kinds of information, and one form of information refers to other kinds.

    Before you proceed in such situations, you should draw arrows on top of the data definitions that connect references to definitions. For example, the definition for S-expr contains references to SL and Atom, which you should connect with the two respective definitions. Similarly, the definition of SL contains one self-reference and one reference back to SL.

    Figure 59: Arrows for nests of data definitions

    Like self-referential data definitions, these nests also call for validation. At a minimum, you must be able to construct some examples for every individual data definition. Before you even start, make sure some of the data definitions contain clauses that do not refer to any of the other data definitions that come with this nest.

    You can, and should, also check the validity of self-referential data definitions with the creation of examples. Keep in mind that the definition may be invalid if it is impossible to generate examples from them.

  2. The key change is that you must design as many functions in parallel as there are data definitions. Each function specializes for one of the data definitions; all remaining arguments remain the same. Based on that, you start with a signature, a purpose statement, and a dummy definition for each function.

  3. Be sure to work through functional examples that use all self and mutual references in the nest of data definitions. For finding mistakes, it is best to derive the examples for the auxiliary functions from the examples for the primary function.

    Take a look at the examples for count above. It comes with three examples. The first one can obviously serve as an example for count-atom:

    (check-expect (count-atom 'world 'hello) 0)

    In contrast, the other two are good examples for count-sl:
    (check-expect (count '(world hello) 'hello) 1)
    (check-expect (count '(((world) hello) hello) 'hello) 2)

  4. For each function design the template according to its primary data definition. Use figure 33 to guide the template creation up to the last step.

    The last template creation step now calls for a check for all self and cross references. Use the data definitions annotated with arrows to guide this step. For each arrow, include a self-call or a cross-call from one function to another. See figure-ref{fig:def-fun-mutual-arrows} for the arrow-annotated version of the templates for counting atoms in an S-expr.

    Figure 60: Arrows for mutual references in templates

    Last but not least, replace the arrows with actual function calls.

    Note: Once again, you should notice the symmetry between the arrow-enriched data definition in figure 59 and the arrow-enriched template in figure 60. This symmetry is evidence that the design recipe provides a natural way for going from problems to solutions.

  5. For the design of the body we start with those cond lines that do not contain natural recursions or calls to other functions. They are called base cases. The corresponding answers are typically easy to formulate or are already given by the examples. After that, you deal with the self-referential cases and the cases of cross-function calls. Let the questions and answers of figure 35 guide you.

  6. Run the test suite. If an auxiliary function is broken, you will get two error reports if you followed our advice concerning examples. A single fix should eliminate both. Do make sure that running the tests covers all the pieces of the function.

22.5 Simplifying Functions

Exercise 249 shows how to use local to organize a function that deals with an intertwined form of data. This organization also helps simplify functions once we have know that the data definition is final. To demonstrate this point, we explain how to simplify the solution of exercise 251.

Here is a complete definition of the substitute function, using local and three auxiliary functions as suggested by the data definition:
; S-expr Symbol Atom -> S-expr
; replace all occurrences of old in sexp with new
 
(check-expect (substitute 'world 'hello 0) 'world)
(check-expect (substitute '(world hello) 'hello 'bye) '(world bye))
(check-expect (substitute '(((world) bye) bye) 'bye '42) '(((world) 42) 42))
 
(define (substitute sexp old new)
  (local (; S-expr -> S-expr
          (define (subst-sexp sexp)
            (cond
              [(atom? sexp) (subst-atom sexp)]
              [else (subst-sl sexp)]))
 
          ; SL -> S-expr
          (define (subst-sl sl)
            (cond
              [(empty? sl) empty]
              [else (cons (subst-sexp (first sl)) (subst-sl (rest sl)))]))
 
          ; Atom -> S-expr
          (define (subst-atom at)
            (cond
              [(number? at) at]
              [(string? at) at]
              [(symbol? at) (if (symbol=? at old) new at)])))
    ;  IN
    (subst-sexp sexp)))
We have included test cases so that you can re-test the function after each edit.

Exercise 254: Copy and paste the above definition into DrRacket, including the test suite. Run and validate that the test suite passes. As you read along the remainder of this section, perform the edits and re-run the test suites to confirm the validity of our arguments.

Since we know that SL describes lists of S-expr, we can use map to simplify subst-sl:
(define (substitute.v1 sexp old new)
  (local (; S-expr -> S-expr
          (define (subst-sexp sexp)
            (cond
              [(atom? sexp) (subst-atom sexp)]
              [else (subst-sl sexp)]))
 
          ; SL -> S-expr
          (define (subst-sl sl)
            (map subst-sexp sl))
 
          ; Atom -> S-expr
          (define (subst-atom at)
            (cond
              [(number? at) at]
              [(string? at) at]
              [(symbol? at) (if (symbol=? at old) new at)])))
    ;  IN
    (subst-sexp sexp)))
Its original definition says that subst-sexp is applied to every item on sl; the new definition expresses the same idea more succinctly with map.

For the second simplification step, we need to introduce eq?, another built-in function of our programming language. All you need to know for now is that eq? determines whether two pieces of atomic data are equal. In other words, if s is a symbol and t is a number, (eq? s t) evaluates to false because a string and a number can never be equal; but (eq? s t) produces true, if t is also a symbol and spells exactly like s.

With this in mind, the third local function becomes a one-liner:
(define (substitute.v2 sexp old new)
  (local (; S-expr -> S-expr
          (define (subst-sexp sexp)
            (cond
              [(atom? sexp) (subst-atom sexp)]
              [else (subst-sl sexp)]))
 
          ; SL -> S-expr
          (define (subst-sl sl)
            (map subst-sexp sl))
 
          ; Atom -> S-expr
          (define (subst-atom at)
            (if (eq? at old) new at)))
    ;  IN
    (subst-sexp sexp)))

At this point the last two local definitions consist of a single line. Furthermore, neither definition is recursive. Hence we can in-line the functions in subst-sexp, the first local function. In-lining means replacing (subst-atom sexp) with (if (eq? sexp old) new sexp), that is, we replace the parameter at with the actual argument sexp.While sexp is also a parameter, this substitution is really acceptable because it, too, stands in for an actual value. Similarly, for (subst-sl sexp) we put in (map subst-sexp sexp):
(define (substitute.v3 sexp old new)
  (local (; S-expr -> S-expr
          (define (subst-sexp sexp)
            (cond
              [(atom? sexp) (if (eq? sexp old) new sexp)]
              [else (map subst-sexp sexp)])))
    ;  IN
    (subst-sexp sexp)))
All we are left with now is a function whose definition introduces one local function, which is called on the same major argument. If we systematically supplied the other two arguments, we would immediately see that the local function can be used in lieu of the outer one.

Here is the result of translating this last thought into code:
(define (substitute sexp old new)
  (cond
    [(atom? sexp) (if (eq? sexp old) new sexp)]
    [else (map (lambda (s) (substitute s old new)) sexp)]))

23 Incremental Refinement

When you develop real-world programs, you may confront complex forms of information and the problem of representing them with data. The best strategy to approach this task is to use iterative refinement, a well-known scientific process. A scientist’s problem is to represent a part of the real world using some form of mathematics. The result of the effort is called a model. The scientist then tests the model in many ways, in particular by predicting the outcome of experiments. If the discrepancies between the predictions and the measurements are too large, the model is refined with the goal of improving the predictions. This iterative process continues until the predictions are sufficiently accurate.

Consider a physicist who wishes to predict a rocket’s flight path. While a “rocket as a point” representation is simple, it is also quite inaccurate, failing to account for air friction. In response, the physicist may add the rocket’s rough contour and introduce the necessary mathematics to represent friction. This second model is a refinement of the first model. In general, a scientist iterates this process until the model predicts the rocket’s flight path with sufficient accuracy.

A programmer trained in a computer science department should proceed like this physicist. The key is to find an accurate data representation of the real-world information and functions that process them appropriately. Complicated situations call for a refinement process to get to a sufficient data representation combined with the proper functions. The process starts with the essential pieces of information and adds others as needed. Sometimes a programmer must refine a model after the program has been deployed because users request additional functionality.

So far we have used iterative refinement for you when it came to complex forms of data. This chapter illustrates iterative refinement as a principle of program development with an extended example, representing and processing (portions of) a computer’s file system. We start with a brief discussion of the file system and then iteratively develop three data representations. Along the way, we propose some programming exercises so that you see how the design recipe also helps modify existing programs.

23.1 Data Analysis

Before you turn off DrRacket, you want to make sure that all your work—your programs, your data—are stashed away somewhere. Otherwise you have to reenter everything when you fire up DrRacket next. So you ask your computer to save programs and data in files. A file is a sequence of small pieces of characters, also called bytes. For our purposes here, a file resembles a string.

Figure 61: A sample directory tree

On most computer systems, files are organized in directories or folders. Roughly speaking, a directory contains some files and some more directories. The latter are called subdirectories and may contain yet more subdirectories and files, and so on. The entire collection is also called a directory tree.

Figure 61 contains a graphical sketch of a small directory tree, and the picture explains why computer scientists call them trees. Contrary to computer scientists, the figure shows the tree growing upwards, with a root directory named TS. The root directory contains one file, called read!, and two subdirectories, called Text and Libs, respectively. The first subdirectory, Text, contains only three files; the latter, Libs, contains only two subdirectories, each of which contains at least one file. Finally each box has one of two annotations: a directory is annotated with DIR, and a file is annotated with a number, its size.

Exercise 255: How many times does a file name read! occur in the directory tree TS? Can you describe the path from the root directory to the two occurrences? What is the total size of all the files in the tree? What is the total size of the directory if each directory node has size 1? How many levels of directories does it contain?

23.2 Refining Data Definitions

Exercise 255 lists some of the questions that users routinely ask about directories. To answer such questions, the computer’s operating system provides programs that can answer just such questions. If you want to design such programs, you need to develop a data representation for directory trees.

In this section, we use iterative refinement to develop three such data representations. For each stage, we need to decide which attributes to include and which to ignore. Consider the directory tree in figure 61 and imagine how it is created. When a user first creates a directory, it is empty. As time goes by, the user adds files and directories. In general, a user refers to files by names but mostly thinks of directories as containers of other things.

Model 1: Our thought experiment suggests that our first model should focus on files as atomic entities with a name and directories as containers. Here is a data definition that deals with directories as lists and files as symbols, i.e., their names:
; A Dir.v1 (short for directory) is one of:
;  empty
;  (cons File.v1 Dir.v1)
;  (cons Dir.v1 Dir.v1)
 
; A File.v1 is a Symbol.
The names have a .v1 suffix to distinguish them from future refinements.

Exercise 256: Translate the directory tree in figure 61 into a data representation according to model 1.

Exercise 257: Design the function how-many, which determines how many files a given Dir.v1 contains. Remember to follow the design recipe; exercise 256 provides you with data examples.

Model 2: If you solved exercise 257, you know that this first data definition is still reasonably simple. But, it also obscures the nature of directories. With this first representation, we would not be able to list all the names of the subdirectories of some given directory. To model directories in a more faithful manner than containers, we must introduce a structure type that combines a name with a container:

(define-struct dir (name content))

This new structure type, in turn, suggests the following revision of the data definition:
; A Dir.v2 is a structure:
;   (make-dir Symbol LOFD)
 
; A LOFD (short for list of files and directories) is one of:
;  empty
;  (cons File.v2 LOFD)
;  (cons Dir.v2 LOFD)
 
; A File.v2 is a Symbol.
Note how the data definition for Dir.v2 refers to the definition for LOFDs and the one for LOFDs refers back to that of Dir.v2. The two definitions are mutually recursive.

Exercise 258: Translate the directory tree in figure 61 into a data representation according to model 2.

Exercise 259: Design the function how-many, which determines how many files a given Dir.v2 contains. Exercise 258 provides you with data examples. Compare your result with that of exercise 257.

Exercise 260: Show how to equip a directory with two more attributes: size and readability. The former measures how much space the directory itself (as opposed to its files and subdirectories) consumes; the latter specifies whether anyone else besides the user may browse the content of the directory.

Model 3: Like directories, files have attributes. To introduce these, we proceed just as above. First, we define a structure for files:

(define-struct file (name size content))

Second, we provide a data definition:
; A File.v3 is a structure:
;   (make-file Symbol N String)
As indicated by the field names, the symbol represents the name of the file, the natural number its size, and the string its content.

Finally, let us split the content field of directories into two pieces: a list of files and a list of subdirectories. This change requires a revision of the structure type definition:

(define-struct dir.v3 (name dirs files))

Here is the refined data definition:
; A Dir.v3 is a structure:
;   (make-dir.v3 Symbol Dir* File*)
 
; A Dir* is one of:
;  empty
;  (cons Dir.v3 Dir*)
 
; A File* is one of:
;  empty
;  (cons File.v3 File*)
Following a convention in computer science, the use of * as the ending of a name suggests “many” and is a marker distinguishing the name from similar ones (File.v3 and Dir.v3).

Exercise 261: Translate the directory tree in figure 61 into a data representation according to model 3. Use "" for the content of files.

Exercise 262: Design the function how-many, which determines how many files a given Dir.v3 contains. Exercise 261 provides you with data examples. Compare your result with that of exercise 259.

Considering the complexity of the above data definition you should contemplate how anyone can systematically design correct functions. Why are you confident that how-many produces correct results?

Exercise 263: Use List-of to simplify the data definition Dir.v3. Then use ISL+’s list processing functions from figure 51 to simplify the function definition(s) for the solution of exercise 262.

Starting with a simple representation of the first model and refining it step by step, we have developed a reasonably accurate data representation for directory trees. Indeed, this third data representation captures the nature of a directory tree much more faithfully than the first two. Based on this model, we can create a number of other functions that users expect from a computer’s operating system.

23.3 Refining Functions

To make the following exercises somewhat realistic, DrRacket comes with the "dir.ss" teachpack. This teachpack introduces the two structure type definitions from model 3, though without the .v3 suffix. Furthermore, the teachpack provides a function that creates model 3-style data representations of directory trees on your computer:
; String -> Dir.v3
; creates a data representation of the directory that a-path identifies
(define (create-dir a-path) ...)
For example, if you open DrRacket and enter the following three lines into the definitions area:
(require htdp/dir)
(define d0 (create-dir "/Users/...")) ; on OS X
(define d1 (create-dir "/var/log/")) ; on OS X
(define d2 (create-dir "C:\\Users\\...")) ; on Windows
you get data representations of directories on your computer after you save and then run the program.The function informs you via print outs about inaccessible directories. Indeed, you could use create-dir to map the entire file system on your computer to an instance of Dir.v3. Warning 1: For large directory trees, DrRacket may need a lot of time to build a representation. Use create-dir on small directory trees first. Warning 2: Do not define your own dir structure type. The teachpack already defines them, and you must not define a structure type twice.

Although create-dir delivers only a representation of a directory tree, it is sufficiently realistic to give you a sense of what it is like to design programs at that level. The following exercises illustrate this point. They use Dir to refer to the generic idea of a data representation for directory trees. Use the simplest data definition of Dir that allow you to complete the respective exercise. Feel free to use the data definition from exercise 263 and the functions from figure 51.

Exercise 264: Use create-dir to create data representations of some sample directories on your computer. Then use how-many from exercise 262 to count how many files they contain. Why are you confident that how-many produces correct results for these directories?

Exercise 265: Design find?. The function consumes a Dir and a file name and determines whether or not a file with this name occurs in the directory tree.

Exercise 266: Design the function ls, which lists the names of all files and directories in a given Dir.

Exercise 267: Design du, a function that consumes a Dir and computes the total size of all the files in the entire directory tree. Assume that storing a directory in a Dir structure costs 1 file storage unit. In the real world, a directory is basically a special file and its size depends on how large its associated directory is.

The remaining exercises rely on the notion of a path, which for our purposes, is a list of names:
; Path = [List-of Symbol]
; interp.: directions on how to find a file in a directory tree
For example, the path from TS to part1 in figure 61 is (list 'TS 'Text 'part1). Similarly, the path from TS to Code is (list 'TS 'Libs 'Code).

Exercise 268: Design find. The function consumes a directory d and a file name f. If (find? d f) is true, find produces a path to a file with name f; otherwise it produces false.

Hint: While it is tempting to first check whether the file name occurs in the directory tree, you have to do so for every single subdirectory. Hence it is better to combine the functionality of find? and find.

Challenge: The find function discovers only one of the two files named read! file in figure 61. Design find-all, which is generalizes find and produces the list of all paths that lead to f in d. What should find-all produce when (find? f d) is false? Is this part of the problem really a challenge compared to the basic problem?

Exercise 269: Design the function ls-R, which lists the paths to all files in a given Dir. Challenge: Modify ls-R so that its result includes all paths to directories, too.

Exercise 270: Re-design find-all from exercise 268 using ls from exercise 269. This is design by composition, and if you solved the challenge part of exercise 269 your new function can find directories, too.—One line functions are cool, and soon you will see how to discover them directly.

24 Refining Interpreters

DrRacket is a program. As you can imagine, it is a complex program, dealing with many different kinds of data. Like most complex programs, DrRacket also consists of many functions: one that allows programmers to edit text; another one that acts like the interactions area; a third one checks whether definitions and expressions are grammatical; and so on.

In this chapter, we show you how to design the function that implements the heart of the interactions area. Naturally, we use iterative refinement to design the evaluator. As a matter of fact, the very idea of focusing on the evaluator part of DrRacket is another instance of refinement, though the obvious one of implementing only one piece of functionality for a complex program.

Simply put, the interactions area performs the task of determining the values of expressions that you enter. After you click RUN, the interactions area knows about all the definitions. It is then ready to accept an expression that may refer to these definitions, to determine the value of this expression, and to repeat this cycle as often as you wish. For this reason, many people also refer to the interactions are as the read-eval-print loop, where eval is short for evaluator, a function is also called an interpreter.

Like this book, our refinement process starts with numeric BSL expressions. They are simple; they do not assume an understanding of definitions; and even your brother in fourth grade can determine their value. Once you understand this first step, you know the difference between a BSL expression and its representation. Next we move on to expressions with variables. The last step is to add definitions.

24.1 Interpreting Expressions

Our first task is to agree on a data representation for BSL programs, that is, we must figure out how to represent a BSL expression as a piece of BSL data. This sounds strange and unusual, but it is not difficult. Suppose we just want to represent numbers, additions, and multiplications for a start. Clearly, numbers can stand for numbers. Additions and multiplications, however, call for a class of compound data because they consist of an operator and two pieces.

Following the design recipes of this book, a straightforward way to represent additions and multiplications is to define two structure types: one for additions and another one for multiplications, each with two fields:
The intention is that the left field contains one operand—the one to the “left” of the operator—and the right field contains the other operand. The nearby table shows three examples.

BSL expression

representation of BSL expression

3

3

(+ 1 1)

(make-add 1 1)

(* 3 10)

(make-mul 3 10)

The next question is how we should represent an expression with subexpressions:

(+ (* 3 3) (* 4 4))

The surprisingly simple answer is that fields may contain any value. In this particular case, left and right may contain representations of expressions; and you may nest this as deep as you wish as the next table shows.

BSL expression

representation of BSL expression

(+ (* 1 1) 10)

(make-add (make-mul 1 1) 10)

(+ (* 3 3)
   (* 4 4))
(make-add (make-mul 3 3)
          (make-mul 4 4))
(+ (* (+ 1 2) 3)
   (* (* (+ 1 1) 2)
         4))
(make-add (make-mul (make-add 1 2) 3)
          (make-mul (make-mul (make-add 1 1) 2)
                    4))

Let’s look at some examples. These examples cover all cases: numbers, variables, simple expressions, and nested expressions.

Exercise 271: Formulate a data definition for the representation of BSL expressions based on the structure type definitions of add and mul. Let us use BSL-expr in analogy for S-expr for the new class of data.

Translate the following expressions into data:
  1. (+ 10 -10)

  2. (+ (* 20 3) 33)

  3. (+ (* 3.14 (* 2 3)) (* 3.14 (* -1 -9)))

InterpretRemember that “interpret” here means “translate from data into information.” In contrast, the word “interpreter” in the title of this chapter refers to a program that consumes the representation of a program and produces its value. While the two ideas are related, they are not the same. the following data as expressions:
  1. (make-add -1 2)

  2. (make-add (make-mul -2 -3) 33)

  3. (make-mul (make-add 1 (make-mul 2 3)) (make-mul 3.14 12))

Now that you have a data representation for BSL programs, it is time to design an evaluator. This function consumes a representation of a BSL expression and produces its value. Again, this function is unlike any you have ever designed so it pays off to experiment with some examples. To this end, you can either use the rules of arithmetic to figure out what the value of an expression is or you can “play” in the interactions area of DrRacket. Take a look at the following table for our examples.

BSL expression

its representation

its value

3

3

3

(+ 1 1)

(make-add 1 1)

2

(* 3 10)

(make-mul 3 10)

30

(+ (* 1 1) 10)

(make-add (make-mul 1 1) 10)

11

(+ (* 1 1) 10)

(make-add (make-mul 1 1) 10)

11

Exercise 272: Formulate a data definition for the class of values to which a representation of a BSL expression can evaluate.

Exercise 273: Design eval-expression. The function consumes a representation of a BSL expression (according to exercise 271) and computes its value.

Exercise 274: Develop a data representation for boolean BSL expressions constructed from true, false, and, or, and not. Then design eval-bool-expression. The function consumes a representative of boolean BSL expression and computes its value. What is the value of a boolean expression?

Exercise 275: S-expressions offer a convenient way to express BSL:
> (+ 1 1)

2

> '(+ 1 1)

(list '+ 1 1)

> (+ (* 3 3) (* 4 4))

25

> '(+ (* 3 3) (* 4 4))

(list '+ (list '* 3 3) (list '* 4 4))

By simply putting a quote in front of an expression, we get a piece of ISL+ data. This representation is convenient only for the person who types the representation of a BSL expression on a keyboard. Interpreting an S-expression representation is clumsy, mostly because not all S-expressions represent BSL-exprs:
"hello world"
'(+ x 1)
'(* (- "hello" "world") 10)

People invented parsers to check whether some piece of data conforms to a data definition. A parser consumes a “raw” piece of input and, if it does conform, it produces a parse tree, which in our specific case is the corresponding BSL-expr. If not, it signals an error, like the checked functions from Input Errors.

Figure 62 presents a BSL parser for S-expressions. Specifically, the parse function consumes an S-expr and produces an BSL-exprif and only if the given S-expression is the result of quoting a BSL expression that has a BSL-expr representative.

Create test cases for the parse function until DrRacket tells you that all expressions in the definitions area are covered during the test run.

What is unusual about the definition of this program with respect to the design recipe? Note: One unusual aspect is that the function uses length on the list argument. Real parsers do not use length because it slows the functions down.

Discuss: should a programming language be designed for the convenience of the programmer who uses it or for the convenience of the programmer who implements it?

(define WRONG "wrong kind of S-expression")
 
(define-struct add (left right))
(define-struct mul (left right))
 
; S-expr -> BSL-expr
; create representation of a BSL expression for s (if possible)
(define (parse s)
  (local (; S-expr -> BSL-expr
          (define (parse s)
            (cond
              [(atom? s) (parse-atom s)]
              [else (parse-sl s)]))
 
          ; SL -> BSL-expr
          (define (parse-sl s)
            (local ((define L (length s)))
              (cond
                [(< L 3)
                 (error WRONG)]
                [(and (= L 3) (symbol? (first s)))
                 (cond
                   [(symbol=? (first s) '+)
                    (make-add (parse (second s)) (parse (third s)))]
                   [(symbol=? (first s) '*)
                    (make-mul (parse (second s)) (parse (third s)))]
                   [else (error WRONG)])]
                [else
                 (error WRONG)])))
 
          ; Atom -> BSL-expr
          (define (parse-atom s)
            (cond
              [(number? s) s]
              [(string? s) (error "strings not allowed")]
              [(symbol? s) (error "symbols not allowed")])))
    (parse s)))

Figure 62: From S-expr to BSL-expr

24.2 Interpreting Variables

Since the first section ignores constant definitions, an expression does not have a value if it contains a variable. Indeed, if we do not know what x stands for, it makes no sense to evaluate (+ 3 x). Hence, one first refinement of the evaluator is to add variables to the expressions that we wish to evaluate. The assumption is that the definitions area contains a definition such as

(define x 5)

and that someone wants to evaluate expressions containing x in the interactions area:
> x

5

> (+ x 3)

8

> (* 1/2 (* x 3))

7.5

Indeed, you could imagine a second constant definition, say (define y 3), and interactions that involve two variables:
> (+ (* x x)
     (* y y))

34

Exercise 275 implicitly proposes symbols as representations for variables. After all, if you were to choose quoted S-expressions to represent expressions with variables, symbols would appear naturally:
> 'x

'x

> '(* 1/2 (* x 3))

(list '* 0.5 (list '* 'x 3))

The obvious alternative is a string, so that "x" would represent x. Other representations are also possible but this book is not about designing evaluators, so we stick with symbols. From this decision, it follows how to modify the data definition from exercise 271:
; A BSL-var-expr is one of:
;  Number
;  Symbol
;  (make-add BSL-var-expr BSL-var-expr)
;  (make-mul BSL-var-expr BSL-var-expr)
We simply add one clause to the data definition.

As for data examples, the following table shows some BSL expressions with variables and their BSL-var-expr representation:

BSL expression

representation of BSL expression

x

'x

(+ x 3)

(make-add 'x 3)

(* 1/2 (* x 3))

(make-mul 1/2 (make-mul 'x 3))

(+ (* x x)
   (* y y))
(make-add (make-mul 'x 'x)
          (make-mul 'y 'y))
They are all taken from the interactions above, meaning you know the results when x is 5 and y 3.

One way to determine the value of variable expressions is to replace all variables with the values that they represent. This is the way you know from mathematics classes in school, and it is perfectly fine way.

Exercise 276: Design subst. The function consumes a BSL-var-expr e, a Symbol x, and a Number v. It produces a BSL-var-expr like e with all occurrences of x replaced by v.

Exercise 277: Design the function numeric?, which determines whether a BSL-var-expr is also a BSL-expr. Here we assume that your solution to exercise 271 is the definition for BSL-var-expr without the line for Symbol.

Exercise 278: Design eval-variable. The function consumes a BSL-var-expr and determines its value if numeric? is true. Otherwise it signals an error, saying that it is impossible to evaluate an expression that contains a variable.

In general, a program defines many constants in the definitions area and expressions contain more than one variable. To evaluate such expressions, we need a representation of the definition area when it contains a series of constant definitions. For this exercise we use association lists:
; An AL (association list) is [List-of Association].
; An Association is (cons Symbol (cons Number empty)).
Make up elements of AL.

Design eval-variable*. The function consumes a BSL-var-expr e and an association list da. Starting from e, it iteratively applies subst to all associations in da. If numeric? holds for the result, it determines its value; otherwise it signals the same error as eval-variable. Hint: Think of the given BSL-var-expr as an atomic value and traverse the given association list instead. Or use a loop from figure 51. Note: We provide this hint because the creation of this function requires a little design knowledge from Simultaneous Processing.

Exercise 279: Modify the parser in figure 62 so that it creates BSL-var-expr if if it is an appropriate BSL expression.

Exercise 278 relies on the mathematical approach to constant definitions. If a name is defined to stand for some value, then all occurrences of the name can be replaced with the value. Substitution performs this replacement once and for all before the evaluation process even starts.

An alternative approach is to mingle substitution and replacement. That is, the evaluator starts processing the expression immediately but also carries along a the representation of the definitions area. Every time the evaluator encounters a variable, it looks in the definitions area for its value and uses it.

Exercise 280: Design lookup-con. The function consumes an AL da and a Symbol x. It produces the value of x in daif there is a matching Association; otherwise it signals an error.

Exercise 281: Design eval-var-lookup. This function has the same signature as eval-variable*:
; BSL-var-expr AL -> Number
(define (eval-var-lookip e da) ...)
It does not use substitution, however. Instead, the function traverses the expression in the manner that the design recipe for BSL-var-expr suggests and “carries along” da. When it encounters a symbol x, the function looks up the value of x in da.

24.3 Interpreting Functions

At this point, you understand how to evaluate BSL programs that consist of constant definitions and variable expressions. Naturally you should want to add function definitions so that you know at least in principle how to deal with the complete programming language. As you may remember from school and BSL: Meaning, people evaluate function applications of the shape (f a) by substituting a for x in e assuming (define (f x) e) is the definition for f. In short, substitution comes in handy again.

The goal of this section is to refine the evaluator of Interpreting Variables so that it can cope with function applications and function definitions. Put differently, we want to design an evaluator with you that simulates DrRacket when the definitions area contains a number of function definitions and a programmer enters an expression in the interactions area that contains applications of these functions.

For simplicity, let us assume that all functions in the definitions area consume one argument. Indeed, for the first two exercises we go further and assume that there is only one function definition.

Exercise 282: Extend the data representation of Interpreting Variables to include the application of a programmer-defined function. Recall that a function application consists of two pieces: a name and an expression. The former is the name of the function that is applied; the latter is the argument.

Use your data definition to represent the following expressions:
  1. (k (+ 1 1))

  2. (* 5 (k (+ 1 1)))

  3. (* (i 5) (k (+ 1 1)))

We refer to this newly defined class of data with BSL-fun-expr.

Exercise 283: Design eval-definition1. The function consumes four arguments:
  1. an BSL-fun-expr e

  2. a symbol f, which represents a function name;

  3. a symbol x, which represents the functions’s parameter; and

  4. an BSL-fun-expr b, which represents the function’s body.

If the terminology poses any difficulties, do re-read BSL: Grammar.

The function determines the value of the expression e. It proceeds as follows. Subexpressions from BSL-expr are evaluated as in exercise 273. For a function application of f,
  1. eval-definition1 evaluates the argument,

  2. it then substitutes the value of the argument for x in b; and

  3. evaluates the resulting expression with eval-definition1.

Here is how to express the steps as code, assuming arg is the argument of the function application:

(eval-definition1 (subst b x (eval-definition1 arg f x b)))

Note: You may have noticed that evaluation of a function application uses a form of recursion that you have not encountered so far. It is dubbed generative recursion and we discuss it in Generative Recursion.

If eval-definition1 encounters a variable, it signals the same error as eval-variable from exercise 278. Also, for function applications that do not refer to f, eval-definition1 signals an error as if it had encountered a variable.

Warning: The use of generative recursion introduces a new element into your computations: non-termination. That is, given some argument, a program may not deliver a result or signal an error but run forever. For fun, you may wish to construct an input for eval-definition1 that causes it to run forever. Use STOP to terminate the program.

For an evaluator that mimics the interaction area, we need a representation of the definitions area. Like in Interpreting Variables, we assume that it is a list of definitions.

Exercise 284: Provide a structure type definition and a data definition for function definitions. Recall that a function definition has three essential attributes:
  1. the function’s name,

  2. the function’s parameter, which is also a name, and

  3. the function’s body, which is a variable expression that usually contains the parameter.

We use BSL-fun-def to refer to the class of data representations for function definitions.

Use your data definition to represent the following BSL function definitions:
  1. (define (f x) (+ 3 x))

  2. (define (g y) (f (* 2 y)))

  3. (define (h v) (+ (f v) (g v)))

Next, define the class BSL-fun-def* to represent definitions area that consist of just one-argument function definitions. Translate the definitions area that defines f, g, and h into your data representation and name it da-fgh.

Finally, design the function lookup-def with the following header:
; BSL-fun-def* Symbol -> BSL-fun-def
; retrieve the definition of f in da
; or signal "undefined function" if da does not contain one
(check-expect (lookup-def da-fgh 'g) g)
(define (lookup-def da f) ...)
Looking up a function definition is needed for the evaluation of expressions in BSL-fun-expr.

Exercise 285: Design eval-function*. The function consumes the BSL-fun-expr representation of some expression e and the BSL-fun-def* representation of a definitions area da. It produces the result that DrRacket shows if you evaluate e in the interactions area assuming the definitions area contains da.

The function works like eval-function1 from exercise 283. For an application of some function f, it
  1. evaluates the argument;

  2. looks up the definition of f in the BSL-fun-def representation of da;

  3. substitutes the value of the argument for the function parameter in the function’s body; and

  4. evaluates the new expression via recursion.

Remember that the representation of a function definition for f comes with a parameter and a body.

Like DrRacket, eval-function* signals an error when it encounters a variable or an application whose function is not defined in the definitions area.

Exercise 286: Modify the parser in figure 62 so that it creates BSL-fun-expr if it is an appropriate BSL expression. Also see exercise 279.

Exercise 287: Figure 63 presents a BSL definitions parser for S-expressions. Specifically, the def-parse function consumes an S-expr and produces an BSL-fun-defif and only if the given S-expression is the result of quoting a BSL definition that has a BSL-fun-def representative.

Create test cases for the def-parse function until DrRacket tells you that all expressions in the definitions area are covered during the test run.

Note: The exercises assumes that you have a solution for exercise 286. That is, you have a function parse that turns an S-expr into a BSL-fun-expr representation, if possible.

With def-parse you have the essential ingredient for a parser that consumes an S-expression representation of a definitions area and produces a BSL-fun-def*. Now design the function da-parse, which parses a SL as a BSL-fun-def* assuming the former is a list of quoted BSL definitions.

(define WRONG "wrong kind of S-expression")
 
(define-struct def (name para body))
; see exercise 284
 
{S-expr} -> (tech "BSL-fun-def")
; create representation of a BSL definition for s (if possible)
(define (def-parse s)
  (local (; S-expr -> BSL-fun-def
          (define (def-parse s)
            (cond
              [(atom? s) (error WRONG)]
              [else
               (if (and (= (length s) 3) (eq? (first s) 'define))
                   (head-parse (second s) (parse (third s)))
                   (error WRONG))]))
          ; S-expr BSL-expr -> BSL-fun-def
          (define (head-parse s body)
            (cond
              [(atom? s) (error WRONG)]
              [else
               (if (not (= (length s) 2))
                   (error WRONG)
                   (local ((define name (first s))
                           (define para (second s)))
                     (if (and (symbol? name) (symbol? para))
                         (make-def name para body)
                         (error WRONG))))])))
    (def-parse s)))

Figure 63: From S-expr to BSL-fun-def

24.4 Interpreting Everything

Take a look at the following BSL program:
(define close-to-pi 3.14)
 
(define (area-of-circle r)
  (* close-to-pi (* r r)))
 
(define (volume-of-cylinder r h)
  (* h (area-of-circle r)))
Think of these definitions as the definitions area in DrRacket. After you click RUN, you can evaluate expressions involving close-to-pi, area-of-circle, and volume-of-cylinder at the prompt in the interactions area:
> (area-of-circle 1)

#i3.14

> (volume-of-cylinder 1 2)

#i6.28

> (* 3 close-to-pi)

#i9.42

The goal of this section is to refine your evaluator again so that it can mimic this much of DrRacket.

Exercise 288: Formulate a data definition for the representation of DrRacket’s definition area. Concretely, the data representation should work for a sequence that freely mixes constant definitions and one-argument function definitions. Make sure you can represent the definitions area consisting of three definitions at the beginning of this section.—We use BSL-da-all for this class of data.

Design the function lookup-con-def, It consumes a BSL-da-all da and a symbol x. It produces the representation of a constant definition whose name is x, if such a piece of data exists in da; otherwise the function signals an error saying that no such constant definition can be found.

Design the function lookup-fun-def, It consumes a BSL-da-all da and a symbol f. It produces the representation of a function definition whose name is f, if such a piece of data exists in da; otherwise the function signals an error saying that no such function definition can be found.

Exercise 289: Design eval-all. Like eval-function* from exercise 285, this function consumes the representation of an expression and of a definitions area. It produces the same value that DrRacket shows if the expression is entered at the prompt in the interactions area and the definitions area contains the appropriate definitions. Hint: Your eval-all function should process variables in the given expression like eval-var-lookup in exercise 281.

Exercise 290: It is cumbersome to enter the structure-based data representation of a BSL expressions and a definitions area. It is much easier to quote an actual expression or a definitions area after surrounding it with parentheses.

Design a function eval-all-sexpr. It consumes an S-expr and an Sl. The former is supposed to represent an expression and the latter a list of definitions. The function parses both with the appropriate parsing functions and then uses eval-all from exercise 289 to evaluate the expression. Hint: You must slightly modify da-parse from exercise 287 so that it can parse constant definitions, too.

You should know that eval-all-sexpr makes it straightforward to check whether it really mimics DrRacket’s evaluator.

At this point, you know a lot about interpreting BSL. Here are some of the missing pieces: Booleans with cond or if; Strings and such operations string-length or string-append; and lists with empty, empty?, cons, cons?, first, rest; and so on. Once your evaluator can cope with all these, it is basically complete, because your evaluators already know how to interpret recursive functions. Now when we say “trust us, you know how to design these refinements,” we mean it.

25 The Commerce of XML

XML is a widely used data language. Programmers often use XML data to send messages from a program running on one computer to another. For example, when you point your web browser at a particular web site, you are really connecting a program on your computer to a program on a different computer, and the latter may be sending a form of XML data to the former. Once the web browser receives XML data, it starts rendering it as an image on your computer’s monitor.

XML data

rendered in a browser

<ul>

 <li> hello </li>

 <li> <ul>

       <li> one </li>

       <li> two </li>

      </ul>

 </li>

 <li> world </li>

 <li> good bye </li>

</ul>

The above table illustrates this idea with a concrete example. On the left, you see a piece of XML data that a web site may send to your web browser. On the right, you see how one popular browser renders this snippet graphically.

This chapter explains the basics of XML and processing it data as another design exercise concerning intertwined data definitions and iterative refinement. XML as S-expressions starts with an informal comparison of S-expressions and XML data and uses it to formulate a full-fledged data definition. The remaining two sections explain with two distinct examples how to process this kind of data. Rendering XML Enumerations explains how to render enumerations like the above; Configuring State Machines illustrates how to use XML files to configure “large” programs, a common tool used for modern applications. If you think XML is too old-fashioned and a 2012 book should use JSON instead, remember that this chapter is mostly a design exercise and feel free to replicate the exercise for any modern data exchange format that you consider fit.

25.1 XML as S-expressions

The most basic piece of XML data looks like this:

  <machine> </machine>

It is called an element and “machine” is the name of the element. The two parts of the elements are like parentheses that delimit the content of an element. When there is no content between the two parts—other than white space—XML allows a shorthand:

  <machine />

But as far as we are concerned here, this is shorthand is equivalent to the explicit version.

From an S-expression perspective, an XML element is a named pair of parentheses that surround some content. While we could use a structure type to represent these two pieces:

(define-struct element (name content))

and one of the Racket libraries does so, an S-expression representation is even more natural:

'(machine)

This piece of data has the opening and closing parenthesis, and it comes with space to embed content.

Here is a piece of XML data with content:

  <machine><action /></machine>

Remember that the <action /> part is a shorthand, meaning we are really looking at this piece of data:

  <machine><action></action></machine>

In general, the content of an XML element is a series of XML elements, e.g.,

  <machine><action /><action /><action /></machine>

Expand the shorthand for <action /> before you continue.

To represent the first nested XML example in ISL+, we can use the element structure type from above:

(make-element "machine" (list (make-element "action" '())))

Since we now know that the content of an element is a sequence of XML data, we naturally use lists of (representations of) XML elements for the content field; '() signals the lack of content. The S-expression alternative looks much simpler than that:

'(machine (action))

When you look at the piece of XML data with a sequence of three <action /> elements as its content, you realize that you really want to distinguish such elements from each other. To this end, XML elements come with attributes. For example,

  <machine initial="red"></machine>

is the “machine” element equipped with one attribute whose name is “initial” and whose value is “red” between string quotes. Here is complex XML element with nested elements that have attributes too:

  <machine initial="red">

   <action state="red"    next="green" />

   <action state="green"  next="yellow" />

   <action state="yellow" next="red" />

  </machine>

We use blanks, indentation, and line breaks to make the element readable but this white space has no meaning for our XML data here.

The introduction of attributes requires a modification of the structure representation:

(define-struct element (name attributes content))

In addition, a consistently structural representation also calls for an attribute structure type:

(define-struct attribute (name value))

Here we represent attributes as instances of this structure type that combine symbols with strings. A sequence of attributes is a list of such instances. Now it is straightforward to get a piece of ISL+ data for our first machine element:

(make-element 'machine '((initial "red")) '())

The complex machine with nested actions corresponds to a nest of structure instances and lists: (racketblock (make-element "machine" ’((initial "red")) (list (make-element "state" ’((state "red") (next "green"))) (make-element "state" ’((state "green") (next "yellow"))) (make-element "state" ’((state "yellow") (next "red"))))) )

In contrast, S-expressions for these “machine” elements look much simpler than these two nests of structure instances:

'(machine ((initial "red")))

To add attributes to an element, we use a list of lists where each of the latter contains two items: a symbol and a string. The symbol represents the name of the attribute and the string its value. This idea naturally applies to complex forms of XML data, too:
'(machine ((initial "red"))
          (action ((state "red") (next "green")))
          (action ((state "green") (next "yellow")))
          (action ((state "yellow") (next "red"))))
For now just note how the attributes are marked by two opening parentheses and the remaining list of (representations of) XML elements have one opening parenthesis.

A bit more general, you may also recall the idea from Intermezzo: Quote, Unquote, which used S-expressions to represent XHTML, a special dialect of XML. In particular, the intermezzo shows how easily a programmer can write down non-trivial XML data and even templates of XML representations—using backquote and unquote. Of course, the preceding chapter points out that in order to determine you need a parser to determine whether any given S-expression is a representation of XML data, and parser is a complex and unusual kind of function.

Nevertheless, we choose to go with a representation of XML based on S-expressions to demonstrate the usefulness of these old in practical terms. Let us proceed gradually to work out a data definition. Here is a first attempt:
; An Xexpr.v0 (short for X-expression) is
;   (cons Symbol empty)
This is the “named parentheses” idea from the beginning of this section. Equipping this element representation with content is easy:
; An Xexpr.v1 is
;   (cons Symbol [List-of Xexpr.v1])
The symbolic name becomes the first item on a list that otherwise consists of XML element representatives.

The last refinement step is to add attributes. Since the attributes in an XML element are optional, the revised data definition should have two clauses:
; An Xexpr.v2 is
;  (cons Symbol [List-of Xexpr.v2])
;  (cons Symbol (cons [List-of Attribute] [List-of Xexpr.v2]))
Our sample data representations from above suggest this definition for Attribute:
; An Attribute is
;   (cons Symbol (cons String empty))
The question is whether this data definition is practical for processing or how to make it practical.

Exercise 291: Eliminate the use of List-of from the data definition Xexpr.v2.

Exercise 292: Represent the following XML data as elements of Xexpr.v2:
  1. <transition from="seen-e" to="seen-f" />

  2. <ul><li><word /><word /></li><li><word /></li></ul>

  3. <end></end>

Which one could be represented in Xexpr.v0 or Xexpr.v1?

Exercise 293: Interpret the following elements of Xexpr.v2 as XML data:
  1. '(server ((name "example.org")))

  2. '(carcassonne (board (grass)) (player ((name "sam"))))

  3. '(start)

Which ones are elements of Xexpr.v0 or Xexpr.v1?

Roughly speaking, X-expressions simulate structures via lists. The simulation is convenient for programmers; it asks for the least amount of keyboard typing. For example, if an X-expression does not come with an attribute list, it is simply omitted. This choice of data representation comes with a trade-off, namely, when it comes to processing these lists. The best way to deal with the problem is to provide functions that make X-expressions look like structures, especially functions that access the quasi-fields: xexpr-name, xexpr-attributes, and xexpr-content. Once we have such functions, we can use lists and act as if they were structures.

These functions parse S-expressions, and parsers are tricky to design. So let us follow the design recipe carefully, starting with an exhaustive collection of data examples:
(define a0 '((initial "red")))
 
(define e0 '(machine))
(define e1 `(machine ,a0))
(define e2 '(machine (action)))
(define e3 '(machine () (action)))
(define e4 `(machine ,a0 (action) (action)))
The first definition introduces a list of attributes, which is reused twice in the construction of X-expressions. The definition of e0 reminds us that an X-expression may not come with either attributes or content. You should be able to explain that e2 and e3 are basically equivalent.

Next we formulate a signature, a purpose statement, and a header:
; Xexpr.v2 -> [List-of Attribute]
; retrieve the list of attributes of xe
(define (xexpr-attribute xe) '())
Here we focus on xexpr-attribute; we leave the other two as exercises.

Making up functional examples requires a decision for X-expressions without attributes, e.g., '(machine) or '(machine (action)). While our chosen representation completely omits missing attributes, we would have to supply '() if we represented the equivalent XML data with structures. We therefore decide that xexpr-attribute should produce '() for such X-expressions:
(check-expect (xexpr-attributes e0) '())
(check-expect (xexpr-attributes e1) '((initial "red")))
(check-expect (xexpr-attributes e2) '())
(check-expect (xexpr-attributes e3) '())
(check-expect (xexpr-attributes e4) '((initial "red")))
Before you read on, make sure you can explain all of the examples.

It is time to develop the template. Since the data definition for Xexpr.v2 is complex, we proceed slowly, step by step. First, while the data definition distinguishes two kinds of X-expressions, both clauses describe data constructed by consing a symbol onto a list. Second, what differentiates the two clauses is the rest of the list and especially the optional presence of a list of attributes. Let us translate these two insights into a template:
(define (xexpr-attributes xe)
  (local ((define optional-loa+content (rest xe)))
    (cond
      [(empty? optional-loa+content) ...]
      [else ...])))
The local definition chops off the name of the X-expression and leaves the remainder of the list, which may or may not start with a list of attributes. The key is that it just a list, and the two cond clauses indicate so. Third, this list is not defined via a self-reference but as the optional cons of some attributes onto a possibly empty list of X-expressions. In other words, we still need to distinguish the two usual cases and extract the usual pieces:
(define (xexpr-attributes xe)
  (local ((define optional-loa+content (rest xe)))
    (cond
      [(empty? optional-loa+content) ...]
      [else (... (first optional-loa+content)
             ... (rest optional-loa+content) ...)])))
But the function does not refer to itself in the usual manner. If you wish to add recursion to this template, then it must be added via a list-processing function for the rest expression .

You should be able to see at this point that recursion is not needed for the task at hand, and so we switch to the fifth step of the design recipe. Clearly, there are no attributes if the given X-expression comes with nothing but a name. In the second clause, the question is whether the first item on the list is a list of attributes. Since this question involves processing a list, we make a wish:
; design list-of-attributes?
; given: [List-of Attribute] or [List-of Xexpr.v2]
; wanted: true if it is the first one, false otherwise
Assuming this function is designed, it is straightforward to finish xexpr-attributes:
(define (xexpr-attributes xe)
  (local ((define optional-loa+content (rest xe)))
    (cond
      [(empty? optional-loa+content) '()]
      [else (local ((define possible-loa (first optional-loa+content)))
              (if (list-of-attributes? possible-loa)
                  possible-loa
                  '()))])))
If the first item is a list of attributes, the function produces it; otherwise there are no attributes.

For the design of list-of-attributes?, we proceed in the same manner and get this definition:
; [List-of Attribute] or [List-of Xexpr.v2] -> Boolean
; is the given value a list of attributes?
(define (list-of-attributes? x)
  (cond
    [(empty? x) true]
    [else (local ((define possible-attribute (first x)))
            (cons? possible-attribute))]))
We skip the details of the design process because they are unremarkable. What is remarkable is the signature of this function. Instead of specifying a single data definition as possible inputs, the signatures combines two data definitions separated with the English word “or.” In ISL+ such an informal signature with a definite meaning is acceptable on occasion;You may also do this for programs in the numerous scripting languages that are popular right now; do not try this in so-called typed programming languages. do not use it too often, however.

Exercise 294: Design the functions xexpr-name and xexpr-content.

Exercise 295: Add the “natural” self-reference to the xexpr-attributes template.

Exercise 296: Formulate a data definition that replaces the informal “or” signature for the definition of the list-of-attributes? function.

Exercise 297: Design attribute-value. The function consumes a list of attributes and a symbol. If the attributes list associates the symbol with a string, the function retrieves this string; otherwise it signals an error. Hint: Consider using assq, a function that comes with ISL+.

For the remainder of this chapter, we assume that xexpr-name, xexpr-attributes, and xexpr-content exist. Additionally, we also use the attribute-value function from exercise 297. To keep the prose simple, we use Xexpr to refer to Xexpr.v2.

25.2 Rendering XML Enumerations

XML is really a family of languages, similar to, yet also distinct from, the teaching languages in DrRacket. For example, XHTML is the language for sending web in XML format, while MathML specifies how to render mathematical expressions. In this section, we illustrate how to design a rendering function for a small snippet of XHTML, specifically the enumerations from the beginning of this section.

The ul tag surrounds a so-called unordered HTML list. Each item of this list is tagged with li, which tends to contain words but also other elements possibly including enumerations. With “unordered” the authors of HTML do not mean that the order does not matter; rather the intention is that each item is labeled with a bullet during rendering not a number.

Since Xexpr does not come with plain strings, it is not immediately obvious how to represent XHTML enumerations in a subset. One option is to refine the data representation one more time, so that an Xexpr could be a String. Racket, the language from which the teaching languages are derived, offers libraries based on this optoon. Another option is to introduce a representation for the text within enumerations:

; A XWord is '(WORD ((text String))).

This section follows this path.

Exercise 298: Make up three examples of XWords. Design the functions word?, which checks whether any ISL+ value is in XWord, and word-text, which extracts the value of the only attribute of a XWord.

Exercise 299: Refine the definition of Xexpr.v2 so that an you can represent XML elements that are plain strings. Use this refinement to represent the enumerations in this section.

Given the representation of words, representing an XHTML-style enumeration of words is straightforward:
; An XEnum.v1 is (cons 'UL [List-of XItem.v1])
; An XItem.v1 is (cons 'LI (cons XWord empty))
To make the elements’ names stands out, we use capitalization but this has no significance otherwise.

Exercise 300: Argue that every element of XEnum.v1 is also an XExpr.

Here is a sample element of XEnum.v1:
(define e0
  '(UL
    (LI (word ((text "one"))))
    (LI (word ((text "two"))))))
It corresponds to the inner enumeration of the example from the beginning of the chapter. Rendering it with "2htdp/image" should yield an image like this:

image

The radius of the bullet and the distance between the bullet and the text are matters of aesthetic; here the idea matters.

Remember we do not create these expressions in the definitions area; we experiment in the interactions area, just like you should.

To create this kind image, you might use this ISL+ expression:
(define e0-rendered
  (above/align 'left
               (beside/align 'center BULLET (text "one" 12 'black))
               (beside/align 'center BULLET (text "two" 12 'black))))
assuming BULLET is a rendering of a bullet. But let us design the function carefully.

Since the data representation requires two data definitions, the design recipe tells you that you must design two functions in parallel. A second look reveals, however, that in this particular case the second data definition is disconnected from the first one, meaning we can deal with it separately. So here is the definition:
; XItem.v1 -> Image
; render a single item as a "word" prefixed by a bullet
(define (render-item1 i)
  (beside/align 'top BULLET (text (word-text (second i)) 12 'black)))
The function consumes an “item” and renders it in the natural manner. It is defined as the composition of many functions: second, word-text from exercise 298, text from "2htdp/image", and so on.

Exercise 301: Before you read on, equip the definition of render-item1 with at least one test; make sure that the tests are formulated so that they don’t truly depend on the nature of BULLET. Then explain how the function works; keep in mind that the purpose statement explains only what it does.

Now we can focus on the design of a function that renders an enumeration. Using the example from above, you can formulate the result of the first two design steps easily:
; XEnum.v1 -> Image
; render a simple enumeration as an image
 
(check-expect (render e0) e0-rendered)
 
(define (render-enum1 xe)
  empty-image)
The key step is the development of a template. According to the data definition, an element of XEnum.v1 contains one interesting piece of data, namely, the rest of the given list. The first item is always 'UL, so there is no need to extract it. With this in mind, the first template draft looks quite simple:
(define (render-enum1 xe)
   (... (rest xe) ...))
The extracted piece of data is an element of [List-of XItem.v1], and this is the major clue.

The design recipe tells you that you should design a separate function whenever you encounter a complex form of data, such as this list of items. Abstraction tells you that you might be able to reuse an existing abstraction, say a list-processing function from figure 51. Let us try this approach here.

For this particular problem, render-enum1 is supposed to process a list and create a single image from it. The only two list-processing abstractions whose signatures fit the bill are foldr and foldl. If you also study their purpose statements, you see a pattern that is like the e0-rendered example above, especially for foldr. Hence we attempt to use this function:
(define (render-enum1 xe)
   (local (; XItem.v1 Image -> Image
           (define (deal-with-one-item fst-itm image-so-far)
              ...))
     (foldr deal-with-one-item empty-image (rest xe))))
From the type matching, you also know that:
  1. the first argument to foldr must be a two-argument function, which consumes one item at a time and the image built up so far;

  2. the second argument must be an image;

  3. the last argument is the list.

We consider empty-image the correct starting point because nothing is known about the image; naturally, (rest xe) is the list to be processed.

This design-by-abstraction-reuse leaves us with one step: the definition of function that “folds” over the list and the image that foldr has created so far. From its signature, you can see that it consumes one item and one image. You can also guess that render-item1 is the function needed to render the item. Doing so yields two images that you need to combine, namely, the image of the first item and the image of the rest of the items in the enumeration. Clearly, you want to stack those on top of each other, which is precisely what above accomplishes:
(define (render-enum1 xe)
  (local (; XItem.v1 Image -> Image
          (define (deal-with-one-item fst-itm image-so-far)
            (above/align 'left (render-item fst-itm) image-so-far)))
    (foldr add-image-for-item empty-image (rest xe))))
And the example suggests the use of above/align and 'left.

Flatt enumerations are common but they are also just a simple approximation of the full-fledged case. In the real world, web browsers must be able to display nested enumerations that are sent over the web, and at least in principle, the nesting can be arbitrarily deep.In case you are wondering whether arbitrary nesting is really the right way to think about this problem, we recommend that you develop an alternative data definition that allows only three levels of enumeration. Then use the design recipe to create functions that render such enumerations.

In XML and its web browser dialect XHTML, nesting is straightforward. Any element may show up as the content of any other element. To represent this relationship in our limited data representation of XHTML, we simply say that an item is either a word or another enumeration:
; An XEnum.v2 is (cons 'OL [List-of XItem.v2])
; An XItem.v2 is one of:
;  (cons 'LI (cons XWord empty))
;  (cons 'LI (cons XEnum.v2 empty))
Before you read on, contemplate it is necessary to refine both the definition for XEnum.v1 as well as the one for XItem.v1.

The next question is how this change to the data definition affects the rendering functions. Put differently, how do render-enum1 and render-item1 have to change so that they can cope with elements of XEnum.v2 and XItem.v2, respectively. Software engineers faces these kinds of questions all the time, and it is in this situation where the design recipe shines.

(define SIZE 12)
(define COLOR 'black)
(define BULLET
  (beside (circle 1 'solid 'black) (text " " SIZE COLOR)))
 
; Image -> Image
; mark item with bullet  
(define (bulletize item)
  (beside/align 'top BULLET item))
 
(define e0 ...)
(define e0-rendered ...)
 
; XEnum.v2 -> Image
; render an XEnum.v2 as an image
 
(check-expect (render-enum e0) e0-rendered)
 
(define (render-enum an-enum)
  (local ((define (process-one-image item image-so-far)
            (above/align 'left (render-item item) image-so-far)))
    (foldr process-one-image empty-image (rest an-enum))))
 
; XItem.v2 -> Image
; render one XItem.v2 as an image
 
(check-expect
  (render-item '(li (word ((text "one")))))
  (bulletize (text "one" SIZE COLOR)))
 
(check-expect (render-item `(li ,e0)) (bulletize e0-rendered))
 
(define (render-item an-item)
  (local ((define d (second an-item)))
    (cond
      [(word? d) (bulletize (text (word-text d) SIZE COLOR))]
      [else (bulletize (render-enum d))])))

Figure 64: Refining functions in response to refinements of data definitions

Figure 64 shows the complete answer. Since the real change is confined to the data definitions for XItem.v2, it should not come as a surprise that the real change to the renderer shows up in the function for rendering items. While the rendering function for XItem.v1 does not need to distinguish between different forms of input, the one for XItem.v2 is forced to use a cond because the corresponding data definition lists two different kinds of items. Given that this data definition is close to one from the real world, the distinguishing characteristic is not something simple—like empty vs consbut the second part of the item. If it is an Word with, the rendering function proceeds as before; otherwise it is an enumeration. In the latter case, render-item uses render-enum to deal with the data—because the data definition for XItem.v2 refers back to XEnum.v2 precisely for this point.

Exercise 302: Use the recipe to design the rendering functions for XEnum.v2 and XItem.v2 from scratch. You should come up with the same functions.

Exercise 303: The repetition of (beside/align 'top BULLET ...) should disturb you. Edit the function definition so that this juxtaposition operation shows up only once. Why are you confident that your change works?

The tests are one other noteworthy point in figure 64. When you work on intertwined data, it is a good idea to derive one test from the other. If they go wrong, the failure message tends to tell you in which function the mistake is.

Exercise 304: Change the data representation of words so that they are plain strings. Then re-define the functions word? and word-text from exercise 298. Do you need to redefine anything else to get the functions from figure 64 to work?

With this change, the data definition no longer specifies a subset of Xexpr from the preceding section.

Exercise 305: Design a program that counts all occurrences of "hello" in an enumeration.

Exercise 306: Design a program that replaces all occurrences of "hello" with "bye" in an enumeration.

25.3 Configuring State Machines

A Bit More About Worlds explains the basics of finite stat machines. Exercise 93

25.3.1 Configurations

using x-expression to design ’real-world’ programs for money

25.3.2 Interpreting Configurations

26 Simultaneous Processing

processing two (or more) forms of complex data simultaneously

26.1 Designing Functions for Two Complex Arguments

solve exercise 277 here

27 Summary

This fourth part of the book is about the design of