9  Blocks

So far we have focused on the various kinds of data that Scheme provides. This includes functions, because they are just another class of data. With functions we also covered loops; after all, they are just functions. If we wish to write large programs, however, we need additional means to organize programs.

Blocks are the most basic means for organizing programs, and Scheme provides a large variety of blocks. The primary purpose of a block is to allow functions to establish local variable definitions. Since functions are first-class values in Scheme, blocks also provide a mechanism to hide functions and variables from the rest of the program. Put differently, first-class functions and blocks work together to establish primitive but effective security zones.

9.1  Local Definitions

Suppose your part of a program consists of a bunch of function that collaborate but the rest of the program needs just one of them. You want to develop these functions as usual in Scheme but when you're all done, you want to hide all the auxiliary functions and variables.

;; (Listof X) (X X  -->  Boolean)  -->  (Listof X)
;; to sort alox according to <x via successive merges
;; (define (merge-sort alox <x) ...)

(define (merge-sort alox <x)
  (merge-until-singleton 
    <x (map list alox)))

(define (merge-until-singleton <x l)
  (cond
    [(null? (cdr l)) (car l)]
    [else (merge-until-singleton 
           <x (merge-neighbors <x l))]))

(define (merge-neighbors <x l)
  (cond
    [(null? l) null]
    [(null? (cdr l)) l]
    [else
     (cons (merge <x (car l) (second l))
           (merge-neighbors
	     <x (cdr (cdr l))))]))

(define (merge <x l1 l2)
  (cond
    [(null? l1) l2]
    [(null? l2) l1]
    [(<x (car l1) (car l2))
     (cons (car l1)
       (merge <x (cdr l1) l2))]
    [(<x (car l2) (car l1))
     (cons (car l2)
       (merge <x l1 (cdr l2)))]
    [else
      (cons (car l1)
	(cons (car l2)
	  (merge <x (cdr l1) (cdr l2))))]))

   
(define (merge-sort alox <x)
  (local ((define (merge-sort alox <x)
	    (merge-until-singleton 
	      <x (map list alox)))
	  
	  (define (merge-until-singleton <x l)
            (cond
              [(null? (cdr l)) (car l)]
              [else (merge-until-singleton 
                     <x (merge-neighbors <x l))]))
          
          (define (merge-neighbors <x l)
            (cond
              [(null? l) null]
              [(null? (cdr l)) l]
              [else
               (cons (merge <x (car l) (second l))
                     (merge-neighbors
		       <x (cdr (cdr l))))]))
	  
          (define (merge <x l1 l2)
            (cond
              [(null? l1) l2]
              [(null? l2) l1]
              [(<x (car l1) (car l2))
               (cons (car l1)
		 (merge <x (cdr l1) l2))]
              [(<x (car l2) (car l1))
               (cons (car l2)
		 (merge <x l1 (cdr l2)))]
              [else
		(cons (car l1)
		  (cons (car l2)
		    (merge <x (cdr l1) (cdr l2))))])))
    (merge-sort alox <x)))
Figure 18:  Local definitions for merge-sort

Take a look at the left half of figure 18. It contains four function definitions that together make up a merge sort routine. The main function is at the top; the three auxiliary functions merge neighboring pairs of lists into one list -- preserving the order of the items -- until there is only one, sorted list left. The only interesting function is merge-sort, which consumes a list and a predicate for comparing two items from the list. Clearly, a good program organization should hide all functions but merge-sort.

The easiest way to achieve this form of hiding in Scheme is with local. A local-expression has this shape:

(local ((define <variable> <rhs-expression>)
	...)
  <body-expression>)

It consists of a sequence of local definitions. Each of these definitions looks just like a global definition, using define or define-struct as the keyword. The last line is the body expressions.

We tend to think of the local definitions as equations, and call the <variable>s left-hand sides. The <rhs-expression>s are the right hand sides of the equations. The <variable>s are visible within the local-expression -- both in the equations and the <body-expression> but not outside.

A local-expression evaluates all the equations and associates the <variable>s with the values during the evaluation of the <body-expression>. The evaluation of an equation may refer to the value of an earlier <variable>. For example,

(local ([define r (shape-radius s)]
	[define area (* pi r r)])
  ...)

It is acceptable for the definition of area to refer to r, because r is defined before area.

Let us return to the merge-sort example. Using local, it is straightforward to create a function definition that hides all auxiliary definitions. We define a new function whose function body is a local-expression and place all definitions with cut-and-paste into the definition part of the local. The body of the local is a call to the main function. In our example, the local then contains the definitions of merge-sort, merge-until-singleton, merge-neighbors, and merge; the body is a call to merge-sort. Since the variable names defined inside of local don't interfere with variable names outside of local, it is perfectly acceptable to use the same function name for the globally visible function as one of the locally defined functions.

9.2  Scope, Extent, and Garbage Collection

The combination of blocks and first-class functions introduces a significant and useful distinction between the scope of a variable and the extent of values. While this was already visible with the introduction of functions as value -- because nested lambda-expression introduce nested variable scopes -- it is even more obvious and more important with local.

Consider the equations in the following local-expression:

;; (Number  -->  Number)  -->  (Number  -->  Number)
(define (derivative f)
  (local ((define (fprime x)
	    (/ (- (f (+ x epsilon)) (f (- x epsilon)))
	       2 epsilon))
	  (define epsilon 0.01))
    fprime))

The first one introduces a function definition whose function body refers to the second one, a variable definition. The body of the local-expression is just fprime, the first locally defined variable, which means that (derivative f) produces a function. When this function is invoked, it must necessarily use the definitions of fprime and epsilon -- even though these definitions aren't visible outside of the local-expression.

The distinction is one of syntax and semantics, notation and meaning. Like most languages, Scheme is lexically scoped. That is, the program text determines where a variable is visible. In contrast, values live forever and can flow everywhere. With it, all values that are connected to a value are usable. Thus, if some structure flows into a function, it can extract its components, the components of these components, and so on. Similarly, if a function receives some other function, it can invoke this function and all values that are connected to this function.

If you ever have any questions about the scoping rules of Scheme, use DrScheme's syntax checker. Like the syntax checker of conventional programming environments, it high-lights the various pieces of syntax in distinct colors. In addition, it also graphically illustrates the scope of a program with arrows. The arrows show up when a programmer mouses over a variable definition or a variable occurrence. They point from the definition of the variable to its use. Using such arrows, it is easy to see which variables belong together and which ones don't.

Take a look at figure 19 for a concrete example. There are two definitions for merge-sort: the global one and the local one. The arrow that originates from the global one points to the test case. The arrow that originates from the local definition of merge-sort points to the application of the function in the body of local. In addition, the figure also shows where the function merge-until-singleton is used and where its parameter l is used inside the function body.

Figure 19:  Lexical scope

When Flow comes on-line do a screen shot for merge-local, to illustrate the distinction.

Explaining the extent of values is of course impossible, because it is a concern of meaning not notation. The best we can therefore do is approximate the flow of values and draw approximate flow graphs for values on top of programs. As mentioned, the result of calling derivative is a function. Originally fprime is the name of the function but this name is only valid within the local-expression. Indeed, we could (and often must) give the result a new name:

;; Number  -->  Number
(define sinp (derivative sin))

;; Number  -->  Number
(define cosp (derivative cos))

;; Number  -->  Number
(define polyp 
  (derivative (lambda (x) (+ (* 3 x x) (* -2 x) 12))))

Here we used derivative three times, producing three different functions. Each of these new functions receives a name: sinp, cosp, and polyp, respectively. When any of these functions is invoked, it refers back to the definition of fprime and with it to the original argument of derivative, which are sin, cos, and a polynomial, respectively. In other words, sinp keeps fprime and with it sin alive.

At first glance, keeping all values alive forever sounds like a huge demand for a programming system. After all, all these values need a lot of storage space. Fortunately, the old Lispers recognized a long time ago that most computations lose track of their values eventually. At that point, the value doesn't make a difference to the outcome of the computation and the programming system can safely reclaim the storage that this value claims.

The process of reclaiming storage space for values is called garbage collection. Like all semantic processes that work with syntax, it is an approximate process, but by keeping it conservative, we know that it never collects a value that the computation might need again. To do this safely and yet efficiently, researchers have developed a multitude of automated proof techniques. Since software is much better at keeping track of such details as connections among values, these techniques eventually reached a sophistication where they are far better than programmers at keeping track of values and their storage space. In this day and age, it is almost always foolish for a programmer to want to do this manually and we're better off leaving this task to a busy beaver.

9.3  Variations on a theme

For some uses, local-expressions are notationally too heavy. Scheme therefore offers three additional forms of blocks:

  1. the let-expression with a number of parallel, local definitions:

    (let ([<variable> <rhs-expression>]
          ...)
      <body-expression>)
    

    All definitions are visible in <body-expression>, but they are invisible to each other.

  2. the let*-expression with a number of sequential, local definitions:

    (let* ([<variable-1> <rhs-expression-1>]
           [<variable-2> <rhs-expression-2>]
           ...)
      <body-expression>)
    

    This kind of block introduces one definition at a time. Thus, <rhs-expression2> can use the value of <variable-1>. All of the definitions are visible in <body-expression>.

  3. the letrec-expression with a number of mutually recursive, local definitions:

    (letrec ([<variable-1> <rhs-expression-1>]
    	 [<variable-2> <rhs-expression-2>]
    	 ...)
      <body-expression>)
    

    The definitions are visible to each other and in <body-expression>. This form of block is usually used to introduce mutually referential functions (as lambda-expression). It is an older form of local, which is preferable due to its improved syntax and error-reporting.

Let's use our new acquaintances to revisit an old friend from the early sections of this chapter: fill-or-truncate from page 5. Its purpose is to turn a string into an n-character wide string, either by truncating the line or by adding spaces. The length of the string naturally plays a central role in this computation and deserves a name:

;; Number String  -->  String
;; to create a string of n characters, 
;; by truncating the tail or adding spaces as needed 
(define (fill-or-truncate n s)
  (let ([char# (string-length s)])
    (cond
      [(>= char# n) (substring s 0 n)]
      [else (string-append s (make-string (- n char#) #\space))])))

Here is an even more interest interesting use again: the list stays the same for all apps, presumably the resulting function is applied many times, so we remember the length forever:

;; (Listof X)  -->  ( -->  X)
;; generate a function that randomly chooses an item from a list

(define (pick-from-list l)
  (lambda ()
    (list-ref l (random (length l)))))

                
(define (pick-from-list l)
  (let ([n (length l)])
    (lambda ()
      (list-ref l (random n)))))

;; Example: 
;;  -->  (union Color 'star)
(define roll-color-die
  (pick-from-list (list red green blue yellow white 'star)))
The left version of pick-from-list is something that we could have written at the end of section 5.2 without any hesitation. Since the result of pick-from-list is presumably called quite often, we want to avoid the re-computation of (length l) on every call and therefore use a let to name the length outside of the lambda, which is what the right version of the function does. This use of let is interesting because it establishes a name for a small lambda-created function, but in that its value lives for a long time (as long as the function lives).

The definitions of pick-from-list and roll-color-die are an excerpt from a game-playing program. The following fragment is from a different module in the same program:

;; Color  -->  Color
;; roll a color die with five colors and one star, for star return given color c
;; effect: record roll in protocol 
(define (roll-star-die c) 
  (let* ([chosen (roll-color-die)]
         [new (if (eq? 'star chosen) c chosen)])
    (record-for-protocol "roll die" new)
    new))

It clearly demonstrates the pragmatics of let*. The value for new depends on the chosen symbol and is used twice: once for recording the move in the protocol of the game and once as the return value of the function.

[crash-Z-G-1.gif]                                
;; Number Number Number  -->  Number
;; to compute the volume of the sand-clock
(define (volume height-cone height-top diameter)
  (let* ([r (/ diameter 2)]
	 [area (* pi r r)]
	 [height (+ height-cone height-top)]
	 [full-cone (* 1/3 area height)]
	 [r-top (* (/ height-top height-cone) r)]
	 [area-top (* pi r-top r-top)]
	 [top-cone (* 1/3 height-top area-top)]
	 [bottom-cone (- full-cone top-cone)])
    (* 2 bottom-cone)))
Figure 20:  Fortran via let*

When we have MrFlow, we can also show the PDG.

With let*, we can also write Scheme programs that look like plain old Fortran without a single assignment statement. Take a look at figure 20. The program on the right computes the volume of a sandclock

look up word
on the left. It consists of a single let* block that sequentially computes the various values, one after another. A trained functional programmer would probably have written something like that:

;; Number Number Number  -->  Number
;; to compute the volume of the sand-clock
(define (volume height-cone height-top diameter)
  (* 2 
     (- (cone (+ height-cone height-top) diameter)
	(cone height-top (* (/ height-top height-cone) diameter)))))

;; Number Number  -->  Number
(define (cone height diameter) 
  (* 1/3 height (area-of-disk (/ diameter 2))))

;; Number  -->  Number
(define (area-of-disk r)
  (* pi r r))

While neither program uses assignments to compute the desired quantity, the second one actually composes several functions to compute the value (and thus reflects the problem's quantities).

9.4  Named Blocks -- Making loops on the fly

The last kind of block expression that Scheme provides is similar to a cross between a kitchen sink, a power drill, and a golf cart. Like all such highly generalized tools, it is useful but difficult to understand. Name let, as it is officially called, simultaneously defines a recursive functions, adds a let-style block, and then applies the function. Fortunately, it all sounds more complex than it really is.

The syntax of named let is just what it says it is, i.e., a let expression with a name:

(let <loop-variable> ([<variable> <rhs-expression>]
		      ...)
  <body-expression>)

We call the extra name <loop-variable>, because we tend to use the construct to write simple loops. The name let-expression creates a function whose parameters are <variable> ..., whose body is <body-expression>, and immediately applies the function to <rhs-expression> .... Put differently, the above is equivalent to this expression:

(local ((define (<loop-variable> <variable> ...)
	  <body-expression>))
  (<loop-variable> <rhs-expression> ...))

The extra variable is the name of a function that consumes as many arguments as the block introduces.

Since all this sounds complicated, let's look at name let in action: 1=

(let loop ()
  (let ([candidate (random N)])
    (if (= n1 candidate) (loop) candidate)))
1
(let* ([n1 (random N)]
       [n2 ])
  ...)

The purpose of this program fragment is to pick two distinct random numbers between 0 and N (exclusive). Picking the first one is easy; we just evaluate (random N). Picking the second one requires a loop. The named let-expression creates this loop. Each iteration first picks a candidate number. If this number is different from n1, we have found our second random number. If this number is the same, we need to continue our search. We accomplish this by applying the function (to no arguments) again.

Picking two distinct random numbers is easy. This takes four lines in no matter what language we use. But now suppose we need more than just two distinct randomly picked numbers.6 Naturally we write a function for this purpose:

;; N N  -->  N
;; to pick i distinct integers in [0,N)
(define (random-i i N) ...)

The problem is that the function needs to be recursive to pick many different things, but the recursion has little to do with the given values. The named let construct was invented for just such cases. Here, its arguments need to be the random numbers we have picked so far -- so that we don't pick the same number twice. Otherwise, this let-loop looks just like the one we just wrote:

... (let search ([l '()])
      (if (= (length l) i)
	  l
	  (let ([candidate (random N)])
	    (if (member candidate l)
		(search l)
		(search (cons candidate l)))))) ...

The function first checks whether it has enough numbers. If not, it picks a random candidate and, depending on whether it is a member of the list of choices or not, loops with or without the new candidate added to the list.

To make things more efficient, the search loop should also keep track of the number of random numbers that have been picked:

;; N N  -->  N
;; to pick i distinct integers in [0,N)
(define (random-i i N)
  (let search ([l '()][l# 0])
    (if (= l# i)
	l
	(let ([candidate (random N)])
	  (if (member candidate l)
	      (search l l#) 
	      (search (cons candidate l) (+ 1 l#)))))))

Even a cursory look at random-i shows that search's kind of recursion differs from all the simple recursions we have seen in this section so far. Sometimes the recursive call consumes a new argument, sometimes it just takes the old argument. Indeed, if for some reason search always picks the second branch of the cond-expression, the loop never stops. (When can this happen?) The next section explains the tricks that went into the construction of search and illustrates the use of let-blocks with a few more interesting examples.


6 Well, in reality, we will need a number of distinct items such as game tokens, trading ratios, and so on, but for this illustration, we use numbers.