Programming with Components in MzScheme

Matthias Felleisen and Matthew Flatt

Draft: Mon Dec 7 17:13:05 CST 1998

Components | Units and Signatures | Linking Units Together | The Pragmatics of Using Units | Critique | References

Programs Consist of Components

Over the past few years, software engineering and programming languages have (re)discovered the idea that programs consist of components and, indeed, are manufactured from components. Of course, components are also manufactured from components; only a small number of components will actually consist of traditional code.

[Program Structure]

Researchers think of the majority of programmers as component assemblers who produce programs in a mostly codeless fashion by wiring together existing components.

Different people have different perceptions. To some, components are classes; to others, they are mixins. Microsoft advertises COM and DCOM (now ActiveX) as a framework for component programming; Corba is their major competition in the commercial world. For an overview of different approaches, see Szyperski's book.

MzScheme incorporates a new sub-language that supports programming with components: the unit language. The unit language borrows elements from a variety of module languages and synthesizes them into a new set of constructs.


Units and Signatures

Unit is the MzScheme name for a component. An atomic unit collects a number of Scheme definitions. Some of these definitions are visible from the outside; others are permanently hidden. If units were simply collections of definitions, some visible, some hidden, they would be nothing but a mechanism for defining libraries. While libraries are the most common form of re-usable code, they are not components. A component must also import (references to) values from other boxes so that one and the same component can build dictionaries from different container classes. Hence, to maximize a unit's re-use potential, a programmer can also specify definitions that are assumed to exist in the context of a unit. These are a unit's imports. A compound unit is then a unit in which several units are wired together, that is, a component programmer has connected one unit's imports with another unit's exports and so on.

In this section, we describe atomic units and signatures, the mechanism for describing imports and exports. In the next section, we turn our attention to compound programming and how programmers become component programmers.

Units

A unit consists of three sections: an import section, a body, and an export section. The import and export section describe the unit's context dependence and which services it can provide to a context. The body defines the services.

Here is the basic picture:

[Atomic Unit]

The textual syntax of an atomic unit does not quite reflect this graphical syntax. Instead, it first specifies its exports, then its imports, and finally the body. The body consists of Scheme definitions and expressions:

(unit/sig
  exports
  (import imports ...)
  def_or_exp)
The tokens unit/sig and import are keywords. By convention, we capitalize the name of a unit.

Signatures

The exports and imports of a unit are specified via signatures. A signature is a basically list of variable name. For the sake economy, signatures are specified separately from units in MzScheme:

(define-signature sig-name (variable-name ...))
By convention, we capitalize the entire name of a signature, though other conventions are feasible and used by PLT.

A Simple Example of Atomic Units

Let us take a look at a simple example: tables that associate names with values. Here is the complete definition:

(define-signature TABLE (empty extend lookup))

(define Table
  (unit/sig TABLE
    (import)
    (define empty '())
    (define (extend env x val)
      (cons (list x val) env))
    (define (lookup env x)
      (let ((r (assq env x)))
	(if r (cadr r) (error 'lookup "not found: ~s" x))))))
The first definition introduces a signature with three names: empty, extend, lookup. The second one specifies a unit that defines the same three variables and exports them. It imports nothing. The unit Table implements tables via the usual association-list data structure.

Invoking Units

A unit that does not import anything is a program. We can excute the code in a program by invoking the unit:

(invoke-unit/sig a-unit)
That will execute the unit's expressions and definitions in the specified order; the result value is the value of the last expression (if any). Thus, if we were to invoke Table, we would get no result and nothing else would change:
	  Welcome to DrScheme, version 53.
	  Language: R4RS+.
	  > (invoke-unit/sig Table)
	  > lookup
	  reference to undefined identifier: lookup

For debugging purposes, MzScheme also supports a form of program invocation that, roughly, adds the exported name to the top-level environment:

	  Welcome to DrScheme, version 53.
	  Language: R4RS+.
	  > (invoke-open-unit/sig Table)
	  > lookup
	  #
	  > 
This form is to be used with care. It is preferable to use units via linking instead.

Linking Units Together

Once basic units are available, programmers can implement large systems by wiring together units. Roughly speaking, a programmer creates a graph whose nodes are units and whose edges are connections between one unit's exports and another unit's imports:

[Compound Unit]

The result, as the picture indicates, is another unit. In MzScheme, such units are called compound units.

The syntax of a compound unit is similar to that of a letrec, though it also specifies an import and an export section (in the order suggested by unit pictures):

(compound-unit/sig
  (import (tag import) ...)
  (link
    (tag_1 : sig (unit_1 links ...))
    ...
    (tag_N : sig (unit_N links ...)))
  (export (open tag) ...))

A compound unit's imports is a (possibly empty) sequence of tagged signatures. The link-specification sets up N nodes in a graph, each with a name called a "tag." Each node is created from a unit and its link specification. A link specifies where some import comes from; in its simplest form it is just a tag, which is either some other node or an import. Finally, the exports of a compound unit are derived from the exports of its nodes; for now, we consider only the case where all exports from some tag are re-exported from the compound unit. The details of a compound-unit are best explained with examples.

A Simple Example of Compounding

Let us return to our running example. Suppose we wish to use the tables to interpret a statically typed language. Both the type checker and the interpreter will need to use tables, though for slightly different purposes. Hence, both the interpreter and the type checker unit will specify TABLE as an import:

(define-signature TYPES (type-checker))

(define Types
  (unit/sig TYPES
    (import TABLE)
    (define (type-checker exp)
      (i exp empty))
    (define (i exp env) ... )))

(define-signature INTERPRETER (interpreter))

(define Interpreter
  (unit/sig INTERPRETER
    (import TABLE TYPES)
    (define (interpreter exp)
      (if (type-checker exp)
	  (i exp empty)
	  (error 'interpreter "doesn't type check")))
    (define (i exp env) ...)))
But the Interpreter also depends on the type checker, so it also specifies TYPES as an input.

Linking the three units is straightforward:

(define Main 
  (compound-unit/sig 
    (import)
    (link (T : TABLE       (Table))
	  (C : TYPES       (Types T))
	  (I : INTERPRETER (Interpreter T C)))
    (export (open I))))
This code creates three nodes: T, C, and I. The first labels the component for tables, the second one that for the type checker, and the third the one for the interpreter proper. Since Table imports nothing, it is not followed by any links. But, Types and Interpreter import one and two sets of names, respectively. Hence, they are linked with one and two nodes, respectively. More specifically, Types is linked to T and Interpreter is linked to T and C.

The compound unit imports nothing and exports all of I's exports. Hence, we can get the following to work:

      > (invoke-open-unit/sig Main)
      > (interpreter '(let x 1 (let y 2 (+ x y))))
      3

A Complex Example of Compounding

If Tables, Types, and Interpreter are implemented by different programmers, they may wish to provide "maximally complete" units. More specifically, the Types programmer may wish to deliver a unit that imports nothing and exports the type checker; the Interpreter programmer may wish to produce a component that imports the type checker, but nothing else, and exports the interpreter. Here are these two units:

(define Types2 
  (compound-unit/sig 
    (import)
    (link (T : TABLE (Table))
	  (C : TYPES (Types T)))     
    (export (open C))))

(define Interpreter2
  (compound-unit/sig 
    (import (C : TYPES))
    (link (T : TABLE (Table))
	  (I : INTERPRETER (Interpreter T C)))     
    (export (open I))))
The first compound unit is similar to the ones we have seen so far; the second one, however, imports TYPES and uses it as a link. An import is used as a link in the exact same manner as a node: by reference to its tag. Otherwise, the definitions do not contain anything new.

To link Types2 and Interpreter2, we need to define yet another compound unit:

(define Main2
  (compound-unit/sig
    (import)
    (link (T : TYPES (Types2))
	  (I : INTERPRETER (Interpreter2 T)))
    (export (open I))))
This unit links Types2 and Interpreter2. While the former requires no imports, the latter is linked to T, the node created from Types2.

Testing Main2 is identical to testing Main.

The Pragmatics of Using Units

In principle, we can program with components based on the given description. In practice, however, we need a richer language than the one sketched so far, and we should understand a few conventions and tricks that make units truly useful.

Comments in Signatures

Signatures should not be presented without comments. Each name in a signature should be described via types, invariants, and whatever comments are necessary. Here is a rewrite of our running example, the TABLE signature:

(define-signature TABLE 
  (empty  ; the empty environment 
   extend ; env domain-element range-element -> env
   lookup ; env domain-element -> range-element
   ;; additional invariants
   ; (lookup empty any) = error
   ; (lookup (extend env x y) x) = y if (eq? x y)
   ; (lookup (extend env x y) z) = (lookup env) if (not (eq? x z))
   ))
The comments clearly specify that The additional equations completely specify the functions' behavior.

Translucent Data Encapsulation

In large projects, we may wish to hide the representation of a data structure from our clients. Hiding the nature of data structure is best done by introducing a local structure:

(define Table-safe
  (unit/sig TABLE
    (import)
    (define-struct table (t))
    (define empty (make-table '()))
    (define (extend env x val)
      (if (table? env) (extend* (table-t env) x val) 
	  (error 'extend "table expected; given ~s" env)))
    (define (extend* env x val)
      (make-table (cons (list x val) env)))
    (define (lookup env x)
      (if (table? env) (lookup* (table-t env) x)
	  (error 'lookup "table expected; given ~s" env)))
    (define (lookup* env x)
      (let ((r (assq x env)))
	(if r (cadr r) (error 'lookup "not found: ~s" x))))))

In some cases, we may also wish to grant access to (parts of) a structure, but not the ability to create (via allocation or assignment) a structure. To this end, MzScheme offers another way to specify elements of signatures:

(struct identifier (identifier ...) omission ...)
where
omission is one of:
     -selectors
     -setters
     (- variable)

To grant access to tables, without allowing clients to fake a table, we could define a signature TABLE-SAFE that enriches TABLE as follows:

(define-signature TABLE-SAFE
  ((open TABLE) (struct table (t) -setters (- make-table))))
Obviously, Table-safe satisfies this export signature:
(define Table
  (unit/sig TABLE-SAFE
    (import)
    ...))
Better still, it also satisfies TABLE, which is just a refinement of TABLE-SAFE and can thus still be used wherever Table is used.

Renaming on Import tags and/or plain renaming

Exports from Compound Units

Embedded Units more stuff here, tutorial style

Critique

Ideally, a programmer should be able to wire together components that were manufactured independently by two non-cooperating programmers. Unfortunately, the two components can only be linked directly if the two programmers happened to chose the same names--an unlikely event. Otherwise, the compound programmer needs to add a significant amount of name management code.

Consider the following (contrived) units. The first defines a unit that creates a binary search tree from two imports: an ORDERED structure and an INFORMATION structure:

(define-signature ORDERED
  (; type ord
   less ; ord ord -> bool
   ; an strict ordering 
   equal ; ord ord -> bool
   ; an equality predicate
   ))

(define-signature INFORMATION
  (; type ord
   ; type stuff 
   (struct info (key stuff))
   ; (structure:info key stuff)
   ))

(define Bst
  (unit/sig BST
    (import ORDERED INFORMATION)
    ...))
The second one defines an integer unit:
(define-signature INT
  (; type int
   zero ; int
   succ ; int -> int
   ; ...
   < ; ord ord -> bool
   ; an strict ordering 
   = ; ord ord -> bool
   ; an equality predicate
   ))

(define Int
  (unit/sig INT
    (import)
    ...))
Clearly, Int should be a unit that Bst can import, but due to the name clash the two cannot be linked directly.

One way to overcome the problem is define an adapter [cmp GoF book] unit:

(define IntAdapter
  (compound-unit/sig
    (import)
    (link (O : INT (Int))
	  (A : ORDERED ((unit/sig ORDERED
			  (import INT)
			  (define equal =)	
			  (define less  <))
			O)))
    (export (open A))))
It defines all those things demanded by ORDERED with trivial redefinitions in a "one-shot" unit. But this is a truly ugly solution and is only feasible if the adaptee does not introduce state or stateful definitions (e.g. a define-struct).

References

Batory, Product Line Architectures (PPT)

Felleisen, Components and Types

Szyperski, Component Software, Addison-Wesley, 1998