Components and Types

Matthias Felleisen

Components | A Blitz Course | Functors | Critique | Warning | References

Programs Consist of Components

Small programs are collections of type, function, and value definitions. In the past, we have produced large programs by organizing groups of such definitions into files and "gluing" them together in some form. This method of producing large programs, however, is inefficient and does not encourage the re-use of functions beyond basic libraries. One solution to overcome this problem is to use an object-oriented language. In an object-oriented languages, programmers group together some of these definitions in classes. Each class represents a collection of data; the "functions" in a class represent those actions that are naturally associated with this data. To re-use definitions, a programmer can extend a class, override some of its existing functionality, and add new types, functions, and data components as desired. For a more complete description, see Batory's work, especially his referenced talk on Product Line Architectures and its type equations. ML is a typed functional programming languages that offers an alternative to these dominant approaches. It provides a linguistic tool for grouping definitions: modules. In contrast to a class, a module is simply a group of definitions with no other associations. One can view modules as a middle ground between highly structured classes (data) and unstructured groups in files. The design of ML recognizes that a typical program consists of layers of interconnected components:

[Program Structure]

Each component implements some services. Many also require the services of other components. The design of ML also reflects the belief that the components and connections form a dag-like hierarchy (i.e. a graph without cycles).


Manufacturing Components

The design of ML also recognizes that program components should be manufactured for re-use. In particular, when a programmer puts together a new component, the component should specify what kind of things it expects from the layer underneath, not the specific components. Dually, the component will only specify what kind of things it provides to the layer above:

[Functors]

The ellipses in this sketch are interfaces; the box is a component.

Given a collection of such components, a programmer can create systems by linking together appropriate boxes. If the boxes are programmed to their interfaces, they are easily interchangeable and re-usable.

An ML Blitz Course

For readers who need a quick refresher, here is a quick and dirty summary by example of the first eight chapters.

Functors, Signatures, and Structures

Our description suggests that the templates for components are functions that map components to components. More generally, the "programming language of components" is a functional language whose values are structures and whose types are signatures. In ML, these "functions" are called functors; their interfaces are referred to as signatures. The result of a functor application is a structure, which, of course, also play the role of inputs to other functors. The next few sections explain the precise role of functors, signatures, and structures.

Functors and Signatures

Here is the syntax of a functor declaration:

functor F(structure s1: SIG1; ...; structure s2: SIG2) 
  :> SIG 
  = 
  struct 
    ...
  end
The header specifies three parts: The functor body is roughly a sequence of type, function, and value definitions. These definitions may refer to the imported structures (see below). They must define all those things promised in the export signature SIG. A signature is defined as follows:
signature SIG 
  = 
  sig
    ...
  end
It introduces a name that stands for a bundle of type and value specifications. When the ML type checker checks the type correctness of a modular program, it uses the information in signatures to verify that everything works out.

A Simple Example

Consider the following concrete example of a signature and a functor definition:

signature VARIABLES 
  = 
  sig 
    type var
    val equal : var * var -> bool
  end

functor Variables() 
  :> VARIABLES
  = 
  struct 
    type var = string 
    fun equal(v1,v2) = (v1 = v2)
  end
The header of the functor definition states that Variables imports nothing and creates a structure with signature VARIABLES. According to this signature, the functor exports the type var and a function named equal that consumes a pair of vars and produces a bool. It is easy to see that the functor body indeed introduces these definitions. Indeed, from the functor definition we can also deduce that var stands for string, but of course the signature does not reveal this fact. Now let us take a look at the following definitions:
signature SD 
  = 
  sig
    type term
  end 

functor StaticDistance(structure v : VARIABLES) 
  :> SD
  = 
  struct 
    ...
  end
It sketches a functor whose single import is a structure with signature VARIABLES. Inside of the functor's definition, we can only assume that v specifies some type var, but nothing else. In other words, we have completely hidden how variables are implemented.

Using Values from Structures

Inside of a functor we can refer to the pieces of an imported structure in two ways:

  1. With
    s.name
    we refer to the entity called name in import s. The notation only makes sense if the signature for s specifies name.
  2. If the functor's body contains the statement
    open s
    , all definitions specified in the signature of s become lexically bound. Adding this statement is dangerous if we don't know the entire signature because it may shadow some other binding. Use it with care.

Linking and Computing

After all the signatures and functors are defined, we can link the components together. We start with the bottom layer, that is, with those components that do not import other components:

(* Layer 0 *)
(* ------- *)
structure s1 = F1()
structure s2 = F2()
These definitions create two structures s1 and s2 from the functors F1 and F2. Neither F1 nor F2 import any other components.

Now that we have the 0th or bottom layer, we can create those components that link to the bottom layer:

(* Layer 1 *)
(* ------- *)
structure s3 = F3(structure s = s1; structure t = s2)
structure s4 = F4(structure a = s3; structure t = s2)
...
This example assumes that both functors consume two components. Both components are "passed by label" to the functors, i.e., the formal parameter is explicitly connected to the argument. The first functor imports s1 and s2; the second one is linked to s3 (the result of applying F3 a first time) and s1.

Since the language of components is a functional language, we can also use nested function application to link the components together:

(* Layer 0 *)
(* ------- *)
structure s2 = F2() (* retains name because used more than once *)

(* Layer 1 *)
(* ------- *)
structure s4 = F4(structure a = F3(structure s = F1(); structure t = s2); 
                  structure t = s2)
...
The structure resulting from F2() remains named because it is used twice.

[Note: The application of a functor can have a side-effect for both type checking and evaluation. It is therefore not just an option, but a necessity to consider whether F2 should be applied once or twice. The precise nature of the problem is beyond the scope of these notes.]

A Simple Example: Tables

Let us design a functor that creates environments for interpreters, i.e., tables that associate some form of variable with some form of value. An interpreter for a statically typed programming language requires at least two kinds of environments: one for mapping variables to run-time values and another one for mapping type variables to types. So being able to create environments several times, for different kinds of variables, is a natural desire.

When we design functors, we start by considering its export signature. An environment table supports two kinds of actions:

  1. the extension of an existing environment with a variable and a value, and
  2. the lookup of a variable in an existing environment.
Since an extension already assumes an environment, the environment component should also provide a basic environment: the empty environment.

This sketch suggests the following outline of a signature:

signature TABLE = 
    sig
	type table 

	val empty : table
	val extend : table * a-variable * a-value -> table
	val lookup : table * a-variable -> a-value
    end
The signature is incomplete because it doesn't specify what kind of types a-variable and a-value are. For that part, we need to think ahead and see whether these types are constrained in any way or whether they can be arbitrary.

The answer is straightforward in both cases. On one hand, to lookup a variable in an environment table, we need to compare two variables, that is, we need a comparison operation. In other words, to describe the "domain" of a table we need a type and an operation. This means the functor for creating tables imports a structure that specifies these two things. On the other hand, the kind of values we can store in a table is arbitrary. Thus we can use a type variable to stand for a-value. Here is the final signature for tables:

signature TABLE = 
    sig
	type domain
	type 'a table 

	val empty : 'a table
	val extend : 'a table * domain * 'a -> 'a table

	exception NotFound of domain
	val lookup : 'a table * domain -> 'a
    end
It says that a table is a parametric type. The parameter 'a of the table type stands in for the kinds of values that a table can store. Furthermore, empty, extend, and lookup are polymorphic operations on this type. Finally, the signature includes an exception NotFound because looking up a variable does not succeed if the table is empty or, more generally, if the table does not contain an association for the variable.

Next we write down what kind of structures we expect as imports to the functor. As mentioned already, the table functor must import a structure that describes the domain elements. We also agreed that this structure must specify a type and an equality operation on the type:

signature DOMAIN =
    sig
	type var
	val equal : var * var -> bool
    end

Now we are ready to specify the header of the functor for environment tables:

functor Table(structure domain : DOMAIN) :> TABLE 
That is, the functor inputs a DOMAIN structure and creates a TABLE structure, or, it imports items according to the DOMAIN specification and exports items as specified by TABLE.

Coding the definitions in the functor is a straightforward exercise:

functor Table(structure d : DOMAIN) 
    :> TABLE 
    =
    struct 
	type domain = d.var
	type 'a table = domain -> 'a

	exception NotFound of domain
	val empty = 
	    (fn n => raise NotFound(n))

	fun extend(t,n,m) = 
	    (fn i => 
	     if d.equal(n,i) then m
	     else t(i))

	fun lookup(t,n) = t(n)
    end

Here are two functors for creating DOMAIN structures:

functor Var() :> DOMAIN 
    = 
    struct 
	type var = int
	val equal = (op =)
    end

functor TypeVar() :> DOMAIN
    = 
    struct 
	type var = string
	val equal = (op =)
    end
The first one implements variables as integers ("static distance coordinates"), the second as strings. In both case, the equality operation is just equality.

Linking functors together means to write down the intended functor applications. In this example, the Var and TypeVar functors make up layer 0:

structure var = Var()

structure type_var = TypeVar()
Layer 1 is created with two applications of the Table functor:
structure run_time = Table(structure domain = var);

structure tck_time = Table(structure domain = type_var);

After linking, we are ready to evaluate some simple examples, e.g., extending an environment with a variable and a value and looking up the same variable:

run_time1.extend(run_time1.empty,1,true) 
or
open run_time1;

extend(empty,1,true) 
Both expressions should create a bool table. Unfortunately, the type checking fails (in both both cases). To understand this problem, we need to discuss the rules of typing components and propagating types through functor applications.

Refining Signatures, Propagating Types

The expression

run_time1.extend(run_time1.empty,1,true) 
is type-checked according to the type information specified in the signatures. The signature of run_time1.extend is
TABLE and specifies the type of extend as
'a table * domain * 'a -> 'a table
The type checker can obviously set 'a to bool but it cannot verify that 1 is of type domain. After all, the signature only specifies that the type exists, it doesn't say what it is.

From the functor definition we know that the type domain is the same as the type var in the functor's argument. From the functor applications we know that the type var might be int or string. And, of course, we can imagine other applications of Table with yet more implementations of the type var. The type checker, however, cannot infer this knowledge from just the signature of Table. Worse, it can't even type check an expression like

var.equal(1,1)
because the signature of DOMAIN specifies only that var exports a type var but not what it is. In short, our signatures introduce a completely abstract type var, and it is impossible for us at the moment to create any elements of that type.

The first step towards a solution is to reveal the nature of the var types in the functors Var and TypeVar. Since programmers also want to specify the kernel of their signature DOMAIN only once, ML provides the construct where for the refinement of existing signatures. Here is how we can use it:

functor Var() 
    :> DOMAIN where type var = int
    = 
    struct
      ... as above ...
    end

functor TypeVar() 
    :> DOMAIN where type var = string
    = 
    struct 
      ... as above ...
    end
Now the type checker can type var.equal(1,1) but it still can't type expressions over runtime1.

To solve this last problem, we need to add a where-clause to Table's signature:

functor Table(structure d : DOMAIN) 
    :> TABLE where type domain = d.var 
    =
    struct 
     ... as above ...
    end
It reveals that the exported type domain and the imported type var (from structure d) are identical. The type checker verifies this specification (by inspecting the functor body) and uses the information to check expressions over the structure that results from applying the functor. That is, we propagate types with where-clause propagates as necessary.

Changing Components

At first glance, the problem seems to stem from our desire to hide the exact nature of types. For the types var in DOMAIN and domain in DOMAIN it is already obvious why we want to hide their nature. They differ from case to case. Although less obvious, the same is true for 'a table in TABLE. Suppose we wish to implement tables via lists, not functions. Because we didn't reveal anything about the implementation of tables, we can actually do this without changing anything else in the program:

functor Table_via_list(structure d : DOMAIN) 
    :> TABLE where type domain = d.var 
    =
    struct 
	type domain = d.var
	type 'a table = (domain * 'a) list

	val empty = []

	fun extend(t,i,m) = (i,m)::t

	exception NotFound of domain
	fun lookup((i,m)::t,n) 
	    = if d.equal(n,i) 
	         then m 
	         else lookup(t,i)
	  | lookup([],n) = raise NotFound(n)
    end
We simply program a functor with the imports and exports, but a new body. We can use the same name if we wish to replace all uses of the original functor. Or we can use a new name if we wish to use both forms of tables, say, for efficiency reasons, in the program.

In general, Opaque types increase re-usability. Keeping types opaque in signatures forces programmers to use the published operations, not their own functions on these types. Hence, we can easily change the implementation of structures without affecting (not even re-type-checking) those modules that import the modified structure.

The "Diamond Problem"

Consider the following situation:

[Diamond Structure]

Component A uses services from two other components: B and C. In general, makeA will use functions from b to produce inputs for functions in c (and possibly vice versa). Indeed, this is often desirable and necessary but to do this properly a programmer may have to specify additional constraints on the relationship between imports.

The problem is best explained with a concrete example, let us study the design of a binary search tree component. Recall that a binary search tree is a binary tree that stores information in a sorted manner. More specifically, each node contains a key and each leaf contains some piece of information, which includes a key. Given some node, all leafs to the left have a key that is less than the key in the node; all leafs to the right have a key that is larger or equal than the key in the node. Here is the definition of the appropriate datatype:

type key  = ... some type with an order ...
type info = ... some compound type, with keys ...

datatype bst = Leaf of info | Node of bst * key * bst
The basic operations on binary search trees are creating a tree from some information, searching a tree for information that contains a specific tree, and inserting nodes into the tree:
fun create(i) = Leaf(i)
fun insert(i,b) = ...
fun lookup(k,b) = ... 

The description suggests that the Bst functor imports two structures: one concerning information and another concerning ordered keys:

Based on this outline, we can refine the code for, say, lookup. When lookup encounters a leaf, it must check whether the information contains the key we are looking for. If yes, it return the information; otherwise it raises an exception:

	exception NotFound
	fun lookup(k,Leaf(x)) = 
	    if o.equal(k,i.info_key(x)) then x else raise NotFound 
	  | lookup(k,Node(l,i,r)) =
	    if o.less(k,i) then lookup(k,l) else lookup(k,r)
The code for a node is equally straightforward.

The key to the function definition is that what i.info_key returns is fed into o.equal, that is, the underlying type of the ordered structure and the keys in an information structure must be identical. If they aren't, we have a type conflict.

We could again use a where-clause to specify the desired constraint for the type checker:

functor Bst_F(structure i : INFO;
	      structure o : ORDERED 
	      where type o.ord = i.key)
Of course, we could also swap the equation and demand i.key = o.ord. In either case, the type checker will ensure that the two types are equal as it attempts to type check the functor.

In general, a functor might import many structures and three or more may need to agree on a type. For this quite common situation, ML provides a sharing-clause, which permits programmers to specify the constraint in a succinct fashion. For the running example, the specification is

functor Bst_F(structure i : INFO;
	      structure o : ORDERED 
	      sharing type i.key = o.ord)
In general, sharing type might be followed by many equations, e.g.,

a.atype = b.btype = c.ctype = ...

[Note: The generativity of datatype makes it necessary to speak of type identity (in the sense of Scheme's eq? rather than type equality (in the sense of Scheme's equal? but, as mentioned above, this topic is beyond the scope of the notes here.]

A Critique of ML's Perspective

ML's designer clearly showed a lot of foresight in the early 1980s with their careful design of program components and a language for linking them. Software engineers only now re-discover these ideas, typically called architectures or product lines (see Batory's talk). However, ML's module language still does not accommodate re-use as much as "product line programming" demands. There are three specific problems. First, ML makes it difficult to re-use functors with signatures that weren't designed together. Second, linking enforces a program organization that adheres to a strictly hierarchical schema; as practice has shown, systems are often cyclic graphs of components, not just dags. Finally, the language of linking is a restrictive first-order functional language. Again, practice suggests that on occasion "component programmers" want more power.

Renaming at Module Boundaries

Suppose some ML programmer has created a functor for integers:

signature INTS
    = 
    sig
	type my_int

	val zero : my_int

	val succ : my_int -> my_int
	val is_zero : my_int -> bool

	val equal : my_int * my_int -> bool
	val less : my_int * my_int -> bool 
    end

functor Ints() :> INTS 
    = 
    struct 
	datatype my_int = zero | succ of my_int 
	    
	fun is_zero(zero) = true
	  | is_zero(succ(x)) = false

	...
    end 
The functor's exports clearly implements values that semantically satisfies the axioms for
ORDERED -- except that the name of the relevant type in INTS is my_int and that in ORDERED is ord. Hence, we cannot directly use the structure created by Ints where an ORDERED structure is needed. Instead, we must implement a functor that adapts the Ints() structure to ORDERED:
functor Ints_to_Ord(structure i : INTS) 
    :> ORDERED 
    =
    struct 
	open i
	type ord = my_ints
    end

structure ints = Ints() 

structure ord_ints = Ints_to_ord(ints) 
The functor imports an INTS structure, opens it, adds a type definition for ord, and thus creates a structure that implementes ORDERED.

[Note 1: ML's full module language slightly simplifies this step, but the problem exists nevertheless.]

[Note 2: This functor implements the ML-equivalent of a GoF adapter pattern.]

Mutual References among Collections of Structures

In many cases, it is natural for a programmer to divide up type and function definitions among components so that they form a cyclic graph. ML does not accommodate mutual references during linking and thus forces programmer to merge all components with mutual references into one large module.

Let us briefly return to the realm of interpreters again. For a dynamically typed language, we only need one form of environment table. We might therefore specify tables with an explicit range type:

signature TABLE = 
    sig
	type domain
	type value
    	type table 

	val empty : table
	val extend : table * domain * value -> table
	val lookup : table * domain -> value
    end

functor Table_F(...) :> TABLE = ...

Similarly, we might specify closures in a separate package:

signature CLOSURE = 
    sig
      
      type code
      type environment 
      type closure 

      make_closure : code * environment -> closure 

      type value 
      apply_closure : closure * value -> value
    end

functor Closure(...) :> CLOSURE = ...
But clearly, a closure is one of many kinds of values and an environment is a part of a closure. Thus, to create a closure structure, we need a environment structure and vice versa. ML's rules disallow this kind of natural division of labor.

A Full-Fledged Language for "Component Programming"

In many system development situations, it is necessary to link in a conditional manner. While a GUI is still under development, a programmer may wish to link a component to a text-based i/o module; indeed, for maintenance reasons, this capability should be maintained beyond the first phase. Hence, we should be able to write the following kind of expression:

structure foo = if gui_exist() then Foo_with_gui_sig(gui) else Foo_with_txt(txt)
More generally, we might wish to program with components in a language that is pretty much a full-fledged programming language, perhaps even core ML itself. This would also mean that the programmer has to learn fewer syntactic constructs and has more liberties in composing systems.

Warning

This essay does not represent the (complete) viewpoint of traditional MLers. For this story, you may wish to read Bob Harper's presentations on SML and its Module Language.

References

Batory, Product Line Architectures (PPT)

Felleisen and Flatt, Component Programming with MzScheme (Draft)

Harper, Programming in Standard ML

Harper, Module Language