Each component implements some services. Many also require the
services of other components. The design of ML also reflects the belief
that the components and connections form a dag-like hierarchy (i.e. a
graph without cycles).

**Manufacturing Components**

The design of ML also recognizes that program components should be manufactured for re-use. In particular, when a programmer puts together a new component, the component should specify what kind of things it expects from the layer underneath, not the specific components. Dually, the component will only specify what kind of things it provides to the layer above:

The ellipses in this sketch are *interfaces*; the box is a
component.

Given a collection of such components, a programmer can create systems by linking together appropriate boxes. If the boxes are programmed to their interfaces, they are easily interchangeable and re-usable.

**Functors and Signatures**

Here is the syntax of a functor declaration:

functor F(structure s1: SIG1; ...; structure s2: SIG2) :> SIG = struct ... endThe header specifies three parts:

- the name of the functor: F;
- its imported structures as formal parameters (names): s1 ... s2;
- the signatures of the imports: SIG1 ... SIG2 (which are typically names);
- and the export signature of the functor: SIG.

signature SIG = sig ... endIt introduces a name that stands for a bundle of type and value specifications. When the ML type checker checks the type correctness of a modular program, it uses the information in signatures to verify that everything works out.

**A Simple Example**

Consider the following concrete example of a signature and a functor definition:

signature VARIABLES = sig type var val equal : var * var -> bool end functor Variables() :> VARIABLES = struct type var = string fun equal(v1,v2) = (v1 = v2) endThe header of the functor definition states that

`Variables`

imports nothing and creates a structure with signature
`VARIABLES.`

According to this signature, the functor exports
the type `var`

and a function named equal that consumes a pair
of `var`

s and produces a `bool`

. It is easy to see
that the functor body indeed introduces these definitions. Indeed, from
the functor definition we can also deduce that `var`

stands
for `string`

, but of course the signature does not reveal
this fact.
Now let us take a look at the following definitions:
signature SD = sig type term end functor StaticDistance(structure v : VARIABLES) :> SD = struct ... endIt sketches a functor whose single import is a structure with signature

`VARIABLES`

. Inside of the functor's definition, we can only
assume that `v`

specifies some type `var`

, but
nothing else. In other words, we have completely hidden how variables are
implemented.
**Using Values from Structures**

Inside of a functor we can refer to the pieces of an imported structure in two ways:

- With
s.name

we refer to the entity called`name`

in import`s`

. The notation only makes sense if the signature for`s`

specifies`name`

. - If the functor's body contains the statement
open s

, all definitions specified in the signature of`s`

become lexically bound. Adding this statement is dangerous if we don't know the entire signature because it may shadow some other binding. Use it with care.

**Linking and Computing**

After all the signatures and functors are defined, we can link the components together. We start with the bottom layer, that is, with those components that do not import other components:

(* Layer 0 *) (* ------- *) structure s1 = F1() structure s2 = F2()These definitions create two structures s1 and s2 from the functors F1 and F2. Neither F1 nor F2 import any other components.

Now that we have the 0th or bottom layer, we can create those components that link to the bottom layer:

(* Layer 1 *) (* ------- *) structure s3 = F3(structure s = s1; structure t = s2) structure s4 = F4(structure a = s3; structure t = s2) ...This example assumes that both functors consume two components. Both components are "passed by label" to the functors, i.e., the formal parameter is explicitly connected to the argument. The first functor imports

`s1`

and `s2`

; the second one is linked to
`s3`

(the result of applying F3 a first time) and
`s1`

.
Since the language of components is a functional language, we can also use nested function application to link the components together:

(* Layer 0 *) (* ------- *) structure s2 = F2() (* retains name because used more than once *) (* Layer 1 *) (* ------- *) structure s4 = F4(structure a = F3(structure s = F1(); structure t = s2); structure t = s2) ...The structure resulting from F2() remains named because it is used twice.

[Note: The application of a functor can have a side-effect for both type checking and evaluation. It is therefore not just an option, but a necessity to consider whether F2 should be applied once or twice. The precise nature of the problem is beyond the scope of these notes.]

**A Simple Example: Tables**

Let us design a functor that creates environments for interpreters, i.e., tables that associate some form of variable with some form of value. An interpreter for a statically typed programming language requires at least two kinds of environments: one for mapping variables to run-time values and another one for mapping type variables to types. So being able to create environments several times, for different kinds of variables, is a natural desire.

When we design functors, we start by considering its export signature. An environment table supports two kinds of actions:

- the extension of an existing environment with a variable and a value, and
- the lookup of a variable in an existing environment.

This sketch suggests the following outline of a signature:

signature TABLE = sig type table val empty : table val extend : table * a-variable * a-value -> table val lookup : table * a-variable -> a-value endThe signature is incomplete because it doesn't specify what kind of types

`a-variable`

and `a-value`

are. For that part, we
need to think ahead and see whether these types are constrained in any
way or whether they can be arbitrary.
The answer is straightforward in both cases. On one hand, to lookup a
variable in an environment table, we need to compare two variables, that
is, we need a comparison operation. In other words, to describe the
"domain" of a table we need a type and an operation. This means the
functor for creating tables imports a structure that specifies these two
things. On the other hand, the kind of values we can store in a table is
arbitrary. Thus we can use a *type variable* to stand for
`a-value`

. Here is the final signature for tables:

signature TABLE = sig type domain type 'a table val empty : 'a table val extend : 'a table * domain * 'a -> 'a table exception NotFound of domain val lookup : 'a table * domain -> 'a endIt says that a table is a parametric type. The parameter

`'a`

of the table type stands in for the kinds of values that a table can
store. Furthermore,
`empty`

,
`extend`

,
and
`lookup`

are polymorphic operations on this type. Finally, the signature includes
an exception NotFound because looking up a variable does not succeed if
the table is empty or, more generally, if the table does not contain an
association for the variable.
Next we write down what kind of structures we expect as imports to the functor. As mentioned already, the table functor must import a structure that describes the domain elements. We also agreed that this structure must specify a type and an equality operation on the type:

signature DOMAIN = sig type var val equal : var * var -> bool end

Now we are ready to specify the header of the functor for environment tables:

functor Table(structure domain : DOMAIN) :> TABLEThat is, the functor inputs a DOMAIN structure and creates a TABLE structure, or, it imports items according to the DOMAIN specification and exports items as specified by TABLE.

Coding the definitions in the functor is a straightforward exercise:

functor Table(structure d : DOMAIN) :> TABLE = struct type domain = d.var type 'a table = domain -> 'a exception NotFound of domain val empty = (fn n => raise NotFound(n)) fun extend(t,n,m) = (fn i => if d.equal(n,i) then m else t(i)) fun lookup(t,n) = t(n) end

Here are two functors for creating DOMAIN structures:

functor Var() :> DOMAIN = struct type var = int val equal = (op =) end functor TypeVar() :> DOMAIN = struct type var = string val equal = (op =) endThe first one implements variables as integers ("static distance coordinates"), the second as strings. In both case, the equality operation is just equality.

Linking functors together means to write down the intended functor applications. In this example, the Var and TypeVar functors make up layer 0:

structure var = Var() structure type_var = TypeVar()Layer 1 is created with two applications of the Table functor:

structure run_time = Table(structure domain = var); structure tck_time = Table(structure domain = type_var);

After linking, we are ready to evaluate some simple examples, e.g., extending an environment with a variable and a value and looking up the same variable:

run_time1.extend(run_time1.empty,1,true)or

open run_time1; extend(empty,1,true)Both expressions should create a

`bool table`

. Unfortunately,
the type checking fails (in both both cases). To understand this problem,
we need to discuss the rules of typing components and propagating types
through functor applications.
**Refining Signatures, Propagating Types**

The expression

run_time1.extend(run_time1.empty,1,true)is type-checked according to the type information specified in the signatures. The signature of

`run_time1.extend`

is TABLE and specifies the type of `extend`

as
'a table * domain * 'a -> 'a tableThe type checker can obviously set

`'a`

to `bool`

but it cannot verify that 1 is of type `domain`

. After all,
the signature only specifies that the type exists, it doesn't say what it
is.
From the functor *definition* we know that the type
`domain`

is the same as the type `var`

in the
functor's argument. From the functor *applications* we know that the
type `var`

might be `int`

or `string`

.
And, of course, we can imagine other applications of `Table`

with yet more implementations of the type var. The type checker, however,
cannot infer this knowledge from just the signature of Table. Worse, it
can't even type check an expression like

var.equal(1,1)because the signature of DOMAIN specifies only that

`var`

exports a type `var`

but not what it is. In short, our
signatures introduce a completely abstract type `var`

, and it is
impossible for us at the moment to create any elements of that type.
The first step towards a solution is to reveal the nature of the
`var`

types in the functors Var and TypeVar. Since programmers
also want to specify the kernel of their signature `DOMAIN`

only
once, ML provides the construct `where`

for the refinement of
existing signatures. Here is how we can use it:

functor Var() :> DOMAIN where type var = int = struct ... as above ... end functor TypeVar() :> DOMAIN where type var = string = struct ... as above ... endNow the type checker can type

`var.equal(1,1)`

but it still
can't type expressions over runtime1.
To solve this last problem, we need to add a `where`

-clause
to `Table`

's signature:

functor Table(structure d : DOMAIN) :> TABLE where type domain = d.var = struct ... as above ... endIt reveals that the exported type

`domain`

and the imported type
`var`

(from structure d) are identical. The type checker
verifies this specification (by inspecting the functor body) and uses the
information to check expressions over the structure that results from
applying the functor. That is, we propagate types with where-clause
propagates as necessary.
**Changing Components**

At first glance, the problem seems to stem from our desire to hide the
exact nature of types. For the types `var`

in DOMAIN and ```
domain
```

in DOMAIN it is already obvious why we want to hide their
nature. They differ from case to case.
Although less obvious, the same is true for `'a table`

in
TABLE. Suppose we wish to implement tables via lists, not functions.
Because we didn't reveal anything about the implementation of tables,
we can actually do this without changing anything else in the program:

functor Table_via_list(structure d : DOMAIN) :> TABLE where type domain = d.var = struct type domain = d.var type 'a table = (domain * 'a) list val empty = [] fun extend(t,i,m) = (i,m)::t exception NotFound of domain fun lookup((i,m)::t,n) = if d.equal(n,i) then m else lookup(t,i) | lookup([],n) = raise NotFound(n) endWe simply program a functor with the imports and exports, but a new body. We can use the same name if we wish to replace all uses of the original functor. Or we can use a new name if we wish to use both forms of tables, say, for efficiency reasons, in the program.

In general, Opaque types increase re-usability. Keeping types opaque in
signatures forces programmers to use the published operations, not their
own functions on these types. Hence, we can easily change the
implementation of structures *without* affecting (not even
re-type-checking) those modules that import the modified structure.

**The "Diamond Problem"**

Consider the following situation:

`makeA`

will use functions from b to produce inputs for
functions in c (and possibly vice versa). Indeed, this is often desirable
and necessary but to do this properly a programmer may have to specify
additional constraints on the relationship between imports.
The problem is best explained with a concrete example, let us study the
design of a binary search tree component. Recall that a binary search tree
is a binary tree that stores information in a sorted manner. More
specifically, each node contains a `key`

and each leaf contains
some piece of information, which includes a key. Given some node,
all leafs to the left have a key that is less than the key in the node; all
leafs to the right have a key that is larger or equal than the key in the
node. Here is the definition of the appropriate datatype:

type key = ... some type with an order ... type info = ... some compound type, with keys ... datatype bst = Leaf of info | Node of bst * key * bstThe basic operations on binary search trees are creating a tree from some information, searching a tree for information that contains a specific tree, and inserting nodes into the tree:

fun create(i) = Leaf(i) fun insert(i,b) = ... fun lookup(k,b) = ...

The description suggests that the Bst functor imports two structures: one concerning information and another concerning ordered keys:

- An ordered structure must introduce a type, an equality predicate, and an
ordering predicate.
signature ORDERED = sig type ord val equal : ord * ord -> bool val less : ord * ord -> bool end

- An information structure must contain keys and values proper. It must
also provide functions for combining the two and for extracting them:
signature INFO = sig type info type key type values val make : key * values -> info val info_key : info -> key val info_val : info -> values end

- The signature for binary search trees and the functor header are now
obvious:
signature BST = sig type info type key type bst val create : info -> bst val insert : info * bst -> bst val lookup : key * bst -> info end functor Bst_F(structure i : INFO; structure o : ORDERED) :> BST where type info = i.info and type key = i.key = ...

Based on this outline, we can refine the code for, say,
`lookup`

. When `lookup`

encounters a leaf, it must
check whether the information contains the key we are looking for. If yes,
it return the information; otherwise it raises an exception:

exception NotFound fun lookup(k,Leaf(x)) = if o.equal(k,i.info_key(x)) then x else raise NotFound | lookup(k,Node(l,i,r)) = if o.less(k,i) then lookup(k,l) else lookup(k,r)The code for a node is equally straightforward.

The key to the function definition is that what `i.info_key`

returns is fed into `o.equal`

, that is, the underlying type of
the ordered structure and the keys in an information structure must be
identical. If they aren't, we have a type conflict.

We could again use a `where`

-clause to specify the desired
constraint for the type checker:

functor Bst_F(structure i : INFO; structure o : ORDERED where type o.ord = i.key)Of course, we could also swap the equation and demand

```
i.key =
o.ord
```

. In either case, the type checker will ensure that the two
types are equal as it attempts to type check the functor.
In general, a functor might import many structures and three or more may
need to agree on a type. For this quite common situation, ML provides a
`sharing`

-clause, which permits programmers to specify the
constraint in a succinct fashion. For the running example, the
specification is

functor Bst_F(structure i : INFO; structure o : ORDERED sharing type i.key = o.ord)In general,

`sharing type`

might be followed by many equations,
e.g.,
`a.atype = b.btype = c.ctype = ...`

[Note: The generativity of datatype makes it necessary to speak of type identity (in the sense of Scheme's eq? rather than type equality (in the sense of Scheme's equal? but, as mentioned above, this topic is beyond the scope of the notes here.]

ML's designer clearly showed a lot of foresight in the early 1980s with their careful design of program components and a language for linking them. Software engineers only now re-discover these ideas, typically called architectures or product lines (see Batory's talk). However, ML's module language still does not accommodate re-use as much as "product line programming" demands. There are three specific problems. First, ML makes it difficult to re-use functors with signatures that weren't designed together. Second, linking enforces a program organization that adheres to a strictly hierarchical schema; as practice has shown, systems are often cyclic graphs of components, not just dags. Finally, the language of linking is a restrictive first-order functional language. Again, practice suggests that on occasion "component programmers" want more power.

**Renaming at Module Boundaries**

Suppose some ML programmer has created a functor for integers:

signature INTS = sig type my_int val zero : my_int val succ : my_int -> my_int val is_zero : my_int -> bool val equal : my_int * my_int -> bool val less : my_int * my_int -> bool end functor Ints() :> INTS = struct datatype my_int = zero | succ of my_int fun is_zero(zero) = true | is_zero(succ(x)) = false ... endThe functor's exports clearly implements values that semantically satisfies the axioms for ORDERED -- except that the name of the relevant type in INTS is

`my_int`

and that in ORDERED is
`ord`

. Hence, we cannot directly use the structure created by
Ints where an ORDERED structure is needed. Instead, we must implement a
functor that adapts the Ints() structure to ORDERED:
functor Ints_to_Ord(structure i : INTS) :> ORDERED = struct open i type ord = my_ints end structure ints = Ints() structure ord_ints = Ints_to_ord(ints)The functor imports an INTS structure, opens it, adds a type definition for ord, and thus creates a structure that implementes ORDERED.

[Note 1: ML's full module language slightly simplifies this step, but the problem exists nevertheless.]

[Note 2: This functor implements the ML-equivalent of a GoF adapter pattern.]

**Mutual References among Collections of Structures**

In many cases, it is natural for a programmer to divide up type and function definitions among components so that they form a cyclic graph. ML does not accommodate mutual references during linking and thus forces programmer to merge all components with mutual references into one large module.

Let us briefly return to the realm of interpreters again. For a dynamically typed language, we only need one form of environment table. We might therefore specify tables with an explicit range type:

signature TABLE = sig type domain type value type table val empty : table val extend : table * domain * value -> table val lookup : table * domain -> value end functor Table_F(...) :> TABLE = ...

Similarly, we might specify closures in a separate package:

signature CLOSURE = sig type code type environment type closure make_closure : code * environment -> closure type value apply_closure : closure * value -> value end functor Closure(...) :> CLOSURE = ...But clearly, a closure is one of many kinds of values and an environment is a part of a closure. Thus, to create a

`closure`

structure, we
need a `environment`

structure and vice versa. ML's rules
disallow this kind of natural division of labor.
**A Full-Fledged Language for "Component Programming"**

In many system development situations, it is necessary to link in a conditional manner. While a GUI is still under development, a programmer may wish to link a component to a text-based i/o module; indeed, for maintenance reasons, this capability should be maintained beyond the first phase. Hence, we should be able to write the following kind of expression:

structure foo = if gui_exist() then Foo_with_gui_sig(gui) else Foo_with_txt(txt)More generally, we might wish to program with components in a language that is pretty much a full-fledged programming language, perhaps even core ML itself. This would also mean that the programmer has to learn fewer syntactic constructs and has more liberties in composing systems.

This essay does not represent the (complete) viewpoint of traditional MLers. For this story, you may wish to read Bob Harper's presentations on SML and its Module Language.

Batory, Product Line Architectures (PPT)

Felleisen and Flatt, Component Programming with MzScheme (Draft)

Harper, Programming in Standard ML

Harper, Module Language