![[Program Structure]](compound.jpg)
Each component implements some services. Many also require the
services of other components. The design of ML also reflects the belief
that the components and connections form a dag-like hierarchy (i.e. a
graph without cycles).
Manufacturing Components
The design of ML also recognizes that program components should be manufactured for re-use. In particular, when a programmer puts together a new component, the component should specify what kind of things it expects from the layer underneath, not the specific components. Dually, the component will only specify what kind of things it provides to the layer above:
![[Functors]](functor.jpg)
The ellipses in this sketch are interfaces; the box is a component.
Given a collection of such components, a programmer can create systems
by linking together appropriate boxes. If the boxes are programmed to
their interfaces, they are easily interchangeable and re-usable.
Functors and Signatures
Here is the syntax of a functor declaration:
A Simple Example
Consider the following concrete example of a signature and a functor
definition:
Using Values from Structures
Inside of a functor we can refer to the pieces of an imported
structure in two ways:
Linking and Computing
After all the signatures and functors are defined, we can link the
components together. We start with the bottom layer, that is, with those
components that do not import other components:
Now that we have the 0th or bottom layer, we can create those components
that link to the bottom layer:
Since the language of components is a functional language, we can also
use nested function application to link the components together:
[Note: The application of a functor can have a side-effect for both type
checking and evaluation. It is therefore not just an option, but a
necessity to consider whether F2 should be applied once or twice. The
precise nature of the problem is beyond the scope of these notes.]
A Simple Example: Tables
Let us design a functor that creates environments for interpreters, i.e.,
tables that associate some form of variable with some form of value. An
interpreter for a statically typed programming language requires at least
two kinds of environments: one for mapping variables to run-time values
and another one for mapping type variables to types. So being able to
create environments several times, for different kinds of variables, is a
natural desire.
When we design functors, we start by considering its export
signature. An environment table supports two kinds of actions:
This sketch suggests the following outline of a signature:
The answer is straightforward in both cases. On one hand, to lookup a
variable in an environment table, we need to compare two variables, that
is, we need a comparison operation. In other words, to describe the
"domain" of a table we need a type and an operation. This means the
functor for creating tables imports a structure that specifies these two
things. On the other hand, the kind of values we can store in a table is
arbitrary. Thus we can use a type variable to stand for
Next we write down what kind of structures we expect as imports to the
functor. As mentioned already, the table functor must import a structure
that describes the domain elements. We also agreed that this structure
must specify a type and an equality operation on the type:
Now we are ready to specify the header of the functor for environment
tables:
Coding the definitions in the functor is a straightforward exercise:
Here are two functors for creating DOMAIN structures:
Linking functors together means to write down the intended functor
applications. In this example, the Var and TypeVar functors make up layer
0:
After linking, we are ready to evaluate some simple examples, e.g.,
extending an environment with a variable and a value and looking up the
same variable:
Refining Signatures, Propagating Types
The expression
From the functor definition we know that the type
The first step towards a solution is to reveal the nature of the
To solve this last problem, we need to add a Changing Components
At first glance, the problem seems to stem from our desire to hide the
exact nature of types. For the types In general, Opaque types increase re-usability. Keeping types opaque in
signatures forces programmers to use the published operations, not their
own functions on these types. Hence, we can easily change the
implementation of structures without affecting (not even
re-type-checking) those modules that import the modified structure.
The "Diamond Problem"
Consider the following situation:
An ML Blitz Course
For readers who need a quick refresher, here is a
quick and dirty summary by example of the first eight chapters.
Functors, Signatures, and Structures
Our description suggests that the templates for components are functions
that map components to components. More generally, the "programming
language of components" is a functional language whose values are
structures and whose types are signatures. In ML, these "functions" are
called functors; their interfaces are referred to as
signatures. The result of a functor application is a
structure, which, of course, also play the role of inputs to
other functors. The next few sections explain the precise role of
functors, signatures, and structures.
functor F(structure s1: SIG1; ...; structure s2: SIG2)
:> SIG
=
struct
...
end
The header specifies three parts:
The functor body is roughly a sequence of type, function, and value
definitions. These definitions may refer to the imported structures (see
below). They must define all those things promised in the export
signature SIG.
A signature is defined as follows:
signature SIG
=
sig
...
end
It introduces a name that stands for a bundle of type and value
specifications. When the ML type checker checks the type correctness
of a modular program, it uses the information in signatures to verify
that everything works out.
signature VARIABLES
=
sig
type var
val equal : var * var -> bool
end
functor Variables()
:> VARIABLES
=
struct
type var = string
fun equal(v1,v2) = (v1 = v2)
end
The header of the functor definition states that Variables
imports nothing and creates a structure with signature
VARIABLES. According to this signature, the functor exports
the type var and a function named equal that consumes a pair
of vars and produces a bool. It is easy to see
that the functor body indeed introduces these definitions. Indeed, from
the functor definition we can also deduce that var stands
for string, but of course the signature does not reveal
this fact.
Now let us take a look at the following definitions:
signature SD
=
sig
type term
end
functor StaticDistance(structure v : VARIABLES)
:> SD
=
struct
...
end
It sketches a functor whose single import is a structure with signature
VARIABLES. Inside of the functor's definition, we can only
assume that v specifies some type var, but
nothing else. In other words, we have completely hidden how variables are
implemented.
s.name
we refer to the entity called
name in import s. The notation only makes sense
if the signature for s specifies name.
open s
, all
definitions specified in the signature of s become lexically
bound. Adding this statement is dangerous if we don't know the entire
signature because it may shadow some other binding. Use it with care.
(* Layer 0 *)
(* ------- *)
structure s1 = F1()
structure s2 = F2()
These definitions create two structures s1 and s2 from the functors F1
and F2. Neither F1 nor F2 import any other components.
(* Layer 1 *)
(* ------- *)
structure s3 = F3(structure s = s1; structure t = s2)
structure s4 = F4(structure a = s3; structure t = s2)
...
This example assumes that both functors consume two components. Both
components are "passed by label" to the functors, i.e., the formal
parameter is explicitly connected to the argument. The first functor
imports s1 and s2; the second one is linked to
s3 (the result of applying F3 a first time) and
s1.
(* Layer 0 *)
(* ------- *)
structure s2 = F2() (* retains name because used more than once *)
(* Layer 1 *)
(* ------- *)
structure s4 = F4(structure a = F3(structure s = F1(); structure t = s2);
structure t = s2)
...
The structure resulting from F2() remains named because it is used
twice.
Since an extension already assumes an environment, the environment
component should also provide a basic environment: the empty
environment.
signature TABLE =
sig
type table
val empty : table
val extend : table * a-variable * a-value -> table
val lookup : table * a-variable -> a-value
end
The signature is incomplete because it doesn't specify what kind of types
a-variable and a-value are. For that part, we
need to think ahead and see whether these types are constrained in any
way or whether they can be arbitrary.
a-value. Here is the final signature for tables:
signature TABLE =
sig
type domain
type 'a table
val empty : 'a table
val extend : 'a table * domain * 'a -> 'a table
exception NotFound of domain
val lookup : 'a table * domain -> 'a
end
It says that a table is a parametric type. The parameter 'a
of the table type stands in for the kinds of values that a table can
store. Furthermore,
empty,
extend,
and
lookup
are polymorphic operations on this type. Finally, the signature includes
an exception NotFound because looking up a variable does not succeed if
the table is empty or, more generally, if the table does not contain an
association for the variable.
signature DOMAIN =
sig
type var
val equal : var * var -> bool
end
functor Table(structure domain : DOMAIN) :> TABLE
That is, the functor inputs a DOMAIN structure and creates a TABLE
structure, or, it imports items according to the DOMAIN specification and
exports items as specified by TABLE.
functor Table(structure d : DOMAIN)
:> TABLE
=
struct
type domain = d.var
type 'a table = domain -> 'a
exception NotFound of domain
val empty =
(fn n => raise NotFound(n))
fun extend(t,n,m) =
(fn i =>
if d.equal(n,i) then m
else t(i))
fun lookup(t,n) = t(n)
end
functor Var() :> DOMAIN
=
struct
type var = int
val equal = (op =)
end
functor TypeVar() :> DOMAIN
=
struct
type var = string
val equal = (op =)
end
The first one implements variables as integers ("static distance
coordinates"), the second as strings. In both case, the equality
operation is just equality.
structure var = Var()
structure type_var = TypeVar()
Layer 1 is created with two applications of the Table functor:
structure run_time = Table(structure domain = var);
structure tck_time = Table(structure domain = type_var);
run_time1.extend(run_time1.empty,1,true)
or
open run_time1;
extend(empty,1,true)
Both expressions should create a bool table. Unfortunately,
the type checking fails (in both both cases). To understand this problem,
we need to discuss the rules of typing components and propagating types
through functor applications.
run_time1.extend(run_time1.empty,1,true)
is type-checked according to the type information specified in the
signatures. The signature of run_time1.extend is TABLE and specifies the type of extend as
'a table * domain * 'a -> 'a table
The type checker can obviously set 'a to bool
but it cannot verify that 1 is of type domain. After all,
the signature only specifies that the type exists, it doesn't say what it
is.
domain is the same as the type var in the
functor's argument. From the functor applications we know that the
type var might be int or string.
And, of course, we can imagine other applications of Table
with yet more implementations of the type var. The type checker, however,
cannot infer this knowledge from just the signature of Table. Worse, it
can't even type check an expression like
var.equal(1,1)
because the signature of DOMAIN specifies only that var
exports a type var but not what it is. In short, our
signatures introduce a completely abstract type var, and it is
impossible for us at the moment to create any elements of that type.
var types in the functors Var and TypeVar. Since programmers
also want to specify the kernel of their signature DOMAIN only
once, ML provides the construct where for the refinement of
existing signatures. Here is how we can use it:
functor Var()
:> DOMAIN where type var = int
=
struct
... as above ...
end
functor TypeVar()
:> DOMAIN where type var = string
=
struct
... as above ...
end
Now the type checker can type var.equal(1,1) but it still
can't type expressions over runtime1.
where-clause
to Table's signature:
functor Table(structure d : DOMAIN)
:> TABLE where type domain = d.var
=
struct
... as above ...
end
It reveals that the exported type domain and the imported type
var (from structure d) are identical. The type checker
verifies this specification (by inspecting the functor body) and uses the
information to check expressions over the structure that results from
applying the functor. That is, we propagate types with where-clause
propagates as necessary.
var in DOMAIN and
domain in DOMAIN it is already obvious why we want to hide their
nature. They differ from case to case.
Although less obvious, the same is true for 'a table in
TABLE. Suppose we wish to implement tables via lists, not functions.
Because we didn't reveal anything about the implementation of tables,
we can actually do this without changing anything else in the program:
functor Table_via_list(structure d : DOMAIN)
:> TABLE where type domain = d.var
=
struct
type domain = d.var
type 'a table = (domain * 'a) list
val empty = []
fun extend(t,i,m) = (i,m)::t
exception NotFound of domain
fun lookup((i,m)::t,n)
= if d.equal(n,i)
then m
else lookup(t,i)
| lookup([],n) = raise NotFound(n)
end
We simply program a functor with the imports and exports, but a new body.
We can use the same name if we wish to replace all uses of the original
functor. Or we can use a new name if we wish to use both forms of tables,
say, for efficiency reasons, in the program.
![[Diamond Structure]](diamond.jpg)
makeA will use functions from b to produce inputs for
functions in c (and possibly vice versa). Indeed, this is often desirable
and necessary but to do this properly a programmer may have to specify
additional constraints on the relationship between imports.
The problem is best explained with a concrete example, let us study the
design of a binary search tree component. Recall that a binary search tree
is a binary tree that stores information in a sorted manner. More
specifically, each node contains a key and each leaf contains
some piece of information, which includes a key. Given some node,
all leafs to the left have a key that is less than the key in the node; all
leafs to the right have a key that is larger or equal than the key in the
node. Here is the definition of the appropriate datatype:
type key = ... some type with an order ... type info = ... some compound type, with keys ... datatype bst = Leaf of info | Node of bst * key * bstThe basic operations on binary search trees are creating a tree from some information, searching a tree for information that contains a specific tree, and inserting nodes into the tree:
fun create(i) = Leaf(i) fun insert(i,b) = ... fun lookup(k,b) = ...
The description suggests that the Bst functor imports two structures: one concerning information and another concerning ordered keys:
signature ORDERED =
sig
type ord
val equal : ord * ord -> bool
val less : ord * ord -> bool
end
signature INFO
=
sig
type info
type key
type values
val make : key * values -> info
val info_key : info -> key
val info_val : info -> values
end
signature BST
=
sig
type info
type key
type bst
val create : info -> bst
val insert : info * bst -> bst
val lookup : key * bst -> info
end
functor Bst_F(structure i : INFO;
structure o : ORDERED)
:> BST where type info = i.info and type key = i.key
= ...
Based on this outline, we can refine the code for, say,
lookup. When lookup encounters a leaf, it must
check whether the information contains the key we are looking for. If yes,
it return the information; otherwise it raises an exception:
exception NotFound fun lookup(k,Leaf(x)) = if o.equal(k,i.info_key(x)) then x else raise NotFound | lookup(k,Node(l,i,r)) = if o.less(k,i) then lookup(k,l) else lookup(k,r)The code for a node is equally straightforward.
The key to the function definition is that what i.info_key
returns is fed into o.equal, that is, the underlying type of
the ordered structure and the keys in an information structure must be
identical. If they aren't, we have a type conflict.
We could again use a where-clause to specify the desired
constraint for the type checker:
functor Bst_F(structure i : INFO; structure o : ORDERED where type o.ord = i.key)Of course, we could also swap the equation and demand
i.key =
o.ord. In either case, the type checker will ensure that the two
types are equal as it attempts to type check the functor.
In general, a functor might import many structures and three or more may
need to agree on a type. For this quite common situation, ML provides a
sharing-clause, which permits programmers to specify the
constraint in a succinct fashion. For the running example, the
specification is
functor Bst_F(structure i : INFO; structure o : ORDERED sharing type i.key = o.ord)In general,
sharing type might be followed by many equations,
e.g.,
a.atype = b.btype = c.ctype = ...[Note: The generativity of datatype makes it necessary to speak of type
identity (in the sense of Scheme's eq? rather than type equality (in the
sense of Scheme's equal? but, as mentioned above, this topic is beyond the
scope of the notes here.]
ML's designer clearly showed a lot of foresight in the early 1980s with
their careful design of program components and a language for linking
them. Software engineers only now re-discover these ideas, typically called
architectures or product lines (see Batory's talk). However, ML's module
language still does not accommodate re-use as much as "product line
programming" demands. There are three specific problems. First, ML makes
it difficult to re-use functors with signatures that weren't designed
together. Second, linking enforces a program organization that adheres to a
strictly hierarchical schema; as practice has shown, systems are often
cyclic graphs of components, not just dags. Finally, the language of
linking is a restrictive first-order functional language. Again, practice
suggests that on occasion "component programmers" want more power.
Renaming at Module Boundaries
Suppose some ML programmer has created a functor for integers:
A Critique of ML's Perspective
signature INTS
=
sig
type my_int
val zero : my_int
val succ : my_int -> my_int
val is_zero : my_int -> bool
val equal : my_int * my_int -> bool
val less : my_int * my_int -> bool
end
functor Ints() :> INTS
=
struct
datatype my_int = zero | succ of my_int
fun is_zero(zero) = true
| is_zero(succ(x)) = false
...
end
The functor's exports clearly implements values that semantically satisfies
the axioms for ORDERED -- except that the name of
the relevant type in INTS is my_int and that in ORDERED is
ord. Hence, we cannot directly use the structure created by
Ints where an ORDERED structure is needed. Instead, we must implement a
functor that adapts the Ints() structure to ORDERED:
functor Ints_to_Ord(structure i : INTS)
:> ORDERED
=
struct
open i
type ord = my_ints
end
structure ints = Ints()
structure ord_ints = Ints_to_ord(ints)
The functor imports an INTS structure, opens it, adds a type definition for
ord, and thus creates a structure that implementes ORDERED.
[Note 1: ML's full module language slightly simplifies this step, but the problem exists nevertheless.]
[Note 2: This functor implements the ML-equivalent of a GoF adapter pattern.]
Mutual References among Collections of Structures
In many cases, it is natural for a programmer to divide up type and function definitions among components so that they form a cyclic graph. ML does not accommodate mutual references during linking and thus forces programmer to merge all components with mutual references into one large module.
Let us briefly return to the realm of interpreters again. For a dynamically typed language, we only need one form of environment table. We might therefore specify tables with an explicit range type:
signature TABLE =
sig
type domain
type value
type table
val empty : table
val extend : table * domain * value -> table
val lookup : table * domain -> value
end
functor Table_F(...) :> TABLE = ...
Similarly, we might specify closures in a separate package:
signature CLOSURE =
sig
type code
type environment
type closure
make_closure : code * environment -> closure
type value
apply_closure : closure * value -> value
end
functor Closure(...) :> CLOSURE = ...
But clearly, a closure is one of many kinds of values and an environment is
a part of a closure. Thus, to create a closure structure, we
need a environment structure and vice versa. ML's rules
disallow this kind of natural division of labor.
A Full-Fledged Language for "Component Programming"
In many system development situations, it is necessary to link in a conditional manner. While a GUI is still under development, a programmer may wish to link a component to a text-based i/o module; indeed, for maintenance reasons, this capability should be maintained beyond the first phase. Hence, we should be able to write the following kind of expression:
structure foo = if gui_exist() then Foo_with_gui_sig(gui) else Foo_with_txt(txt)More generally, we might wish to program with components in a language that is pretty much a full-fledged programming language, perhaps even core ML itself. This would also mean that the programmer has to learn fewer syntactic constructs and has more liberties in composing systems.
This essay does not represent the (complete) viewpoint of traditional
MLers. For this story, you may wish to read Bob Harper's presentations on
SML and its Module Language.
Batory,
Product Line Architectures (PPT)
Felleisen and Flatt,
Component Programming with MzScheme (Draft)
Harper,
Programming in Standard ML
Harper,
Module Language
References