\documentstyle [11pt]{article}
\title{Demeter Demystified}
\author{Mitchell Wand}
\parskip =0.75ex plus 0.25ex minus 0.10ex

\begin{document}
\maketitle

\begin{abstract}
My understanding of what Karl's stuff is about, what it's good for, and what
its limitations are.
\end{abstract}

For the past eight years or so, Karl has been working on projects aimed at
making object-oriented programming less onerous.  In the process, he has
developed a programming model and vocabulary that newcomers often find
difficult.  In this document, I will give my understanding of these projects
and their significance.  

This is not a scholarly survey with proper citations.  It is instead
impressionistic and critical.  I will try throughout to be as accurate as I
can, but errors in fact or interpretation are of course mine, and not Karl's.

Karl's work seems to divide into four parts:

\begin{itemize}
\item The Demeter system
\item The Law of Demeter
\item Propagation Patterns 
\item Adaptive Programming
\end{itemize}

\section{The Demeter System}

The Demeter system is a program framework for implementing the hierarchical
data model in an object-oriented system.  It starts with the notion of a {\em
class graph} or {\em class dictionary graph}.

\begin{itemize}
\item In database terminology, a class graph is roughly a database schema in
the hierarchical model.
\item In Lisp terminology, it is roughly a {\tt DEFSTRUCT} with type
information. 
\item In SML terminology, it is roughly a {\tt datatype} definition.
\end{itemize}

For example, in SML one might declare such hierarchical data by

\begin{verbatim}
datatype Exp = Ident_Exp of Ident
  | Abstraction_Exp of (Ident * Exp)
  | Application_Exp of (Exp * Exp list)
\end{verbatim}

In SML, operations on such data would typically be specified by pattern
expressions.  In Pascal or C (or even in Scheme) one would generally implement
such data by records containing a discriminant or tag field.  Operations on
this data would dispatch on the tag field.

The Demeter system provides a tool for implementing such data in an
object-oriented language.  It does so by creating a class {\tt Exp} with
subclasses \verb|Ident_Exp|, \verb|Abstraction_Exp|, and \verb|Application_Exp|.
Instead of having a single procedure {\tt foo}, it associates a {\tt
foo} method with each subclass.  (In C++, this will be a virtual method
associated with the superclass {\tt Exp}).

The Demeter system provides a set of generic tools for use with this
implementation strategy, including generated parsers and unparsers, a
graphical front-end, etc.

The notion of a class dictionary graph in Demeter is somewhat richer than this
analogy would indicate.  Class graphs in Demeter add to the SML {\tt datatype}
definition the following capabilities:

\begin{enumerate}
\item SML datatype definitions define {\em types} (such as {\tt Exp}) and {\em
constructors} (such as \verb|Abstraction_Exp|).  In Demeter, the corresponding
entities are called {\tt alternation classes} and {\tt construction classes}.
Also, in Demeter, the data fields have names, corresponding to instance
variables.  Hence in Demeter one would write something like

\begin{verbatim}
Exp : Ident_Exp | Abstraction_Exp | Application_Exp .
Ident_Exp = <identifier> Ident .
Abstraction_Exp = "fn" <bound_var> Ident "=>" <body> Exp .
Application_Exp = <rator> Exp <rands> "(" List(Exp) ")" .
\end{verbatim}

The quoted strings specify the concrete syntax for the parser and unparser.
Demeter offers a variety of input syntaxes; this is the so-called ``concise'' syntax.

\item 
Demeter also allows instance variables to be associated with alternation
classes; such variables become instance variables of each of the alternatives.
This gives a flavor of inheritance.

\item
Demeter also allows alternation classes to have other alternation classes as
alternatives.  This allows the designer considerable flexibility in grouping
classes.  This also means that many different class graphs actually specify
the same set of objects.  One can decide whether two class graphs determine
the same objects by transforming the class graph until it uses no inheritance
(like a {\tt datatype} declaration) and removing any dead classes.  This can
be done by a sequence of local transformations.

\end{enumerate}

The Demeter system was originally implemented with Common Lisp Flavors as its
target language.  This system did not get wide usage, perhaps because it
offered little value above the existing {\tt defstruct}.  The Demeter
data-definition system was then ported to C++, where it got considerably more
attention, perhaps because programming in C++ is so difficult that almost
anything helps.  We have little data on the actual usage of Demeter/C++, and
still less data on how much of its usage outside Northeastern is due to the
data-definition facilities and how much is due to the propagation patterns
discussed below.

More recently, versions of Demeter have been developed for Stk and Perl5;
versions for Java and the Booch-Rumbaugh Unified Modeling Language UML are in
the works or contemplated.

Each of these implementations is a preprocessor that extracts the Demeter
definitions and generates appropriate segments of code in the target language;
Demeter does not attempt to process code written in the target language, such
as the bodies of methods.

\section{The Law of Demeter}

In some object systems, instance variables and methods are assumed to be
private; in others, they are assumed to be public.  The Law of Demeter was an
attempt to formulate a rational notion of visibility that would bridge this
gap.  This was claimed to increase program coherence, so that knowledge of the
internals of a class would be restricted to that class and its neighbors, even
if the language would permit references in other contexts.

There were various versions of the Law: strong vs.\ weak, run-time vs.\
compile-time.  The Law of Demeter appears to be quite independent of the
Demeter system described above.  

\section{Propagation Patterns}

Demeter's underlying data model is that of a directed graph whose nodes are
either atomic data or are labelled by their classes (the so-called {\em
constructor classes}), and where the edges are labelled by instance variables.
This graph is called the {\em object graph}.  A class dictionary determines
which such graphs are legal in any application.

One often wishes to traverse some part of an object graph, perhaps performing
some operation on the nodes as they are traversed.  Lieberherr observed that
often one could specify such traversals by simply specifying the classes of
the source and target nodes.  Such a traversal specification is called a {\em
propagation directive}.  The traversal specification determines a spanning
tree of a subset of the object graph.  This tree is then traversed
depth-first, and actions can be performed at each node.  The action to be
performed at each node is called its {\em wrapper}, and is determined by the
class of the node.  Because the program executes a depth-first tree traversal,
we can distinguish the prefix and postfix visits to a node; hence we have
prefix and postfix wrappers.  Actions can also be associated with edges.  The
entire specification, including the propagation directive and the wrappers, is
called a {\em propagation pattern}.  (I find the terminology of propagation
patterns and wrappers unmotivated at best; I don't know if others find them
equally discomforting).

Thus, to print out all the variables in an expression, we might write
something like:
\begin{verbatim}
*operation* print_vars ()
 *traverse* *from* Exp *to* Ident_Exp
 *wrapper* Ident_Exp
  *prefix* (@ print (self.ident) @)
\end{verbatim}
Here we have written the wrapper in pseudo-code, since Demeter does not make a
commitment about the target language.

To print out all the free variables, we might write
\begin{verbatim}
*operation* print_free_vars (bound_vars)
 *traverse* *from* Exp *to* Ident_Exp
 *wrapper* Ident_Exp
  (@ if !(member (self.ident, bound_vars) 
     then print (self.ident); @)
 *wrapper* Abstraction_Exp
  *prefix* 
  (@ bound_vars = cons (self.bound_var, bound_vars) @)
  *suffix*
  (@ bound_vars = cdr (bound_vars) @)
\end{verbatim}

Thus a propagation pattern is a program framework: it constructs a control
structure into which the programmer can insert bits of code.  The
computational model of this framework is that of depth-first traversal of a
spanning tree; the set of paths determines a tree that is traversed in the
usual order, with the appropriate actions taken on the prefix and postfix
visits to each node.

The propagation pattern tool takes a propagation pattern (traversal
specification) and a class graph (schema) and produces a program that will
perform the traversal on any object graph of the schema.

Full-blown propagation patterns have additional features not shown above:

\begin{itemize}
\item The traversal specifications may be more complex than shown above:  they
may include union, concatenation, etc., of paths, and the ability to specify
that certain edges are not to be included.

\item The operations may also include {\em transportation directives} that
allow additional bits of state to be carried along the traversal.
\end{itemize}

The propagation pattern framework has several shortcomings:

\begin{itemize}
\item Its formulation in terms of paths is somewhat misleading.  While
propagation directives are formulated in terms of a set of paths, it is more
intuitive to think of the paths as determining a spanning tree for a
contiguous set of nodes in the graph; then the tree is traversed, with the
prefix and postfix actions performed at the appropriate times in the
traversal.

\item If the node in the tree has multiple branches, it is not clear how to
specify actions to take at intermediate visits to the node, or how to order
visits to the subtrees.  Thus, in our examples above, it is not clear that the
variables will be printed out in the same order in which they occur in the
program.  (The implementation always allows escape into the full target
language, so no program is excluded, but the formalism itself does not support
specification of the order of the depth-first traversal.)


\item 
The semantics of a traversal specification on a graph that changes during
traversal has been the subject of considerable debate within the Demeter
group.  Similarly, the semantics of a propagation pattern acting on a graph
structure is arbitrary.

\item 
The framework is essentially imperative.  It is not so clear how to do more
complicated tasks that are easily expressed recursively, such as annotating a
parse tree, or constructing a copy of a parse tree with each node annotated by
its set of free variables.  Such operations would seem at best to require
hand-simulation of the data structures involved in the recursion; the
manipulation of \verb|bound_vars| is simple enough, but more complicated
examples seem to require uncomfortably much hand-coding.

\end{itemize}


\section{Adaptive Programming}

Because a propagation pattern usually specifies the classes of only the
starting and ending nodes, the propagation pattern can be used on object
graphs from a variety of class graphs (schemas).  For example, the propagation
patterns above would continue to work if we added a conditional expression to
our language by writing

\begin{verbatim}
Exp : Ident_Exp | Abstraction_Exp | Application_Exp | Conditional_Exp .
Conditional_Exp = "if" <test_exp> Exp
                  "then" <true_exp> Exp 
                  "else" <false_exp> Exp .
\end{verbatim}

This observation is why Karl calls programming with propagation patterns {\em
adaptive programming}:  the propagation patterns adapts to changes in the
schema without modification.

A propagation pattern may be thought of as polymorphic in the ``type'' of
the graphs that it may be called upon to traverse.  Karl sometimes uses the
term {\em customizer} to denote a class graph when it used as input to a
propagation pattern.  (I find this terminology confusing; I would have
expected the customizer to be the program that takes a propagation pattern and
a class graph and produces a customized traversal program.)  

A fundamental thesis of Karl's recent work is that this kind of polymorphism
is useful.  This is a pragmatic question, about which we have little data.
While the system has been used intensively (most importantly for its own
development, of course), there has been no attempt to monitor and quantify the
steps by which schemata and design have evolved.  Such quantitative measures
would help determine the importance of this particular kind of adaptiveness.

It is likely that many evolutionary schema changes involve adding small bits
of data or additional cases similar to existing ones, like adding a
conditional expression to our example.  With such changes, the program will
continue to do {\em something} with the new data, but it is not clear whether
what it does will be correct.  For instance, consider adding a {\tt
let}-expression to our example:
\begin{verbatim}
Let_Exp = "let" <bound_var> Ident "=" <rhs> Exp "in" <body> Exp
\end{verbatim}
Even though we continued to use the instance variable \verb|bound_var| for the
bound variable (so that the \verb|*bypassing* bound_var| directive is still
appropriate), we need to add detailed knowledge about the new construct for
the program to work correctly (i.e., the variable is bound in the body but not
the rhs).  It seems just as likely that most schema changes involve this kind
of new knowledge.  It would seem that capturing such knowledge is a key to
change management.  
	
I have considered adaptiveness separately from the propagation pattern
mechanism because I believe they really are separate.  It is entirely possible
that propagation patterns will be judged to be a worthwhile programming
construct even if it turns out that the claim of adaptiveness is unjustified;
and it is also possible that the study of adaptiveness or resilience in the
face of change will be worthwhile even if propagation patterns are not widely
adopted. 

\end{document}

