\documentstyle [11pt]{article} \title{Demeter Demystified} \author{Mitchell Wand} \parskip =0.75ex plus 0.25ex minus 0.10ex \begin{document} \maketitle \begin{abstract} My understanding of what Karl's stuff is about, what it's good for, and what its limitations are. \end{abstract} For the past eight years or so, Karl has been working on projects aimed at making object-oriented programming less onerous. In the process, he has developed a programming model and vocabulary that newcomers often find difficult. In this document, I will give my understanding of these projects and their significance. This is not a scholarly survey with proper citations. It is instead impressionistic and critical. I will try throughout to be as accurate as I can, but errors in fact or interpretation are of course mine, and not Karl's. Karl's work seems to divide into four parts: \begin{itemize} \item The Demeter system \item The Law of Demeter \item Propagation Patterns \item Adaptive Programming \end{itemize} \section{The Demeter System} The Demeter system is a program framework for implementing the hierarchical data model in an object-oriented system. It starts with the notion of a {\em class graph} or {\em class dictionary graph}. \begin{itemize} \item In database terminology, a class graph is roughly a database schema in the hierarchical model. \item In Lisp terminology, it is roughly a {\tt DEFSTRUCT} with type information. \item In SML terminology, it is roughly a {\tt datatype} definition. \end{itemize} For example, in SML one might declare such hierarchical data by \begin{verbatim} datatype Exp = Ident_Exp of Ident | Abstraction_Exp of (Ident * Exp) | Application_Exp of (Exp * Exp list) \end{verbatim} In SML, operations on such data would typically be specified by pattern expressions. In Pascal or C (or even in Scheme) one would generally implement such data by records containing a discriminant or tag field. Operations on this data would dispatch on the tag field. The Demeter system provides a tool for implementing such data in an object-oriented language. It does so by creating a class {\tt Exp} with subclasses \verb|Ident_Exp|, \verb|Abstraction_Exp|, and \verb|Application_Exp|. Instead of having a single procedure {\tt foo}, it associates a {\tt foo} method with each subclass. (In C++, this will be a virtual method associated with the superclass {\tt Exp}). The Demeter system provides a set of generic tools for use with this implementation strategy, including generated parsers and unparsers, a graphical front-end, etc. The notion of a class dictionary graph in Demeter is somewhat richer than this analogy would indicate. Class graphs in Demeter add to the SML {\tt datatype} definition the following capabilities: \begin{enumerate} \item SML datatype definitions define {\em types} (such as {\tt Exp}) and {\em constructors} (such as \verb|Abstraction_Exp|). In Demeter, the corresponding entities are called {\tt alternation classes} and {\tt construction classes}. Also, in Demeter, the data fields have names, corresponding to instance variables. Hence in Demeter one would write something like \begin{verbatim} Exp : Ident_Exp | Abstraction_Exp | Application_Exp . Ident_Exp = Ident . Abstraction_Exp = "fn" Ident "=>" Exp . Application_Exp = Exp "(" List(Exp) ")" . \end{verbatim} The quoted strings specify the concrete syntax for the parser and unparser. Demeter offers a variety of input syntaxes; this is the so-called ``concise'' syntax. \item Demeter also allows instance variables to be associated with alternation classes; such variables become instance variables of each of the alternatives. This gives a flavor of inheritance. \item Demeter also allows alternation classes to have other alternation classes as alternatives. This allows the designer considerable flexibility in grouping classes. This also means that many different class graphs actually specify the same set of objects. One can decide whether two class graphs determine the same objects by transforming the class graph until it uses no inheritance (like a {\tt datatype} declaration) and removing any dead classes. This can be done by a sequence of local transformations. \end{enumerate} The Demeter system was originally implemented with Common Lisp Flavors as its target language. This system did not get wide usage, perhaps because it offered little value above the existing {\tt defstruct}. The Demeter data-definition system was then ported to C++, where it got considerably more attention, perhaps because programming in C++ is so difficult that almost anything helps. We have little data on the actual usage of Demeter/C++, and still less data on how much of its usage outside Northeastern is due to the data-definition facilities and how much is due to the propagation patterns discussed below. More recently, versions of Demeter have been developed for Stk and Perl5; versions for Java and the Booch-Rumbaugh Unified Modeling Language UML are in the works or contemplated. Each of these implementations is a preprocessor that extracts the Demeter definitions and generates appropriate segments of code in the target language; Demeter does not attempt to process code written in the target language, such as the bodies of methods. \section{The Law of Demeter} In some object systems, instance variables and methods are assumed to be private; in others, they are assumed to be public. The Law of Demeter was an attempt to formulate a rational notion of visibility that would bridge this gap. This was claimed to increase program coherence, so that knowledge of the internals of a class would be restricted to that class and its neighbors, even if the language would permit references in other contexts. There were various versions of the Law: strong vs.\ weak, run-time vs.\ compile-time. The Law of Demeter appears to be quite independent of the Demeter system described above. \section{Propagation Patterns} Demeter's underlying data model is that of a directed graph whose nodes are either atomic data or are labelled by their classes (the so-called {\em constructor classes}), and where the edges are labelled by instance variables. This graph is called the {\em object graph}. A class dictionary determines which such graphs are legal in any application. One often wishes to traverse some part of an object graph, perhaps performing some operation on the nodes as they are traversed. Lieberherr observed that often one could specify such traversals by simply specifying the classes of the source and target nodes. Such a traversal specification is called a {\em propagation directive}. The traversal specification determines a spanning tree of a subset of the object graph. This tree is then traversed depth-first, and actions can be performed at each node. The action to be performed at each node is called its {\em wrapper}, and is determined by the class of the node. Because the program executes a depth-first tree traversal, we can distinguish the prefix and postfix visits to a node; hence we have prefix and postfix wrappers. Actions can also be associated with edges. The entire specification, including the propagation directive and the wrappers, is called a {\em propagation pattern}. (I find the terminology of propagation patterns and wrappers unmotivated at best; I don't know if others find them equally discomforting). Thus, to print out all the variables in an expression, we might write something like: \begin{verbatim} *operation* print_vars () *traverse* *from* Exp *to* Ident_Exp *wrapper* Ident_Exp *prefix* (@ print (self.ident) @) \end{verbatim} Here we have written the wrapper in pseudo-code, since Demeter does not make a commitment about the target language. To print out all the free variables, we might write \begin{verbatim} *operation* print_free_vars (bound_vars) *traverse* *from* Exp *to* Ident_Exp *wrapper* Ident_Exp (@ if !(member (self.ident, bound_vars) then print (self.ident); @) *wrapper* Abstraction_Exp *prefix* (@ bound_vars = cons (self.bound_var, bound_vars) @) *suffix* (@ bound_vars = cdr (bound_vars) @) \end{verbatim} Thus a propagation pattern is a program framework: it constructs a control structure into which the programmer can insert bits of code. The computational model of this framework is that of depth-first traversal of a spanning tree; the set of paths determines a tree that is traversed in the usual order, with the appropriate actions taken on the prefix and postfix visits to each node. The propagation pattern tool takes a propagation pattern (traversal specification) and a class graph (schema) and produces a program that will perform the traversal on any object graph of the schema. Full-blown propagation patterns have additional features not shown above: \begin{itemize} \item The traversal specifications may be more complex than shown above: they may include union, concatenation, etc., of paths, and the ability to specify that certain edges are not to be included. \item The operations may also include {\em transportation directives} that allow additional bits of state to be carried along the traversal. \end{itemize} The propagation pattern framework has several shortcomings: \begin{itemize} \item Its formulation in terms of paths is somewhat misleading. While propagation directives are formulated in terms of a set of paths, it is more intuitive to think of the paths as determining a spanning tree for a contiguous set of nodes in the graph; then the tree is traversed, with the prefix and postfix actions performed at the appropriate times in the traversal. \item If the node in the tree has multiple branches, it is not clear how to specify actions to take at intermediate visits to the node, or how to order visits to the subtrees. Thus, in our examples above, it is not clear that the variables will be printed out in the same order in which they occur in the program. (The implementation always allows escape into the full target language, so no program is excluded, but the formalism itself does not support specification of the order of the depth-first traversal.) \item The semantics of a traversal specification on a graph that changes during traversal has been the subject of considerable debate within the Demeter group. Similarly, the semantics of a propagation pattern acting on a graph structure is arbitrary. \item The framework is essentially imperative. It is not so clear how to do more complicated tasks that are easily expressed recursively, such as annotating a parse tree, or constructing a copy of a parse tree with each node annotated by its set of free variables. Such operations would seem at best to require hand-simulation of the data structures involved in the recursion; the manipulation of \verb|bound_vars| is simple enough, but more complicated examples seem to require uncomfortably much hand-coding. \end{itemize} \section{Adaptive Programming} Because a propagation pattern usually specifies the classes of only the starting and ending nodes, the propagation pattern can be used on object graphs from a variety of class graphs (schemas). For example, the propagation patterns above would continue to work if we added a conditional expression to our language by writing \begin{verbatim} Exp : Ident_Exp | Abstraction_Exp | Application_Exp | Conditional_Exp . Conditional_Exp = "if" Exp "then" Exp "else" Exp . \end{verbatim} This observation is why Karl calls programming with propagation patterns {\em adaptive programming}: the propagation patterns adapts to changes in the schema without modification. A propagation pattern may be thought of as polymorphic in the ``type'' of the graphs that it may be called upon to traverse. Karl sometimes uses the term {\em customizer} to denote a class graph when it used as input to a propagation pattern. (I find this terminology confusing; I would have expected the customizer to be the program that takes a propagation pattern and a class graph and produces a customized traversal program.) A fundamental thesis of Karl's recent work is that this kind of polymorphism is useful. This is a pragmatic question, about which we have little data. While the system has been used intensively (most importantly for its own development, of course), there has been no attempt to monitor and quantify the steps by which schemata and design have evolved. Such quantitative measures would help determine the importance of this particular kind of adaptiveness. It is likely that many evolutionary schema changes involve adding small bits of data or additional cases similar to existing ones, like adding a conditional expression to our example. With such changes, the program will continue to do {\em something} with the new data, but it is not clear whether what it does will be correct. For instance, consider adding a {\tt let}-expression to our example: \begin{verbatim} Let_Exp = "let" Ident "=" Exp "in" Exp \end{verbatim} Even though we continued to use the instance variable \verb|bound_var| for the bound variable (so that the \verb|*bypassing* bound_var| directive is still appropriate), we need to add detailed knowledge about the new construct for the program to work correctly (i.e., the variable is bound in the body but not the rhs). It seems just as likely that most schema changes involve this kind of new knowledge. It would seem that capturing such knowledge is a key to change management. I have considered adaptiveness separately from the propagation pattern mechanism because I believe they really are separate. It is entirely possible that propagation patterns will be judged to be a worthwhile programming construct even if it turns out that the claim of adaptiveness is unjustified; and it is also possible that the study of adaptiveness or resilience in the face of change will be worthwhile even if propagation patterns are not widely adopted. \end{document}