\section{Introduction}
Object-oriented programming languages are growing in popularity in
the commercial system building world.  In a recent survey\footnote{Mass High Tech trade paper April, 89} of Massachusetts
software companies, 6\% of the participating
companies 
are currently using \oo\ programming languages. 
The companies estimate this figure to increase to 16\% within a year.
It is fully expected that
this order of growth
will continue.  

This growth has come about due to the increased acceptance of C++, Smalltalk, 
Objective-C and Eiffel by the applications systems community, and the 
development of CLOS by the symbolic computing communities.  However, because
of the relative youth of these languages compared to Fortran, COBOL and Lisp,
theories of program design, software engineering and programming style have not
yet been fully developed.  Some of the ideas and theories developed for
procedure-oriented programming languages will carry forward into  the
\oo\ realm, especially those concerning data types, data abstraction and
encapsulation.  However, the move from a procedure/function  oriented language,
like C, to an \oo\ language, like C++, is not a small one. It is
more than a change in programming style (e.g., going from Pascal to C), it
is a change of programming paradigm.

For the past four years, we have been using \oop\ techniques in our
research and our undergraduate and graduate teaching.  We have been primarily
involved in the development of the Demeter system \cite{LHLR:law-paper} 
\cite{lieber-riel:oop} \cite{karl1:class}, first
using Lisp/Flavors and now using C++.  The Demeter system is an \oo\ CASE
tool.  This system is a testbed for our work on \oo\ design and
programming, \oo\ software engineering and the development of a high level
\oo\ representation language.  Our current work focuses on the issues of
preventive maintenance.  By preventive maintenance\footnote{In 
\cite{pressman:software-87} the term `preventive maintenance' 
is attributed to Miller \cite{miller:syst-maint}.}, we mean the extra
work done to help reduce the cost of future maintenance of the system.
We are looking at ways to reduce the impact of local changes to the system,
due to maintenance, on
the rest of the system.  This is an important issue since, as the `First
Law of System Engineering' \cite{bersoff:config-manag} states:

\begin{quote}
{No matter where you are in the system life cycle, the system will
change, and the desire to change it will persist throughout the life
cycle.}
\end{quote}

We are looking at this problem from two perspectives:
\begin{itemize}
\item 
Programming style issues: Guidelines for programmers to follow while writing
\oo\ code. The guidelines help reduce the cost of maintenance.  
One such guideline is
the Law of Demeter, described below.
\item
CASE tools:  Utilities provided by the CASE environment to help create
generic software and utilities to automate aspects of software maintenance.
We describe below some of the issues concerning the development and use
of these tools.
\end{itemize}

The heart of the Demeter system is a grammar-based notation,
called the \cm\ notation which we
use to describe our \oo\ application designs.  This notation allows us 
to express our \oo\ ideas in a programming language independent way.
It also forms the basis for the tools described below.

The Demeter approach to \oo\ software development has many advantages
for software maintenance:
\begin{itemize}

\item Separation of structure and processing

Maintenance is simplified since the structural aspects of classes can be 
maintained separately from code. 

\item Class evolution

The structure of most classes changes (evolves) over the lifetime of a system.
The class module notation captures the structure of classes and therefore
it provides a good framework to study structural class evolution with the
goal to automate important but tedious parts of the maintenance
process.

\item High-level, yet precise

Class definitions are written at a high level of abstraction.
They are programming language independent and they
are free of low-level details such as pointers.
Program maintenance starts at the \cm\ level where unimportant
details are hidden.

\item Subclassing

The \cm\ notation distinguishes between at least six fundamentally different
kinds of subclassing. This distinction helps during the maintenance
process.

\item Language = class analogy

The \cm\ notation helps to
arrive at class definitions through language analogy
since each class describes a formal language.
Sentences of the formal language can be automatically parsed
into objects, making it easy to describe self-explaining
test data if the appropriate
``syntactic sugar'' is used in the \cm\ definition.

\item Free Software

By ``free'' software we mean software which is either generated or
provided generically by the Demeter system.
A \cm\ defines free software 
which 
is several factors larger than
the user-written \cm\ description and
which is tedious to write by hand.
Instead of maintaining the software directly, it is easier to maintain
the smaller \cms.
The software can be regenerated easily without cost.

\item Uniform interface for application development

The \cm\ notation 
provides a uniform interface for a large number 
of object-oriented systems and for a fixed system it provides
a uniform interface for every application. After using
one application environment provided by a \cm\ description, it is easy to 
reuse this knowledge for the next application environment.
The programming language independent nature of the \cm\ notation
simplifies the maintenance process when several programming languages
are used simultaneously.

\item Software maintenance using meta-programming

The class module approach to software development facilitates metaprogramming
which is known for its benefits to software maintenance \cite{cameron:reuse-83}.
For metaprogramming, say of Pascal programs, we use a class module 
which defines the syntax of Pascal.
\end{itemize}
\section{Programming style}

In this section we look at the low-level aspects of software maintenance.
That is, the problem of taking an existing piece of code and changing it
so that it conforms to a new design requirement or corrects a bug.  
We will not be concerned with the processes that produced the design
requirement or the process of debugging the original program. 

Unfortunately, one small change in a program may accomplish
the desired task, but it may introduce many more new problems.  We are
interested in how programming style can reduce the impact of 
change and therefore reducing maintenance costs.  Before introducing the
problems involved with an \oo\ program, it is useful to examine
the procedure-oriented case.

A change to a program written in a procedural language,
such as Pascal or C, can fall into one of
the following categories:
\begin{itemize}
\item
Removing or changing the declaration of a variable:
If the variable is declared locally within a procedure, the programmer 
only has to consider the effect of the change on the procedure's code.
If the variable is global, then the change could potentially effect the
entire program.  The scope of the variable defines the space of potential
impact.  The use of global variables has long been considered harmful.
\item
Changing a function or procedure:
If a function or procedure is changed, the change may affect
every other procedure or function that uses it.  The change may
invalidate an assumption made about the role of the changed function.
The space of potential
impact is the set of functions or procedures which call the changed
function.  If procedure {\tt A} calls procedure {\tt B}, then we say 
that {\tt A} is
a client of {\tt B} (Figure 1).  This creates a dependency link between {\tt A }
and {\tt B}. {\tt A}
depends on {\tt B} and therefore when {\tt B} changes, {\tt A} may be affected.
If a large number of functions depend on {\tt A } then extra care needs to
be taken when A is changed during a maintenance phase.  Existing
reverse engineering tools analyze procedure/function calls to produce
a call graph which can be displayed graphically.
\begin{figure}
\vspace{2.5in}
\caption{Client}
\end{figure}
\item
Changing the implementation of a data structure:  
A change of this kind affects all the code that made assumptions about
the implementation details of the data structure.  The number of functions
or procedures making these assumptions needs to be kept at a minimum.  This
is a simple restatement of the well-known principles of data hiding and
encapsulation of implementation details.  Adopting a modular programming
style which follows these principles can greatly reduce the cost of
future maintenance.
\item
Removing or changing a data structure:
This can have a very large impact on the existing software. The space of
potential impact is the set of functions or procedures that use the interface of
the data structure.  However, the fundamental structure of a program will
change when an important data structure is removed or changed radically.
\end{itemize}

In a procedure-oriented environment, function names are unique and 
the programmer has only to be concerned with either local or
global variable declarations.  This makes the rewriting task relatively
easy since the program text provides all the information necessary. 
If multiple source files are involved, then the question of locating a
function definition becomes only slightly more difficult.

In an \oo\ environment, the situation changes drastically.
Functions\footnote{Functions attached to classes are sometimes called
`methods'.} are defined as part of a class.
%\cite{stefik-bobrow:oop}
%\cite{meyer:book-88} \cite{moon:flavors}.  
This means that the functions are intimately connected to the data
structures.  Classes defined in an \oo\ program are related
together in an inheritance hierarchy and classes are defined in terms of
other classes.  These two relationships greatly influence the scope
of variables and functions.
\begin{itemize}
\item
A variable used in a function {\tt F} attached to a class {\tt C} 
can be a local
variable declared in {\tt F}
, a global variable, a data member\footnote{The term
`data member' comes from C++; other terms used for the same concept
are `instance variable', `slot' and `attribute'.}
defined as part of {\tt C}, or
a data member which {\tt C} inherits from another class.  Changing a local or
global variable declaration has the same effect as the above.  However,
if a data member of a class {\tt C}
needs to be removed or changed, then the space
of potential impact is the set of functions attached to {\tt C} and the
functions attached classes which inherit from {\tt C}.  If the details of the
data members of classes are not hidden, then the code can become 
expensive to maintain.  To prevent this kind of problem, classes should 
provide public access functions to manipulate private data member details.

\item
Function names do not have to be unique in most \oop\ languages.  A class
{\tt A} and a class {\tt B} may have two different functions, both called 
{\tt F} attached
to them.  If {\tt A} and {\tt B} are unrelated then a call to {\tt F}
will be easily recognized as a call to either {\tt A}'s {\tt F} or
{\tt B}'s {\tt F} from the text of the program.  However, if
{\tt A} inherits from {\tt B}, {\tt B} inherits from {\tt A}, or
{\tt A} and {\tt B} inherits from a class {\tt C}, then it may not
be clear from the program text, which body of code will be executed
when {\tt F} is called.
Indeed,
this can only be resolved at run-time.  This makes the task of tracing
the impact of change more complicated.
%\item
%Function names do not have to be unique in most \oop\ languages.  A class
%{\tt C} and a class {\tt B} may have two different functions, both called 
%{\tt F} attached
%to them.  Since inherited functions can be overridden by functions attached
%to classes lower down in the inheritance hierarchy, it often is impossible
%to know which function will be executed by only looking at the text of 
%the program.  Indeed
%these decisions can only be resolved at run-time.  This means
%that the dependency links between the functions are not explicit.  

\item
Classes are used to implement data structures in an \oop\ system.
Changing the implementation details of a class, i.e., the data members of
the class and the functions attached to the class, has the same
consequences as outlined in the procedure-oriented case above.  Using
public access functions to hide the data members is one way to limit
the impact of a change, minimizing the public interface to a class
is another, and
the Law of Demeter, described below, is yet a third.
\item
Changing the class inheritance hierarchy or the class part-of hierarchy
can have a drastic affect on existing code because the code is so
intimately bound to the classes.  However, classes frequently need
updating during the prototype and maintenance phases of an \oo\ 
application system.  The tools described below regarding class
evolution will play an important role in reducing the cost of this
kind of maintenance.
\end{itemize}

\subsection{Looking at dependencies}

In the procedure-oriented case, if function {\tt A} calls function {\tt B}, we
called  {\tt A} a client of {\tt B}.  In the \oo\ case we extend the
client relationship to classes.  If a class {\tt C1} has a function  {\tt A} 
which
calls a function  {\tt B} attached to class {\tt C2}, we call 
class  {\tt C1} a client of
class  {\tt C2} and  {\tt C2} is a supplier to  {\tt C1} (Figure 2).
Similar client/supplier relationships
are used in \cite{meyer:book-88}.
\begin{figure}
\vspace{3.25in}
\caption{Client class}
\end{figure}

\begin{definition}
If a class  {\tt C1} is a client of a class  
{\tt C2}, then C1 {\em depends} on  {\tt C2}.
\end{definition}

If the functions of a class change in any way, there is a potential impact
on all the clients of the class.  


It is, therefore,
important to minimize the number of suppliers to a class.  It is 
also important to maximize code reuse and minimize code
duplication.  
%In many cases it will be impossible to achieve all of these
%goals. 

To call a function attached to a class, a message, consisting of the function
name and actual parameters, is sent to an object which is a member of that class.  For example, if object {\tt ic1} is an instance of class {\tt C1}, then the
following code fragments send {\tt ic1} the message, `execute function A':
in C++ syntax -
    \verb|ic1->A();|
in Lisp/Flavors syntax -
    \verb|(send ic1 :A)|.
The object receiving the message is passed as an implicit 
actual parameter to the function.  The {\tt ic1} object can be sent messages
referring to functions attached directly to class {\tt C1} or to functions
inherited by {\tt C1}.  Therefore, in order to reduce the set of 
functions called
within a given function, we need to reduce the set of objects which are
allowed to receive messages in the function.

In \cite{benefits:law-89} \cite{LHLR:law-paper} we introduced the programming
guideline called the
Law of Demeter.  The initial motivation behind the Law was to ensure proper
encapsulation and data hiding of class implementation details.  As discussed
above, these are the important issues in software maintenance.

Before stating the Law, a few definitions are needed.  

The {\it protocol} of
a class {\tt C} is the set of messages that can be sent to {\tt C} and its
instances.

A class {\tt C1} is an {\it acquaintance class} of a given function 
{\tt A} attached to class {\tt C2}, if {\tt C1}'s protocol is used in 
{\tt A}, and {\tt C1} is not the same as {\tt C2}, or an argument class of
{\tt A} or the class of a data member of {\tt C2}.

\centerline {\it Law of Demeter}
\nopagebreak
{\bf Minimize the number of acquaintance classes over all
functions}.\footnote{We say that a program is in {\em good style}
if the number of acquaintance classes which it uses is close to
the minimum needed. For a more detailed discussion of good style
and its implications, see \cite{LHLR:law-paper}, where the Law
is formulated for individual languages.}

This Law is not intended to stand alone, but has to be considered
in the context of other design and programming principles.
The following illustrates the effect the Law has on the 
dependencies between classes.

The application to be implemented is a simple interpreter which can
evaluate prefix expressions, such as {\tt 
(+ 6 (+ 8 (* 3 2)))
}.
The classes are defined in Demeter notation which is explained
below:
\begin{verbatim}
Example = <exps> Prefix_list.
Prefix_list ~ {Prefix}.
Prefix : Number | Compound.
Compound = "(" <op> Oper 
	       <args> Prefix_list ")".
Oper : MulSym | AddSym.
MulSym = "*".
AddSym = "+".
\end{verbatim}
This is called a class dictionary, and it defines the classes 
{\tt Example, Prefix\_list, Prefix, Compound, Oper, MulSym} and {\tt AddSym}.


Each class is defined by exactly one production.
The modified BNF of Demeter uses three kinds of productions, each of which 
has its own symbol to connect the left-hand-side of the production with 
the right-hand-side. 
{\em Construction} productions, such as the one 
for {\tt Compound},
use {\verb"="}, and their right-hand-sides are basically sequences of nonterminals
or terminal symbols.
Angle-bracketed terms, such as
\verb|<|{\tt args}\verb|>| provide official names for
the objects they are next to.
These label-names become
essential if an object has two constituents of the same type.
Construction productions use square brackets to indicate
that an item is optional.
{\em Alternation} productions connect the left-hand-side to
the right-hand-side via the symbol {\verb":"}. {\tt Prefix} 
is such a production.
Basically, the right-hand-side of an alternation is just a group of 
alternatives -- in the case of {\tt Prefix}, two alternatives.
The third type of Demeter production is the {\em repetition}
production. It uses
the symbol \verb|"~"|, and amounts to the operation * or + for regular expressions.
Specifically, the production for {\tt Prefix\_list} 
indicates that a {\tt Prefix\_list} object consists of zero
or more {\tt Prefix} objects.

In these productions, quoted strings such as {\verb|"("|} are
concrete syntax. Demeter \cds\ which
define automatically readable objects, must be LL(1), so concrete syntax can help
disambiguate our grammars, as well as making it easier to read the files 
containing the ASCII representation of objects (namely, the strings of 
the language specified by the grammar). 
We do not give a production for {\tt Number}, 
since this is a basic class.

The {\tt Example} class has one data member called {\tt exps} which is 
an instance of class {\tt Prefix\_list}.  The instances of 
the {\tt Prefix\_list} class are lists of zero or more {\tt Prefix} objects.
To implement this class in C++, we follow the list examples of
\cite{stroustrup:c++} and use an iterator class called {\tt Prefix\_list\_it}.  
The  {\tt Prefix} class is an abstract class.  
%There will be no instances of this class.  Instead, 
It is used as a
base class for the classes {\tt Number} and {\tt Compound}, i.e.,
the two latter classes inherit from {\tt Prefix}.
The {\tt Compound} class has two data members, the first is for the operator
(either '+' or '*') 
called {\tt op}, and the second is for a list of arguments to the
operator called {\tt args}.

There are several other features used in Demeter \cds\, including inheritance
and parameterization. These add functionality,
but at the expense of complicating somewhat the above rather basic account of
\cds. For further information see \cite{lieber-riel:oop},
\cite{karl1:class}.

The Demeter notation has been carefully designed to be close to the EBNF
notation which is widely known \cite{wirth:ebnf}. If in the above example
we delete all the labels and replace \verb|~| and \verb|:| by \verb|=| then
we get a legal EBNF grammar.

\begin{figure}
\vspace{3.5in}
\caption{6 acquaintance classes}
\end{figure}

An experienced C++ programmer wrote the code in Appendix A to evaluate
objects based on the above definitions.
From this code, we can draw the dependency graph shown in Figure 3.  A
directed arc is drawn from a class {\tt C1} to a class {\tt C2} 
if C1 depends on C2.
%if a function attached to
%{\tt C1} uses the protocol of {\tt C2}.  
This code does not minimize the number of acquaintance classes.
As can be seen from the graph, 
the classes are heavily interdependent on one another.  
There are six acquaintance classes used:
{\tt Prefix\_list\_it} and {\tt Prefix} 
are acquaintance classes of {\tt Example::eval}.
{\tt Prefix\_list\_it} and Prefix are acquaintance classes of
{\tt MulSym::apply\_op} and {\tt AddSym::apply\_op}.
There are a large number of
acquaintance classes because the programmer has
made use of the knowledge of the detailed implementation of lists
using iterators in the {\tt Example}, {\tt AddSym} and {\tt MulSym} class.
%This is because the programmer has made use of the
%knowledge that, in the definition of {\tt Compound}, the arguments to the 
%operator are implemented using a
%{\tt Prefix\_list} object, and that Prefix\_list objects 
%are implemented in terms
%of lists of Prefix objects, in functions attached to {\tt AddSym}, {\tt MulSym}
%and elsewhere.  
A change in the implementation of lists would have
a great impact on this code.  

%For instance, if it is decided that only
%binary operations are to be allowed then the following definition of 
%{\tt Compound} might be used.
%\begin{verbatim}
%Compound = "(" <op> Oper 
%	       <arg1> Prefix 
%	       <arg2> Prefix ")".
%\end{verbatim}

The code can be rewritten to reduce the number of acquaintance classes
to two (instead of six).
Appendix B contains the
revised code and Figure 4 shows the associated dependency graph.  This
graph shows a drastic reduction in the number of class dependencies 
and clearly illustrates that the only class which needs to be 
dependent on the {\tt Prefix\_list\_it} is {\tt Prefix\_list}.
The code is 
more readable and the number of acquaintance classes has been reduced
to two: the two functions attached to {\tt Prefix\_list} have
both {\tt Prefix\_list\_it} as acquaintance class.  

\begin{figure}
\vspace{3.5in}
\caption{2 acquaintance classes}
\end{figure}

We are developing a compile-time tool to identify acquaintance classes.
Each acquaintance class has to be declared by the user.
If an acquaintance class is not declared,
then a decision must
be made whether or not to perform preventive maintenance and rewrite
the code so that the Law is followed.  As can be seen from the above small
example, the extra time taken at this stage may be well worth the cost.

The notation described above is used extensively in the Demeter system.  
Class dictionaries are used to generate C++ or Lisp/flavors code.  This
generated code includes the language specific class declarations and 
many utilities which can be used easily by the software engineer.

To represent the complete application design we use a {\it class module}, 
which is a combination of a class dictionary and information describing
the functions attached to classes.  The following is an example of a
class module for the prefix expression evaluator based on Appendix B.

To introduce a signature we use the keyword {\tt SIG}.
For specifying argument names and argument classes, we reuse the construction
class notation, e.g., {\tt apply( \verb|<|op\verb|>| Oper)} describes a method 
with one argument {\tt op} of type {\tt Oper}.
\begin{verbatim}
MODULE Prefix_Evaluator
Example = <exps> Prefix_list
  SIG eval() RETURNS void.
Prefix_list ~ {Prefix}
  SIG eval() RETURNS int
	top_eval() RETURNS void
	apply( <op> Oper ) RETURNS int.
Prefix : Number | Compound
  SIG VIRTUAL eval() RETURNS int.
Compound = 
  "(" <op> Oper <args> Prefix_list ")"
  SIG eval() RETURNS int.
Oper : MulSym | AddSym
  SIG VIRTUAL 
    apply_op(<num1> int <num2> int) 
			RETURNS int.
MulSym = "*".
  SIG  
    apply_op(<num1> int <num2> int) 
			RETURNS int.
AddSym = "+".
  SIG  
    apply_op(<num1> int <num2> int) 
   			RETURNS int.
Number = <val> int
  SIG eval() RETURNS int.
\end{verbatim}
