%\documentstyle[12pt]{article}
%\documentstyle{siam}
%\documentstyle{article}
\documentstyle[proc]{article}
%\pagestyle{headings}

\begin{document}
\bibliographystyle{alpha}
\raggedbottom

%\title{Formulations and Benefits\\ of the Law of Demeter}
\title{Tools for Preventive Software Maintenance}
%\title{Preventive Maintenance of\\ 
%Object-Oriented Software}
\author{Karl J. Lieberherr, Ian M. Holland\\
Northeastern University, College of Computer Science\\
Cullinane Hall, 360 Huntington Ave., Boston MA 02115\\
lieber or holland@corwin.CCS.northeastern.EDU}

\maketitle
\input time.tex 
%\centerline{Texed at \thetime\ \today}
%\centerline{Copyright \copyright 1988 Karl Lieberherr}
\input common.tex

\begin{abstract}
We discuss tools for preventive maintenance of software
which is written following the {\em \oo\ paradigm.} 
Initial ideas on preventive maintenance of object-oriented
systems were presented in \cite{LHLR:law-paper},
\cite{sakkinen:law-88}, \cite{benefits:law-89}.
These papers describe
a set of guidelines called the ``Law of Demeter''.\footnote{\demsign\/
is the name of the object-oriented CASE system we are working 
on \cite{lieber-riel:oop}, \cite{karl1:class}.}
We discuss the implications of the Law for software maintenance.
%and
%propose a metric, called the preventive maintenance metric, to compare
%designs or programs for the same application.

In the second part of the paper we discuss the advantages of grammar-based
object-oriented design and programming for software maintenance. 
We study two kinds of grammar-based maintenance tools:
(1) Class evolution tools, especially 
an automatic class parameterization tool and (2)
growth plan tools with applications to software maintenance planning and
regression testing.
\end{abstract}

\noindent 
{\bf Keywords:} Object-oriented programming style, 
preventive program maintenance, information
hiding and restriction, grammar-based design,
application development planning, class parameterization.
%\input outline
\section{Introduction}
Object-oriented programming languages are growing in popularity in
the commercial system building world.  In a recent survey\footnote{Mass High Tech trade paper April, 89} of Massachusetts
software companies, 6\% of the participating
companies 
are currently using \oo\ programming languages. 
The companies estimate this figure to increase to 16\% within a year.
It is fully expected that
this order of growth
will continue.  

This growth has come about due to the increased acceptance of C++, Smalltalk, 
Objective-C and Eiffel by the applications systems community, and the 
development of CLOS by the symbolic computing communities.  However, because
of the relative youth of these languages compared to Fortran, COBOL and Lisp,
theories of program design, software engineering and programming style have not
yet been fully developed.  Some of the ideas and theories developed for
procedure-oriented programming languages will carry forward into  the
\oo\ realm, especially those concerning data types, data abstraction and
encapsulation.  However, the move from a procedure/function  oriented language,
like C, to an \oo\ language, like C++, is not a small one. It is
more than a change in programming style (e.g., going from Pascal to C), it
is a change of programming paradigm.

For the past four years, we have been using \oop\ techniques in our
research and our undergraduate and graduate teaching.  We have been primarily
involved in the development of the Demeter system \cite{LHLR:law-paper} 
\cite{lieber-riel:oop} \cite{karl1:class}, first
using Lisp/Flavors and now using C++.  The Demeter system is an \oo\ CASE
tool.  This system is a testbed for our work on \oo\ design and
programming, \oo\ software engineering and the development of a high level
\oo\ representation language.  Our current work focuses on the issues of
preventive maintenance.  By preventive maintenance\footnote{In 
\cite{pressman:software-87} the term `preventive maintenance' 
is attributed to Miller \cite{miller:syst-maint}.}, we mean the extra
work done to help reduce the cost of future maintenance of the system.
We are looking at ways to reduce the impact of local changes to the system,
due to maintenance, on
the rest of the system.  This is an important issue since, as the `First
Law of System Engineering' \cite{bersoff:config-manag} states:

\begin{quote}
{No matter where you are in the system life cycle, the system will
change, and the desire to change it will persist throughout the life
cycle.}
\end{quote}

We are looking at this problem from two perspectives:
\begin{itemize}
\item 
Programming style issues: Guidelines for programmers to follow while writing
\oo\ code. The guidelines help reduce the cost of maintenance.  
One such guideline is
the Law of Demeter, described below.
\item
CASE tools:  Utilities provided by the CASE environment to help create
generic software and utilities to automate aspects of software maintenance.
We describe below some of the issues concerning the development and use
of these tools.
\end{itemize}

The heart of the Demeter system is a grammar-based notation,
called the \cm\ notation which we
use to describe our \oo\ application designs.  This notation allows us 
to express our \oo\ ideas in a programming language independent way.
It also forms the basis for the tools described below.

The Demeter approach to \oo\ software development has many advantages
for software maintenance:
\begin{itemize}

\item Separation of structure and processing

Maintenance is simplified since the structural aspects of classes can be 
maintained separately from code. 

\item Class evolution

The structure of most classes changes (evolves) over the lifetime of a system.
The class module notation captures the structure of classes and therefore
it provides a good framework to study structural class evolution with the
goal to automate important but tedious parts of the maintenance
process.

\item High-level, yet precise

Class definitions are written at a high level of abstraction.
They are programming language independent and they
are free of low-level details such as pointers.
Program maintenance starts at the \cm\ level where unimportant
details are hidden.

\item Subclassing

The \cm\ notation distinguishes between at least six fundamentally different
kinds of subclassing. This distinction helps during the maintenance
process.

\item Language = class analogy

The \cm\ notation helps to
arrive at class definitions through language analogy
since each class describes a formal language.
Sentences of the formal language can be automatically parsed
into objects, making it easy to describe self-explaining
test data if the appropriate
``syntactic sugar'' is used in the \cm\ definition.

\item Free Software

By ``free'' software we mean software which is either generated or
provided generically by the Demeter system.
A \cm\ defines free software 
which 
is several factors larger than
the user-written \cm\ description and
which is tedious to write by hand.
Instead of maintaining the software directly, it is easier to maintain
the smaller \cms.
The software can be regenerated easily without cost.

\item Uniform interface for application development

The \cm\ notation 
provides a uniform interface for a large number 
of object-oriented systems and for a fixed system it provides
a uniform interface for every application. After using
one application environment provided by a \cm\ description, it is easy to 
reuse this knowledge for the next application environment.
The programming language independent nature of the \cm\ notation
simplifies the maintenance process when several programming languages
are used simultaneously.

\item Software maintenance using meta-programming

The class module approach to software development facilitates metaprogramming
which is known for its benefits to software maintenance \cite{cameron:reuse-83}.
For metaprogramming, say of Pascal programs, we use a class module 
which defines the syntax of Pascal.
\end{itemize}
\section{Programming style}

In this section we look at the low-level aspects of software maintenance.
That is, the problem of taking an existing piece of code and changing it
so that it conforms to a new design requirement or corrects a bug.  
We will not be concerned with the processes that produced the design
requirement or the process of debugging the original program. 

Unfortunately, one small change in a program may accomplish
the desired task, but it may introduce many more new problems.  We are
interested in how programming style can reduce the impact of 
change and therefore reducing maintenance costs.  Before introducing the
problems involved with an \oo\ program, it is useful to examine
the procedure-oriented case.

A change to a program written in a procedural language,
such as Pascal or C, can fall into one of
the following categories:
\begin{itemize}
\item
Removing or changing the declaration of a variable:
If the variable is declared locally within a procedure, the programmer 
only has to consider the effect of the change on the procedure's code.
If the variable is global, then the change could potentially effect the
entire program.  The scope of the variable defines the space of potential
impact.  The use of global variables has long been considered harmful.
\item
Changing a function or procedure:
If a function or procedure is changed, the change may affect
every other procedure or function that uses it.  The change may
invalidate an assumption made about the role of the changed function.
The space of potential
impact is the set of functions or procedures which call the changed
function.  If procedure {\sf A} calls procedure {\sf B}, then we say 
that {\sf A} is
a client of {\sf B} (Figure 1).  This creates a dependency link between {\sf A }
and {\sf B}. {\sf A}
depends on {\sf B} and therefore when {\sf B} changes, {\sf A} may be affected.
If a large number of functions depend on {\sf A } then extra care needs to
be taken when A is changed during a maintenance phase.  Existing
reverse engineering tools analyze procedure/function calls to produce
a call graph which can be displayed graphically.
\begin{figure}
\vspace{2.5in}
\caption{Client}
\end{figure}
\item
Changing the implementation of a data structure:  
A change of this kind affects all the code that made assumptions about
the implementation details of the data structure.  The number of functions
or procedures making these assumptions needs to be kept at a minimum.  This
is a simple restatement of the well-known principles of data hiding and
encapsulation of implementation details.  Adopting a modular programming
style which follows these principles can greatly reduce the cost of
future maintenance.
\item
Removing or changing a data structure:
This can have a very large impact on the existing software. The space of
potential impact is the set of functions or procedures that use the interface of
the data structure.  However, the fundamental structure of a program will
change when an important data structure is removed or changed radically.
\end{itemize}

In a procedure-oriented environment, function names are unique and 
the programmer has only to be concerned with either local or
global variable declarations.  This makes the rewriting task relatively
easy since the program text provides all the information necessary. 
If multiple source files are involved, then the question of locating a
function definition becomes only slightly more difficult.

In an \oo\ environment, the situation changes drastically.
Functions\footnote{Functions attached to classes are sometimes called
`methods'.} are defined as part of a class.
%\cite{stefik-bobrow:oop}
%\cite{meyer:book-88} \cite{moon:flavors}.  
This means that the functions are intimately connected to the data
structures.  Classes defined in an \oo\ program are related
together in an inheritance hierarchy and classes are defined in terms of
other classes.  These two relationships greatly influence the scope
of variables and functions.
\begin{itemize}
\item
A variable used in a function {\sf F} attached to a class {\sf C} 
can be a local
variable declared in {\sf F}
, a global variable, a data member\footnote{The term
`data member' comes from C++; other terms used for the same concept
are `instance variable', `slot' and `attribute'.}
defined as part of {\sf C}, or
a data member which {\sf C} inherits from another class.  Changing a local or
global variable declaration has the same effect as the above.  However,
if a data member of a class {\sf C}
needs to be removed or changed, then the space
of potential impact is the set of functions attached to {\sf C} and the
functions attached classes which inherit from {\sf C}.  If the details of the
data members of classes are not hidden, then the code can become 
expensive to maintain.  To prevent this kind of problem, classes should 
provide public access functions to manipulate private data member details.

\item
Function names do not have to be unique in most \oop\ languages.  A class
{\sf A} and a class {\sf B} may have two different functions, both called 
{\sf F} attached
to them.  If {\sf A} and {\sf B} are unrelated then a call to {\sf F}
will be easily recognized as a call to either {\sf A}'s {\sf F} or
{\sf B}'s {\sf F} from the text of the program.  However, if
{\sf A} inherits from {\sf B}, {\sf B} inherits from {\sf A}, or
{\sf A} and {\sf B} inherits from a class {\sf C}, then it may not
be clear from the program text, which body of code will be executed
when {\sf F} is called.
Indeed,
this can only be resolved at run-time.  This makes the task of tracing
the impact of change more complicated.
%\item
%Function names do not have to be unique in most \oop\ languages.  A class
%{\sf C} and a class {\sf B} may have two different functions, both called 
%{\sf F} attached
%to them.  Since inherited functions can be overridden by functions attached
%to classes lower down in the inheritance hierarchy, it often is impossible
%to know which function will be executed by only looking at the text of 
%the program.  Indeed
%these decisions can only be resolved at run-time.  This means
%that the dependency links between the functions are not explicit.  

\item
Classes are used to implement data structures in an \oop\ system.
Changing the implementation details of a class, i.e., the data members of
the class and the functions attached to the class, has the same
consequences as outlined in the procedure-oriented case above.  Using
public access functions to hide the data members is one way to limit
the impact of a change, minimizing the public interface to a class
is another, and
the Law of Demeter, described below, is yet a third.
\item
Changing the class inheritance hierarchy or the class part-of hierarchy
can have a drastic affect on existing code because the code is so
intimately bound to the classes.  However, classes frequently need
updating during the prototype and maintenance phases of an \oo\ 
application system.  The tools described below regarding class
evolution will play an important role in reducing the cost of this
kind of maintenance.
\end{itemize}

\subsection{Looking at dependencies}

In the procedure-oriented case, if function {\sf A} calls function {\sf B}, we
called  {\sf A} a client of {\sf B}.  In the \oo\ case we extend the
client relationship to classes.  If a class {\sf C1} has a function  {\sf A} 
which
calls a function  {\sf B} attached to class {\sf C2}, we call 
class  {\sf C1} a client of
class  {\sf C2} and  {\sf C2} is a supplier to  {\sf C1} (Figure 2).
Similar client/supplier relationships
are used in \cite{meyer:book-88}.
\begin{figure}
\vspace{3.25in}
\caption{Client class}
\end{figure}

\begin{definition}
If a class  {\sf C1} is a client of a class  
{\sf C2}, then C1 {\em depends} on  {\sf C2}.
\end{definition}

If the functions of a class change in any way, there is a potential impact
on all the clients of the class.  


It is, therefore,
important to minimize the number of suppliers to a class.  It is 
also important to maximize code reuse and minimize code
duplication.  
%In many cases it will be impossible to achieve all of these
%goals. 

To call a function attached to a class, a message, consisting of the function
name and actual parameters, is sent to an object which is a member of that class.  For example, if object {\sf ic1} is an instance of class {\sf C1}, then the
following code fragments send {\sf ic1} the message, `execute function A':
in C++ syntax -
    \verb|ic1->A();|
in Lisp/Flavors syntax -
    \verb|(send ic1 :A)|.
The object receiving the message is passed as an implicit 
actual parameter to the function.  The {\sf ic1} object can be sent messages
referring to functions attached directly to class {\sf C1} or to functions
inherited by {\sf C1}.  Therefore, in order to reduce the set of 
functions called
within a given function, we need to reduce the set of objects which are
allowed to receive messages in the function.

In \cite{benefits:law-89} \cite{LHLR:law-paper} we introduced the programming
guideline called the
Law of Demeter.  The initial motivation behind the Law was to ensure proper
encapsulation and data hiding of class implementation details.  As discussed
above, these are the important issues in software maintenance.

Before stating the Law, a few definitions are needed.  

The {\it protocol} of
a class {\sf C} is the set of messages that can be sent to {\sf C} and its
instances.

A class {\sf C1} is an {\it acquaintance class} of a given function 
{\sf A} attached to class {\sf C2}, if {\sf C1}'s protocol is used in 
{\sf A}, and {\sf C1} is not the same as {\sf C2}, or an argument class of
{\sf A} or the class of a data member of {\sf C2}.

\centerline {\it Law of Demeter}
\nopagebreak
{\bf Minimize the number of acquaintance classes over all
functions}.\footnote{We say that a program is in {\em good style}
if the number of acquaintance classes which it uses is close to
the minimum needed. For a more detailed discussion of good style
and its implications, see \cite{LHLR:law-paper}, where the Law
is formulated for individual languages.}

This Law is not intended to stand alone, but has to be considered
in the context of other design and programming principles.
The following illustrates the effect the Law has on the 
dependencies between classes.

The application to be implemented is a simple interpreter which can
evaluate prefix expressions, such as {\sf 
(+ 6 (+ 8 (* 3 2)))
}.
The classes are defined in Demeter notation which is explained
below:
\begin{verbatim}
Example = <exps> Prefix_list.
Prefix_list ~ {Prefix}.
Prefix : Number | Compound.
Compound = "(" <op> Oper 
               <args> Prefix_list ")".
Oper : MulSym | AddSym.
MulSym = "*".
AddSym = "+".
\end{verbatim}
This is called a class dictionary, and it defines the classes 
{\sf Example, Prefix\_list, Prefix, Compound, Oper, MulSym} and {\sf AddSym}.


Each class is defined by exactly one production.
The modified BNF of Demeter uses three kinds of productions, each of which 
has its own symbol to connect the left-hand-side of the production with 
the right-hand-side. 
{\em Construction} productions, such as the one 
for {\sf Compound},
use {\verb"="}, and their right-hand-sides are basically sequences of nonterminals
or terminal symbols.
Angle-bracketed terms, such as
\verb|<|{\sf args}\verb|>| provide official names for
the objects they are next to.
These label-names become
essential if an object has two constituents of the same type.
Construction productions use square brackets to indicate
that an item is optional.
{\em Alternation} productions connect the left-hand-side to
the right-hand-side via the symbol {\verb":"}. {\sf Prefix} 
is such a production.
Basically, the right-hand-side of an alternation is just a group of 
alternatives -- in the case of {\sf Prefix}, two alternatives.
The third type of Demeter production is the {\em repetition}
production. It uses
the symbol \verb|"~"|, and amounts to the operation * or + for regular expressions.
Specifically, the production for {\sf Prefix\_list} 
indicates that a {\sf Prefix\_list} object consists of zero
or more {\sf Prefix} objects.

In these productions, quoted strings such as {\verb|"("|} are
concrete syntax. Demeter \cds\ which
define automatically readable objects, must be LL(1), so concrete syntax can help
disambiguate our grammars, as well as making it easier to read the files 
containing the ASCII representation of objects (namely, the strings of 
the language specified by the grammar). 
We do not give a production for {\sf Number}, 
since this is a basic class.

The {\sf Example} class has one data member called {\sf exps} which is 
an instance of class {\sf Prefix\_list}.  The instances of 
the {\sf Prefix\_list} class are lists of zero or more {\sf Prefix} objects.
To implement this class in C++, we follow the list examples of
\cite{stroustrup:c++} and use an iterator class called {\sf Prefix\_list\_it}.  
The  {\sf Prefix} class is an abstract class.  
%There will be no instances of this class.  Instead, 
It is used as a
base class for the classes {\sf Number} and {\sf Compound}, i.e.,
the two latter classes inherit from {\sf Prefix}.
The {\sf Compound} class has two data members, the first is for the operator
(either '+' or '*') 
called {\sf op}, and the second is for a list of arguments to the
operator called {\sf args}.

There are several other features used in Demeter \cds\, including inheritance
and parameterization. These add functionality,
but at the expense of complicating somewhat the above rather basic account of
\cds. For further information see \cite{lieber-riel:oop},
\cite{karl1:class}.

The Demeter notation has been carefully designed to be close to the EBNF
notation which is widely known \cite{wirth:ebnf}. If in the above example
we delete all the labels and replace \verb|~| and \verb|:| by \verb|=| then
we get a legal EBNF grammar.

\begin{figure}
\vspace{3.5in}
\caption{6 acquaintance classes}
\end{figure}

An experienced C++ programmer wrote the code in Appendix A to evaluate
objects based on the above definitions.
From this code, we can draw the dependency graph shown in Figure 3.  A
directed arc is drawn from a class {\sf C1} to a class {\sf C2} 
if C1 depends on C2.
%if a function attached to
%{\sf C1} uses the protocol of {\sf C2}.  
This code does not minimize the number of acquaintance classes.
As can be seen from the graph, 
the classes are heavily interdependent on one another.  
There are six acquaintance classes used:
{\sf Prefix\_list\_it} and {\sf Prefix} 
are acquaintance classes of {\sf Example::eval}.
{\sf Prefix\_list\_it} and Prefix are acquaintance classes of
{\sf MulSym::apply\_op} and {\sf AddSym::apply\_op}.
There are a large number of
acquaintance classes because the programmer has
made use of the knowledge of the detailed implementation of lists
using iterators in the {\sf Example}, {\sf AddSym} and {\sf MulSym} class.
%This is because the programmer has made use of the
%knowledge that, in the definition of {\sf Compound}, the arguments to the 
%operator are implemented using a
%{\sf Prefix\_list} object, and that Prefix\_list objects 
%are implemented in terms
%of lists of Prefix objects, in functions attached to {\sf AddSym}, {\sf MulSym}
%and elsewhere.  
A change in the implementation of lists would have
a great impact on this code.  

%For instance, if it is decided that only
%binary operations are to be allowed then the following definition of 
%{\sf Compound} might be used.
%\begin{verbatim}
%Compound = "(" <op> Oper 
%              <arg1> Prefix 
%              <arg2> Prefix ")".
%\end{verbatim}

The code can be rewritten to reduce the number of acquaintance classes
to two (instead of six).
Appendix B contains the
revised code and Figure 4 shows the associated dependency graph.  This
graph shows a drastic reduction in the number of class dependencies 
and clearly illustrates that the only class which needs to be 
dependent on the {\sf Prefix\_list\_it} is {\sf Prefix\_list}.
The code is 
more readable and the number of acquaintance classes has been reduced
to two: the two functions attached to {\sf Prefix\_list} have
both {\sf Prefix\_list\_it} as acquaintance class.  

\begin{figure}
\vspace{3.5in}
\caption{2 acquaintance classes}
\end{figure}

We are developing a compile-time tool to identify acquaintance classes.
Each acquaintance class has to be declared by the user.
If an acquaintance class is not declared,
then a decision must
be made whether or not to perform preventive maintenance and rewrite
the code so that the Law is followed.  As can be seen from the above small
example, the extra time taken at this stage may be well worth the cost.

The notation described above is used extensively in the Demeter system.  
Class dictionaries are used to generate C++ or Lisp/flavors code.  This
generated code includes the language specific class declarations and 
many utilities which can be used easily by the software engineer.

To represent the complete application design we use a {\it class module}, 
which is a combination of a class dictionary and information describing
the functions attached to classes.  The following is an example of a
class module for the prefix expression evaluator based on Appendix B.

To introduce a signature we use the keyword {\sf SIG}.
For specifying argument names and argument classes, we reuse the construction
class notation, e.g., {\sf apply( \verb|<|op\verb|>| Oper)} describes a method 
with one argument {\sf op} of type {\sf Oper}.
\begin{verbatim}
MODULE Prefix_Evaluator
Example = <exps> Prefix_list
  SIG eval() RETURNS void.
Prefix_list ~ {Prefix}
  SIG eval() RETURNS int
        top_eval() RETURNS void
        apply( <op> Oper ) RETURNS int.
Prefix : Number | Compound
  SIG VIRTUAL eval() RETURNS int.
Compound = 
  "(" <op> Oper <args> Prefix_list ")"
  SIG eval() RETURNS int.
Oper : MulSym | AddSym
  SIG VIRTUAL 
    apply_op(<num1> int <num2> int) 
                        RETURNS int.
MulSym = "*".
  SIG  
    apply_op(<num1> int <num2> int) 
                        RETURNS int.
AddSym = "+".
  SIG  
    apply_op(<num1> int <num2> int) 
                        RETURNS int.
Number = <val> int
  SIG eval() RETURNS int.
\end{verbatim}
\section{Grammar-based maintenance}

The grammar-based approach to software development is used
successfully in several software systems, e.g., in the Gandalf system
\cite{notkin:gandalf-85} and the Cornell Synthesizer system
\cite{reps-teitelbaum:84}.
The Demeter system integrates the grammar-based approach with
the class-based \oo\ paradigm (supporting encapsulation, late binding
of calls to code, inheritance and parameterized classes in a programming
language independent way).
In the Demeter approach to software development,
a first step is to
produce a list of \cms.
These describe the object classes
relevant to the program along with their signatures,
using an extension of EBNF \cite{wirth:ebnf}.
The relationship between complex objects
and their constituents parallels the linguistic relation between large phrases
and smaller phrases that compose them.
After producing the class modules which describe the structural aspects
of classes,
%When one designs an object-oriented
%program using the Demeter system, a first step is to produce a list of \cms.
one then writes the procedures, or methods, associated with
the various classes introduced in the \cms, using a programming
language which supports the \oo\ paradigm (e.g., C++). 
The methods describe the details
of processing
the objects defined by the \cms.

\subsection{Class Evolution}

The motivation to improve the class organization of 
an object-oriented application often arises after a large 
part of the application/knowledge base has already
been written.
It is typically a case of hindsight by which one decides that the
class organization can be improved.
This presents a dilemma to the application writer/knowledge engineer.
He can easily continue to write the application with the poorly
designed classes
or take on the tougher task of improving his classes and adapt
the already written software,
with the hope that it will pay off in the long run.  In the
presence of deadlines and other forms of pressure to finish his task, the
application writer might be tempted to take the easy way out, even though it
may cost him more time in the long run.  
It is understandable
that application writers/ knowledge engineers 
do not like to rewrite running and tested software "only" to
make it cleaner and more amenable to future growth. Indeed, the transformations
one has to perform to the methods are often very mechanical.

Therefore we discuss now common class transformations and their
implications on the updating of method and object descriptions:
There is an opportunity to develop a successful CASE tool which
will assist in performing the updating of method and 
object descriptions.

To talk about classes in a programming language independent way, we use
the \cm\ notation. 
We distinguish between four cases of \cm\ transformations:
\begin{enumerate}
\item Method and object upward compatibility. 
\label{mi-up}
The change does not require
any modification of methods or object descriptions.
By an object description we mean a textual description defined by
a \cm. Recall that a \cm\ defines both classes and languages.

Examples of these transformations are:
add optional parts,
add parts which are a possibly empty list,
add an alternative,
generalize (A construction class {\sf B}
is replaced by an alternation class {\sf B1}
which has {\sf B} as an alternative),
required to optional,
non-empty list to possibly empty list.

\item Object upward compatibility. 
\label{o-up}
The methods need to be changed but the
object descriptions stay the same.

All the transformations from (\ref{mi-up}) and:
instance variable name change,
class name change,
additional level (a new construction class with one part is introduced),
parameterize (see later section),
factor (the same repeating pattern is captured by a new production),
promotion of structure (an alternation production with common parts
is introduced to simplify construction productions, see
the example given below),
part to inherit (and inverse),
add method argument.
\item Method upward compatibility. The methods are invariant, but the 
object descriptions need to be modified.

All transformations from (\ref{mi-up}) and:
add parts (optional or required),
optional to required, possibly empty list to non-empty list,
concrete syntax changes which keep abstract syntax invariant.
\item No compatibility. The methods and object descriptions
need to be changed.

\end{enumerate}

To illustrate these transformations we expand one example from case 2: 
promotion of structure.

\bv
Basket = <content> Apple <weight> Number.
Apple = <weight> Number
  SIG cost() RETURNS Number.
\end{verbatim}

is replaced by 

\bv
Basket = <content> Fruit <weight> Number.
Fruit : Apple COMMON <weight> Number
  VIRTUAL SIG cost() RETURNS Number.
Apple = 
  SIG cost() RETURNS Number.
\end{verbatim}

{\sf Fruit} is an alternation class with common parts, namely all alternatives
of {\sf Fruit}, in this case only {\sf Apple}, will have the parts
listed after {\sf COMMON}. There is an inheritance link from
an alternation class with common parts to all its alternatives.

The motivation for this \cm\ transformation
is that later more alternatives will be 
added to {\sf Fruit}. Therefore it pays off to factor out the common aspects
of fruit.

Software changes which have to be done (in C++) and which can be automated:

\begin{itemize}
\item
Define new class called {\sf Fruit} (with protected constructor, since {\sf Fruit}
is abstract):
\bv
class Fruit {
  public:
    virtual void cost() { ... };
\end{verbatim}

\item
add public inheritance from class {\sf Fruit} to {\sf Apple}.
Update the constructor for {\sf Apple} class.
\end{itemize}







\subsection{Parameterizing existing software}

We focus now on one special case of \cm\ evolution with upward
compatible objects: type parameterization. 
This is an important aspect of \cm\ evolution.

Writing software to make it more reusable is one kind of preventive
software maintenance. When maintenance needs to be done, we are more
likely to be able to reuse well-tested code, saving on debugging
time during the maintenance phase.

We introduce a process to automatically build polymorphic
software from existing non-polymorphic software.  We extend the
facilities provided by \oop\ systems for software reusability by
generating a generalization of the original class hierarchy while
maintaining  software compatibility.  
The importance of data-encapsulation, modularity
and coding style is reinforced, as these properties point the way
to the solutions of the problems.

%To get the best return from the time and cost invested in software
%development it is important that future systems can benefit in
%real terms from the development of current systems.
%Reusable software development and reusability techniques are 
%important issues in ensuring those benefits 
%\cite{habermann:hicss-88}.  But, what does it
%mean for a program, or a piece of code to be reusable?  At the basic
%level, it is simply the duplication of sections of code from one program to
%another.  At a higher level, it is a program or piece of software which
%is generic, i.e., 
%it solves a problem in the abstract without any dependency
%on the specific instances of the problem.  
%
%For example, a program which
%traverses the elements of a tree (any tree, independent of the types 
%of the tree nodes).
%Polymorphism is the term used to describe such flexibility.
%A program is said to be polymorphic if one of its parameters
%is a type.
%Many programming languages in the past have not supported user defined
%polymorphism, e.g., Pascal and Modula-2. Even Ada has rather
%limited capabilities in this area.  While a programmer may solve a
%problem generically (as they often do when thinking about abstract data 
%structures), when it comes around to writing the program, specific type
%details are hard-coded into the programs.  Thus,  when it is time to
%`reuse' the software, time must be taken to extract and replace type
%dependent information.

Our work on a parameterization tool
follows from our work on \oop \/ style \cite{LHLR:law-paper}
in our 
investigation into easily modifiable, reusable software.  We propose
a new technique to automatically parameterize
programs with respect to types from a given example which is written
in good style (defined earlier).

The main goal of automatic parameterization is to free
the user 
from writing polymorphic programs, i.e., the programmer still writes programs 
for a specific instance of a general problem. The user
provides a program for specific types, decides which types
should be parameterized and then gets the parameterized
program for free. The advantage of the parameterized program
is that it is more reusable than the non-parameterized version.
The parameterization process specifies constraints on the actual parameters
which may be substituted for the actual type parameters.  

There is a large body of literature on inferring types, including
parameterized types, from programs.  A good survey of this literature
can be found in
\cite{cardelli-wegner:types-85}.  All these type inferencing algorithms
make use of unification \cite{milner:unification-78},
and the type systems have the principal type property.
Recently, these type inferencing methods have been applied to \oo\/ 
languages \cite{cardelli:mult-84}.
Our contribution is to provide a useful tool
 for type parameterization for \oo\/ languages which do not have 
the principal type property.  Instead of inferring types from 
programs we infer parameterized types from unparameterized types, updating
the programs accordingly.  The control of the parameterization
process is given to the user.

We describe
an example which illustrates the motivations, notations and
problems involved in the parameterization process.  

\subsubsection{Parameterizing classes}

The Demeter system allows the creation of parameterized classes. 
The way
these classes are handled by the Demeter system shows how 
the polymorphism goal can be achieved.  These parameterized classes are
described in much the same notation indicated above.  The difference is
that the class names on the left hand sides of the productions may have
parameters. A \dd\/ with parameterized classes acts as a template 
class hierarchy description for an infinite set of hierarchies. 
To use a member of this set we simply specify actual parameters for
the parameterized classes.

Suppose we have the following \cm\ which describes a generic 
tree data structure:
\bv
Tree(Nodetype) : 
  Leaf(Nodetype) | Deeptree(Nodetype).
Leaf(Nodetype) = <node> Nodetype.
Deeptree(Nodetype) = 
  <node> Nodetype 
  <leftTree> Tree(Nodetype) 
  <rightTree> Tree(Nodetype).

Example = <myTree> Tree(Number).
\end{verbatim}
The {\sf Tree} class has a parameter {\sf Nodetype } 
whose actual value will be a name of
another class.  This parameter specifies the kind of tree we want for
our current application, e.g., a tree of numbers, identifiers, records etc.
When an instantiation of this class is needed, as in the case of
{\sf Example } above, the Demeter system  expands the template to arrive at
a standard class-dictionary.  The following is generated for the example.
\bv
Example = <myTree> Number-Tree.

Number-Tree : 
  Number-Leaf | Number-DeepTree.
Number-Leaf = <node> Number.
Number-DeepTree = 
  <node> Number 
  <leftTree> Number-Tree
  <rightTree> Number-Tree.
\end{verbatim}

In addition, abstract classes {\sf Tree, Leaf and DeepTree } are
generated from which 
classes {\sf Number-Tree, Number-Leaf, Number-DeepTree}
inherit. This enables software to be written for the template 
which can then
be used for all the derived \dds.
The names of the derived classes are calculated using a formula which
ensures unique naming.
The semantics of parameterized classes are specified relative
to the semantics of normal classes. For a discussion of the
considerations and restrictions of the expansion process see 
\cite{lieber-riel:oop}.

Class parameterization (our goal) performs the inverse 
of this expansion
process. We start with the \dd\
\bv
Example = <myTree> Number-Tree.

Number-Tree : 
  Number-Leaf | Number-DeepTree.
Number-Leaf = <node> Number.
Number-DeepTree = 
  <node> Number 
  <leftTree> Number-Tree
  <rightTree> Number-Tree.
\end{verbatim}
and specify that we want to parameterize with respect to {\sf Number}
and that 
 we want to call the parameterized classes {\sf Tree, Leaf} and {\sf DeepTree}.
The result is the parameterized \cd\ shown earlier.

Parameterization of the structure of classes is not enough.
We also have to parameterize the methods. For the
method parameterization process to work properly, we require that
the methods are written in good style.
It is beyond the scope of this paper to explain the parameterization
of methods.


%{Grammar-Based Application Planning \\for Object-Oriented Systems}
\subsection{Growth plan}

This section is a summary of \cite{lieber:89-11} 
from a maintenance point of view.
We follow again the theme that some extra effort spent during
software development will pay off during maintenance time.
The extra effort we discuss here is related to application development
planning which leads to layered system development as well as to
systematic regression testing.

In our approach to software development one first develops a list 
of \cms.
If they are complex, then one might implement the corresponding
object-oriented
program in phases, as follows: The phases correspond to increasingly 
detailed prototypes. At earlier phases, one writes methods for fewer 
or else easier objects. But at each phase, the new prototype works
on at least some 
of the objects specified by the \cm. Moreover, later phases 
subsume earlier phases -- that is, later prototypes
work on all of the objects that earlier prototypes worked on.
A nice feature of our object-oriented approach is that in general we
need only {\em add} new methods at later phases, {\em without changing the
old ones}.

Given a \cm, we can
find the objects necessary for a first prototype simply by 
finding a count-minimal sub-grammar of the \cm, where by 
{\em count-minimal sub-grammar} 
we mean a sub-grammar with the fewest possible productions.
Since it is a sub-grammar, any object specified by this count-minimal 
sub-grammar is legal according to the full \cm. But using this
sub-grammar as a guide, we can implement our prototype by writing methods 
for the fewest objects. We augment our prototypes by finding larger subgrammars
whose counts are still as low as possible. When we have reached the full
grammar, our program works for all of our objects.

In practice, we usually want a slightly different criterion for 
the productions in
our various prototypes. We want to assign a cost to each production in the 
\cm. Higher costing productions correspond to objects 
whose methods we would prefer to write later rather than sooner. We might 
assign a production a higher cost for two reasons: 
we want to implement the easier methods first, or, 
we want to tackle the tough methods first. 

In any case, once we have assigned costs we can then seek a prototype 
corresponding to the cost-minimal sub-grammar. By ``cost-minimal'' we
mean cheapest. Note that in general
this cost-minimal sub-grammar may fail to be count-minimal. That is, it may 
have more than the fewest possible productions.

Given a \cm\ D, along with the cost of each production, a
{\em growth-plan} finds for phase 1 a cost-minimal sub-grammar of D. 
Then, for each 
phase $n$ at which 
the sub-grammar, SG, has not yet grown to be D itself, the growth-plan
finds a cheapest sub-grammar of D that is a proper super-grammar 
of SG.
In other words, 
a growth plan helps you plan the cheapest first 
prototype, and then the cheapest increments, eventually handling all of D.
Thus the growth-plan is an abbreviation for {\em application software
development plan}.

We have implemented two growth-plan generators and added them to
Demeter.
The first
one gives cost-minimal growth plans, but is computation-time intensive
since we have shown that computing cost-minimal growth plans is NP-hard.
The second one gives ``good'' approximations to cost-minimal growth plans,
but runs fast.
Given a \cm\, the growth-plan generators
produce a list of new productions for each phase of the prototype, 
building -- from a (close to) cost-minimal prototype -- all the way up to the full 
\cm.

\subsubsection{A Simple Example}

To illustrate the workings of the growth-plan, 
we present and explain a simple example.

\begin{example}
We give a \cm\ and the corresponding growth plan.

%\medskip
%\centerline{\bf Demeter Notation}
\bv
  Book = 
    Intro-or-preface 
    <main-text> Text
    [<appendix> Text]
      *SIG* insert_footnote(Footnote) 
        *RETURNS* Number 
    *COST* 2.
  Intro-or-preface : 
    Introduction | Preface *COST* 0.
  Text ~ String-or-number 
    {String-or-number} *COST* 2.
  Introduction = "Introduction to the book" 
    Text *COST* 2.
  Preface = 
    "Preface to the book" Text *COST* 1.
  Footnote = <contents> Text *COST* 2.
  String-or-number : 
    String | Number *COST* 0.
\end{verbatim}

%\medskip
%\centerline{\bf Intermediate Notation}
%
%\bv
%Book -> 
%  *COST* 2 
%  *REQUIRED* ( Number Footnote Text 
%    Intro-or-preface ) 
%  *OPTIONS* ( Text ). 
%Intro-or-preface -> 
%  *COST* 0 
%  *ALTERNATIVES* ( Introduction Preface ). 
%Text -> 
%  *COST* 2 
%  *REQUIRED* ( String-or-number ). 
%Introduction -> 
%  *COST* 2 
%  *REQUIRED* ( Text ). 
%Preface -> 
%  *COST* 1 
%  *REQUIRED* ( Text ). 
%Footnote -> 
%  *COST* 2 
%  *REQUIRED* ( Text ). 
%String-or-number -> 
%  *COST* 0  
%  *ALTERNATIVES* ( String Number ). 
%String -> *COST* 2.
%Number -> *COST* 1.
%\end{verbatim}

\medskip
\centerline{\bf Growth Plan}

Minimal Grammar:
{\sf Book  Number  Footnote  Text  Preface} 

Growth Phase 1:
{\sf String} 

Growth Phase 2:
{\sf Introduction} 
\end{example}

%The \cm\ given above,
%illustrates many of Demeter's features for specifying \cms:
%
%Each class is defined by exactly one production.
%The modified BNF of Demeter uses three kinds of productions, each of which 
%has its own symbol to connect the left-hand-side of the production with 
%the right-hand-side. 
%(We have various reasons for this tripartite approach. The main one is that we
%believe it facilitates the readability of \cms.)
%{\em Construction} productions, such as the one 
%for {\sf Book},
%use {\verb"="}, and their right-hand-sides are basically sequences of nonterminals
%or terminal symbols. Construction productions use square brackets to indicate
%that an item is optional.
%%, but they do not allow alternatives. For instance,
%%one cannot use
%%
%%\bv 
%%   Book = One-kind-of-book | Another-kind.
%%\end{verbatim}
%%
%%Moreover, each nonterminal is the left-hand-side of only one production,
%%so one cannot use
%%
%%\bv
%%   Book = One-kind-of-book.
%%   Book = Another-kind.
%%\end{verbatim}
%%
%%In short, we do not use construction productions for alternatives. Instead we 
%{\em Alternation} productions productions connect the left-hand-side to
%the right-hand-side via the symbol {\verb":"}. {\sf Intro-or-preface} 
%is such a production.
%Basically, the right-hand-side of an alternation is just a group of 
%alternatives -- in the case of {\sf Intro-or-preface}, two alternatives.
%
%The third type of Demeter production is the {\em repetition}
%production. It uses
%the symbol \verb|"~"|, and amounts to the operation * or + for regular expressions.
%Specifically, the production for {\sf Text} indicates that {\sf Text} consists of one
%or more {\sf String-or-number} -- if we omitted the first occurrence of 
%{\sf String-or-number} on the right-hand-side, then {\sf Text} would consist of 
%zero or more {\sf String-or-number}.
%
%In these productions, quoted strings such as {\sf Preface-to-the-book} are
%concrete syntax. Demeter \cms\ which
%define automatically readable objects, must be LL(1), so concrete syntax can help
%disambiguate our grammars, as well as making it easier to read the files 
%containing the ASCII representation of objects (namely, the strings of 
%the language specified by the grammar). Angle-bracketed terms, such as
%\verb|<| {\sf contents} \verb|>| provide official names for the objects they are next to. When
%a method refers to the constituent of a {\sf Footnote}, it uses the label-name, 
%{\sf contents} rather than the less-suggestive {\sf Text}. 
%These label-names become
%essential if an object has two constituents of the same type, for example,
%
%\bv
%Two-strings = 
%  <first> String <second> String .
%\end{verbatim}
%
%Note that in giving a production we can include a phrase beginning 
%with {\sf *SIG*}. This
%indicates signatures to associate with the class defined by the 
%production. For instance, we 
%specify in the above example that {\sf Book} uses a method called 
%{\sf insert\_footnote},
%whose parameter is an object of class {\sf Footnote}, and whose 
%return-value is an object of class {\sf Number}.
%
%We do not give productions for {\sf String} and {\sf Number}, 
%since these are basic classes
%supplied by Demeter. 
%
%There are several other features used in Demeter \cms\, including inheritance
%and parameterization. These add functionality,
%but at the expense of complicating somewhat the above rather basic account of
%\cms. For further information see \cite{lieber-riel:oop},
%\cite{karl1:class}.
%
%For convenience, our generator transforms \cms\ into an
%`intermediate notation', or `intermediate \cm', that is, a format
%more suitable for the present task. (We think this format is useful for other
%tasks as well.)
%Moving on to the intermediate notation in the example, 
%we see that with each production we
%associate a number (its cost, which is copied from
%the corresponding production in the \cm), 
%and three optional lists of productions, labelled
%"*REQUIRED*", "*ALTERNATIVES*", and "*OPTIONS*". 
%In most cases we automatically assign alternations a cost of zero. 
%This is because we do not write methods for alternations (unless we are using
%some of the advanced features of Demeter alluded to above).
We have extended the \cm\ notation to allow the specification
of a cost for each class.

To compute the growth plan, we look at the \cm. We begin
with the start-symbol ({\sf Book})
and find a cost-minimal sub-grammar. In the example,
the desired grammar must use all of the required parts of 
{\sf Book}, which are {\sf Intro-or-preface Text Footnote Number}.
Since one of the required parts is {\sf Intro-or-preface},
we must use this production. So we must use at least one of its 
alternatives, either {\sf Preface} or {\sf Introduction}. 
Since {\sf Preface} is
cheaper, we choose that. At later stages of the growth plan, we gradually 
add the productions that are not used in the minimal plan.

There are at least two applications of 
the growth plan idea to software maintenance.
\begin{itemize}
\item
Plans for program maintenance:

 Suppose we are                
  given two class modules, the first one reflecting the current
   classes, the second one the planned classes. We would like to
    compute a growth plan for the maintenance steps which decomposes
     the task into a number of subtasks, some of which might be automated.

\item
Testing of modifications.
The growth plan for a \cm\ can  be used to test software in
a systematic way.
A sequence of inputs corresponding to the growth phases of a \cm\
can be used to test the smallest number of additional
classes on successive test runs.
For example, if a program correctly processed the  test inputs for
the first 5 growth phases but failed to process the test input for
the 6th growth phase, we would probably want to examine the methods
attached to the classes introduced at growth phase 6.
Regression testing should be done in the order suggested by the growth plan
to localize a potential error.
\end{itemize}
\section{Related recent work}
Work on maintaining structure-oriented environments 
is useful for the study of class evolution.
A tool for updating objects when a grammar changes is described
in \cite{garlan:transform-88}. 

\cite{kaiser:workspaces} describes the Infuse tool which allows the user
to study the impact of change before the change is made.

\section{Conclusion}
As more software is written using the \oo\ paradigm,
it is important to study how this software is best maintained.
We identified a useful style rule, the Law of Demeter, which
simplifies or even obviates manual maintenance (see the discussion
of the parameterizer tool).

To simplify the maintenance (and development process), we propose
a two-level approach to \oo\ design and development: The definition of
the structure of objects is separated from the definition of their
functionality and the structure definition always implies a language
definition. We have discussed the benefits of this approach and we
have described two useful tools: a parameterizer tool and a growth plan tool
with applications to maintenance plan generation
and regression testing.

\medskip
\centerline{{\bf Acknowledgements}}

We would like to thank Carl Woolf and Bill Brown for the
implementation of the growth plan algorithms.
Ignacio Silva-Lepe and Paul Steckler contributed to the prefix
example.

We would like to thank Cindy Brown and Paul Steckler for their
feedback on an early version of this paper.

\section{Appendix A}
\begin{verbatim}
// code to evaluate Prefix expressions. 
// the Pretty print functions are omitted.
// class definitions are in Section 2.
// code violates the Law of Demeter:
// it uses too many acquaintance classes

void Example::eval()
{  Prefix_list_it next_exp( *args );
// acquaintances: Prefix_list_it, Prefix.
   Prefix*          each_exp;
   while( each_exp = next_exp() )
      {  each_exp->pp();
         cout << " = " 
              << each_exp->eval();}}

int Prefix::eval() {}

int Compound::eval()
{ return oper->apply_op(args);}

int Number:eval()
{ return this->get_val();}

int Oper::apply_op(Prefix_list *args) {}

int MulSym::apply_op( Prefix_list *args )
// acquaintances: Prefix_list_it, Prefix.
{  Prefix_list_it next_exp( *args );
   Prefix*        each_arg;
   int         result = 1;
   while( each_arg = next_exp() )
      result *= each_arg->eval();
   return result;}

int AddSym::apply_op( Prefix_list *args )
// acquaintances: Prefix_list_it, Prefix.
{  Prefix_list_it next_exp( *args );
   Prefix*        each_arg;
   int         result = 0;
   while( each_arg = next_exp() )
      result += each_arg->eval();
   return result;}
\end{verbatim}
\section{Appendix B}

\begin{verbatim}
// Revised code for evaluating Prefix 
//   expressions 
// code follows the Law of Demeter.
// Class definitions are in section 2.

void Example::eval()
{ exps->top_eval();}
         
void Prefix_list::top_eval()
// acquaintance: Prefix_list_it
{ Prefix* each_exp;
  Prefix_list_it next_exp(*this);
   while( each_exp = next_exp() )
       {  each_exp->pp();
          cout << " = " 
               << each_exp->eval();};}

int Prefix_list::apply(Oper* op)
// acquaintance: Prefix_list_it
{ Prefix* each;
  Prefix_list_it next_exp(*this);
  int result = next_exp()->eval();
  while (each = next_exp())
    result = 
       op->apply_op(result,each->eval());
  return result;}

int Prefix::eval() {}

int Compound::eval()
{ return args->apply(op);}

int Oper::apply_op(int num1,int num2) {}

int Number::eval()
{ return this->get_val();}

int Addsym::apply_op(int num1,int num2)
{ return num1 + num2;}

int Mulsym::apply_op(int num1,int num2)
{ return num1 * num2;}

\end{verbatim}

\bibliography{biblio}
%\tableofcontents

\end{document}

