1205 S '03
Lectures
 
Lecture 1
Lecture 2
Lecture 3
Lecture 4
Lecture 5
Lecture 6
Lecture 7
Lecture 8
Lecture 9
Lecture 10
Lecture 11
Lecture 12

Refactoring Software


What
----------------------------------------

When you recognize that the design, tests, or code proper have
structural flaws, you need to refactor.

Refactoring means to change your products without changing their
extensional properties.

Refactoring code means to change the code while preserving its
behavior and thus without modifying the feature set (at all!).

Why
----------------------------------------

>>Picture of a spiral, iterative development<<

Projects evolve. They either evolve because they are designed as a
"continuous prototype" (see XP, for example) or because the
"perfect specifier" didn't understand the requirements perfectly
or the perfect customers just couldn't predict their needs
properly.

So, in either case, software is modified and the new/modified
pieces interact with the existing pieces. Tests make sure that the
interaction doesn't invalidate, existing features. But, tests do
not make sure that the structure that evolves is the one that
would have been written down, if the programmer had understood the
needs perfectly.

You know that the code has structural flaws when you are supposed
to make changes and you can't figure out which pieces and how many
pieces of the software you need to change and what else they might
affect.

What is the most common flaw
----------------------------------------

Duplication of code is the most common problem, both in test
suites as well as in code proper.

Duplication usually comes about because programmers copy and paste
code, and then modify. (It is also copy-and-paste when you re-type
lines and make small changes as you go.)

Duplication is bad because it duplicates mistakes. When you or
your customer find that there's a problem, and you have copied the
code in which you correct a problem, it is quite likely that the
flaw is also in the copy. Do you remember that it was copied? Do
you remember where all the copies are? Probably not. So you're
likely to leave some bugs in. Good tests can help there, but they
may not find every one of the copies.

Duplication is also bad because you can't just change one piece of
code to add new functionality. If you change a piece of code to
add some feature, it is likely that you also need to change some
of the other versions of this code.

Put positively, you want a "single point of control" for every
feature of your product (that is, every piece of functionality).

Examples
----------------------------------------

Duplication happens at all levels: lines of code, methods,
classes, class hierarchies, and so on. To eliminate duplication,
you want to be able to extract the commonalities -- build an
abstraction -- and to specialize the abstraction for the various
occurrences.

Lines:
----------------------------------------

in some code walks we saw things like this:

if (pebble[0] == holdings[0] ||
pebble[1] == holdings[0] ||
pebble[2] == holdings[0] ||
pebble[3] == holdings[0] ||
pebble[4] == holdings[0])
{ ... }
else
if ...

Clearly, what the programmer wants to express is this:

boolean isOneOf(IColor pebble[], IColor holdings) {
for(int i = 0; i < pebble.length; i++)
if (pebble[i] == holdings)
return true;
}

if (isOneOf(pebble,holdings[0]))
{ ... }
else
if

Introducing a method eliminates the repetition, makes it more
general, is easier to read, and isn't any longer. Why did they
leave it in?

Lines again:

The code was actually worse than that:

if (pebble[0] == holdings[0] ||
pebble[1] == holdings[0] ||
pebble[2] == holdings[0] ||
pebble[3] == holdings[0] ||
pebble[4] == holdings[0])
{ ... }
else
if (pebble[0] == holdings[1] ||
pebble[1] == holdings[1] ||
pebble[2] == holdings[1] ||
pebble[3] == holdings[1] ||
pebble[4] == holdings[1])
{ ... }

It turns out that the programmers wanted to express the idea that
all the colors in pebble also occur in holdings. In other words,

boolean allColorsIn(IColor pebbles[], IColor holdings)

is the proper abstraction. Now just imagine if we added one field
to holdings or one field to pebble or one to each. How many
changes do you need in the original code? How many do you need in
the modified code?

Methods:
----------------------------------------

We often see similarities in methods. Create common substrates and
use those.

Here is something we saw in the code walks, too:

ICard pick_diff_card(ICard givenCard) {
int i = ... random(50) ... ;
while (Cards[i] != givenCard)
i = ... random(50) ... ;
return Cards[i];
}

IEquations pick_diff_equation(IEquations givenEq) {
int i = ... random(10) ... ;
while (Equations[i] != givenEq)
i = ... random(10) ... ;
return Equations[i];
}

There are two similarities: the algorithmic structure and the type
structure. The algorithmic differences we can abstract, with a
small change to the signatures. Yes, sometimes it takes a bit more
to refactor code, but it's almost always worth it.

ICard pick_diff_card(int indexGivenCard) {
return Cards[pick_diff_index(indexGivenCard,50)];
}

IEquations pick_diff_equation(IEquations indexGivenEq) {
return Equation[pick_diff_index(indexGivenEq,10)];
}

int pick_diff_index(int i, int N) {
int j = ... random(N) ...;
while (i != j)
j = ... random(N) ...;
return j;
}

The type differences we can't abstract in Java because it lacks
parametric polymorphism for plain functions.

NOTE: This explains why language researchers haven't declared that
Fortran, Algol, Pascal, C, C++, Java or C# is the best language in
the world. They all understand that programming is about avoiding
duplication (while you also want to provide other services) and
that existing languages simply don't provide enough tools to avoid
code duplication. See last example. END

Classes:
----------------------------------------

Here is an abstract example, nothing from the project.

+--------------+ +--------------+
| Woman | | Man |
+--------------+ +--------------+
| movesArm() | | movesArm() |
| givesBirth() | | goesBald() |
+--------------+ +--------------+


Clearly, there are commonalities between the two classes, and we
can abstract over them using a common superclass, that is, by
creating a class hierarchy:

+--------------+
| Person |
+--------------+
| movesArm() |
| abstract |
| givesBirth()|
+--------------+
/|\
|
|
+---------------------------------------+
| |
| |
+--------------+ +--------------+
| Woman | | Man |
+--------------+ +--------------+
| givesBirth() | | goesBald() |
+--------------+ +--------------+

Of course, it is even more common that the methods that we can
lift refer to methods that differ in the concrete derived classes:


+--------------+ +--------------+
| Woman | | Man |
+--------------+ +--------------+
| movesArm() | | movesArm() |
| appearance(){| | appearance(){|
| .dresses(). | | .dresses(). |
| } | | } |
| dresses() | | dresses() |
| givesBirth() | | goesBald() |
+--------------+ +--------------+

In that case, we need the template and hook pattern to create the
common abstraction:


+--------------+
| Person |
+--------------+
| movesArm() |
| appearance(){|
| .dresses(). |
| } |
| abstract |
| dresses() |
| abstract |
| givesBirth()|
+--------------+
/|\
|
|
+---------------------------------------+
| |
| |
+--------------+ +--------------+
| Woman | | Man |
+--------------+ +--------------+
| dresses() | | dresses() |
| givesBirth() | | goesBald() |
+--------------+ +--------------+

That is, we start to exploit one of the strengths of classes:
inheritance.

More generally, when we discover similarities in related classes,
we create or modify the class hierarchy. Often we recognize this
relationship even before we implement the concrete (similar)
classes, but even God created Man before Woman according to the
Bible.

Class Hierarchies
----------------------------------------

In OO classes, even with the support of patterns, we often
recognize patterns. Unfortunately, most OO languages don't provide
the linguistic tools for generalizing over such patterns.

Let's go back to the contract lectures, especially the sequence
contracts. We suggested that you implement those with the state
pattern. For example, a port interface with open, close, read,
eofP usually requires a sequence contract such as

{open x {read | eofP}* x close}*

It may also not allow the re-opening of a closed port. That is not
important for the example. It is also unimportant that we discuss
ports. We could equally well discuss the sequence contract for a
turn in Bazaar.

flash up IPort.html and PortState.html

Such sequence contracts are equivalent to finite state machines,
crossed with the actual behavior. In our running example, we
recognize two states with *ClosedPort* being the initial state:


open()
+------------------------------------+ read(), eofP()
| | +---------+
| v | |
+------------+ +----------+ |
|*ClosedPort*| | OpenPort | |
+------------+ +----------+ |
^ | ^ |
| | | |
+------------------------------------+ +---------+
close()


and four transitions. Missing transitions mean that it is illegal
to call the method in that state.

Now let's write this down as text, with some extra parentheses:

(sequence ClosedPort
[ClosedPort
(open OpenPort ...)]
[OpenPort
(read OpenPort ...)
(eofP OpenPort ...)
(close ClosedPort ...)])

This says that ClosedPort is the initial state, that ClosedPort
and OpenPort are all the states, that using open in a ClosedPort
state goes to OpenPort, and so on. We can also see how this
specifies a class Port with four methods: open(), read(), eofP(),
and close(). If we specify what the methods do and what they
return in the ...'s, we have all we need to generate the code that
implements this sequence contract and the full functionality of
the class.

Here is the full code:

(define port<%>
(interface ()
open ; -> Void
read ; -> Number
eof? ; -> Boolean
close ; -> Void
;; sequence contract: open x {read | eof?}* x close
))

;; (Listof Numbers) -> (InstanceOf port<%>)
;; create a port from a list of numbers
(define (make-port% x)
(make-object
(sequence closed%
[open%
(read open% (if (eof?)
(error 'read "contract violation")
(begin0 (car x) (set! x (cdr x)))))
(eof? open% (null? x))
(close closed% (void))]
[closed%
(open open% (void))])))


It is Scheme code and it generates all of the necessary classes
and transition checks etc. It's all there is to it. And we can use
the use the sequence construct to specify *all* sequence
contracts, not just those for ports.

The point? In Scheme, a programmer can define "sequence" and thus
eliminate code duplications across many places in the class
hierarchy. This is just one example of how language research can
produce extremely powerful approaches to software construction.



last updated on Tue Jan 11 13:12:34 EST 2005generated with PLT Scheme