Quality code

[ Working code, correct code, good code | An example ]

The beginning of wisdom for a [software engineer] is to recognize the difference between getting a program to work, and getting it right.
-- M A Jackson, 1975

There is a big difference between

Good code is abstract, straightforward, and efficient enough to do the job. Good code is easy to understand and to maintain. Good code evolves as requirements change. It doesn't have to be thrown away when a new feature is added.

Not all correct code is good code. It might be unnecessarily inefficient or overly complicated or insufficiently abstract. Poor code is hard to maintain. When requirements change, it is often easier to replace poor code with newly written code than to try to understand and to modify the poor code.

Not all code that works is correct. It might happen to work when it is linked together with one particular implementation of an abstract data type that it uses, but not when it is linked together with an alternative but equally correct implementation of that data type. It might happen to work when presented with ordinary data, but not with unusual data.

Not all correct code works, but it is far more likely to work than incorrect code. Compiler bugs sometimes prevent correct code from working as it should. More often, some other module may contain a bug that prevents a correct module from working. With good code, we can isolate these bugs. With poor code, it is harder to determine which module is at fault.

Not all good code is correct, but good code is much more likely to be correct than poor code. It's hard to convince anyone that poor code is correct even when it is. What's more, any attempt to add new features to poor code is likely to introduce bugs that are hard to find and to fix. With good code, you can find and fix any bugs. In the long run, good but incorrect code is more valuable than poor but correct code.


Suppose, for example, that you are given an abstract data type itree of binary trees whose leaves are labelled by integers, and this ADT is defined by the signature

    MakeLeafNode:               int -> itree
    MakeInteriorNode: itree x itree -> itree
    IsLeaf:                   itree -> boolean
    Label:                    itree -> int
    Left:                     itree -> itree
    Right:                    itree -> itree
and the algebraic specification
    IsLeaf (MakeLeafNode (k))          =  true
    IsLeaf (MakeInteriorNode (t1, t2)  =  false

    Label (MakeLeafNode (k))           =  k

    Left  (MakeInteriorNode (t1, t2))  =  t1
    Right (MakeInteriorNode (t1, t2))  =  t2

Suppose further that your task is to write a function named MaxLeaf that takes an itree as its argument and returns the largest label found on any of its leaves.

Good code for this function is simple and easy to understand. It is also efficient.

Poor code for this function may be correct, but it is harder to understand. It is also harder to adapt to changing requirements, because it is constructed from special-purpose routines that may interact through obscure side effects. For example, why would the poor code for MaxLeaf become incorrect if the assignment

    p = p - n;
were omitted?

Poor code is often inefficient as well. In this example, the good code runs in linear time but the poor code requires quadratic time. The poor code also contains a serious space leak. It is easier to write the good code from scratch than it would be to make the poor code reasonably efficient. Hence the poor code isn't worth much even though it is, in some sense, correct.

Incorrect code for this function might happen to work. Suppose, for example, that the abstract data type itree is written in C. The header file itree.h might expose enough of the representation for a programmer to write incorrect code that nonetheless happens to work with one particular implementation of the itree ADT. If that implementation is changed, however, then the incorrect code will almost certainly cease to work. Correct code, even correct code of poor quality, will continue to work when the implementation of an abstract data type is changed.