Links to Java code:
SeqInt ADT
SeqInt
SeqInt
SeqInt is an immutable abstract data type that
was specified as follows.
Signature:
public static SeqInt empty();
public static SeqInt singleton (int);
public static SeqInt concat (SeqInt, SeqInt);
public static boolean isEmpty (SeqInt);
public static int length (SeqInt);
public static int at (SeqInt, int);
Algebraic specification:
isEmpty (empty()) = true
isEmpty (singleton (x)) = false
isEmpty (concat (s1, s2)) = isEmpty (s1) && isEmpty (s2)
length (empty()) = 0
length (singleton (x)) = 1
length (concat (s1, s2)) = length (s1) + length (s2)
at (singleton (x), 0) = x
at (concat (s1, s2), i) = at (s1, i)
if i < length (s1)
at (concat (s1, s2), i) = at (s2, i - length (s1))
otherwise
Our
first implementation of SeqInt
was obtained by
following a fairly general recipe for translating algebraic
specifications of immutable ADTs into executable Java code.
The nice thing about this recipe is that, when it works,
it usually produces a simple implementation that is easy to
relate to the specification and therefore easy to understand.
Unfortunately, the recipe often produces inefficient code.
To test the performance of implementations of SeqInt,
I wrote two simple stress tests, or benchmarks, which I call
benchmarks
A
and
B.
Both of these benchmarks create a SeqInt with
2n elements and then compute the sum of its
elements.
The main difference between them is how they construct this
test case. Benchmark A constructs the test case as follows:
static SeqInt testCase (int n) {
if (n == 0)
return SeqInt.singleton (1);
else {
SeqInt t = testCase (n - 1);
return SeqInt.concat (t, t);
}
}
Benchmark B constructs the test case this way:
static SeqInt testCase (int n) {
int m = 1;
for (int i = 0; i < n; i = i + 1)
m = m + m;
SeqInt s = SeqInt.empty();
for (int i = 0; i < m; i = i + 1)
s = SeqInt.concat (s, SeqInt.singleton (i));
return s;
}
With n=10, we will be testing to see how long it takes to add
1024 integers. That shouldn't take very long, but with our
recipe implementation of SeqInt on an unloaded Sun Blade 100,
it can take several minutes. With n=12, so we are adding 4096
integers, it can take half a day. With n=14, it can blow up.
A B
10 ( 1024) .83 262.41
12 ( 4096) 6.77 48084.11
14 (16384) 101.17 StackOverflowError
16 (65536) 1618.97
This is pathetic.
One of the problems is that the at operation
computes the length of a SeqInt. Multiple
uses of the at operation will compute this
length multiple times. With our recipe implementation,
each computation of the length takes time proportional
to the height of the representation. Since the length
never changes, it would be better to compute the length
only once, when a SeqInt is created, than
to recompute it every time. This
improved implementation
of SeqInt performs much better, but still
gives a StackOverflowError with stress test B.
A B
10 ( 1024) .83 .35 262.41 .77
12 ( 4096) 6.77 .43 48084.11 15.76
14 (16384) 101.17 .55 StackOverflowError
16 (65536) 1618.97 .92 StackOverflowError
The StackOverflowError results from delegation in our
implementation of the at operation:
int atMethod (int i) {
if (i < length (s1))
return at (s1, i);
else
return at (s2, i - length (s1));
}
Although delegation is a common and important technique in
object-oriented programming, most of the world's implementations
of object-oriented languages impose a small finite limit on how
many delegations can occur before the result is computed. This
bug is known as improper tail recursion.
Improper tail recursion is the cause of the stack overflow with stress test B. As an application programmer, there isn't much you can do about this bug, except to avoid long chains of delegations.
With our implementation, the number of delegations is equal to
the height of the representation. With our implementation,
stress test A yields a perfectly balanced representation, but
stress test B yields a maximally unbalanced representation.
By adding some cleverness to our implementation, we can arrange
for the concat operation to create and return more
balanced representations. This
more balanced representation
performs reasonably well on both stress tests:
A B
10 ( 1024) .83 .35 .38 262.41 .77 .38
12 ( 4096) 6.77 .43 .36 48084.11 15.76 .51
14 (16384) 101.17 .55 .48 StackOverflowError .78
16 (65536) 1618.97 .92 .90 StackOverflowError 1.91
For stress test B and n=12, we've reduced the running time from
half a day to half a second. The balanced representation has
also solved our stack overflow problem.
Last updated 30 April 2002.