Siskind's Challenge Benchmarks

In a posting to comp.lang.ml, comp.lang.functional, comp.lang.scheme, and comp.lang.lisp, Jeffrey Mark Siskind offered two benchmarks as a challenge to the functional programming community:

I have enclosed below two benchmarks that compute the same thing. One is written in a functional style and the other is written in an imperative style. The benchmarks are written in R4RS Scheme and implement a multidimensional clustering algorithm based on the EM algorithm.

Siskind reported that the functional version was 1.44 to 2.69 times as slow as the imperative version, depending on the implementation of Scheme. He continued:

So you may say, OK, all of the Scheme implementations that you tested are broken. Well, here is my challenge. Rewrite these benchmarks faithfully in your favourite language. Any language that supports both a functional and imperative style. And run them in your favourite implementation of that language. I predict that you will get similar results.

In each of the four implementations of Scheme that I tried, the imperative version of Siskind's EM benchmark (EM-IMP) was at least twice as fast as the functional version (EM-FUN). The main reason for this discrepancy is that the functional version was written in a very inefficient style that resulted in six times as many procedure calls to top-level and anonymous procedures as in the imperative version.

For example, both EM-IMP and EM-FUN use a procedure named DETERMINANT. Although DETERMINANT is coded using side effects in both EM-IMP and EM-FUN, this is reasonable because DETERMINANT is being used as if it were a built-in library procedure. What is not reasonable is that EM-IMP and EM-FUN do not use the same code for DETERMINANT. Where EM-IMP uses

    (do ((i 0 (+ i 1)))
        ((= i n))
        (do ((j 0 (+ j 1)))
            ((= j n))
            (matrix-set! b i j (matrix-ref a i j))))

the corresponding loops are implemented in EM-FUN as

    (for-each-n (lambda (i)
                  (for-each-n (lambda (j)
                                (matrix-set! b i j (matrix-ref a i j)))
                              n))
                n)

Both of these code fragments perform exactly the same assignments, so this has nothing to do with imperative versus functional programming. All it shows is that efficient imperative code is faster than inefficient imperative code.

I cleaned up EM-FUN to make it more faithful to the algorithms and coding styles that were used in EM-IMP. As part of the cleanup, I moved primitives like DETERMINANT into a separate file that is shared by both the imperative and functional versions. In Siskind's original code, the boundary between the primitives and the code that used those primitives was indicated by the comment

;;; EM

Some primitives, such as DETERMINANT, had been implemented using side effects in both EM-IMP and EM-FUN, but the primitive itself had no externally visible side effects. Whenever that was the case, I changed Siskind's code so both EM-IMP and EM-FUN would use the same implementation of the primitive. I never used side effects within a procedure unless Siskind's code for that procedure had used side effects. None of the non-primitive procedures in EM-FUN use side effects.

The cleaned-up version of the functional benchmark still performs about 40% more calls than the imperative version, but is only 5 to 10 per cent slower. I also benchmarked a version that uses hygienic macros to ensure that the most heavily used functions would be inlined (regardless of implementation), but this did not change the relative performance and had little impact on the absolute performance. The following table shows the relative speeds of the benchmarks in four implementations of Scheme, normalized to 100 for the original imperative version.

                    original         cleaned-up       with macros
                 EM-IMP  EM-FUN    EM-IMP  EM-FUN    EM-IMP  EM-FUN

Chez Scheme v5.0a 100      42        101     91        106     99
Larceny v0.32     100      43         95     90        101     94
Gambit v2.7       100      38         99     89
MacScheme v4.2    100      49        101     97        103     98

MacScheme was run on an Apple PowerBook 170, which uses a 33 MHz Motorola 68030 with a 68882 floating point coprocessor. The other implementations were run on a SPARC Ultra 1. CPU time was reproducible to within about 5 per cent. The source code and procedure calling profiles are available online.

These results support Siskind's claim that no Scheme compiler generates as good code for functional programs as for imperative programs. They also show that the difference is substantially smaller than Siskind's original benchmarks appeared to suggest.

William D Clinger

Last updated 23 April 1998.