Description of Benchmarks

Note: Our educated guesses concerning the reasons for a particular system's performance appear within [square brackets].


This pseudo-benchmark is an aggregate statistic. Where other benchmarks display timings in seconds, its numerical score shows the geometric mean, over all benchmarks that a system was able to run, of the ratio of the system's time to the fastest system's time. A score of 1.0 is the lowest possible, and can be achieved only by a system that is fastest on every benchmark.
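Expressed as code, the scoring can be sketched as follows; the procedure name and the example timing lists are hypothetical, not part of the benchmark suite.

```scheme
;; Sketch of the scoring: the geometric mean of per-benchmark time
;; ratios.  The name and the timing lists here are hypothetical.
(define (geometric-mean-score system-times fastest-times)
  (let ((ratios (map / system-times fastest-times)))
    (expt (apply * ratios) (/ 1 (length ratios)))))

;; A system twice as slow as the fastest on every benchmark scores 2.0:
;; (geometric-mean-score '(2.0 4.0 6.0) '(1.0 2.0 3.0)) => 2.0
```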

Gabriel Benchmarks


Bob Boyer's theorem proving benchmark, a Gabriel benchmark, 20 iterations. This benchmark uses association lists in place of the original benchmark's property lists. The nboyer benchmark is an updated and altogether better version of this benchmark.


Browsing a data base, a Gabriel benchmark, 600 iterations. [May be a test of string->symbol and/or symbol->string.]


The tak benchmark in continuation-passing style, 1000 iterations. A test of closure creation.
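The transformation can be illustrated with a sketch of tak in continuation-passing style; the benchmark's exact code may differ, but the shape is the same: no call returns normally, and every step allocates a continuation closure.

```scheme
;; tak in continuation-passing style: each call receives an explicit
;; continuation closure instead of returning, so the benchmark
;; stresses closure creation.  A sketch, not necessarily the exact
;; benchmark code.
(define (cpstak x y z)
  (define (tak x y z k)
    (if (not (< y x))
        (k z)
        (tak (- x 1) y z
             (lambda (v1)
               (tak (- y 1) z x
                    (lambda (v2)
                      (tak (- z 1) x y
                           (lambda (v3)
                             (tak v1 v2 v3 k)))))))))
  (tak x y z (lambda (a) a)))

;; (cpstak 18 12 6) => 7
```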


The tak benchmark in continuation-capturing style, 100 iterations. A test of call-with-current-continuation. [Larceny's code for call-with-current-continuation is now written in C, and most of its time on this benchmark is spent crossing the Scheme/C barrier.]


Table-driven symbolic differentiation, a Gabriel benchmark, 2000000 iterations. Uses association lists instead of the original benchmark's property lists.


Symbolic differentiation, a Gabriel benchmark, 2000000 iterations.


Destructive list operations, a Gabriel benchmark, 500 iterations.


Divides 200 by 2 using lists as a unary notation for integers, a Gabriel benchmark, 1000000 iterations. This benchmark tests null?, cons, car, cdr, and little else.
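The core of the computation can be reconstructed as follows, using the create-n and iterative-div2 names from the Gabriel suite; this is a sketch, not necessarily the exact benchmark code.

```scheme
;; Unary representation: the integer n is a list of n elements.
(define (create-n n)
  (do ((n n (- n 1))
       (a '() (cons '() a)))
      ((= n 0) a)))

;; Halve a unary number by dropping two elements and keeping one
;; per step.  Exercises only null?, cons, car, and cdr.
(define (iterative-div2 l)
  (do ((l l (cddr l))
       (a '() (cons (car l) a)))
      ((null? l) a)))

;; (length (iterative-div2 (create-n 200))) => 100
```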


This benchmark is the same as idiv2, except that it uses deep recursion instead of iteration.


Combinatorial search of a state space, a Gabriel benchmark, 100 iterations. A test of arrays and classical compiler optimizations. This benchmark was originally written in Pascal by Forrest Baskett.


The tak benchmark using lists to represent integers, a Gabriel benchmark, 300 iterations.


Another combinatorial search similar to puzzle, a Gabriel benchmark, 10 iterations.

Numerical Benchmarks

The common characteristic of these numerical benchmarks is that they are easy to translate into C.


Fast Fourier Transform, 2000 iterations. A test of floating point arithmetic.


Doubly recursive computation of the 35th Fibonacci number (9227465), using (< n 2) to terminate the recursion; 5 iterations.
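The function being timed is essentially:

```scheme
;; The doubly recursive Fibonacci function, with (< n 2)
;; terminating the recursion.
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

;; (fib 35) => 9227465
```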


Calculation of the 35th Fibonacci number using inexact numbers, with only 2 iterations. A test of floating point arithmetic. Uses essentially the same code as the fib benchmark.


Generation of a Mandelbrot set, 100 iterations. A test of floating point arithmetic.


Determination of a nucleic acid's spatial structure, 5 iterations. A test of floating point arithmetic, and a real program.


Testing to see whether a point is contained within a 2-dimensional polygon, 100000 iterations (with 12 tests per iteration). A test of floating point arithmetic.


Sums the integers from 0 to 10000, 10000 iterations.


Sums the integers from 0 to 10000, 10000 iterations. A test of floating point arithmetic. Uses essentially the same code as the sum benchmark.


A triply recursive integer function related to the Takeuchi function, a Gabriel benchmark, 2000 iterations. A test of non-tail calls and arithmetic. Historical note: The Symbolics 3600 performed 1 iteration of this benchmark in 0.43 seconds using generic arithmetic; for 2000 iterations, that would be 860 seconds.
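The tak function has a standard definition, triply recursive and consisting almost entirely of non-tail calls:

```scheme
;; The tak function, a variant of Takeuchi's function.
(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))

;; (tak 18 12 6) => 7
```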

Kernighan and Van Wyk Benchmarks

Brian W Kernighan and Christopher J Van Wyk wrote a set of small benchmarks to compare the performance of several scripting languages, including C and Scheme. Marc Feeley and I have modified some of these benchmarks to correct bugs and to increase the number of iterations.


A version of the Ackermann function, with arguments 3 and 9.
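The usual two-argument form of the function is the following; the benchmark's exact code may differ.

```scheme
;; The two-argument Ackermann function.
(define (ack m n)
  (cond ((= m 0) (+ n 1))
        ((= n 0) (ack (- m 1) 1))
        (else (ack (- m 1) (ack m (- n 1))))))

;; (ack 3 9) => 4093
```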


This benchmark allocates, initializes, and copies some fairly large one-dimensional arrays.


This file-copying benchmark is a simple test of character i/o.


This tests string-append and substring, and very little else.


This benchmark reads and sums 100,000 floating point numbers. It is primarily a test of floating point input.


Increments a global variable 100000000 times. This benchmark was intended to test looping and arithmetic, but in Scheme it becomes a test of the garbage collector's write barrier.


This benchmark performs considerable character i/o.


Another character i/o benchmark.

Other Benchmarks


A type checker written by Jim Miller, 40 iterations.


Dynamic type inference, self-applied, 20 iterations. Written by Fritz Henglein. A real program.


Earley's parsing algorithm, parsing a 9-symbol input according to one of the simplest ambiguous grammars, 200 iterations. A real program, applied to toy data.


A version of fib that uses first class continuations; written by Kent Dybvig. Calculates the 18th Fibonacci number 500 times.


This program was provided by Andrew Wright, but we don't know much about it, and would appreciate more information. This higher order program creates closures almost as often as it performs non-tail procedure calls.


Another program that was provided by Andrew Wright, though it may have been written by Jim Miller. It enumerates the order-preserving maps between finite lattices.


Another program that was provided by Andrew Wright. Computes maximal matrices; similar to some puzzle programs.


Constructs a maze on a hexagonal grid, 4000 iterations. Written by Olin Shivers. [The maze benchmark uses bitwise-not and bitwise-and procedures that are not standard in R5RS. Implementations that provide these procedures as extensions probably have an advantage over implementations that use the portable versions we wrote for this benchmark.]


Constructs a maze on a rectangular grid using purely functional style, 1000 iterations. Written by Marc Feeley.


Computes the number of solutions to the 8-queens problem, 2000 times.
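A simple backtracking search of this kind can be sketched as follows; this is a hypothetical reconstruction, not necessarily the benchmark's own code.

```scheme
;; Count the solutions to the n-queens problem by backtracking.
;; placed holds the rows of queens already placed, nearest column
;; first, so dist is the column distance to each earlier queen.
(define (queens n)
  (define (ok? row dist placed)
    (cond ((null? placed) #t)
          ((or (= (car placed) row)
               (= (car placed) (+ row dist))
               (= (car placed) (- row dist)))
           #f)
          (else (ok? row (+ dist 1) (cdr placed)))))
  (let try ((placed '()) (k n))
    (if (= k 0)
        1
        (let loop ((row 1) (count 0))
          (if (> row n)
              count
              (loop (+ row 1)
                    (+ count
                       (if (ok? row 1 placed)
                           (try (cons row placed) (- k 1))
                           0))))))))

;; (queens 8) => 92
```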


Computes the number of paraffins that have 17 carbon atoms, 1000 times.


Partial evaluation of Scheme code, 200 iterations. Written by Marc Feeley.


Computes the primes less than 100, 100000 times, using a list-based Sieve of Eratosthenes. Written by Eric Mohr.
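A list-based sieve of this kind can be sketched as follows; this is a hypothetical reconstruction, not necessarily Eric Mohr's code.

```scheme
;; List-based Sieve of Eratosthenes: repeatedly take the head of the
;; candidate list as the next prime and filter out its multiples.
(define (primes-below n)
  (define (range i)
    (if (< i n) (cons i (range (+ i 1))) '()))
  (define (remove-multiples p l)
    (cond ((null? l) '())
          ((= 0 (modulo (car l) p))
           (remove-multiples p (cdr l)))
          (else (cons (car l) (remove-multiples p (cdr l))))))
  (let loop ((l (range 2)) (acc '()))
    (if (null? l)
        (reverse acc)
        (loop (remove-multiples (car l) (cdr l))
              (cons (car l) acc)))))

;; (primes-below 30) => (2 3 5 7 11 13 17 19 23 29)
```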


Ray tracing a simple scene, 5 iterations. A test of floating point arithmetic. This program is translated from the Common Lisp code in Example 9.8 of Paul Graham's book on ANSI Common Lisp.


A Scheme interpreter evaluating a merge sort, 20000 iterations. Written by Marc Feeley.


Simplex algorithm, 100000 iterations. A test of floating point arithmetic, and a real program.


Scheme to LaTeX processor, 20 iterations. A test of file i/o and probably much else. Part of a real program written by Dorai Sitaram.


Creates a list containing all 362880 permutations of a list of 9 integers, in Gray code order, using Zaks's algorithm, 10 iterations. Please note that this benchmark is not the same as the 10perm9 benchmark written by Will Clinger, because it does not retain the result of the previous iteration while computing the result of the current iteration.

As written by Will Clinger, the 10perm9 benchmark is a test of storage allocation and garbage collection. At the end of each iteration, the oldest half of the live storage becomes garbage. This benchmark is particularly difficult for generational garbage collectors, since it violates their assumption that young objects have a shorter future life expectancy than older objects. The perm9 benchmark distributed by Marc Feeley does not have that property.


An updated and exponentially scalable version of the boyer benchmark. The nboyer benchmark's data structures are considerably more appropriate than those used in the boyer benchmark. These timings are for 100 iterations of a problem of size 0, which is the same size as for boyer, and is far too small for modern machines. A test of lists, vectors, and garbage collection.


A version of nboyer that has been tuned (by Henry Baker) to reduce storage allocation, making it less of a garbage collection benchmark and more of a compiler benchmark. Only 4 lines of code were changed, and another 7 lines of code were added.


This program was written to mimic the phase structure that has been conjectured for a class of application programs in which garbage collection may represent a significant fraction of the execution time.

The benchmark warms up by allocating and then dropping a binary tree of height 18. It then allocates a permanent tree of height 16 and a permanent array of 500000 floating point numbers. Then it allocates about 350 megabytes of tree storage in seven phases, about 50 megabytes per phase. The first phase allocates 67648 trees of height 4, dropping each tree before allocating the next. The second phase allocates and drops 16512 trees of height 6, and so on to the last phase, which allocates 8 trees of height 16. Each phase is divided into two subphases: the first allocates trees top-down using side effects, while the second allocates trees bottom-up without using side effects.

This benchmark was written in Java by John Ellis and Pete Kovac, modified by Hans Boehm, and translated into Scheme, Standard ML, C++, and C by William Clinger.


Parses the nboyer benchmark 1000 times using a scanner and parser generated using Will Clinger's LexGen and ParseGen. [The MIT Scheme compiler cannot compile single procedures that are as large as this benchmark, so the timing shown is for the MIT Scheme interpreter.]


A synthetic garbage collection benchmark written by David Detlefs and translated to Scheme by Will Clinger and Lars Hansen.

Last updated 16 February 2007.