Description of Benchmarks

Note: Our educated guesses concerning the reasons for a particular system's performance appear within [square brackets].

geometricMean

This pseudo-benchmark is an aggregate statistic: the geometric mean over all benchmarks. Whereas the other benchmarks report timings in seconds, a system's geometric-mean score is the geometric mean, taken over all benchmarks that the system was able to run, of the ratio of the system's time to the fastest system's time. A score of 1.0 is the lowest possible, and can be achieved only by a system that is fastest on every benchmark.
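
An illustrative sketch of how such a score can be computed (this is not the harness's actual scoring code, and the procedure and argument names are made up): take the ratio of the system's time to the fastest time on each benchmark, then average the logarithms of those ratios and exponentiate.

    ;; Illustrative only; not the harness's actual scoring code.
    ;; times         : this system's timings, in seconds
    ;; fastest-times : the fastest system's timings on the same benchmarks
    (define (geometric-mean-ratio times fastest-times)
      (let ((ratios (map / times fastest-times)))
        (exp (/ (apply + (map log ratios))
                (length ratios)))))

    ;; A system twice as slow on one benchmark and equally fast on another
    ;; scores sqrt(2), about 1.41.
    (geometric-mean-ratio '(2.0 5.0) '(1.0 5.0))  ; => ~1.414

Averaging the logarithms is better behaved numerically than multiplying many ratios together before taking the root.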


Gabriel Benchmarks

boyer

Bob Boyer's theorem proving benchmark, a Gabriel benchmark, 20 iterations. This benchmark uses association lists in place of the original benchmark's property lists. The nboyer benchmark is an updated and altogether better version of this benchmark.

browse

Browsing a data base, a Gabriel benchmark, 600 iterations. [May be a test of string->symbol and/or symbol->string.]

cpstak

The tak benchmark in continuation-passing style, 1000 iterations. A test of closure creation.
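
A sketch of the transformation involved, essentially the well-known cpstak code (details may differ from the benchmark source). Every call is a tail call, so the work that ordinary tak keeps on the control stack is instead carried by the explicit closures:

    (define (cpstak x y z)
      (define (tak x y z k)
        (if (not (< y x))
            (k z)
            (tak (- x 1) y z
                 (lambda (v1)
                   (tak (- y 1) z x
                        (lambda (v2)
                          (tak (- z 1) x y
                               (lambda (v3)
                                 (tak v1 v2 v3 k)))))))))
      (tak x y z (lambda (a) a)))   ; identity continuation

    (cpstak 18 12 6)  ; => 7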

ctak

The tak benchmark in continuation-capturing style, 100 iterations. A test of call-with-current-continuation. [Larceny's code for call-with-current-continuation is now written in C, and most of its time on this benchmark is spent crossing the Scheme/C barrier.]
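
For comparison with cpstak, a sketch of the continuation-capturing version, essentially the well-known ctak code (details may differ from the benchmark source). Instead of building closures, it captures a continuation around every recursive call and delivers results by invoking captured continuations:

    (define (ctak x y z)
      (call-with-current-continuation
       (lambda (k) (ctak-aux k x y z))))

    (define (ctak-aux k x y z)
      (if (not (< y x))
          (k z)   ; deliver the answer through the captured continuation
          (call-with-current-continuation
           (lambda (k)
             (ctak-aux
              k
              (call-with-current-continuation
               (lambda (k) (ctak-aux k (- x 1) y z)))
              (call-with-current-continuation
               (lambda (k) (ctak-aux k (- y 1) z x)))
              (call-with-current-continuation
               (lambda (k) (ctak-aux k (- z 1) x y))))))))

    (ctak 18 12 6)  ; => 7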

dderiv

Table-driven symbolic differentiation, a Gabriel benchmark, 2000000 iterations. Uses association lists instead of the original benchmark's property lists.

deriv

Symbolic differentiation, a Gabriel benchmark, 2000000 iterations.
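
A simplified sketch of this style of symbolic differentiation, handling only binary sums and products with respect to x (the benchmark's own rules and expression representation may differ):

    ;; Simplified illustration; not the benchmark's exact code.
    (define (d expr)
      (cond ((not (pair? expr))
             (if (eq? expr 'x) 1 0))          ; dx/dx = 1; constants and other variables differentiate to 0
            ((eq? (car expr) '+)               ; sum rule
             (list '+ (d (cadr expr)) (d (caddr expr))))
            ((eq? (car expr) '*)               ; product rule
             (let ((a (cadr expr)) (b (caddr expr)))
               (list '+ (list '* (d a) b)
                        (list '* a (d b)))))
            (else (error "cannot differentiate" expr))))

    (d '(+ (* 3 x) 5))
    ;; => (+ (+ (* 0 x) (* 3 1)) 0), an unsimplified derivative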

destruc

Destructive list operations, a Gabriel benchmark, 500 iterations.

diviter

Divides 200 by 2 using lists as a unary notation for integers, a Gabriel benchmark, 1000000 iterations. This benchmark tests null?, cons, car, cdr, and little else.
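
A sketch of the kind of code involved, essentially the classic div2 loop (details may differ from the benchmark source): the number 200 is represented as a list of 200 elements, and halving it walks the list two cells at a time, consing one cell onto the result for every two consumed.

    ;; Build a list of n empty lists: the unary representation of n.
    (define (create-n n)
      (do ((n n (- n 1))
           (a '() (cons '() a)))
          ((= n 0) a)))

    ;; Halve a unary number iteratively.
    (define (iterative-div2 l)
      (do ((l l (cddr l))
           (a '() (cons (car l) a)))
          ((null? l) a)))

    (length (iterative-div2 (create-n 200)))  ; => 100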

divrec

This benchmark is the same as diviter, except that it uses deep recursion instead of iteration.

puzzle

Combinatorial search of a state space, a Gabriel benchmark, 100 iterations. A test of arrays and classical compiler optimizations. This benchmark was originally written in Pascal by Forrest Baskett.

takl

The tak benchmark using lists to represent integers, a Gabriel benchmark, 300 iterations.

triangl

Another combinatorial search similar to puzzle, a Gabriel benchmark, 10 iterations.


Numerical Benchmarks

The common characteristic of these numerical benchmarks is that they are easy to translate into C.

fft

Fast Fourier Transform, 2000 iterations. A test of floating point arithmetic.

fib

Doubly recursive computation of the 35th Fibonacci number (9227465), using (< n 2) to terminate the recursion; 5 iterations.
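
This is essentially the textbook definition (a sketch; the benchmark's iteration wrapper is omitted):

    (define (fib n)
      (if (< n 2)
          n
          (+ (fib (- n 1))
             (fib (- n 2)))))

    (fib 35)  ; => 9227465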

fibfp

Calculation of the 35th Fibonacci number using inexact numbers, with only 2 iterations. A test of floating point arithmetic. Uses essentially the same code as the fib benchmark.

mbrot

Generation of a Mandelbrot set, 100 iterations. A test of floating point arithmetic.

nucleic

Determination of a nucleic acid's spatial structure, 5 iterations. A test of floating point arithmetic, and a real program.

pnpoly

Testing to see whether a point is contained within a 2-dimensional polygon, 100000 iterations (with 12 tests per iteration). A test of floating point arithmetic.
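
The test itself is the usual even-odd (ray-crossing) rule. A sketch of that rule in Scheme follows; the procedure and argument names are illustrative, and the benchmark's own code may differ:

    ;; xs and ys hold the polygon's vertex coordinates.  Cast a horizontal
    ;; ray from (x, y) and count the edges it crosses: an odd count means
    ;; the point is inside.
    (define (point-in-polygon? xs ys x y)
      (let ((n (vector-length xs)))
        (let loop ((i 0) (j (- n 1)) (inside #f))
          (if (= i n)
              inside
              (let ((xi (vector-ref xs i)) (yi (vector-ref ys i))
                    (xj (vector-ref xs j)) (yj (vector-ref ys j)))
                (loop (+ i 1) i
                      ;; Toggle when edge (j,i) straddles the ray and the
                      ;; crossing lies to the right of x.
                      (if (and (not (eq? (> yi y) (> yj y)))
                               (< x (+ xi (/ (* (- xj xi) (- y yi)) (- yj yi)))))
                          (not inside)
                          inside)))))))

    ;; The centre of the unit square is inside it:
    (point-in-polygon? (vector 0.0 1.0 1.0 0.0) (vector 0.0 0.0 1.0 1.0) 0.5 0.5)  ; => #t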

sum

Sums the integers from 0 to 10000, 10000 iterations.
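
The loop is a minimal arithmetic kernel; something like the following sketch (not the benchmark's exact code):

    (define (run n)
      (let loop ((i n) (sum 0))
        (if (< i 0)
            sum
            (loop (- i 1) (+ i sum)))))

    (run 10000)  ; => 50005000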

sumfp

Sums the integers from 0 to 10000, 10000 iterations. A test of floating point arithmetic. Uses essentially the same code as the sum benchmark.

tak

A triply recursive integer function related to the Takeuchi function, a Gabriel benchmark, 2000 iterations. A test of non-tail calls and arithmetic. Historical note: The Symbolics 3600 performed 1 iteration of this benchmark in 0.43 seconds using generic arithmetic; for 2000 iterations, that would be 860 seconds.
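
A sketch of the function itself; the Gabriel suite traditionally calls it with arguments 18, 12, and 6:

    (define (tak x y z)
      (if (not (< y x))
          z
          (tak (tak (- x 1) y z)
               (tak (- y 1) z x)
               (tak (- z 1) x y))))

    (tak 18 12 6)  ; => 7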


Kernighan and Van Wyk Benchmarks

Brian W Kernighan and Christopher J Van Wyk wrote a set of small benchmarks to compare the performance of several programming languages, including C and Scheme. Marc Feeley and I have modified some of these benchmarks to correct bugs and to increase the number of iterations.

ack

A version of the Ackermann function, computed with arguments 3 and 9.
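
A sketch of this two-argument (Ackermann-Péter) form of the function (details may differ from the benchmark source):

    (define (ack m n)
      (cond ((= m 0) (+ n 1))
            ((= n 0) (ack (- m 1) 1))
            (else (ack (- m 1) (ack m (- n 1))))))

    (ack 3 9)  ; => 4093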

array1

This benchmark allocates, initializes, and copies some fairly large one-dimensional arrays.

cat

This file-copying benchmark is a simple test of character i/o.

string

This tests string-append and substring, and very little else.

sum1

This benchmark reads and sums 100,000 floating point numbers. It is primarily a test of floating point input.

sumloop

Increments a global variable 100000000 times. This benchmark was intended to test looping and arithmetic, but in Scheme it becomes a test of the garbage collector's write barrier.

tail

This benchmark performs considerable character i/o.

wc

Another character i/o benchmark.


Other Benchmarks

conform

A type checker written by Jim Miller, 40 iterations.

dynamic

Dynamic type inference, self-applied, 20 iterations. Written by Fritz Henglein. A real program.

earley

Earley's parsing algorithm, parsing a 9-symbol input according to one of the simplest ambiguous grammars, 200 iterations. A real program, applied to toy data.

fibc

A version of fib that uses first class continuations; written by Kent Dybvig. Calculates the 18th Fibonacci number 500 times.

graphs

This program was provided by Andrew Wright, but we don't know much about it, and would appreciate more information. This higher-order program creates closures almost as often as it performs non-tail procedure calls.

lattice

Another program that was provided by Andrew Wright, though it may have been written by Jim Miller. It enumerates the order-preserving maps between finite lattices.

matrix

Another program that was provided by Andrew Wright. Computes maximal matrices; similar to some puzzle programs.

maze

Constructs a maze on a hexagonal grid, 4000 iterations. Written by Olin Shivers. [The maze benchmark uses bitwise-not and bitwise-and procedures that are not standard in R5RS. Implementations that provide these procedures as extensions probably have an advantage over implementations that use the portable versions we wrote for this benchmark.]

mazefun

Constructs a maze on a rectangular grid using purely functional style, 1000 iterations. Written by Marc Feeley.

nqueens

Computes the number of solutions to the 8-queens problem, 2000 times.
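
A sketch of a backtracking solution count, essentially the well-known nqueens benchmark code (details may differ from the source used here):

    (define (nqueens n)
      ;; ok?: may row be placed, given the rows already placed (nearest
      ;; first)?  Only the two diagonals need checking, because each row
      ;; is used at most once by construction.
      (define (ok? row dist placed)
        (or (null? placed)
            (and (not (= (car placed) (+ row dist)))
                 (not (= (car placed) (- row dist)))
                 (ok? row (+ dist 1) (cdr placed)))))
      ;; try: x = rows still untried in this column, y = rows passed over
      ;; here (still available for later columns), z = rows placed so far.
      ;; Returns a count of solutions.
      (define (try x y z)
        (if (null? x)
            (if (null? y) 1 0)
            (+ (if (ok? (car x) 1 z)
                   (try (append (cdr x) y) '() (cons (car x) z))
                   0)
               (try (cdr x) (cons (car x) y) z))))
      (define (one-to n)
        (let loop ((i n) (l '()))
          (if (= i 0) l (loop (- i 1) (cons i l)))))
      (try (one-to n) '() '()))

    (nqueens 8)  ; => 92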

paraffins

Computes the number of paraffins that have 17 carbon atoms, 1000 times.

peval

Partial evaluation of Scheme code, 200 iterations. Written by Marc Feeley.

primes

Computes the primes less than 100, 100000 times, using a list-based Sieve of Eratosthenes. Written by Eric Mohr.
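
A sketch of a list-based sieve of this kind (not necessarily the exact code used in the benchmark):

    ;; Drop every multiple of p from the list l.
    (define (remove-multiples p l)
      (cond ((null? l) '())
            ((= 0 (modulo (car l) p))
             (remove-multiples p (cdr l)))
            (else (cons (car l) (remove-multiples p (cdr l))))))

    ;; The first surviving number is prime; strike out its multiples and
    ;; sieve whatever remains.
    (define (sieve l)
      (if (null? l)
          '()
          (cons (car l)
                (sieve (remove-multiples (car l) (cdr l))))))

    (define (primes-below n)
      (let loop ((i (- n 1)) (l '()))
        (if (< i 2)
            (sieve l)
            (loop (- i 1) (cons i l)))))

    (primes-below 100)  ; => (2 3 5 7 ... 89 97), 25 primes in all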

ray

Ray tracing a simple scene, 5 iterations. A test of floating point arithmetic. This program is translated from the Common Lisp code in Example 9.8 of Paul Graham's book on ANSI Common Lisp.

scheme

A Scheme interpreter evaluating a merge sort, 20000 iterations. Written by Marc Feeley.

simplex

Simplex algorithm, 100000 iterations. A test of floating point arithmetic, and a real program.

slatex

Scheme to LaTeX processor, 20 iterations. A test of file i/o and probably much else. Part of a real program written by Dorai Sitaram.

perm9

Creates a list containing all 362880 permutations of a list of 9 integers, in Gray code order, using Zaks's algorithm, 10 iterations. Please note that this benchmark is not the same as the 10perm9 benchmark written by Will Clinger, because it does not retain the result of the previous iteration while computing the result of the current iteration.

As written by Will Clinger, the 10perm9 benchmark is a test of storage allocation and garbage collection. At the end of each iteration, the oldest half of the live storage becomes garbage. This benchmark is particularly difficult for generational garbage collectors, since it violates their assumption that young objects have a shorter future life expectancy than older objects. The perm9 benchmark distributed by Marc Feeley does not have that property.

nboyer

An updated and exponentially scalable version of the boyer benchmark. The nboyer benchmark's data structures are considerably more appropriate than those used in the boyer benchmark. These timings are for 100 iterations of a problem of size 0, which is the same size as for boyer, and far too small for modern machines. A test of lists, vectors, and garbage collection.

sboyer

A version of nboyer that has been tuned (by Henry Baker) to reduce storage allocation, making it less of a garbage collection benchmark and more of a compiler benchmark. Only 4 lines of code were changed, and another 7 lines of code were added.

gcbench

This program was written to mimic the phase structure that has been conjectured for a class of application programs for which garbage collection may represent a significant fraction of the execution time. This benchmark warms up by allocating and then dropping a binary tree of height 18. Then it allocates a permanent tree of height 16 and a permanent array of 500000 floating point numbers. Then it allocates about 350 megabytes of tree storage in seven phases, allocating about 50 megabytes per phase. The first phase allocates 67648 trees of height 4, dropping each tree before allocating the next. The second phase allocates and drops 16512 trees of height 6, and so on to the last phase, which allocates 8 trees of height 16. Each phase is divided into two subphases. The first subphase allocates trees top-down using side effects, while the second subphase allocates trees bottom-up without using side effects. This benchmark was written in Java by John Ellis and Pete Kovac, modified by Hans Boehm, and translated into Scheme, Standard ML, C++, and C by William Clinger.

parsing

Parses the nboyer benchmark 1000 times using a scanner and parser generated using Will Clinger's LexGen and ParseGen. [The MIT Scheme compiler cannot compile single procedures that are as large as this benchmark, so the timing shown is for the MIT Scheme interpreter.]

gcold

A synthetic garbage collection benchmark written by David Detlefs and translated to Scheme by Will Clinger and Lars Hansen.


Last updated 16 February 2007.