The naive code generators we wrote for COM 3355 are very simple, but they generate very inefficient code. In COM 3356 we will be concerned with efficiency, at the expense of a more complicated code generator.
To give us some idea of just how bad our naive code generators are, I wrote three teensy-weensy toy benchmarks in quirk13 and translated them into C++, Standard ML, and Scheme. I ran these benchmarks on merrimack, which is the DECstation 5000/120 in my office. I used my naive code generator for quirk13, g++ v2.4.5 for C++, Standard ML of New Jersey v0.93, and Chez Scheme v4.1b. I recorded the user time as reported by the time command.
These benchmarks do not give a fair comparison of these compilers. In particular, all compilers are generating safe code except for the quirk13 and C++ compilers.
loop -- a tight loop counts down from 10 millionquirk13 8.6 ******** Chez Scheme 57.3 ****************************************************** Chez Scheme (opt) 29.6 **************************** Standard ML of NJ 6.6 ****** C++ 4.5 **** C++ (optimized) 3.2 ***
fib -- a doubly recursive computation of fib(30)quirk13 16.0 ************************* Chez Scheme 40.9 **********************************/.../*************** Chez Scheme (opt) 8.0 ************* Standard ML of NJ 4.5 ******* C++ 3.6 ****** C++ (optimized) 1.9 ***
selectionsort -- sorting an array of 2000 integersquirk13 13.0 *********************************** Chez Scheme 25.0 **********************************/.../*************** Chez Scheme (opt) 10.7 ***************************** Standard ML of NJ 3.6 ********** C++ 3.0 ******** C++ (optimized) 1.1 ***
Our naive code generation algorithm allocates over 2 million empty ribs for the selectionsort benchmark. This is completely useless. The trivial optimization that fixes this will reduce the CPU time from 13.0 to 4.7 seconds.
With naive code generation, the fib benchmark allocates over 64 Mby of heap storage. All of this storage could have been allocated on a stack instead.
What is the relative cost of heap versus stack allocation? With our allocator, it takes about 18 instructions to allocate an object on the heap. It takes only 2 instructions to allocate and deallocate an object on the runtime stack. Since 5385087 objects are allocated, we are executing 86161392 unnecessary instructions. These unnecessary instructions probably account for about half of the CPU time.
The loop benchmark is so small that it can be optimized by hand, with these CPU times:
8.6 naive code generation 7.6 plus peephole optimization 6.1 plus local optimization 4.1 plus one loop optimization (keep i in a register) 3.3 plus another loop optimization (cache @i in a register; write-through) 1.5 plus another loop optimization (cache @i in a register; write-back)
The last two optimizations above depend on an alias analysis.
The following CPU times, obtained on a SPARC, are included here
to show how much variation there can be from machine to machine
and from compiler to compiler. Notice, for example, that Chez
Scheme is now faster than unoptimized C++ on the fib
benchmark. Notice also the spread between the two Common Lisp
compilers (AKCL and Sun Common Lisp). Keep in mind that AKCL
might well be faster than Sun Common Lisp on a different kind of
benchmark.
fib -- a doubly recursive computation of fib(30)AKCL 19.58 ***************************************************** Chez Scheme (opt) 1.31 **** Larceny 1.10 *** Standard ML of NJ 1.98 ***** Sun Common Lisp 1.90 ***** g++ 1.43 **** g++ (optimized) 1.19 ***
selectionsort -- sorting an array of 2000 integersAKCL 17.01 **********************************/.../*************** Chez Scheme (opt) 2.78 ************************* Larceny 2.31 ******************** Standard ML of NJ 0.98 ********* Sun Common Lisp 0.70 ****** g++ 0.71 ****** g++ (optimized) 0.34 ***