1. Strategy
2. Implementation
3. Mixed-representation arithmetic
4. Bignum implementation
Larceny has six representations for numbers:
fixnum exact fixed-precision integer, 29 bits signed bignum exact arbitrary-precision integer outside the fixnum range ratnum exact rational, represented as two exact integers flonum inexact rational, represented as an IEEE double-precision floating-point number rectnum exact complex number, represented as two exact rationals compnum inexact complex number, represented as two IEEE double-precision floating-point numbersFrom the outset it was decided that we should make certain kinds of generic arithmetic as fast as we could without having any type information. It was further decided that remaining kinds of generic arithmetic should be as fast as it was convenient to make them. The optimized cases were to be same-representation arithmetic on fixnums, flonums, and compnums (all operations that are not unary are binary; the compiler translates from e.g. (+ a b c) to (+ (+ a b) c)). The operations on these combinations of arguments that were to be optimized were primarily the common arithmetic operations. In addition, certain efficiency concessions were to be made when operating on one flonum and one compnum whose imaginary part is 0.0.
The strategy taken was to generate in-line code for + and -, by far the most common operations. The check that both operands are fixnums is performed, and if it succeeds, the operation is performed in-line. If the operation overflows or the tag-check fails, a millicode call is made to a subroutine for the operation.
In addition, in-line code is generated for the arithmetic ordering predicates and for some other predicates, although the exact set of optimized primitive operations has been changing over time; there is also the question of what kinds of optimizations the compiler does. For example, if the compiler replaces (abs x) with the equivalent of
(let ((t x)) (if (< t 0) (- t) t)then writing code generator macros that generate in-line code for abs is pretty silly, since the back-end won't ever get to use them.
In the future, we may also generate in-line code for *, quotient, and remainder; when Larceny was first designed it was pointless to do so because current systems did not implement multiplication and division in hardware.
taddcc %REG1, %REG2, %RESULT bvc,a L1 nop overflow/non-fixnum code goes here L1:The branch delay slot is usually filled with the instruction at L1, so in practice, a tagged add or subtract costs only an extra branch, assuming taddcc is as efficiently implemented as
When the millicode routine for an arithmetic operation is entered, the arguments are in the registers RESULT and SECOND. The millicode routine must figure out the representations of its arguments as quickly as possible, and then perform the desired operation; this is a dispatch on the two tags of the arguments. The dispatch is implemented as an open-coded decision tree:
(let ((tag1 (tag-of RESULT))) (cond ((= tag1 bytevector-like-tag) (let ((hdrtag1 (hdrtag-of RESULT))) (cond ((= hdrtag1 flonum-hdrtag) (if (= (tag-of SECOND) bytevector-like-tag) (cond ((= (hdrtag-of SECOND) flonum-hdrtag) flonum operation, in-line) ((and (= (hdrtag-of SECOND) compnum-hdrtag) (= 0.0 (imag-part SECOND))) flonum operation, in-line) (else slow case)))) ((= hdrtag1 compnum-hdrtag) (if (= (tag-of SECOND) bytevector-like-tag) (cond ((= (hdrtag-of SECOND) compnum-hdrtag) compnum operation, in-line) ((and (= (hdrtag-of SECOND) flonum-hdrtag) (= 0.0 (imag-part FIRST))) flonum operation, in-line) (else slow case)))) ((= hdrtag1 bignum-hdrtag) (if (and (= (tag-of SECOND) bytevector-like-tag) (= (hdrtag-of SECOND) bignum-hdrtag)) bignum operation, out-of-line slow case)) (else slow case)))) ((= tag1 vector-like-tag) rectnum and ratnum cases as above, out-of-line) ((= tag1 fixnum-tag) fixnum case, in-line) (else slow case)))In the above, the "slow case" makes a call to Scheme code that contains a dispatch procedure not unlike the one exibited here, but that also deals with non-numbers and mixed-representation arithmetic. Cases marked "out-of-line" above are implemented in Scheme; the dispatch procedure simply calls a known Scheme procedure via a vector installed in the run-time system at startup-time. This is also true for bignum operations, which are explained in more detail below.
Note the code for mixed flonum/compnum arithmetic where the imaginary part of the compnum is 0.0.
By carefully hand-coding the dispatch code in assembly language and tuning it, the dispatch is reasonably efficient, and is not the dominating factor in adding two flonums, for example (the time there being spent loading data and allocating the resulting flonum on the heap). For those really curious, here is the assembly code for generic add from Larceny v0.25.