Some quick notes while everything's still fresh in my mind. These might also be useful when integrating this into core.

=== Integration of the numeric tower into the type system

The basic idea in this code is that there are two distinct types of numbers: "basic" and "extended". The basic numbers are the fundamental ones that have always been known to CHICKEN core, with the new addition of bignums. The extended numbers exist *ONLY* in Scheme, which means that to C they're just structure/record objects (much like the way SRFI 4 vectors are currently handled in core as second-class citizens). This rule is broken in only a handful of places (eqv?, nan?) for performance reasons. Those are generally situations where no complicated or allocating computations are required for extended numbers.

In the CHICKEN 4 numbers egg, this is faked out because we can't truly extend the core number types, so bignums are structures as well. For integration into core, however, bignums are changed into a true type by using one of the two remaining unused header tags.

In intermediate versions of the numbers egg, we had to pass a failure continuation, which meant creating an extra closure object upon every call to a numeric operation. Now, in order to avoid any performance impact, the Scheme procedures are instead invoked as an "exception", much like the way the error handler is invoked through barf() in case of an error. This allows us to pass only the arguments to a numeric operation, pretending the implementation is native C.

=== Performance impact

I've tried very hard to keep the performance of basic numeric operations exactly the same as in core. In particular, the various checks for number types are done in exactly the same order as everywhere else in core:

- Is it a fixnum?
- Is it an immediate? If so, barf.
- Does the header have a flonum tag? (before, this was combined with the immediate check)
- Does the header have a bignum tag?
  (normally, we'd have an "else barf()" at this point)
- Look up the numeric operation's matching Scheme procedure for extended numeric types, and call it (or barf, if it's not defined for these)

This means that "generic" numeric code should incur ZERO performance penalty for functions that are non-allocating and inlineable.

Unfortunately, that's where the good part ends. Any operation that results in a fresh number is no longer inlineable, because in the case of bignums it will need to allocate an unknown quantity of memory, which may require a GC. The upshot is that every "allocating inline" procedure now needs to be called in primitive CPS context. This is a fundamental limitation that we can't do much about.

In addition, the comparison functions (=, <, >, <=, >=) are no longer inline. This is because, in order to compare flonums correctly, they need to be converted to an exact number (a bignum) and then compared. We *could* decide to rip this out, but that would result in unexpected things, like:

(< 19000000000000000.0 19000000000000001) => #f

or

(= 19000000000000000.0 19000000000000001) => #t

These are currently the case, too. This is due to precision loss in the fix->flo conversion (which means we drop from 62 bits to 54 bits). Because we _are_ comparing inexact numbers (which could already have lost information before being compared), we could decide to ignore these edge cases and keep them as they are. For the "=" function, that would mean it can remain inlined and non-allocating. For < and >, however, this doesn't help: in the case of ratnums we must multiply the numerator of x by the denominator of y and vice versa, and compare the results. This means we're stuck with an allocating, non-inlineable function. Because of this, I decided to keep all the comparison functions non-inlineable.

Finally, the C implementations of the comparison functions, as well as those of +, -, * and /, are no longer vararg functions.
Instead, the variadic part is handled in Scheme, and the C implementation only operates on two numbers at a time. This shouldn't have too much of a performance impact, considering they already have to be in CPS context anyway. Plus, calls with two operands can easily be rewritten into a direct call, which leads us to...

==== Specializations

There's some light at the end of the tunnel: in critical number-crunching code, you'll usually be working with either integers or flonums (or you'd already be using the old numbers egg and everything would be shit-slow anyway). These two situations are catered to specifically by specialized versions. This is where the specialization/scrutiny stuff really shines: if we know something is a whole integer, we can use unsafe operations that only need to check a single bit to distinguish whether a number is a fixnum or a bignum, just like in the old situation we could use this to distinguish between fixnum and flonum. This case should be easy to infer.

There's one caveat: do not use the "/" division operator, because it may result in a ratnum. In fast code where you know you're dealing with integers, it's best to use "quotient" instead. And of course, if you use the trigonometric operations you may get a flonum, which will also result in the generic number functions being used.

=== Other random notes

A dyadic version of "gcd" has been pulled into C for performance reasons, because ratnums require calculating the gcd as part of their normalization process. See also how this function's performance affects the run-pi-decimal/big part of cl-bench-bignum.scm.
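Sketched in Python purely for illustration (the real version is C, and gcd2/normalize_ratnum are invented names, not actual identifiers in core), the dyadic gcd and its role in ratnum normalization look roughly like this:

```python
def gcd2(a, b):
    # Euclid's algorithm on exactly two operands, mirroring the dyadic C version.
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

def normalize_ratnum(num, den):
    # Ratnum normalization: divide out the gcd and keep the denominator positive.
    g = gcd2(num, den)
    if den < 0:
        g = -g
    return num // g, den // g

print(normalize_ratnum(6, -4))   # prints: (-3, 2)
```

Since every ratnum constructed during arithmetic goes through a normalization step like this, the gcd ends up on the hot path of all rational arithmetic, which is why keeping it in C matters for run-pi-decimal/big.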
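To make the comparison trade-off under "Performance impact" concrete, here is the precision-loss effect and the cross-multiplication trick sketched in Python (chosen only because it also has arbitrary-precision integers and exact rationals; ratnum_lt is an invented helper name):

```python
from fractions import Fraction

# A double's significand cannot represent this integer exactly:
big = 19000000000000001

# fix->flo conversion loses the low bits, so the two numbers collide:
assert float(big) == 19000000000000000.0

# Comparing via flonum conversion therefore considers them "equal", while an
# exact comparison (converting the flonum to an exact number instead, as the
# non-inline comparison functions do) keeps them distinct:
flonum_says_less = float(19000000000000000.0) < float(big)        # False
exact_says_less = Fraction(19000000000000000.0) < Fraction(big)   # True

# For ratnums, < and > must cross-multiply: a/b < c/d iff a*d < c*b
# (assuming positive denominators), which is why they have to allocate:
def ratnum_lt(a, b, c, d):
    return a * d < c * b

print(flonum_says_less, exact_says_less, ratnum_lt(1, 3, 2, 5))
# prints: False True True
```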
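The type-dispatch order listed at the top of these notes can be modelled as a toy Python classifier. Everything here (the class names, the handler table, barf as a plain exception) is invented for illustration and does not correspond to the actual C identifiers or data representation in core:

```python
class Bignum:                     # stand-in for the new bignum header tag
    def __init__(self, digits):
        self.digits = digits

class Ratnum:                     # stand-in for an "extended", Scheme-only type
    def __init__(self, num, den):
        self.num, self.den = num, den

def barf(msg):                    # stand-in for core's error mechanism
    raise TypeError(msg)

# Scheme-level procedures registered for extended numeric types:
EXTENDED_HANDLERS = {Ratnum: lambda x: "ratnum"}

def classify(x):
    # 1. Is it a fixnum?
    if isinstance(x, int) and not isinstance(x, bool):
        return "fixnum"
    # 2. Is it some other immediate? If so, barf.
    if isinstance(x, bool) or x is None:
        barf("bad argument type")
    # 3. Does the header have a flonum tag?
    if isinstance(x, float):
        return "flonum"
    # 4. Does the header have a bignum tag?
    if isinstance(x, Bignum):
        return "bignum"
    # 5. Instead of "else barf()", look up the operation's matching Scheme
    #    procedure for extended types and call it -- or barf if none exists.
    handler = EXTENDED_HANDLERS.get(type(x))
    if handler is None:
        barf("bad argument type")
    return handler(x)

print(classify(42), classify(1.5), classify(Ratnum(1, 2)))
# prints: fixnum flonum ratnum
```

The point of step 5 is that the fast paths (steps 1-4) are checked in exactly the order core already uses, so basic numbers never pay for the existence of the extended tower.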