Some quick notes while everything's still fresh in my mind. These might also be useful when integrating this into core.

=== Integration of the numeric tower into the type system

The basic idea in this code is that there are two distinct types of numbers: "basic" and "extended". The basic numbers are the fundamental ones that have always been known to CHICKEN core, with the new addition of bignums. The extended numbers exist *ONLY* in Scheme, which means that to C they're just structure/record objects (much like the way SRFI 4 vectors are currently handled in core as second-class citizens). This rule is broken in only a handful of places (eqv?, nan?) for performance reasons. Those are generally situations where no complicated or allocating computations are required for extended numbers.

In the CHICKEN 4 numbers egg, this is faked out because we can't truly extend the core number types, so bignums are structures as well. For integration into core, however, bignums are changed into a true type by using one of the two remaining unused header tags.

In intermediate versions of the numbers egg, we had to pass a failure continuation, which meant creating an extra closure object upon every call to a numeric operation. Now, in order to avoid any performance impact, the Scheme procedures are instead invoked as an "exception", much like the way the error handler is invoked through barf() in case of an error. This allows us to pass only the arguments to a numeric operation, pretending the implementation is native C.

=== Performance impact

I've tried very hard to keep the performance of basic numeric operations exactly the same as in core. In particular, the various checks for number types are done in exactly the same order as everywhere else in core:

- Is it a fixnum?
- Is it an immediate? If so, barf.
- Does the header have a flonum tag? (before, this was combined with the immediate check)
- Does the header have a bignum tag?
  (normally, we'd have an "else barf()" at this point)
- Look up the numeric operation's matching Scheme procedure for extended numeric types, and call it (or barf, if it's not defined for these)

This means that "generic" numeric code should incur ZERO performance penalty for functions that are non-allocating and inlineable.

Unfortunately, that's where the good part ends. Any operation that results in a fresh number is no longer inlineable, because in the case of bignums it will need to allocate an unknown quantity of memory, which may require a GC. The upshot is that every "allocating inline" procedure now needs to be called in primitive CPS context. This is a fundamental limitation that we can't do much about.

In addition, the comparison functions (=, <, >, <=, >=) are no longer inline. This is because, in order to compare flonums correctly, they need to be converted to an exact number (a bignum) and then compared. We *could* decide to rip this out, but that would result in unexpected things, like:

(< 19000000000000000.0 19000000000000001) => #f

or

(= 19000000000000000.0 19000000000000001) => #t

These are currently the case, too. This is due to precision loss in the fix->flo conversion (which means we drop from 62 bits to 54 bits). Because we _are_ comparing inexact numbers (which could already have lost information before being compared), we could decide to ignore these edge cases and keep them as they are. For the "=" function, that would mean it can remain inlined and non-allocating. For < and >, however, this doesn't help: in the case of ratnums we must multiply the numerator of x by the denominator of y and vice versa, and compare the results. This means we're stuck with an allocating, non-inlineable function. Because of this, I decided to keep all the comparison functions non-inlineable.

Finally, the C implementations of the comparison functions, as well as those of +, -, * and /, are no longer vararg functions.
Instead, the variadic part is handled in Scheme, and the C implementation only operates on two numbers at a time. This shouldn't have too much of a performance impact, considering they already have to be in CPS context anyway. Plus, calls with two operands can easily be rewritten into a direct call, which leads us to...

==== Specializations

There's some light at the end of the tunnel: in critical number-crunching code, you'll usually be working with either integers or flonums (or you'd already be using the old numbers egg and everything would be shit-slow anyway). These two situations are catered to specifically by specialized versions. This is where the specialization/scrutiny stuff really shines: if we know something is a whole integer, we can use unsafe operations that only need to check a single bit to distinguish whether a number is a fixnum or a bignum, just like in the old situation we could use this to distinguish between fixnum and flonum. This case should be easy to infer.

There's one caveat: do not use the "/" division operator, because it may result in a ratnum. In fast code where you know you're dealing with integers, it's best to use "quotient" instead. And of course, if you use the trigonometric operations you may get a flonum, which will also result in the generic number functions being used.

=== Other random notes

A dyadic version of "gcd" has been pulled into C for performance reasons, because ratnums require calculating the gcd as part of their normalization process. See also how this function's performance affects the run-pi-decimal/big part of cl-bench-bignum.scm.
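Sketched in Python purely for illustration (the real version is C, and gcd2/normalize_ratnum are invented names, not actual identifiers in core), the dyadic gcd and its role in ratnum normalization look roughly like this:

```python
def gcd2(a, b):
    # Euclid's algorithm on exactly two operands, mirroring the dyadic C version.
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

def normalize_ratnum(num, den):
    # Ratnum normalization: divide out the gcd and keep the denominator positive.
    g = gcd2(num, den)
    if den < 0:
        g = -g
    return num // g, den // g

print(normalize_ratnum(6, -4))   # prints: (-3, 2)
```

Since every ratnum constructed during arithmetic goes through a normalization step like this, the gcd ends up on the hot path of all rational arithmetic, which is why keeping it in C matters for run-pi-decimal/big.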
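To make the comparison trade-off under "Performance impact" concrete, here is the precision-loss effect and the cross-multiplication trick sketched in Python (chosen only because it also has arbitrary-precision integers and exact rationals; ratnum_lt is an invented helper name):

```python
from fractions import Fraction

# A double's significand cannot represent this integer exactly:
big = 19000000000000001

# fix->flo conversion loses the low bits, so the two numbers collide:
assert float(big) == 19000000000000000.0

# Comparing via flonum conversion therefore considers them "equal", while an
# exact comparison (converting the flonum to an exact number instead, as the
# non-inline comparison functions do) keeps them distinct:
flonum_says_less = float(19000000000000000.0) < float(big)        # False
exact_says_less = Fraction(19000000000000000.0) < Fraction(big)   # True

# For ratnums, < and > must cross-multiply: a/b < c/d iff a*d < c*b
# (assuming positive denominators), which is why they have to allocate:
def ratnum_lt(a, b, c, d):
    return a * d < c * b

print(flonum_says_less, exact_says_less, ratnum_lt(1, 3, 2, 5))
# prints: False True True
```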
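The type-dispatch order listed at the top of these notes can be modelled as a toy Python classifier. Everything here (the class names, the handler table, barf as a plain exception) is invented for illustration and does not correspond to the actual C identifiers or data representation in core:

```python
class Bignum:                     # stand-in for the new bignum header tag
    def __init__(self, digits):
        self.digits = digits

class Ratnum:                     # stand-in for an "extended", Scheme-only type
    def __init__(self, num, den):
        self.num, self.den = num, den

def barf(msg):                    # stand-in for core's error mechanism
    raise TypeError(msg)

# Scheme-level procedures registered for extended numeric types:
EXTENDED_HANDLERS = {Ratnum: lambda x: "ratnum"}

def classify(x):
    # 1. Is it a fixnum?
    if isinstance(x, int) and not isinstance(x, bool):
        return "fixnum"
    # 2. Is it some other immediate? If so, barf.
    if isinstance(x, bool) or x is None:
        barf("bad argument type")
    # 3. Does the header have a flonum tag?
    if isinstance(x, float):
        return "flonum"
    # 4. Does the header have a bignum tag?
    if isinstance(x, Bignum):
        return "bignum"
    # 5. Instead of "else barf()", look up the operation's matching Scheme
    #    procedure for extended types and call it -- or barf if none exists.
    handler = EXTENDED_HANDLERS.get(type(x))
    if handler is None:
        barf("bad argument type")
    return handler(x)

print(classify(42), classify(1.5), classify(Ratnum(1, 2)))
# prints: fixnum flonum ratnum
```

The point of step 5 is that the fast paths (steps 1-4) are checked in exactly the order core already uses, so basic numbers never pay for the existence of the extended tower.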