- When y/x is huge, it's faster and more accurate to return pi/2
instead of pi - pi/2.
- There's no need for 3 lines of bit fiddling to compute -z.
- Fix a comment.
- Adjust several constants for float precision. Some thresholds
that were appropriate for double precision were never changed
when these routines were converted to float precision. This
has an impact on performance but not accuracy. (Submitted by bde.)
- Reduce the degrees of the polynomials used. A smaller degree
suffices for float precision.
- In asinf(), use double arithmetic in part of the calculation to
avoid a corner case and some complicated arithmetic involving a
division and some buggy constants. This improves performance and
accuracy.
Max error (ulps):
asinf acosf atanf
before 0.925 0.782 0.852
after 0.743 0.804 0.852
As bde points out, it's cheaper for asin*() and acos*() to use
polynomials instead of rational functions, but that's a task for
another day.
spurious optimizations. gcc doesn't support FENV_ACCESS, so when it
folds constants, it assumes that the rounding mode is always the
default and floating point exceptions never matter.
We approximate pi with more than float precision using pi_hi+pi_lo in
the usual way (pi_hi is actually spelled pi in the source code), and
expect (float)0.5*pi_lo to give the low part of the corresponding
approximation for pi/2. However, the high part for pi/2 (pi_o_2) is
rounded to nearest, which happens to round up, while the high part for
pi was rounded down. Thus pi_o_2+(float)0.5*pi (in infinite precision)
was a very bad approximation for pi/2 -- the low term has the wrong
sign and increases the error drom less than half an ULP to a full ULP.
This fix rounds up instead of down for pi_hi. Consistently rounding
down instead of up should work, and is the method used in e_acosf.c
and e_asinf.c. The reason for the difference is that we sometimes
want to return precisely pi/2 in e_atan2f.c, so it is convenient to
have a correctly rounded (to nearest) value for pi/2 in a variable.
a_acosf.c and e_asinf.c also differ in directly approximating pi/2
instead pi; they multiply by 2.0 instead of dividing by 0.5 to convert
the approximation.
These complications are not directly visible in the double precision
versions because rounding to nearest happens to round down.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
-- Begin comments from J.T. Conklin:
The most significant improvement is the addition of "float" versions
of the math functions that take float arguments, return floats, and do
all operations in floating point. This doesn't help (performance)
much on the i386, but they are still nice to have.
The float versions were orginally done by Cygnus' Ian Taylor when
fdlibm was integrated into the libm we support for embedded systems.
I gave Ian a copy of my libm as a starting point since I had already
fixed a lot of bugs & problems in Sun's original code. After he was
done, I cleaned it up a bit and integrated the changes back into my
libm.
-- End comments
Reviewed by: jkh
Submitted by: jtc