freebsd-dev/lib/msun
Bruce Evans 43590b1517 Optimize the 9pi/2 < |x| <= 2**19pi/2 case on amd64 and i386 by avoiding
the double-to-int conversion operation, which is very slow on these
arches.  Assume that the current rounding mode is the default of
round-to-nearest and use rounding operations in this mode instead of
faking this mode using the round-towards-zero mode for conversion to
int.  Round the double to an integer as a double first and as an int
second, since the double result is needed much earlier.
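The two-step rounding described above can be sketched in C. This is a hypothetical illustration and not the code in FreeBSD's msun sources. The function names, the constant name `toint`, and the use of 1.5 * 2**52 for an add-and-subtract rounding trick are all assumptions. The sketch contrasts the older approach, which adds +-0.5 and lets the truncating double-to-int conversion finish the rounding (so the double is only known after the slow conversion), with rounding to an integer as a double in the assumed default round-to-nearest mode and converting to int afterwards.

```c
#include <stdint.h>

/* 2/pi rounded to double. */
static const double invpio2 = 6.36619772367581382433e-01;

/*
 * Hypothetical name.  Adding and then subtracting 1.5 * 2**52 rounds a
 * double of magnitude well below 2**51 to an integer, using whatever
 * rounding mode is current (assumed here to be round-to-nearest).
 */
static const double toint = 0x1.8p52;

/*
 * Older style (hypothetical reconstruction): fake round-to-nearest by
 * adding +-0.5 and truncating in the double -> int conversion, since C
 * casts always round toward zero.  The double fn is only available
 * after the conversion.
 */
int32_t quadrant_by_truncation(double x, double *fn)
{
	double t = x * invpio2;
	int32_t n = (int32_t)(t < 0 ? t - 0.5 : t + 0.5);

	*fn = (double)n;
	return n;
}

/*
 * Newer style: round to an integer as a double first, then convert.
 * The volatile store forces the sum to be rounded to double even where
 * intermediates are kept in extended precision (x87 on i386).
 */
int32_t quadrant_by_rounding(double x, double *fn)
{
	volatile double t = x * invpio2 + toint;

	*fn = t - toint;	/* exact; the double result is ready first */
	return (int32_t)*fn;	/* input is already integral */
}
```

For x = 10.0 both functions return 6 (10 * 2/pi is about 6.37) and store 6.0 through `fn`; a caller would then form the reduced argument x - fn*pi/2. The sketch assumes magnitudes far below 2**31 so that the int result is representable.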

Double rounding isn't a problem since we only need a rough approximation.
We don't support other current rounding modes, and we produce much larger
errors than before if called in a non-default mode.
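A worked bound makes the double-rounding and rounding-mode remarks above concrete. This is a sketch that assumes IEEE 754 double precision and an add-and-subtract rounding constant C = 1.5 * 2**52 (neither is named in the commit text). Here t-hat is the computed value of x * (2/pi), f_n is its rounding to an integer, and r is the reduced argument.

```latex
\begin{aligned}
\hat t &= \mathrm{fl}\!\left(x \cdot \tfrac{2}{\pi}\right),
  \qquad C = 1.5 \cdot 2^{52},
  \qquad |\hat t| \lesssim 2^{19} \text{ for } |x| \le 2^{19}\tfrac{\pi}{2} \\
f_n &= \mathrm{fl}(\hat t + C) - C
  \qquad \text{(the subtraction is exact, since } 2^{52} < \hat t + C < 2^{53}
  \text{ and doubles there are spaced 1 apart)} \\
|f_n - \hat t| &\le \tfrac{1}{2} \ \text{(round-to-nearest)},
  \qquad |f_n - \hat t| < 1 \ \text{(directed rounding modes)} \\
r &= x - f_n \cdot \tfrac{\pi}{2}
  \approx \tfrac{\pi}{2}\,(\hat t - f_n),
  \qquad |r| \lesssim \tfrac{\pi}{4} \ \text{(nearest)},
  \qquad |r| \lesssim \tfrac{\pi}{2} \ \text{(directed)}
\end{aligned}
```

Because t-hat differs from the exact x * (2/pi) by at most half an ulp, the round-to-nearest bound on |r| exceeds pi/4 only slightly, which is why double rounding is harmless here. In a directed mode the reduced argument can be nearly twice as large, outside the range the reduction is meant to produce.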

This saves an average of about 10 cycles on amd64 (A64) and about 25 on
i386 (A64) for x in the above range.  In some cases the saving is over
25%.  Most cases with |x| < 1000pi now take about 88 cycles for cos
and sin (with certain CFLAGS, etc.), except on i386 where cos and sin
(but not cosf and sinf) are much slower at 111 and 121 cycles respectively
due to the compiler only optimizing well for float precision.  A64
hardware cos and sin are slower at 105 cycles on i386 and 110 cycles
on amd64.
2008-02-22 15:55:14 +00:00
amd64 Use hardware remainder on amd64 since it is 5 to 10 times faster than 2008-02-13 06:01:48 +00:00
arm Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
bsdsrc Eliminate some warnings. 2008-02-22 02:26:51 +00:00
i387 Implement rintl(), nearbyintl(), lrintl(), and llrintl(). 2008-01-14 02:12:07 +00:00
ia64 Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
ld80 2 long double constants were missing L suffixes. This helped break tanl() 2008-02-18 15:39:52 +00:00
ld128 Add kernel functions for 128-bit long doubles. These could be improved 2008-02-17 07:32:31 +00:00
man Document return values better. 2008-02-18 19:02:49 +00:00
powerpc Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
sparc64 Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
src Optimize the 9pi/2 < |x| <= 2**19pi/2 case on amd64 and i386 by avoiding 2008-02-22 15:55:14 +00:00
Makefile Add tgammaf() as a simple wrapper around tgamma(). 2008-02-18 17:27:11 +00:00
Symbol.map Add tgammaf() as a simple wrapper around tgamma(). 2008-02-18 17:27:11 +00:00