freebsd-dev/lib/msun
Bruce Evans 9e9d3bc9f1 Optimize for 3pi/4 <= |x| <= 9pi/4 in much the same way as for
pi/4 <= |x| <= 3pi/4.  Use the same branch ladder as for float precision.
Remove the optimization for |x| near pi/2 and don't do it near the
multiples of pi/2 in the newly optimized range, since it requires
fairly large code to handle only relativley few cases.  Ifdef out
optimization for |x| <= pi/4 since this case can't occur because it
is done in callers.

On amd64 (A64), for cos() and sin() with uniformly distributed args,
no cache misses, some parallelism in the caller, and good but not great
CC and CFLAGS, etc., this saves about 40 cycles or 38% in the newly
optimized range, or about 27% on average across the range |x| <= 2pi
(~65 cycles for most args, while the A64 hardware fcos and fsin take
~75 cycles for half the args and 125 cycles for the other half).  The
speedup for tan() is much smaller, especially relatively.  The speedup
on i386 (A64) is slightly smaller, especially relatively.  i386 is
still much slower than amd64 here (unlike in the float case where it
is slightly faster).
2008-02-19 15:30:58 +00:00
..
amd64 Use hardware remainder on amd64 since it is 5 to 10 times faster than 2008-02-13 06:01:48 +00:00
arm Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
bsdsrc Fix tgamma() on some special args: 2007-05-02 15:24:49 +00:00
i387 Implement rintl(), nearbyintl(), lrintl(), and llrintl(). 2008-01-14 02:12:07 +00:00
ia64 Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
ld80 2 long double constants were missing L suffixes. This helped break tanl() 2008-02-18 15:39:52 +00:00
ld128 Add kernel functions for 128-bit long doubles. These could be improved 2008-02-17 07:32:31 +00:00
man Document return values better. 2008-02-18 19:02:49 +00:00
powerpc Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
sparc64 Use C comments since we now preprocess these files with CPP. 2007-04-29 14:05:22 +00:00
src Optimize for 3pi/4 <= |x| <= 9pi/4 in much the same way as for 2008-02-19 15:30:58 +00:00
Makefile Add tgammaf() as a simple wrapper around tgamma(). 2008-02-18 17:27:11 +00:00
Symbol.map Add tgammaf() as a simple wrapper around tgamma(). 2008-02-18 17:27:11 +00:00