Bruce Evans f4b01a9edf Rearranged the polynomial evaluation to reduce dependencies, as in
k_tanf.c but with different details.

The polynomial is odd with degree 13 for tanf() and odd with degree
9 for sinf(), so the details are not very different for sinf() -- the
term with the x**11 and x**13 coefficients goes away, and (mysteriously)
it helps to do the evaluation of w = z*z early although moving it later
was a key optimization for tanf().  The details are different but simpler
for cosf() because the polynomial is even and of lower degree.
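
As a rough illustration of the rearrangement described above for the degree-9 odd polynomial, the following sketch contrasts a plain Horner evaluation with a split evaluation that forms w = z*z early. The function names are hypothetical, and the coefficients are plain Taylor coefficients assumed for illustration; they are not the tuned coefficients in the committed kernel.

```c
/*
 * sin(x) ~= x + S1*x^3 + S2*x^5 + S3*x^7 + S4*x^9.
 * Assumption: Taylor coefficients stand in for the tuned ones.
 */
static const double S1 = -1.0 / 6.0;
static const double S2 =  1.0 / 120.0;
static const double S3 = -1.0 / 5040.0;
static const double S4 =  1.0 / 362880.0;

/* Horner form: each multiply-add depends on the result of the previous one. */
static double
sin_poly_horner(double x)
{
	double z = x * x;

	return x + x * z * (S1 + z * (S2 + z * (S3 + z * S4)));
}

/*
 * Split form: the low half (S1, S2) and the high half (S3, S4) are
 * independent chains, so they can overlap in the FPU pipeline.
 */
static double
sin_poly_split(double x)
{
	double r, s, w, z;

	z = x * x;
	w = z * z;		/* x^4, formed early */
	r = S3 + z * S4;	/* high half, independent of s */
	s = z * x;		/* x^3 */
	return (x + s * (S1 + z * S2)) + s * w * r;
}
```

Both forms compute the same polynomial up to rounding. The split form has a shorter chain of dependent operations, at the cost of one extra multiplication.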

On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
an optimization of about 4 cycles (10%) in most cases (13% for sinf()
on AXP, but 0% for cosf() with gcc-3.3 -O1 on AXP).  The best case
(sinf() with gcc-3.4 -O1 -fcaller-saves on A64) now takes 33-39 cycles
(was 37-45 cycles).  Hardware sinf takes 74-129 cycles.  Despite
being fine-tuned for Athlons, the optimization is even larger on
some other arches (about 15% on ia64 (pluto2) and 20% on alpha (beast)
with gcc -O2 -fomit-frame-pointer).
2005-11-30 11:51:17 +00:00