1994-08-19 09:40:01 +00:00
|
|
|
/* k_cosf.c -- float version of k_cos.c
|
|
|
|
* Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com.
|
2005-10-28 13:36:58 +00:00
|
|
|
* Debugged and optimized by Bruce D. Evans.
|
1994-08-19 09:40:01 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ====================================================
|
|
|
|
* Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
|
|
|
|
*
|
|
|
|
* Developed at SunPro, a Sun Microsystems, Inc. business.
|
|
|
|
* Permission to use, copy, modify, and distribute this
|
1995-05-30 05:51:47 +00:00
|
|
|
* software is freely granted, provided that this notice
|
1994-08-19 09:40:01 +00:00
|
|
|
* is preserved.
|
|
|
|
* ====================================================
|
|
|
|
*/
|
|
|
|
|
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
2005-11-28 04:58:57 +00:00
|
|
|
#ifndef INLINE_KERNEL_COSDF
|
2008-02-22 02:30:36 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
2005-11-21 04:57:12 +00:00
|
|
|
#endif
|
1994-08-19 09:40:01 +00:00
|
|
|
|
|
|
|
#include "math.h"
|
|
|
|
#include "math_private.h"
|
|
|
|
|
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
2005-11-28 04:58:57 +00:00
|
|
|
/* |cos(x) - c(x)| < 2**-34.1 (~[-5.37e-11, 5.295e-11]). */
|
|
|
|
static const double
|
2005-10-28 13:36:58 +00:00
|
|
|
one = 1.0,
|
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
2005-11-28 04:58:57 +00:00
|
|
|
C0 = -0x1ffffffd0c5e81.0p-54, /* -0.499999997251031003120 */
|
|
|
|
C1 = 0x155553e1053a42.0p-57, /* 0.0416666233237390631894 */
|
|
|
|
C2 = -0x16c087e80f1e27.0p-62, /* -0.00138867637746099294692 */
|
|
|
|
C3 = 0x199342e0ee5069.0p-68; /* 0.0000243904487962774090654 */
|
1994-08-19 09:40:01 +00:00
|
|
|
|
2009-06-03 08:16:34 +00:00
|
|
|
#ifndef INLINE_KERNEL_COSDF
|
|
|
|
extern
|
2005-11-21 04:57:12 +00:00
|
|
|
#endif
|
2009-06-03 08:16:34 +00:00
|
|
|
__inline float
|
Use only double precision for "kernel" cosf and sinf (except for
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
2005-11-28 04:58:57 +00:00
|
|
|
__kernel_cosdf(double x)
|
1994-08-19 09:40:01 +00:00
|
|
|
{
|
2005-11-30 11:51:17 +00:00
|
|
|
double r, w, z;
|
2005-10-26 12:36:18 +00:00
|
|
|
|
2005-11-30 11:51:17 +00:00
|
|
|
/* Try to optimize for parallel evaluation as in k_tanf.c. */
|
|
|
|
z = x*x;
|
|
|
|
w = z*z;
|
|
|
|
r = C2+z*C3;
|
|
|
|
return ((one+z*C0) + w*C1) + (w*z)*r;
|
1994-08-19 09:40:01 +00:00
|
|
|
}
|