This supersedes the fix for the old algorithm in rev.1.8 of k_cosf.c. I want this change mainly because it is an optimization. It helps make software cos[f](x) and sin[f](x) faster than the i387 hardware versions for small x. It is also a simplification, and reduces the maximum relative error for cosf() and sinf() on machines like amd64 from about 0.87 ulps to about 0.80 ulps. It was validated for cosf() and sinf() by exhaustive testing. Exhaustive testing is not possible for cos() and sin(), but ucbtest reports a similar reduction for the worst case found by non-exhaustive testing. ucbtest's non-exhaustive testing seems to be good enough to find problems in algorithms but not maximum relative errors when there are spikes. E.g., short runs of it find only 3 ulp error where the i387 hardware cos() has an error of about 2**40 ulps near pi/2.
80 lines
2.7 KiB
C
80 lines
2.7 KiB
C
|
|
/* @(#)k_cos.c 1.3 95/01/18 */
|
|
/*
|
|
* ====================================================
|
|
* Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
|
|
*
|
|
* Developed at SunSoft, a Sun Microsystems, Inc. business.
|
|
* Permission to use, copy, modify, and distribute this
|
|
* software is freely granted, provided that this notice
|
|
* is preserved.
|
|
* ====================================================
|
|
*/
|
|
|
|
#ifndef lint
|
|
static char rcsid[] = "$FreeBSD$";
|
|
#endif
|
|
|
|
/*
|
|
* __kernel_cos( x, y )
|
|
* kernel cos function on [-pi/4, pi/4], pi/4 ~ 0.785398164
|
|
* Input x is assumed to be bounded by ~pi/4 in magnitude.
|
|
* Input y is the tail of x.
|
|
*
|
|
* Algorithm
|
|
* 1. Since cos(-x) = cos(x), we need only to consider positive x.
|
|
* 2. if x < 2^-27 (hx<0x3e400000 0), return 1 with inexact if x!=0.
|
|
* 3. cos(x) is approximated by a polynomial of degree 14 on
|
|
* [0,pi/4]
|
|
* 4 14
|
|
* cos(x) ~ 1 - x*x/2 + C1*x + ... + C6*x
|
|
* where the remez error is
|
|
*
|
|
* | 2 4 6 8 10 12 14 | -58
|
|
* |cos(x)-(1-.5*x +C1*x +C2*x +C3*x +C4*x +C5*x +C6*x )| <= 2
|
|
* | |
|
|
*
|
|
* 4 6 8 10 12 14
|
|
* 4. let r = C1*x +C2*x +C3*x +C4*x +C5*x +C6*x , then
|
|
* cos(x) ~ 1 - x*x/2 + r
|
|
* since cos(x+y) ~ cos(x) - sin(x)*y
|
|
* ~ cos(x) - x*y,
|
|
* a correction term is necessary in cos(x) and hence
|
|
* cos(x+y) = 1 - (x*x/2 - (r - x*y))
|
|
* For better accuracy, rearrange to
|
|
* cos(x+y) ~ w + (tmp + (r-x*y))
|
|
* where w = 1 - x*x/2 and tmp is a tiny correction term
|
|
* (1 - x*x/2 == w + tmp exactly in infinite precision).
|
|
* The exactness of w + tmp in infinite precision depends on w
|
|
* and tmp having the same precision as x. If they have extra
|
|
* precision due to compiler bugs, then the extra precision is
|
|
* only good provided it is retained in all terms of the final
|
|
* expression for cos(). Retention happens in all cases tested
|
|
* under FreeBSD, so don't pessimize things by forcibly clipping
|
|
* any extra precision in w.
|
|
*/
|
|
|
|
#include "math.h"
|
|
#include "math_private.h"
|
|
|
|
static const double
|
|
one = 1.00000000000000000000e+00, /* 0x3FF00000, 0x00000000 */
|
|
C1 = 4.16666666666666019037e-02, /* 0x3FA55555, 0x5555554C */
|
|
C2 = -1.38888888888741095749e-03, /* 0xBF56C16C, 0x16C15177 */
|
|
C3 = 2.48015872894767294178e-05, /* 0x3EFA01A0, 0x19CB1590 */
|
|
C4 = -2.75573143513906633035e-07, /* 0xBE927E4F, 0x809C52AD */
|
|
C5 = 2.08757232129817482790e-09, /* 0x3E21EE9E, 0xBDB4B1C4 */
|
|
C6 = -1.13596475577881948265e-11; /* 0xBDA8FAE9, 0xBE8838D4 */
|
|
|
|
double
|
|
__kernel_cos(double x, double y)
|
|
{
|
|
double hz,z,r,w;
|
|
|
|
z = x*x;
|
|
r = z*(C1+z*(C2+z*(C3+z*(C4+z*(C5+z*C6)))));
|
|
hz = (float)0.5*z;
|
|
w = one-hz;
|
|
return w + (((one-w)-hz) + (z*r-x*y));
|
|
}
|