Fix some off-by-1 errors.

e_rem_pio2.c:
Float and double precision didn't work because init_jk[] was 1 too small.
It needs to be 2 larger than you might expect, and 1 larger than it was
for these precisions, since its test for recomputing needs a margin of
47 bits (almost 2 24-bit units).

init_jk[] seems to be barely enough for extended and quad precisions.
This hasn't been completely verified.  Callers now get about 24 bits
of extra precision for float, and about 19 for double, but only about
8 for extended and quad.  8 is not enough for callers that want to
produce extra-precision results, but current callers have rounding
errors of at least 0.8 ulps, so another 1/2**8 ulps of error from the
reduction won't affect them much.

Add a comment about some of the magic for init_jk[].

e_rem_pio2.c:
Double precision worked in practice because of a compensating off-by-1
error here.  Extended precision was asked for, and it executed exactly
the same code as the unbroken double precision.

e_rem_pio2f.c:
Float precision worked in practice because of a compensating off-by-1
error here.  Double precision was asked for, and was almost needed,
since the cosf() and sinf() callers want to produce extra-precision
results, at least internally so that their error is only 0.5009 ulps.
However, the extra precision provided by unbroken float precision is
enough, and the double-precision code has extra overheads, so the
off-by-1 error cost about 5% in efficiency on amd64 and i386.
This commit is contained in:
Bruce Evans 2008-02-25 11:43:20 +00:00
parent de625c605d
commit 0d1564b6c7
3 changed files with 11 additions and 6 deletions

View File

@ -182,7 +182,7 @@ __ieee754_rem_pio2(double x, double *y)
tx[2] = z; tx[2] = z;
nx = 3; nx = 3;
while(tx[nx-1]==zero) nx--; /* skip zero term */ while(tx[nx-1]==zero) nx--; /* skip zero term */
n = __kernel_rem_pio2(tx,y,e0,nx,2); n = __kernel_rem_pio2(tx,y,e0,nx,1);
if(hx<0) {y[0] = -y[0]; y[1] = -y[1]; return -n;} if(hx<0) {y[0] = -y[0]; y[1] = -y[1]; return -n;}
return n; return n;
} }

View File

@ -45,7 +45,7 @@ int
__ieee754_rem_pio2f(float x, float *y) __ieee754_rem_pio2f(float x, float *y)
{ {
double w,r,fn; double w,r,fn;
double tx[1],ty[2]; double tx[1],ty[1];
float z; float z;
int32_t e0,n,ix,hx; int32_t e0,n,ix,hx;
@ -77,7 +77,7 @@ __ieee754_rem_pio2f(float x, float *y)
e0 = (ix>>23)-150; /* e0 = ilogb(|x|)-23; */ e0 = (ix>>23)-150; /* e0 = ilogb(|x|)-23; */
SET_FLOAT_WORD(z, ix - ((int32_t)(e0<<23))); SET_FLOAT_WORD(z, ix - ((int32_t)(e0<<23)));
tx[0] = z; tx[0] = z;
n = __kernel_rem_pio2(tx,ty,e0,1,1); n = __kernel_rem_pio2(tx,ty,e0,1,0);
y[0] = ty[0]; y[0] = ty[0];
y[1] = ty[0] - y[0]; y[1] = ty[0] - y[0];
if(hx<0) {y[0] = -y[0]; y[1] = -y[1]; return -n;} if(hx<0) {y[0] = -y[0]; y[1] = -y[1]; return -n;}

View File

@ -78,8 +78,13 @@ __FBSDID("$FreeBSD$");
* Here is the description of some local variables: * Here is the description of some local variables:
* *
* jk jk+1 is the initial number of terms of ipio2[] needed * jk jk+1 is the initial number of terms of ipio2[] needed
* in the computation. The recommended value is 2,3,4, * in the computation. The minimum and recommended value
* 6 for single, double, extended,and quad. * for jk is 3,4,4,6 for single, double, extended, and quad.
* jk+1 must be 2 larger than you might expect so that our
* recomputation test works. (Up to 24 bits in the integer
* part (the 24 bits of it that we compute) and 23 bits in
* the fraction part may be lost to cancelation before we
* recompute.)
* *
* jz local integer variable indicating the number of * jz local integer variable indicating the number of
* terms of ipio2[] used. * terms of ipio2[] used.
@ -129,7 +134,7 @@ __FBSDID("$FreeBSD$");
#include "math.h" #include "math.h"
#include "math_private.h" #include "math_private.h"
static const int init_jk[] = {2,3,4,6}; /* initial value for jk */ static const int init_jk[] = {3,4,4,6}; /* initial value for jk */
/* /*
* Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi