Use double precision for z and thus for the entire calculation of

exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication
and addition.  This fixes the code to match the comment that the maximum
error is 0.5010 ulps (except on machines that evaluate float expressions
in extra precision, e.g., i386's, where the evaluation was already
in extra precision).

Fix and expand the comment about use of double precision.

The relative roundoff error from evaluating p(z) in non-extra precision
was about 16 times larger than in exp2() because the interval length
is 16 times smaller.  Its maximum was at least P1 * (1.0 ulps) *
max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the
addition in (1 + P1*z) with a cancelation error when z ~= -1/32).  The
actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have
come from the additional roundoff error in p(z).  I can't explain why
the additional roundoff error was almost 3/2 times larger than the rough
estimate.
This commit is contained in:
Bruce Evans 2008-02-11 05:20:02 +00:00
parent 2b8d4f5bd4
commit 51f86873af
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=176168

View File

@ -82,7 +82,8 @@ static const double exp2ft[TBLSIZE] = {
*
* We compute exp2(i/TBLSIZE) via table lookup and exp2(z) via a
* degree-4 minimax polynomial with maximum error under 1.4 * 2**-33.
* Using double precision in the final calculation avoids roundoff error.
* Using double precision for everything except the reduction makes
* roundoff error insignificant and simplifies the scaling step.
*
* This method is due to Tang, but I do not use his suggested parameters:
*
@ -92,8 +93,8 @@ static const double exp2ft[TBLSIZE] = {
float
exp2f(float x)
{
double tv, twopk;
float t, z;
double tv, twopk, z;
float t;
uint32_t hx, htv, ix, i0;
int32_t k;