Use double precision for z and thus for the entire calculation of
exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the maximum error is 0.5010 ulps (except on machines that evaluate float expressions in extra precision, e.g., i386's, where the evaluation was already in extra precision). Fix and expand the comment about use of double precision. The relative roundoff error from evaluating p(z) in non-extra precision was about 16 times larger than in exp2() because the interval length is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) * max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have come from the additional roundoff error in p(z). I can't explain why the additional roundoff error was almost 3/2 times larger than the rough estimate.
This commit is contained in:
parent
c2724541ae
commit
a75ea6c233
@ -82,7 +82,8 @@ static const double exp2ft[TBLSIZE] = {
|
||||
*
|
||||
* We compute exp2(i/TBLSIZE) via table lookup and exp2(z) via a
|
||||
* degree-4 minimax polynomial with maximum error under 1.4 * 2**-33.
|
||||
* Using double precision in the final calculation avoids roundoff error.
|
||||
* Using double precision for everything except the reduction makes
|
||||
* roundoff error insignificant and simplifies the scaling step.
|
||||
*
|
||||
* This method is due to Tang, but I do not use his suggested parameters:
|
||||
*
|
||||
@ -92,8 +93,8 @@ static const double exp2ft[TBLSIZE] = {
|
||||
float
|
||||
exp2f(float x)
|
||||
{
|
||||
double tv, twopk;
|
||||
float t, z;
|
||||
double tv, twopk, z;
|
||||
float t;
|
||||
uint32_t hx, htv, ix, i0;
|
||||
int32_t k;
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user