Use double precision for z and thus for the entire calculation of

exp2(i/TBLSIZE) * p(z) instead of only for the final multiplication and addition. This fixes the code to match the comment that the maximum error is 0.5010 ulps (except on machines that evaluate float expressions in extra precision, e.g., i386's, where the evaluation was already in extra precision). Fix and expand the comment about use of double precision. The relative roundoff error from evaluating p(z) in non-extra precision was about 16 times larger than in exp2() because the interval length is 16 times smaller. Its maximum was at least P1 * (1.0 ulps) * max(|z|) ~= log(2) * 1.0 * 1/32 ~= 0.0217 ulps (1.0 ulps from the addition in (1 + P1*z) with a cancelation error when z ~= -1/32). The actual final maximum was 0.5313 ulps, of which 0.0303 ulps must have come from the additional roundoff error in p(z). I can't explain why the additional roundoff error was almost 3/2 times larger than the rough estimate.
2008-02-11 05:20:02 +00:00 · 2008-02-11 05:20:02 +00:00 · a75ea6c233
commit a75ea6c233
parent c2724541ae
1 changed files with 4 additions and 3 deletions
--- a/lib/msun/src/s_exp2f.c
+++ b/lib/msun/src/s_exp2f.c
@ -82,7 +82,8 @@ static const double exp2ft[TBLSIZE] = {
 *
 *   We compute exp2(i/TBLSIZE) via table lookup and exp2(z) via a
 *   degree-4 minimax polynomial with maximum error under 1.4 * 2**-33.
- *   Using double precision in the final calculation avoids roundoff error.
+ *   Using double precision for everything except the reduction makes
+ *   roundoff error insignificant and simplifies the scaling step.
 *
 *   This method is due to Tang, but I do not use his suggested parameters:
 *
@ -92,8 +93,8 @@ static const double exp2ft[TBLSIZE] = {
 float
 exp2f(float x)
 {
-	double tv, twopk;
-	float t, z;
+	double tv, twopk, z;
+	float t;
 	uint32_t hx, htv, ix, i0;
 	int32_t k;