a7b8acac04
a maximum error of 2.905 ulps for cosf(), but the algorithm for cosf() is good for < 1 ulps and happens to give perfect rounding (< 0.5 ulps) near +-pi/2 except for the bug. The extra relative errors for tanf() were similar (slightly larger). The bug didn't affect sinf() since sinf'(+-pi/2) is 0. For range reduction in ~[-3pi/4, -pi/4] and ~[pi/4, 3pi/4] we must subtract +-pi/2 and the only complication is that this must be done in extra precision. We have handy 17+24-bit and 17+17+24-bit approximations to pi/2. If we always used the former then we would lose up to 24 bits of accuracy due to cancelation of leading bits, but we need to keep at least 24 bits plus a guard digit or 2, and should keep as many guard bits as efficiency permits. So we used the less-precise pi/2 not very near +-pi/2 and switched to using the more-precise pi/2 very near +-pi/2. However, we got the threshold for the switch wrong by allowing 19 bits to cancel, so we ended up with only 21 or 22 bits of accuracy in some cases, which is even worse than naively subtracting pi/2 would have done. Exhaustive checking shows that allowing only 17 bits to cancel (min. accuracy ~24 bits) is sufficient to reduce the maximum error for cosf() near +-pi/2 to 0.726 ulps, but allowing only 6 bits to cancel (min. accuracy ~35-bits) happens to give perfect rounding for cosf() at little extra cost so we prefer that. We actually (in effect) allow 0 bits to cancel and always use the 17+17+24-bit pi/2 (min. accuracy ~41 bits). This is simpler and probably always more efficient too. Classifying args to avoid using this pi/2 when it is not needed takes several extra integer operations and a branch, but just using it takes only 1 FP operation. The patch also fixes misspelling of 17 as 24 in many comments. For the double-precision version, the magic numbers include 33+53 bits for the less-precise pi/2 and (53-32-1 = 20) bits being allowed to cancel, so there are ~33-20 = 13 guard bits. This is sufficient except probably for perfect rounding. The more-precise pi/2 has 33+33+53 bits and we still waste time classifying args to avoid using it. The bug is apparently from mistranslation of the magic 32 in 53-32-1. The number of bits allowed to cancel is not critical and we use 32 for double precision because it allows efficient classification using a 32-bit comparison. For float precision, we must use an explicit mask, and there are fewer bits so there is less margin for error in their allocation. The 32 got reduced to 4 but should have been reduced almost in proportion to the reduction of mantissa bits. |
||
---|---|---|
.. | ||
alpha | ||
amd64 | ||
arm | ||
bsdsrc | ||
i387 | ||
ia64 | ||
man | ||
powerpc | ||
sparc64 | ||
src | ||
Makefile |