that is specialized for float precision. The new polynomial has degree
5 instead of 11, and a maximum error of 2**-27.74 ulps instead
of 2**-30.64. This doesn't affect the final error significantly; the
maximum error was and is about 0.9101 ulps on amd64 -01 and the number
of cases with an error of > 0.5 ulps is actually reduced by epsilon
despite the larger error in the polynomial.
This is about 15% faster on amd64 (A64), i386 (A64) and ia64. The asm
version is still used instead of this on i386 since it is faster and
more accurate.