babf3acb0e
cases which are used mainly by regression tests.

As usual, the cutoff for tiny args was not correctly translated to float precision. It was 2**-54 but 2**-24 works. It must be about 2**-precision, since the relative error from approximating log(1+x) by x is on the order of |x|. Exhaustive testing shows that 2**-24 gives perfect rounding in round-to-nearest mode.

Similarly for the cutoff for being small, except that this cutoff is not used by as many other functions. It was 2**-29 but 2**-15 works. It must be a bit smaller than sqrt(2**-precision), since the relative error from approximating log(1+x) by x-x*x/2 is on the order of x*x. Exhaustive testing shows that 2**-15 gives a maximum error of 0.5052 ulps in round-to-nearest mode. The algorithm for the general case is only good for 0.8388 ulps, so this is sufficient (but it loses slightly on i386, where extra precision gives 0.5032 ulps for the general case).

While investigating this, I noticed that optimizing the usual case by falling into a middle case involving a simple polynomial evaluation (return x-x*x/2 instead of x here) is not such a good idea, since it gives an enormous pessimization of tinier args on machines for which denormals are slow. Float x*x/2 is denormal when |x| ~< 2**-64 and x*x/2 is evaluated in float precision, so it can easily be denormal for normal x.

This is even more interesting for general polynomial evaluations. Multiplying out large powers of x is normally a good optimization since it reduces dependencies, but it creates denormals starting with quite large x.
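
For illustration only, here is a minimal C sketch of the two cutoffs and of the denormal hazard. It is not the committed code: the names TINY_CUTOFF, SMALL_CUTOFF, classify_arg, log1pf_cheap, half_square and is_subnormal are hypothetical, the general case is left out entirely, and the sketch assumes IEEE 754 binary32 floats with no flush-to-zero mode enabled.

```c
#include <float.h>

/* Cutoffs quoted in this log message for float precision (assumed names). */
#define TINY_CUTOFF  0x1p-24f  /* below this, x itself is returned */
#define SMALL_CUTOFF 0x1p-15f  /* below this, x - x*x/2 is returned */

enum arg_class { ARG_TINY, ARG_SMALL, ARG_GENERAL };

/* Decide which case an argument falls into, by magnitude only. */
static enum arg_class classify_arg(float x)
{
    float ax = x < 0 ? -x : x;

    if (ax < TINY_CUTOFF)
        return ARG_TINY;
    if (ax < SMALL_CUTOFF)
        return ARG_SMALL;
    return ARG_GENERAL;
}

/*
 * Handle only the two cheap cases.  Returns 1 and stores the
 * approximation in *result when x is tiny or small; returns 0 when the
 * caller would have to use the general algorithm (not shown here).
 */
static int log1pf_cheap(float x, float *result)
{
    switch (classify_arg(x)) {
    case ARG_TINY:
        *result = x;              /* no multiplication, so no denormals */
        return 1;
    case ARG_SMALL:
        *result = x - x * x / 2;  /* second-order term matters here */
        return 1;
    default:
        return 0;
    }
}

/*
 * The term that a merged "middle case" would compute for every small
 * argument.  The volatile store forces rounding to float even on
 * machines that evaluate expressions in extra precision.
 */
static float half_square(float x)
{
    volatile float h = x * x / 2;
    return h;
}

/* Nonzero with magnitude below the smallest normal float. */
static int is_subnormal(float v)
{
    float av = v < 0 ? -v : v;
    return av != 0 && av < FLT_MIN;
}
```

With these definitions, an argument such as 0x1p-70f is an ordinary normal float, yet half_square() of it is subnormal. That is the cost a separate tiny case avoids by returning x without doing any multiplication.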