8299eb7e3e
__ieee754_rem_pio2f() to its 3 callers and manually inline them. On Athlons, with favourable compiler flags and optimizations and favourable pipeline conditions, this gives a speedup of 30-40 cycles for cosf(), sinf() and tanf() on the range pi/4 < |x| <= 9pi/4, so thes functions are now signifcantly faster than the hardware trig functions in many cases. E.g., in a benchmark with uniformly distributed x in [-2pi, 2pi], A64 hardware fcos took 72-129 cycles and cosf() took 37-55 cycles. Out-of-order execution is needed to get both of these times. The optimizations in this commit apparently work more by removing 1 serialization point than by reducing latency. |
||
---|---|---|
.. | ||
alpha | ||
amd64 | ||
arm | ||
bsdsrc | ||
i387 | ||
ia64 | ||
man | ||
powerpc | ||
sparc64 | ||
src | ||
Makefile |