Symptoms of the problem included assembler warnings and
nondeterministic runtime behavior when a fe*() call that affects the
fpsr is closely followed by a float point op.
The bug (at least, I think it's a bug) is that gcc does not insert a
break between a volatile asm and a dependent instruction if the
volatile asm came from an inlined function. Volatile asms seem to be
fine in other circumstances, even without -mvolatile-asm-stop, so
perhaps the compiler adds the stop bits before inlining takes place.
The problem does not occur at -O0 because inlining is disabled, and it
doesn't happen at -O2 because -fschedule-insns2 knows better.
inputs. The trouble with replacing two floats with a double is that
the latter has 6 extra bits of precision, which actually hurts
accuracy in many cases. All of the constants are optimal when float
arithmetic is used, and would need to be recomputed to do this right.
Noticed by: bde (ucbtest)
results in a performance gain on the order of 10% for amd64 (sledge),
ia64 (pluto1), i386+SSE (Pentium 4), and sparc64 (panther), and a
negligible improvement for i386 without SSE. (The i386 port still
uses the hardware instruction, though.)
manpages. They are not very related, so separating them makes it
easier to add meaningful cross-references and extend some of the
descriptions.
- Move the part of math(3) that discusses IEEE 754 to the ieee(3)
manpage.
- Rearrange the list of functions into categories.
- Remove the ulps column. It was appropriate for only some
of the functions in the list, and correct for even fewer
of them.
- Add some new paragraphs, and remove some old ones about
NaNs that may do more harm than good.
- Document precisions other than double-precision.
flags, so they are not pure. Remove the __pure2 annotation from them.
I believe that the following routines and their float and long double
counterparts are the only ones here that can be __pure2:
copysign is* fabs finite fmax fmin fpclassify ilogb nan signbit
When gcc supports FENV_ACCESS, perhaps there will be a new annotation
that allows the other functions to be considered pure when FENV_ACCESS
is off.
Discussed with: bde
basically support this, subject to gcc's lack of FENV_ACCESS support.
In any case, the previous setting of math_errhandling to 0 is not
allowed by POSIX.
registers as volatile. Instructions that *wrote* to FP state were
already marked volatile, but apparently gcc has license to move
non-volatile asms past volatile asms. This broke amd64's feupdateenv
at -O2 due to a WAR conflict between fnstsw and fldenv there.