UTS with the stack correctly aligned. Also, while here, use an indirect
jump rather than the pushq/ret hack.
This fixes threaded apps that use floating point for me, although
it hasn't solved all the problems. It is an improvement though.
Preservation of the 128 byte red zone hasn't been resolved yet.
Approved by: re (scottl)
about the fpu code here. It should be using fxsave/fxrstor instead of
saving/restoring the control word. The SSE registers are used a lot in
gcc generated code on amd64. I'm not sure how this all fits together
though.
pthread_md.h. This commit only moves the definition; it does not
change it for any of the platforms. This more easily allows 64-bit
architectures (in particular) to pick a slightly larger stack size.
state for amd64 was twice as large as necessary. Peter
recently fixed this, so the comment no longer applies.
Also, since the size of struct mcontext changed, adjust
the threads library version of get&set context to match.
FYI, any change layout/size change to any arch's struct
mcontext will likely need some minor changes in libpthread.
instead of long types for low-level locks.
Add prototypes for some internal libc functions that are
wrapped by the library as cancellation points.
Add memory barriers to alpha atomic swap functions (submitted
by davidxu).
Requested by: bde
archs that can (or are required to) have per-thread registers.
Tested on i386, amd64; marcel is testing on ia64 and will
have some follow-up commits.
Reviewed by: davidxu