UTS with the stack correctly aligned. Also, while here, use an indirect
jump rather than the pushq/ret hack.
This fixes threaded apps that use floating point for me, although
it hasn't solved all the problems. It is an improvement though.
Preservation of the 128 byte red zone hasn't been resolved yet.
Approved by: re (scottl)