correct stack alignment, however when we have a leaf function that uses
thread local storage it calls __aeabi_read_tp to get the thread pointer.
Neither GCC or clang see this as a function call so will align the stack
to a 4-byte boundary. This may be a problem as _rtld_bind expects to be
on an 8-byte boundary.
The solution is to store a copy of the stack pointer and force the
alignment before calling _rtld_bind.
This fixes a problem with armeb where applications would crash in odd ways.
It should also remove the need for a local patch to clang to force the
stack alignment to an 8-byte boundary, even for leaf functions. Further
testing will be needed before reverting this local change to clang as we
may rely on it in other places.
Reviewed by: jmg@