Don't force 16-byte alignment at run-time. Do it at compile-time.
This saves us the pointer fiddling by the setjmp functions and
reduces complexity. While here, increase the jmp_buf by 16 bytes
to an even 512 bytes. Coincidentally, due to the way alignment
was handled prior to this change, the jmp_buf has not changed in
size, but only in how the space is used. Prior to this change
the 16 bytes were reserved for enforcing alignment; now they are
reserved by us for future extensions.
Therefore, this ABI breaker is relatively save: the failure is
always an alignment trap.