on the Variant II code, however arm64 uses Variant I. The former placed the
thread pointer after the data, pointing at the thread control block, while
the latter places these before said data.
Because of this we need to use the size of the previous entry to calculate
where to place the current entry. We also need to reserve 16 bytes at the
start for the thread control block.
This also fixes the value of TLS_TCB_SIZE to be correct. This is the size
of two unsigned longs, i.e. 2 * 8 bytes.
While here remove the bogus adjustment of the pointer in the
R_AARCH64_TLS_TPREL64 case. It should be the offset of the data relative
to the thread pointer, including the thread control block.
Sponsored by: ABT Systems Ltd