for TLS microbenchmark using global-dynamic TLS model on amd64 (which is
default for PIC dso objects).
Split the slow path into tls_get_addr_slow(), for which inlining is
disabled. This prevents the registers spill on tls_get_addr_common()
entry.
Provide static branch hint to the compiler, indicating that slow path
is not likely to be taken.
While there, do some minimal style adjustments.
Reported and tested by: davidxu
MFC after: 1 week