1. fast simple type mutex.
2. __thread tls works.
3. asynchronous cancellation works ( using signal ).
4. thread synchronization is fully based on umtx, mainly, condition
variable and other synchronization objects were rewritten by using
umtx directly. those objects can be shared between processes via
shared memory, it has to change ABI which does not happen yet.
5. default stack size is increased to 1M on 32 bits platform, 2M for
64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.
Okayed by: jeff, mtm, rwatson, scottl
Note that the tp register (r13) is reserved as the TLS pointer in
the same way that that gp register (r1) is reserved as the global
pointer. This implementation uses the tp register to point to the
thread structure used by the threads implementation. This is not
in violation with the runtime specification provided the TLS is
a fixed distance from the thread structure. This is only an issue
when code used the __thread keyword to create TLS. This is not
supported at the moment.