The threaded rtld lock implementation is faster even in the single-threaded
case because it postpones signal handlers via THR_CRITICAL_ENTER and
THR_CRITICAL_LEAVE instead of calling sigprocmask(2).
As a result, exception handling becomes faster in single-threaded
applications linked with libthr.
Reviewed by: kib