self-consistent, there is no need in anything but compiler barrier in
the implementation of atomic_thread_fence_*() on ARMv5. Split
implementation of fences for ARMv4/5 and ARMv6; the former use
compiler barriers, the later also perform hardware barriers.
An issue which is fixed by the change is the faults from the CP15
coprocessor accesses in the user mode. This was uncovered by the
pthread_once() changes in r287556.
Reported by: Mattia Rossi <mattia.rossi.mailinglists@gmail.com>
Discussed with: alc, cognet, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week