lwsync instruction, which does not provide a Store/Load barrier. Fix
this by using the "full" sync barrier for mb().
atomic_store_rel() does not need a full barrier; change the mb() call
there to the lwsync instruction unless the CPU is affected by the known
errata (i.e. on 32-bit). Provide a powerpc_lwsync() helper to isolate
the lwsync/sync compile-time selection, and use it in atomic_store_rel()
and several other places that duplicate the code.
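
For illustration only, the compile-time selection described above could
look roughly like the sketch below. It is not the committed code: the
sketch_* names are hypothetical, GCC/Clang inline asm and __atomic
builtins are assumed, and the non-PowerPC fallback exists only so the
sketch builds on other hosts.

```c
#include <stdint.h>

/*
 * Hypothetical names (sketch_*); these are not the FreeBSD definitions.
 * Assumes a GCC- or Clang-compatible compiler.
 */

/* Full barrier: orders all access pairs, including Store/Load. */
static inline void
sketch_sync(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
	__asm__ __volatile__("sync" : : : "memory");
#else
	/* Assumption: portable stand-in so the sketch builds elsewhere. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/*
 * Lightweight barrier: lwsync on 64-bit, the heavier sync on 32-bit,
 * where some CPUs are affected by errata.
 */
static inline void
sketch_lwsync(void)
{
#if defined(__powerpc64__)
	__asm__ __volatile__("lwsync" : : : "memory");
#elif defined(__powerpc__)
	__asm__ __volatile__("sync" : : : "memory");
#else
	/* Assumption: portable stand-in so the sketch builds elsewhere. */
	__atomic_thread_fence(__ATOMIC_RELEASE);
#endif
}

/*
 * Store with release semantics: earlier loads and stores are ordered
 * before the store, which does not require Store/Load ordering.
 */
static inline void
sketch_store_rel_32(volatile uint32_t *p, uint32_t v)
{
	sketch_lwsync();
	*p = v;
}
```

A caller that needs Store/Load ordering after the store would still
have to issue the full barrier, e.g. sketch_sync(), itself.
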
Noted by: alc
Reviewed and tested by: nwhitehorn
Sponsored by: The FreeBSD Foundation