Add support for optimized (non-generic) atomic libcalls.
For integer types of sizes 1, 2, 4 and 8, libcompiler-rt (and libgcc)
provide atomic functions that pass parameters by value and return
results directly.
libgcc and libcompiler-rt only provide optimized libcalls for
__atomic_fetch_*, as generic libcalls on non-integer types would make
little sense. This means that we can finally make __atomic_fetch_*
work
on architectures for which we don't provide these operations as
builtins
(e.g. ARM).
This should fix the dreaded "cannot compile this atomic library call
yet" error that would pop up once every while.
This should make it possible for me to get C11 atomics working on all of
our platforms.