Document the semantics of atomic_thread_fence operations.

Add atomic_load_<type> and atomic_store_<type>, and explain why they exist. Define the synchronizes-with relationship and its effects. Reorder and revise some of the existing text. For example, more precisely describe when ordinary accesses are atomic.

Reviewed by: jhb, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13522
2017-12-19 17:07:50 +00:00 · 2017-12-19 17:07:50 +00:00 · 22fd1b5dc6
commit 22fd1b5dc6
parent d6716aa2af
1 changed files with 116 additions and 60 deletions
--- a/share/man/man9/atomic.9
+++ b/share/man/man9/atomic.9
@@ -23,7 +23,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd March 23, 2017
+.Dd December 19, 2017
 .Dt ATOMIC 9
 .Os
 .Sh NAME
@@ -36,7 +36,8 @@
 .Nm atomic_readandclear ,
 .Nm atomic_set ,
 .Nm atomic_subtract ,
-.Nm atomic_store
+.Nm atomic_store ,
+.Nm atomic_thread_fence
 .Nd atomic operations
 .Sh SYNOPSIS
 .In sys/types.h
@@ -60,7 +61,7 @@
 .Ft <type>
 .Fn atomic_fetchadd_<type> "volatile <type> *p" "<type> v"
 .Ft <type>
-.Fn atomic_load_acq_<type> "volatile <type> *p"
+.Fn atomic_load_[acq_]<type> "volatile <type> *p"
 .Ft <type>
 .Fn atomic_readandclear_<type> "volatile <type> *p"
 .Ft void
@@ -68,19 +69,33 @@
 .Ft void
 .Fn atomic_subtract_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
 .Ft void
-.Fn atomic_store_rel_<type> "volatile <type> *p" "<type> v"
+.Fn atomic_store_[rel_]<type> "volatile <type> *p" "<type> v"
 .Ft <type>
 .Fn atomic_swap_<type> "volatile <type> *p" "<type> v"
 .Ft int
 .Fn atomic_testandclear_<type> "volatile <type> *p" "u_int v"
 .Ft int
 .Fn atomic_testandset_<type> "volatile <type> *p" "u_int v"
+.Ft void
+.Fn atomic_thread_fence_[acq|acq_rel|rel|seq_cst] "void"
 .Sh DESCRIPTION
-All of these operations are performed atomically across multiple
-threads and in the presence of interrupts, meaning that they are               
-performed in an indivisible manner from the perspective of concurrently
+Atomic operations are commonly used to implement reference counts and as
+building blocks for synchronization primitives, such as mutexes.
+.Pp
+All of these operations are performed
+.Em atomically
+across multiple threads and in the presence of interrupts, meaning that they
+are performed in an indivisible manner from the perspective of concurrently
 running threads and interrupt handlers.
 .Pp
+On all architectures supported by
+.Fx ,
+ordinary loads and stores of integers in cache-coherent memory are
+inherently atomic if the integer is naturally aligned and its size does not
+exceed the processor's word size.
+However, such loads and stores may be elided from the program by
+the compiler, whereas atomic operations are always performed.
+.Pp
 When atomic operations are performed on cache-coherent memory, all
 operations on the same location are totally ordered.
 .Pp
@@ -93,29 +108,16 @@ interrupt handler will observe a
 .Em torn write ,
 or partial modification of the location.
 .Pp
-On all architectures supported by
-.Fx ,
-ordinary loads and stores of naturally aligned integer types
-are atomic, as executed by the processor.
-.Pp
-Atomic operations can be used to implement reference counts or as
-building blocks for synchronization primitives such as mutexes.
-.Pp
-The semantics of
-.Fx Ns 's
-atomic operations are almost identical to those of the similarly named
-C11 operations.
-The one important difference is that the C11 standard does not
-require ordinary loads and stores to ever be atomic.
-This is is why the
-.Fn atomic_load_explicit memory_order_relaxed
-operation exists in the C11 standard, but is not provided by
-.In machine/atomic.h .
+Except as noted below, the semantics of these operations are almost
+identical to the semantics of similarly named C11 atomic operations.
 .Ss Types
-Each atomic operation operates on a specific
+Most atomic operations act upon a specific
 .Fa type .
-The type to use is indicated in the function name.
-The available types that can be used are:
+That type is indicated in the function name.
+In contrast to C11 atomic operations,
+.Fx Ns 's
+atomic operations are performed on ordinary integer types.
+The available types are:
 .Pp
 .Bl -tag -offset indent -width short -compact
 .It Li int
@@ -147,8 +149,7 @@ unsigned 8-bit integer
 unsigned 16-bit integer
 .El
 .Pp
-These must not be used in MI code because the instructions to implement them
-efficiently might not be available.
+These types must not be used in machine-independent code.
 .Ss Acquire and Release Operations
 By default, a thread's accesses to different memory locations might not be
 performed in
@@ -167,52 +168,64 @@ Moreover, in some cases, such as the implementation of synchronization between
 threads, arbitrary reordering might result in the incorrect execution of the
 program.
 To constrain the reordering that both the compiler and processor might perform
-on a thread's accesses, the thread should use atomic operations with
+on a thread's accesses, a programmer can use atomic operations with
 .Em acquire
 and
 .Em release
 semantics.
 .Pp
-Most of the atomic operations on memory have three variants.
+Atomic operations on memory have up to three variants.
 The first variant performs the operation without imposing any ordering
 constraints on memory accesses to other locations.
 The second variant has acquire semantics, and the third variant has release
 semantics.
-In effect, operations with acquire and release semantics establish one-way
-barriers to reordering.
 .Pp
-When an atomic operation has acquire semantics, the effects of the operation
-must have completed before any subsequent load or store (by program order) is
+When an atomic operation has acquire semantics, the operation must have
+completed before any subsequent load or store (by program order) is
 performed.
 Conversely, acquire semantics do not require that prior loads or stores have
 completed before the atomic operation is performed.
+An atomic operation can only have acquire semantics if it performs a load
+from memory.
 To denote acquire semantics, the suffix
 .Dq Li _acq
 is inserted into the function name immediately prior to the
 .Dq Li _ Ns Aq Fa type
 suffix.
-For example, to subtract two integers ensuring that subsequent loads and
-stores happen after the subtraction is performed, use
+For example, to subtract two integers ensuring that the subtraction is
+completed before any subsequent loads and stores are performed, use
 .Fn atomic_subtract_acq_int .
 .Pp
-When an atomic operation has release semantics, the effects of all prior
-loads or stores (by program order) must have completed before the operation
-is performed.
-Conversely, release semantics do not require that the effects of the
-atomic operation must have completed before any subsequent load or store is
-performed.
+When an atomic operation has release semantics, all prior loads or stores
+(by program order) must have completed before the operation is performed.
+Conversely, release semantics do not require that the atomic operation must
+have completed before any subsequent load or store is performed.
+An atomic operation can only have release semantics if it performs a store
+to memory.
 To denote release semantics, the suffix
 .Dq Li _rel
 is inserted into the function name immediately prior to the
 .Dq Li _ Ns Aq Fa type
 suffix.
 For example, to add two long integers ensuring that all prior loads and
-stores happen before the addition, use
+stores are completed before the addition is performed, use
 .Fn atomic_add_rel_long .
 .Pp
-The one-way barriers provided by acquire and release operations allow the
-implementations of common synchronization primitives to express their
-ordering requirements without also imposing unnecessary ordering.
+When a release operation by one thread
+.Em synchronizes with
+an acquire operation by another thread, usually meaning that the acquire
+operation reads the value written by the release operation, then the effects
+of all prior stores by the releasing thread must become visible to
+subsequent loads by the acquiring thread.
+Moreover, the effects of all stores (by other threads) that were visible to
+the releasing thread must also become visible to the acquiring thread.
+These rules only apply to the synchronizing threads.
+Other threads might observe these stores in a different order.
+.Pp
+In effect, atomic operations with acquire and release semantics establish
+one-way barriers to reordering that enable the implementations of
+synchronization primitives to express their ordering requirements without
+also imposing unnecessary ordering.
 For example, for a critical section guarded by a mutex, an acquire operation
 when the mutex is locked and a release operation when the mutex is unlocked
 will prevent any loads or stores from moving outside of the critical
@@ -220,6 +233,61 @@ section.
 However, they will not prevent the compiler or processor from moving loads
 or stores into the critical section, which does not violate the semantics of
 a mutex.
+.Ss Thread Fence Operations
+Alternatively, a programmer can use atomic thread fence operations to
+constrain the reordering of accesses.
+In contrast to other atomic operations, fences do not, themselves, access
+memory.
+.Pp
+When a fence has acquire semantics, all prior loads (by program order) must
+have completed before any subsequent load or store is performed.
+Thus, an acquire fence is a two-way barrier for load operations.
+To denote acquire semantics, the suffix
+.Dq Li _acq
+is appended to the function name, for example,
+.Fn atomic_thread_fence_acq .
+.Pp
+When a fence has release semantics, all prior loads or stores (by program
+order) must have completed before any subsequent store operation is
+performed.
+Thus, a release fence is a two-way barrier for store operations.
+To denote release semantics, the suffix
+.Dq Li _rel
+is appended to the function name, for example,
+.Fn atomic_thread_fence_rel .
+.Pp
+Although
+.Fn atomic_thread_fence_acq_rel
+implements both acquire and release semantics, it is not a full barrier.
+For example, a store prior to the fence (in program order) may be completed
+after a load subsequent to the fence.
+In contrast,
+.Fn atomic_thread_fence_seq_cst
+implements a full barrier.
+Neither loads nor stores may cross this barrier in either direction.
+.Pp
+In C11, a release fence by one thread synchronizes with an acquire fence by
+another thread when an atomic load that is prior to the acquire fence (by
+program order) reads the value written by an atomic store that is subsequent
+to the release fence.
+In contrast, in FreeBSD, because of the atomicity of ordinary, naturally
+aligned loads and stores, fences can also be synchronized by ordinary loads
+and stores.
+This simplifies the implementation and use of some synchronization
+primitives in
+.Fx .
+.Pp
+Since neither a compiler nor a processor can foresee which (atomic) load
+will read the value written by an (atomic) store, the ordering constraints
+imposed by fences must be more restrictive than acquire loads and release
+stores.
+Essentially, this is why fences are two-way barriers.
+.Pp
+Although fences impose more restrictive ordering than acquire loads and
+release stores, by separating access from ordering, they can sometimes
+facilitate more efficient implementations of synchronization primitives.
+For example, they can be used to avoid executing a memory barrier until a
+memory access shows that some condition is satisfied.
 .Ss Multiple Processors
 In multiprocessor systems, the atomicity of the atomic operations on memory
 depends on support for cache coherence in the underlying architecture.
@@ -326,12 +394,6 @@ and do not have any variants with memory barriers at this time.
 .Bd -literal -compact
 return (*p);
 .Ed
-.El
-.Pp
-The
-.Fn atomic_load
-functions are only provided with acquire memory barriers.
-.Bl -hang
 .It Fn atomic_readandclear p
 .Bd -literal -compact
 tmp = *p;
@@ -363,12 +425,6 @@ and do not have any variants with memory barriers at this time.
 .Bd -literal -compact
 *p = v;
 .Ed
-.El
-.Pp
-The
-.Fn atomic_store
-functions are only provided with release memory barriers.
-.Bl -hang
 .It Fn atomic_swap p v
 .Bd -literal -compact
 tmp = *p;