Document the semantics of atomic_thread_fence operations.
Add atomic_load_<type> and atomic_store_<type>, and explain why they exist.
Define the synchronizes-with relationship and its effects.
Reorder and revise some of the existing text. For example, more precisely describe when ordinary accesses are atomic.

Reviewed by: jhb, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13522
parent d6716aa2af
commit 22fd1b5dc6
@ -23,7 +23,7 @@
|
||||
.\"
|
||||
.\" $FreeBSD$
|
||||
.\"
|
||||
.Dd March 23, 2017
|
||||
.Dd December 19, 2017
|
||||
.Dt ATOMIC 9
|
||||
.Os
|
||||
.Sh NAME
|
||||
@ -36,7 +36,8 @@
|
||||
.Nm atomic_readandclear ,
|
||||
.Nm atomic_set ,
|
||||
.Nm atomic_subtract ,
|
||||
.Nm atomic_store
|
||||
.Nm atomic_store ,
|
||||
.Nm atomic_thread_fence
|
||||
.Nd atomic operations
|
||||
.Sh SYNOPSIS
|
||||
.In sys/types.h
|
||||
@ -60,7 +61,7 @@
|
||||
.Ft <type>
|
||||
.Fn atomic_fetchadd_<type> "volatile <type> *p" "<type> v"
|
||||
.Ft <type>
|
||||
.Fn atomic_load_acq_<type> "volatile <type> *p"
|
||||
.Fn atomic_load_[acq_]<type> "volatile <type> *p"
|
||||
.Ft <type>
|
||||
.Fn atomic_readandclear_<type> "volatile <type> *p"
|
||||
.Ft void
|
||||
@ -68,19 +69,33 @@
|
||||
.Ft void
|
||||
.Fn atomic_subtract_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
|
||||
.Ft void
|
||||
.Fn atomic_store_rel_<type> "volatile <type> *p" "<type> v"
|
||||
.Fn atomic_store_[rel_]<type> "volatile <type> *p" "<type> v"
|
||||
.Ft <type>
|
||||
.Fn atomic_swap_<type> "volatile <type> *p" "<type> v"
|
||||
.Ft int
|
||||
.Fn atomic_testandclear_<type> "volatile <type> *p" "u_int v"
|
||||
.Ft int
|
||||
.Fn atomic_testandset_<type> "volatile <type> *p" "u_int v"
|
||||
.Ft void
|
||||
.Fn atomic_thread_fence_[acq|acq_rel|rel|seq_cst] "void"
|
||||
.Sh DESCRIPTION
|
||||
All of these operations are performed atomically across multiple
|
||||
threads and in the presence of interrupts, meaning that they are
|
||||
performed in an indivisible manner from the perspective of concurrently
|
||||
Atomic operations are commonly used to implement reference counts and as
|
||||
building blocks for synchronization primitives, such as mutexes.
|
||||
.Pp
|
||||
All of these operations are performed
|
||||
.Em atomically
|
||||
across multiple threads and in the presence of interrupts, meaning that they
|
||||
are performed in an indivisible manner from the perspective of concurrently
|
||||
running threads and interrupt handlers.
|
||||
.Pp
|
||||
On all architectures supported by
|
||||
.Fx ,
|
||||
ordinary loads and stores of integers in cache-coherent memory are
|
||||
inherently atomic if the integer is naturally aligned and its size does not
|
||||
exceed the processor's word size.
|
||||
However, such loads and stores may be elided from the program by
|
||||
the compiler, whereas atomic operations are always performed.
|
||||
.Pp
|
||||
When atomic operations are performed on cache-coherent memory, all
|
||||
operations on the same location are totally ordered.
|
||||
.Pp
|
||||
@ -93,29 +108,16 @@ interrupt handler will observe a
|
||||
.Em torn write ,
|
||||
or partial modification of the location.
|
||||
.Pp
|
||||
On all architectures supported by
|
||||
.Fx ,
|
||||
ordinary loads and stores of naturally aligned integer types
|
||||
are atomic, as executed by the processor.
|
||||
.Pp
|
||||
Atomic operations can be used to implement reference counts or as
|
||||
building blocks for synchronization primitives such as mutexes.
|
||||
.Pp
|
||||
The semantics of
|
||||
.Fx Ns 's
|
||||
atomic operations are almost identical to those of the similarly named
|
||||
C11 operations.
|
||||
The one important difference is that the C11 standard does not
|
||||
require ordinary loads and stores to ever be atomic.
|
||||
This is why the
|
||||
.Fn atomic_load_explicit memory_order_relaxed
|
||||
operation exists in the C11 standard, but is not provided by
|
||||
.In machine/atomic.h .
|
||||
Except as noted below, the semantics of these operations are almost
|
||||
identical to the semantics of similarly named C11 atomic operations.
|
||||
.Ss Types
|
||||
Each atomic operation operates on a specific
|
||||
Most atomic operations act upon a specific
|
||||
.Fa type .
|
||||
The type to use is indicated in the function name.
|
||||
The available types that can be used are:
|
||||
That type is indicated in the function name.
|
||||
In contrast to C11 atomic operations,
|
||||
.Fx Ns 's
|
||||
atomic operations are performed on ordinary integer types.
|
||||
The available types are:
|
||||
.Pp
|
||||
.Bl -tag -offset indent -width short -compact
|
||||
.It Li int
|
||||
@ -147,8 +149,7 @@ unsigned 8-bit integer
|
||||
unsigned 16-bit integer
|
||||
.El
|
||||
.Pp
|
||||
These must not be used in MI code because the instructions to implement them
|
||||
efficiently might not be available.
|
||||
These types must not be used in machine-independent code.
|
||||
.Ss Acquire and Release Operations
|
||||
By default, a thread's accesses to different memory locations might not be
|
||||
performed in
|
||||
@ -167,52 +168,64 @@ Moreover, in some cases, such as the implementation of synchronization between
|
||||
threads, arbitrary reordering might result in the incorrect execution of the
|
||||
program.
|
||||
To constrain the reordering that both the compiler and processor might perform
|
||||
on a thread's accesses, the thread should use atomic operations with
|
||||
on a thread's accesses, a programmer can use atomic operations with
|
||||
.Em acquire
|
||||
and
|
||||
.Em release
|
||||
semantics.
|
||||
.Pp
|
||||
Most of the atomic operations on memory have three variants.
|
||||
Atomic operations on memory have up to three variants.
|
||||
The first variant performs the operation without imposing any ordering
|
||||
constraints on memory accesses to other locations.
|
||||
The second variant has acquire semantics, and the third variant has release
|
||||
semantics.
|
||||
In effect, operations with acquire and release semantics establish one-way
|
||||
barriers to reordering.
|
||||
.Pp
|
||||
When an atomic operation has acquire semantics, the effects of the operation
|
||||
must have completed before any subsequent load or store (by program order) is
|
||||
When an atomic operation has acquire semantics, the operation must have
|
||||
completed before any subsequent load or store (by program order) is
|
||||
performed.
|
||||
Conversely, acquire semantics do not require that prior loads or stores have
|
||||
completed before the atomic operation is performed.
|
||||
An atomic operation can only have acquire semantics if it performs a load
|
||||
from memory.
|
||||
To denote acquire semantics, the suffix
|
||||
.Dq Li _acq
|
||||
is inserted into the function name immediately prior to the
|
||||
.Dq Li _ Ns Aq Fa type
|
||||
suffix.
|
||||
For example, to subtract two integers ensuring that subsequent loads and
|
||||
stores happen after the subtraction is performed, use
|
||||
For example, to subtract two integers ensuring that the subtraction is
|
||||
completed before any subsequent loads and stores are performed, use
|
||||
.Fn atomic_subtract_acq_int .
|
||||
.Pp
|
||||
When an atomic operation has release semantics, the effects of all prior
|
||||
loads or stores (by program order) must have completed before the operation
|
||||
is performed.
|
||||
Conversely, release semantics do not require that the effects of the
|
||||
atomic operation must have completed before any subsequent load or store is
|
||||
performed.
|
||||
When an atomic operation has release semantics, all prior loads or stores
|
||||
(by program order) must have completed before the operation is performed.
|
||||
Conversely, release semantics do not require that the atomic operation must
|
||||
have completed before any subsequent load or store is performed.
|
||||
An atomic operation can only have release semantics if it performs a store
|
||||
to memory.
|
||||
To denote release semantics, the suffix
|
||||
.Dq Li _rel
|
||||
is inserted into the function name immediately prior to the
|
||||
.Dq Li _ Ns Aq Fa type
|
||||
suffix.
|
||||
For example, to add two long integers ensuring that all prior loads and
|
||||
stores happen before the addition, use
|
||||
stores are completed before the addition is performed, use
|
||||
.Fn atomic_add_rel_long .
|
||||
.Pp
|
||||
The one-way barriers provided by acquire and release operations allow the
|
||||
implementations of common synchronization primitives to express their
|
||||
ordering requirements without also imposing unnecessary ordering.
|
||||
When a release operation by one thread
|
||||
.Em synchronizes with
|
||||
an acquire operation by another thread, usually meaning that the acquire
|
||||
operation reads the value written by the release operation, then the effects
|
||||
of all prior stores by the releasing thread must become visible to
|
||||
subsequent loads by the acquiring thread.
|
||||
Moreover, the effects of all stores (by other threads) that were visible to
|
||||
the releasing thread must also become visible to the acquiring thread.
|
||||
These rules only apply to the synchronizing threads.
|
||||
Other threads might observe these stores in a different order.
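
The synchronizes-with rule described in this hunk can be illustrated with a minimal message-passing sketch. It is not taken from the commit: it uses C11 <stdatomic.h> as a userland stand-in for the kernel-only machine/atomic.h, and the names payload, ready, publish and try_consume are hypothetical. In kernel code the release store and acquire load would presumably be written with atomic_store_rel_int and atomic_load_acq_int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: an ordinary payload guarded by an atomic flag. */
static int payload;
static atomic_int ready;

/* Producer: write the payload, then publish it with a release store. */
static void
publish(int value)
{
    payload = value;    /* ordinary store, ordered before the release */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Consumer: the acquire load synchronizes with the release store only
 * when it reads the value 1 written by publish().  Only then is the
 * payload guaranteed to be visible.
 */
static bool
try_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return (false);
    *out = payload;     /* ordinary load, ordered after the acquire */
    return (true);
}
```

In real use publish() and try_consume() would run on different threads. A third thread that reads payload without performing the acquire load gets no ordering guarantee from this pairing.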
|
||||
.Pp
|
||||
In effect, atomic operations with acquire and release semantics establish
|
||||
one-way barriers to reordering that enable the implementations of
|
||||
synchronization primitives to express their ordering requirements without
|
||||
also imposing unnecessary ordering.
|
||||
For example, for a critical section guarded by a mutex, an acquire operation
|
||||
when the mutex is locked and a release operation when the mutex is unlocked
|
||||
will prevent any loads or stores from moving outside of the critical
|
||||
@ -220,6 +233,61 @@ section.
|
||||
However, they will not prevent the compiler or processor from moving loads
|
||||
or stores into the critical section, which does not violate the semantics of
|
||||
a mutex.
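
The mutex example above can be sketched as a toy spin lock. This is an illustration under stated assumptions, not how FreeBSD mutexes are implemented: it uses C11 <stdatomic.h> in place of machine/atomic.h, and struct spin and the spin_* and locked_increment names are hypothetical. The acquire compare-and-exchange plays the role that atomic_cmpset_acq_int would play in kernel code, and the release store plays the role of atomic_store_rel_int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: 0 means unlocked, 1 means locked. */
struct spin {
    atomic_int locked;
};

/* Acquire on success keeps critical-section accesses from moving above the lock. */
static bool
spin_trylock(struct spin *s)
{
    int expected = 0;

    return (atomic_compare_exchange_strong_explicit(&s->locked, &expected,
        1, memory_order_acquire, memory_order_relaxed));
}

static void
spin_lock(struct spin *s)
{
    while (!spin_trylock(s))
        ;   /* busy-wait until the lock is free */
}

/* Release keeps critical-section accesses from moving below the unlock. */
static void
spin_unlock(struct spin *s)
{
    assert(atomic_load_explicit(&s->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&s->locked, 0, memory_order_release);
}

/* Example critical section: increment a counter protected by the lock. */
static int
locked_increment(struct spin *s, int *counter)
{
    int v;

    spin_lock(s);
    v = ++*counter;     /* cannot be reordered outside lock/unlock */
    spin_unlock(s);
    return (v);
}
```

Accesses that precede spin_lock() or follow spin_unlock() in program order may still be moved into the critical section, which is harmless for mutual exclusion.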
|
||||
.Ss Thread Fence Operations
|
||||
Alternatively, a programmer can use atomic thread fence operations to
|
||||
constrain the reordering of accesses.
|
||||
In contrast to other atomic operations, fences do not, themselves, access
|
||||
memory.
|
||||
.Pp
|
||||
When a fence has acquire semantics, all prior loads (by program order) must
|
||||
have completed before any subsequent load or store is performed.
|
||||
Thus, an acquire fence is a two-way barrier for load operations.
|
||||
To denote acquire semantics, the suffix
|
||||
.Dq Li _acq
|
||||
is appended to the function name, for example,
|
||||
.Fn atomic_thread_fence_acq .
|
||||
.Pp
|
||||
When a fence has release semantics, all prior loads or stores (by program
|
||||
order) must have completed before any subsequent store operation is
|
||||
performed.
|
||||
Thus, a release fence is a two-way barrier for store operations.
|
||||
To denote release semantics, the suffix
|
||||
.Dq Li _rel
|
||||
is appended to the function name, for example,
|
||||
.Fn atomic_thread_fence_rel .
|
||||
.Pp
|
||||
Although
|
||||
.Fn atomic_thread_fence_acq_rel
|
||||
implements both acquire and release semantics, it is not a full barrier.
|
||||
For example, a store prior to the fence (in program order) may be completed
|
||||
after a load subsequent to the fence.
|
||||
In contrast,
|
||||
.Fn atomic_thread_fence_seq_cst
|
||||
implements a full barrier.
|
||||
Neither loads nor stores may cross this barrier in either direction.
|
||||
.Pp
|
||||
In C11, a release fence by one thread synchronizes with an acquire fence by
|
||||
another thread when an atomic load that is prior to the acquire fence (by
|
||||
program order) reads the value written by an atomic store that is subsequent
|
||||
to the release fence.
|
||||
In contrast, in FreeBSD, because of the atomicity of ordinary, naturally
|
||||
aligned loads and stores, fences can also be synchronized by ordinary loads
|
||||
and stores.
|
||||
This simplifies the implementation and use of some synchronization
|
||||
primitives in
|
||||
.Fx .
|
||||
.Pp
|
||||
Since neither a compiler nor a processor can foresee which (atomic) load
|
||||
will read the value written by an (atomic) store, the ordering constraints
|
||||
imposed by fences must be more restrictive than acquire loads and release
|
||||
stores.
|
||||
Essentially, this is why fences are two-way barriers.
|
||||
.Pp
|
||||
Although fences impose more restrictive ordering than acquire loads and
|
||||
release stores, by separating access from ordering, they can sometimes
|
||||
facilitate more efficient implementations of synchronization primitives.
|
||||
For example, they can be used to avoid executing a memory barrier until a
|
||||
memory access shows that some condition is satisfied.
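
The idea of deferring the barrier until a condition is observed can be sketched as follows. This is a hedged illustration using C11 <stdatomic.h> rather than machine/atomic.h; the names data, flag, fence_publish and fence_try_consume are hypothetical. The fences correspond to atomic_thread_fence_rel and atomic_thread_fence_acq, and the relaxed accesses to the plain atomic_store_<type> and atomic_load_<type> variants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state for the sketch. */
static int data;
static atomic_int flag;

/* Producer: the release fence orders the data store before the flag store. */
static void
fence_publish(int value)
{
    data = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/*
 * Consumer: poll with a relaxed load and execute the acquire fence only
 * after the flag has been observed, so the failure path pays no barrier.
 */
static bool
fence_try_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return (false);
    atomic_thread_fence(memory_order_acquire);
    *out = data;
    return (true);
}
```

Compared with an acquire load on every poll, this form issues the barrier once, after the relaxed load shows that the flag is set.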
|
||||
.Ss Multiple Processors
|
||||
In multiprocessor systems, the atomicity of the atomic operations on memory
|
||||
depends on support for cache coherence in the underlying architecture.
|
||||
@ -326,12 +394,6 @@ and do not have any variants with memory barriers at this time.
|
||||
.Bd -literal -compact
|
||||
return (*p);
|
||||
.Ed
|
||||
.El
|
||||
.Pp
|
||||
The
|
||||
.Fn atomic_load
|
||||
functions are only provided with acquire memory barriers.
|
||||
.Bl -hang
|
||||
.It Fn atomic_readandclear p
|
||||
.Bd -literal -compact
|
||||
tmp = *p;
|
||||
@ -363,12 +425,6 @@ and do not have any variants with memory barriers at this time.
|
||||
.Bd -literal -compact
|
||||
*p = v;
|
||||
.Ed
|
||||
.El
|
||||
.Pp
|
||||
The
|
||||
.Fn atomic_store
|
||||
functions are only provided with release memory barriers.
|
||||
.Bl -hang
|
||||
.It Fn atomic_swap p v
|
||||
.Bd -literal -compact
|
||||
tmp = *p;
|
||||