cff0a327b8
operation as a write barrier. That description has never been correct, and it has caused confusion. An acquire operation orders writes as well as reads, and a release operation orders reads as well as writes. Also, explicitly say that a thread doesn't see its own accesses being reordered. The reordering of a thread's accesses is only (potentially) visible to another thread. Thus, memory barriers need only be used to control the ordering of accesses between threads, not within a thread. Reviewed by: bde, kib Discussed with: jhb MFC after: 1 week
428 lines
11 KiB
Groff
428 lines
11 KiB
Groff
.\" Copyright (c) 2000-2001 John H. Baldwin <jhb@FreeBSD.org>
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR
|
|
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
|
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
|
|
.\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT,
|
|
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
|
|
.\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
|
.\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
|
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
|
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
|
|
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
.\"
|
|
.\" $FreeBSD$
|
|
.\"
|
|
.Dd August 14, 2015
|
|
.Dt ATOMIC 9
|
|
.Os
|
|
.Sh NAME
|
|
.Nm atomic_add ,
|
|
.Nm atomic_clear ,
|
|
.Nm atomic_cmpset ,
|
|
.Nm atomic_fetchadd ,
|
|
.Nm atomic_load ,
|
|
.Nm atomic_readandclear ,
|
|
.Nm atomic_set ,
|
|
.Nm atomic_subtract ,
|
|
.Nm atomic_store
|
|
.Nd atomic operations
|
|
.Sh SYNOPSIS
|
|
.In sys/types.h
|
|
.In machine/atomic.h
|
|
.Ft void
|
|
.Fn atomic_add_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
|
|
.Ft void
|
|
.Fn atomic_clear_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
|
|
.Ft int
|
|
.Fo atomic_cmpset_[acq_|rel_]<type>
|
|
.Fa "volatile <type> *dst"
|
|
.Fa "<type> old"
|
|
.Fa "<type> new"
|
|
.Fc
|
|
.Ft <type>
|
|
.Fn atomic_fetchadd_<type> "volatile <type> *p" "<type> v"
|
|
.Ft <type>
|
|
.Fn atomic_load_acq_<type> "volatile <type> *p"
|
|
.Ft <type>
|
|
.Fn atomic_readandclear_<type> "volatile <type> *p"
|
|
.Ft void
|
|
.Fn atomic_set_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
|
|
.Ft void
|
|
.Fn atomic_subtract_[acq_|rel_]<type> "volatile <type> *p" "<type> v"
|
|
.Ft void
|
|
.Fn atomic_store_rel_<type> "volatile <type> *p" "<type> v"
|
|
.Ft <type>
|
|
.Fn atomic_swap_<type> "volatile <type> *p" "<type> v"
|
|
.Ft int
|
|
.Fn atomic_testandset_<type> "volatile <type> *p" "u_int v"
|
|
.Sh DESCRIPTION
|
|
Each of the atomic operations is guaranteed to be atomic across multiple
|
|
threads and in the presence of interrupts.
|
|
They can be used to implement reference counts or as building blocks for more
|
|
advanced synchronization primitives such as mutexes.
|
|
.Ss Types
|
|
Each atomic operation operates on a specific
|
|
.Fa type .
|
|
The type to use is indicated in the function name.
|
|
The available types that can be used are:
|
|
.Pp
|
|
.Bl -tag -offset indent -width short -compact
|
|
.It Li int
|
|
unsigned integer
|
|
.It Li long
|
|
unsigned long integer
|
|
.It Li ptr
|
|
unsigned integer the size of a pointer
|
|
.It Li 32
|
|
unsigned 32-bit integer
|
|
.It Li 64
|
|
unsigned 64-bit integer
|
|
.El
|
|
.Pp
|
|
For example, the function to atomically add two integers is called
|
|
.Fn atomic_add_int .
|
|
.Pp
|
|
Certain architectures also provide operations for types smaller than
|
|
.Dq Li int .
|
|
.Pp
|
|
.Bl -tag -offset indent -width short -compact
|
|
.It Li char
|
|
unsigned character
|
|
.It Li short
|
|
unsigned short integer
|
|
.It Li 8
|
|
unsigned 8-bit integer
|
|
.It Li 16
|
|
unsigned 16-bit integer
|
|
.El
|
|
.Pp
|
|
These must not be used in MI code because the instructions to implement them
|
|
efficiently might not be available.
|
|
.Ss Acquire and Release Operations
|
|
By default, a thread's accesses to different memory locations might not be
|
|
performed in
|
|
.Em program order ,
|
|
that is, the order in which the accesses appear in the source code.
|
|
To optimize the program's execution, both the compiler and processor might
|
|
reorder the thread's accesses.
|
|
However, both ensure that their reordering of the accesses is not visible to
|
|
the thread.
|
|
Otherwise, the traditional memory model that is expected by single-threaded
|
|
programs would be violated.
|
|
Nonetheless, other threads in a multithreaded program, such as the
|
|
.Fx
|
|
kernel, might observe the reordering.
|
|
Moreover, in some cases, such as the implementation of synchronization between
|
|
threads, arbitrary reordering might result in the incorrect execution of the
|
|
program.
|
|
To constrain the reordering that both the compiler and processor might perform
|
|
on a thread's accesses, the thread should use atomic operations with
|
|
.Em acquire
|
|
and
|
|
.Em release
|
|
semantics.
|
|
.Pp
|
|
Most of the atomic operations on memory have three variants.
|
|
The first variant performs the operation without imposing any ordering
|
|
constraints on memory accesses to other locations.
|
|
The second variant has acquire semantics, and the third variant has release
|
|
semantics.
|
|
In effect, operations with acquire and release semantics establish one-way
|
|
barriers to reordering.
|
|
.Pp
|
|
When an atomic operation has acquire semantics, the effects of the operation
|
|
must have completed before any subsequent load or store (by program order) is
|
|
performed.
|
|
Conversely, acquire semantics do not require that prior loads or stores have
|
|
completed before the atomic operation is performed.
|
|
To denote acquire semantics, the suffix
|
|
.Dq Li _acq
|
|
is inserted into the function name immediately prior to the
|
|
.Dq Li _ Ns Aq Fa type
|
|
suffix.
|
|
For example, to subtract two integers ensuring that subsequent loads and
|
|
stores happen after the subtraction is performed, use
|
|
.Fn atomic_subtract_acq_int .
|
|
.Pp
|
|
When an atomic operation has release semantics, the effects of all prior
|
|
loads or stores (by program order) must have completed before the operation
|
|
is performed.
|
|
Conversely, release semantics do not require that the effects of the
|
|
atomic operation must have completed before any subsequent load or store is
|
|
performed.
|
|
To denote release semantics, the suffix
|
|
.Dq Li _rel
|
|
is inserted into the function name immediately prior to the
|
|
.Dq Li _ Ns Aq Fa type
|
|
suffix.
|
|
For example, to add two long integers ensuring that all prior loads and
|
|
stores happen before the addition, use
|
|
.Fn atomic_add_rel_long .
|
|
.Pp
|
|
The one-way barriers provided by acquire and release operations allow the
|
|
implementations of common synchronization primitives to express their
|
|
ordering requirements without also imposing unnecessary ordering.
|
|
For example, for a critical section guarded by a mutex, an acquire operation
|
|
when the mutex is locked and a release operation when the mutex is unlocked
|
|
will prevent any loads or stores from moving outside of the critical
|
|
section.
|
|
However, they will not prevent the compiler or processor from moving loads
|
|
or stores into the critical section, which does not violate the semantics of
|
|
a mutex.
|
|
.Ss Multiple Processors
|
|
In multiprocessor systems, the atomicity of the atomic operations on memory
|
|
depends on support for cache coherence in the underlying architecture.
|
|
In general, cache coherence on the default memory type,
|
|
.Dv VM_MEMATTR_DEFAULT ,
|
|
is guaranteed by all architectures that are supported by
|
|
.Fx .
|
|
For example, cache coherence is guaranteed on write-back memory by the
|
|
.Tn amd64
|
|
and
|
|
.Tn i386
|
|
architectures.
|
|
However, on some architectures, cache coherence might not be enabled on all
|
|
memory types.
|
|
To determine if cache coherence is enabled for a non-default memory type,
|
|
consult the architecture's documentation.
|
|
.Ss Semantics
|
|
This section describes the semantics of each operation using a C like notation.
|
|
.Bl -hang
|
|
.It Fn atomic_add p v
|
|
.Bd -literal -compact
|
|
*p += v;
|
|
.Ed
|
|
.It Fn atomic_clear p v
|
|
.Bd -literal -compact
|
|
*p &= ~v;
|
|
.Ed
|
|
.It Fn atomic_cmpset dst old new
|
|
.Bd -literal -compact
|
|
if (*dst == old) {
|
|
*dst = new;
|
|
return (1);
|
|
} else
|
|
return (0);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_cmpset
|
|
functions are not implemented for the types
|
|
.Dq Li char ,
|
|
.Dq Li short ,
|
|
.Dq Li 8 ,
|
|
and
|
|
.Dq Li 16 .
|
|
.Bl -hang
|
|
.It Fn atomic_fetchadd p v
|
|
.Bd -literal -compact
|
|
tmp = *p;
|
|
*p += v;
|
|
return (tmp);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_fetchadd
|
|
functions are only implemented for the types
|
|
.Dq Li int ,
|
|
.Dq Li long
|
|
and
|
|
.Dq Li 32
|
|
and do not have any variants with memory barriers at this time.
|
|
.Bl -hang
|
|
.It Fn atomic_load p
|
|
.Bd -literal -compact
|
|
return (*p);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_load
|
|
functions are only provided with acquire memory barriers.
|
|
.Bl -hang
|
|
.It Fn atomic_readandclear p
|
|
.Bd -literal -compact
|
|
tmp = *p;
|
|
*p = 0;
|
|
return (tmp);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_readandclear
|
|
functions are not implemented for the types
|
|
.Dq Li char ,
|
|
.Dq Li short ,
|
|
.Dq Li ptr ,
|
|
.Dq Li 8 ,
|
|
and
|
|
.Dq Li 16
|
|
and do not have any variants with memory barriers at this time.
|
|
.Bl -hang
|
|
.It Fn atomic_set p v
|
|
.Bd -literal -compact
|
|
*p |= v;
|
|
.Ed
|
|
.It Fn atomic_subtract p v
|
|
.Bd -literal -compact
|
|
*p -= v;
|
|
.Ed
|
|
.It Fn atomic_store p v
|
|
.Bd -literal -compact
|
|
*p = v;
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_store
|
|
functions are only provided with release memory barriers.
|
|
.Bl -hang
|
|
.It Fn atomic_swap p v
|
|
.Bd -literal -compact
|
|
tmp = *p;
|
|
*p = v;
|
|
return (tmp);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_swap
|
|
functions are not implemented for the types
|
|
.Dq Li char ,
|
|
.Dq Li short ,
|
|
.Dq Li ptr ,
|
|
.Dq Li 8 ,
|
|
and
|
|
.Dq Li 16
|
|
and do not have any variants with memory barriers at this time.
|
|
.Bl -hang
|
|
.It Fn atomic_testandset p v
|
|
.Bd -literal -compact
|
|
bit = 1 << (v % (sizeof(*p) * NBBY));
|
|
tmp = (*p & bit) != 0;
|
|
*p |= bit;
|
|
return (tmp);
|
|
.Ed
|
|
.El
|
|
.Pp
|
|
The
|
|
.Fn atomic_testandset
|
|
functions are only implemented for the types
|
|
.Dq Li int ,
|
|
.Dq Li long
|
|
and
|
|
.Dq Li 32
|
|
and do not have any variants with memory barriers at this time.
|
|
.Pp
|
|
The type
|
|
.Dq Li 64
|
|
is currently not implemented for any of the atomic operations on the
|
|
.Tn arm ,
|
|
.Tn i386 ,
|
|
and
|
|
.Tn powerpc
|
|
architectures.
|
|
.Sh RETURN VALUES
|
|
The
|
|
.Fn atomic_cmpset
|
|
function returns the result of the compare operation.
|
|
The
|
|
.Fn atomic_fetchadd ,
|
|
.Fn atomic_load ,
|
|
.Fn atomic_readandclear ,
|
|
and
|
|
.Fn atomic_swap
|
|
functions return the value at the specified address.
|
|
The
|
|
.Fn atomic_testandset
|
|
function returns the result of the test operation.
|
|
.Sh EXAMPLES
|
|
This example uses the
|
|
.Fn atomic_cmpset_acq_ptr
|
|
and
|
|
.Fn atomic_set_ptr
|
|
functions to obtain a sleep mutex and handle recursion.
|
|
Since the
|
|
.Va mtx_lock
|
|
member of a
|
|
.Vt "struct mtx"
|
|
is a pointer, the
|
|
.Dq Li ptr
|
|
type is used.
|
|
.Bd -literal
|
|
/* Try to obtain mtx_lock once. */
|
|
#define _obtain_lock(mp, tid) \\
|
|
atomic_cmpset_acq_ptr(&(mp)->mtx_lock, MTX_UNOWNED, (tid))
|
|
|
|
/* Get a sleep lock, deal with recursion inline. */
|
|
#define _get_sleep_lock(mp, tid, opts, file, line) do { \\
|
|
uintptr_t _tid = (uintptr_t)(tid); \\
|
|
\\
|
|
if (!_obtain_lock(mp, tid)) { \\
|
|
if (((mp)->mtx_lock & MTX_FLAGMASK) != _tid) \\
|
|
_mtx_lock_sleep((mp), _tid, (opts), (file), (line));\\
|
|
else { \\
|
|
atomic_set_ptr(&(mp)->mtx_lock, MTX_RECURSE); \\
|
|
(mp)->mtx_recurse++; \\
|
|
} \\
|
|
} \\
|
|
} while (0)
|
|
.Ed
|
|
.Sh HISTORY
|
|
The
|
|
.Fn atomic_add ,
|
|
.Fn atomic_clear ,
|
|
.Fn atomic_set ,
|
|
and
|
|
.Fn atomic_subtract
|
|
operations were first introduced in
|
|
.Fx 3.0 .
|
|
This first set only supported the types
|
|
.Dq Li char ,
|
|
.Dq Li short ,
|
|
.Dq Li int ,
|
|
and
|
|
.Dq Li long .
|
|
The
|
|
.Fn atomic_cmpset ,
|
|
.Fn atomic_load ,
|
|
.Fn atomic_readandclear ,
|
|
and
|
|
.Fn atomic_store
|
|
operations were added in
|
|
.Fx 5.0 .
|
|
The types
|
|
.Dq Li 8 ,
|
|
.Dq Li 16 ,
|
|
.Dq Li 32 ,
|
|
.Dq Li 64 ,
|
|
and
|
|
.Dq Li ptr
|
|
and all of the acquire and release variants
|
|
were added in
|
|
.Fx 5.0
|
|
as well.
|
|
The
|
|
.Fn atomic_fetchadd
|
|
operations were added in
|
|
.Fx 6.0 .
|
|
The
|
|
.Fn atomic_swap
|
|
and
|
|
.Fn atomic_testandset
|
|
operations were added in
|
|
.Fx 10.0 .
|