.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 24, 2010
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Fx
kernel is written to run across multiple CPUs and as such requires
several different synchronization primitives to allow developers
to safely access and manipulate the many data types required.
.Ss Mutexes
Mutexes (also called
.Dq sleep mutexes )
are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will sleep.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
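.Pp
A minimal sketch of the usual pattern follows.
The
.Vt struct foo_softc
type and the
.Fn foo_*
functions are hypothetical names used only for illustration; only the
.Fn mtx_*
calls come from
.Xr mutex 9 .
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Hypothetical driver state; the mutex is embedded in what it protects. */
struct foo_softc {
	struct mtx	sc_mtx;
	int		sc_count;	/* protected by sc_mtx */
};

static void
foo_init(struct foo_softc *sc)
{
	mtx_init(&sc->sc_mtx, "foo softc", NULL, MTX_DEF);
	sc->sc_count = 0;
}

static void
foo_bump(struct foo_softc *sc)
{
	mtx_lock(&sc->sc_mtx);
	sc->sc_count++;
	mtx_unlock(&sc->sc_mtx);
}
.Ed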
.Ss Spin mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never sleep.
Instead, they spin, waiting
for the thread holding the lock, which runs on another CPU, to release it.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts is expensive, they are also generally slower.
Spin mutexes should be used only when necessary, e.g., to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details).
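.Pp
A spin mutex is initialized with
.Dv MTX_SPIN
and is locked with the
.Fn mtx_lock_spin
and
.Fn mtx_unlock_spin
variants described in
.Xr mutex 9 .
In the following sketch,
.Va foo_spin ,
.Va foo_pending
and the
.Fn foo_*
functions are hypothetical names:
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx foo_spin;	/* hypothetical; shared with a filter */
static int foo_pending;		/* protected by foo_spin */

static void
foo_spin_setup(void)
{
	mtx_init(&foo_spin, "foo spin", NULL, MTX_SPIN);
}

static void
foo_mark_pending(void)
{
	/* Keep the critical section short; sleeping here is an error. */
	mtx_lock_spin(&foo_spin);
	foo_pending = 1;
	mtx_unlock_spin(&foo_spin);
}
.Ed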
.Ss Pool mutexes
With most synchronization primitives, such as mutexes, the programmer must
provide a piece of allocated memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
A pool mutex is a variant of a mutex without this requirement: to lock or unlock
a pool mutex, one uses the address of the structure being protected with it,
not the mutex itself.
Pool mutexes are seldom used.
.Pp
See
.Xr mtx_pool 9
for details.
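.Pp
As a sketch, assuming the system-wide
.Va mtxpool_sleep
pool described in
.Xr mtx_pool 9
is used (the
.Vt struct foo
type and
.Fn foo_ref
are hypothetical):
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Hypothetical object with no embedded mutex. */
struct foo {
	int	f_refs;
};

static void
foo_ref(struct foo *fp)
{
	/* The pool maps the object address to one of its mutexes. */
	mtx_pool_lock(mtxpool_sleep, fp);
	fp->f_refs++;
	mtx_pool_unlock(mtxpool_sleep, fp);
}
.Ed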
.Ss Reader/writer locks
Reader/writer locks allow shared access to protected data by multiple threads,
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
More specifically, regular mutexes can be
considered to be equivalent to a write-lock on an
.Em rw_lock .
The
.Em rw_lock
locks have priority propagation like mutexes, but priority
can be propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
Another important property is that shared holders of
.Em rw_lock
can recurse, but exclusive locks are not allowed to recurse.
This ability should not be used lightly and
.Em may go away .
.Pp
See
.Xr rwlock 9
for details.
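.Pp
The reader and writer paths can be sketched as follows.
The
.Va foo_lock
and
.Va foo_value
variables and the
.Fn foo_*
functions are hypothetical; the
.Fn rw_*
calls are those documented in
.Xr rwlock 9 .
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rwlock.h>

static struct rwlock foo_lock;	/* hypothetical */
static int foo_value;		/* protected by foo_lock */

static void
foo_setup(void)
{
	rw_init(&foo_lock, "foo lock");
}

static int
foo_get(void)
{
	int v;

	rw_rlock(&foo_lock);	/* shared: many readers at once */
	v = foo_value;
	rw_runlock(&foo_lock);
	return (v);
}

static void
foo_set(int v)
{
	rw_wlock(&foo_lock);	/* exclusive: a single writer */
	foo_value = v;
	rw_wunlock(&foo_lock);
}
.Ed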
.Ss Read-mostly locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
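.Pp
The caller-supplied tracker is typically a
.Vt struct rm_priotracker
on the reader's stack.
A sketch of a read path, in which
.Va foo_rm ,
.Va foo_config
and
.Fn foo_read_config
are hypothetical and the lock is assumed to have been set up elsewhere with
.Fn rm_init :
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

static struct rmlock foo_rm;	/* hypothetical; set up with rm_init() */
static int foo_config;		/* protected by foo_rm */

static int
foo_read_config(void)
{
	struct rm_priotracker tracker;	/* caller-supplied, on the stack */
	int v;

	rm_rlock(&foo_rm, &tracker);
	v = foo_config;
	rm_runlock(&foo_rm, &tracker);
	return (v);
}
.Ed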
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep
(and may thus perform an unbounded sleep).
They are inherently less efficient than mutexes, reader/writer locks
and read-mostly locks.
They do not support priority propagation.
They should be considered to be closely related to
.Xr sleep 9 .
In fact, acquiring one could in some cases be
considered a conditional sleep.
.Pp
See
.Xr sx 9
for details.
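.Pp
Because an sx lock may be held across an unbounded sleep, a sleeping
allocation is permitted under it.
A sketch, in which
.Va foo_sx ,
.Va foo_buf
and
.Fn foo_alloc_buf
are hypothetical and the lock is assumed to have been set up elsewhere with
.Fn sx_init :
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/sx.h>

static struct sx foo_sx;	/* hypothetical; set up with sx_init() */
static char *foo_buf;		/* protected by foo_sx */

static void
foo_alloc_buf(size_t len)
{
	sx_xlock(&foo_sx);
	if (foo_buf == NULL) {
		/* M_WAITOK may sleep; allowed while holding an sx lock. */
		foo_buf = malloc(len, M_TEMP, M_WAITOK | M_ZERO);
	}
	sx_xunlock(&foo_sx);
}
.Ed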
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with mutexes to wait for
conditions to occur.
A thread must hold the mutex before calling the
.Fn cv_wait*
functions.
When a thread waits on a condition, the mutex
is atomically released before the thread is blocked, then reacquired
before the function call returns.
.Pp
See
.Xr condvar 9
for details.
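.Pp
A sketch of the wait and signal sides follows.
The
.Va foo_mtx ,
.Va foo_cv
and
.Va foo_ready
variables and the
.Fn foo_*
functions are hypothetical, and the mutex and condition variable are
assumed to have been set up elsewhere with
.Fn mtx_init
and
.Fn cv_init .
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/condvar.h>

static struct mtx foo_mtx;	/* hypothetical; set up with mtx_init() */
static struct cv foo_cv;	/* hypothetical; set up with cv_init() */
static int foo_ready;		/* protected by foo_mtx */

static void
foo_wait_ready(void)
{
	mtx_lock(&foo_mtx);
	/* Re-check the condition after every wakeup. */
	while (!foo_ready)
		cv_wait(&foo_cv, &foo_mtx);
	mtx_unlock(&foo_mtx);
}

static void
foo_set_ready(void)
{
	mtx_lock(&foo_mtx);
	foo_ready = 1;
	cv_broadcast(&foo_cv);
	mtx_unlock(&foo_mtx);
}
.Ed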
.Ss Giant
Giant is an instance of a mutex, with some special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers and filesystems can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that the infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other locks.
.It
It is OK to hold Giant while performing unbounded sleep; in such a case,
Giant will be dropped before sleeping and picked up after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well, depending on the
setting of a sysctl.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Ss Sleep/wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
handle event-based thread blocking.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup ,
often called from inside an interrupt routine, to indicate that the
resource the thread was blocking on is available now.
.Pp
Several of the sleep functions, including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines, specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition, or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
(even if recursed)
while the thread is suspended and will reacquire the
.Va Giant
mutex before the function returns.
.Pp
See
.Xr sleep 9
for details.
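.Pp
A sketch of a sleeper and its waker, using the address of the condition
as the wait channel.
The
.Va foo_mtx
and
.Va foo_done
variables, the
.Fn foo_*
functions and the
.Dq foodon
wait message are hypothetical, and the mutex is assumed to have been set up
elsewhere with
.Fn mtx_init .
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx foo_mtx;	/* hypothetical; set up with mtx_init() */
static int foo_done;		/* protected by foo_mtx */

static void
foo_wait_done(void)
{
	mtx_lock(&foo_mtx);
	/* &foo_done is the channel; foo_mtx is dropped while asleep. */
	while (!foo_done)
		mtx_sleep(&foo_done, &foo_mtx, 0, "foodon", 0);
	mtx_unlock(&foo_mtx);
}

static void
foo_mark_done(void)
{
	mtx_lock(&foo_mtx);
	foo_done = 1;
	wakeup(&foo_done);
	mtx_unlock(&foo_mtx);
}
.Ed
.Pp
Checking
.Va foo_done
in a loop under the mutex is what keeps a wakeup issued between the test
and the sleep from being missed.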
.Ss Lockmanager locks
Lockmanager locks are shared/exclusive locks, used mostly in
.Xr VFS 9 ,
in particular as a
.Xr vnode 9
lock.
They have features other lock types do not have, such as sleep timeout,
writer starvation avoidance, draining, and interlock mutex, but this makes them
complicated to implement; for this reason, they are deprecated.
.Pp
See
.Xr lock 9
for details.
.Sh INTERACTIONS
The primitives interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked using the
.Xr witness 4
code.
.Ss Bounded vs. unbounded sleep
The following primitives perform bounded sleep: mutexes, pool mutexes,
reader/writer locks and read-mostly locks.
.Pp
The following primitives block (perform unbounded sleep): shared/exclusive locks,
counting semaphores, condition variables, sleep/wakeup and lockmanager locks.
.Pp
It is an error to do any operation that could result in any kind of sleep while
holding a spin mutex.
.Pp
As a general rule, it is an error to do any operation that could result
in unbounded sleep while holding any primitive from the
.Dq bounded sleep
group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with
.Dv M_WAITOK
while holding a reader/writer lock.
.Pp
As a special case, it is possible to call
.Fn sleep
or
.Fn mtx_sleep
while holding a single mutex.
It will atomically drop that mutex and reacquire it as part of waking up.
This is often a bad idea because it generally relies on the programmer having
good knowledge of all of the call graph above the place where
.Fn mtx_sleep
is being called and the assumptions the calling code has made.
Because the lock gets dropped during sleep, one must re-test all
the assumptions that were made before, all the way up the call graph to the
place where the lock was acquired.
.Pp
It is an error to do any operation that could result in any kind of sleep when
running inside an interrupt filter.
.Pp
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the synchronization primitives discussed:
.Bl -column ".Ic xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Xo
.Em "You have: You want:" Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.Xc
.It spin mtx Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
.It mutex Ta \&ok Ta \&ok-1 Ta \&no Ta \&ok Ta \&ok Ta \&no-3
.It sx Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&ok Ta \&ok-4
.It rwlock Ta \&ok Ta \&ok Ta \&no Ta \&ok-2 Ta \&ok Ta \&no-3
.It rmlock Ta \&ok Ta \&ok Ta \&ok-5 Ta \&ok Ta \&ok-2 Ta \&ok-5
.El
.Pp
.Em *1
Recursion is defined per lock.
Lock order is important.
.Pp
.Em *2
Readers can recurse though writers cannot.
Lock order is important.
.Pp
.Em *3
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup (e.g.,
.Fn mtx_sleep ,
.Fn rw_sleep
and
.Fn msleep_spin ) .
.Pp
.Em *4
Though one can sleep holding an sx lock, one can also use
.Fn sx_sleep
which will atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
.Em *5
.Em Read-mostly
locks can be initialized to support sleeping while holding a write lock.
See
.Xr rmlock 9
for details.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time the table is rather easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Xo
.Em "Context:" Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.Xc
.It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
.It callout: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
.It syscall: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr witness 4 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr LOCK_PROFILING 9
.Sh HISTORY
These
functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.