turnstiles to implement blocking isntead of implementing a thread queue
directly. These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.
Turnstiles do not come out of a fixed-sized pool. Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads. The queue associated with a given lock
is found by a lookup in a simple hash table. The turnstile itself is
protected by a lock associated with its entry in the hash table. This
means that sched_lock is no longer needed to contest on a mutex. Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.
Currently turnstiles only support mutexes. Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation. For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.
The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win). Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.
Tested on: i386 SMP, sparc64 SMP, alpha SMP
thread's pid to make debugging easier for people who don't want to have to
use the intended tool for these panics (witness).
Indirectly prodded by: kris
locks held by each thread.
- Fix a bug in the original BSD/OS code where a contested lock was not
properly handed off from the old thread to the new thread when a
contested lock with more than one blocked thread was transferred from
one thread to another.
- Don't use an atomic operation to write the MTX_CONTESTED value to
mtx_lock in the aforementioned special case. The memory barriers and
exclusion provided by sched_lock are sufficient.
Spotted by: alc (2)
already own. The mtx_trylock() will fail however. Enhance the comment
at the top of the try lock function to explain this.
Requested by: jlemon and his evil netisr locking
- Declare some local variables at the top of the function instead of in a
nested block.
- Use mtx_owned() instead of masking off bits from mtx_lock manually.
- Read the value of mtx_lock into 'v' as a separate line rather than inside
an if statement for clarity. This code is hairy enough as it is.
owned. Previously the KASSERT would only trigger if we successfully
acquired a lock that we already held. However, _obtain_lock() fails to
acquire locks that we already hold, so the KASSERT was never checked in
the case it was supposed to fail.
o Always check for null when dereferencing the filename component.
o Implement a try-and-backoff method for allocating memory to
dump stats to avoid a spin-lock -> sleep-lock mutex lock order
panic with WITNESS.
Approved by: des, markm (mentor)
Not objected: jhb
does not require Giant.
This means that we may miss panics on a class of mutex programming bugs,
but only if running with a Chernobyl setting of debug-flags.
Spotted by: Pete Carah <pete@ns.altadena.net>
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c
Reviewed by: -arch
name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes. Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.
from stopping another thread from completing a syscall, and this allows it to
release its resources etc. Probably more related commits to follow (at least
one I know of)
Initial concept by: julian, dillon
Submitted by: davidxu
sleep mutexes and vice versa. WITNESS normally should catch this but
not everyone uses WITNESS so this is a fallback to catch nasty but easy
to do bugs.
log the start and end of periods during which mtx_lock() is waiting
to acquire a sleep mutex. The log message includes the file and
line of both the waiter and the holder.
Reviewed by: jhb, jake
since it breaks mtx_owned() on spin mutexes when used outside of
mtx_assert(). Unfortunately we currently use it in the i386 MD code
and in the sio(4) driver.
Reported by: bde
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
simple reads (and on IA32, a "pause" instruction for each interation of the
loop) to spin until either the mutex owner field changes, or the lock owner
stops executing.
Suggested by: tanimura
Tested on: i386
MI API with empty cpu_pause() functions on other arch's, but this
functionality is definitely unique to IA-32, so I decided to leave it
as i386-only and wrap it in #ifdef's. I should have dropped the cpu_
prefix when I made that decision.
Requested by: bde
Pentium 4's and newer IA32 processors. The "pause" instruction has been
verified by Intel to be a NOP on all currently existing IA32 processors
prior to the Pentium 4.
option is used (not on by default).
- In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set,
then we can safely read the thread pointer from the mtx_lock member while
holding sched_lock. We then examine the thread to see if it is currently
executing on another CPU. If it is, then we keep looping instead of
blocking.
- In the case of trying to unlock a mutex, it is now possible for a mutex
to have MTX_CONTESTED set in mtx_lock but to not have any threads
actually blocked on it, so we need to handle that case. In that case,
we just release the lock as if MTX_CONTESTED was not set and return.
- We do not adaptively spin on Giant as Giant is held for long times and
it slows SMP systems down to a crawl (it was taking several minutes,
like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when
they adaptively spinned on Giant).
- We only compile in the code to do this for SMP kernels, it doesn't make
sense for UP kernels.
Tested on: i386, alpha, sparc64
the generic lock type for use with witness. If this argument is NULL then
the lock name is used as the lock type. Add a macro for a lock type name
for network driver locks.
even when the number of records approaches the size of the hash table.
Besides, the previous implementation (using linear probing) was broken :)
Also, use the newly introduced MTX_SYSINIT.
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.
Tested on: alpha, i386
locks to be able to setup a SYSINIT call. This helps in places where
a lock is needed to protect some data, but the data is not truly
associated with a subsystem that can properly initialize it's lock.
The macros use the mtx_sysinit() and sx_sysinit() functions,
respectively, as the handler argument to SYSINIT().
Reviewed by: alfred, jhb, smp@
release times. Measurements are made and stored in nanoseconds but
presented in microseconds, which should be sufficient for the locks for
which we actually want this (those that are held long and / or often).
Also, rename some variables and structure members to unit-agnostic names.
following sysctl variables:
debug.mutex.prof.enable enable / disable profiling
debug.mutex.prof.acquisitions number of mutex acquisitions recorded
debug.mutex.prof.records number of acquisition points recorded
debug.mutex.prof.maxrecords max number of acquisition points
debug.mutex.prof.rejected number of rejections (due to full table)
debug.mutex.prof.hashsize hash size
debug.mutex.prof.collisions number of hash collisions
debug.mutex.prof.stats profiling statistics
The code records four numbers for each acquisition point (identified by
source file name and line number): longest time held, total time held,
number of non-recursive acquisitions, average time held. The measurements
are in clock cycles (as returned by get_cyclecount(9)); this may cause
measurements on some SMP systems to be unreliable. This can probably be
worked around by replacing get_cyclecount(9) by some incarnation of
nanotime(9).
This work was derived from initial patches by eivind.
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.
Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.
Approved by: jhb