73 Commits

Author SHA1 Message Date
Attilio Rao
702748e988 MFC r200447,201703,201709-201710:
In current code, threads performing an interruptible sleep
will leave the waiters flag on forcing the owner to do a wakeup even
when the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" queue while the other queue has real waiters on it,
because nobody is going to wakeup the 2nd queue waiters and they will
sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on.

Add a sleepqueue interface which does report the actual number of waiters
on a specified queue of a waitchannel and track if at least one sleepfail
waiter is present or not. In presence of this or empty "preferred" queue,
wakeup both waiters queues.

Discussed with:	kib
Tested by:	Pete French <petefrench at ticketswitch dot com>,
		Justin Head <justin at encarnate dot com>
2010-01-18 14:43:44 +00:00
Attilio Rao
3f4609ac69 MFC r197643, r197735:
When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which doesn't have strong ordered
writes, CPU instructions reordering.

Approved by:	re (kib)
2009-10-12 15:32:00 +00:00
Attilio Rao
c90c9ddddb Adaptive spinning for locking primitives, in read-mode, have some tuning
SYSCTLs which are inappropriate for a daily use of the machine (mostly
useful only by a developer which wants to run benchmarks on it).
Remove them before the release as long as we do not want to ship with
them in.

Now that the SYSCTLs are gone, instead than use static storage for some
constants, use real numeric constants in order to avoid eventual compiler
dumbiness and the risk to share a storage (and then a cache-line) among
CPUs when doing adaptive spinning together.

Pleasse note that the sys/linker_set.h inclusion in lockmgr and sx lock
support could have been gone, but re@ preferred them to be in order to
minimize the risk of problems on future merging.

Please note that this patch is not a MFC, but an 'edge case' as commit
directly to stable/8, which creates a diverging from HEAD.

Tested by:      Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Approved by:	re (kib)
2009-09-09 09:34:13 +00:00
Attilio Rao
db0c92ce82 MFC r196772:
fix adaptive spinning in lockmgr by using correctly GIANT_RESTORE and
continue statement and improve adaptive spinning for sx lock by just
doing once GIANT_SAVE.

Approved by:	re (kib)
2009-09-09 09:17:31 +00:00
Attilio Rao
7e2d0af9e0 MFC r196334:
* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
  a pointer-fetching specific operation check. Consequently, rename the
  operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking
  directly alignment on the word boundry, for all the given specific
  architectures. That's a bit too strict for some common case, but it
  assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)
2009-08-17 16:33:53 +00:00
Bjoern A. Zeeb
21845eba48 MFC r196226:
Add a new macro to test that a variable could be loaded atomically.
  Check that the given variable is at most uintptr_t in size and that
  it is aligned.

  Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
        alignment -- however, the function of ALIGN() is to guarantee
        alignment, and therefore may lead to stronger alignment
        enforcement than necessary for types that are smaller than
        sizeof(uintptr_t).

  Add checks to mtx, rw and sx locks init functions to detect possible
  breakage. This was used during debugging of the problem fixed with
  r196118 where a pointer was on an un-aligned address in the dpcpu area.

  In collaboration with:  rwatson
  Reviewed by:            rwatson

Approved by:	re (kib)
2009-08-14 21:50:47 +00:00
Attilio Rao
f083018223 Handle lock recursion differenty by always checking against LO_RECURSABLE
instead the lock own flag itself.

Tested by:	pho
2009-06-02 13:03:35 +00:00
Attilio Rao
e31d083357 The patch for r193011 was partially rejected when applied, complete it. 2009-05-29 08:01:48 +00:00
Attilio Rao
1ae1c2a3bd Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.

Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.

This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).

KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.

Requested by:	jeff, kmacy
Reviewed by:	jeff
Tested by:	pho
2009-05-29 01:49:27 +00:00
Stacey Son
a5aedd68b4 Add the OpenSolaris dtrace lockstat provider. The lockstat provider
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.

Reviewed by:	attilio jhb jb
Approved by:	gnn (mentor)
2009-05-26 20:28:22 +00:00
Jeff Roberson
1723a06485 - Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks. 2009-03-15 08:03:54 +00:00
John Baldwin
413134305e Teach WITNESS about the interlocks used with lockmgr. This removes a bunch
of spurious witness warnings since lockmgr grew witness support.  Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.

Reviewed by:	attilio
2008-09-10 19:13:30 +00:00
John Baldwin
da7bbd2c08 If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup().  When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup.  However, this required
grabbing proc0's thread lock to perform the wakeup.  If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked.  The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up.  The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with:	jeff
Glanced at by:	sam
Tested by:	Jurgen Weber  jurgen - ish com au
MFC after:	2 weeks
2008-08-05 20:02:31 +00:00
Attilio Rao
90356491d7 - Embed the recursion counter for any locking primitive directly in the
lock_object, using an unified field called lo_data.
- Replace lo_type usage with the w_name usage and at init time pass the
  lock "type" directly to witness_init() from the parent lock init
  function.  Handle delayed initialization before than
  witness_initialize() is called through the witness_pendhelp structure.
- Axe out LO_ENROLLPEND as it is not really needed.  The case where the
  mutex init delayed wants to be destroyed can't happen because
  witness_destroy() checks for witness_cold and panic in case.
- In enroll(), if we cannot allocate a new object from the freelist,
  notify that to userspace through a printf().
- Modify the depart function in order to return nothing as in the current
  CVS version it always returns true and adjust callers accordingly.
- Fix the witness_addgraph() argument name prototype.
- Remove unuseful code from itismychild().

This commit leads to a shrinked struct lock_object and so smaller locks,
in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are
gained.

Reviewed by:	jhb
2008-05-15 20:10:06 +00:00
Jeff Roberson
c5aa6b581d - Pass the priority argument from *sleep() into sleepq and down into
sched_sleep().  This removes extra thread_lock() acquisition and
   allows the scheduler to decide what to do with the static boost.
 - Change the priority arguments to cv_* to match sleepq/msleep/etc.
   where 0 means no priority change.  Catch -1 in cv_broadcastpri() and
   convert it to 0 for now.
 - Set a flag when sleeping in a way that is compatible with swapping
   since direct priority comparisons are meaningless now.
 - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
   controls the boost behavior.  Turning it off gives better performance
   in some workloads but needs more investigation.
 - While we're modifying sleepq, change signal and broadcast to both
   return with the lock held as the lock was held on enter.

Reviewed by:	jhb, peter
2008-03-12 06:31:06 +00:00
Jeff Roberson
eea4f254fe - Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled.  There is no longer an embedded lock_profile_object
   in each lock.  Instead a list of lock_profile_objects is kept per-thread
   for each lock it may own.  The cnt_hold statistic is now always 0 to
   facilitate this.
 - Support shared locking by tracking individual lock instances and
   statistics in the per-thread per-instance lock_profile_object.
 - Make the lock profiling hash table a per-cpu singly linked list with a
   per-cpu static lock_prof allocator.  This removes the need for an array
   of spinlocks and reduces cache contention between cores.
 - Use a seperate hash for spinlocks and other locks so that only a
   critical_enter() is required and not a spinlock_enter() to modify the
   per-cpu tables.
 - Count time spent spinning in the lock statistics.
 - Remove the LOCK_PROFILE_SHARED option as it is always supported now.
 - Specifically drop and release the scheduler locks in both schedulers
   since we track owners now.

In collaboration with:	Kip Macy
Sponsored by:	Nokia
2007-12-15 23:13:31 +00:00
Attilio Rao
f9721b43ed Expand lock class with the "virtual" function lc_assert which will offer
an unified way for all the lock primitives to express lock assertions.
Currenty, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.

Ok'ed by: jhb, jeff
2007-11-18 14:43:53 +00:00
Julian Elischer
431f890614 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
Pawel Jakub Dawidek
764a938b11 Fix sx_try_slock(), so it only fails when there is an exclusive owner.
Before that fix, it was possible for the function to fail if number
of sharers changes between 'x = sx->sx_lock' step and atomic_cmpset_acq_ptr()
call.

This fixes ZFS problem when ZFS returns strange EIO errors under load.
In ZFS there is a code that depends on the fact that sx_try_slock() can
only fail if there is an exclusive owner.

Discussed with:	attilio
Reviewed by:	jhb
Approved by:	re (kensmith)
2007-10-02 14:48:48 +00:00
Attilio Rao
c1a6d9fa42 Fix some problems with lock_profiling in sx locks:
- Adjust lock_profiling stubs semantic in the hard functions in order to be
  more accurate and trustable
- Disable shared paths for lock_profiling.  Actually, lock_profiling has a
  subtle race which makes results caming from shared paths not completely
  trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for
  re-enabling this paths, but is currently intended for developing use only.
- Use homogeneous names for automatic variables in hard functions regarding
  lock_profiling
- Style fixes
- Add a CTASSERT for some flags building

Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
2007-07-06 13:20:44 +00:00
Attilio Rao
f9819486e5 Add functions sx_xlock_sig() and sx_slock_sig().
These functions are intended to do the same actions of sx_xlock() and
sx_slock() but with the difference to perform an interruptible sleep, so
that sleep can be interrupted by external events.
In order to support these new featueres, some code renstruction is needed,
but external API won't be affected at all.

Note: use "void" cast for "int" returning functions in order to avoid tools
like Coverity prevents to whine.

Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
2007-05-31 09:14:48 +00:00
Attilio Rao
2c7289cbfa style(9) fixes for sx locks.
Approved by: jeff (mentor)
2007-05-29 19:46:37 +00:00
Attilio Rao
acf840c4bd Add a small fix for lock profiling in sx locks.
"0" cannot be a correct value since when the function is entered at least
one shared holder must be present and since we want the last one "1" is
the correct value.
Note that lock_profiling for sx locks is far from being perfect.
Expect further fixes for that.

Approved by: jeff (mentor)
2007-05-29 19:34:32 +00:00
John Baldwin
7ec137e5b0 Rename the macros for assertion flags passed to sx_assert() from SX_* to
SA_* to match mutexes and rwlocks.  The old flags still exist for
backwards compatiblity.

Requested by:	attilio
2007-05-19 21:26:05 +00:00
John Baldwin
71a95881d4 Expose sx_xholder() as a public macro. It returns a pointer to the thread
that holds the current exclusive lock, or NULL if no thread holds an
exclusive lock.

Requested by:	pjd
2007-05-19 20:18:12 +00:00
John Baldwin
46a8b9cbc9 Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for the
assert in sx_init_flags().

Submitted by:	attilio
2007-05-19 18:34:24 +00:00
John Baldwin
b0d673254d Add a new SX_RECURSE flag to make support for recursive exclusive locks
conditional.  By default, sx(9) locks are back to not supporting recursive
exclusive locks.

Submitted by:	attilio
2007-05-19 16:35:27 +00:00
John Baldwin
da7d0d1e24 Fix a comment. 2007-05-18 15:05:41 +00:00
John Baldwin
c91fcee75d Move lock_profile_object_{init,destroy}() into lock_{init,destroy}(). 2007-05-18 15:04:59 +00:00
John Baldwin
0026c92c3e Add destroyed cookie values for sx locks and rwlocks as well as extra
KASSERTs so that any lock operations on a destroyed lock will panic or
hang.
2007-05-08 21:51:37 +00:00
Kip Macy
59a31e6acf fix typo 2007-04-04 00:11:22 +00:00
Kip Macy
e2bc106690 style fixes and make sure that the lock is treated as released in the sharers == 0 case
not that this is somewhat racy because a new sharer can come in while we're updating stats
2007-04-04 00:01:05 +00:00
Kip Macy
afc0bfbd90 Fixes to sx for newsx - fix recursed case and move out of inline
Submitted by: Attilio Rao <attilio@freebsd.org>
2007-04-03 22:58:21 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to that rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
John Baldwin
aa89d8cd52 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
John Baldwin
6e21afd40c Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing an sleep such as msleep(9).  Both functions accept a
'struct lock_object *' as their first parameter.  The 'lc_unlock' function
returns an integer that is then passed as the second paramter to the
subsequent 'lc_lock' function.  This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
2007-03-09 16:27:11 +00:00
John Baldwin
ae8dde30c2 Use C99-style struct member initialization for lock classes. 2007-03-09 16:04:44 +00:00
Kip Macy
c66d760608 lock stats updates need to be protected by the lock 2007-03-02 07:21:20 +00:00
Kip Macy
a5bceb77f2 Evidently I've overestimated gcc's ability to peak inside inline functions
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently has a measurable performance impact on certain workloads.
2007-03-01 09:35:48 +00:00
Kip Macy
f183910b97 Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to increasing system time by 20%-25% which on
   "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this to enabling lock profiling without
   LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Kip Macy
61bd5e21b3 track lock class name in a way that doesn't break WITNESS 2006-11-13 05:41:46 +00:00
Kip Macy
7c0435b933 MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line, the overhead measured on the T1 for when
it is compiled in but not enabled is < 1%.

Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb
2006-11-11 03:18:07 +00:00
John Baldwin
462a7add8e Add a new 'show sleepchain' ddb command similar to 'show lockchain' except
that it operates on lockmgr and sx locks.  This can be useful for tracking
down vnode deadlocks in VFS for example.  Note that this command is a bit
more fragile than 'show lockchain' as we have to poke around at the
wait channel of a thread to see if it points to either a struct lock or
a condition variable inside of a struct sx.  If td_wchan points to
something unmapped, then this command will terminate early due to a fault,
but no harm will be done.
2006-08-15 18:29:01 +00:00
John Baldwin
764e4d54e9 Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is
a count of all non-spin locks, not just lockmgr locks.  This can give us a
much cheaper way to see if we have any locks held (such as when returning
to userland via userret()) without requiring WITNESS.

MFC after:	1 week
2006-07-27 21:45:55 +00:00
John Baldwin
83a81bcb14 Add a new file (kern/subr_lock.c) for holding code related to struct
lock_obj objects:
- Add new lock_init() and lock_destroy() functions to setup and teardown
  lock_object objects including KTR logging and registering with WITNESS.
- Move all the handling of LO_INITIALIZED out of witness and the various
  lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
  and change the code outside of subr_lock.c to use LOCK_CLASS to compare
  against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
  kern_mutex.c over to subr_lock.c.
2006-01-17 16:55:17 +00:00
John Baldwin
3c6decc327 Trim another pointer from struct lock_object (and thus from struct mtx and
struct sx).  Instead of storing a direct pointer to a our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes.  Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels.  It might add some slight overhead to
kernels using those debug options however.

As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.
2006-01-06 18:07:32 +00:00
John Baldwin
d272fe53a4 Add a new 'show lock' command to ddb. If the argument has a valid lock
class, then it displays various information about the lock and calls a
new function pointer in lock_class (lc_ddb_show) to dump class-specific
information about the lock as well (such as the owner of a mutex or
xlock'ed sx lock).  This is easier than staring at hex dumps of locks to
figure out who owns the lock, etc.  Note that extending lock_class doesn't
affect the ABI for any kernel modules as the only code that deals with
lock_class structures directly is kern_mutex.c, kern_sx.c, and witness.

MFC after:	1 week
2005-12-13 23:14:35 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
John Baldwin
03129ba97f Fix _sx_assert() to panic() rather than printf() when an assertion fails
and ignore assertions if we have already paniced.
2004-02-27 16:13:44 +00:00