Commit Graph

62 Commits

Author SHA1 Message Date
Mateusz Guzik
e2ab16b1a6 lockprof: move panic check after inspecting the state 2021-05-23 17:55:27 +00:00
Mateusz Guzik
6a467cc5e1 lockprof: pass lock type as an argument instead of reading the spin flag 2021-05-23 17:55:27 +00:00
Mateusz Guzik
a0842e69aa lockprof: add contested-only profiling
This allows tracking all wait times with much smaller runtime impact.

For example when doing -j 104 buildkernel on tmpfs:

no profiling:	2921.70s user 282.72s system 6598% cpu 48.562 total
all acquires:	2926.87s user 350.53s system 6656% cpu 49.237 total
contested only:	2919.64s user 290.31s system 6583% cpu 48.756 total
2021-05-22 19:28:37 +00:00
Mateusz Guzik
fca5cfd584 lockprof: retire lock_prof_skipcount
The implementation uses a global variable for *ALL* calls, defeating the
point of sampling in the first place. Remove it as it clearly remains
unused.
2021-05-22 19:28:37 +00:00
Edward Tomasz Napierala
7f6157f7fd lock_delay(9): improve interaction with restrict_starvation
After e7a5b3bd05, the la->delay value was adjusted after
being set by the starvation_limit code block, which is wrong.

Reported By:	avg
Reviewed By:	avg
Fixes:		e7a5b3bd05
Sponsored By:	NetApp, Inc.
Sponsored By:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29513
2021-04-03 13:08:53 +01:00
Edward Tomasz Napierala
e7a5b3bd05 Modify lock_delay() to increase the delay time after spinning
Modify lock_delay() to increase the delay time after spinning,
not before.  Previously we would spin at least twice instead of once.
In NetApp's benchmarks this fixes a performance regression compared
to FreeBSD 10, which called cpu_spinwait() directly.

Reviewed By:	mjg
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27331
2021-02-25 18:55:26 +00:00
Mateusz Guzik
eac22dd480 lockmgr: shrink struct lock by 8 bytes on LP64
Currently the struct has a 4 byte padding stemming from 3 ints.

1. prio comfortably fits in short, unfortunately there is no dedicated
   type for it and plumbing it throughout the codebase is not worth it
   right now, instead an assert is added which covers also flags for
   safety
2. lk_exslpfail can in principle exceed u_short, but the count is
   already not considered reliable and it only ever gets modified
   straight to 0. In other words it can be incrementing with an upper
   bound of USHRT_MAX

With these in place struct lock shrinks from 48 to 40 bytes.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D28680
2021-02-15 13:57:25 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
2e77cad11d locks: add default delay struct
Use it for all primitives. This makes everything fit in 8 bytes.
2020-01-05 12:48:19 +00:00
Mateusz Guzik
6b8dd26e7c locks: convert delay times to u_short
int is just a waste of space for this purpose.
2020-01-05 12:47:29 +00:00
Mateusz Guzik
3ac2ac2e08 lockprof: use IPI-injecetd fences to fix hangs on stat dump and reset
The previously used quiesce_all_cpus walks all CPUs and waits until curthread
can run on them. Even on contemporary machines this becomes a significant
problem under load when it can literally take minutes for the operation to
complete. With the patch the stall is normally less than 1 second.

Reviewed by:	kib, jeff (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21740
2019-11-30 17:24:42 +00:00
Mateusz Guzik
d2be3ef05c lockprof: move per-cpu data to dpcpu
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21747
2019-09-22 20:44:24 +00:00
Mateusz Guzik
cbba2cb367 lockprof: use CPUFOREACH and drop always false lp_cpu NULL checks
Sponsored by:	The FreeBSD Foundation
2019-09-21 19:05:38 +00:00
John Baldwin
2e43efd0bb Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
Mateusz Guzik
a045941bd2 locks: tweak backoff a little bit
Previous limits were chosen when locking primitives had spurious lock
accesses.

Flipping the starting point to 1 (or rather 2 as the first call shifts it)
provides a modest win when mild contention is seen while not hurting worse
cases. Tested on a bunch of one, two and four socket old and new systems
(Westmere, Skylake, Threadreaper and others) by doing concurrent page faults,
buildkernel/buildworld and other stuff (although not all systems got all the
tests).

Another thing is the upper limit. It is semi-arbitrarily chosen as it was
getting out of hand for slightly less small systems (e.g. a 128-thread one).

Note that backoff is fundamentally a speculative bandaid and this change just
makes it fit a little bit better. It remains completely oblivious to the
hardware topology or the contention pattern. This is being experimented with.
2018-04-08 16:34:10 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Mateusz Guzik
3c798b2b1f locks: follow up r313386
Unfinished diff was committed by accident. The loop in lock_delay
was changed to decrement, but the loop iterator was still incrementing.
2017-02-07 16:01:07 +00:00
Mateusz Guzik
8e5a3e9a9d locks: change backoff to exponential
Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.

Reviewed by:	kib (previous version)
2017-02-07 14:49:36 +00:00
Mateusz Guzik
290511163d Sprinkle __read_mostly on backoff and lock profiling code.
MFC after:	1 month
2017-01-27 15:03:51 +00:00
Mateusz Guzik
1ada904147 Implement trivial backoff for locking primitives.
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.

One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.

For simplicity, this first thouch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.

Current parameters are autotuned on boot based on mp_cpus.

Autotune factors are very conservative and are subject to change later.

Reviewed by:	kib, jhb
Tested by:	pho
MFC after:	1 week
2016-08-01 21:48:37 +00:00
Dmitry Chagin
fd07ddcf6f Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9).
A _NEW flag passed to _init_flags() to avoid check for double-init.

Differential Revision:	https://reviews.freebsd.org/D1208
Reviewed by:	jhb, wblock
MFC after:	1 Month
2014-12-13 21:00:10 +00:00
John Baldwin
e432d5f6a7 Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
John-Mark Gurney
3a6cdc4e55 fix spelling of lock_initialized.. jhb approved..
MFC after:	1 week
2014-01-28 17:27:54 +00:00
John Baldwin
cd32bd7ad1 Several improvements to rmlock(9). Many of these are based on patches
provided by Isilon.
- Add an rm_assert() supporting various lock assertions similar to other
  locking primitives.  Because rmlocks track readers the assertions are
  always fully accurate unlike rw_assert() and sx_assert().
- Flesh out the lock class methods for rmlocks to support sleeping via
  condvars and rm_sleep() (but only while holding write locks), rmlock
  details in 'show lock' in DDB, and the lc_owner method used by
  dtrace.
- Add an internal destroyed cookie so that API functions can assert
  that an rmlock is not destroyed.
- Make use of rm_assert() to add various assertions to the API (e.g.
  to assert locks are held when an unlock routine is called).
- Give RM_SLEEPABLE locks their own lock class and always use the
  rmlock's own lock_object with WITNESS.
- Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
  while holding a read lock on an rmlock.

Submitted by:	andre
Obtained from:	EMC/Isilon
2013-06-25 18:44:15 +00:00
Jeff Roberson
28d91af30f - Implement run-time expansion of the KTR buffer via sysctl.
- Implement a function to ensure that all preempted threads have switched
   back out at least once.  Use this to make sure there are no stale
   references to the old ktr_buf or the lock profiling buffers before
   updating them.

Reviewed by:	marius (sparc64 parts), attilio (earlier patch)
Sponsored by:	EMC / Isilon Storage Division
2012-11-15 00:51:57 +00:00
Andriy Gapon
353705930f panic: add a switch and infrastructure for stopping other CPUs in SMP case
Historical behavior of letting other CPUs merily go on is a default for
time being.  The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
  state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set.  That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts.  Those locks might be held by the stopped threads and would never
be released.  To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times.  Other changes are heavily based on the ideas
and patches submitted by jhb and mdf.  bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console.  This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection.  Dumping to USB-connected disks may also be affected.

PR:			amd64/139614 (at least)
In cooperation with:	attilio, jhb, kib, mdf
Discussed with:		arch@, bde
Tested by:		Eugene Grosbein <eugen@grosbein.net>,
			gnn,
			Steven Hartland <killing@multiplay.co.uk>,
			glebius,
			Andrew Boyer <aboyer@averesystems.com>
			(various versions of the patch)
MFC after:		3 months (or never)
2011-12-11 21:02:01 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Matthew D Fleming
00f0e671ff Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9).  This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by:	rwatson
2011-01-27 00:34:12 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Rebecca Cran
b1ce21c6ef Fix typos.
PR:	bin/148894
Submitted by:	olgeni
2010-11-09 10:59:09 +00:00
Matthew D Fleming
4e6571599b Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
Matthew D Fleming
404a593e28 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
Matthew D Fleming
dd67e2103c Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00
Ed Schouten
60ae52f785 Use ISO C99 integer types in sys/kern where possible.
There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.
2010-06-21 09:55:56 +00:00
Andriy Gapon
e7154e7ef1 lock_profile_release_lock: do not compare unsigned with zero
Found by:	Coverity Prevent
CID:		3660
Reviewed by:	jhb
MFC after:	2 weeks
2010-06-17 10:15:13 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Jeff Roberson
2e6b8de462 - Implement a new mechanism for resetting lock profiling. We now
guarantee that all cpus have acknowledged the cleared enable int by
   scheduling the resetting thread on each cpu in succession.  Since all
   lock profiling happens within a critical section this guarantees that
   all cpus have left lock profiling before we clear the datastructures.
 - Assert that the per-thread queue of locks lock profiling is aware of
   is clear on thread exit.  There were several cases where this was not
   true that slows lock profiling and leaks information.
 - Remove all objects from all lists before clearing any per-cpu
   information in reset.  Lock profiling objects can migrate between
   per-cpu caches and previously these migrated objects could be zero'd
   before they'd been removed

Discussed with:	attilio
Sponsored by:	Nokia
2009-03-15 06:41:47 +00:00
Kip Macy
947265b6bd - track maximum wait time
- resize columns based on actual observed numerical values

MFC after:	3 days
2008-07-27 21:45:20 +00:00
Attilio Rao
90356491d7 - Embed the recursion counter for any locking primitive directly in the
lock_object, using an unified field called lo_data.
- Replace lo_type usage with the w_name usage and at init time pass the
  lock "type" directly to witness_init() from the parent lock init
  function.  Handle delayed initialization before than
  witness_initialize() is called through the witness_pendhelp structure.
- Axe out LO_ENROLLPEND as it is not really needed.  The case where the
  mutex init delayed wants to be destroyed can't happen because
  witness_destroy() checks for witness_cold and panic in case.
- In enroll(), if we cannot allocate a new object from the freelist,
  notify that to userspace through a printf().
- Modify the depart function in order to return nothing as in the current
  CVS version it always returns true and adjust callers accordingly.
- Fix the witness_addgraph() argument name prototype.
- Remove unuseful code from itismychild().

This commit leads to a shrinked struct lock_object and so smaller locks,
in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are
gained.

Reviewed by:	jhb
2008-05-15 20:10:06 +00:00
Attilio Rao
13ddf72de7 Really, no explicit checks against against lock_class_* object should be
done in consumers code: using locks properties is much more appropriate.
Fix current code doing these bogus checks.

Note: Really, callout are not usable by all !(LC_SPINLOCK | LC_SLEEPABLE)
primitives like rmlocks doesn't implement the generic lock layer
functions, but they can be equipped for this, so the check is still
valid.

Tested by: matteo, kris (earlier version)
Reviewed by: jhb
2008-02-06 00:04:09 +00:00
Kris Kennaway
357911ce77 Fix logic in skipcount handling (used to sample every 1/N lock operations
to reduce profiling overhead)
2008-01-08 01:11:40 +00:00
Jeff Roberson
0c66dc6758 - Pause a while after disabling lock profiling and before resetting it
to be sure that all participating CPUs have stopped updating it.
 - Restore the behavior of printing the name of the lock type in the output.
2007-12-31 03:45:51 +00:00
Jeff Roberson
eea4f254fe - Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled.  There is no longer an embedded lock_profile_object
   in each lock.  Instead a list of lock_profile_objects is kept per-thread
   for each lock it may own.  The cnt_hold statistic is now always 0 to
   facilitate this.
 - Support shared locking by tracking individual lock instances and
   statistics in the per-thread per-instance lock_profile_object.
 - Make the lock profiling hash table a per-cpu singly linked list with a
   per-cpu static lock_prof allocator.  This removes the need for an array
   of spinlocks and reduces cache contention between cores.
 - Use a seperate hash for spinlocks and other locks so that only a
   critical_enter() is required and not a spinlock_enter() to modify the
   per-cpu tables.
 - Count time spent spinning in the lock statistics.
 - Remove the LOCK_PROFILE_SHARED option as it is always supported now.
 - Specifically drop and release the scheduler locks in both schedulers
   since we track owners now.

In collaboration with:	Kip Macy
Sponsored by:	Nokia
2007-12-15 23:13:31 +00:00
Stephan Uphoff
f53d15fe1b Initial checkin for rmlock (read mostly lock) a multi reader single writer
lock optimized for almost exclusive reader access. (see also rmlock.9)

TODO:
    Convert to per cpu variables linkerset as soon as it is available.
    Optimize UP (single processor)  case.
2007-11-08 14:47:55 +00:00
Attilio Rao
4486adc51f Currently the LO_NOPROFILE flag (which is masked on upper level code by
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() are naked respect this flag.
The bug leads to locks marked with no-profiling to be profiled as well.
In the case of the clock_lock, used by the timer i8254 this leads to
unpredictable behaviour both on amd64 and ia32 (double faults panic,
sudden reboots, etc.). The amd64 clock_lock is also not marked as
not profilable as it should be.
Fix these bugs adding proper checks in the lock profiling code and at
clock_lock initialization time.

i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re
2007-09-14 01:12:39 +00:00
Kris Kennaway
cdcc788a7e Revert some debugging KTRs that were added during development. 2007-06-03 18:24:31 +00:00
John Baldwin
c91fcee75d Move lock_profile_object_{init,destroy}() into lock_{init,destroy}(). 2007-05-18 15:04:59 +00:00
Kip Macy
8289600ce7 skip call to _lock_profile_obtain_lock_success entirely if acquisition time is non-zero
(i.e. recursing or adding sharers)
2007-04-03 18:36:27 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00