Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.
Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
barriers are in the worst case provided by turnstile unlock
I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.
ps.
.-'---`-.
,' `.
| \
| \
\ _ \
,\ _ ,'-,/-)\
( * \ \,' ,' ,'-)
`._,) -',-')
\/ ''/
) / /
/ ,'-'
Hardware provided by: IBM LTC
Since fcmpset can fail without lock contention e.g. on arm, it was possible
to get spurious failures when the caller was expecting the primitive to succeed.
Reported by: mmel
sx primitives use inlines as opposed to macros. Change the tested condition
to LOCK_DEBUG which covers the case, but is slightly overzelaous.
Reported by: kib
It is only needed if the LOCK_PROFILING is enabled. It has to always check if
the lock is about to be released which requires an avoidable read if the option
is not specified..
They all fallback to the slow path if necessary and the check is there.
This means a panicked kernel executing code from modules will be able to
succeed doing actual lock/unlock, but this was already the case for core code
which has said primitives inlined.
Update comments to note these functions are reachable if lockstat is
enabled.
Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.
Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.
Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.
Reviewed by: kib (previous version)
When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.
Read the value if necessary.
Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.
The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.
Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.
Interestingly, rwlock primitives are already not inlined.
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.
One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.
For simplicity, this first thouch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.
Current parameters are autotuned on boot based on mp_cpus.
Autotune factors are very conservative and are subject to change later.
Reviewed by: kib, jhb
Tested by: pho
MFC after: 1 week
Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably hit in 32-bit range.
Suggested by: kib
MFC after: 1 week
Inline version of primitives do an atomic op and if it fails they fallback to
actual primitives, which immediately retry the atomic op.
The obvious optimisation is to check if the lock is free and only then proceed
to do an atomic op.
Reviewed by: jhb, vangyzen
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3205
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.
PR: 201642
Reviewed by: avg
MFC after: 3 days
0. For spin events report time spent spinning, not a loop count.
While loop count is much easier and cheaper to obtain it is hard
to reason about the reported numbers, espcially for adaptive locks
where both spinning and sleeping can happen.
So, it's better to compare apples and apples.
1. Teach lockstat about FreeBSD rw locks.
This is done in part by changing the corresponding probes
and in part by changing what probes lockstat should expect.
2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD.
3. Report lock acquisition events for successful rw try-lock operations.
4. Teach lockstat about FreeBSD sx locks.
Reporting of events for those locks completely mirrors
rw locks.
5. Report spin and block events before acquisition event.
This is behavior documented for the upstream, so it makes sense to stick
to it. Note that because of FreeBSD adaptive lock implementations
both the spin and block events may be reported for the same acquisition
while the upstream reports only one of them.
Differential Revision: https://reviews.freebsd.org/D2727
Reviewed by: markj
MFC after: 17 days
Relnotes: yes
Sponsored by: ClusterHQ
A _NEW flag passed to _init_flags() to avoid check for double-init.
Differential Revision: https://reviews.freebsd.org/D1208
Reviewed by: jhb, wblock
MFC after: 1 Month
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].
[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].
Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip
shared-held locks for all the primitives in lc_lock/lc_unlock routines.
This fixes the problems introduced in r255747, which indeed introduced an
inversion in the logic.
Reported by: many
Tested by: bdrewery, pho, lme, Adam McDougall, O. Hartmann
Approved by: re (glebius)
current lock classes KPI it was really difficult because there was no
way to pass an rmtracker object to the lock/unlock routines. In order
to accomplish the task, modify the aforementioned functions so that
they can return (or pass as argument) an uinptr_t, which is in the rm
case used to hold a pointer to struct rm_priotracker for current
thread. As an added bonus, this fixes rm_sleep() in the rm shared
case, which right now can communicate priotracker structure between
lc_unlock()/lc_lock().
Suggested by: jhb
Reviewed by: jhb
Approved by: re (delphij)
- Call lock_init() first before setting any lock_object fields in
lock init routines. This way if the machine panics due to a duplicate
init the lock's original state is preserved.
- Somewhat similarly, don't decrement td_locks and td_slocks until after
an unlock operation has completed successfully.
interrupt context can still be idlethread. At that point, without the
panic condition, it can still happen that idlethread then will try to
acquire some locks to carry on some operations.
Skip the idlethread check on block/sleep lock operations when KDB is
active.
Reported by: jh
Tested by: jh
MFC after: 1 week
trap checks (eg. printtrap()).
Generally this check is not needed anymore, as there is not a legitimate
case where curthread != NULL, after pcpu 0 area has been properly
initialized.
Reviewed by: bde, jhb
MFC after: 1 week
Idle threads are not allowed to acquire any lock but spinlocks.
Deny any attempt to do so by panicing at the locking operation
when INVARIANTS is on. Then, remove the check on blocking on a
turnstile.
The check in sleepqueues is left because they are not allowed to use
tsleep() either which could happen still.
Reviewed by: bde, jhb, kib
MFC after: 1 week
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.
Sponsored by: NETASQ
MFC after: 1 month
Historical behavior of letting other CPUs merily go on is a default for
time being. The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.
Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
state
Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set. That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts. Those locks might be held by the stopped threads and would never
be released. To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.
This change has substantial portions written and re-written by attilio
and kib at various times. Other changes are heavily based on the ideas
and patches submitted by jhb and mdf. bde has provided many insights
into the details and history of the current code.
The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console. This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection. Dumping to USB-connected disks may also be affected.
PR: amd64/139614 (at least)
In cooperation with: attilio, jhb, kib, mdf
Discussed with: arch@, bde
Tested by: Eugene Grosbein <eugen@grosbein.net>,
gnn,
Steven Hartland <killing@multiplay.co.uk>,
glebius,
Andrew Boyer <aboyer@averesystems.com>
(various versions of the patch)
MFC after: 3 months (or never)
This enables locking consumers to pass their own structures around as const and
be able to assert locks embedded into those structures.
Reviewed by: ed, kib, jhb
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on forcing the owner to do a wakeup even when if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
is not empty, the waiters won't progress after being awake but they will
just fail, still not taking care of the 2nd queue waiters (as instead the
lock owned doing the wakeup would expect).
In order to fix this bug in a cheap way (without adding too much locking
and complicating too much the semantic) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.
The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.
Reported by: avg, kib, pho
Reviewed by: kib
Tested by: pho