The primitive can be used to wait for the lock to be released. Intended
usage is for locks in structures which are about to be freed.
The benefit is avoiding the interrupt enable/disable trip and the atomic op
needed to grab the lock, plus a shorter wait if the lock is held (since there
is no worry that someone will contend on the lock, re-reads can be more
aggressive).
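For illustration only (kernel-context sketch, not the committed code), the core
of such a primitive is just a read loop; a free lock is assumed to read as
MTX_UNOWNED:

static void
example_wait_unlocked(struct mtx *m)
{

        /*
         * Spin until the lock word reads as unowned.  No atomic op and no
         * interrupt enable/disable trip is needed since the lock is never
         * taken here; mtx_lock is volatile, so it is re-read every pass.
         */
        while (m->mtx_lock != MTX_UNOWNED)
                cpu_spinwait();
}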
Briefly discussed with: kib
Since this function is effectively the slow path, if we get here the lock is
most likely already taken, in which case it is cheaper not to blindly attempt
the atomic op.
While here move hwpmc probe out of the loop to match other primitives.
Mainly focus on files that use the BSD 2-Clause license. However, the tool I
was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.
While here whack the tid argument from wlock hard and xlock hard.
There is no kbi change of any sort - "external" primitives still accept the
pair.
This shortens the lock hold time while not affecting correctness.
All the woken up threads end up competing and can lose the race against
a completely unrelated thread getting the lock anyway.
1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing
MFC after: 1 week
tid must be equal to curthread and the target routine was already reading
curthread anyway, so re-reading it there is not a problem. Not passing it as a
parameter allows for slightly shorter code in callers.
MFC after: 1 week
Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Since fcmpset can fail without lock contention, e.g. on arm, it was possible
to get spurious failures when the caller was expecting the primitive to succeed.
Reported by: mmel
This undoes changes which went in by accident in r313877.
On most production kernels both said parameters are zeroed and nothing
reads them in either __mtx_lock_sleep or __mtx_unlock_sleep. Thus this change
stops passing them for internal consumers where this is the case.
Kernel modules use _flags variants which are not affected kbi-wise.
It is only needed if LOCK_PROFILING is enabled. It always has to check
whether the lock is about to be released, which requires an otherwise
avoidable read if the option is not specified.
They all fall back to the slow path if necessary and the check is there.
This means a panicked kernel executing code from modules will be able to
actually perform the lock/unlock, but this was already the case for core code,
which has said primitives inlined.
Update comments to note these functions are reachable if lockstat is
enabled.
Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.
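As a rough sketch of the unlock side (illustrative names, not the exact diff):

static bool
example_unlock_try(struct mtx *m)
{
        uintptr_t tid, v;

        tid = (uintptr_t)curthread;
        v = m->mtx_lock;
        /*
         * Only pay for the atomic op when it can actually succeed, i.e.
         * when no waiter/contested bits are set in the lock word.
         */
        if (v == tid &&
            atomic_cmpset_rel_ptr(&m->mtx_lock, tid, MTX_UNOWNED))
                return (true);
        return (false);         /* bits are set; the caller handles waiters */
}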
The previous implementation used a random factor to spread readers and
reduce chances of starvation, but this visibly reduces the effectiveness of
the mechanism.
Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit on spins, after which spinning is half of what
other threads get. Note the mechanism is turned off by default.
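For illustration, capped exponential backoff boils down to the following
sketch (names are made up; the real knobs live in the lock delay code):

static u_int
example_backoff(u_int delay, u_int delay_max)
{
        u_int i;

        /* Wait out the current delay, then double it up to the cap. */
        for (i = 0; i < delay; i++)
                cpu_spinwait();
        delay <<= 1;
        if (delay > delay_max)
                delay = delay_max;
        return (delay);
}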
Reviewed by: kib (previous version)
When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.
Read the value if necessary.
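That is, the slow path re-reads the lock word when handed the constant, along
these lines (sketch with approximate names, not the actual diff):

static void
example_lock_hard(struct mtx *m, uintptr_t v)
{
        uintptr_t tid;

        tid = (uintptr_t)curthread;
        /* The lockstat-enabled fast path passes the unowned constant. */
        if (v == MTX_UNOWNED)
                v = m->mtx_lock;
        if ((v & ~MTX_FLAGMASK) == tid) {
                /* Recursed acquisition; just bump the counter. */
                m->mtx_recurse++;
                return;
        }
        /* Otherwise proceed with the regular contested path. */
}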
Lockstat requires checking whether it is enabled and, if so, calling a
6-argument function. Further, determining whether to call it on unlock requires
pre-reading the lock value.
This is problematic in at least 3 ways:
- more branches in the hot path than necessary
- additional cacheline ping pong under contention
- bigger code
Instead, check first if lockstat handling is necessary and if so, just fall
back to regular locking routines. For this purpose a new macro is introduced
(LOCKSTAT_PROFILE_ENABLED).
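The resulting fast path can be sketched as follows (simplified; the in-tree
macro also plumbs the opts/file/line arguments, the probe name is shown for
illustration and the slow-path routine is a hypothetical stand-in):

/* hypothetical stand-in for the full, probe-firing lock routine */
void    example_mtx_lock_hard(struct mtx *m, uintptr_t tid);

static void
example_mtx_lock(struct mtx *m)
{
        uintptr_t tid;

        tid = (uintptr_t)curthread;
        /*
         * If a relevant lockstat probe is enabled, skip the inline attempt
         * entirely and let the regular routine do the work and fire it.
         */
        if (__predict_false(LOCKSTAT_PROFILE_ENABLED(adaptive__acquire) ||
            !_mtx_obtain_lock(m, tid)))
                example_mtx_lock_hard(m, tid);
}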
LOCK_PROFILING uninlines all primitives. Fold the current inline lock
variant into _mtx_lock_flags to retain the support. With this change
the inline variants are not used when LOCK_PROFILING is defined and thus
can ignore its existence.
This results in:
text data bss dec hex filename
22259667 1303208 4994976 28557851 1b3c21b kernel.orig
21797315 1303208 4994976 28095499 1acb40b kernel.patched
i.e. about 2% reduction in text size.
A remaining action is to remove spurious arguments for internal kernel
consumers.
The found value is passed to locking routines in order to reduce cacheline
accesses.
mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures
the routine can fail even if the lock could have been handled by the inline
primitive.
Discussed with: jhb
Tested by: pho (previous version)
Instead of spuriously re-reading the lock value, read it once.
This change also has a side effect of fixing a performance bug:
on failed _mtx_obtain_lock, it was possible that the re-read would find
the lock unowned, yet the primitive would still make a trip
through the turnstile code.
This is diff reduction to a variant which uses atomic_fcmpset.
Discussed with: jhb (previous version)
Tested by: pho (previous version)
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.
One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.
For simplicity, this first-touch implementation only modifies spinning
loops where the lock owner is running. Spin mutexes and thread lock were
not modified.
Current parameters are autotuned on boot based on mp_cpus.
Autotune factors are very conservative and are subject to change later.
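As a sketch of the resulting shape (the committed version additionally applies
a random factor, derives the bounds at boot and only does this in the
owner-is-running spin loops):

static void
example_lock_with_backoff(struct mtx *m)
{
        uintptr_t tid;
        u_int delay, i;

        tid = (uintptr_t)curthread;
        delay = 0;
        while (!atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, tid)) {
                /* Back off before retrying instead of hammering the line. */
                delay++;                /* trivial linear increment */
                for (i = 0; i < delay; i++)
                        cpu_spinwait();
        }
}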
Reviewed by: kib, jhb
Tested by: pho
MFC after: 1 week