152 Commits

Author SHA1 Message Date
mjg
f8fa891369 sx: retire SX_NOADAPTIVE
The flag is not used by anything for years and supporting it requires an
explicit read from the lock when entering slow path.

Flag value is left unused on purpose.

Sponsored by:	The FreeBSD Foundation
2018-12-05 16:43:03 +00:00
vangyzen
0b1c6c1fc3 Make no assertions about lock state when the scheduler is stopped.
Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths.  I did this for mutexes in r306346.

Reported by:	Travis Lane <tlane@isilon.com>
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-13 20:48:05 +00:00
mjg
d42cf69b37 sx: fixup a braino in r334024
If a thread waiting on sx dropped Giant it would not be properly
reacquired on exit from the routine, later resulting in panics
indicating Giant is not held (when it should be).

The bug was not present in the original patch sent to pho, I wittingly
added it just prior to the commit and only smoke-tested it.

Reported by:	pho
2018-05-22 15:13:25 +00:00
mjg
2eae2c18c9 sx: port over writer starvation prevention measures from rwlock
A constant stream of readers could completely starve writers and this is not
a hypothetical scenario.

The 'poll2_threads' test from the will-it-scale suite reliably starves writers
even with concurrency < 10 threads.

The problem was run into and diagnosed by dillon@backplane.com

There was next to no change in lock contention profile during -j 128 pkg build,
despite an sx lock being at the top.

Tested by:	pho
2018-05-22 07:20:22 +00:00
mmacy
0c8764214e fix uninitialized variable warning in reader locks 2018-05-19 03:52:55 +00:00
mjg
0b711116eb locks: extend speculative spin waiting for readers to drain
Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.

Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.

Nonetheless, in majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.

The current approach is pretty simple: snatch the number of current readers
and performs that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.

Given the previously noted problem of not knowing whether spinning makes
any sense to begin with the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in-between re-reads of the lock is typically 16384 or 32678. It was
selected as the limit of total spins. The lower bound is set to
already present 10000 as to not change it for smaller machines.

Bumping the limit reduces system time by few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).

Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.
2018-04-11 01:43:29 +00:00
mjg
bce760d1e8 locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.

Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.

This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total

So note this is still a significant hit.

LOCK_PROFILING results are not affected.
2018-03-17 19:26:33 +00:00
mjg
8531f3bfc6 sx: don't do an atomic op in upgrade if it cananot succeed
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.

This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
2018-03-04 21:41:05 +00:00
mjg
de7546e12b locks: fix a corner case in r327399
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.

The bug sometimes manifested itself in stalls during -j 128 package builds.

Refactor the code to fix the bug, while here remove some of the gratituous
differences between rw and sx locks.
2018-03-04 21:38:30 +00:00
mjg
dd76098f06 sx: fix adaptive spinning broken in r327397
The condition was flipped.

In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.

For instance make -s -j 128 buildkernel:

before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total

ps.
      .-'---`-.			      .-'---`-.
    ,'          `.		    ,'          `.
    |             \		    |             \
    |              \		    |              \
    \           _  \		    \           _  \
    ,\  _    ,'-,/-)\		    ,\  _    ,'-,/-)\
    ( * \ \,' ,' ,'-)		    ( * \ \,' ,' ,'-)
     `._,)     -',-')		     `._,)     -',-')
       \/         ''/		       \/         ''/
        )        / /		        )        / /
       /       ,'-'		       /       ,'-'
2018-03-02 21:26:27 +00:00
mjg
c782ed4dbc Undo LOCK_PROFILING pessimisation after r313454 and r313455
With the option used to compile the kernel both sx and rw shared ops would
always go to the slow path which added avoidable overhead even when the
facility is disabled.

Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.

Restore old behaviour of going there only when profiling is enabled.

This change is a no-op for kernels without LOCK_PROFILING (which is the
default).
2018-02-17 12:07:09 +00:00
mjg
00f02d857e sx: retry hard shared unlock just like in r327905 for rwlocks 2018-01-13 09:26:24 +00:00
mjg
ea6bb8f729 locks: adjust loop limit check when waiting for readers
The check was for the exact value, but since the counter started being
incremented by the number of readers it could have jumped over.
2017-12-31 02:31:01 +00:00
mjg
38154309c8 sx: fix up non-smp compilation after r327397 2017-12-31 01:59:56 +00:00
mjg
1cb4f9d747 locks: re-check the reason to go to sleep after locking sleepq/turnstile
In both rw and sx locks we always go to sleep if the lock owner is not
running.

We do spin for some time if the lock is read-locked.

However, if we decide to go to sleep due to the lock owner being off cpu
and after sleepq/turnstile gets acquired the lock is read-locked, we should
fallback to the aforementioned wait.
2017-12-31 00:47:04 +00:00
mjg
095fbaddd3 sx: read the SX_NOADAPTIVE flag and Giant ownership only once
These used to be read multiple times when waiting for the lock the become
free, which had the potential to issue completely avoidable traffic.
2017-12-31 00:37:50 +00:00
pfg
cc22a86800 sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
mjg
2d0a05c617 sx: change sunlock to wake waiters up if it locked sleepq
sleepq is only locked if the curhtread is the last reader. By the time
the lock gets acquired new ones could have arrived. The previous code
would unlock and loop back. This results spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.
2017-11-25 20:13:50 +00:00
mjg
c7ffe8c4e0 locks: retry turnstile/sleepq loops on failed cmpset
In order to go to sleep threads set waiter flags, but that can spuriously
fail e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.
2017-11-25 20:10:33 +00:00
markj
d88166d005 Have lockstat:::sx-release fire only after the lock state has changed.
MFC after:	1 week
2017-11-24 19:04:31 +00:00
markj
e32cd825f4 Add a missing lockstat:::sx-downgrade probe.
We were returning without firing the probe when the lock had no shared
waiters.

MFC after:	1 week
2017-11-24 19:02:06 +00:00
mjg
b01e4efe7a sx: unbreak debug after r326107
An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after the entrance to the func.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

Reported by:	Shawn Webb
2017-11-23 03:40:51 +00:00
mjg
c8a2652582 locks: pass the found lock value to unlock slow path
This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.
2017-11-22 22:04:04 +00:00
mjg
b1c2309fd4 locks: remove the file + line argument from internal primitives when not used
The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.
2017-11-22 21:51:17 +00:00
mjg
3fb5232780 locks: fix compilation issues without SMP or KDTRACE_HOOKS 2017-11-17 23:27:06 +00:00
mjg
2cadb364c5 sx: perform a minor cleanup of the unlock slowpath
No functional changes.
2017-11-17 02:27:04 +00:00
mjg
d5884672d5 locks: pull up PMC_SOFT_CALLs out of slow path loops 2017-11-17 02:22:51 +00:00
mjg
72b0124a90 sx: avoid branches if in the slow path if lockstat is disabled 2017-11-17 02:21:07 +00:00
mjg
6f516b023d locks: take the number of readers into account when waiting
Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of once spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.

MFC after:	1 week
2017-10-05 19:18:02 +00:00
mjg
a059e2d8eb locks: partially tidy up waiting on readers
spin first instant of instantly re-readoing and don't re-read after
spinning is finished - the state is already known.

Note the code is subject to significant changes later.

MFC after:	1 week
2017-10-05 13:01:18 +00:00
mjg
fb0f2cc9b2 Sprinkle __read_frequently on few obvious places.
Note that some of annotated variables should probably change their types
to something smaller, preferably bit-sized.
2017-09-06 20:33:33 +00:00
markj
611738a7d7 Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.
Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.

MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2017-06-19 21:09:50 +00:00
mjg
ab50e3ceda locks: ensure proper barriers are used with atomic ops when necessary
Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.

Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
  barriers are in the worst case provided by turnstile unlock

I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.

ps.
  .-'---`-.
,'          `.
|             \
|              \
\           _  \
,\  _    ,'-,/-)\
( * \ \,' ,' ,'-)
 `._,)     -',-')
   \/         ''/
    )        / /
   /       ,'-'

Hardware provided by: IBM LTC
2017-03-01 05:06:21 +00:00
mjg
093d7b0fcc locks: make trylock routines check for 'unowned' value
Since fcmpset can fail without lock contention e.g. on arm, it was possible
to get spurious failures when the caller was expecting the primitive to succeed.

Reported by:	mmel
2017-02-19 16:28:46 +00:00
mjg
9d1d07d1cb locks: clean up trylock primitives
In particular thius reduces accesses of the lock itself.
2017-02-18 22:06:03 +00:00
mjg
ce6402c263 sx: fix compilation on UP kernels after r313855
sx primitives use inlines as opposed to macros. Change the tested condition
to LOCK_DEBUG which covers the case, but is slightly overzelaous.

Reported by:	kib
2017-02-17 10:58:12 +00:00
mjg
56448704f5 locks: let primitives for modules unlock without always goging to the slsow path
It is only needed if the LOCK_PROFILING is enabled. It has to always check if
the lock is about to be released which requires an avoidable read if the option
is not specified..
2017-02-17 05:39:40 +00:00
mjg
92dde5f426 locks: remove SCHEDULER_STOPPED checks from primitives for modules
They all fallback to the slow path if necessary and the check is there.

This means a panicked kernel executing code from modules will be able to
succeed doing actual lock/unlock, but this was already the case for core code
which has said primitives inlined.
2017-02-17 05:09:51 +00:00
mjg
bcee8cb651 locks: tidy up unlock fallback paths
Update comments to note these functions are reachable if lockstat is
enabled.

Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.
2017-02-09 08:19:30 +00:00
mjg
72d5d36dd3 sx: implement slock/sunlock fast path
See r313454.
2017-02-08 19:29:34 +00:00
mjg
210d7e9a55 locks: change backoff to exponential
Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.

Reviewed by:	kib (previous version)
2017-02-07 14:49:36 +00:00
mjg
331b02c0ea locks: fix recursion support after recent changes
When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.

Read the value if necessary.
2017-02-06 09:40:14 +00:00
mjg
d88e33a5a2 sx: move lockstat handling out of inline primitives
See r313275 for details.
2017-02-05 09:54:16 +00:00
mjg
8e657aca1f sx: add witness support missed in r313272 2017-02-05 06:51:45 +00:00
mjg
f9bb72ec71 sx: uninline slock/sunlock
Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.

The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.

Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.
2017-02-05 05:20:29 +00:00
mjg
cb2a65b4f1 sx: switch to fcmpset
Discussed with:	jhb
Tested by:	pho (previous version)
2017-02-05 04:54:20 +00:00
mjg
5004e7bcb7 Sprinkle __read_mostly on backoff and lock profiling code.
MFC after:	1 month
2017-01-27 15:03:51 +00:00
mjg
7de7dabf52 sx: reduce lock accesses similarly to r311172
Discussed with:	jhb
Tested by:	pho (previous version)
2017-01-18 17:55:08 +00:00
markj
179b6ff3ce Return a non-NULL owner only if the lock is exclusively held in owner_sx().
Fix some whitespace bugs while here.

MFC after:	2 weeks
2016-12-10 02:56:44 +00:00
mjg
bdc6c48a91 locks: fix sx compilation on mips after r303643
The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.

Reported by:	kib
2016-08-03 09:15:10 +00:00