freebsd-skq

Author	SHA1	Message	Date
jhb	ae6222b0c3	Drop "All rights reserved" from my copyright statements. Reviewed by: rgrimes MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D19485	2019-03-06 22:11:45 +00:00
mjg	da67d6603f	locks: plug warnings about unitialized variables They only showed up after I redefined LOCKSTAT_ENABLED to 0. doing_lockprof in mutex.c is a real (but harmless) bug. Should the value be non-zero it will do checks for lock profiling which would otherwise be skipped. state in rwlock.c is a wart from the compiler, the value can't be used if lock profiling is not enabled. Sponsored by: The FreeBSD Foundation	2018-11-13 21:29:56 +00:00
vangyzen	0b1c6c1fc3	Make no assertions about lock state when the scheduler is stopped. Change the assert paths in rm, rw, and sx locks to match the lock and unlock paths. I did this for mutexes in r306346. Reported by: Travis Lane <tlane@isilon.com> MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2018-11-13 20:48:05 +00:00
mjg	cdca29b9c6	Remove an unused argument to turnstile_unpend. PR: 228694 Submitted by: Julian Pszczołowski <julian.pszczolowski@gmail.com>	2018-06-02 22:37:53 +00:00
mjg	30f6cb7019	rw: decrease writer starvation Writers waiting on readers to finish can set the RW_LOCK_WRITE_SPINNER bit. This prevents most new readers from coming on. However, the last reader to unlock also clears the bit which means new readers can sneak in and the cycle starts over. Change the code to keep the bit after last unlock. Note that starvation potential is still there: no matter how many write spinners are there, there is one bit. After the writer unlocks, the lock is free to get raided by readers again. It is good enough for the time being. The real fix would include counting writers. This runs into a caveat: the writer which set the bit may now be preempted. In order to get rid of the problem all attempts to set the bit are preceeded with critical_enter. The bit gets cleared when the thread which set it goes to sleep. This way an invariant holds that if the bit is set, someone is actively spinning and will grab the lock soon. In particular this means that readers which find the lock in this transient state can safely spin until the lock finds itself an owner (i.e. they don't need to block nor speculate how long to spin speculatively). Tested by: pho	2018-05-22 07:16:39 +00:00
mmacy	0c8764214e	fix uninitialized variable warning in reader locks	2018-05-19 03:52:55 +00:00
mjg	0b711116eb	locks: extend speculative spin waiting for readers to drain Now that 10 years have passed since the original limit of 10000 was committed, bump it a little bit. Spinning waiting for writers is semi-informed in the sense that we always know if the owner is running and base the decision to spin on that. However, no such information is provided for read-locking. In particular this means that it is possible for a write-spinner to completely waste cpu time waiting for the lock to be released, while the reader holding it was preempted and is now waiting for the spinner to go off cpu. Nonetheless, in majority of cases it is an improvement to spin instead of instantly giving up and going to sleep. The current approach is pretty simple: snatch the number of current readers and performs that many pauses before checking again. The total number of pauses to execute is limited to 10k. If the lock is still not free by that time, go to sleep. Given the previously noted problem of not knowing whether spinning makes any sense to begin with the new limit has to remain rather conservative. But at the very least it should also be related to the machine. Waiting for writers uses parameters selected based on the number of activated hardware threads. The upper limit of pause instructions to be executed in-between re-reads of the lock is typically 16384 or 32678. It was selected as the limit of total spins. The lower bound is set to already present 10000 as to not change it for smaller machines. Bumping the limit reduces system time by few % during benchmarks like buildworld, buildkernel and others. Tested on 2 and 4 socket machines (Broadwell, Skylake). Figuring out how to make a more informed decision while not pessimizing the fast path is left as an exercise for the reader.	2018-04-11 01:43:29 +00:00
mjg	1a3d741e47	rw: whack avoidable re-reads in try_upgrade	2018-04-10 22:32:31 +00:00
mjg	bce760d1e8	locks: slightly depessimize lockstat The slow path is always taken when lockstat is enabled. This induces rdtsc (or other) calls to get the cycle count even when there was no contention. Still go to the slow path to not mess with the fast path, but avoid the heavy lifting unless necessary. This reduces sys and real time during -j 80 buildkernel: before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total So note this is still a significant hit. LOCK_PROFILING results are not affected.	2018-03-17 19:26:33 +00:00
mjg	de7546e12b	locks: fix a corner case in r327399 If there were exactly rowner_retries/asx_retries (by default: 10) transitions between read and write state and the waiters still did not get the lock, the next owner -> reader transition would result in the code correctly falling back to turnstile/sleepq where it would incorrectly think it was waiting for a writer and decide to leave turnstile/sleepq to loop back. From this point it would take ts/sq trips until the lock gets released. The bug sometimes manifested itself in stalls during -j 128 package builds. Refactor the code to fix the bug, while here remove some of the gratituous differences between rw and sx locks.	2018-03-04 21:38:30 +00:00
mjg	c782ed4dbc	Undo LOCK_PROFILING pessimisation after r313454 and r313455 With the option used to compile the kernel both sx and rw shared ops would always go to the slow path which added avoidable overhead even when the facility is disabled. Furthermore the increased time spent doing uncontested shared lock acquire would be bogusly added to total wait time, somewhat skewing the results. Restore old behaviour of going there only when profiling is enabled. This change is a no-op for kernels without LOCK_PROFILING (which is the default).	2018-02-17 12:07:09 +00:00
mjg	9e89b7286b	rwlock: diff-reduction of runlock compared to sx sunlock	2018-02-14 20:37:33 +00:00
mjg	6f29c630d9	rwlock: try regular read unlock even in the hard path Saves on turnstile trips if the lock got more readers.	2018-01-13 00:05:31 +00:00
mjg	ea6bb8f729	locks: adjust loop limit check when waiting for readers The check was for the exact value, but since the counter started being incremented by the number of readers it could have jumped over.	2017-12-31 02:31:01 +00:00
mjg	1cb4f9d747	locks: re-check the reason to go to sleep after locking sleepq/turnstile In both rw and sx locks we always go to sleep if the lock owner is not running. We do spin for some time if the lock is read-locked. However, if we decide to go to sleep due to the lock owner being off cpu and after sleepq/turnstile gets acquired the lock is read-locked, we should fallback to the aforementioned wait.	2017-12-31 00:47:04 +00:00
mjg	c5a49cfe43	rwlock: tidy up __rw_runlock_hard similarly to r325921	2017-12-31 00:31:14 +00:00
pfg	cc22a86800	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:20:12 +00:00
mjg	26410c3616	rw: fix runlock_hard when new readers show up When waiters/writer spinner flags are set no new readers can show up unless they already have a different rw rock read locked. The change in r326195 failed to take that into account - in presence of new readers it would spin until they all drain, which would be lead to trouble if e.g. they go off cpu and can get scheduled because of this thread. Reported by: pho	2017-11-26 21:10:47 +00:00
mjg	67f9bba460	rwlock: fix up compilation of the previous change commmitted wrong version of the patch	2017-11-25 20:25:45 +00:00
mjg	c52edb8a9e	rwlock: add __rw_try_{r,w}lock_int	2017-11-25 20:22:51 +00:00
mjg	c7ffe8c4e0	locks: retry turnstile/sleepq loops on failed cmpset In order to go to sleep threads set waiter flags, but that can spuriously fail e.g. when a new reader arrives. Instead of unlocking everything and looping back, re-evaluate the new state while still holding the lock necessary to go to sleep.	2017-11-25 20:10:33 +00:00
mjg	d5c5e62cc7	rwlock: stop re-reading the owner when going to sleep	2017-11-25 20:08:11 +00:00
mjg	77a5a9349b	rwlock: unbreak WITNESS builds after r326110 Reported by: Shawn Webb	2017-11-23 03:20:12 +00:00
mjg	41e06ccaac	rwlock: don't check for curthread's read lock count in the fast path	2017-11-22 23:52:05 +00:00
mjg	c8a2652582	locks: pass the found lock value to unlock slow path This avoids an explicit read later. While here whack the cheaply obtainable 'tid' argument.	2017-11-22 22:04:04 +00:00
mjg	b1c2309fd4	locks: remove the file + line argument from internal primitives when not used The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed) for many locks even in production kernels. While here whack the tid argument from wlock hard and xlock hard. There is no kbi change of any sort - "external" primitives still accept the pair.	2017-11-22 21:51:17 +00:00
markj	06c0131e2d	Clean up the SYSINIT_FLAGS definitions for rwlock(9) and rmlock(9). Avoid duplication in their macro definitions, and document them. No functional change intended. MFC after: 1 week	2017-11-21 14:59:23 +00:00
mjg	45ffcf24ba	rwlock: unlock before traversing threads to wake up While here perform a minor cleanup of the unlock path.	2017-11-17 02:26:15 +00:00
mjg	d5884672d5	locks: pull up PMC_SOFT_CALLs out of slow path loops	2017-11-17 02:22:51 +00:00
mjg	3b151c9e66	rwlock: avoid branches in the slow path if lockstat is disabled	2017-11-17 02:21:24 +00:00
mjg	41984bdd76	rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER	2017-11-11 09:34:11 +00:00
mjg	01ed16d59b	rwlock: fix up compilation without KDTRACE_HOOKS after r324787	2017-11-06 05:14:05 +00:00
mjg	abcba1b168	rwlock: reduce lockstat branches in the slowpath MFC after: 1 week	2017-10-20 03:32:42 +00:00
mjg	6f516b023d	locks: take the number of readers into account when waiting Previous code would always spin once before checking the lock. But a lock with e.g. 6 readers is not going to become free in the duration of once spin even if they start draining immediately. Conservatively perform one for each reader. Note that the total number of allowed spins is still extremely small and is subject to change later. MFC after: 1 week	2017-10-05 19:18:02 +00:00
mjg	a059e2d8eb	locks: partially tidy up waiting on readers spin first instant of instantly re-readoing and don't re-read after spinning is finished - the state is already known. Note the code is subject to significant changes later. MFC after: 1 week	2017-10-05 13:01:18 +00:00
mjg	fb0f2cc9b2	Sprinkle __read_frequently on few obvious places. Note that some of annotated variables should probably change their types to something smaller, preferably bit-sized.	2017-09-06 20:33:33 +00:00
mjg	ba463d64a7	rwlock: perform the typically false td_rw_rlocks check later Check if the lock is available first instead. MFC after: 1 week	2017-07-02 01:05:16 +00:00
markj	611738a7d7	Fix the !TD_IS_IDLETHREAD(curthread) locking assertions. Most of the lock slowpaths assert that the calling thread isn't an idle thread. However, this may not be true if the system has panicked, and in some cases the assertion appears before a SCHEDULER_STOPPED() check. MFC after: 3 days Sponsored by: Dell EMC Isilon	2017-06-19 21:09:50 +00:00
mjg	ab50e3ceda	locks: ensure proper barriers are used with atomic ops when necessary Unclear how, but the locking routine for mutexes was using the release barrier instead of acquire. This must have been either a copy-pasto or bad completion. Going through other uses of atomics shows no barriers in: - upgrade routines (addressed in this patch) - sections protected with turnstile locks - this should be fine as necessary barriers are in the worst case provided by turnstile unlock I would like to thank Mark Millard and andreast@ for reporting the problem and testing previous patches before the issue got identified. ps. .-'---`-. ,' `. \| \ \| \ \ _ \ ,\ _ ,'-,/-)\ ( * \ \,' ,' ,'-) `._,) -',-') \/ ''/ ) / / / ,'-' Hardware provided by: IBM LTC	2017-03-01 05:06:21 +00:00
mjg	093d7b0fcc	locks: make trylock routines check for 'unowned' value Since fcmpset can fail without lock contention e.g. on arm, it was possible to get spurious failures when the caller was expecting the primitive to succeed. Reported by: mmel	2017-02-19 16:28:46 +00:00
mjg	9d1d07d1cb	locks: clean up trylock primitives In particular thius reduces accesses of the lock itself.	2017-02-18 22:06:03 +00:00
mjg	56448704f5	locks: let primitives for modules unlock without always goging to the slsow path It is only needed if the LOCK_PROFILING is enabled. It has to always check if the lock is about to be released which requires an avoidable read if the option is not specified..	2017-02-17 05:39:40 +00:00
mjg	92dde5f426	locks: remove SCHEDULER_STOPPED checks from primitives for modules They all fallback to the slow path if necessary and the check is there. This means a panicked kernel executing code from modules will be able to succeed doing actual lock/unlock, but this was already the case for core code which has said primitives inlined.	2017-02-17 05:09:51 +00:00
mjg	5f5fddd5ed	rwlock: fix r313454 The runlock slow path would update wrong variable before restarting the loop, in effect corrupting the state. Reported by: pho	2017-02-09 13:32:19 +00:00
mjg	bcee8cb651	locks: tidy up unlock fallback paths Update comments to note these functions are reachable if lockstat is enabled. Check if the lock has any bits set before attempting unlock, which saves an unnecessary atomic operation.	2017-02-09 08:19:30 +00:00
mjg	c845388950	rwlock: implemenet rlock/runlock fast path This improves singlethreaded throughput on my test machine from ~247 mln ops/s to ~328 mln. It is mostly about avoiding the setup cost of lockstat. Reviewed by: jhb (previous version)	2017-02-08 19:28:46 +00:00
mjg	58e05ff808	rwlock: implement RW_LOCK_WRITER_RECURSED bit This moves recursion handling out of the inlined wunlock path and in particular saves a read and a branch. Discussed with:	2017-02-07 17:04:31 +00:00
mjg	210d7e9a55	locks: change backoff to exponential Previous implementation would use a random factor to spread readers and reduce chances of starvation. This visibly reduces effectiveness of the mechanism. Switch to the more traditional exponential variant. Try to limit starvation by imposing an upper limit of spins after which spinning is half of what other threads get. Note the mechanism is turned off by default. Reviewed by: kib (previous version)	2017-02-07 14:49:36 +00:00
mjg	331b02c0ea	locks: fix recursion support after recent changes When a relevant lockstat probe is enabled the fallback primitive is called with a constant signifying a free lock. This works fine for typical cases but breaks with recursion, since it checks if the passed value is that of the executing thread. Read the value if necessary.	2017-02-06 09:40:14 +00:00
mjg	8f1a940ede	rwlock: move lockstat handling out of inline primitives See r313275 for details. One difference here is that recursion handling was removed from the fallback routine. As it is it was never supposed to see a recursed lock in the first place. Future changes will move it out of inline variants, but right now there is no easy to way to test if the lock is recursed without reading additional words.	2017-02-05 13:37:23 +00:00

1 2 3

134 Commits