Commit Graph

133 Commits

Mark Johnston
852ff943b9 sleepqueue: Annotate sleepq_max_depth as static
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-02-14 10:06:47 -05:00
Mark Johnston
893be9d8ac sleepqueue: Address a lock order reversal
After commit 74cf7cae4d ("softclock: Use dedicated ithreads for
running callouts."), there is a lock order reversal between the per-CPU
callout lock and the scheduler lock.  softclock_thread() locks the
callout lock and then the scheduler lock when preparing to switch
off-CPU, while sleepq_remove_thread() stops the timed sleep callout
while potentially holding a scheduler lock.  In the latter case, it is
the thread itself that is locked; if the thread is sleeping, its lock
will be a sleepqueue lock, but if it is still in the process of going
to sleep, it will be a scheduler lock.

We could perhaps change softclock_thread() to try to acquire the locks
in the opposite order, but that would require dropping and re-acquiring
the callout lock, which seems expensive for an operation that will
happen quite frequently.  Instead, avoid stopping the td_slpcallout
callout if the thread is still going to sleep, which is what this patch
does.  This will result in a spurious call to sleepq_timeout(), but
some counters suggest that this is very rare.

PR:		261198
Fixes:		74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
Reported and tested by:	thj
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34204
2022-02-14 10:06:47 -05:00
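For illustration, a minimal sketch of the approach described above (a
paraphrase of the idea, not the committed diff; TD_IS_SLEEPING(),
td_slpcallout, and callout_stop() are the real kernel names, and the
surrounding sleepq_remove_thread() body is assumed):

    /*
     * Only stop the timed-sleep callout once the thread has actually
     * switched off-CPU.  If it is still going to sleep, tolerate a
     * spurious sleepq_timeout() call instead of taking the callout
     * lock while a scheduler lock may be held.
     */
    if (TD_IS_SLEEPING(td))
            callout_stop(&td->td_slpcallout);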
Konstantin Belousov
9b86d3e5de When queuing an ignored signal, only abort the target thread's sleep if it is inside sigwait()
Reported and tested by:	trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32252
2021-10-06 17:05:22 +03:00
Konstantin Belousov
f17eb93d55 When sending an ignored signal, arrange for a zero return code from sleep
Otherwise, consumers get unexpected EINTR errors without seeing
a properly discarded signal.

Reported and tested by:	trasz
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32252
2021-10-06 17:05:22 +03:00
Alexander Motin
6df1359e55 sleepqueue(9): Remove sbinuptime() from sleepq_timeout().
The callout's c_time is always greater than or equal to the scheduled
time.  It is also smaller than sbinuptime() and cannot change while the
callback is running, so we can reliably use it here instead of
sbinuptime().  If there was a race and the callout was rescheduled to a
later time, the callback will simply be called again.

According to profiles, this saves ~5% of the timer interrupt time even
with the fast TSC timecounter.

MFC after:	1 month
2021-10-02 21:08:41 -04:00
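A hedged paraphrase of the resulting check in sleepq_timeout() (field
names follow sys/proc.h and sys/callout.h; the exact committed diff may
differ):

    static void
    sleepq_timeout(void *arg)
    {
            struct thread *td = arg;

            /*
             * c_time is >= the scheduled time, <= sbinuptime(), and
             * cannot change while the callback runs, so use it instead
             * of reading the timecounter.  If the callout was raced
             * and rescheduled to a later time, it simply fires again.
             */
            if (td->td_sleeptimo > td->td_slpcallout.c_time ||
                td->td_sleeptimo == 0)
                    return;         /* not our wakeup (yet) */
            /* ... otherwise proceed to make the thread runnable ... */
    }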
Alexander Motin
6df35af4d8 Allow sleepq_signal() to drop the lock.
Introduce the SLEEPQ_DROP sleepq_signal() flag, allowing the caller to
drop the sleep queue chain lock before returning.  The reduced lock
scope significantly reduces lock contention inside taskqueue_enqueue()
for ZFS worker threads doing ~350K disk reads/s on a 40-thread system.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2021-06-25 14:12:21 -04:00
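A hedged usage sketch of the new flag, modeled on what wakeup_one(9)
does after this change (the wrapper name is illustrative):

    static void
    wake_one_drop(const void *chan)
    {
            int wakeup_swapper;

            sleepq_lock(chan);
            wakeup_swapper = sleepq_signal(chan,
                SLEEPQ_SLEEP | SLEEPQ_DROP, 0, 0);
            /* No sleepq_release(): SLEEPQ_DROP released the chain
             * lock inside sleepq_signal(). */
            if (wakeup_swapper)
                    kick_proc0();
    }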
Konstantin Belousov
15465a2c25 Add sleepq_remove_nested()
The helper removes the thread from a sleep queue, assuming that it
would need to sleep.  The sleepq_remove_nested() function is intended
for a quite special case, where a suspended thread from a traced,
stopped process is temporarily unsuspended to do some work on behalf
of the debugger in the target context, and this work might require
sleeping.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29955
2021-05-03 19:13:47 +03:00
Konstantin Belousov
203affb291 Fix TDP_WAKEUP/thr_wake(curthread->td_tid) after r366428.
Reported by:	arichardson
Reviewed by:	arichardson, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27597
2020-12-13 19:45:42 +00:00
Konstantin Belousov
0b459854bc Correct indent.
Sponsored by:	The FreeBSD Foundation
2020-12-13 19:43:45 +00:00
Konstantin Belousov
0c82fb267b Refactor sleepq_catch_signals().
- Extract the suspension check into a sig_ast_checksusp() helper.
- Extract the signal check and the calculation of the interruption
  errno into a sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c, which is the proper place for
signal-related code.

Improve the control flow in sleepq_catch_signals() to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating the
checking code into sleepq_check_ast_sq_locked(), whose return value is
interpreted at a single location.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26628
2020-10-04 16:30:05 +00:00
Gleb Smirnoff
022c2f5570 In r354148 the goal was to check THREAD_CAN_SLEEP() only once for the
purpose of epoch_trace() and the subsequent panic, while keeping the
code fully under INVARIANTS and avoiding a bare function call to
panic().  However, at the last stage of review a true value slipped in
where always-false was assumed.  I confirmed this in the email archive
with kib@.

Noticed by:	trasz
2020-09-09 16:13:33 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren't properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
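For reference, a hedged sketch of the annotation style this change
applies (the node names and the sysctl_legacy_knob handler are
illustrative, not from the commit):

    /* A node whose handlers take no Giant-protected paths. */
    SYSCTL_NODE(_debug, OID_AUTO, example, CTLFLAG_RD | CTLFLAG_MPSAFE,
        NULL, "example tree");

    /* A legacy handler that still relies on Giant. */
    SYSCTL_PROC(_debug, OID_AUTO, legacy_knob,
        CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_legacy_knob, "I", "a Giant-locked knob");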
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delivered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
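A hedged sketch of the simplified KPI (the error handling shown is
illustrative; the thread is assumed to be locked by the caller, per the
usual stack(9) contract):

    #include <sys/stack.h>

    static void
    show_thread_stack(struct thread *td)
    {
            struct stack st;
            int error;

            stack_zero(&st);
            error = stack_save_td(&st, td);
            if (error == EOPNOTSUPP)
                    printf("running-thread capture unsupported\n");
            else if (error == EBUSY)
                    printf("thread is running in user mode\n");
            else if (error == 0)
                    stack_print(&st);
    }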
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Conrad Meyer
fea73412a0 sleep(9), sleepqueue(9): const'ify wchan pointers
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers.  Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST().  Plumb const through all of the
underlying sleepqueue bits.

No functional change.

Reviewed by:	rlibby
Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22914
2019-12-24 16:19:33 +00:00
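A hedged sketch of what the const plumbing permits; previously the
wakeup side needed __DECONST() or a cast (the channel variable is
illustrative):

    static const int map_ready = 0;     /* address-only wait channel */

    /* Waiter: const-qualified channel pointers now pass through. */
    tsleep(&map_ready, PWAIT, "mapwt", 0);

    /* Waker: likewise, no __DECONST() dance. */
    wakeup(&map_ready);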
Jeff Roberson
686bcb5c14 schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
a blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
Jeff Roberson
045b7c6084 schedlock 2/4
Do all sleepqueue post-processing in sleepq_remove_thread() so that we
do not require the thread lock after a context switch.

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D22745
2019-12-15 21:18:07 +00:00
Jeff Roberson
61a74c5ccd schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
Gleb Smirnoff
5757b59f3e Merge td_epochnest with td_no_sleeping.
Epoch itself doesn't rely on the counter; it is provided
merely for sleeping subsystems to check.

- In functions that sleep, use THREAD_CAN_SLEEP() to assert
  correctness.  With EPOCH_TRACE compiled in, print epoch info.
- _sleep() was the wrong place to put the assertion for epoch; the
  right place is sleepq_add(), as there are ways to call the
  latter bypassing _sleep().
- Do not increase td_no_sleeping in non-preemptible epochs.
  The critical section already triggers all possible safeguards;
  a no-sleeping counter is extraneous.

Reviewed by:	kib
2019-10-29 17:28:25 +00:00
Gleb Smirnoff
ed9d69b5e8 Use the THREAD_CAN_SLEEP() macro to check whether a thread can sleep.
No functional change.

Discussed with:	kib
2019-10-24 21:55:19 +00:00
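A hedged sketch of the assertion pattern this enables (the message text
is illustrative):

    KASSERT(THREAD_CAN_SLEEP(),
        ("example_subr: sleeping prohibited in this context"));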
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and the underlying sleepq_signal() spend additional time
trying to be fair, waking the thread with the highest priority that has
been sleeping the longest.  But in the case of taskqueue there are many
absolutely identical threads, and any fairness between them is quite
pointless.  It even makes things worse, since round-robin wakeups not
only defeat the scheduler's previous-CPU affinity, but also hide CPU
bottlenecks from the user, when a sequential workload with one request
at a time looks evenly distributed between multiple threads.

This change adds a new SLEEPQ_UNFAIR flag to sleepq_signal(), making it
wake up the thread that went to sleep most recently and is no longer in
a context switch (to avoid immediate spinning on the thread lock).  On
top of that, a new wakeup_any() function is added, equivalent to
wakeup_one() but setting the flag, and taskqueue(9) is switched to
wakeup_any() to wake up its threads.

As a result, on a 72-core Xeon v4 machine, a sequential ZFS write to 12
ZVOLs with a 16KB block size spends 34% less time in wakeup_any() and
descendants than it was spending in wakeup_one(), and total write
throughput increased by ~10% with the same CPU usage as before.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
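A hedged summary sketch of the wakeup flavors after this change (chan
is illustrative):

    wakeup(chan);      /* wake every sleeper on the channel */
    wakeup_one(chan);  /* fair: highest priority, longest sleeper */
    wakeup_any(chan);  /* unfair: most recent sleeper; cheaper, and
                          keeps its CPU affinity in the scheduler */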
Konstantin Belousov
ab74c84333 Do not go to sleep in sleepq_catch_signals() when the SIGSTOP from
PT_ATTACH was consumed.

In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL.  Leave it to sleepq_catch_signals() to clear it and convert
the zero return code to EINTR.

Otherwise, per the submitter's report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but before
it had really gone to sleep, and cursig() caused a debugger attach, the
thread sleeps instead of returning to the userspace boundary with
EINTR.

PR: 231445
Reported by:	Efi Weiss <valmarelox@gmail.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20381
2019-05-29 14:05:27 +00:00
John Baldwin
2e43efd0bb Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
Matt Macy
3adccf38e3 turnstile / sleepqueue: annotate variables only used by debug builds
2018-05-19 05:00:16 +00:00
Mark Johnston
803c11a3a6 Use LIST_FOREACH_SAFE in sleepq_chains_remove_matching().
We may remove a sleepqueue from the hash table in
sleepq_resume_thread().

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14847
2018-03-25 20:12:14 +00:00
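For reference, a self-contained sketch of the queue(3) pattern applied
here (the entry type is illustrative):

    #include <sys/queue.h>

    struct entry {
            LIST_ENTRY(entry) link;
            int done;
    };
    static LIST_HEAD(, entry) head = LIST_HEAD_INITIALIZER(head);

    static void
    prune_done(void)
    {
            struct entry *e, *tmp;

            /* The _SAFE variant caches the successor in tmp, so the
             * current entry may be unlinked without breaking the
             * iteration. */
            LIST_FOREACH_SAFE(e, &head, link, tmp) {
                    if (e->done)
                            LIST_REMOVE(e, link);
            }
    }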
Alexander Kabaev
151ba7933a Do a pass removing some write-only variables from the kernel.
This reduces noise when the kernel is compiled by newer GCC versions,
such as those used by the external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license; however, the
tool I was using misidentified many licenses, so this was mostly a
manual, error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open-source licenses.  We are gradually adopting the specification,
noting that the tags are considered only advisory and do not, in any
way, supersede or replace the license texts.
2017-11-27 15:20:12 +00:00
Mateusz Guzik
9e68989764 Make the sleepq chain hash size configurable per-arch and bump on amd64.
While here, cache-align the chains.

This shortens the longest chain found during poudriere -j 80 from 32 to 16.

Pushing this higher will probably require allocating the table at boot.
2017-10-22 20:43:50 +00:00
Hans Petter Selasky
32b413d7f0 When showing the sleepqueues from the in-kernel debugger,
properly dump all the sleepqueues and not just the first one

History:
It appears that in the commit which introduced the code,
r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
were interchanged. In r180927 "td_name[i]" was corrected to
"td_name[0]", but "sq_blocked[0]" was left unchanged.

PR:		222624
Discussed with:	kmacy@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-09 18:33:29 +00:00
Mark Johnston
f38c0c46c5 Let stack_create(9) take a malloc flags argument.
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12614
2017-10-06 21:52:28 +00:00
Eric van Gyzen
8addc72b3e Add missing pieces of r315280
I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.
Specifically:

Add documentation, polish comments, and improve style(9).

Tested by:	pho (r315280)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9791
2017-03-14 22:02:02 +00:00
Eric van Gyzen
9dbdf2a169 When the RTC is adjusted, reevaluate absolute sleep times based on the RTC
POSIX 2008 says this about clock_settime(2):

    If the value of the CLOCK_REALTIME clock is set via clock_settime(),
    the new value of the clock shall be used to determine the time
    of expiration for absolute time services based upon the
    CLOCK_REALTIME clock.  This applies to the time at which armed
    absolute timers expire.  If the absolute time requested at the
    invocation of such a time service is before the new value of
    the clock, the time service shall expire immediately as if the
    clock had reached the requested time normally.

    Setting the value of the CLOCK_REALTIME clock via clock_settime()
    shall have no effect on threads that are blocked waiting for
    a relative time service based upon this clock, including the
    nanosleep() function; nor on the expiration of relative timers
    based upon this clock.  Consequently, these time services shall
    expire when the requested relative interval elapses, independently
    of the new or old value of the clock.

When the real-time clock is adjusted, such as by clock_settime(2),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen.  The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.

At present, this affects the following functions:

    pthread_cond_timedwait(3)
    pthread_mutex_timedlock(3)
    pthread_rwlock_timedrdlock(3)
    pthread_rwlock_timedwrlock(3)
    sem_timedwait(3)
    sem_clockwait_np(3)

I'm working on adding clock_nanosleep(2), which will also be affected.

Reported by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by:	jhb, kib
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9791
2017-03-14 19:06:44 +00:00
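A hedged caller-side sketch of the contract described above; only the
retry-on-zero-return behaviour is from the commit, while
deadline_to_sbt() and deadline_passed() are hypothetical helpers
standing in for the caller's own deadline arithmetic:

    int error;

    /* chan, lock, and abstime come from the surrounding sleep site. */
    do {
            error = _sleep(chan, lock, PCATCH, "rtcabs",
                deadline_to_sbt(abstime), 0, C_ABSOLUTE);
            /* error == 0 with time remaining: the RTC was stepped, so
             * recompute the deadline against the new clock, re-sleep. */
    } while (error == 0 && !deadline_passed(abstime));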
Eric van Gyzen
8a8bea603c Fix grammar in some comments in subr_sleepqueue.c
While I'm here, remove trailing whitespace.

Reviewed by:	kib, mostly, as part of a larger review
MFC after:	3 days
2017-03-03 21:03:28 +00:00
Eric Badger
28d2efa983 sleepq_catch_signals: do thread suspension before signal check
Since locks are dropped when a thread suspends, it's possible for another
thread to deliver a signal to the suspended thread. If the thread awakens from
suspension without checking for signals, it may go to sleep despite having
a pending signal that should wake it up. Therefore the suspension check is
done first, so any signals sent while suspended will be caught in the
subsequent signal check.

Reviewed by:	kib
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9530
2017-02-14 17:13:23 +00:00
Mark Johnston
eab80d9276 Add a comment explaining the race fixed by r310423.
Suggested and reviewed by: jhb
X-MFC With:	r310423
2016-12-23 05:02:17 +00:00
Mark Johnston
aa3c544349 Revert part of r300109.
The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last
thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin
executing on a different CPU before sleepq_resume_thread() returns. This
leaves a window during which it may go back to sleep and incorrectly be
awoken again by the caller of sleepq_broadcast().

Reported and tested by:	pho
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2016-12-22 17:51:44 +00:00
John Baldwin
9f3aabb9eb Permit timed sleeps for threads other than thread0 before timers are working.
The callout subsystem already handles early callouts and schedules
the first clock interrupt appropriately based on the currently pending
callouts.  The one nit to fix was that callouts scheduled via C_HARDCLOCK
during early boot could fire too early once timers were enabled as the
per-CPU base time is always zero until timers are initialized.  The change
in callout_when() handles this case by using the current uptime as the
base time of the callout during bootup if the per-CPU base time is zero.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Netflix
2016-11-25 18:02:43 +00:00
Mark Johnston
3da0f3c9ae Micro-optimize sleepq_signal().
Lift a comparison out of the loop that finds the highest-priority thread
on the queue.

MFC after:	1 week
2016-09-04 00:29:48 +00:00
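The hoist, sketched (a paraphrase; names follow subr_sleepqueue.c):

    struct thread *besttd, *td;

    /* Seed besttd with the first waiter so the NULL check is not
     * repeated on every iteration of the scan. */
    besttd = TAILQ_FIRST(&sq->sq_blocked[queue]);
    TAILQ_FOREACH(td, &sq->sq_blocked[queue], td_slpq) {
            if (td->td_priority < besttd->td_priority)
                    besttd = td;
    }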
Konstantin Belousov
2d19b736ed Rewrite subr_sleepqueue.c's use of callouts to not depend on the
specifics of the callout KPI.  In particular, do not depend on the
exact interface of the callout_stop(9) return values.

The main change is that instead of requiring precise callouts, the code
maintains the absolute time to wake up.  Callouts should still ensure
that a wakeup occurs at the requested moment, but we can tolerate both
a run-away callout and callout_stop(9) lying about a running callout,
either way.

As a consequence, this removes the constant source of bugs where
sleepq_check_timeout() causes an uninterruptible thread state in which
the thread is detached from the CPU; see e.g. r234952 and r296320.

The patch also removes the dual meaning of the TDF_TIMEOUT flag, making
the code (IMO much) simpler to reason about.

Tested by:	pho
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7137
2016-07-28 09:09:55 +00:00
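A hedged paraphrase of the new scheme, arming and checking (simplified
from the description above; the exact flags and field names may differ
from the committed code):

    /* Arming, in sleepq_set_timeout_sbt(): record the absolute wakeup
     * time, then aim the callout at it. */
    td->td_sleeptimo = sbt;
    callout_reset_sbt_on(&td->td_slpcallout, sbt, pr, sleepq_timeout,
        td, PCPU_GET(cpuid), flags | C_PRECALC | C_DIRECT_EXEC);

    /* Checking, in sleepq_check_timeout(): report a timeout only when
     * the absolute deadline has truly passed, tolerating run-away or
     * lying callouts either way. */
    res = 0;
    if (td->td_sleeptimo != 0 && td->td_sleeptimo <= sbinuptime())
            res = EWOULDBLOCK;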
Gleb Smirnoff
d153eeee97 The paradigm of a callout is that it has three consecutive states:
not scheduled -> scheduled -> running -> not scheduled.  The API and the
manual page assume that, some comments in the code assume that, and it
looks like some contributors to the code did as well.  The problem is
that this paradigm isn't true.  A callout can be scheduled and running
at the same time, which makes the API description ambiguous.  In such a
case the callout_stop() family of functions/macros should return 1 and
0 at the same time, since it successfully unscheduled the future callout
but the current one is running.  Before this change we returned 1 in
such a case, with the exception that if the running callout was
migrating we returned 0, unless CS_MIGRBLOCK was specified.

With this change, we now return 0 in the case where the future callout
was unscheduled but another one is still in action, indicating to API
users that resources are not yet safe to be freed.

However, the sleepqueue code relies on getting a return code of 1 in
that case, and there already was a CS_MIGRBLOCK flag that covered one of
the edge cases.  In the new return path we will also use this flag, to
keep the sleepqueue safe.

Since the CS_MIGRBLOCK flag doesn't block migration and is now no longer
limited to the migration edge case, rename it to CS_EXECUTING.

This change fixes panics on a highly loaded TCP server.

Reviewed by:	jch, hselasky, rrs, kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7042
2016-07-05 18:47:17 +00:00
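A hedged sketch of what the revised return value means for API users
(the softc and its sc_timer field are illustrative):

    /* 0 from callout_stop() now also covers "a callback is still
     * executing", so resources are not yet safe to free. */
    if (callout_stop(&sc->sc_timer) == 0)
            callout_drain(&sc->sc_timer);   /* may sleep until done */
    free(sc, M_DEVBUF);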
Konstantin Belousov
46e47c4f8d Provide helper macros to detect the 'non-silent SBDRY' state and to
calculate the appropriate return value for stops.  Simplify the code by
using them.

Fix a typo in sig_suspend_threads(): the thread whose sleep must be
aborted is td2. (*)

In issignal(), when handling a stopping signal for a thread in the
TD_SBDRY_INTR state, do not stop; doing so is wrong and fires an
assert.  This is yet another place where execution should be forced out
of the SBDRY-protected region.  For such a case, return -1 from
issignal() and translate it to the corresponding error code in
sleepq_catch_signals().  Assert that other consumers of cursig() are
not affected by the new return value. (*)

Micro-optimize, mostly VFS and VOP methods, by avoiding calls to the
functions when the SIGDEFERSTOP_NOP non-change is requested. (**)

Reported and tested by:	pho (*)
Requested by:	bde (**)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-07-03 18:19:48 +00:00
Mark Johnston
d2f5f8db87 Micro-optimize sleepq_broadcast().
- Avoid a conditional branch on the return value of sleepq_resume_thread()
  by ORing its return value into the boolean wakeup_swapper. This is
  consistent with other sleepqueue functions which just pass this return
  value to their caller.
- sleepq_resume_thread() unconditionally removes the thread from its queue,
  so there's no need to maintain a pointer to the next element in the queue.

MFC after:	2 weeks
2016-05-18 03:50:21 +00:00
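The branch-free accumulation, sketched (a paraphrase of the first item):

    /* Before: a conditional branch per resumed thread. */
    if (sleepq_resume_thread(sq, td, pri))
            wakeup_swapper = 1;

    /* After: OR the result in unconditionally, matching the other
     * sleepqueue functions that just hand this value to the caller. */
    wakeup_swapper |= sleepq_resume_thread(sq, td, pri);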
John Baldwin
b4f1d267b7 Rework handling of thread sleeps before timers are working.
Previously, calls to *sleep() and cv_*wait*() immediately returned during
early boot.  Instead, permit threads that request a sleep without a
timeout to sleep, since wakeup() works during early boot.  Sleeps with
timeouts are harder to emulate without working timers, so just punt and
panic explicitly if any thread tries to use those before timers are
working.  Any threads that depend on timeouts should either wait until
SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers
are available.

Until APs are started earlier, this should be a no-op, as other
kthreads shouldn't get a chance to start running until after timers are
working, regardless of when they were created.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5724
2016-03-31 18:10:29 +00:00
Conrad Meyer
82e17adcb4 fail(9): Only gather/print stacks if STACK is enabled
This is a follow-up fix to the earlier r296927.

Reported by:	bz
Sponsored by:	EMC / Isilon Storage Division
2016-03-17 01:05:53 +00:00
Conrad Meyer
70e20d4e1a fail(9): Upstreaming some fail point enhancements
This is several years' worth of fail point upgrades done at EMC Isilon.  They
are interdependent enough that it makes sense to put a single diff up for them.
Primarily, we added:

- Changing all mainline execution paths to be lockless, which lets us use fail
  points in more sleep-sensitive areas, and allows more parallel execution
- A number of additional commands, including 'pause' that lets us do some
  interesting deterministic repros of race conditions
- The ability to dump the stacks of all threads sleeping on a fail point
- A number of other API changes to allow marking up the fail point's context in
  the code, and firing callbacks before and after execution
- A man page update

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem (earlier version), jhb, kib, pho
With feedback from:	bdrewery
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5427
2016-03-16 04:22:32 +00:00
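For reference, a hedged sketch of fail(9) usage in the style this work
enables (the fail point name example_io is illustrative; see fail(9)
for the command grammar):

    #include <sys/fail.h>

    static int
    example_io(void)
    {
            /* If debug.fail_point.example_io is configured, inject
             * the requested error here; RETURN_VALUE expands to the
             * value set via sysctl. */
            KFAIL_POINT_RETURN(DEBUG_FP, example_io);

            return (0);     /* normal path */
    }

At runtime this would be driven with something like
sysctl debug.fail_point.example_io='5%return(5)', and the new 'pause'
command can hold threads at the point for deterministic reproduction of
races.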
Konstantin Belousov
5db9ed8062 If callout_stop_safe() noted that the callout is currently executing,
but the next invocation was cancelled while migrating,
sleepq_check_timeout() needs to be informed that the callout is
stopped.  Otherwise the thread switches off-CPU and never becomes
runnable, since the running callout could have already raced with us,
while the migrating and cancelled callout could be the one which is
expected to set the TDP_TIMOFAIL flag for us.  This contradicts the
expected behaviour of callout_stop() for other callers, which
e.g. decrement references from the callout callbacks.

Add a new flag, CS_MIGRBLOCK, requesting that this situation be
reported as 'successfully stopped'.

Reviewed by:	jhb (previous version)
Tested by:	cognet, pho
PR:	200992
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D5221
2016-03-02 18:46:17 +00:00
Hans Petter Selasky
a115fb62ed Revert for r277213:
FreeBSD developers need more time to review patches in the surrounding
areas, like the TCP stack, which are using MPSAFE callouts, to restore
the distribution of callouts across multiple CPUs.

Bump the __FreeBSD_version instead of reverting it.

Suggested by:		kmacy, adrian, glebius and kib
Differential Revision:	https://reviews.freebsd.org/D1438
2015-01-22 11:12:42 +00:00
Hans Petter Selasky
1a26c3c047 Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API work.  The manual page has
  been made function-oriented to make it easier to deduce how each of
  the functions making up the callout API works, without having
  to first read the whole manual page.  All functions are grouped into
  a handful of sections, which should give a quick top-level overview
  of when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality have been removed
  to reduce the complexity in the callout code and to avoid problems
  with atomically stopping callouts via callout_stop().  If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout as is done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  tear down the "td_slpcallout".  Further, the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed.  Currently
  they are marked as available and will be cleaned up in a follow-up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There have been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
Attilio Rao
e989086b1d The sysctl subsystem uses sxlocks, so avoid setting up dynamic sysctl
nodes before sleepinit() has been fully executed in the
SLEEPQUEUE_PROFILING case.

Sponsored by:	EMC / Isilon storage division
2014-06-24 15:16:55 +00:00
John Baldwin
e432d5f6a7 Drop the 3rd clause from all 3-clause BSD licenses where I am the sole
holder, converting them to 2-clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00