Commit Graph

119 Commits

Author SHA1 Message Date
cem
e26092ad11 sleep(9), sleepqueue(9): const'ify wchan pointers
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers.  Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST().  Plumb const through all of the
underlying sleepqueue bits.

No functional change.

Reviewed by:	rlibby
Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22914
2019-12-24 16:19:33 +00:00
jeff
506c867c6e schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
jeff
7431380c94 schedlock 2/4
Do all sleepqueue post-processing in sleepq_remove_thread() so that we
do not require the thread lock after a context switch.

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D22745
2019-12-15 21:18:07 +00:00
jeff
bf925a1e49 schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
glebius
8bb52bf920 Merge td_epochnest with td_no_sleeping.
Epoch itself doesn't rely on the counter and it is provided
merely for sleeping subsystems to check it.

- In functions that sleep use THREAD_CAN_SLEEP() to assert
  correctness.  With EPOCH_TRACE compiled print epoch info.
- _sleep() was a wrong place to put the assertion for epoch,
  right place is sleepq_add(), as there ways to call the
  latter bypassing _sleep().
- Do not increase td_no_sleeping in non-preemptible epochs.
  The critical section would trigger all possible safeguards,
  no sleeping counter is extraneous.

Reviewed by:	kib
2019-10-29 17:28:25 +00:00
glebius
74a423d9ac Use THREAD_CAN_SLEEP() macro to check if thread can sleep. There is no
functional change.

Discussed with:	kib
2019-10-24 21:55:19 +00:00
mav
da627eba88 Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
kib
7219eaccfb Do not go into sleep in sleepq_catch_signals() when SIGSTOP from
PT_ATTACH was consumed.

In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero
return code to EINTR.

Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but not
yet really sleep, and cursig() caused debugger attach, the thread
sleeps instead of returning to the userspace boundary with EINTR.

PR: 231445
Reported by:	Efi Weiss <valmarelox@gmail.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20381
2019-05-29 14:05:27 +00:00
jhb
ae6222b0c3 Drop "All rights reserved" from my copyright statements.
Reviewed by:	rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19485
2019-03-06 22:11:45 +00:00
mmacy
63ce31f3b1 turnstile / sleepqueue: annotate variables only used by debug builds 2018-05-19 05:00:16 +00:00
markj
7cb229344d Use LIST_FOREACH_SAFE in sleepq_chains_remove_matching().
We may remove a sleepqueue from the hash table in
sleepq_resume_thread().

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14847
2018-03-25 20:12:14 +00:00
kan
c8da6fae2c Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
pfg
cc22a86800 sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
mjg
ca37b93652 Make the sleepq chain hash size configurable per-arch and bump on amd64.
While here cache-align chains.

This shortens longest found chain during poudriere -j 80 from 32 to 16.

Pushing this higher up will probably require allocation on boot.
2017-10-22 20:43:50 +00:00
hselasky
69fd8ab233 When showing the sleepqueues from the in-kernel debugger,
properly dump all the sendqueues and not just the first one

History:
It appears that in the commit which introduced the code,
r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
were interchanged. In r180927 "td_name[i]" was corrected to
"td_name[0]", but "sq_blocked[0]" was left unchanged.

PR:		222624
Discussed with:	kmacy @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-09 18:33:29 +00:00
markj
08a37c1513 Let stack_create(9) take a malloc flags argument.
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12614
2017-10-06 21:52:28 +00:00
vangyzen
47fc9e6df6 Add missing pieces of r315280
I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.
Specifically:

Add documentation, polish comments, and improve style(9).

Tested by:	pho (r315280)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9791
2017-03-14 22:02:02 +00:00
vangyzen
c2172076df When the RTC is adjusted, reevaluate absolute sleep times based on the RTC
POSIX 2008 says this about clock_settime(2):

    If the value of the CLOCK_REALTIME clock is set via clock_settime(),
    the new value of the clock shall be used to determine the time
    of expiration for absolute time services based upon the
    CLOCK_REALTIME clock.  This applies to the time at which armed
    absolute timers expire.  If the absolute time requested at the
    invocation of such a time service is before the new value of
    the clock, the time service shall expire immediately as if the
    clock had reached the requested time normally.

    Setting the value of the CLOCK_REALTIME clock via clock_settime()
    shall have no effect on threads that are blocked waiting for
    a relative time service based upon this clock, including the
    nanosleep() function; nor on the expiration of relative timers
    based upon this clock.  Consequently, these time services shall
    expire when the requested relative interval elapses, independently
    of the new or old value of the clock.

When the real-time clock is adjusted, such as by clock_settime(3),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen.  The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.

At present, this affects the following functions:

    pthread_cond_timedwait(3)
    pthread_mutex_timedlock(3)
    pthread_rwlock_timedrdlock(3)
    pthread_rwlock_timedwrlock(3)
    sem_timedwait(3)
    sem_clockwait_np(3)

I'm working on adding clock_nanosleep(2), which will also be affected.

Reported by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by:	jhb, kib
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9791
2017-03-14 19:06:44 +00:00
vangyzen
894839ac8b Fix grammar in some comments in subr_sleepqueue.c
While I'm here, remove trailing whitespace.

Reviewed by:	kib, mostly, as part of a larger review
MFC after:	3 days
2017-03-03 21:03:28 +00:00
badger
0361ff36a6 sleepq_catch_signals: do thread suspension before signal check
Since locks are dropped when a thread suspends, it's possible for another
thread to deliver a signal to the suspended thread. If the thread awakens from
suspension without checking for signals, it may go to sleep despite having
a pending signal that should wake it up. Therefore the suspension check is
done first, so any signals sent while suspended will be caught in the
subsequent signal check.

Reviewed by:	kib
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9530
2017-02-14 17:13:23 +00:00
markj
3719bd1ef7 Add a comment explaining the race fixed by r310423.
Suggested and reviewed by: jhb
X-MFC With:	r310423
2016-12-23 05:02:17 +00:00
markj
697ac35642 Revert part of r300109.
The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last
thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin
executing on a different CPU before sleepq_resume_thread() returns. This
leaves a window during which it may go back to sleep and incorrectly be
awoken again by the caller of sleepq_broadcast().

Reported and tested by:	pho
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2016-12-22 17:51:44 +00:00
jhb
f91c1f1791 Permit timed sleeps for threads other than thread0 before timers are working.
The callout subsystem already handles early callouts and schedules
the first clock interrupt appropriately based on the currently pending
callouts.  The one nit to fix was that callouts scheduled via C_HARDCLOCK
during early boot could fire too early once timers were enabled as the
per-CPU base time is always zero until timers are initialized.  The change
in callout_when() handles this case by using the current uptime as the
base time of the callout during bootup if the per-CPU base time is zero.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Netflix
2016-11-25 18:02:43 +00:00
markj
486a2523f7 Micro-optimize sleepq_signal().
Lift a comparison out of the loop that finds the highest-priority thread
on the queue.

MFC after:	1 week
2016-09-04 00:29:48 +00:00
kib
8fc564dae0 Rewrite subr_sleepqueue.c use of callouts to not depend on the
specifics of callout KPI.  Esp., do not depend on the exact interface
of callout_stop(9) return values.

The main change is that instead of requiring precise callouts, code
maintains absolute time to wake up.  Callouts now should ensure that a
wake occurs at the requested moment, but we can tolerate both run-away
callout, and callout_stop(9) lying about running callout either way.

As consequence, it removes the constant source of the bugs where
sleepq_check_timeout() causes uninterruptible thread state where the
thread is detached from CPU, see e.g. r234952 and r296320.

Patch also removes dual meaning of the TDF_TIMEOUT flag, making code
(IMO much) simpler to reason about.

Tested by:	pho
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7137
2016-07-28 09:09:55 +00:00
glebius
2b2f782761 The paradigm of a callout is that it has three consequent states:
not scheduled -> scheduled -> running -> not scheduled. The API and the
manual page assume that, some comments in the code assume that, and looks
like some contributors to the code also did. The problem is that this
paradigm isn't true. A callout can be scheduled and running at the same
time, which makes API description ambigouous. In such case callout_stop()
family of functions/macros should return 1 and 0 at the same time, since it
successfully unscheduled future callout but the current one is running.
Before this change we returned 1 in such a case, with an exception that
if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
specified.

With this change, we now return 0 in case if future callout was unscheduled,
but another one is still in action, indicating to API users that resources
are not yet safe to be freed.

However, the sleepqueue code relies on getting 1 return code in that case,
and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
In the new return path we will also use this flag, to keep sleepqueue safe.

Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
migration edge case, rename it to CS_EXECUTING.

This change fixes panics on a high loaded TCP server.

Reviewed by:	jch, hselasky, rrs, kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7042
2016-07-05 18:47:17 +00:00
kib
2ff8e9c636 Provide helper macros to detect 'non-silent SBDRY' state and to
calculate appropriate return value for stops.  Simplify the code by
using them.

Fix typo in sig_suspend_threads().  The thread which sleep must be
aborted is td2. (*)

In issignal(), when handling stopping signal for thread in
TD_SBDRY_INTR state, do not stop, this is wrong and fires assert.
This is yet another place where execution should be forced out of
SBDRY-protected region.  For such case, return -1 from issignal() and
translate it to corresponding error code in sleepq_catch_signals().
Assert that other consumers of cursig() are not affected by the new
return value. (*)

Micro-optimize, mostly VFS and VOP methods, by avoiding calling the
functions when SIGDEFERSTOP_NOP non-change is requested. (**)

Reported and tested by:	pho (*)
Requested by:	bde (**)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-07-03 18:19:48 +00:00
markj
d82dbf711e Micro-optimize sleepq_broadcast().
- Avoid a conditional branch on the return value of sleepq_resume_thread()
  by ORing its return value into the boolean wakeup_swapper. This is
  consistent with other sleepqueue functions which just pass this return
  value to their caller.
- sleepq_resume_thread() unconditionally removes the thread from its queue,
  so there's no need to maintain a pointer to the next element in the queue.

MFC after:	2 weeks
2016-05-18 03:50:21 +00:00
jhb
b4f65d818d Rework handling of thread sleeps before timers are working.
Previously, calls to *sleep() and cv_*wait*() immediately returned during
early boot.  Instead, permit threads that request a sleep without a
timeout to sleep as wakeup() works during early boot.  Sleeps with
timeouts are harder to emulate without working timers, so just punt and
panic explicitly if any thread tries to use those before timers are
working.  Any threads that depend on timeouts should either wait until
SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers
are available.

Until APs are started earlier this should be a no-op as other kthreads
shouldn't get a chance to start running until after timers are working
regardless of when they were created.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5724
2016-03-31 18:10:29 +00:00
cem
6b40e40026 fail(9): Only gather/print stacks if STACK is enabled
This is a follow-up fix to the earlier r296927.

Reported by:	bz
Sponsored by:	EMC / Isilon Storage Division
2016-03-17 01:05:53 +00:00
cem
1cab282ecb fail(9): Upstreaming some fail point enhancements
This is several year's worth of fail point upgrades done at EMC Isilon. They
are interdependent enough that it makes sense to put a single diff up for them.
Primarily, we added:

- Changing all mainline execution paths to be lockless, which lets us use fail
  points in more sleep-sensitive areas, and allows more parallel execution
- A number of additional commands, including 'pause' that lets us do some
  interesting deterministic repros of race conditions
- The ability to dump the stacks of all threads sleeping on a fail point
- A number of other API changes to allow marking up the fail point's context in
  the code, and firing callbacks before and after execution
- A man page update

Submitted by:	Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by:	cem (earlier version), jhb, kib, pho
With feedback from:	bdrewery
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5427
2016-03-16 04:22:32 +00:00
kib
486320aac4 If callout_stop_safe() noted that the callout is currently executing,
but next invocation is cancelled while migrating,
sleepq_check_timeout() needs to be informed that the callout is
stopped.  Otherwise the thread switches off CPU and never become
runnable, since running callout could have already raced with us,
while the migrating and cancelled callout could be one which is
expected to set TDP_TIMOFAIL flag for us.  This contradicts with the
expected behaviour of callout_stop() for other callers, which
e.g. decrement references from the callout callbacks.

Add a new flag CS_MIGRBLOCK requesting report of the situation as
'successfully stopped'.

Reviewed by:	jhb (previous version)
Tested by:	cognet, pho
PR:	200992
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D5221
2016-03-02 18:46:17 +00:00
hselasky
c0aba3b50d Revert for r277213:
FreeBSD developers need more time to review patches in the surrounding
areas like the TCP stack which are using MPSAFE callouts to restore
distribution of callouts on multiple CPUs.

Bump the __FreeBSD_version instead of reverting it.

Suggested by:		kmacy, adrian, glebius and kib
Differential Revision:	https://reviews.freebsd.org/D1438
2015-01-22 11:12:42 +00:00
hselasky
a9eed96a6b Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API are working. The manual page has
  been made function oriented to make it easier to deduce how each of
  the functions making up the callout API are working without having
  to first read the whole manual page. Group all functions into a
  handful of sections which should give a quick top-level overview
  when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
  to reduce the complexity in the callout code and to avoid problems
  about atomically stopping callouts via callout_stop(). If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout like done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed. Currently
  they are marked as available and will be cleaned up in a follow up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There has been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
attilio
195f74af68 sysctl subsystem uses sxlocks so avoid to setup dynamic sysctl nodes
before sleepinit() has been fully executed in the SLEEPQUEUE_PROFILING
case.

Sponsored by:	EMC / Isilon storage division
2014-06-24 15:16:55 +00:00
jhb
94d685456e Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
attilio
7ee4e910ce - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
jhb
8b099870ed Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
  issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
2013-03-18 17:23:58 +00:00
mav
ca616198e5 Make kern_nanosleep() and pause_sbt() to use per-CPU sleep queues.
This removes significant sleep queue lock congestion on multithreaded
microbenchmarks, making them scale to multiple CPUs almost linearly.
2013-03-12 06:58:49 +00:00
davide
a893437175 MFcalloutng:
Convert sleepqueue(9) bits to the new callout KPI. Take advantage of
the possibility to run callback directly from hw interrupt context.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, markj, Fabian Keil
2013-03-04 11:51:46 +00:00
jhb
8857575b13 Replace the TDP_NOSLEEPING flag with a counter so that the
THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest.

Reviewed by:	attilio
2013-03-01 22:03:31 +00:00
jhb
0fee3f66b8 Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
attilio
ac29dac65b Tweak the commit message in case of panic for sleeping from threads
with TDP_NOSLEEPING on.

The current message has no informations on the thread and wchan
involed, which may be useful in case where dumps have mangled dwarf
informations.

Reported by:    kib
Reviewed by:	bde, jhb, kib
MFC after:	1 week
2012-09-12 22:05:54 +00:00
rstone
a059a0e086 Implement the DTrace sched provider. This implementation aims to be
compatible with the sched provider implemented by Solaris and its open-
source derivatives.  Full documentation of the sched provider can be found
on Oracle's DTrace wiki pages.

Note that for compatibility with scripts originally written for Solaris,
serveral probes are defined that will never fire.  These probes are defined
to fire when Solaris-specific features perform certain actions.  As these
features are not present in FreeBSD, the probes can never fire.

Also, I have added a two probes that are not defined in Solaris, lend-pri
and load-change.  These probes have been added to make it possible to
collect schedgraph data with DTrace.

Finally, a few probes are defined in Solaris to take a cpuinfo_t *
argument.  As it was not immediately clear to me how to translate that to
FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.

Sponsored by: Sandvine Incorporated
MFC after:	2 weeks
2012-05-15 01:30:25 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
mdf
7fc649fc41 Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9).  This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by:	rwatson
2011-01-27 00:34:12 +00:00
jhb
b92da6d9e2 Rework realtime priority support:
- Move the realtime priority range up above kernel sleep priorities and
  just below interrupt thread priorities.
- Contract the interrupt and kernel sleep priority ranges a bit so that
  the timesharing priority band can be increased.  The new timeshare range
  is now slightly larger than the old realtime + timeshare ranges.
- Change the ULE scheduler to no longer use realtime priorities for
  interactive threads.  Instead, the larger timeshare range is now split
  into separate subranges for interactive and non-interactive ("batch")
  threads.  The end result is that interactive threads and non-interactive
  threads still use the same priority ranges as before, but realtime
  threads now have a separate, dedicated priority range.
- Do not modify the priority of non-timeshare threads in sched_sleep()
  or via cv_broadcastpri().  Realtime and idle priority threads will
  no longer have their priorities affected by sleeping in the kernel.

Reviewed by:	jeff
2011-01-14 17:06:54 +00:00
mdf
5695ef4698 Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
mdf
3ed6eac561 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
mdf
bc54684253 Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00