Commit Graph

126 Commits

Author SHA1 Message Date
Ryan Stone
b3e9e682cf Implement the DTrace sched provider. This implementation aims to be
compatible with the sched provider implemented by Solaris and its open-
source derivatives.  Full documentation of the sched provider can be found
on Oracle's DTrace wiki pages.

Note that for compatibility with scripts originally written for Solaris,
serveral probes are defined that will never fire.  These probes are defined
to fire when Solaris-specific features perform certain actions.  As these
features are not present in FreeBSD, the probes can never fire.

Also, I have added a two probes that are not defined in Solaris, lend-pri
and load-change.  These probes have been added to make it possible to
collect schedgraph data with DTrace.

Finally, a few probes are defined in Solaris to take a cpuinfo_t *
argument.  As it was not immediately clear to me how to translate that to
FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.

Sponsored by: Sandvine Incorporated
MFC after:	2 weeks
2012-05-15 01:30:25 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Matthew D Fleming
00f0e671ff Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9).  This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by:	rwatson
2011-01-27 00:34:12 +00:00
John Baldwin
2dc29adb9f Rework realtime priority support:
- Move the realtime priority range up above kernel sleep priorities and
  just below interrupt thread priorities.
- Contract the interrupt and kernel sleep priority ranges a bit so that
  the timesharing priority band can be increased.  The new timeshare range
  is now slightly larger than the old realtime + timeshare ranges.
- Change the ULE scheduler to no longer use realtime priorities for
  interactive threads.  Instead, the larger timeshare range is now split
  into separate subranges for interactive and non-interactive ("batch")
  threads.  The end result is that interactive threads and non-interactive
  threads still use the same priority ranges as before, but realtime
  threads now have a separate, dedicated priority range.
- Do not modify the priority of non-timeshare threads in sched_sleep()
  or via cv_broadcastpri().  Realtime and idle priority threads will
  no longer have their priorities affected by sleeping in the kernel.

Reviewed by:	jeff
2011-01-14 17:06:54 +00:00
Matthew D Fleming
4e6571599b Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
Matthew D Fleming
404a593e28 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
Matthew D Fleming
dd67e2103c Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00
David Xu
b274870405 make sure thread lock is locked. 2010-08-20 23:51:34 +00:00
David Xu
c6aa908d9c If thread set a TDP_WAKEUP for itself, clears the flag and returns EINTR
immediately, this is used for implementing reliable pthread cancellation.
2010-08-20 04:28:30 +00:00
John Baldwin
418a27e99e Update comment for tdsignal() -> tdsendsignal() rename. Forgot to include
this in 209592.
2010-06-30 18:00:45 +00:00
Attilio Rao
f7829d0d5c Introduce the new kernel thread called "deadlock resolver".
While the name is pretentious, a good explanation of its targets is
reported in this 17 months old presentation e-mail:
http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html

In order to implement it, the sq_type in sleepqueues is mandatory and not
only compiled along with INVARIANTS option. Additively, a new sleepqueue
function, sleepq_type() is added, returning the type of the sleepqueue
linked to a wchan.
Three new sysctls are added in order to configure the thread:
debug.deadlkres.slptime_threshold
debug.deadlkres.blktime_threshold
debug.deadlkres.sleepfreq

rappresenting the thresholds for sleep and block time that will lead to
a deadlock matching (when exceeded), while the sleepfreq rappresents the
number of seconds between 2 consecutive thread runnings.
In order to enable the deadlock resolver thread recompile your kernel
with the option DEADLKRES.

Reviewed by:	jeff
Tested by:	pho, Giovanni Trematerra
Sponsored by:	Nokia Incorporated, Sandvine Incorporated
MFC after:	2 weeks
2010-01-09 01:46:38 +00:00
Attilio Rao
2028867def In current code, threads performing an interruptible sleep (on both
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on forcing the owner to do a wakeup even when if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitively.

A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on.  In this case, even if the waiters queue
is not empty, the waiters won't progress after being awake but they will
just fail, still not taking care of the 2nd queue waiters (as instead the
lock owned doing the wakeup would expect).

In order to fix this bug in a cheap way (without adding too much locking
and complicating too much the semantic) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.

The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.

Reported by:	avg, kib, pho
Reviewed by:	kib
Tested by:	pho
2009-12-12 21:31:07 +00:00
Konstantin Belousov
f33a947b56 Add new msleep(9) flag PBDY that shall be specified together with
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.

Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:52:46 +00:00
David Xu
6d9b63d6c8 Revision 184199 had not been fully reverted, add missing piece.
Reported by: phk
2008-12-01 01:54:55 +00:00
David Xu
7b4a950a7d Revert rev 184216 and 184199, due to the way the thread_lock works,
it may cause a lockup.

Noticed by: peter, jhb
2008-11-05 03:01:23 +00:00
John Baldwin
0caf1ab15a Don't bother calling setrunnable() and clearing the sleeping flag in
sleepq_resume_thread() if the thread isn't asleep.
2008-11-04 19:13:53 +00:00
David Xu
6406fd0be6 partly revert revision 184199, because TDF_NEEDSIGCHK is persitent
when thread is in kernel mode, it can cause dead loop, now unlock
process lock after acquired sleep queue lock and thread lock to
avoid the problem. This means TDF_NEEDSIGCHK and TDF_NEEDSUSPCHK must
be set with process lock and thread lock being hold at same time.
2008-10-24 01:03:31 +00:00
David Xu
3f9be10eb0 Actually, for signal and thread suspension, extra process spin lock is
unnecessary, the normal process lock and thread lock are enough. The
spin lock is still needed for process and thread exiting to mimic
single sched_lock.
2008-10-23 07:55:38 +00:00
Sam Leffler
39297ba455 Make ddb command registration dynamic so modules can extend
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
  commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
  commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
  for registering top-level commands, show operands, and show all operands,
  respectively

While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
  for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"

Submitted by:	Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by:	jhb
MFC after:	1 month
2008-09-15 22:45:14 +00:00
John Baldwin
aa4c44b58b Close a race in sleepq_broadcast() where the sleepq could be reused after
it had been assigned to the last sleeping thread.  That thread might have
started running on another CPU and have reused that sleep queue.  Fix it
by just walking the thread queue using TAILQ_FOREACH_SAFE() rather than
a while loop.

PR:		amd64/124200
Discovered by:	tegge
Tested by:	benjsc
MFC after:	1 week
2008-09-08 19:44:57 +00:00
John Baldwin
da7bbd2c08 If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup().  When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup.  However, this required
grabbing proc0's thread lock to perform the wakeup.  If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked.  The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up.  The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with:	jeff
Glanced at by:	sam
Tested by:	Jurgen Weber  jurgen - ish com au
MFC after:	2 weeks
2008-08-05 20:02:31 +00:00
John Baldwin
f7f1cc1518 Really fix this. 2008-07-28 18:33:43 +00:00
Pawel Jakub Dawidek
7224dd4dad Properly check if td_name is empty and if it is, print process name,
instead of empty thread name.

Reviewed by:	jhb
2008-07-28 18:10:26 +00:00
Jeff Roberson
8df78c41d6 - Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
 - In reset walk the children of kern_sched_stats and reset the counters
   via the oid_arg1 pointer.  This allows us to add arbitrary counters to
   the tree and still reset them properly.
 - Define a set of switch types to be passed with flags to mi_switch().
   These types are named SWT_*.  These types correspond to SCHED_STATS
   counters and are automatically handled in this way.
 - Make the new SWT_ types more specific than the older switch stats.
   There are now stats for idle switches, remote idle wakeups, remote
   preemption ithreads idling, etc.
 - Add switch statistics for ULE's pickcpu algorithm.  These stats include
   how much migration there is, how often affinity was successful, how
   often threads were migrated to the local cpu on wakeup, etc.

Sponsored by:	Nokia
2008-04-17 04:20:10 +00:00
Jeff Roberson
e8245292a7 - Convert two timeout users to the new callout_reset_curcpu() api.
Sponsored by:	Nokia
2008-04-02 11:21:42 +00:00
Jeff Roberson
b7edba7704 - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
to enter thread_suspend_check().
 - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
   thread_suspend_check() to ast() rather than userret().
 - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
   that we don't miss a suspend request.  If this is set use the
   expensive signal path.
 - Set NEEDSUSPCHK when creating a new thread in thr in case the
   creating thread is due to be suspended as well but has not yet.

Reviewed by:	davidxu (Authored original patch)
2008-03-21 08:23:25 +00:00
Jeff Roberson
241fbd3d13 - At the top of sleepq_catch_signals() lock the thread and check TDF_NEEDSIGCHK
before doing the very expensive cursig() and related locking.  NEEDSIGCHK
   is updated whenever our signal mask change or when a signal is delivered and
   should be sufficient to avoid the more expensive tests.  This eliminates
   another source of PROC_LOCK contention in multithreaded programs.
2008-03-19 07:35:14 +00:00
Jeff Roberson
afc5854dbc - Add a facility similar to LOCK_PROFILING under SLEEPQUEUE_PROFILING. Keep
a simple (wmesg, count) tuple in a hash to keep track of how many times
   we sleep at each wait message.  We hash on message and not channel.  No
   line number information is given as typically wait messages are not used in
   more than one place.  Identical strings defined at different addresses will
   show up with seperate counters.
 - Use debug.sleepq.enable to enable, .reset to reset, and .stats dumps stats.
 - Do an unsynchronized check in sleepq_switch() prior to switching before
   calling sleepq_profile() which uses a global lock to synchronize the hash.
   Only sleeps which actually cause a context switch are counted.
2008-03-19 07:22:07 +00:00
Jeff Roberson
f4d77e9e54 PR 117603
- Close a sleepqueue signal race by interlocking with the per-process
   spinlock.  This was mistakenly omitted from the thread_lock patch and
   has been a race since.

MFC After:	1 week
PR:		bin/117603
Reported by:	Danny Braniss <danny@cs.huji.ac.il>
2008-03-13 00:46:12 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Jeff Roberson
c5aa6b581d - Pass the priority argument from *sleep() into sleepq and down into
sched_sleep().  This removes extra thread_lock() acquisition and
   allows the scheduler to decide what to do with the static boost.
 - Change the priority arguments to cv_* to match sleepq/msleep/etc.
   where 0 means no priority change.  Catch -1 in cv_broadcastpri() and
   convert it to 0 for now.
 - Set a flag when sleeping in a way that is compatible with swapping
   since direct priority comparisons are meaningless now.
 - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
   controls the boost behavior.  Turning it off gives better performance
   in some workloads but needs more investigation.
 - While we're modifying sleepq, change signal and broadcast to both
   return with the lock held as the lock was held on enter.

Reviewed by:	jhb, peter
2008-03-12 06:31:06 +00:00
John Baldwin
bf49347744 Mark sleepqueue chain spin mutexes are recursable since the sleepq code
now recurses on them in sleepq_broadcast() and sleepq_signal() when
resuming threads that are fully asleep.

MFC after:	1 week
2008-02-13 23:36:56 +00:00
Jeff Roberson
626ac252ea - Add THREAD_LOCKPTR_ASSERT() to assert that the thread's lock points at
the provided lock or &blocked_lock.  The thread may be temporarily
   assigned to the blocked_lock by the scheduler so a direct comparison
   can not always be made.
 - Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the scheduling
   interfaces.  The schedulers themselves still use more explicit asserts.

Sponsored by:	Nokia
2008-02-07 06:55:38 +00:00
John Baldwin
02d23fdd74 Fix a bug where a thread that hit the race where the sleep timeout fires
while the thread does not hold the thread lock would stop blocking for
subsequent interruptible sleeps and would always immediately fail the
sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout).

Some background:
- KSE has a facility for allowing one thread to interrupt another thread.
  During this process, the target thread aborts any interruptible sleeps
  much as if the target thread had a pending signal.  Once the target
  thread acknowledges the interrupt, normal sleep handling resumes.  KSE
  manages this via the TDF_INTERRUPTED flag.  Specifically, it sets the
  flag when it sends an interrupt to another thread and clears it when
  the interrupt is acknowledged.  (Note that this is purely a software
  interrupt sort of thing and has no relation to hardware interrupts
  or kernel interrupt threads.)
- The old code for handling the sleep timeout race handled the race
  by setting the TDF_INTERRUPT flag and faking a KSE-style thread
  interrupt to the thread in the process of going to sleep.  It probably
  should have just checked the TDF_TIMEOUT flag in sleepq_catch_signals()
  instead.
- The bug was that the sleepq code would set TDF_INTERRUPT but it was
  never cleared.  The sleepq code couldn't safely clear it in case there
  actually was a real KSE thread interrupt pending for the target thread
  (in fact, the sleepq timeout actually stomped on said pending interrupt).
  Thus, any future interruptible sleeps (*sleep(.. PCATCH ..) or
  cv_*wait_sig()) would see the TDF_INTERRUPT flag set and immediately
  fail with EWOULDBLOCK.  The flag could be cleared if the thread belonged
  to a KSE process and another thread posted an interrupt to the original
  thread.  However, in the more common case of a non-KSE process, the
  thread would pretty much stop sleeping.
- Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code and
  not messing with TDF_INTERRUPT and td_intrval.  With yesterday's fix to
  fix sleepq_switch() to check TDF_TIMEOUT, this is now sufficient.

MFC after:	3 days
2008-01-25 19:44:46 +00:00
John Baldwin
515594a06f Fix a race in the sleepqueue timeout code that resulted in sleeps not
being properly cancelled by a timeout.  In general there is a race
between a the sleepq timeout handler firing while the thread is still
in the process of going to sleep.  In 6.x with sched_lock, the race was
largely protected by sched_lock.  The only place it was "exposed" and had
to be handled was while checking for any pending signals in
sleepq_catch_signals().

With the thread lock changes, the thread lock is dropped in between
sleepq_add() and sleepq_*wait*() opening up a new window for this race.
Thus, if the timeout fired while the sleeping thread was in between
sleepq_add() and sleepq_*wait*(), the thread would be marked as timed
out, but the thread would not be dequeued and sleepq_switch() would
still block the thread until it was awakened via some other means.  In
the case of pause(9) where there is no other wakeup, the thread would
never be awakened.

Fix this by teaching sleepq_switch() to check if the thread has had its
sleep canceled before blocking by checking the TDF_TIMEOUT flag and
aborting the sleep and dequeueing the thread if it is set.

MFC after:	3 days
Reported by:	dwhite, peter
2008-01-25 02:09:38 +00:00
Julian Elischer
e01eafef2a A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
Julian Elischer
431f890614 generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
Attilio Rao
c7fb7ce53a subr_sleepqueue.c presents a thread lock missing which leads to dangerous
races for some struct thread members.
More specifically, this bug seems responsible for some memory dumping
problems people were experiencing.

Fix this adding correct thread locking.

Tested by: rwatson
Submitted by: tegge
Approved by: jeff
Approved by: re
2007-09-13 09:12:36 +00:00
Jeff Roberson
3036ab79e3 - Include opt_sched.h for SCHED_STATS. 2007-06-12 23:27:31 +00:00
Jeff Roberson
d72e80f09a Commit 2/14 of sched_lock decomposition.
- Adapt sleepqueues to the new thread_lock() mechanism.
 - Delay assigning the sleep queue spinlock as the thread lock until after
   we've checked for signals.  It is illegal for a thread to return in
   mi_switch() with any lock assigned to td_lock other than the scheduler
   locks.
 - Change sleepq_catch_signals() to do the switch if necessary to simplify
   the callers.
 - Simplify timeout handling now that locking a sleeping thread has the
   side-effect of locking the sleepqueue.  Some previous races are no
   longer possible.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:56 +00:00
Jeff Roberson
2b7e2ee7a5 - Convert turnstiles and sleepqueus to use UMA. This provides a modest
speedup and will be more useful after each gains a spinlock in the
   impending thread_lock() commit.
 - Move initialization and asserts into init/fini routines.  fini routines
   are only needed in the INVARIANTS case for now.

Submitted by:	Attilio Rao <attilio@FreeBSD.org>
Tested by:	kris, jeff
2007-05-18 06:32:24 +00:00
Kip Macy
10ebecb796 Cleaner fix for handling declaration of loop variable under INVARIANTS
- in trying to avoid nested brackets and #ifdef INVARIANTS around i at the
  top, I broke booting for INVARIANTS all together :-(
- the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled
- tested both with and without INVARIANTS :-/
2006-12-17 00:14:20 +00:00
Andrey A. Chernov
6d87718991 Don't intermix assignments and variable declarations in prev. commit 2006-12-16 21:17:27 +00:00
Andrey A. Chernov
fc6d254f9e Fix NULL pointer reference for INVARIANTS case
Submitted by:   Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
2006-12-16 20:33:26 +00:00
Kip Macy
bd9275b4c4 correct name of number of sleep queues 2006-12-16 07:50:39 +00:00
Kip Macy
6cbb70e2cc Add second sleep queue so that sx and lockmgr can have separate sleep
queues for shared and exclusive acquisitions

Submitted by: Attilio Rao
Approved by: jhb
2006-12-16 06:54:09 +00:00
Pawel Jakub Dawidek
7ee07175af Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *',
which allows to use it with different kinds of locks. For example it allows
to implement Solaris conditions variables which will be used in ZFS port on
top of sx(9) locks.

Reviewed by:	jhb
2006-11-16 01:02:00 +00:00
John Baldwin
f9ab2f134f Print td_name instead of p_comm if td_name is non-empty for
'show turnstile' and 'show sleepq'.
2006-04-21 20:40:43 +00:00
John Baldwin
32553b153e Add a 'show sleepqueue' alias for 'show sleepq' in DDB. 2006-04-17 20:16:32 +00:00
David Xu
cfd6f8cd6c Clear TDF_SINTR in sleepq_resume_thread, also sleepq_catch_signal does
not need to clear it now, this should fix panic when msleep is recursivly
called. Patch is slightly adjusted after review.

Reviewed by: jhb
Tested by: Csaba Henk, csaba-ml at creo.hu
MFC after: 3 days
2006-04-13 23:29:25 +00:00
David Xu
dc94f5e383 Move comments to more accurate place. 2006-02-23 03:42:17 +00:00
David Xu
c008d51784 Fix a sleep queue race for KSE thread.
Reviewed by: jhb
2006-02-23 00:13:58 +00:00
David Xu
94f0972bec Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by:	jhb
MFC after:	1 week
2006-02-15 23:52:01 +00:00
Warner Losh
6229621e2c lock unused when INVARIANTS not defined, so don't declare it then 2006-01-28 00:49:31 +00:00
John Baldwin
f126e754e0 Add a new ddb command 'show sleepq'. It takes a wait channel as an
argument and looks for a sleep queue associated with that wait channel.
If it finds one it will display information such as the list of threads
sleeping on that queue.  If it can't find a sleep queue for that wait
channel, then it will see if that address matches any of the active
sleep queues.  If so, it will display information about the sleepq at the
specified address.
2006-01-27 22:24:07 +00:00
Warner Losh
2002eaadb7 Clarify panic message, I parsed the old one 'trying to sleep while sleeping' 2005-11-09 07:28:52 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
John Baldwin
51460da87f - Add a new simple facility for marking the current thread as being in a
state where sleeping on a sleep queue is not allowed.  The facility
  doesn't support recursion but uses a simple private per-thread flag
  (TDP_NOSLEEPING).  The sleepq_add() function will panic if the flag is
  set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
  (ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.

MFC after:	1 week
Reviewed by:	phk
2005-09-15 19:05:37 +00:00
David Xu
0cb7166f78 Remove thread_upcall_check, it was used to avoid race bug in earlier
day's sleep queue code, today the bug no longer exists.
please see 04/25/2004 freebsd-threads@ mailing list archive.
2005-05-27 15:57:27 +00:00
John Baldwin
95b66e9e53 Close a race between sleepq_broadcast() and sleepq_catch_signals().
Specifically, sleepq_broadcast() uses td_slpq for its private pending
queue of threads that it is going to wake up after it takes them off the
sleep queue.  The problem is that if one of the threads is actually not
asleep yet, then we can end up with td_slpq being corrupted and/or the
thread being made runnable at the wrong time resulting in the td_sleepqueue
== NULL assertion failures occasionally reported under heavy load.

The fix is to stop being so fancy and ditch the whole pending queue bit.
Instead, sleepq_remove_thread() and sleepq_resume_thread() were merged
into one function that requires the caller to hold sched_lock.  This
fixes several places that unlocked sched_lock only to call a function
that then locked sched_lock, so even though sched_lock is now held
slightly longer, removing the extra lock acquires (1 pair instead of 3
in some cases) probably makes it an overall win if you don't include the
fact that it closes a race.  This is definitely a 5.4 candidate.

PR:		kern/79693
Submitted by:	Steven Sears stevenjsears at yahoo dot com
MFC after:	4 days
2005-04-14 06:30:32 +00:00
Poul-Henning Kamp
c711aea6ca Make a bunch of malloc types static.
Found by:	src/tools/tools/kernxref
2005-02-10 12:02:37 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
John Baldwin
bd1d11f5dc - Store threads on sleep queues in FIFO order rather than sorted by
priority.  The sleep queues don't get updated when the priority of
  threads changes, so sleepq_signal() might not always wakeup the
  highest priority thread.  Updating the queues when thread priorities
  change cannot be easily done due to lock orders, so instead we do an
  O(n) walk of the queue for a sleepq_signal() operation instead of O(1).
  On the other hand, adding a thread to a sleep queue now goes from O(n)
  to O(1) so it ends up as an even tradeoff.  The correctness here with
  regards to priorities is actually fairly important.  msleep() gives
  interactive threads their priority "boost" after they are placed on the
  queue, but before this fix that "boost" wasn't used to determine the
  highest priority thread that sleepq_signal() awoke.
- Fix up some comments.

Inspired by:	ups, bde
2004-11-05 20:19:58 +00:00
John Baldwin
2ff0e645d1 Refine the turnstile and sleep queue interfaces just a bit:
- Add a new _lock() call to each API that locks the associated chain lock
  for a lock_object pointer or wait channel.  The _lookup() functions now
  require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
  the associated queue structure internally via _lookup() rather than
  accepting a pointer from the caller.  For turnstiles, this means that
  the actual lookup of the turnstile in the hash table is only done when
  the thread actually blocks rather than being done on each loop iteration
  in _mtx_lock_sleep().  For sleep queues, this means that sleepq_lookup()
  is no longer used outside of the sleep queue code except to implement an
  assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
  lock is already required.  For condition variables, this lets the
  cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
  while testing the waiters count.  This means that the waiters count
  internal to condition variables is no longer protected by the interlock
  mutex and cv_broadcast() and cv_signal() now no longer require that the
  interlock be held when they are called.  This lets consumers of condition
  variables drop the lock before waking other threads which can result in
  fewer context switches.

MFC after:	1 month
2004-10-12 18:36:20 +00:00
Stephan Uphoff
c6a08cf2d7 Directly modifying the priority of a thread that may be on the runqueue
can break the sorting order of the ksegp run queue.

Tested   by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:31:23 +00:00
John Baldwin
007ddf7e7a Now that the return value semantics of cv's for multithreaded processes
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
  thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
  Specifically, if a thread is awakened by something other than a signal
  while checking for signals before going to sleep, clear TDF_SINTR in
  sleepq_catch_signals().  This removes a sched_lock lock/unlock combo in
  that edge case during an interruptible sleep.  Also, fix
  sleepq_check_signals() to properly handle the condition if TDF_SINTR is
  clear rather than requiring the callers of the sleepq API to notice
  this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
  sleepq_broadcast() by creating an explicit submask for sleepq types.
  Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
  0.  Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
  move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
  than sleepq_catch_signals().  Note that it is the caller's responsibility
  to ensure that sleepq_catch_signals() is called if and only if this flag
  is passed to the preceeding sleepq_add().  Note that this also removes a
  sched_lock lock/unlock pair from sleepq_catch_signals().  It also ensures
  that for an interruptible sleep, TDF_SINTR is always set when
  TD_ON_SLEEPQ() is true.
2004-08-19 11:31:42 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
John Baldwin
ef0ebfc351 Add two new kernel options to allow rudimentary profiling of the internal
hash tables used in the sleep queue and turnstile code.  Each option adds
a sysctl tree under debug containing the maximum depth of any bucket in
the hash table as well as a separate node for each bucket (or chain)
containing the current depth and maximum depth for that bucket.
2004-06-29 02:30:12 +00:00
John Baldwin
a5471e4ef4 Remove the signal_caught argument from sleepq_timedwait() as it was
effectively always zero.
2004-06-28 18:57:06 +00:00
Bruce Evans
a13ec35b05 Fixed some common printf format errors. Don't assume that "struct foo *"
is "void *" (it isn't) or that the default promotion of pid_t is int.
Instead, assume that casting "struct foo *" to "void *" and printing the
result with %p is useful, and that all pid_t's are representable as longs.

Fixed some minor style bugs (mainly spelling errors in comments).
2004-05-14 20:51:42 +00:00
John Baldwin
3335671ddd Split sleepq_wakeup_thread() into two functions. sleepq_remove_thread()
removes a specific thread from a sleep queue.  sleepq_resume_thread()
resumes scheduling of a thread that has been previously removed from a
sleep queue.
- sleepq_catch_signals() just removes a thread from the queue it was just
  added to when a pending signal is found.
- sleepq_signal() and sleepq_broadcast() remove threads from a queue,
  drop the queue lock, and then resume all the previously removed threads.
  This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it
  makes it a little better as we no longer call setrunnble() with a sleep
  queue lock held meaning if setrunnable() tries to wakeup the swapper we
  don't try to lock two sleep queue chains at the same time.
2004-05-13 20:00:43 +00:00
Daniel Eischen
4fc21c0947 Keep track of threads waiting in kse_release() to avoid a race
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues.  Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.

This commit should fix recent mysql hangs.

Reviewed by:	jhb, davidxu
Mysql'd by:	Robin P. Blanchard <robin.blanchard at gactr uga edu>
2004-04-28 20:36:53 +00:00
John Baldwin
27de234992 Remove a bogus assertion and readd it in a more correct location. A thread
might be enqueued on a sleep queue but not be asleep when the timeout fires
if it is blocked on a lock trying to check for pending signals before going
to sleep.  In the case of fixing up the TDF_TIMEOUT race, however, the
thread must be marked asleep.

Reported by:	kan (the bogus one)
2004-03-16 18:56:22 +00:00
John Baldwin
1ed3e44f22 - Remove old sleep queues.
- Remove sleepqueue argument from sleepq_set_timeout() since it is not
  used.
2004-03-12 19:06:18 +00:00
John Baldwin
efac7951fe Always assert that the passed in lock is the same as the saved lock in the
sleep queue now that the one abnormal case has been fixed.
2004-03-02 15:02:08 +00:00
John Baldwin
dd75b0a90d Add an implementation of a generic sleep queue abstraction that is used
to queue threads sleeping on a wait channel similar to how turnstiles are
used to queue threads waiting for a lock.  This subsystem will be used as
the backend for sleep/wakeup and condition variables initially.  Eventually
it will also be used to replace the ithread-specific iwait thread
inhibitor.

Sleep queues are also not locked by sched_lock, so this splits sched_lock
up a bit further increasing concurrency within the scheduler.  Sleep queues
also natively support timeouts on sleeps and interruptible sleeps allowing
for the reduction of a lot of duplicated code between the sleep/wakeup and
condition variable implementations.  For more details on the sleep queue
implementation, check the comments in sys/sleepqueue.h and
kern/subr_sleepqueue.c.
2004-02-27 18:33:09 +00:00