
163 Commits

Author SHA1 Message Date
Mitchell Horne
1029dab634 mi_switch(): clean up switch types and their usage
Overall, this is a non-functional change, except for kernels built with
SCHED_STATS. However, the switch types are useful for communicating the
intent of the caller.

1. Ensure that every caller provides a type. In most cases, we upgrade
   the basic yield to sched_relinquish() aka SWT_RELINQUISH.
2. The case of sched_bind() is distinct, so add a new switch type SWT_BIND.
3. Remove the two unused types, SWT_PREEMPT and SWT_SLEEPQTIMO.
4. Remove SWT_NONE altogether and assert that callers always provide
   a type flag.
5. Reference the mi_switch(9) man page in the comments, as these flags
   will be documented there.

Reviewed by:	kib, markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38184
2023-02-09 12:01:32 -04:00
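A minimal userspace sketch of the idea in items 1 to 4 above, under stated assumptions. The low byte of the flags word carries a switch type. A stand-in for mi_switch() asserts that a type is present, and each type indexes its own SCHED_STATS-style counter. The SWT_* names follow the commit, but their numeric values, the mask, SW_VOL's value, and every name ending in _sketch are hypothetical and are not copied from the kernel headers.

```c
#include <assert.h>

/* Hypothetical switch types; the numeric values are illustrative only. */
enum swt {
	SWT_OWEPREEMPT = 1,	/* Switching due to a deferred preemption. */
	SWT_TURNSTILE,		/* Blocked on a turnstile. */
	SWT_SLEEPQ,		/* Sleeping on a sleepqueue. */
	SWT_RELINQUISH,		/* Voluntary yield via sched_relinquish(). */
	SWT_BIND,		/* Migrating due to sched_bind(). */
	SWT_COUNT		/* Array size; slot 0 is unused (no SWT_NONE). */
};

#define SW_TYPE_MASK	0xff	/* Assumed: low byte holds the switch type. */
#define SW_VOL		0x0100	/* Assumed: voluntary-switch flag bit. */

/* One counter per switch type, as a SCHED_STATS kernel would keep. */
static unsigned long switch_stats[SWT_COUNT];

/* Hypothetical stand-in for mi_switch(): record why the switch happened. */
static void
mi_switch_sketch(int flags)
{
	int type = flags & SW_TYPE_MASK;

	/* With SWT_NONE removed, every caller must supply a valid type. */
	assert(type > 0 && type < SWT_COUNT);
	switch_stats[type]++;
	/* ... choose the next thread and switch to it here ... */
}
```

For example, a yielding caller would pass `SW_VOL | SWT_RELINQUISH`, while sched_bind() would pass `SWT_BIND`, and each call bumps only its own counter.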
Alan Somers
6622e299ac Fix the build with SCHED_STATS after d3f96f6610
MFC with:	d3f96f6610
Sponsored by:	Axcient
2022-09-26 20:20:46 -06:00
Alexander Motin
c6c52d8e39 kern: Remove CTLFLAG_NEEDGIANT from some more sysctls.
MFC after:	2 weeks
2021-12-26 23:07:33 -05:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests
2020-01-12 06:07:54 +00:00
Jeff Roberson
686bcb5c14 schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on a
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
Matt Macy
6443773dab make critical_{enter, exit} inline
Avoid pulling in all of the <sys/proc.h> dependencies by
automatically generating a stripped down thread_lite exporting
only the fields of interest. The field declarations are type checked
against the original and the offsets of the generated result are
automatically checked.

kib has expressed disagreement and would have preferred to simply
use genassym style offsets (which loses type check enforcement).
jhb has expressed dislike of it due to header pollution and a
duplicate structure. He would have preferred to just have defined
thread in _thread.h. Nonetheless, he admits that this is the only
viable solution at the moment.

The impetus for this came from mjg's D15331:
"Inline critical_enter/exit for amd64"

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D16078
2018-07-03 01:55:09 +00:00
Mateusz Guzik
6fee84e35e Remove incorrect owepreempt assertion added in r334062
Yet another preemption request hitting between the counter being 0
and the check being reached will result in the flag no longer being
set.

Note the situation was already present prior to r334062 and is harmless.

Reported by:	pho
Reviewed by:	kib
2018-05-23 10:13:17 +00:00
Mateusz Guzik
748b15fc02 Move preemption handling out of critical_exit.
In preparation for making the enter/exit pair inline.

Reviewed by:	kib
2018-05-22 19:24:57 +00:00
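The split described by this entry and the inlining entry above can be sketched in userspace, assuming a heavily reduced thread structure. The inline fast path of critical_exit() only decrements the nesting count and tests a flag, while the rare deferred preemption is handled by a separate out-of-line function. Every name ending in _sketch is hypothetical, and the td_switches bump stands in for the real mi_switch() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct thread. */
struct thread_sketch {
	int	td_critnest;	/* Critical section nesting depth. */
	bool	td_owepreempt;	/* A preemption was requested while nested. */
	int	td_switches;	/* Counts preemptions actually taken. */
};

/* Out-of-line slow path: reached only when a preemption was deferred. */
static void
critical_exit_preempt_sketch(struct thread_sketch *td)
{
	td->td_owepreempt = false;
	td->td_switches++;	/* Stand-in for an involuntary mi_switch(). */
}

static inline void
critical_enter_sketch(struct thread_sketch *td)
{
	td->td_critnest++;
	/* A real kernel also needs a compiler barrier here. */
}

static inline void
critical_exit_sketch(struct thread_sketch *td)
{
	assert(td->td_critnest > 0);
	td->td_critnest--;
	/* Fast path: one decrement and one rarely taken branch. */
	if (td->td_critnest == 0 && td->td_owepreempt)
		critical_exit_preempt_sketch(td);
}

/* Stand-in for an interrupt asking to preempt the current thread. */
static void
request_preempt_sketch(struct thread_sketch *td)
{
	if (td->td_critnest > 0)
		td->td_owepreempt = true;	/* Defer until critical_exit(). */
	else
		td->td_switches++;		/* Preempt immediately. */
}
```

Keeping the slow path in its own function is what makes the enter/exit pair cheap enough to inline: the inlined code never grows by the size of the switch logic.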
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual, error-prone
task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 15:20:12 +00:00
Mateusz Guzik
32aef9ff05 sched: move panic handling code out of choosethread
This avoids jumps in the common case of the kernel not being panicked.
2017-11-17 02:45:38 +00:00
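A sketch of the pattern, assuming GCC or Clang for `__builtin_expect` and the `noinline` attribute. The panic test is marked unlikely and the panic-only policy lives in a separate, non-inlined function, so the common path through the chooser stays short. The `panicstr_sketch` variable, the `is_system` field, and the policy of letting only system threads run after a panic are simplified, hypothetical stand-ins for the kernel's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* GCC/Clang branch hint, standing in for the kernel's __predict_false(). */
#define predict_false(x)	__builtin_expect(!!(x), 0)

/* Hypothetical stand-in for panicstr: non-NULL once a panic has begun. */
static const char *panicstr_sketch;

struct thread_sketch {
	bool	is_system;	/* Hypothetical: kernel (system) thread. */
};

static struct thread_sketch idle_thread = { true };

/* Rare path, kept out of line so it does not bloat the common path. */
static __attribute__((noinline)) struct thread_sketch *
choosethread_panicked_sketch(struct thread_sketch *td)
{
	/* Assumed policy: only system threads keep running after a panic. */
	return (td->is_system ? td : &idle_thread);
}

/* 'picked' stands in for whatever the scheduler's chooser returned. */
static struct thread_sketch *
choosethread_sketch(struct thread_sketch *picked)
{
	assert(picked != NULL);
	if (predict_false(panicstr_sketch != NULL))
		return (choosethread_panicked_sketch(picked));
	return (picked);
}
```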
Konstantin Belousov
3467f88cd6 Add comments explaining unobvious td_critnest adjustments in
critical_exit().

Based on the discussion with:	jhb
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	D9276
MFC after:	1 week
2017-01-22 19:41:42 +00:00
Hans Petter Selasky
a115fb62ed Revert for r277213:
FreeBSD developers need more time to review patches in the surrounding
areas like the TCP stack which are using MPSAFE callouts to restore
distribution of callouts on multiple CPUs.

Bump the __FreeBSD_version instead of reverting it.

Suggested by:		kmacy, adrian, glebius and kib
Differential Revision:	https://reviews.freebsd.org/D1438
2015-01-22 11:12:42 +00:00
Hans Petter Selasky
1a26c3c047 Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API work. The manual page has
  been made function oriented to make it easier to deduce how each of
  the functions making up the callout API works without having
  to first read the whole manual page. All functions are grouped into a
  handful of sections, which should give a quick top-level overview of
  when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
  to reduce the complexity in the callout code and to avoid problems
  about atomically stopping callouts via callout_stop(). If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout as is done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  tear down the "td_slpcallout". Further the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed. Currently
  they are marked as available and will be cleaned up in a follow up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There have been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
Attilio Rao
e68ccbe85e Add a comment on why inlining critical_enter() may not be a good idea
for the general case.

Reviewed by:	bde
MFC after:	1 week
2012-12-09 04:54:22 +00:00
Andriy Gapon
5e27a60372 critical_exit: ignore td_owepreempt if kdb_active is set
Calling mi_switch() in such a context results in recursion via
kdb_switch().

Suggested by:	jhb
Reviewed by:	jhb
MFC after:	5 weeks
2011-12-04 21:27:41 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH().
2010-06-11 18:46:34 +00:00
Jeff Roberson
791d9a6e5d - Use DPCPU for SCHED_STATS. This is somewhat awkward because the
offset of the stat is not known until link time so we must emit a
   function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC
   directly.
 - Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu
   variables.  Sched stats are always incremented while we're holding
   a spinlock so no further protection is required.

Reviewed by:	sam
2009-06-25 01:33:51 +00:00
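A userspace sketch of the per-CPU counter idea, assuming a fixed, hypothetical CPU count and a 64-byte cache line. Each CPU increments only its own padded slot with a plain, non-atomic add, and the reporting side sums all slots. The commit's guarantee that increments happen with a spinlock held is not modeled here; the names are illustrative, and the kernel's DPCPU machinery and sysctl glue are left out.

```c
#include <assert.h>

#define MAXCPU_SKETCH	4	/* Hypothetical CPU count. */

/* Hypothetical per-CPU counter: one slot per CPU, padded to a cache line. */
struct pcpu_stat_sketch {
	struct {
		unsigned long	val;
		char		pad[64 - sizeof(unsigned long)];
	} slot[MAXCPU_SKETCH];
};

/*
 * Plain increment, no atomic: only the owning CPU writes its slot, and
 * in the kernel it does so with a spinlock held, so it cannot be
 * preempted or migrated in the middle of the update.
 */
static inline void
sched_stat_inc_sketch(struct pcpu_stat_sketch *s, int cpu)
{
	assert(cpu >= 0 && cpu < MAXCPU_SKETCH);
	s->slot[cpu].val++;
}

/* Reporting side (e.g. a sysctl handler): sum the slots for a total. */
static unsigned long
sched_stat_read_sketch(const struct pcpu_stat_sketch *s)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < MAXCPU_SKETCH; cpu++)
		sum += s->slot[cpu].val;
	return (sum);
}
```

The padding keeps two CPUs from writing the same cache line, which is where most of the cost of a shared atomic counter comes from.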
Julian Elischer
681e40627d fix typo in runz_fuzz
Noticed by:	Elijah Buck
2008-05-12 06:42:06 +00:00
Jeff Roberson
8df78c41d6 - Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
 - In reset, walk the children of kern_sched_stats and reset the counters
   via the oid_arg1 pointer.  This allows us to add arbitrary counters to
   the tree and still reset them properly.
 - Define a set of switch types to be passed with flags to mi_switch().
   These types are named SWT_*.  These types correspond to SCHED_STATS
   counters and are automatically handled in this way.
 - Make the new SWT_ types more specific than the older switch stats.
   There are now stats for idle switches, remote idle wakeups, remote
   preemption ithreads idling, etc.
 - Add switch statistics for ULE's pickcpu algorithm.  These stats include
   how much migration there is, how often affinity was successful, how
   often threads were migrated to the local cpu on wakeup, etc.

Sponsored by:	Nokia
2008-04-17 04:20:10 +00:00
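The reset strategy from the second item can be sketched with a simplified node type. Each child node carries a pointer to its counter in oid_arg1, so a reset routine can zero every counter without knowing their names. `struct oid_sketch`, the counter variables, and the child table are hypothetical; the real code walks the sysctl children of kern_sched_stats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced sysctl node: a name plus an opaque argument. */
struct oid_sketch {
	const char	*oid_name;
	void		*oid_arg1;	/* Points at the counter this node exports. */
};

static long sw_idle, sw_preempt, pickcpu_migration;

/* Hypothetical children of the stats node. */
static struct oid_sketch stats_children[] = {
	{ "idle",		&sw_idle },
	{ "preempt",		&sw_preempt },
	{ "pickcpu_migration",	&pickcpu_migration },
};

/*
 * Zero every counter by writing through oid_arg1, so a newly added
 * counter is reset without any change to this function.
 */
static void
sched_stats_reset_sketch(struct oid_sketch *children, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		long *counter = children[i].oid_arg1;

		assert(counter != NULL);
		*counter = 0;
	}
}
```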
Jeff Roberson
9727e63745 - Restore runq to manipulating threads directly by putting runq links and
rqindex back in struct thread.
 - Compile kern_switch.c independently again and stop #include'ing it from
   schedulers.
 - Remove the ts_thread backpointers and convert most code to go from
   struct thread to struct td_sched.
 - Cleanup the ts_flags #define garbage that was causing us to sometimes
   do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
 - Export the kern.sched sysctl node in sysctl.h
2008-03-20 05:51:16 +00:00
Jeff Roberson
52e95411f8 - Remove the unused and redundant sched_newproc() function.
- Remove the unused and redundant sched_newthread() which peeks into scheduler
   private structures.
2008-03-20 03:09:15 +00:00
Jeff Roberson
a90f3f2547 - Move maybe_preempt() from kern_switch.c to sched_4bsd.c. This function
is only used by 4BSD.
 - Create a new runq_choose_fuzz() function rather than polluting runq_choose()
   with 4BSD specific code.
 - Move the fuzz sysctl into sched_4bsd.c
 - Remove some dead code from kern_switch.c
2008-03-20 02:14:02 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries; static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Julian Elischer
431f890614 Generally we are interested in which thread did something, as
opposed to which process. Since threads by default have the name of the
process unless overwritten with more useful information, just print the
thread name instead.
2007-11-14 06:21:24 +00:00
Jeff Roberson
5bce4ae3be - Fix ULE in kernels without PREEMPTION compiled in by always enabling the
critical_exit() owepreempt check.  ULE will always use owepreempt to
   preempt the idle thread.  This change does not affect 4BSD since it will
   never set owepreempt without PREEMPTION enabled.
 - Remove some unused code from choosethread().

Discussed with:	jhb
Approved by:	re
2007-10-08 23:37:28 +00:00
Attilio Rao
c8790f5d09 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false-positive LOR report
  involving dt_lock.  However, the smp rendezvous lock would have had
  sched_lock as an older lock, so it was still not a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just declare it
  static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Jeff Roberson
67e20930bd - Improve runq_findbit_from() which is used by ULE's circular queue. Mask
off the bits we want to ignore on the first pass rather than doing a
   linear scan.  This puts us within a few instructions of the cost of
   runq_findbit() and removes this function from the top of profiling output
   for context switch heavy workloads.

Approved by:	re
2007-08-20 06:36:12 +00:00
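A sketch of the masking trick, assuming 64 queues whose non-empty status fits in a single 64-bit word and a GCC/Clang compiler for `__builtin_ctzll`. Instead of testing bits one at a time from the start index, the first pass clears the bits below it and takes the lowest remaining set bit; a second pass handles wrap-around. The function name and the -1 return for "all queues empty" are hypothetical, and the real run queue may use several status words.

```c
#include <assert.h>
#include <stdint.h>

#define RQ_NQS_SKETCH	64	/* Assumed: 64 queues fit one 64-bit word. */

/*
 * Return the index of the first non-empty queue at or after 'start',
 * wrapping around to 0, or -1 if every queue is empty.
 */
static int
runq_findbit_from_sketch(uint64_t status, int start)
{
	uint64_t masked;

	assert(start >= 0 && start < RQ_NQS_SKETCH);
	/* First pass: clear bits [0, start) so any set bit left is >= start. */
	masked = status & (~UINT64_C(0) << start);
	if (masked != 0)
		return (__builtin_ctzll(masked));
	/* Second pass: wrap around and take the lowest set bit overall. */
	if (status != 0)
		return (__builtin_ctzll(status));
	return (-1);
}
```

With queues 2, 10 and 40 non-empty, a search starting at 3 returns 10, and one starting at 41 wraps to 2. The cost is one AND plus a count-trailing-zeros, close to a plain findbit.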
Jeff Roberson
413ea6f543 - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
Jeff Roberson
56696bd1ab - Remove explicit references to sched_lock. A simpler assert will do.
Approved by:	re
2007-07-19 08:58:40 +00:00
Jeff Roberson
671f2709ae - Garbage collect unused concurrency functions.
2007-06-12 19:50:31 +00:00
Jeff Roberson
7b20fb19fb Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
   similar to Solaris's container locking.
 - A per-process spinlock is now used to protect the queue of threads,
   thread count, suspension count, p_sflags, and other process
   related scheduling fields.
 - The new thread lock is actually a pointer to a spinlock for the
   container that the thread is currently owned by.  The container may
   be a turnstile, sleepqueue, or run queue.
 - thread_lock() is now used to protect access to thread related scheduling
   fields.  thread_unlock() unlocks the lock and thread_set_lock()
   implements the transition from one lock to another.
 - A new "blocked_lock" is used in cases where it is not safe to hold the
   actual thread's lock yet we must prevent access to the thread.
 - sched_throw() and sched_fork_exit() are introduced to allow the
   schedulers to fix-up locking at these points.
 - Add some minor infrastructure for optionally exporting scheduler
   statistics that were invaluable in solving performance problems with
   this patch.  Generally these statistics allow you to differentiate
   between different causes of context switches.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:30 +00:00
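The container-lock idea can be sketched with POSIX mutexes and C11 atomics, under stated assumptions. A thread's lock field points at the lock of whichever container currently owns it, so locking the thread must re-check the pointer after acquiring and retry if the thread moved in the meantime. A handoff swaps the pointer while the caller holds both the old and the new lock. All _sketch names are hypothetical, spinlocks are replaced by mutexes, and blocked_lock is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical reduction: the thread lock is a pointer to a container lock. */
struct thread_sketch {
	_Atomic(pthread_mutex_t *) td_lock;
};

/*
 * Lock whichever container currently owns the thread.  The pointer can
 * change while we wait, so re-check it after acquiring and retry if the
 * thread moved (for example from a run queue to a sleepqueue).
 */
static void
thread_lock_sketch(struct thread_sketch *td)
{
	for (;;) {
		pthread_mutex_t *m = atomic_load(&td->td_lock);

		pthread_mutex_lock(m);
		if (m == atomic_load(&td->td_lock))
			return;		/* Still the owning container's lock. */
		pthread_mutex_unlock(m);
	}
}

static void
thread_unlock_sketch(struct thread_sketch *td)
{
	pthread_mutex_unlock(atomic_load(&td->td_lock));
}

/*
 * Move the thread to a new container.  The caller holds both the old
 * and the new lock; after the pointer switch the old lock is dropped
 * and the thread stays locked by 'newlock'.
 */
static void
thread_lock_set_sketch(struct thread_sketch *td, pthread_mutex_t *newlock)
{
	pthread_mutex_t *old = atomic_load(&td->td_lock);

	atomic_store(&td->td_lock, newlock);
	pthread_mutex_unlock(old);
}
```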
Jeff Roberson
ed0e8f2fe9 - Change types for recent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
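The arithmetic quoted in this entry is easy to check in C, where `%` truncates toward zero for signed operands. The two helper functions are hypothetical illustrations of the bug class, not the runq code itself.

```c
#include <assert.h>	/* For static_assert (C11). */

/* The expressions from the commit message, checked at compile time. */
static_assert(-1 % 64 == -1, "signed %: result keeps the sign");
static_assert((unsigned)-1 % 64 == 63, "unsigned %: UINT_MAX % 64 is 63");

/* With int, a -1 offset yields -1: a negative, out-of-range index. */
static int
rq_index_int(int off)
{
	return (off % 64);
}

/* With u_char, -1 converts to 255 first, and 255 % 64 is 63. */
static int
rq_index_uchar(unsigned char off)
{
	return (off % 64);
}
```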
Jeff Roberson
f0393f063a - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue(); it was 4 lines of code that were ifdef'd to be
   different on all three schedulers, where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
Jeff Roberson
cd49bb7047 - Don't pass a pointer into runq_choose_from(). The caller can adjust the
index if it chooses to.
2007-01-04 12:10:58 +00:00
Jeff Roberson
3fed7d239a - Add three new functions to support circular run queues.
- runq_add_pri allows the caller to position the thread at any rqindex
   regardless of priority.
 - runq_choose_from() chooses the lowest priority thread starting from a given
   index.  The index is updated with the rqindex of the chosen thread.  This
   routine is used to pick the lowest priority relative to a given index.
 - runq_remove_idx() updates the index if the run queue that held the removed
   thread is now empty.
2007-01-04 08:39:58 +00:00
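A sketch of the index bookkeeping described in this entry, assuming a run queue reduced to a per-queue thread count and 64 queues. `runq_add_pri_sketch` places a thread at an arbitrary index, and `runq_remove_idx_sketch` advances the caller's circular index past a queue that has just become empty. The names are hypothetical, and "advance to the next slot, wrapping at the end" is an assumption about how the index is updated.

```c
#include <assert.h>
#include <stddef.h>

#define RQ_NQS_SKETCH	64	/* Assumed number of run queues. */

/* Hypothetical reduced run queue: only a thread count per queue. */
struct runq_sketch {
	int	rq_count[RQ_NQS_SKETCH];
};

/* Queue a thread at 'idx', regardless of its priority. */
static void
runq_add_pri_sketch(struct runq_sketch *rq, int idx)
{
	assert(idx >= 0 && idx < RQ_NQS_SKETCH);
	rq->rq_count[idx]++;
}

/*
 * Remove one thread from queue 'pri'.  If that empties the queue and
 * the caller's circular index points at it, move the index to the next
 * slot so the next search does not start on an empty queue.
 */
static void
runq_remove_idx_sketch(struct runq_sketch *rq, int pri, int *idx)
{
	assert(pri >= 0 && pri < RQ_NQS_SKETCH && rq->rq_count[pri] > 0);
	if (--rq->rq_count[pri] == 0 && idx != NULL && *idx == pri)
		*idx = (pri + 1) % RQ_NQS_SKETCH;
}
```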
Robert Watson
2da78e3862 Prefer a more traditional spelling of inhibited in comments and panic
messages.
2006-12-31 15:56:04 +00:00
Julian Elischer
ad1e7d285a Threading cleanup, part 2 of several.
Make part of John Birrell's KSE patch permanent.
Specifically, remove:
Any reference to the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
David Xu
b41f1452d9 Add scheduler CORE, work I did half a year ago and recently
picked up again. The scheduler is forked from ULE, but the algorithm to
detect an interactive process is almost completely different from ULE's;
it comes from the Linux paper "Understanding the Linux 2.6.8.1 CPU
Scheduler", although I still use the same word "score" as a priority
boost, as in the ULE scheduler.

Briefly, the scheduler has the following characteristics:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike the 4BSD and ULE schedulers, which use the fuzzy RQ_PPQ, this
scheduler uses 256 priority queues. Unlike ULE, which uses both pull and
push, this scheduler uses only pull; the main reason is to let a relatively
idle CPU do the work. But currently the whole scheduler is protected by the
big sched_lock, so the benefit is not visible; it can really be worse than
nothing, because all other CPUs are locked out while we are doing balancing
work, a problem the 4BSD scheduler does not have.
The scheduler does not support hyperthreading very well; in fact, it does
not distinguish between physical and logical CPUs, and this should be
improved in the future. The scheduler has a priority inversion problem on
MP machines, so it is not good for realtime scheduling: it can starve
realtime processes.
As a result, it seems MySQL super-smack runs better on my Pentium-D machine
when using libthr, on both UP and SMP kernels.
2006-06-13 13:12:56 +00:00
Olivier Houchard
4bb0f51d1d sched_rem() already sets ke->ke_state to KES_THREAD, so there's no need
to redo it.
2006-06-01 22:45:56 +00:00
Alexander Kabaev
3f34977614 Trim trailing whitespace.
2005-12-28 17:13:31 +00:00
Nate Lawson
1335c4df32 Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED.
Requested by:	scottl, jhb
2005-12-18 18:10:57 +00:00
Nate Lawson
8615fd8696 Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB,
and KTR_IO as they were never used.  Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead.  Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
2005-12-17 03:57:10 +00:00
David Xu
3c424d1447 In adjustrunqueue(), add code to handle thread migrating case for
ULE scheduler. In the original code, the local run queue of a threaded
ksegrp is corrupted if adjustrunqueue() is called while a thread is migrating.
2005-08-03 01:23:45 +00:00
Stephan Uphoff
3ea6bbc59a Restore preemption of idle threads.
Submitted by:	jhb
2005-06-10 03:00:29 +00:00
Stephan Uphoff
a3f2d84279 Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by:	nate@
2005-06-09 19:43:08 +00:00