Commit Graph

132 Commits

Author SHA1 Message Date
Jeff Roberson
413ea6f543 - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
Jeff Roberson
56696bd1ab - Remove explicit references to sched_lock. A simpler assert will do.
Approved by:	re
2007-07-19 08:58:40 +00:00
Jeff Roberson
671f2709ae - Garbage collect unused concurrency functions. 2007-06-12 19:50:31 +00:00
Jeff Roberson
7b20fb19fb Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
   similar to solaris's container locking.
 - A per-process spinlock is now used to protect the queue of threads,
   thread count, suspension count, p_sflags, and other process
   related scheduling fields.
 - The new thread lock is actually a pointer to a spinlock for the
   container that the thread is currently owned by.  The container may
   be a turnstile, sleepqueue, or run queue.
 - thread_lock() is now used to protect access to thread related scheduling
   fields.  thread_unlock() unlocks the lock and thread_set_lock()
   implements the transition from one lock to another.
 - A new "blocked_lock" is used in cases where it is not safe to hold the
   actual thread's lock yet we must prevent access to the thread.
 - sched_throw() and sched_fork_exit() are introduced to allow the
   schedulers to fix-up locking at these points.
 - Add some minor infrastructure for optionally exporting scheduler
   statistics that were invaluable in solving performance problems with
   this patch.  Generally these statistics allow you to differentiate
   between different causes of context switches.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:30 +00:00
Jeff Roberson
ed0e8f2fe9 - Change types for necent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
Jeff Roberson
f0393f063a - Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty.  The few asserts and thread state
   setting were moved to the individual schedulers.  sched_add() was
   chosen to displace it for naming consistency reasons.
 - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
   different on all three schedulers where it was only called in one place
   each.
 - Remove the long ifdef'd out remrunqueue code.
 - Remove the now redundant ts_state.  Inspect the thread state directly.
 - Don't set TSF_* flags from kern_switch.c, we were only doing this to
   support a feature in one scheduler.
 - Change sched_choose() to return a thread rather than a td_sched.  Also,
   rely on the schedulers to return the idlethread.  This simplifies the
   logic in choosethread().  Aside from the run queue links kern_switch.c
   mostly does not care about the contents of td_sched.

Discussed with:	julian

 - Move the idle thread loop into the per scheduler area.  ULE wants to
   do something different from the other schedulers.

Suggested by:	jhb

Tested on:	x86/amd64 sched_{4BSD, ULE, CORE}.
2007-01-23 08:46:51 +00:00
Jeff Roberson
cd49bb7047 - Don't pass a pointer into runq_choose_from(). The caller can adjust the
index if it chooses to.
2007-01-04 12:10:58 +00:00
Jeff Roberson
3fed7d239a - Add three new functions to support circular run queues.
- runq_add_pri allows the caller to position the thread at any rqindex
   regardless of priority.
 - runq_choose_from() chooses the lowest priority thread starting from a given
   index.  The index is updated with the rqindex of the chosen thread.  This
   routine is used to pick the lowest priority relative to a given index.
 - runq_remove_idx() updates the index if the run queue that held the removed
   thread is now empty.
2007-01-04 08:39:58 +00:00
Robert Watson
2da78e3862 Prefer a more traditional spelling of inhibited in comments and panic
messages.
2006-12-31 15:56:04 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
Olivier Houchard
4bb0f51d1d sched_rem() already sets ke->ke_state to KES_THREAD, so there's no need
to redo it.
2006-06-01 22:45:56 +00:00
Alexander Kabaev
3f34977614 Trim trailing whitespace. 2005-12-28 17:13:31 +00:00
Nate Lawson
1335c4df32 Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED.
Requested by:	scottl, jhb
2005-12-18 18:10:57 +00:00
Nate Lawson
8615fd8696 Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB,
and KTR_IO as they were never used.  Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead.  Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
2005-12-17 03:57:10 +00:00
David Xu
3c424d1447 In adjustrunqueue(), add code to handle thread migrating case for
ULE scheduler. In original code, local run queue of threaded ksegrp
is corrupted if adjustrunqueue() is called while thread is migrating.
2005-08-03 01:23:45 +00:00
Stephan Uphoff
3ea6bbc59a Restore preemption of idle threads.
Submitted by:	jhb
2005-06-10 03:00:29 +00:00
Stephan Uphoff
a3f2d84279 Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by:	nate@
2005-06-09 19:43:08 +00:00
Stephan Uphoff
f3a0f87396 Fix some race conditions for pinned threads that may cause them to run
on the wrong CPU.

Add IPI support for preempting a thread on another CPU.

MFC after:3 weeks
2005-06-09 18:26:31 +00:00
Stephan Uphoff
d13ec71369 Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
Stephan Uphoff
503c2ea34d Fix a bug that caused preemption to happen for a thread in the same
ksegrp with the same priority as the currently running thread.
This can cause propagate_priority() to panic.

Pointy hat to: ups
2005-05-19 01:08:30 +00:00
Stephan Uphoff
779186434a Sprinkle some volatile magic and rearrange things a bit to avoid race
conditions in critical_exit now that it no longer blocks interrupts.

Reviewed by:	jhb
2005-04-08 03:37:53 +00:00
John Baldwin
c6a37e8413 Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions.  They no longer have any affect on
interrupts.  This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit().  This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock.  For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections.  Note that I've also taken this
opportunity to push a few things into MD code rather than MI.  For example,
critical_fork_exit() no longer exists.  Instead, MD code ensures that new
threads have the correct state when they are created.  Also, we no longer
try to fixup the idlethreads for APs in MI code.  Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by:	grehan, cognet, arch@, others
Tested on:	i386, alpha, sparc64, powerpc, arm, possibly more
2005-04-04 21:53:56 +00:00
Robert Watson
6220dcba84 Add a read-only kern.sched.preemption sysctl so that user space can tell
if "options PREEMPTION" is compiled into the kernel.
2005-03-20 17:05:12 +00:00
Robert Watson
bc60830675 A further step on the journey of meaking panics and debugging more reliable:
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts.  If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure.  The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.

It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.

Glanced at by:	jhb
2005-03-17 15:18:01 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Jeff Roberson
85da7a569b - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
Jeff Roberson
7842f65e7f - Garbage collect several unused members of struct kse and struce ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
David Schultz
6db36923ad Remove local definitions of RANGEOF() and use __rangeof() instead.
Also remove a few bogus casts.
2004-11-20 23:00:59 +00:00
Robert Watson
f42a43fa2d Add basic critical section tracing to KTR using event type KTR_CRITICAL.
This generates a KTR event for each critical section entered and exited.

It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.
2004-11-07 23:11:32 +00:00
Scott Long
b96741f410 If a process needs to be swapped in, wakeup the swapper from within
critical_exit as the process is getting scheduled to run.  This is subotimal
but for now avoid the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.

Submitted by: davidxu
MFC After: 3 days
2004-10-16 06:38:22 +00:00
Stephan Uphoff
7c71b6453a Fix maybe_preempt_in_ksegrp for !SMP.
Tested   by: tegge
Reviewed by: julian
Approved by: sam (mentor)
MFC after: 3 days
2004-10-13 22:07:04 +00:00
Poul-Henning Kamp
13e7430fde Make !SMP kernels compile, and as far as I can tell, work again. 2004-10-12 20:57:37 +00:00
Stephan Uphoff
84f9d4b137 Prevent preemption in slot_fill.
Implement preemption between threads in the same ksegp in out of slot
situations to prevent priority inversion.

Tested   by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:30:20 +00:00
Julian Elischer
042b7b1af0 Don't release the slot twice.. sched_rem() has already done it.
Submitted by:	stephan uphoff (ups at tree dot com)
MFC after:	3 days
2004-10-10 05:19:22 +00:00
Julian Elischer
c20c691bed When preempting a thread, put it back on the HEAD of its run queue.
(Only really implemented in 4bsd)

MFC after:	4 days
2004-10-05 22:03:10 +00:00
Julian Elischer
d39063f20d Use some macros to trach available scheduler slots to allow
easier debugging.

MFC after:	4 days
2004-10-05 21:10:44 +00:00
David Schultz
8daa8c602a The zone from which proc structures are allocated is marked
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called.  Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code.  I have retained
vm_proc_dispose(), since I consider its disuse a bug.
2004-09-19 18:34:17 +00:00
Julian Elischer
14f0e2e9bf clean up thread runq accounting a bit.
MFC after:	3 days
2004-09-16 07:12:59 +00:00
Julian Elischer
9da3e923f4 e specific code to revert a partial add ot teh run queue, not
remrunqueue() which can't handle a partially added thread.

MFC after:	1 week
2004-09-16 05:37:40 +00:00
Julian Elischer
e8807f22f9 Oops accidentally removed #ifdef SCHED_4BSD
as part of another commit
This function is not yet used in ULE
2004-09-15 03:51:51 +00:00
Julian Elischer
1f9f5df61d Commit a fix for some panics we've been seeing with preemption.
MFC after:	2 days
2004-09-13 23:06:39 +00:00
Julian Elischer
b2578c6c06 Add some kasserts 2004-09-13 23:02:52 +00:00
Julian Elischer
3389af30e8 Add some code to allow threads to nominat a sibling to run if theyu are going to sleep.
MFC after:	1 week
2004-09-10 21:04:38 +00:00
Julian Elischer
5498350529 Make debug printf less threatenning and make it only print out once.
MFC after:	2 days
2004-09-07 06:38:22 +00:00
Julian Elischer
6a574b2afc Don't do IPIs on behalf of interrupt threads.
just punt straight on through to teh preemption code.

Make a KASSSERT out of a condition that can no longer occur.
MFC after:	1 week
2004-09-06 07:23:14 +00:00
Julian Elischer
ed062c8d66 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
Julian Elischer
44692526be remove unused code
MFC after:	 2 days
2004-09-02 23:37:41 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00