Commit Graph

297 Commits

Author SHA1 Message Date
Jeff Roberson
fe54587ffa - Move some common code out of sched_fork_exit() and back into fork_exit(). 2007-06-12 07:47:09 +00:00
Jeff Roberson
710eacdc5f - Placing the 'volatile' on the right side of the * in the td_lock
declaration removes the need for __DEVOLATILE().

Pointed out by:	tegge
2007-06-06 03:40:47 +00:00
Jeff Roberson
95e3a0bca3 - Better fix for previous error; use DEVOLATILE on the td_lock pointer
it can actually sometimes be something other than sched_lock even on
   schedulers which rely on a global scheduler lock.

Tested by:	kan
2007-06-05 04:12:46 +00:00
Jeff Roberson
c219b097af - Pass &sched_lock as the third argument to cpu_switch() as this will
always be the correct lock and we don't get volatile warnings this
   way.

Pointed out by:	kan
2007-06-05 03:46:54 +00:00
Jeff Roberson
36b369163b - Define TDQ_ID() for the !SMP case.
- Default pick_pri to off.  It is not faster in most cases.
2007-06-05 02:53:51 +00:00
Jeff Roberson
7b20fb19fb Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
   similar to solaris's container locking.
 - A per-process spinlock is now used to protect the queue of threads,
   thread count, suspension count, p_sflags, and other process
   related scheduling fields.
 - The new thread lock is actually a pointer to a spinlock for the
   container that the thread is currently owned by.  The container may
   be a turnstile, sleepqueue, or run queue.
 - thread_lock() is now used to protect access to thread related scheduling
   fields.  thread_unlock() unlocks the lock and thread_set_lock()
   implements the transition from one lock to another.
 - A new "blocked_lock" is used in cases where it is not safe to hold the
   actual thread's lock yet we must prevent access to the thread.
 - sched_throw() and sched_fork_exit() are introduced to allow the
   schedulers to fix-up locking at these points.
 - Add some minor infrastructure for optionally exporting scheduler
   statistics that were invaluable in solving performance problems with
   this patch.  Generally these statistics allow you to differentiate
   between different causes of context switches.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:30 +00:00
Kip Macy
fb1e3ccd7e Schedule the ithread on the same cpu as the interrupt
Tested by: kmacy
Submitted by: jeffr
2007-04-20 05:45:46 +00:00
Jeff Roberson
52bc574cc7 - Handle the case where slptime == runtime.
Submitted by:	Atoine Brodin
2007-03-17 23:32:48 +00:00
Jeff Roberson
4499aff6ec - Cast the intermediate value in priority computtion back down to
unsigned char.  Weirdly, casting the 1 constant to u_char still produces
   a signed integer result that is then used in the % computation.  This
   avoids that mess all together and causes a 0 pri to turn into 255 % 64
   as we expect.

Reported by:	kkenn (about 4 times, thanks)
2007-03-17 18:13:32 +00:00
Julian Elischer
486a941418 Instead of doing comparisons using the pcpu area to see if
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permenent and never chenges for the life of the
system so it doesn't need locking.
2007-03-08 06:44:34 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Jeff Roberson
ed0e8f2fe9 - Change types for necent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
Jeff Roberson
fc3a97dcb7 - Implement much more intelligent ipi sending. This algorithm tries to
minimize IPIs and rescheduling when scheduling like tasks while keeping
   latency low for important threads.
   1) An idle thread is running.
   2) The current thread is worse than realtime and the new thread is
      better than realtime.  Realtime to realtime doesn't preempt.
   3) The new thread's priority is less than the threshold.
2007-01-25 23:51:59 +00:00
Jeff Roberson
1461899028 - Get rid of the unused DIDRUN flag. This was really only present to
support sched_4bsd.
 - Rename the KTR level for non schedgraph parsed events.  They take event
   space from things we'd like to graph.
 - Reset our slice value after we sleep.  The slice is simply there to
   prevent starvation among equal priorities.  A thread which had almost
   exhausted it's slice and then slept doesn't need to be rescheduled a
   tick after it wakes up.
 - Set the maximum slice value to a more conservative 100ms now that it is
   more accurately enforced.
2007-01-25 19:14:11 +00:00
Jeff Roberson
9a93305a2e - With a sleep time over 2097 seconds hzticks and slptime could end up
negative.  Use unsigned integers for sleep and run time so this doesn't
   disturb sched_interact_score().  This should fix the invalid interactive
   priority panics reported by several users.
2007-01-24 18:18:43 +00:00
Jeff Roberson
7a5e5e2a59 - Catch up to setrunqueue/choosethread/etc. api changes.
- Define our own maybe_preempt() as sched_preempt().  We want to be able
   to preempt idlethread in all cases.
 - Define our idlethread to require preemption to exit.
 - Get the cpu estimation tick from sched_tick() so we don't have to worry
   about errors from a sampling interval that differs from the time
   domain.  This was the source of sched_priority prints/panics and
   inaccurate pctcpu display in top.
2007-01-23 08:50:34 +00:00
Jeff Roberson
5cea64d54f - Disable the long-term load balancer. I believe that steal_busy works
better and gives more predictable results.
2007-01-20 21:24:05 +00:00
Jeff Roberson
c95d2db298 - We do need to IPI the idlethread on some systems. It may be stuck in
a power saving mode otherwise.
 - If the thread is already bound in sched_bind() unbind it before
   re-binding it to a new cpu.  I don't like these semantics but they are
   expected by some code in the tree.  Patch by jkoshy.
2007-01-20 17:03:33 +00:00
Jeff Roberson
6b2f763f7c - In tdq_transfer() always set NEEDRESCHED when necessary regardless of
the ipi settings.  If NEEDRESCHED is set and an ipi is later delivered
   it will clear it rather than cause extra context switches.  However, if
   we miss setting it we can have terrible latency.
 - In sched_bind() correctly implement bind.  Also be slightly more
   tolerant of code which calls bind multiple times.  However, we don't
   change binding if another call is made with a different cpu.  This
   does not presently work with hwpmc which I believe should be changed.
2007-01-20 09:03:43 +00:00
Jeff Roberson
7b8bfa0de9 Major revamp of ULE's cpu load balancing:
- Switch back to direct modification of remote CPU run queues.  This added
   a lot of complexity with questionable gain.  It's easy enough to
   reimplement if it's shown to help on huge machines.
 - Re-implement the old tdq_transfer() call as tdq_pickidle().  Change
   sched_add() so we have selectable cpu choosers and simplify the logic
   a bit here.
 - Implement tdq_pickpri() as the new default cpu chooser.  This algorithm
   is similar to Solaris in that it tries to always run the threads with
   the best priorities.  It is actually slightly more complex than
   solaris's algorithm because we also tend to favor the local cpu over
   other cpus which has a boost in latency but also potentially enables
   cache sharing between the waking thread and the woken thread.
 - Add a bunch of tunables that can be used to measure effects of different
   load balancing strategies.  Most of these will go away once the
   algorithm is more definite.
 - Add a new mechanism to steal threads from busy cpus when we idle.  This
   is enabled with kern.sched.steal_busy and kern.sched.busy_thresh.  The
   threshold is the required length of a tdq's run queue before another
   cpu will be able to steal runnable threads.  This prevents most queue
   imbalances that contribute the long latencies.
2007-01-19 21:56:08 +00:00
Jeff Roberson
eddb4efacd - Don't let SCHED_TICK_TOTAL() return less than hz. This can cause integer
divide faults in roundup() later if it is able to return 0.  For some
   reason this bug only shows up on my laptop and not my testboxes.
2007-01-06 12:33:43 +00:00
Jeff Roberson
1e516cf534 - Fix the sched_priority() invalid priority bugs. Use roundup() instead
of max() when computing the divisor in SCHED_TICK_PRI().  This prevents
   cases where rounding down would allow the quotient to exceed
   SCHED_PRI_RANGE.
 - Garbage collect some unused flags and fields.
 - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply
   duplicated this functionality.
 - Re-enable the rebalancer by default and fix the sysctl so it can be
   modified.
2007-01-06 08:44:13 +00:00
Jeff Roberson
9330bbbb61 - Don't IPI unless we're going to interrupt something exiting in the kernel.
otherwise we can afford the latency.  This makes a significant performance
   improvement.
2007-01-06 02:34:23 +00:00
Jeff Roberson
155b6ca12b - Fix a comparison in sched_choose() that caused cpus to be constantly
marked idle, thus breaking cpu load balancing.
 - Change sched_interact_update() to fix cases where the stored history
   has expanded significantly rather than handling them in the callers.  This
   fixes a case where sched_priority() could compute a bad value.
 - Add a sysctl to disable the global load balancer for experimentation.
2007-01-05 23:45:38 +00:00
Jeff Roberson
8ab80cf009 - ftick was initialized to -1 for init and any of it's children. Fix this by
setting ftick = ltick = ticks in schedinit().
 - Update the priority when we are pulled off of the run queue and when we
   are inserted onto the run queue so that it more accurately reflects our
   present status.  This is important for efficient priority propagation
   functioning.
 - Move the frequency test into sched_pctcpu_update() so we don't repeat it
   each time we'd like to call it.
 - Put some temporary work-around code in sched_priority() in case the tick
   mechanism produces a bad priority.  Eventually this should revert to an
   assert again.
2007-01-05 08:50:38 +00:00
Jeff Roberson
3f872f85d2 - Only allow the tdq_idx to increase by one each tick rather than up to
the most recently chosen index.  This significantly improves nice
   behavior.  This allows a lower priority thread to run some multiple of
   times before the higher priority thread makes it to the front of
   the queue.  A nice +20 cpu hog now only gets ~5% of the cpu when running
   with a nice 0 cpu hog and about 1.5% with a nice -20 hog.  A nice
   difference of 1 makes a 4% difference in cpu usage between two hogs.
 - Track a seperate insert and removal index.  When the removal index is
   empty it is updated to point at the current insert index.
 - Don't remove and re-add a thread to the runq when it is being adjusted
   down in priority.
 - Pull some conditional code out of sched_tick().  It's looking a bit
   large now.
2007-01-04 12:16:19 +00:00
Jeff Roberson
e7d50326de ULE 2.0:
- Remove the double queue mechanism for timeshare threads.  It was slow
   due to excess cache lines in play, caused suboptimal scheduling behavior
   with niced and other non-interactive processes, complicated priority
   lending, etc.
 - Use a circular queue with a floating starting index for timeshare threads.
   Enforces fairness by moving the insertion point closer to threads with
   worse priorities over time.
 - Give interactive timeshare threads real-time user-space priorities and
   place them on the realtime/ithd queue.
 - Select non-interactive timeshare thread priorities based on their cpu
   utilization over the last 10 seconds combined with the nice value.  This
   gives us more sane priorities and behavior in a loaded system as
   compared to the old method of using the interactivity score.  The
   interactive score quickly hit a ceiling if threads were non-interactive
   and penalized new hog threads.
 - Use one slice size for all threads.  The slice is not currently
   dynamically set to adjust scheduling behavior of different threads.
 - Add some new sysctls for scheduling parameters.

Bug fixes/Clean up:
 - Fix zeroing of td_sched after initialization in sched_fork_thread() caused
   by recent ksegrp removal.
 - Fix KSE interactivity issues related to frequent forking and exiting of
   kse threads.  We simply disable the penalty for thread creation and exit
   for kse threads.
 - Cleanup the cpu estimator by using tickincr here as well.  Keep ticks and
   ltick/ftick in the same frequency.  Previously ticks were stathz and
   others were hz.
 - Lots of new and updated comments.
 - Many many others.

Tested on:	up x86/amd64, 8way amd64.
2007-01-04 08:56:25 +00:00
Jeff Roberson
c02bbb43a0 - More search and replace prettying. 2006-12-29 12:55:32 +00:00
Jeff Roberson
d2ad694caa - Clean up a bit after the most recent KSE restructuring. 2006-12-29 10:37:07 +00:00
Julian Elischer
fc6c30f6c6 Changes to try fix sched_ule.c courtesy of David Xu. 2006-12-06 06:55:59 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Maxim Konovalov
f645b5da88 o Fix a couple of obvious typos. 2006-11-08 09:09:07 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
David Xu
3db720fdce Add user priority loaning code to support priority propagation for
1:1 threading's POSIX priority mutexes, the code is no-op unless
priority-aware umtx code is committed.
2006-08-25 06:12:53 +00:00
David Xu
36ec198bd5 Add scheduler API sched_relinquish(), the API is used to implement
yield() and sched_yield() syscalls. Every scheduler has its own way
to relinquish cpu, the ULE and CORE schedulers have two internal run-
queues, a timesharing thread which calls yield() syscall should be
moved to inactive queue.
2006-06-15 06:37:39 +00:00
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
David Xu
0ae716e5ee Make ke_rqindex unsigned. 2006-06-06 12:26:17 +00:00
David Xu
9f8eb3cb52 Use variable i instead of variable cpus as an index to get correct kseq. 2005-12-27 12:02:03 +00:00
David Xu
a1d4fe69d2 Fix a bug in slice calculation code, current code uses hz but
sched_clock() is called by state clock.

Submitted by: taku at tackymt dot homeip dot net
2005-12-19 08:26:09 +00:00
David Xu
a861574011 Temporarily disable nice threshold detection code, as it can starve
a thread holding critical resource, e.g mutex or other implicit
synchronous flags. Give thread which exceeds nice threshold a minimum
time slice.

PR: kern/86087
2005-09-22 01:19:37 +00:00
David Xu
f8ec133ed0 Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectly
for PRI_ITHD and PRI_REALTIME threads.
2005-08-19 11:51:41 +00:00
David Xu
1278181c6c Try best to keep a preempted thread at front of run queue, this seems
improved performance a bit for some workloads, but still seeing interactive
lagging unless cpu idling race is fixed.
2005-08-08 14:20:10 +00:00
David Xu
3d16f519b6 If a thread was removed from system run queue, kse_assign shouldn't
add it again.
2005-07-31 15:11:21 +00:00
Xin LI
05a6b7ad62 Cast to uintptr_t when the compiler complains. This unbreaks ULE
scheduler breakage accompanied by the recent atomic_ptr() change.
2005-07-25 10:21:49 +00:00
Peter Wemm
4da0d332f4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
Jeff Roberson
6680bbd529 - Fix the case where we're not preempting but there is already a newtd
as this happens via thread_switchout().  I don't particularly like the
   structure of the code here.  We twice call out to thread code when
   a thread is voluntarily switching.  Once to thread_switchout() and once
   to slot_fill(), while sched_4BSD does even more work which is redundant
   to select another thread to use our remaining slice.  This should be
   simplified in the future, but for now I'm only going to fix the bug not
   the bad design.
2005-06-07 02:59:16 +00:00
Jeff Roberson
9fe02f7e16 - It's 2005 already, I've been working on this for three years. 2005-06-04 09:24:15 +00:00
Jeff Roberson
21381d1b9e - Don't SLOT_USE() in the preempt case, sched_add() has already taken the
slot for us.  Previously, we would take two slots on every preempt, and
   setrunqueue() would fix it up for us in the non threaded case.  The
   threaded case was simply broken.
 - Clean up flags, prototypes, comments.
2005-06-04 09:23:28 +00:00
Joseph Koshy
ebccf1e3a6 Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by:	alc, jhb (kernel changes)
2005-04-19 04:01:25 +00:00
Stephan Uphoff
779186434a Sprinkle some volatile magic and rearrange things a bit to avoid race
conditions in critical_exit now that it no longer blocks interrupts.

Reviewed by:	jhb
2005-04-08 03:37:53 +00:00
Jeff Roberson
7a9507b60e - A test in sched_switch() is no longer necessary and it is incorrect
when td0 is preempted before it voluntarily switches.

Discovered by:	Arjan Van Leeuwen <avleeuwen@gmail.com>
2005-02-23 00:50:26 +00:00
Jeff Roberson
42a29039de - Add ke_runq == NULL to the conditions which will cause us to abort
adjusting timeshare loads in sched_class().  This is only important if
   the thread has never run, otherwise the state checks should work as
   expected.
2005-02-04 17:22:46 +00:00
John Baldwin
50aaa791ba Fix a typo and two whitespace nits. 2004-12-30 22:17:00 +00:00
John Baldwin
f5c157d986 Rework the interface between priority propagation (lending) and the
schedulers a bit to ensure more correct handling of priorities and fewer
priority inversions:
- Add two functions to the sched(9) API to handle priority lending:
  sched_lend_prio() and sched_unlend_prio().  The turnstile code uses these
  functions to ask the scheduler to lend a thread a set priority and to
  tell the scheduler when it thinks it is ok for a thread to stop borrowing
  priority.  The unlend case is slightly complex in that the turnstile code
  tells the scheduler what the minimum priority of the thread needs to be
  to satisfy the requirements of any other threads blocked on locks owned
  by the thread in question.  The scheduler then decides where the thread
  can go back to normal mode (if it's normal priority is high enough to
  satisfy the pending lock requests) or it it should continue to use the
  priority specified to the sched_unlend_prio() call.  This involves adding
  a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag
  for priority elevation.
- Schedulers now refuse to lower the priority of a thread that is currently
  borrowing another therad's priority.
- If a scheduler changes the priority of a thread that is currently sitting
  on a turnstile, it will call a new function turnstile_adjust() to inform
  the turnstile code of the change.  This function resorts the thread on
  the priority list of the turnstile if needed, and if the thread ends up
  at the head of the list (due to having the highest priority) and its
  priority was raised, then it will propagate that new priority to the
  owner of the lock it is blocked on.

Some additional fixes specific to the 4BSD scheduler include:
- Common code for updating the priority of a thread when the user priority
  of its associated kse group has been consolidated in a new static
  function resetpriority_thread().  One change to this function is that
  it will now only adjust the priority of a thread if it already has a
  time sharing priority, thus preserving any boosts from a tsleep() until
  the thread returns to userland.  Also, resetpriority() no longer calls
  maybe_resched() on each thread in the group. Instead, the code calling
  resetpriority() is responsible for calling resetpriority_thread() on
  any threads that need to be updated.
- schedcpu() now uses resetpriority_thread() instead of just calling
  sched_prio() directly after it updates a kse group's user priority.
- sched_clock() now uses resetpriority_thread() rather than writing
  directly to td_priority.
- sched_nice() now updates all the priorities of the threads after the
  group priority has been adjusted.

Discussed with:	bde
Reviewed by:	ups, jeffr
Tested on:	4bsd, ule
Tested on:	i386, alpha, sparc64
2004-12-30 20:52:44 +00:00
Jeff Roberson
2ebf8eb132 - Unintentionally checked in a debugging panic. Remove that. 2004-12-26 23:21:48 +00:00
Jeff Roberson
598b368d6c - Fix a long standing problem where an ithread would not honor sched_pin().
- Remove the sched_add wrapper that used sched_add_internal() as a backend.
   Its only purpose was to interpret one flag and turn it into an int.  Do
   the right thing and interpret the flag in sched_add() instead.
 - Pass the flag argument to sched_add() to kseq_runq_add() so that we can
   get the SRQ_PREEMPT optimization too.
 - Add a KEF_INTERNAL flag.  If KEF_INTERNAL is set we don't adjust the SLOT
   counts, otherwise the slot counts are adjusted as soon as we enter
   sched_add() or sched_rem() rather than when the thread is actually placed
   on the run queue.  This greatly simplifies the handling of slots.
 - Remove the explicit prevention of migration for ithreads on non-x86
   platforms.  This was never shown to have any real benefit.
 - Remove the unused class argument to KSE_CAN_MIGRATE().
 - Add ktr points for thread migration events.
 - Fix a long standing bug on platforms which don't initialize the cpu
   topology.  The ksg_maxid variable was never correctly set on these
   platforms which caused the long term load balancer to never inspect
   more than the first group or processor.
 - Fix another bug which prevented the long term load balancer from working
   properly.  If stathz != hz we can't expect sched_clock() to be called
   on the exact tick count that we're anticipating.
 - Rearrange sched_switch() a bit to reduce indentation levels.
2004-12-26 22:56:08 +00:00
Jeff Roberson
81d47d3f4b - Remove earlier KTR_ULE tracepoints.
- Define new KTR_SCHED points so that we can graph the operation of the
   scheduler.
2004-12-26 00:15:33 +00:00
Jeff Roberson
7842f65e7f - Garbage collect several unused members of struct kse and struce ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
Jeff Roberson
8ffb8f5558 - In kseq_choose(), don't recalculate slice values for processes with a
nice of 0.  Doing so can cause an infinite loop because they should be
   running, but a nice -20 process could prevent them from doing so.
 - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
   elevated due to priority propagation.  If a thread has had its priority
   elevated, we assume that it must go on the current queue and it must
   get a slice.
 - In sched_userret() if our priority was elevated and we shouldn't have
   a timeslice, yield here until we should.

Found/Tested by:	glebius
2004-12-14 10:34:27 +00:00
Jeff Roberson
2d59a44dc0 - Take up a 'slot' while we're on the assigned queue, waiting to be
posted to another processor.  Otherwise, kern_switch() gets confused
   and tries to sched_add(NULL).
2004-12-13 13:09:33 +00:00
Jeff Roberson
3ba5c2faab - Temporarily disable the nice -20 throttling code. It has some interaction
with APM that I do not understand yet.

Reported & Tested by:	glebius
2004-11-11 19:48:57 +00:00
Jeff Roberson
0516c8dd4a - When choosing a thread on the run queue, check to see if its nice is
outside of the nice threshold due to a recently awoken thread with a
   lower nice value.  This further reduces the amount of time a positively
   niced thread gets while running in conjunction with a workload that has
   many short sleeps (ie buildworld).
2004-10-30 12:19:15 +00:00
Jeff Roberson
6bd0c7fd53 - In sched_prio() check to see if the kse is assigned to a runq as the
check for TD_ON_RUNQ() no longer means the thread is really on a run-
   queue.  I suspect this state should be re-evaluated as it must mean
   something else now.  This fixes ULE+KSE+PREEMPTION on UP x86.
2004-10-30 07:35:53 +00:00
Julian Elischer
f8135176c9 Fix whitespace botch that only showed up in the commit message diff :-/
MFC after:	4 days
2004-10-05 22:14:02 +00:00
Julian Elischer
c20c691bed When preempting a thread, put it back on the HEAD of its run queue.
(Only really implemented in 4bsd)

MFC after:	4 days
2004-10-05 22:03:10 +00:00
Julian Elischer
c5c3fb335f Oops. left out part of the diff.
MFC after:	4 days
2004-10-05 21:26:27 +00:00
Julian Elischer
d39063f20d Use some macros to trach available scheduler slots to allow
easier debugging.

MFC after:	4 days
2004-10-05 21:10:44 +00:00
Julian Elischer
14f0e2e9bf clean up thread runq accounting a bit.
MFC after:	3 days
2004-09-16 07:12:59 +00:00
Scott Long
1e7fad6b6a Revert the previous round of changes to td_pinned. The scheduler isn't
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic.  Use ke_pinned instead as was originally
done with Tor's patch.

Approved by: julian
2004-09-11 10:07:22 +00:00
Julian Elischer
513efa5b39 Try committing from the right tree this time
MFC after:	2 days
2004-09-11 00:11:09 +00:00
Julian Elischer
5c854accc1 Make up my mind if cpu pinning is stored in the thread structure or the
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.

Submitted by:	tegge  (of course!)   (with changes)
MFC after:	3 days
2004-09-10 22:28:33 +00:00
Julian Elischer
3389af30e8 Add some code to allow threads to nominat a sibling to run if theyu are going to sleep.
MFC after:	1 week
2004-09-10 21:04:38 +00:00
Julian Elischer
ed062c8d66 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
Julian Elischer
2630e4c90c Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after:	2 days
2004-09-01 02:11:28 +00:00
Peter Wemm
91c1172a5a Commit Jeff's suggested changes for avoiding a bug that is exposed by
preemption and/or the rev 1.79 kern_switch.c change that was backed out.

The thread was being assigned to a runq without adding in the load, which
would cause the counter to hit -1.
2004-08-28 00:49:22 +00:00
Jeff Roberson
f2b74cbf28 - Introduce a new flag KEF_HOLD that prevents sched_add() from doing a
migration.  Use this in sched_prio() and sched_switch() to stop us from
   migrating threads that are in short term sleeps or are runnable.  These
   extra migrations were added in the patches to support KSE.
 - Only set NEEDRESCHED if the thread we're adding in sched_add() is a
   lower priority and is being placed on the current queue.
 - Fix some minor whitespace problems.
2004-08-12 07:56:33 +00:00
Jeff Roberson
2454aaf51c - Use a new flag, KEF_XFERABLE, to record with certainty that this kse had
contributed to the transferable load count.  This prevents any potential
   problems with sched_pin() being used around calls to setrunqueue().
 - Change the sched_add() load balancing algorithm to try to migrate on
   wakeup.  This attempts to place threads that communicate with each other
   on the same CPU.
 - Don't clear the idle counts in kseq_transfer(), let the cpus do that when
   they call sched_add() from kseq_assign().
 - Correct a few out of date comments.
 - Make sure the ke_cpu field is correct when we preempt.
 - Call kseq_assign() from sched_clock() to catch any assignments that were
   done without IPI.  Presently all assignments are done with an IPI, but I'm
   trying a patch that limits that.
 - Don't migrate a thread if it is still runnable in sched_add().  Previously,
   this could only happen for KSE threads, but due to changes to
   sched_switch() all threads went through this path.
 - Remove some code that was added with preemption but is not necessary.
2004-08-10 07:52:21 +00:00
Alexander Kabaev
00fbcda80d Avoid casts as lvalues. 2004-07-28 06:42:41 +00:00
Scott Long
e038d35422 Clean up whitespace, increase consistency and correctness.
Submitted by: bde
2004-07-23 23:09:00 +00:00
Julian Elischer
55d44f79ea When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
John Baldwin
52eb84641d - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
  locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
  and directly into struct thread as td_profil_addr and td_profil_ticks
  as these variables are really per-thread.  (They are used to defer an
  addupc_intr() that was too "hard" until ast()).
2004-07-16 21:04:55 +00:00
Marcel Moolenaar
2c3490b1a8 Update for the KDB framework:
o  Call kdb_backtrace() instead of backtrace().
2004-07-10 21:38:22 +00:00
John Baldwin
63fcce68f1 - Move contents of sched_add() into a sched_add_internal() function that
takes an argument to specify if it should preempt or not.  Don't preempt
  when sched_add_internal() is called from kseq_idled() or kseq_assign()
  as in those cases we are about to call mi_switch() anyways.  Also, doing
  so during the first context switch on an AP leads to a NULL pointer deref
  because curthread is NULL.
- Reenable preemption for ULE.

Submitted by:	Taku YAMAMOTO taku at tackymt.homeip.net
2004-07-08 21:45:04 +00:00
Robert Watson
df623e3c2f Temporarily disable preemption in SCHED_ULE due to reported panics and
hangs due to recent preemption changes.  This change appears to remove
the panic that I was running into, but at the cost of increasing
ithread scheduling latency, and as such is a temporary band-aid until
jhb has a chance to resolve the ule<->preemption interaction that is
the source of the problem.  If it doesn't fix the problem for others--
sorry!
2004-07-06 05:57:29 +00:00
Poul-Henning Kamp
279f949ee5 Add NULL arg to mi_switch() call to stop kernel compiles from breaking. 2004-07-03 16:57:51 +00:00
Bosko Milekic
abdb4e5d01 Fix SCHED_ULE build on SMP. The previous revision (1.110)
introduced a KSE_CAN_MIGRATE() invocation with one argument
missing (class).  Either this is a genuine forget or it crept
in from JHB's repo where he may have modified it.  If it's
the latter then it may require more attention.  For now fix
the make depend.
2004-07-03 01:19:46 +00:00
John Baldwin
0c0b25ae91 Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
  determine if a thread about to be added to a run queue should be
  preempted to directly.  If it is not safe to preempt or if the new
  thread does not have a high enough priority, then the function returns
  false and sched_add() adds the thread to the run queue.  If the thread
  should be preempted to but the current thread is in a nested critical
  section, then the flag TDF_OWEPREEMPT is set and the thread is added
  to the run queue.  Otherwise, mi_switch() is called immediately and the
  thread is never added to the run queue since it is switch to directly.
  When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
  then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
  setrunqueue() now does all the correct work.  This also removes the
  do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
  supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
  chance to run if the architecture supports native preemption since
  the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
  architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
  preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by:	scottl (with his re@ hat)
2004-07-02 20:21:44 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
Scott Long
dc09579417 Add the sysctl node 'kern.sched.name' that has the name of the scheduler
currently in use.  Move the 4bsd kern.quantum node to kern.sched.quantum
for consistency.
2004-06-21 22:05:46 +00:00
Julian Elischer
fa88511615 Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
Jeff Roberson
dc03363dd8 - Run sched_balance() and sched_balance_groups() from hardclock via
sched_clock() rather than using callouts.  This means we no longer have to
   take the load of the callout thread into consideration while balancing and
   should make the balancing decisions simpler and more accurate.

Tested on:	x86/UP, amd64/SMP
2004-06-02 05:46:48 +00:00
David E. O'Brien
207a6c0dcb There was a thread on "unusually high load averages" when running under
sched_ule, in January 2004.  Looking at this, "pagezero" is (one of) the
culprit(s).  We had no provision for processes with P_NOLOAD set.  With
pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as
another-normal-process, thus the "expected-plus-one" load reported in
the above thread.

Submitted by:	Nikos Ntarmos <ntarmos@ceid.upatras.gr>
2004-04-22 21:37:46 +00:00
Olivier Houchard
d50c87decf Spell "switches" a more conventional way. 2004-04-09 14:31:29 +00:00
Jeff Roberson
37a35e4a60 - Use the proper constant in sched_interact_update(). Previously,
SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed.  This was
   causing the interactivity scaler to lose history at a more dramatic rate
   than intended.
2004-04-04 19:12:56 +00:00
Marcel Moolenaar
b2ae7ed72c Change the type of the various CPU masks to cpumask_t. Note that as
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
2004-03-27 18:21:24 +00:00
David E. O'Brien
b003da7938 Give a more reasonable CPU time to the threads which are using scheduler
activation (i.e., applications are using libpthread).  This is because
SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily.
Which doesn't give fair amount of CPU time to processes which are
using scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.

Further work will no doubt be done by jeffr at a later date.

Submitted by:	Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Reviewed by:	rwatson, freebsd-current@
2004-03-21 18:53:29 +00:00
John Baldwin
44f3b09204 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
Jeff Roberson
0392e39dff - Allow interactive tasks to use the maximum time-slice. This is not as
detrimental as I thought it would be in the case of massive process
   storms from a shell and it makes regular desktop usage noticeably
   better.
2004-02-01 10:38:13 +00:00
Jeff Roberson
33916c360e - Add a new member to struct kseq called ksq_sysload. This is intended to
track the load for the sched_load() function.  In the SMP case this member
   is not defined because it would be redundant with the ksg_load member
   which already tracks the non ithd load.
 - For sched_load() in the UP case simply return ksq_sysload.  In the SMP
   case traverse the list of kseq groups and sum up their ksg_load fields.
2004-02-01 02:48:36 +00:00
Jeff Roberson
c77ac1fdee - sched_strict has been dead for a long time now. Get rid of it. 2004-01-25 08:58:14 +00:00
Jeff Roberson
c494ddc8a1 - Clean up KASSERTS. 2004-01-25 08:57:38 +00:00
Jeff Roberson
29bcc4514f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
Jeff Roberson
249e0bea8f - Make our transfer decisions based on load and not transferable load. A
cpu could have been bogged down with non-transferable load and still not
   migrated a new thread to an idle cpu.  This required some benchmarking and
   tuning to get right as the comment above it suggests.
2003-12-20 22:35:20 +00:00
Jeff Roberson
e7a976f415 - Enable ithread migration on x86. This is done to work around a bug in the
IO APIC on Xeons that prevents round-robin interrupt assignment from
   working.
2003-12-20 20:36:19 +00:00
Jeff Roberson
670c524f08 - In kseq_transfer() return if smp has not been started.
- In sched_add(), do the idle check prior to the transfer check so that we
   don't try to transfer load from an idle cpu.  This fixes panics caused by
   IPIs on UP machines running SMP kernels.

Reported/Debugged by:	seanc
2003-12-20 14:03:14 +00:00
Jeff Roberson
9b5f6f623d - Running interactive tasks with the minimum time-slice is fine for vi and
sh, but not so great for mozilla, X, etc.  Add a fixed define for the slice
   size granted to interactive KSEs.
2003-12-20 12:54:35 +00:00
Jeff Roberson
86e1c22aa4 - Assign the ke_cpu field in kseq_notify() so that all of our callers do not
have to do it.
 - Set the ke_runq to NULL in sched_add() before calling kseq_notify().
   Otherwise we may panic in sched_add() if INVARIANTS is on.
2003-12-14 02:06:29 +00:00
Jeff Roberson
cac77d0422 - Now that we have kseq groups, balance them seperately.
- The new sched_balance_groups() function does intra-group balancing while
   sched_balance() balances the available groups.
 - Pick a random time between 0 ticks and hz * 2 ticks to restart each
   balancing process.  Each balancer has its own timeout.
 - Pick a random place in the list of groups to start the search for lowest
   and highest group loads.  This prevents us from prefering a group based on
   numeric position.
 - Use a nasty hack to stop us from preferring cpu 0.  The problem is that
   softclock always runs on cpu 0, so it always has a little extra load.  We
   ignore this load in the balancer for now.  In the future softclock should
   run on a random cpu and these hacks can go away.
2003-12-12 07:33:51 +00:00
Jeff Roberson
2e227f0406 - Don't let the pctcpu rate limiter throttle us if we have recorded over
SCHED_CPU_TICKS ticks.  This was allowing processes to display
   (1/SCHED_CPU_TIME * 100) % more cpu than they had used.
2003-12-11 04:23:39 +00:00
Jeff Roberson
b11fdad0fc - In sched_switch(), if a thread has been assigned, don't touch the runqueues
or load.  These things have already been taken care of in sched_bind()
   which should be the only place that we're switching in an assigned thread.
2003-12-11 04:00:49 +00:00
Jeff Roberson
80f86c9f88 - Add support for CPU groups to ule. All SMT cores on the same physical
cpu are added to a group.
 - Don't place a cpu into the kseq_idle bitmask until all cpus in that group
   have idled.
 - Prefer idle groups over idle group members in the new kseq_transfer()
   function.  In this way we will prefer to balance load across full cores
   rather than add further load a partial core.
 - Before a cpu goes idle, check the other group members for threads.  Since
   SMT cpus may freely share threads, this is cheap.
 - SMT cores may be individually pinned and bound to now.  This contrasts the
   old mechanism where binding or pinning would have allowed a thread to run
   on any available cpu.
 - Remove some unnecessary logic from sched_switch().  Priority propagation
   should be properly taken care of in sched_prio() now.
2003-12-11 03:57:10 +00:00
Peter Wemm
a2640c9ba9 rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64).
Be sure to shift (long)1 << 33 and higher, not (int)1.  Otherwise bad
things happen(TM).  This is why beast.freebsd.org paniced with ULE.

Reviewed by:  jeff
2003-12-07 09:57:51 +00:00
John Baldwin
b6c71225a9 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
Jeff Roberson
fa9c971710 - Mark ksq_assigned as volatile so that when this code is used without
sched_lock we can be sure that we'll pick up the new value.
2003-11-17 08:27:11 +00:00
Jeff Roberson
093c05e39d - Remove long dead code. rslices hasn't been used in some time and neither
has sched_pickcpu().
2003-11-17 08:24:14 +00:00
Jeff Roberson
155b9987a3 - Introduce kseq_runq_{add,rem}() which are used to insert and remove
kses from the run queues.  Also, on SMP, we track the transferable
   count here.  Threads are transferable only as long as they are on the
   run queue.
 - Previously, we adjusted our load balancing based on the transferable count
   minus the number of actual cpus.  This was done to account for the threads
   which were likely to be running.  All of this logic is simpler now that
   transferable accounts for only those threads which can actually be taken.
   Updated various places in sched_add() and kseq_balance() to account for
   this.
 - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
   really doing.  The load is accounted for seperately from the runq because
   the load is accounted for even as the thread is running.
 - Fix a bug in sched_class() where we weren't properly using the PRI_BASE()
   version of the kg_pri_class.
 - Add a large comment that describes the impact of a seemingly simple
   conditional in sched_add().
 - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE()
   prior to checking kseq_idle.  This reduces the frequency of access for
   kseq_idle which is a shared resource.
2003-11-15 07:32:07 +00:00
Jeff Roberson
f28b3340c1 - Somehow I botched my last commit. Add an extra ( to fix things up. I'm
still not sure how this happened.

Reported by:	ps
2003-11-06 07:56:01 +00:00
Jeff Roberson
a70d729bff - Remove the local definition of sched_pin and unpin. They are provided in
sched.h now.
 - Respect the td pin count.
2003-11-06 03:09:51 +00:00
Jeff Roberson
46f8b26550 - It's ok if sched_runnable() has races in it, we don't need the sched_lock
here unless we have something on the assigned queue.
2003-11-05 05:30:12 +00:00
Jeff Roberson
9bacd788a1 - Add initial support for pinning and binding. 2003-11-04 07:45:41 +00:00
Jeff Roberson
112b6d3aa9 - Remove kseq_find(), we no longer scan other cpu's run queues when we go
idle.  They figure out that we're idle fast enough that the cache pollution
   introduces by scanning their run queue is more expensive than waiting
   a little longer.
 - Add kseq_setidle() to mark us as being idle.  Use this in place of
   kseq_find().
 - Remove kseq_load_highest(), kseq_find() was the only consumer of this
   interface.  kseq_balance() has it's own customized version that finds the
   lowest and highest loads simultaneously.

Continuously told that this would be faster by:	terry
2003-11-03 03:27:22 +00:00
Jeff Roberson
ef1134c9ad - Remove the ksq_loads[] array. We are only interested in three counts,
the total load, the timeshare load, and the number of threads that can
   be migrated to another cpu.  Account for these seperately.
 - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
   can be migrated to another CPU.  Currently, this only checks to see if
   we're an interrupt handler.  Eventually this will also be used to support
   CPU binding.
2003-11-02 10:56:48 +00:00
Jeff Roberson
769a363537 - In sched_prio() only force us onto the current queue if our priority is
being elevated (numerically smaller).
2003-11-02 04:25:59 +00:00
Jeff Roberson
7d1a81b4dc - Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in
slice assignment.  Add a comment describing what it does.
 - Remove a stale XXX comment, the nice should not impact the interactivity,
   nice adjustments only effect non-interactive tasks in ULE.
 - Don't allow nice -20 tasks to totally starve nice 0 tasks.  Give them at
   least SCHED_SLICE_MIN ticks.  We still allow nice 0 tasks to starve nice
   +20 tasks as intended.
2003-11-02 04:10:15 +00:00
Jeff Roberson
a0a931cec7 - Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
   do not have to account for it in the few places that we use it.

Requested by:	bde
2003-11-02 03:49:32 +00:00
Jeff Roberson
d322132c62 - Change sched_interact_update() to only accept slp+runtime values between
0 and SCHED_SLP_RUN_MAX * 2.  This allows us to simplify the algorithm
   quite a bit.  Before, it dealt with arbitrary values which required us
   to do nasty integer division tricks that didn't quite work out correctly.
 - Chnage sched_wakeup() to detect conditions where the slp+runtime could
   exceed SCHED_SLP_RUN_MAX * 2.  This can happen if we go to sleep for
   longer than 6 seconds.  In this case, we'll just clear the runtime and
   set the sleep time to the max.
 - Define a new function, sched_interact_fork() which updates the slp+runtime
   of a newly forked thread.  We want to limit the amount of history retained
   from the parent so that we learn the child's behavior quickly.  We don't,
   however want to decay it to nothing.  Previously, we would simply divide
   each parameter by 100 whenever we forked.  After a few forks the values
   would reach 0 and tasks would not be considered interactive.
 - Add another KTR entry, cleanup some existing entries.
 - Remove a useless sched_interact_update() from sched_priority().  This is
   already done by the callers that require it.
2003-11-02 03:36:33 +00:00
Jeff Roberson
22bf7d9a0e - Add static to local functions and data where it was missing.
- Add an IPI based mechanism for migrating kses.  This mechanism is
   broken down into several components.  This is intended to reduce cache
   thrashing by eliminating most cases where one cpu touches another's
   run queues.
 - kseq_notify() appends a kse to a lockless singly linked list and
   conditionally sends an IPI to the target processor.  Right now this is
   protected by sched_lock but at some point I'd like to get rid of the
   global lock.  This is why I used something more complicated than a
   standard queue.
 - kseq_assign() processes our list of kses that have been assigned to us
   by other processors.  This simply calls sched_add() for each item on the
   list after clearing the new KEF_ASSIGNED flag.  This flag is used to
   indicate that we have been appeneded to the assigned queue but not
   added to the run queue yet.
 - In sched_add(), instead of adding a KSE to another processor's queue we
   use kse_notify() so that we don't touch their queue.  Also in sched_add(),
   if KEF_ASSIGNED is already set return immediately.  This can happen if
   a thread is removed and readded so that the priority is recorded properly.
 - In sched_rem() return immediately if KEF_ASSIGNED is set.  All callers
   immediately readd simply to adjust priorites etc.
 - In sched_choose(), if we're running an IDLE task or the per cpu idle thread
   set our cpumask bit in 'kseq_idle' so that other processors may know that
   we are idle.  Before this, make a single pass through the run queues of
   other processors so that we may find work more immediately if it is
   available.
 - In sched_runnable(), don't scan each processor's run queue, they will IPI
   us if they have work for us to do.
 - In sched_add(), if we're adding a thread that can be migrated and we have
   plenty of work to do, try to migrate the thread to an idle kseq.
 - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
   consideration.
 - No longer use kseq_choose() to steal threads, it can lose it's last
   argument.
 - Create a new function runq_steal() which operates like runq_choose() but
   skips threads based on some criteria.  Currently it will not steal
   PRI_ITHD threads.  In the future this will be used for CPU binding.
 - Create a kseq_steal() that checks each run queue with runq_steal(), use
   kseq_steal() in the places where we used kseq_choose() to steal with
   before.
2003-10-31 11:16:04 +00:00
Bruce Evans
89674a9f77 Removed sched_nest variable in sched_switch(). Context switches always
begin with sched_lock held but not recursed, so this variable was
always 0.

Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch().  Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up.  Only
sched_lock.mtx_lock really needs a fixup.

Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up.  This
assertion much match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
2003-10-29 14:40:41 +00:00
Jeff Roberson
1aca9909e5 - Only change the run queue in sched_prio() if the kse is non null. threads
can be in the TD_ON_RUNQ state and not have an associated kse.
 - Remove the PRI_IDLE special case from sched_clock(), it was not actually
   necessary.
2003-10-28 03:28:48 +00:00
Jeff Roberson
3f741ca117 - Use a better algorithm in sched_pctcpu_update()
Contributed by:	Thomaswuerfl@gmx.de

 - In sched_prio(), adjust the run queue for threads which may need to move
   to the current queue due to priority propagation .
 - In sched_switch(), fix style bug introduced when the KSE support went in.
   Columns are 80 chars wide, not 90.
 - In sched_switch(), Fix the comparison in the idle case and explicitly
   re-initialize the runq in the not propagated case.
 - Remove dead code in sched_clock().
 - In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads
   that have become runnable will get a chance to.
 - In sched_runnable(), if we're not the IDLETD, we should not consider
   curthread when examining the load.  This mimics the 4BSD behavior of
   returning 0 when the only runnable thread is running.
 - In sched_userret(), remove the code for setting NEEDRESCHED entirely.
   This is not necessary and is not implemented in 4BSD.
 - Use the correct comparison in sched_add() when checking to see if an idle
   prio task has had it's priority temporarily elevated.
2003-10-27 06:47:05 +00:00
Jeff Roberson
484288de56 - If a thread is not bound to a kse return 0 from sched_pctcpu().
Reported by:	 pawel.worach@nordea.com
2003-10-20 19:55:21 +00:00
Jeff Roberson
0e0f626628 - Only kse_reassign() in the !running case.
Reported by:	kris
2003-10-16 20:32:57 +00:00
Jeff Roberson
0c7da3a43d - Call sched_add() with the correct argument on SMP.
Reported by:	Valentin Chopov <valentin@valcho.net>
2003-10-16 20:06:19 +00:00
Jeff Roberson
b72f347bdb - Fix a minor problem with my last commit, we don't want to return from
sched_switch if the thread is running, we want to fall through and pick
   a new thread because we have been preempted.
2003-10-16 10:04:54 +00:00
Jeff Roberson
ae53b483cc - Collapse sched_switchin() and sched_switchout() into sched_switch(). Now
mi_switch() calls sched_switch() which calls cpu_switch().  This is
   actually one less function call than it had been.
2003-10-16 08:53:46 +00:00
Jeff Roberson
7cf90fb376 - Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
argument rather than a kse.
2003-10-16 08:39:15 +00:00
Jeff Roberson
4c9612c622 - The non iterative algorithm for interact_update was broken due to
rounding errors.  This was the source of the majority of the
   interactivity problems.  Reintroduce the old algorithm and its XXX.
 - Up the interactivity threshold to 30.  It really could stand to be even
   a tiny bit higher.
 - Let the sleep and run time accumulate up to 5 seconds of history rather
   than two.  This helps stop XFree86 from becoming non-interactive during
   bursts of activity.
2003-10-16 08:17:43 +00:00
Jeff Roberson
08fd6713b2 - If our user_pri doesn't match our actual priority our priority has been
elevated either due to priority propagation or because we're in the
   kernel in either case, put us on the current queue so that we dont
   stop others from using important resources.  At some point the priority
   elevations from sleeping in the kernel should go away.
 - Remove an optimization in sched_userret().  Before we would only set
   NEEDRESCHED if there was something of a higher priority available.  This
   is a trivial optimization and it breaks priority propagation because it
   doesn't take threads which we may be blocking into account.  Notice that
   the thread which is blocking others gets up to one tick of cpu time before
   we honor this NEEDRESCHED in sched_clock().
2003-10-15 07:47:06 +00:00
Jeff Roberson
736c97c7b3 - In SCHED_CURR() add holding Giant to the list of criteria that will keep
you on the current queue.  In the future, it would be nice if priority
   propagation could deterministicly pluck a thread off of the next queue
   and put it on the current queue.  Until then this hack stops us from
   holding up our entire current queue, including interrupt handlers, while
   a thread on the next queue is blocked while holding Giant.
 - Inherit our pctcpu information from our parent.
2003-10-12 21:07:31 +00:00
Jeff Roberson
8ec82641d8 - Change a lame iterative algorithm to a constant time algorithm. Remove
the XXX that complains about it as well.

Submitted by:	ThomasWuerfl@gmx.de
2003-10-04 17:41:13 +00:00
Jeff Roberson
81de51bf1d - Somewhere along the line I stupidly removed critical logic from
sched_ptcpu_update().  This caused erroneous cpu times in TOP for
   processes that were asleep.  Replace the code that was removed.
2003-09-20 02:05:58 +00:00
David Xu
ab2baa7254 Let SA process work under ULE scheduler, originally it would panic kernel.
Reviewed by: jeff
2003-08-26 11:33:15 +00:00
Sam Leffler
c06eb4e293 Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
Jeff Roberson
0c0a98b231 - When stealing a kse in kseq_move() ignore the current kseq's min nice
value.  We want to steal any thread, even one that is not given a slice
   on its current queue.
2003-07-08 06:19:40 +00:00
Jeff Roberson
0ec896fd28 - Clean up an unused variable.
Submitted by:	Steve Kargl <skg@routmask.apl.washington.edu>
2003-07-07 21:08:28 +00:00
Jeff Roberson
749d01b011 - Parse the cpu topology map in sched_setup().
- Associate logical CPUs on the same physical core with the same kseq.
 - Adjust code that assumed there would only be one running thread in any
   kseq.
 - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef.  This is a start
   towards HyperThreading support but it isn't quite there yet.
2003-07-04 19:59:00 +00:00
Jeff Roberson
7a20304f84 - Don't migrate to stopped cpus. 2003-06-28 09:09:33 +00:00
Jeff Roberson
86f8ae9663 - If smp is not started yet don't try to load balance or we'll put threads
on cpus that aren't running yet.
2003-06-28 08:24:42 +00:00
Jeff Roberson
a91172ade1 - Throttle the inherited sleep and run time in sched_fork_kseg(). This
allows us to learn the behavior of a thread much more quickly after it
   starts up.
2003-06-28 06:19:56 +00:00