Commit Graph

89 Commits

Author SHA1 Message Date
jeff
3eb0771fde - Now that we have kseq groups, balance them seperately.
- The new sched_balance_groups() function does intra-group balancing while
   sched_balance() balances the available groups.
 - Pick a random time between 0 ticks and hz * 2 ticks to restart each
   balancing process.  Each balancer has its own timeout.
 - Pick a random place in the list of groups to start the search for lowest
   and highest group loads.  This prevents us from prefering a group based on
   numeric position.
 - Use a nasty hack to stop us from preferring cpu 0.  The problem is that
   softclock always runs on cpu 0, so it always has a little extra load.  We
   ignore this load in the balancer for now.  In the future softclock should
   run on a random cpu and these hacks can go away.
2003-12-12 07:33:51 +00:00
jeff
8d0b945510 - Don't let the pctcpu rate limiter throttle us if we have recorded over
SCHED_CPU_TICKS ticks.  This was allowing processes to display
   (1/SCHED_CPU_TIME * 100) % more cpu than they had used.
2003-12-11 04:23:39 +00:00
jeff
fb678712c0 - In sched_switch(), if a thread has been assigned, don't touch the runqueues
or load.  These things have already been taken care of in sched_bind()
   which should be the only place that we're switching in an assigned thread.
2003-12-11 04:00:49 +00:00
jeff
7446b875ce - Add support for CPU groups to ule. All SMT cores on the same physical
cpu are added to a group.
 - Don't place a cpu into the kseq_idle bitmask until all cpus in that group
   have idled.
 - Prefer idle groups over idle group members in the new kseq_transfer()
   function.  In this way we will prefer to balance load across full cores
   rather than add further load a partial core.
 - Before a cpu goes idle, check the other group members for threads.  Since
   SMT cpus may freely share threads, this is cheap.
 - SMT cores may be individually pinned and bound to now.  This contrasts the
   old mechanism where binding or pinning would have allowed a thread to run
   on any available cpu.
 - Remove some unnecessary logic from sched_switch().  Priority propagation
   should be properly taken care of in sched_prio() now.
2003-12-11 03:57:10 +00:00
peter
8555a54271 rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64).
Be sure to shift (long)1 << 33 and higher, not (int)1.  Otherwise bad
things happen(TM).  This is why beast.freebsd.org paniced with ULE.

Reviewed by:  jeff
2003-12-07 09:57:51 +00:00
jhb
57e73abc21 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
jeff
983baa9011 - Mark ksq_assigned as volatile so that when this code is used without
sched_lock we can be sure that we'll pick up the new value.
2003-11-17 08:27:11 +00:00
jeff
00a91d589b - Remove long dead code. rslices hasn't been used in some time and neither
has sched_pickcpu().
2003-11-17 08:24:14 +00:00
jeff
e9e589def9 - Introduce kseq_runq_{add,rem}() which are used to insert and remove
kses from the run queues.  Also, on SMP, we track the transferable
   count here.  Threads are transferable only as long as they are on the
   run queue.
 - Previously, we adjusted our load balancing based on the transferable count
   minus the number of actual cpus.  This was done to account for the threads
   which were likely to be running.  All of this logic is simpler now that
   transferable accounts for only those threads which can actually be taken.
   Updated various places in sched_add() and kseq_balance() to account for
   this.
 - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
   really doing.  The load is accounted for seperately from the runq because
   the load is accounted for even as the thread is running.
 - Fix a bug in sched_class() where we weren't properly using the PRI_BASE()
   version of the kg_pri_class.
 - Add a large comment that describes the impact of a seemingly simple
   conditional in sched_add().
 - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE()
   prior to checking kseq_idle.  This reduces the frequency of access for
   kseq_idle which is a shared resource.
2003-11-15 07:32:07 +00:00
jeff
d2b806a68c - Somehow I botched my last commit. Add an extra ( to fix things up. I'm
still not sure how this happened.

Reported by:	ps
2003-11-06 07:56:01 +00:00
jeff
d292b17be9 - Remove the local definition of sched_pin and unpin. They are provided in
sched.h now.
 - Respect the td pin count.
2003-11-06 03:09:51 +00:00
jeff
d412042cfb - It's ok if sched_runnable() has races in it, we don't need the sched_lock
here unless we have something on the assigned queue.
2003-11-05 05:30:12 +00:00
jeff
b02317efca - Add initial support for pinning and binding. 2003-11-04 07:45:41 +00:00
jeff
ec019244c7 - Remove kseq_find(), we no longer scan other cpu's run queues when we go
idle.  They figure out that we're idle fast enough that the cache pollution
   introduces by scanning their run queue is more expensive than waiting
   a little longer.
 - Add kseq_setidle() to mark us as being idle.  Use this in place of
   kseq_find().
 - Remove kseq_load_highest(), kseq_find() was the only consumer of this
   interface.  kseq_balance() has it's own customized version that finds the
   lowest and highest loads simultaneously.

Continuously told that this would be faster by:	terry
2003-11-03 03:27:22 +00:00
jeff
2815a31011 - Remove the ksq_loads[] array. We are only interested in three counts,
the total load, the timeshare load, and the number of threads that can
   be migrated to another cpu.  Account for these seperately.
 - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
   can be migrated to another CPU.  Currently, this only checks to see if
   we're an interrupt handler.  Eventually this will also be used to support
   CPU binding.
2003-11-02 10:56:48 +00:00
jeff
fc1a170d75 - In sched_prio() only force us onto the current queue if our priority is
being elevated (numerically smaller).
2003-11-02 04:25:59 +00:00
jeff
b2b3aebceb - Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in
slice assignment.  Add a comment describing what it does.
 - Remove a stale XXX comment, the nice should not impact the interactivity,
   nice adjustments only effect non-interactive tasks in ULE.
 - Don't allow nice -20 tasks to totally starve nice 0 tasks.  Give them at
   least SCHED_SLICE_MIN ticks.  We still allow nice 0 tasks to starve nice
   +20 tasks as intended.
2003-11-02 04:10:15 +00:00
jeff
1a72976008 - Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
   do not have to account for it in the few places that we use it.

Requested by:	bde
2003-11-02 03:49:32 +00:00
jeff
2be3574946 - Change sched_interact_update() to only accept slp+runtime values between
0 and SCHED_SLP_RUN_MAX * 2.  This allows us to simplify the algorithm
   quite a bit.  Before, it dealt with arbitrary values which required us
   to do nasty integer division tricks that didn't quite work out correctly.
 - Chnage sched_wakeup() to detect conditions where the slp+runtime could
   exceed SCHED_SLP_RUN_MAX * 2.  This can happen if we go to sleep for
   longer than 6 seconds.  In this case, we'll just clear the runtime and
   set the sleep time to the max.
 - Define a new function, sched_interact_fork() which updates the slp+runtime
   of a newly forked thread.  We want to limit the amount of history retained
   from the parent so that we learn the child's behavior quickly.  We don't,
   however want to decay it to nothing.  Previously, we would simply divide
   each parameter by 100 whenever we forked.  After a few forks the values
   would reach 0 and tasks would not be considered interactive.
 - Add another KTR entry, cleanup some existing entries.
 - Remove a useless sched_interact_update() from sched_priority().  This is
   already done by the callers that require it.
2003-11-02 03:36:33 +00:00
jeff
1d5138f29a - Add static to local functions and data where it was missing.
- Add an IPI based mechanism for migrating kses.  This mechanism is
   broken down into several components.  This is intended to reduce cache
   thrashing by eliminating most cases where one cpu touches another's
   run queues.
 - kseq_notify() appends a kse to a lockless singly linked list and
   conditionally sends an IPI to the target processor.  Right now this is
   protected by sched_lock but at some point I'd like to get rid of the
   global lock.  This is why I used something more complicated than a
   standard queue.
 - kseq_assign() processes our list of kses that have been assigned to us
   by other processors.  This simply calls sched_add() for each item on the
   list after clearing the new KEF_ASSIGNED flag.  This flag is used to
   indicate that we have been appeneded to the assigned queue but not
   added to the run queue yet.
 - In sched_add(), instead of adding a KSE to another processor's queue we
   use kse_notify() so that we don't touch their queue.  Also in sched_add(),
   if KEF_ASSIGNED is already set return immediately.  This can happen if
   a thread is removed and readded so that the priority is recorded properly.
 - In sched_rem() return immediately if KEF_ASSIGNED is set.  All callers
   immediately readd simply to adjust priorites etc.
 - In sched_choose(), if we're running an IDLE task or the per cpu idle thread
   set our cpumask bit in 'kseq_idle' so that other processors may know that
   we are idle.  Before this, make a single pass through the run queues of
   other processors so that we may find work more immediately if it is
   available.
 - In sched_runnable(), don't scan each processor's run queue, they will IPI
   us if they have work for us to do.
 - In sched_add(), if we're adding a thread that can be migrated and we have
   plenty of work to do, try to migrate the thread to an idle kseq.
 - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
   consideration.
 - No longer use kseq_choose() to steal threads, it can lose it's last
   argument.
 - Create a new function runq_steal() which operates like runq_choose() but
   skips threads based on some criteria.  Currently it will not steal
   PRI_ITHD threads.  In the future this will be used for CPU binding.
 - Create a kseq_steal() that checks each run queue with runq_steal(), use
   kseq_steal() in the places where we used kseq_choose() to steal with
   before.
2003-10-31 11:16:04 +00:00
bde
38ee08e2c9 Removed sched_nest variable in sched_switch(). Context switches always
begin with sched_lock held but not recursed, so this variable was
always 0.

Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch().  Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up.  Only
sched_lock.mtx_lock really needs a fixup.

Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up.  This
assertion much match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
2003-10-29 14:40:41 +00:00
jeff
8805933f41 - Only change the run queue in sched_prio() if the kse is non null. threads
can be in the TD_ON_RUNQ state and not have an associated kse.
 - Remove the PRI_IDLE special case from sched_clock(), it was not actually
   necessary.
2003-10-28 03:28:48 +00:00
jeff
2813e83639 - Use a better algorithm in sched_pctcpu_update()
Contributed by:	Thomaswuerfl@gmx.de

 - In sched_prio(), adjust the run queue for threads which may need to move
   to the current queue due to priority propagation .
 - In sched_switch(), fix style bug introduced when the KSE support went in.
   Columns are 80 chars wide, not 90.
 - In sched_switch(), Fix the comparison in the idle case and explicitly
   re-initialize the runq in the not propagated case.
 - Remove dead code in sched_clock().
 - In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads
   that have become runnable will get a chance to.
 - In sched_runnable(), if we're not the IDLETD, we should not consider
   curthread when examining the load.  This mimics the 4BSD behavior of
   returning 0 when the only runnable thread is running.
 - In sched_userret(), remove the code for setting NEEDRESCHED entirely.
   This is not necessary and is not implemented in 4BSD.
 - Use the correct comparison in sched_add() when checking to see if an idle
   prio task has had it's priority temporarily elevated.
2003-10-27 06:47:05 +00:00
jeff
d191470e5b - If a thread is not bound to a kse return 0 from sched_pctcpu().
Reported by:	 pawel.worach@nordea.com
2003-10-20 19:55:21 +00:00
jeff
43fc476731 - Only kse_reassign() in the !running case.
Reported by:	kris
2003-10-16 20:32:57 +00:00
jeff
bd49f43790 - Call sched_add() with the correct argument on SMP.
Reported by:	Valentin Chopov <valentin@valcho.net>
2003-10-16 20:06:19 +00:00
jeff
ae9dcb1e7c - Fix a minor problem with my last commit, we don't want to return from
sched_switch if the thread is running, we want to fall through and pick
   a new thread because we have been preempted.
2003-10-16 10:04:54 +00:00
jeff
c7b38e5072 - Collapse sched_switchin() and sched_switchout() into sched_switch(). Now
mi_switch() calls sched_switch() which calls cpu_switch().  This is
   actually one less function call than it had been.
2003-10-16 08:53:46 +00:00
jeff
0e11a68932 - Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
argument rather than a kse.
2003-10-16 08:39:15 +00:00
jeff
da51f9561a - The non iterative algorithm for interact_update was broken due to
rounding errors.  This was the source of the majority of the
   interactivity problems.  Reintroduce the old algorithm and its XXX.
 - Up the interactivity threshold to 30.  It really could stand to be even
   a tiny bit higher.
 - Let the sleep and run time accumulate up to 5 seconds of history rather
   than two.  This helps stop XFree86 from becoming non-interactive during
   bursts of activity.
2003-10-16 08:17:43 +00:00
jeff
7c8fe7f222 - If our user_pri doesn't match our actual priority our priority has been
elevated either due to priority propagation or because we're in the
   kernel in either case, put us on the current queue so that we dont
   stop others from using important resources.  At some point the priority
   elevations from sleeping in the kernel should go away.
 - Remove an optimization in sched_userret().  Before we would only set
   NEEDRESCHED if there was something of a higher priority available.  This
   is a trivial optimization and it breaks priority propagation because it
   doesn't take threads which we may be blocking into account.  Notice that
   the thread which is blocking others gets up to one tick of cpu time before
   we honor this NEEDRESCHED in sched_clock().
2003-10-15 07:47:06 +00:00
jeff
a483e85d3a - In SCHED_CURR() add holding Giant to the list of criteria that will keep
you on the current queue.  In the future, it would be nice if priority
   propagation could deterministicly pluck a thread off of the next queue
   and put it on the current queue.  Until then this hack stops us from
   holding up our entire current queue, including interrupt handlers, while
   a thread on the next queue is blocked while holding Giant.
 - Inherit our pctcpu information from our parent.
2003-10-12 21:07:31 +00:00
jeff
7450fbaf7c - Change a lame iterative algorithm to a constant time algorithm. Remove
the XXX that complains about it as well.

Submitted by:	ThomasWuerfl@gmx.de
2003-10-04 17:41:13 +00:00
jeff
6d14292e55 - Somewhere along the line I stupidly removed critical logic from
sched_ptcpu_update().  This caused erroneous cpu times in TOP for
   processes that were asleep.  Replace the code that was removed.
2003-09-20 02:05:58 +00:00
davidxu
f840df8e8a Let SA process work under ULE scheduler, originally it would panic kernel.
Reviewed by: jeff
2003-08-26 11:33:15 +00:00
sam
77a4adf88f Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
jeff
3214436a8e - When stealing a kse in kseq_move() ignore the current kseq's min nice
value.  We want to steal any thread, even one that is not given a slice
   on its current queue.
2003-07-08 06:19:40 +00:00
jeff
530634290e - Clean up an unused variable.
Submitted by:	Steve Kargl <skg@routmask.apl.washington.edu>
2003-07-07 21:08:28 +00:00
jeff
9cf0862530 - Parse the cpu topology map in sched_setup().
- Associate logical CPUs on the same physical core with the same kseq.
 - Adjust code that assumed there would only be one running thread in any
   kseq.
 - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef.  This is a start
   towards HyperThreading support but it isn't quite there yet.
2003-07-04 19:59:00 +00:00
jeff
d6b1383aa5 - Don't migrate to stopped cpus. 2003-06-28 09:09:33 +00:00
jeff
2ead0e377a - If smp is not started yet don't try to load balance or we'll put threads
on cpus that aren't running yet.
2003-06-28 08:24:42 +00:00
jeff
f2d12a9a0f - Throttle the inherited sleep and run time in sched_fork_kseg(). This
allows us to learn the behavior of a thread much more quickly after it
   starts up.
2003-06-28 06:19:56 +00:00
jeff
de4acd13c4 - Adjust the default maximum slice value to ~140ms. This has improved the
nice distribution without significantly impacting interactive response.
   As a side effect it should also allow batch processes to run for a
   slightly longer period which will positively impact their performance.
2003-06-28 06:04:47 +00:00
jeff
ef73855fa0 - lticks was erroneously being updated in sched_pctcpu(). This was causing
us to skip the pctcpu_update() call which lead to inaccurate cpu usage
   statistics for processes that didn't run often.
2003-06-21 02:31:49 +00:00
jeff
66f14a8109 - Don't allow nice to have such a large effect on priority. This was
causing poor interactive performance while unnice processes were running.
   The new scheme still allows nice to have an effect on priority but it is
   not as dramatic as the effect of the interactivity score.
2003-06-21 02:22:47 +00:00
jeff
ae5a61d3ab - Use a more robust mechanism for determining whether or not a kse is on a
kseq.
2003-06-17 19:49:18 +00:00
jeff
ef2a619ae8 - Temporarily patch a problem where the interact score could be negative
because the run time exceeds the largest value a signed int can hold.
   The real solution involves calculating how far we are over the limit.
   To quickly solve this problem we loop removing 1/5th of the current value
   until it falls below the limit.  The common case requires no passes.
2003-06-17 10:21:34 +00:00
jeff
81d8e072f0 - Add a new function "sched_interact_update()" that scales back the sleep
and run time.
 - Scale the sleep and run time back via sched_interact_update() in more
   places.  This is to keep the statistic more accurate.
 - Charge a parent one tick for forking a child.
 - Add only the run time and not the sleep time to the parents kg when a
   thread exits.  This allows us to give a penalty for having an expensive
   thread exit but does not give a bonus for having an interactive thread
   exit.
 - Change the SLP_RUN_THROTTLE to limit us to 4/5th and not 1/2.
 - Change the SLP_RUN_MAX to two seconds.  This keeps bursty interactive
   applications like mozilla and openoffice in the interactive range even
   through expensive tasks.
 - Recalculate the slice after every sleep.  This ensures that once a task
   has been marked interactive it only has a slice of 1 at the risk of
   giving tasks that sleep for a very brief period a longer time slice.
2003-06-17 06:39:51 +00:00
jeff
7b6397c86b - Increase the ksegrp's cpu time history buffer to 250ms.
- Decrease the history buffer divisor to 2 so that we remember more of the
   old behavior.
2003-06-15 04:14:25 +00:00
jeff
90c6fbd963 - Cap the growth of sleep and run time in sched_exit_kse(). 2003-06-15 02:52:29 +00:00