freebsd-skq

Author	SHA1	Message	Date
jeff	280a86b2ee	- Improve grammar. s/it's/its/. - Improve load long-term load balancer by always IPIing exactly once. Previously the delay after rebalancing could cause problems with uneven workloads. - Allow nice to have a linear effect on the interactivity score. This allows negatively niced programs to stay interactive longer. It may be useful with very expensive Xorg servers under high loads. In general it should not be necessary to alter the nice level to improve interactive response. We may also want to consider never allowing positively niced processes to become interactive at all. - Initialize ccpu to 0 rather than 0.0. The decimal point was leftover from when the code was copied from 4bsd. ccpu is 0 in ULE because ULE only exports weighted cpu values. Reported by: Steve Kargl (Load balancing problem) Approved by: re	2007-09-22 02:20:14 +00:00
jeff	bc0eadb21d	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 04:10:23 +00:00
jeff	3fc0f8b973	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
jeff	0f3cc9a72e	- Set steal_thresh to log2(ncpus). This improves idle-time load balancing on 2cpu machines by reducing it to 1 by default. This improves loaded operation on 8cpu machines by increasing it to 3 where the extra idle time is not as critical. Approved by: re	2007-08-20 06:34:20 +00:00
jeff	155be8b518	- Fix one line that erroneously crept in my last commit. Approved by: re	2007-08-04 01:21:28 +00:00
jeff	8b42bd66ad	- Share scheduler locks between hyper-threaded cores to protect the tdq_group structure. Hyper-threaded cores won't really benefit from seperate locks anyway. - Seperate out the migration case from sched_switch to simplify the main switch code. We only migrate here if called via sched_bind(). - When preempted place the preempted thread back in the same queue at the head. - Improve the cpu group and topology infrastructure. Tested by: many on current@ Approved by: re	2007-08-03 23:38:46 +00:00
jeff	e2ebe96ef4	- Refine the load balancer to improve buildkernel times on dual core machines. - Leave the long-term load balancer running by default once per second. - Enable stealing load from the idle thread only when the remote processor has more than two transferable tasks. Setting this to one further improves buildworld. Setting it higher improves mysql. - Remove the bogus pick_zero option. I had not intended to commit this. - Entirely disallow migration for threads with SRQ_YIELDING set. This balances out the extra migration allowed for with the load balancers. It also makes pick_pri perform better as I had anticipated. Tested by: Dmitry Morozovsky <marck@rinet.ru> Approved by: re	2007-07-19 20:03:15 +00:00
jeff	550dacee12	- When newtd is specified to sched_switch() it was not being initialized properly. We have to temporarily unlock the TDQ lock so we can lock the thread and add it to the run queue. This is used only for KSE. - When we add a thread from the tdq_move() via sched_balance() we need to ipi the target if it's sitting in the idle thread or it'll never run. Reported by: Rene Landan Approved by: re	2007-07-19 19:51:45 +00:00
jeff	119fa6328d	ULE 3.0: Fine grain scheduler locking and affinity improvements. This has been in development for over 6 months as SCHED_SMP. - Implement one spin lock per thread-queue. Threads assigned to a run-queue point to this lock via td_lock. - Improve the facility for assigning threads to CPUs now that sched_lock contention no longer dominates scheduling decisions on larger SMP machines. - Re-write idle time stealing in an attempt to make it less damaging to general performance. This is still disabled by default. See kern.sched.steal_idle. - Call the long-term load balancer from a callout rather than sched_clock() so there are no locks held. This is disabled by default. See kern.sched.balance. - Parameterize many scheduling decisions via sysctls. Try to document these via sysctl descriptions. - General structural and naming cleanups. - Document each function with comments. Tested by: current@ amd64, x86, UP, SMP. Approved by: re	2007-07-17 22:53:23 +00:00
jeff	6164fe7d05	- Fix an off by one error in sched_pri_range. - In tdq_choose() only assert that a thread does not have too high a priority (low value) for the queue we removed it from. This will catch bugs in priority elevation. It's not a serious error for the thread to have too low a priority as we don't change queues in this case as an optimization. Reported by: kris	2007-06-15 19:33:58 +00:00
jeff	bc31b141bb	- Move some common code out of sched_fork_exit() and back into fork_exit().	2007-06-12 07:47:09 +00:00
jeff	7a9c95c100	- Placing the 'volatile' on the right side of the * in the td_lock declaration removes the need for __DEVOLATILE(). Pointed out by: tegge	2007-06-06 03:40:47 +00:00
jeff	dcba3b4fa8	- Better fix for previous error; use DEVOLATILE on the td_lock pointer it can actually sometimes be something other than sched_lock even on schedulers which rely on a global scheduler lock. Tested by: kan	2007-06-05 04:12:46 +00:00
jeff	bec3346df4	- Pass &sched_lock as the third argument to cpu_switch() as this will always be the correct lock and we don't get volatile warnings this way. Pointed out by: kan	2007-06-05 03:46:54 +00:00
jeff	c5295bd51b	- Define TDQ_ID() for the !SMP case. - Default pick_pri to off. It is not faster in most cases.	2007-06-05 02:53:51 +00:00
jeff	186ae07cb6	Commit 1/14 of sched_lock decomposition. - Move all scheduler locking into the schedulers utilizing a technique similar to solaris's container locking. - A per-process spinlock is now used to protect the queue of threads, thread count, suspension count, p_sflags, and other process related scheduling fields. - The new thread lock is actually a pointer to a spinlock for the container that the thread is currently owned by. The container may be a turnstile, sleepqueue, or run queue. - thread_lock() is now used to protect access to thread related scheduling fields. thread_unlock() unlocks the lock and thread_set_lock() implements the transition from one lock to another. - A new "blocked_lock" is used in cases where it is not safe to hold the actual thread's lock yet we must prevent access to the thread. - sched_throw() and sched_fork_exit() are introduced to allow the schedulers to fix-up locking at these points. - Add some minor infrastructure for optionally exporting scheduler statistics that were invaluable in solving performance problems with this patch. Generally these statistics allow you to differentiate between different causes of context switches. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:50:30 +00:00
kmacy	d7e93cb21b	Schedule the ithread on the same cpu as the interrupt Tested by: kmacy Submitted by: jeffr	2007-04-20 05:45:46 +00:00
jeff	07c8782e65	- Handle the case where slptime == runtime. Submitted by: Atoine Brodin	2007-03-17 23:32:48 +00:00
jeff	82143f95ad	- Cast the intermediate value in priority computtion back down to unsigned char. Weirdly, casting the 1 constant to u_char still produces a signed integer result that is then used in the % computation. This avoids that mess all together and causes a 0 pri to turn into 255 % 64 as we expect. Reported by: kkenn (about 4 times, thanks)	2007-03-17 18:13:32 +00:00
julian	80d6cde009	Instead of doing comparisons using the pcpu area to see if a thread is an idle thread, just see if it has the IDLETD flag set. That flag will probably move to the pflags word as it's permenent and never chenges for the life of the system so it doesn't need locking.	2007-03-08 06:44:34 +00:00
kmacy	6508c4f27b	general LOCK_PROFILING cleanup - only collect timestamps when a lock is contested - this reduces the overhead of collecting profiles from 20x to 5x - remove unused function from subr_lock.c - generalize cnt_hold and cnt_lock statistics to be kept for all locks - NOTE: rwlock profiling generates invalid statistics (and most likely always has) someone familiar with that should review	2007-02-26 08:26:44 +00:00
jeff	7038c5de35	- Change types for necent runq additions to u_char rather than int. - Fix these types in ULE as well. This fixes bugs in priority index calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64. Reported by: kkenn using pho's stress2.	2007-02-08 01:52:25 +00:00
jeff	0f05ca9b5b	- Implement much more intelligent ipi sending. This algorithm tries to minimize IPIs and rescheduling when scheduling like tasks while keeping latency low for important threads. 1) An idle thread is running. 2) The current thread is worse than realtime and the new thread is better than realtime. Realtime to realtime doesn't preempt. 3) The new thread's priority is less than the threshold.	2007-01-25 23:51:59 +00:00
jeff	94085f7612	- Get rid of the unused DIDRUN flag. This was really only present to support sched_4bsd. - Rename the KTR level for non schedgraph parsed events. They take event space from things we'd like to graph. - Reset our slice value after we sleep. The slice is simply there to prevent starvation among equal priorities. A thread which had almost exhausted it's slice and then slept doesn't need to be rescheduled a tick after it wakes up. - Set the maximum slice value to a more conservative 100ms now that it is more accurately enforced.	2007-01-25 19:14:11 +00:00
jeff	743ea48fbc	- With a sleep time over 2097 seconds hzticks and slptime could end up negative. Use unsigned integers for sleep and run time so this doesn't disturb sched_interact_score(). This should fix the invalid interactive priority panics reported by several users.	2007-01-24 18:18:43 +00:00
jeff	8fd8265087	- Catch up to setrunqueue/choosethread/etc. api changes. - Define our own maybe_preempt() as sched_preempt(). We want to be able to preempt idlethread in all cases. - Define our idlethread to require preemption to exit. - Get the cpu estimation tick from sched_tick() so we don't have to worry about errors from a sampling interval that differs from the time domain. This was the source of sched_priority prints/panics and inaccurate pctcpu display in top.	2007-01-23 08:50:34 +00:00
jeff	5fd995e14a	- Disable the long-term load balancer. I believe that steal_busy works better and gives more predictable results.	2007-01-20 21:24:05 +00:00
jeff	a1996060b3	- We do need to IPI the idlethread on some systems. It may be stuck in a power saving mode otherwise. - If the thread is already bound in sched_bind() unbind it before re-binding it to a new cpu. I don't like these semantics but they are expected by some code in the tree. Patch by jkoshy.	2007-01-20 17:03:33 +00:00
jeff	3f693f3417	- In tdq_transfer() always set NEEDRESCHED when necessary regardless of the ipi settings. If NEEDRESCHED is set and an ipi is later delivered it will clear it rather than cause extra context switches. However, if we miss setting it we can have terrible latency. - In sched_bind() correctly implement bind. Also be slightly more tolerant of code which calls bind multiple times. However, we don't change binding if another call is made with a different cpu. This does not presently work with hwpmc which I believe should be changed.	2007-01-20 09:03:43 +00:00
jeff	a5cccc05cb	Major revamp of ULE's cpu load balancing: - Switch back to direct modification of remote CPU run queues. This added a lot of complexity with questionable gain. It's easy enough to reimplement if it's shown to help on huge machines. - Re-implement the old tdq_transfer() call as tdq_pickidle(). Change sched_add() so we have selectable cpu choosers and simplify the logic a bit here. - Implement tdq_pickpri() as the new default cpu chooser. This algorithm is similar to Solaris in that it tries to always run the threads with the best priorities. It is actually slightly more complex than solaris's algorithm because we also tend to favor the local cpu over other cpus which has a boost in latency but also potentially enables cache sharing between the waking thread and the woken thread. - Add a bunch of tunables that can be used to measure effects of different load balancing strategies. Most of these will go away once the algorithm is more definite. - Add a new mechanism to steal threads from busy cpus when we idle. This is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The threshold is the required length of a tdq's run queue before another cpu will be able to steal runnable threads. This prevents most queue imbalances that contribute the long latencies.	2007-01-19 21:56:08 +00:00
jeff	0f9511e94e	- Don't let SCHED_TICK_TOTAL() return less than hz. This can cause integer divide faults in roundup() later if it is able to return 0. For some reason this bug only shows up on my laptop and not my testboxes.	2007-01-06 12:33:43 +00:00
jeff	b19ed2c7b0	- Fix the sched_priority() invalid priority bugs. Use roundup() instead of max() when computing the divisor in SCHED_TICK_PRI(). This prevents cases where rounding down would allow the quotient to exceed SCHED_PRI_RANGE. - Garbage collect some unused flags and fields. - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply duplicated this functionality. - Re-enable the rebalancer by default and fix the sysctl so it can be modified.	2007-01-06 08:44:13 +00:00
jeff	80a97a8d5e	- Don't IPI unless we're going to interrupt something exiting in the kernel. otherwise we can afford the latency. This makes a significant performance improvement.	2007-01-06 02:34:23 +00:00
jeff	08aa1698e7	- Fix a comparison in sched_choose() that caused cpus to be constantly marked idle, thus breaking cpu load balancing. - Change sched_interact_update() to fix cases where the stored history has expanded significantly rather than handling them in the callers. This fixes a case where sched_priority() could compute a bad value. - Add a sysctl to disable the global load balancer for experimentation.	2007-01-05 23:45:38 +00:00
jeff	47d8080afa	- ftick was initialized to -1 for init and any of it's children. Fix this by setting ftick = ltick = ticks in schedinit(). - Update the priority when we are pulled off of the run queue and when we are inserted onto the run queue so that it more accurately reflects our present status. This is important for efficient priority propagation functioning. - Move the frequency test into sched_pctcpu_update() so we don't repeat it each time we'd like to call it. - Put some temporary work-around code in sched_priority() in case the tick mechanism produces a bad priority. Eventually this should revert to an assert again.	2007-01-05 08:50:38 +00:00
jeff	ae96850d62	- Only allow the tdq_idx to increase by one each tick rather than up to the most recently chosen index. This significantly improves nice behavior. This allows a lower priority thread to run some multiple of times before the higher priority thread makes it to the front of the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice difference of 1 makes a 4% difference in cpu usage between two hogs. - Track a seperate insert and removal index. When the removal index is empty it is updated to point at the current insert index. - Don't remove and re-add a thread to the runq when it is being adjusted down in priority. - Pull some conditional code out of sched_tick(). It's looking a bit large now.	2007-01-04 12:16:19 +00:00
jeff	2c3282f28a	ULE 2.0: - Remove the double queue mechanism for timeshare threads. It was slow due to excess cache lines in play, caused suboptimal scheduling behavior with niced and other non-interactive processes, complicated priority lending, etc. - Use a circular queue with a floating starting index for timeshare threads. Enforces fairness by moving the insertion point closer to threads with worse priorities over time. - Give interactive timeshare threads real-time user-space priorities and place them on the realtime/ithd queue. - Select non-interactive timeshare thread priorities based on their cpu utilization over the last 10 seconds combined with the nice value. This gives us more sane priorities and behavior in a loaded system as compared to the old method of using the interactivity score. The interactive score quickly hit a ceiling if threads were non-interactive and penalized new hog threads. - Use one slice size for all threads. The slice is not currently dynamically set to adjust scheduling behavior of different threads. - Add some new sysctls for scheduling parameters. Bug fixes/Clean up: - Fix zeroing of td_sched after initialization in sched_fork_thread() caused by recent ksegrp removal. - Fix KSE interactivity issues related to frequent forking and exiting of kse threads. We simply disable the penalty for thread creation and exit for kse threads. - Cleanup the cpu estimator by using tickincr here as well. Keep ticks and ltick/ftick in the same frequency. Previously ticks were stathz and others were hz. - Lots of new and updated comments. - Many many others. Tested on: up x86/amd64, 8way amd64.	2007-01-04 08:56:25 +00:00
jeff	9c815f4892	- More search and replace prettying.	2006-12-29 12:55:32 +00:00
jeff	e74edb3876	- Clean up a bit after the most recent KSE restructuring.	2006-12-29 10:37:07 +00:00
julian	948c671f4a	Changes to try fix sched_ule.c courtesy of David Xu.	2006-12-06 06:55:59 +00:00
julian	396ed947f6	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
maxim	75bc36366a	o Fix a couple of obvious typos.	2006-11-08 09:09:07 +00:00
jb	f82c799735	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
davidxu	8e3f035b3f	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
davidxu	a4976ce481	Add scheduler API sched_relinquish(), the API is used to implement yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish cpu, the ULE and CORE schedulers have two internal run- queues, a timesharing thread which calls yield() syscall should be moved to inactive queue.	2006-06-15 06:37:39 +00:00
davidxu	82b666ed4a	Add scheduler CORE, the work I have done half a year ago, recent, I picked it up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different with ULE, it comes from Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use same word "score" as a priority boost in ULE scheduler. Briefly, the scheduler has following characteristic: 1. Timesharing process's nice value is seriously respected, timeslice and interaction detecting algorithm are based on nice value. 2. per-cpu scheduling queue and load balancing. 3. O(1) scheduling. 4. Some cpu affinity code in wakeup path. 5. Support POSIX SCHED_FIFO and SCHED_RR. Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE which using pull and push, the scheduelr uses pull method, the main reason is to let relative idle cpu do the work, but current the whole scheduler is protected by the big sched_lock, so the benefit is not visible, it really can be worse than nothing because all other cpu are locked out when we are doing balancing work, which the 4BSD scheduelr does not have this problem. The scheduler does not support hyperthreading very well, in fact, the scheduler does not make the difference between physical CPU and logical CPU, this should be improved in feature. The scheduler has priority inversion problem on MP machine, it is not good for realtime scheduling, it can cause realtime process starving. As a result, it seems the MySQL super-smack runs better on my Pentium-D machine when using libthr, despite on UP or SMP kernel.	2006-06-13 13:12:56 +00:00
davidxu	75c47e2f15	Make ke_rqindex unsigned.	2006-06-06 12:26:17 +00:00
davidxu	50b9056f24	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
davidxu	9dda14459d	Fix a bug in slice calculation code, current code uses hz but sched_clock() is called by state clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
davidxu	ebee9317bd	Temporarily disable nice threshold detection code, as it can starve a thread holding critical resource, e.g mutex or other implicit synchronous flags. Give thread which exceeds nice threshold a minimum time slice. PR: kern/86087	2005-09-22 01:19:37 +00:00
davidxu	fae38f5558	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
davidxu	fee0e762f4	Try best to keep a preempted thread at front of run queue, this seems improved performance a bit for some workloads, but still seeing interactive lagging unless cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
davidxu	811c17378d	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
delphij	049e7675aa	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
peter	c9db8c3eb4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
jeff	6eaf0bed4f	- Fix the case where we're not preempting but there is already a newtd as this happens via thread_switchout(). I don't particularly like the structure of the code here. We twice call out to thread code when a thread is voluntarily switching. Once to thread_switchout() and once to slot_fill(), while sched_4BSD does even more work which is redundant to select another thread to use our remaining slice. This should be simplified in the future, but for now I'm only going to fix the bug not the bad design.	2005-06-07 02:59:16 +00:00
jeff	d33221f20a	- It's 2005 already, I've been working on this for three years.	2005-06-04 09:24:15 +00:00
jeff	c720bf6f50	- Don't SLOT_USE() in the preempt case, sched_add() has already taken the slot for us. Previously, we would take two slots on every preempt, and setrunqueue() would fix it up for us in the non threaded case. The threaded case was simply broken. - Clean up flags, prototypes, comments.	2005-06-04 09:23:28 +00:00
jkoshy	dc3444cd91	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
ups	7bac02c146	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
jeff	45e4d1fa2a	- A test in sched_switch() is no longer necessary and it is incorrect when td0 is preempted before it voluntarily switches. Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>	2005-02-23 00:50:26 +00:00
jeff	ef8ea3a09d	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
jhb	ad4d00347b	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
jhb	3f307e93e3	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
jeff	94a75d08c3	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
jeff	ceca9b8f9e	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
jeff	d378b46f4e	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00
jeff	c94fadce10	- Garbage collect several unused members of struct kse and struce ksegrp. As best as I can tell, some of these were never used.	2004-12-14 10:53:55 +00:00
jeff	422a07b8e1	- In kseq_choose(), don't recalculate slice values for processes with a nice of 0. Doing so can cause an infinite loop because they should be running, but a nice -20 process could prevent them from doing so. - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority elevated due to priority propagation. If a thread has had its priority elevated, we assume that it must go on the current queue and it must get a slice. - In sched_userret() if our priority was elevated and we shouldn't have a timeslice, yield here until we should. Found/Tested by: glebius	2004-12-14 10:34:27 +00:00
jeff	00f132eec6	- Take up a 'slot' while we're on the assigned queue, waiting to be posted to another processor. Otherwise, kern_switch() gets confused and tries to sched_add(NULL).	2004-12-13 13:09:33 +00:00
jeff	311bbf4546	- Temporarily disable the nice -20 throttling code. It has some interaction with APM that I do not understand yet. Reported & Tested by: glebius	2004-11-11 19:48:57 +00:00
jeff	917f19c039	- When choosing a thread on the run queue, check to see if its nice is outside of the nice threshold due to a recently awoken thread with a lower nice value. This further reduces the amount of time a positively niced thread gets while running in conjunction with a workload that has many short sleeps (ie buildworld).	2004-10-30 12:19:15 +00:00
jeff	d8f987468c	- In sched_prio() check to see if the kse is assigned to a runq as the check for TD_ON_RUNQ() no longer means the thread is really on a run- queue. I suspect this state should be re-evaluated as it must mean something else now. This fixes ULE+KSE+PREEMPTION on UP x86.	2004-10-30 07:35:53 +00:00
julian	d5dfe59f9e	Fix whitespace botch that only showed up in the commit message diff :-/ MFC after: 4 days	2004-10-05 22:14:02 +00:00
julian	57fb03da54	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
julian	7d0504ed38	Oops. left out part of the diff. MFC after: 4 days	2004-10-05 21:26:27 +00:00
julian	7b170fd9fa	Use some macros to trach available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
julian	6461286b21	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
scottl	1e56230631	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialed when the pmap layer tries to call sched_pini() early in the boot and results in an quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
julian	4b041e0e33	Try committing from the right tree this time MFC after: 2 days	2004-09-11 00:11:09 +00:00
julian	7cae3c9d5b	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implimentation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
julian	9993c65718	Add some code to allow threads to nominat a sibling to run if theyu are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
julian	5813d27029	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
scottl	d9af98161a	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
julian	e9d9514975	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
peter	587d1d74f3	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
jeff	8745e98dd0	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
jeff	b109ddffbc	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
kan	48b2ea77a3	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
scottl	f72aae4cb2	Clean up whitespace, increase consistency and correctness. Submitted by: bde	2004-07-23 23:09:00 +00:00
julian	a488bebcd2	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theorotically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happenned. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
jhb	0cb3276d57	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
marcel	fdcee40a0e	Update for the KDB framework: o Call kdb_backtrace() instead of backtrace().	2004-07-10 21:38:22 +00:00
jhb	eeb3c91445	- Move contents of sched_add() into a sched_add_internal() function that takes an argument to specify if it should preempt or not. Don't preempt when sched_add_internal() is called from kseq_idled() or kseq_assign() as in those cases we are about to call mi_switch() anyways. Also, doing so during the first context switch on an AP leads to a NULL pointer deref because curthread is NULL. - Reenable preemption for ULE. Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net	2004-07-08 21:45:04 +00:00
rwatson	fef549cb01	Temporarily disable preemption in SCHED_ULE due to reported panics and hangs due to recent preemption changes. This change appears to remove the panic that I was running into, but at the cost of increasing ithread scheduling latency, and as such is a temporary band-aid until jhb has a chance to resolve the ule<->preemption interaction that is the source of the problem. If it doesn't fix the problem for others-- sorry!	2004-07-06 05:57:29 +00:00
phk	abeab8c454	Add NULL arg to mi_switch() call to stop kernel compiles from breaking.	2004-07-03 16:57:51 +00:00
bmilekic	067d8e4e13	Fix SCHED_ULE build on SMP. The previous revision (1.110) introduced a KSE_CAN_MIGRATE() invocation with one argument missing (class). Either this is a genuine forget or it crept in from JHB's repo where he may have modified it. If it's the latter then it may require more attention. For now fix the make depend.	2004-07-03 01:19:46 +00:00
jhb	696704716d	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
jhb	1b16b181d1	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
scottl	95c9bdb62a	Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.	2004-06-21 22:05:46 +00:00

1 2 3 4 5 ...

257 Commits