freebsd-nq

Author	SHA1	Message	Date
Jeff Roberson	52bc574cc7	- Handle the case where slptime == runtime. Submitted by: Atoine Brodin	2007-03-17 23:32:48 +00:00
Jeff Roberson	4499aff6ec	- Cast the intermediate value in priority computtion back down to unsigned char. Weirdly, casting the 1 constant to u_char still produces a signed integer result that is then used in the % computation. This avoids that mess all together and causes a 0 pri to turn into 255 % 64 as we expect. Reported by: kkenn (about 4 times, thanks)	2007-03-17 18:13:32 +00:00
Julian Elischer	486a941418	Instead of doing comparisons using the pcpu area to see if a thread is an idle thread, just see if it has the IDLETD flag set. That flag will probably move to the pflags word as it's permenent and never chenges for the life of the system so it doesn't need locking.	2007-03-08 06:44:34 +00:00
Kip Macy	fe68a91631	general LOCK_PROFILING cleanup - only collect timestamps when a lock is contested - this reduces the overhead of collecting profiles from 20x to 5x - remove unused function from subr_lock.c - generalize cnt_hold and cnt_lock statistics to be kept for all locks - NOTE: rwlock profiling generates invalid statistics (and most likely always has) someone familiar with that should review	2007-02-26 08:26:44 +00:00
Jeff Roberson	ed0e8f2fe9	- Change types for necent runq additions to u_char rather than int. - Fix these types in ULE as well. This fixes bugs in priority index calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64. Reported by: kkenn using pho's stress2.	2007-02-08 01:52:25 +00:00
Jeff Roberson	fc3a97dcb7	- Implement much more intelligent ipi sending. This algorithm tries to minimize IPIs and rescheduling when scheduling like tasks while keeping latency low for important threads. 1) An idle thread is running. 2) The current thread is worse than realtime and the new thread is better than realtime. Realtime to realtime doesn't preempt. 3) The new thread's priority is less than the threshold.	2007-01-25 23:51:59 +00:00
Jeff Roberson	1461899028	- Get rid of the unused DIDRUN flag. This was really only present to support sched_4bsd. - Rename the KTR level for non schedgraph parsed events. They take event space from things we'd like to graph. - Reset our slice value after we sleep. The slice is simply there to prevent starvation among equal priorities. A thread which had almost exhausted it's slice and then slept doesn't need to be rescheduled a tick after it wakes up. - Set the maximum slice value to a more conservative 100ms now that it is more accurately enforced.	2007-01-25 19:14:11 +00:00
Jeff Roberson	9a93305a2e	- With a sleep time over 2097 seconds hzticks and slptime could end up negative. Use unsigned integers for sleep and run time so this doesn't disturb sched_interact_score(). This should fix the invalid interactive priority panics reported by several users.	2007-01-24 18:18:43 +00:00
Jeff Roberson	7a5e5e2a59	- Catch up to setrunqueue/choosethread/etc. api changes. - Define our own maybe_preempt() as sched_preempt(). We want to be able to preempt idlethread in all cases. - Define our idlethread to require preemption to exit. - Get the cpu estimation tick from sched_tick() so we don't have to worry about errors from a sampling interval that differs from the time domain. This was the source of sched_priority prints/panics and inaccurate pctcpu display in top.	2007-01-23 08:50:34 +00:00
Jeff Roberson	5cea64d54f	- Disable the long-term load balancer. I believe that steal_busy works better and gives more predictable results.	2007-01-20 21:24:05 +00:00
Jeff Roberson	c95d2db298	- We do need to IPI the idlethread on some systems. It may be stuck in a power saving mode otherwise. - If the thread is already bound in sched_bind() unbind it before re-binding it to a new cpu. I don't like these semantics but they are expected by some code in the tree. Patch by jkoshy.	2007-01-20 17:03:33 +00:00
Jeff Roberson	6b2f763f7c	- In tdq_transfer() always set NEEDRESCHED when necessary regardless of the ipi settings. If NEEDRESCHED is set and an ipi is later delivered it will clear it rather than cause extra context switches. However, if we miss setting it we can have terrible latency. - In sched_bind() correctly implement bind. Also be slightly more tolerant of code which calls bind multiple times. However, we don't change binding if another call is made with a different cpu. This does not presently work with hwpmc which I believe should be changed.	2007-01-20 09:03:43 +00:00
Jeff Roberson	7b8bfa0de9	Major revamp of ULE's cpu load balancing: - Switch back to direct modification of remote CPU run queues. This added a lot of complexity with questionable gain. It's easy enough to reimplement if it's shown to help on huge machines. - Re-implement the old tdq_transfer() call as tdq_pickidle(). Change sched_add() so we have selectable cpu choosers and simplify the logic a bit here. - Implement tdq_pickpri() as the new default cpu chooser. This algorithm is similar to Solaris in that it tries to always run the threads with the best priorities. It is actually slightly more complex than solaris's algorithm because we also tend to favor the local cpu over other cpus which has a boost in latency but also potentially enables cache sharing between the waking thread and the woken thread. - Add a bunch of tunables that can be used to measure effects of different load balancing strategies. Most of these will go away once the algorithm is more definite. - Add a new mechanism to steal threads from busy cpus when we idle. This is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The threshold is the required length of a tdq's run queue before another cpu will be able to steal runnable threads. This prevents most queue imbalances that contribute the long latencies.	2007-01-19 21:56:08 +00:00
Jeff Roberson	eddb4efacd	- Don't let SCHED_TICK_TOTAL() return less than hz. This can cause integer divide faults in roundup() later if it is able to return 0. For some reason this bug only shows up on my laptop and not my testboxes.	2007-01-06 12:33:43 +00:00
Jeff Roberson	1e516cf534	- Fix the sched_priority() invalid priority bugs. Use roundup() instead of max() when computing the divisor in SCHED_TICK_PRI(). This prevents cases where rounding down would allow the quotient to exceed SCHED_PRI_RANGE. - Garbage collect some unused flags and fields. - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply duplicated this functionality. - Re-enable the rebalancer by default and fix the sysctl so it can be modified.	2007-01-06 08:44:13 +00:00
Jeff Roberson	9330bbbb61	- Don't IPI unless we're going to interrupt something exiting in the kernel. otherwise we can afford the latency. This makes a significant performance improvement.	2007-01-06 02:34:23 +00:00
Jeff Roberson	155b6ca12b	- Fix a comparison in sched_choose() that caused cpus to be constantly marked idle, thus breaking cpu load balancing. - Change sched_interact_update() to fix cases where the stored history has expanded significantly rather than handling them in the callers. This fixes a case where sched_priority() could compute a bad value. - Add a sysctl to disable the global load balancer for experimentation.	2007-01-05 23:45:38 +00:00
Jeff Roberson	8ab80cf009	- ftick was initialized to -1 for init and any of it's children. Fix this by setting ftick = ltick = ticks in schedinit(). - Update the priority when we are pulled off of the run queue and when we are inserted onto the run queue so that it more accurately reflects our present status. This is important for efficient priority propagation functioning. - Move the frequency test into sched_pctcpu_update() so we don't repeat it each time we'd like to call it. - Put some temporary work-around code in sched_priority() in case the tick mechanism produces a bad priority. Eventually this should revert to an assert again.	2007-01-05 08:50:38 +00:00
Jeff Roberson	3f872f85d2	- Only allow the tdq_idx to increase by one each tick rather than up to the most recently chosen index. This significantly improves nice behavior. This allows a lower priority thread to run some multiple of times before the higher priority thread makes it to the front of the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice difference of 1 makes a 4% difference in cpu usage between two hogs. - Track a seperate insert and removal index. When the removal index is empty it is updated to point at the current insert index. - Don't remove and re-add a thread to the runq when it is being adjusted down in priority. - Pull some conditional code out of sched_tick(). It's looking a bit large now.	2007-01-04 12:16:19 +00:00
Jeff Roberson	e7d50326de	ULE 2.0: - Remove the double queue mechanism for timeshare threads. It was slow due to excess cache lines in play, caused suboptimal scheduling behavior with niced and other non-interactive processes, complicated priority lending, etc. - Use a circular queue with a floating starting index for timeshare threads. Enforces fairness by moving the insertion point closer to threads with worse priorities over time. - Give interactive timeshare threads real-time user-space priorities and place them on the realtime/ithd queue. - Select non-interactive timeshare thread priorities based on their cpu utilization over the last 10 seconds combined with the nice value. This gives us more sane priorities and behavior in a loaded system as compared to the old method of using the interactivity score. The interactive score quickly hit a ceiling if threads were non-interactive and penalized new hog threads. - Use one slice size for all threads. The slice is not currently dynamically set to adjust scheduling behavior of different threads. - Add some new sysctls for scheduling parameters. Bug fixes/Clean up: - Fix zeroing of td_sched after initialization in sched_fork_thread() caused by recent ksegrp removal. - Fix KSE interactivity issues related to frequent forking and exiting of kse threads. We simply disable the penalty for thread creation and exit for kse threads. - Cleanup the cpu estimator by using tickincr here as well. Keep ticks and ltick/ftick in the same frequency. Previously ticks were stathz and others were hz. - Lots of new and updated comments. - Many many others. Tested on: up x86/amd64, 8way amd64.	2007-01-04 08:56:25 +00:00
Jeff Roberson	c02bbb43a0	- More search and replace prettying.	2006-12-29 12:55:32 +00:00
Jeff Roberson	d2ad694caa	- Clean up a bit after the most recent KSE restructuring.	2006-12-29 10:37:07 +00:00
Julian Elischer	fc6c30f6c6	Changes to try fix sched_ule.c courtesy of David Xu.	2006-12-06 06:55:59 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Maxim Konovalov	f645b5da88	o Fix a couple of obvious typos.	2006-11-08 09:09:07 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
David Xu	36ec198bd5	Add scheduler API sched_relinquish(), the API is used to implement yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish cpu, the ULE and CORE schedulers have two internal run- queues, a timesharing thread which calls yield() syscall should be moved to inactive queue.	2006-06-15 06:37:39 +00:00
David Xu	b41f1452d9	Add scheduler CORE, the work I have done half a year ago, recent, I picked it up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different with ULE, it comes from Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use same word "score" as a priority boost in ULE scheduler. Briefly, the scheduler has following characteristic: 1. Timesharing process's nice value is seriously respected, timeslice and interaction detecting algorithm are based on nice value. 2. per-cpu scheduling queue and load balancing. 3. O(1) scheduling. 4. Some cpu affinity code in wakeup path. 5. Support POSIX SCHED_FIFO and SCHED_RR. Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE which using pull and push, the scheduelr uses pull method, the main reason is to let relative idle cpu do the work, but current the whole scheduler is protected by the big sched_lock, so the benefit is not visible, it really can be worse than nothing because all other cpu are locked out when we are doing balancing work, which the 4BSD scheduelr does not have this problem. The scheduler does not support hyperthreading very well, in fact, the scheduler does not make the difference between physical CPU and logical CPU, this should be improved in feature. The scheduler has priority inversion problem on MP machine, it is not good for realtime scheduling, it can cause realtime process starving. As a result, it seems the MySQL super-smack runs better on my Pentium-D machine when using libthr, despite on UP or SMP kernel.	2006-06-13 13:12:56 +00:00
David Xu	0ae716e5ee	Make ke_rqindex unsigned.	2006-06-06 12:26:17 +00:00
David Xu	9f8eb3cb52	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
David Xu	a1d4fe69d2	Fix a bug in slice calculation code, current code uses hz but sched_clock() is called by state clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
David Xu	a861574011	Temporarily disable nice threshold detection code, as it can starve a thread holding critical resource, e.g mutex or other implicit synchronous flags. Give thread which exceeds nice threshold a minimum time slice. PR: kern/86087	2005-09-22 01:19:37 +00:00
David Xu	f8ec133ed0	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
David Xu	1278181c6c	Try best to keep a preempted thread at front of run queue, this seems improved performance a bit for some workloads, but still seeing interactive lagging unless cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
David Xu	3d16f519b6	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
Xin LI	05a6b7ad62	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Jeff Roberson	6680bbd529	- Fix the case where we're not preempting but there is already a newtd as this happens via thread_switchout(). I don't particularly like the structure of the code here. We twice call out to thread code when a thread is voluntarily switching. Once to thread_switchout() and once to slot_fill(), while sched_4BSD does even more work which is redundant to select another thread to use our remaining slice. This should be simplified in the future, but for now I'm only going to fix the bug not the bad design.	2005-06-07 02:59:16 +00:00
Jeff Roberson	9fe02f7e16	- It's 2005 already, I've been working on this for three years.	2005-06-04 09:24:15 +00:00
Jeff Roberson	21381d1b9e	- Don't SLOT_USE() in the preempt case, sched_add() has already taken the slot for us. Previously, we would take two slots on every preempt, and setrunqueue() would fix it up for us in the non threaded case. The threaded case was simply broken. - Clean up flags, prototypes, comments.	2005-06-04 09:23:28 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
Jeff Roberson	7a9507b60e	- A test in sched_switch() is no longer necessary and it is incorrect when td0 is preempted before it voluntarily switches. Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>	2005-02-23 00:50:26 +00:00
Jeff Roberson	42a29039de	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
John Baldwin	50aaa791ba	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
Jeff Roberson	2ebf8eb132	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
Jeff Roberson	598b368d6c	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
Jeff Roberson	81d47d3f4b	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00

1 2 3 4

190 Commits