freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	1e516cf534	- Fix the sched_priority() invalid priority bugs. Use roundup() instead of max() when computing the divisor in SCHED_TICK_PRI(). This prevents cases where rounding down would allow the quotient to exceed SCHED_PRI_RANGE. - Garbage collect some unused flags and fields. - Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply duplicated this functionality. - Re-enable the rebalancer by default and fix the sysctl so it can be modified.	2007-01-06 08:44:13 +00:00
Jeff Roberson	9330bbbb61	- Don't IPI unless we're going to interrupt something exiting in the kernel. otherwise we can afford the latency. This makes a significant performance improvement.	2007-01-06 02:34:23 +00:00
Jeff Roberson	155b6ca12b	- Fix a comparison in sched_choose() that caused cpus to be constantly marked idle, thus breaking cpu load balancing. - Change sched_interact_update() to fix cases where the stored history has expanded significantly rather than handling them in the callers. This fixes a case where sched_priority() could compute a bad value. - Add a sysctl to disable the global load balancer for experimentation.	2007-01-05 23:45:38 +00:00
Jeff Roberson	8ab80cf009	- ftick was initialized to -1 for init and any of it's children. Fix this by setting ftick = ltick = ticks in schedinit(). - Update the priority when we are pulled off of the run queue and when we are inserted onto the run queue so that it more accurately reflects our present status. This is important for efficient priority propagation functioning. - Move the frequency test into sched_pctcpu_update() so we don't repeat it each time we'd like to call it. - Put some temporary work-around code in sched_priority() in case the tick mechanism produces a bad priority. Eventually this should revert to an assert again.	2007-01-05 08:50:38 +00:00
Jeff Roberson	3f872f85d2	- Only allow the tdq_idx to increase by one each tick rather than up to the most recently chosen index. This significantly improves nice behavior. This allows a lower priority thread to run some multiple of times before the higher priority thread makes it to the front of the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice difference of 1 makes a 4% difference in cpu usage between two hogs. - Track a seperate insert and removal index. When the removal index is empty it is updated to point at the current insert index. - Don't remove and re-add a thread to the runq when it is being adjusted down in priority. - Pull some conditional code out of sched_tick(). It's looking a bit large now.	2007-01-04 12:16:19 +00:00
Jeff Roberson	e7d50326de	ULE 2.0: - Remove the double queue mechanism for timeshare threads. It was slow due to excess cache lines in play, caused suboptimal scheduling behavior with niced and other non-interactive processes, complicated priority lending, etc. - Use a circular queue with a floating starting index for timeshare threads. Enforces fairness by moving the insertion point closer to threads with worse priorities over time. - Give interactive timeshare threads real-time user-space priorities and place them on the realtime/ithd queue. - Select non-interactive timeshare thread priorities based on their cpu utilization over the last 10 seconds combined with the nice value. This gives us more sane priorities and behavior in a loaded system as compared to the old method of using the interactivity score. The interactive score quickly hit a ceiling if threads were non-interactive and penalized new hog threads. - Use one slice size for all threads. The slice is not currently dynamically set to adjust scheduling behavior of different threads. - Add some new sysctls for scheduling parameters. Bug fixes/Clean up: - Fix zeroing of td_sched after initialization in sched_fork_thread() caused by recent ksegrp removal. - Fix KSE interactivity issues related to frequent forking and exiting of kse threads. We simply disable the penalty for thread creation and exit for kse threads. - Cleanup the cpu estimator by using tickincr here as well. Keep ticks and ltick/ftick in the same frequency. Previously ticks were stathz and others were hz. - Lots of new and updated comments. - Many many others. Tested on: up x86/amd64, 8way amd64.	2007-01-04 08:56:25 +00:00
Jeff Roberson	c02bbb43a0	- More search and replace prettying.	2006-12-29 12:55:32 +00:00
Jeff Roberson	d2ad694caa	- Clean up a bit after the most recent KSE restructuring.	2006-12-29 10:37:07 +00:00
Julian Elischer	fc6c30f6c6	Changes to try fix sched_ule.c courtesy of David Xu.	2006-12-06 06:55:59 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Maxim Konovalov	f645b5da88	o Fix a couple of obvious typos.	2006-11-08 09:09:07 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
David Xu	36ec198bd5	Add scheduler API sched_relinquish(), the API is used to implement yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish cpu, the ULE and CORE schedulers have two internal run- queues, a timesharing thread which calls yield() syscall should be moved to inactive queue.	2006-06-15 06:37:39 +00:00
David Xu	b41f1452d9	Add scheduler CORE, the work I have done half a year ago, recent, I picked it up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different with ULE, it comes from Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use same word "score" as a priority boost in ULE scheduler. Briefly, the scheduler has following characteristic: 1. Timesharing process's nice value is seriously respected, timeslice and interaction detecting algorithm are based on nice value. 2. per-cpu scheduling queue and load balancing. 3. O(1) scheduling. 4. Some cpu affinity code in wakeup path. 5. Support POSIX SCHED_FIFO and SCHED_RR. Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE which using pull and push, the scheduelr uses pull method, the main reason is to let relative idle cpu do the work, but current the whole scheduler is protected by the big sched_lock, so the benefit is not visible, it really can be worse than nothing because all other cpu are locked out when we are doing balancing work, which the 4BSD scheduelr does not have this problem. The scheduler does not support hyperthreading very well, in fact, the scheduler does not make the difference between physical CPU and logical CPU, this should be improved in feature. The scheduler has priority inversion problem on MP machine, it is not good for realtime scheduling, it can cause realtime process starving. As a result, it seems the MySQL super-smack runs better on my Pentium-D machine when using libthr, despite on UP or SMP kernel.	2006-06-13 13:12:56 +00:00
David Xu	0ae716e5ee	Make ke_rqindex unsigned.	2006-06-06 12:26:17 +00:00
David Xu	9f8eb3cb52	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
David Xu	a1d4fe69d2	Fix a bug in slice calculation code, current code uses hz but sched_clock() is called by state clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
David Xu	a861574011	Temporarily disable nice threshold detection code, as it can starve a thread holding critical resource, e.g mutex or other implicit synchronous flags. Give thread which exceeds nice threshold a minimum time slice. PR: kern/86087	2005-09-22 01:19:37 +00:00
David Xu	f8ec133ed0	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
David Xu	1278181c6c	Try best to keep a preempted thread at front of run queue, this seems improved performance a bit for some workloads, but still seeing interactive lagging unless cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
David Xu	3d16f519b6	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
Xin LI	05a6b7ad62	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Jeff Roberson	6680bbd529	- Fix the case where we're not preempting but there is already a newtd as this happens via thread_switchout(). I don't particularly like the structure of the code here. We twice call out to thread code when a thread is voluntarily switching. Once to thread_switchout() and once to slot_fill(), while sched_4BSD does even more work which is redundant to select another thread to use our remaining slice. This should be simplified in the future, but for now I'm only going to fix the bug not the bad design.	2005-06-07 02:59:16 +00:00
Jeff Roberson	9fe02f7e16	- It's 2005 already, I've been working on this for three years.	2005-06-04 09:24:15 +00:00
Jeff Roberson	21381d1b9e	- Don't SLOT_USE() in the preempt case, sched_add() has already taken the slot for us. Previously, we would take two slots on every preempt, and setrunqueue() would fix it up for us in the non threaded case. The threaded case was simply broken. - Clean up flags, prototypes, comments.	2005-06-04 09:23:28 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
Jeff Roberson	7a9507b60e	- A test in sched_switch() is no longer necessary and it is incorrect when td0 is preempted before it voluntarily switches. Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>	2005-02-23 00:50:26 +00:00
Jeff Roberson	42a29039de	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
John Baldwin	50aaa791ba	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
Jeff Roberson	2ebf8eb132	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
Jeff Roberson	598b368d6c	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
Jeff Roberson	81d47d3f4b	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00
Jeff Roberson	7842f65e7f	- Garbage collect several unused members of struct kse and struce ksegrp. As best as I can tell, some of these were never used.	2004-12-14 10:53:55 +00:00
Jeff Roberson	8ffb8f5558	- In kseq_choose(), don't recalculate slice values for processes with a nice of 0. Doing so can cause an infinite loop because they should be running, but a nice -20 process could prevent them from doing so. - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority elevated due to priority propagation. If a thread has had its priority elevated, we assume that it must go on the current queue and it must get a slice. - In sched_userret() if our priority was elevated and we shouldn't have a timeslice, yield here until we should. Found/Tested by: glebius	2004-12-14 10:34:27 +00:00
Jeff Roberson	2d59a44dc0	- Take up a 'slot' while we're on the assigned queue, waiting to be posted to another processor. Otherwise, kern_switch() gets confused and tries to sched_add(NULL).	2004-12-13 13:09:33 +00:00
Jeff Roberson	3ba5c2faab	- Temporarily disable the nice -20 throttling code. It has some interaction with APM that I do not understand yet. Reported & Tested by: glebius	2004-11-11 19:48:57 +00:00
Jeff Roberson	0516c8dd4a	- When choosing a thread on the run queue, check to see if its nice is outside of the nice threshold due to a recently awoken thread with a lower nice value. This further reduces the amount of time a positively niced thread gets while running in conjunction with a workload that has many short sleeps (ie buildworld).	2004-10-30 12:19:15 +00:00
Jeff Roberson	6bd0c7fd53	- In sched_prio() check to see if the kse is assigned to a runq as the check for TD_ON_RUNQ() no longer means the thread is really on a run- queue. I suspect this state should be re-evaluated as it must mean something else now. This fixes ULE+KSE+PREEMPTION on UP x86.	2004-10-30 07:35:53 +00:00
Julian Elischer	f8135176c9	Fix whitespace botch that only showed up in the commit message diff :-/ MFC after: 4 days	2004-10-05 22:14:02 +00:00
Julian Elischer	c20c691bed	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
Julian Elischer	c5c3fb335f	Oops. left out part of the diff. MFC after: 4 days	2004-10-05 21:26:27 +00:00
Julian Elischer	d39063f20d	Use some macros to trach available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
Julian Elischer	14f0e2e9bf	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
Scott Long	1e7fad6b6a	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialed when the pmap layer tries to call sched_pini() early in the boot and results in an quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
Julian Elischer	513efa5b39	Try committing from the right tree this time MFC after: 2 days	2004-09-11 00:11:09 +00:00
Julian Elischer	5c854accc1	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implimentation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominat a sibling to run if theyu are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Scott Long	9923b511ed	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
Julian Elischer	2630e4c90c	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
Peter Wemm	91c1172a5a	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
Jeff Roberson	f2b74cbf28	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
Jeff Roberson	2454aaf51c	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
Alexander Kabaev	00fbcda80d	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
Scott Long	e038d35422	Clean up whitespace, increase consistency and correctness. Submitted by: bde	2004-07-23 23:09:00 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theorotically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happenned. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
John Baldwin	52eb84641d	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
Marcel Moolenaar	2c3490b1a8	Update for the KDB framework: o Call kdb_backtrace() instead of backtrace().	2004-07-10 21:38:22 +00:00
John Baldwin	63fcce68f1	- Move contents of sched_add() into a sched_add_internal() function that takes an argument to specify if it should preempt or not. Don't preempt when sched_add_internal() is called from kseq_idled() or kseq_assign() as in those cases we are about to call mi_switch() anyways. Also, doing so during the first context switch on an AP leads to a NULL pointer deref because curthread is NULL. - Reenable preemption for ULE. Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net	2004-07-08 21:45:04 +00:00
Robert Watson	df623e3c2f	Temporarily disable preemption in SCHED_ULE due to reported panics and hangs due to recent preemption changes. This change appears to remove the panic that I was running into, but at the cost of increasing ithread scheduling latency, and as such is a temporary band-aid until jhb has a chance to resolve the ule<->preemption interaction that is the source of the problem. If it doesn't fix the problem for others-- sorry!	2004-07-06 05:57:29 +00:00
Poul-Henning Kamp	279f949ee5	Add NULL arg to mi_switch() call to stop kernel compiles from breaking.	2004-07-03 16:57:51 +00:00
Bosko Milekic	abdb4e5d01	Fix SCHED_ULE build on SMP. The previous revision (1.110) introduced a KSE_CAN_MIGRATE() invocation with one argument missing (class). Either this is a genuine forget or it crept in from JHB's repo where he may have modified it. If it's the latter then it may require more attention. For now fix the make depend.	2004-07-03 01:19:46 +00:00
John Baldwin	0c0b25ae91	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
John Baldwin	bf0acc273a	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
Scott Long	dc09579417	Add the sysctl node 'kern.sched.name' that has the name of the scheduler currently in use. Move the 4bsd kern.quantum node to kern.sched.quantum for consistency.	2004-06-21 22:05:46 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Jeff Roberson	dc03363dd8	- Run sched_balance() and sched_balance_groups() from hardclock via sched_clock() rather than using callouts. This means we no longer have to take the load of the callout thread into consideration while balancing and should make the balancing decisions simpler and more accurate. Tested on: x86/UP, amd64/SMP	2004-06-02 05:46:48 +00:00
David E. O'Brien	207a6c0dcb	There was a thread on "unusually high load averages" when running under sched_ule, in January 2004. Looking at this, "pagezero" is (one of) the culprit(s). We had no provision for processes with P_NOLOAD set. With pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as another-normal-process, thus the "expected-plus-one" load reported in the above thread. Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>	2004-04-22 21:37:46 +00:00
Olivier Houchard	d50c87decf	Spell "switches" a more conventional way.	2004-04-09 14:31:29 +00:00
Jeff Roberson	37a35e4a60	- Use the proper constant in sched_interact_update(). Previously, SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed. This was causing the interactivity scaler to lose history at a more dramatic rate than intended.	2004-04-04 19:12:56 +00:00
Marcel Moolenaar	b2ae7ed72c	Change the type of the various CPU masks to cpumask_t. Note that as long as there are still explicit uses of int, whether in types or in function names (such as atomic_set_int() in sched_ule.c), we can not change cpumask_t to be anything other than u_int. See also the commit log for sys/sys/types.h, revision 1.84.	2004-03-27 18:21:24 +00:00
David E. O'Brien	b003da7938	Give a more reasonable CPU time to the threads which are using scheduler activation (i.e., applications are using libpthread). This is because SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily. Which doesn't give fair amount of CPU time to processes which are using scheduler-activation-based threads when other (semi-)CPU-intensive, non-P_SA processes are running. Further work will no doubt be done by jeffr at a later date. Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp> Reviewed by: rwatson, freebsd-current@	2004-03-21 18:53:29 +00:00
John Baldwin	44f3b09204	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
Jeff Roberson	0392e39dff	- Allow interactive tasks to use the maximum time-slice. This is not as detrimental as I thought it would be in the case of massive process storms from a shell and it makes regular desktop usage noticeably better.	2004-02-01 10:38:13 +00:00
Jeff Roberson	33916c360e	- Add a new member to struct kseq called ksq_sysload. This is intended to track the load for the sched_load() function. In the SMP case this member is not defined because it would be redundant with the ksg_load member which already tracks the non ithd load. - For sched_load() in the UP case simply return ksq_sysload. In the SMP case traverse the list of kseq groups and sum up their ksg_load fields.	2004-02-01 02:48:36 +00:00
Jeff Roberson	c77ac1fdee	- sched_strict has been dead for a long time now. Get rid of it.	2004-01-25 08:58:14 +00:00
Jeff Roberson	c494ddc8a1	- Clean up KASSERTS.	2004-01-25 08:57:38 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Jeff Roberson	249e0bea8f	- Make our transfer decisions based on load and not transferable load. A cpu could have been bogged down with non-transferable load and still not migrated a new thread to an idle cpu. This required some benchmarking and tuning to get right as the comment above it suggests.	2003-12-20 22:35:20 +00:00
Jeff Roberson	e7a976f415	- Enable ithread migration on x86. This is done to work around a bug in the IO APIC on Xeons that prevents round-robin interrupt assignment from working.	2003-12-20 20:36:19 +00:00
Jeff Roberson	670c524f08	- In kseq_transfer() return if smp has not been started. - In sched_add(), do the idle check prior to the transfer check so that we don't try to transfer load from an idle cpu. This fixes panics caused by IPIs on UP machines running SMP kernels. Reported/Debugged by: seanc	2003-12-20 14:03:14 +00:00
Jeff Roberson	9b5f6f623d	- Running interactive tasks with the minimum time-slice is fine for vi and sh, but not so great for mozilla, X, etc. Add a fixed define for the slice size granted to interactive KSEs.	2003-12-20 12:54:35 +00:00
Jeff Roberson	86e1c22aa4	- Assign the ke_cpu field in kseq_notify() so that all of our callers do not have to do it. - Set the ke_runq to NULL in sched_add() before calling kseq_notify(). Otherwise we may panic in sched_add() if INVARIANTS is on.	2003-12-14 02:06:29 +00:00
Jeff Roberson	cac77d0422	- Now that we have kseq groups, balance them seperately. - The new sched_balance_groups() function does intra-group balancing while sched_balance() balances the available groups. - Pick a random time between 0 ticks and hz * 2 ticks to restart each balancing process. Each balancer has its own timeout. - Pick a random place in the list of groups to start the search for lowest and highest group loads. This prevents us from prefering a group based on numeric position. - Use a nasty hack to stop us from preferring cpu 0. The problem is that softclock always runs on cpu 0, so it always has a little extra load. We ignore this load in the balancer for now. In the future softclock should run on a random cpu and these hacks can go away.	2003-12-12 07:33:51 +00:00
Jeff Roberson	2e227f0406	- Don't let the pctcpu rate limiter throttle us if we have recorded over SCHED_CPU_TICKS ticks. This was allowing processes to display (1/SCHED_CPU_TIME * 100) % more cpu than they had used.	2003-12-11 04:23:39 +00:00
Jeff Roberson	b11fdad0fc	- In sched_switch(), if a thread has been assigned, don't touch the runqueues or load. These things have already been taken care of in sched_bind() which should be the only place that we're switching in an assigned thread.	2003-12-11 04:00:49 +00:00
Jeff Roberson	80f86c9f88	- Add support for CPU groups to ule. All SMT cores on the same physical cpu are added to a group. - Don't place a cpu into the kseq_idle bitmask until all cpus in that group have idled. - Prefer idle groups over idle group members in the new kseq_transfer() function. In this way we will prefer to balance load across full cores rather than add further load a partial core. - Before a cpu goes idle, check the other group members for threads. Since SMT cpus may freely share threads, this is cheap. - SMT cores may be individually pinned and bound to now. This contrasts the old mechanism where binding or pinning would have allowed a thread to run on any available cpu. - Remove some unnecessary logic from sched_switch(). Priority propagation should be properly taken care of in sched_prio() now.	2003-12-11 03:57:10 +00:00
Peter Wemm	a2640c9ba9	rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64). Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad things happen(TM). This is why beast.freebsd.org paniced with ULE. Reviewed by: jeff	2003-12-07 09:57:51 +00:00
John Baldwin	b6c71225a9	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
Jeff Roberson	fa9c971710	- Mark ksq_assigned as volatile so that when this code is used without sched_lock we can be sure that we'll pick up the new value.	2003-11-17 08:27:11 +00:00
Jeff Roberson	093c05e39d	- Remove long dead code. rslices hasn't been used in some time and neither has sched_pickcpu().	2003-11-17 08:24:14 +00:00
Jeff Roberson	155b9987a3	- Introduce kseq_runq_{add,rem}() which are used to insert and remove kses from the run queues. Also, on SMP, we track the transferable count here. Threads are transferable only as long as they are on the run queue. - Previously, we adjusted our load balancing based on the transferable count minus the number of actual cpus. This was done to account for the threads which were likely to be running. All of this logic is simpler now that transferable accounts for only those threads which can actually be taken. Updated various places in sched_add() and kseq_balance() to account for this. - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're really doing. The load is accounted for seperately from the runq because the load is accounted for even as the thread is running. - Fix a bug in sched_class() where we weren't properly using the PRI_BASE() version of the kg_pri_class. - Add a large comment that describes the impact of a seemingly simple conditional in sched_add(). - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE() prior to checking kseq_idle. This reduces the frequency of access for kseq_idle which is a shared resource.	2003-11-15 07:32:07 +00:00
Jeff Roberson	f28b3340c1	- Somehow I botched my last commit. Add an extra ( to fix things up. I'm still not sure how this happened. Reported by: ps	2003-11-06 07:56:01 +00:00
Jeff Roberson	a70d729bff	- Remove the local definition of sched_pin and unpin. They are provided in sched.h now. - Respect the td pin count.	2003-11-06 03:09:51 +00:00
Jeff Roberson	46f8b26550	- It's ok if sched_runnable() has races in it, we don't need the sched_lock here unless we have something on the assigned queue.	2003-11-05 05:30:12 +00:00
Jeff Roberson	9bacd788a1	- Add initial support for pinning and binding.	2003-11-04 07:45:41 +00:00
Jeff Roberson	112b6d3aa9	- Remove kseq_find(), we no longer scan other cpu's run queues when we go idle. They figure out that we're idle fast enough that the cache pollution introduces by scanning their run queue is more expensive than waiting a little longer. - Add kseq_setidle() to mark us as being idle. Use this in place of kseq_find(). - Remove kseq_load_highest(), kseq_find() was the only consumer of this interface. kseq_balance() has it's own customized version that finds the lowest and highest loads simultaneously. Continuously told that this would be faster by: terry	2003-11-03 03:27:22 +00:00
Jeff Roberson	ef1134c9ad	- Remove the ksq_loads[] array. We are only interested in three counts, the total load, the timeshare load, and the number of threads that can be migrated to another cpu. Account for these seperately. - Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE can be migrated to another CPU. Currently, this only checks to see if we're an interrupt handler. Eventually this will also be used to support CPU binding.	2003-11-02 10:56:48 +00:00
Jeff Roberson	769a363537	- In sched_prio() only force us onto the current queue if our priority is being elevated (numerically smaller).	2003-11-02 04:25:59 +00:00
Jeff Roberson	7d1a81b4dc	- Rename SCHED_PRI_NTHRESH to SCHED_SLICE_NTHRESH since it is only used in slice assignment. Add a comment describing what it does. - Remove a stale XXX comment, the nice should not impact the interactivity, nice adjustments only effect non-interactive tasks in ULE. - Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice +20 tasks as intended.	2003-11-02 04:10:15 +00:00
Jeff Roberson	a0a931cec7	- Remove uses of PRIO_TOTAL and replace them with SCHED_PRI_NRESV - SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we do not have to account for it in the few places that we use it. Requested by: bde	2003-11-02 03:49:32 +00:00
Jeff Roberson	d322132c62	- Change sched_interact_update() to only accept slp+runtime values between 0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm quite a bit. Before, it dealt with arbitrary values which required us to do nasty integer division tricks that didn't quite work out correctly. - Chnage sched_wakeup() to detect conditions where the slp+runtime could exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for longer than 6 seconds. In this case, we'll just clear the runtime and set the sleep time to the max. - Define a new function, sched_interact_fork() which updates the slp+runtime of a newly forked thread. We want to limit the amount of history retained from the parent so that we learn the child's behavior quickly. We don't, however want to decay it to nothing. Previously, we would simply divide each parameter by 100 whenever we forked. After a few forks the values would reach 0 and tasks would not be considered interactive. - Add another KTR entry, cleanup some existing entries. - Remove a useless sched_interact_update() from sched_priority(). This is already done by the callers that require it.	2003-11-02 03:36:33 +00:00
Jeff Roberson	22bf7d9a0e	- Add static to local functions and data where it was missing. - Add an IPI based mechanism for migrating kses. This mechanism is broken down into several components. This is intended to reduce cache thrashing by eliminating most cases where one cpu touches another's run queues. - kseq_notify() appends a kse to a lockless singly linked list and conditionally sends an IPI to the target processor. Right now this is protected by sched_lock but at some point I'd like to get rid of the global lock. This is why I used something more complicated than a standard queue. - kseq_assign() processes our list of kses that have been assigned to us by other processors. This simply calls sched_add() for each item on the list after clearing the new KEF_ASSIGNED flag. This flag is used to indicate that we have been appeneded to the assigned queue but not added to the run queue yet. - In sched_add(), instead of adding a KSE to another processor's queue we use kse_notify() so that we don't touch their queue. Also in sched_add(), if KEF_ASSIGNED is already set return immediately. This can happen if a thread is removed and readded so that the priority is recorded properly. - In sched_rem() return immediately if KEF_ASSIGNED is set. All callers immediately readd simply to adjust priorites etc. - In sched_choose(), if we're running an IDLE task or the per cpu idle thread set our cpumask bit in 'kseq_idle' so that other processors may know that we are idle. Before this, make a single pass through the run queues of other processors so that we may find work more immediately if it is available. - In sched_runnable(), don't scan each processor's run queue, they will IPI us if they have work for us to do. - In sched_add(), if we're adding a thread that can be migrated and we have plenty of work to do, try to migrate the thread to an idle kseq. - Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into consideration. - No longer use kseq_choose() to steal threads, it can lose it's last argument. - Create a new function runq_steal() which operates like runq_choose() but skips threads based on some criteria. Currently it will not steal PRI_ITHD threads. In the future this will be used for CPU binding. - Create a kseq_steal() that checks each run queue with runq_steal(), use kseq_steal() in the places where we used kseq_choose() to steal with before.	2003-10-31 11:16:04 +00:00
Bruce Evans	89674a9f77	Removed sched_nest variable in sched_switch(). Context switches always begin with sched_lock held but not recursed, so this variable was always 0. Removed fixup of sched_lock.mtx_recurse after context switches in sched_switch(). Context switches always end with this variable in the same state that it began in, so there is no need to fix it up. Only sched_lock.mtx_lock really needs a fixup. Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion that sched_lock is owned and not recursed after it is fixed up. This assertion much match the one in mi_switch(), and if sched_lock were recursed then a non-null fixup of sched_lock.mtx_recurse would probably be needed again, unlike in sched_switch(), since fork_exit() doesn't return to its caller in the normal way.	2003-10-29 14:40:41 +00:00
Jeff Roberson	1aca9909e5	- Only change the run queue in sched_prio() if the kse is non null. threads can be in the TD_ON_RUNQ state and not have an associated kse. - Remove the PRI_IDLE special case from sched_clock(), it was not actually necessary.	2003-10-28 03:28:48 +00:00
Jeff Roberson	3f741ca117	- Use a better algorithm in sched_pctcpu_update() Contributed by: Thomaswuerfl@gmx.de - In sched_prio(), adjust the run queue for threads which may need to move to the current queue due to priority propagation . - In sched_switch(), fix style bug introduced when the KSE support went in. Columns are 80 chars wide, not 90. - In sched_switch(), Fix the comparison in the idle case and explicitly re-initialize the runq in the not propagated case. - Remove dead code in sched_clock(). - In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads that have become runnable will get a chance to. - In sched_runnable(), if we're not the IDLETD, we should not consider curthread when examining the load. This mimics the 4BSD behavior of returning 0 when the only runnable thread is running. - In sched_userret(), remove the code for setting NEEDRESCHED entirely. This is not necessary and is not implemented in 4BSD. - Use the correct comparison in sched_add() when checking to see if an idle prio task has had it's priority temporarily elevated.	2003-10-27 06:47:05 +00:00
Jeff Roberson	484288de56	- If a thread is not bound to a kse return 0 from sched_pctcpu(). Reported by: pawel.worach@nordea.com	2003-10-20 19:55:21 +00:00
Jeff Roberson	0e0f626628	- Only kse_reassign() in the !running case. Reported by: kris	2003-10-16 20:32:57 +00:00
Jeff Roberson	0c7da3a43d	- Call sched_add() with the correct argument on SMP. Reported by: Valentin Chopov <valentin@valcho.net>	2003-10-16 20:06:19 +00:00
Jeff Roberson	b72f347bdb	- Fix a minor problem with my last commit, we don't want to return from sched_switch if the thread is running, we want to fall through and pick a new thread because we have been preempted.	2003-10-16 10:04:54 +00:00
Jeff Roberson	ae53b483cc	- Collapse sched_switchin() and sched_switchout() into sched_switch(). Now mi_switch() calls sched_switch() which calls cpu_switch(). This is actually one less function call than it had been.	2003-10-16 08:53:46 +00:00
Jeff Roberson	7cf90fb376	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.	2003-10-16 08:39:15 +00:00
Jeff Roberson	4c9612c622	- The non iterative algorithm for interact_update was broken due to rounding errors. This was the source of the majority of the interactivity problems. Reintroduce the old algorithm and its XXX. - Up the interactivity threshold to 30. It really could stand to be even a tiny bit higher. - Let the sleep and run time accumulate up to 5 seconds of history rather than two. This helps stop XFree86 from becoming non-interactive during bursts of activity.	2003-10-16 08:17:43 +00:00
Jeff Roberson	08fd6713b2	- If our user_pri doesn't match our actual priority our priority has been elevated either due to priority propagation or because we're in the kernel in either case, put us on the current queue so that we dont stop others from using important resources. At some point the priority elevations from sleeping in the kernel should go away. - Remove an optimization in sched_userret(). Before we would only set NEEDRESCHED if there was something of a higher priority available. This is a trivial optimization and it breaks priority propagation because it doesn't take threads which we may be blocking into account. Notice that the thread which is blocking others gets up to one tick of cpu time before we honor this NEEDRESCHED in sched_clock().	2003-10-15 07:47:06 +00:00
Jeff Roberson	736c97c7b3	- In SCHED_CURR() add holding Giant to the list of criteria that will keep you on the current queue. In the future, it would be nice if priority propagation could deterministicly pluck a thread off of the next queue and put it on the current queue. Until then this hack stops us from holding up our entire current queue, including interrupt handlers, while a thread on the next queue is blocked while holding Giant. - Inherit our pctcpu information from our parent.	2003-10-12 21:07:31 +00:00
Jeff Roberson	8ec82641d8	- Change a lame iterative algorithm to a constant time algorithm. Remove the XXX that complains about it as well. Submitted by: ThomasWuerfl@gmx.de	2003-10-04 17:41:13 +00:00
Jeff Roberson	81de51bf1d	- Somewhere along the line I stupidly removed critical logic from sched_ptcpu_update(). This caused erroneous cpu times in TOP for processes that were asleep. Replace the code that was removed.	2003-09-20 02:05:58 +00:00
David Xu	ab2baa7254	Let SA process work under ULE scheduler, originally it would panic kernel. Reviewed by: jeff	2003-08-26 11:33:15 +00:00
Sam Leffler	c06eb4e293	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
Jeff Roberson	0c0a98b231	- When stealing a kse in kseq_move() ignore the current kseq's min nice value. We want to steal any thread, even one that is not given a slice on its current queue.	2003-07-08 06:19:40 +00:00
Jeff Roberson	0ec896fd28	- Clean up an unused variable. Submitted by: Steve Kargl <skg@routmask.apl.washington.edu>	2003-07-07 21:08:28 +00:00
Jeff Roberson	749d01b011	- Parse the cpu topology map in sched_setup(). - Associate logical CPUs on the same physical core with the same kseq. - Adjust code that assumed there would only be one running thread in any kseq. - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef. This is a start towards HyperThreading support but it isn't quite there yet.	2003-07-04 19:59:00 +00:00
Jeff Roberson	7a20304f84	- Don't migrate to stopped cpus.	2003-06-28 09:09:33 +00:00
Jeff Roberson	86f8ae9663	- If smp is not started yet don't try to load balance or we'll put threads on cpus that aren't running yet.	2003-06-28 08:24:42 +00:00
Jeff Roberson	a91172ade1	- Throttle the inherited sleep and run time in sched_fork_kseg(). This allows us to learn the behavior of a thread much more quickly after it starts up.	2003-06-28 06:19:56 +00:00
Jeff Roberson	e493a5d90c	- Adjust the default maximum slice value to ~140ms. This has improved the nice distribution without significantly impacting interactive response. As a side effect it should also allow batch processes to run for a slightly longer period which will positively impact their performance.	2003-06-28 06:04:47 +00:00
Jeff Roberson	1a7a9d0ec2	- lticks was erroneously being updated in sched_pctcpu(). This was causing us to skip the pctcpu_update() call which lead to inaccurate cpu usage statistics for processes that didn't run often.	2003-06-21 02:31:49 +00:00
Jeff Roberson	665cb285a8	- Don't allow nice to have such a large effect on priority. This was causing poor interactive performance while unnice processes were running. The new scheme still allows nice to have an effect on priority but it is not as dramatic as the effect of the interactivity score.	2003-06-21 02:22:47 +00:00
Jeff Roberson	d07ac847ef	- Use a more robust mechanism for determining whether or not a kse is on a kseq.	2003-06-17 19:49:18 +00:00
Jeff Roberson	7cd0f83355	- Temporarily patch a problem where the interact score could be negative because the run time exceeds the largest value a signed int can hold. The real solution involves calculating how far we are over the limit. To quickly solve this problem we loop removing 1/5th of the current value until it falls below the limit. The common case requires no passes.	2003-06-17 10:21:34 +00:00
Jeff Roberson	4b60e3242e	- Add a new function "sched_interact_update()" that scales back the sleep and run time. - Scale the sleep and run time back via sched_interact_update() in more places. This is to keep the statistic more accurate. - Charge a parent one tick for forking a child. - Add only the run time and not the sleep time to the parents kg when a thread exits. This allows us to give a penalty for having an expensive thread exit but does not give a bonus for having an interactive thread exit. - Change the SLP_RUN_THROTTLE to limit us to 4/5th and not 1/2. - Change the SLP_RUN_MAX to two seconds. This keeps bursty interactive applications like mozilla and openoffice in the interactive range even through expensive tasks. - Recalculate the slice after every sleep. This ensures that once a task has been marked interactive it only has a slice of 1 at the risk of giving tasks that sleep for a very brief period a longer time slice.	2003-06-17 06:39:51 +00:00
Jeff Roberson	3c12473229	- Increase the ksegrp's cpu time history buffer to 250ms. - Decrease the history buffer divisor to 2 so that we remember more of the old behavior.	2003-06-15 04:14:25 +00:00
Jeff Roberson	b41f3d22cc	- Cap the growth of sleep and run time in sched_exit_kse().	2003-06-15 02:52:29 +00:00
Jeff Roberson	210491d3d9	- Fix the maximum slice value. I accidentally checked in a value of '2' which meant no process would run for longer than 20ms. - Slightly redo the interactivity scorer. It follows the same algorithm but in a slightly more correct way. Previously values above half were incorrect. - Lower the interactivity threshold to 20. It seems that in testing non- interactive tasks are hardly ever near there and expensive interactive tasks can sometimes surpass it. This area needs more testing. - Remove an unnecessary KTR. - Fix a case where an idle thread that had an elevated priority due to priority prop. would be placed back on the idle queue. - Delay setting NEEDRESCHED until userret() for threads that haad their priority elevated while in kernel. This gives us the same context switch optimization as SCHED_4BSD. - Limit the child's slice to 1 in sched_fork_kse() so we detect its behavior more quickly. - Inhert some of the run/slp time from the child in sched_exit_ksegrp(). - Redo some of the priority comparisons so they are more clear. - Throttle the frequency of sched_pctcpu_update() so that rounding errors do not make it invalid.	2003-06-15 02:18:29 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Jeff Roberson	356500a306	- Add a simple CPU load balancing algorithm. This works by executing once a second and equalizing the load between the two most imbalanced CPU. This is intended to clear up long term load imbalances that would not be handled by the 'pull' method in sched_choose(). - Pull out some bits of sched_choose() into a kseq_move() function that moves an arbitrary thread from one kseq to another.	2003-06-09 00:39:09 +00:00
Jeff Roberson	b90816f188	- When a new thread is added to a kseq the load is incremented prior to adding it to the nice tables. Therefore, in kseq_add_nice, we should keep in mind that the load will be 1 if we are the only thread, and not 0. - Assert that the sched lock is held in all the appropriate places. - Increase the scope of the sched lock in sched_pctcpu_update(). - Hold the sched lock in sched_runnable(). It is not held by the caller.	2003-06-08 00:47:33 +00:00
Julian Elischer	43fdafb1e1	Fix typo in last commit	2003-05-02 06:18:55 +00:00
Julian Elischer	b1ac98d8b2	Move the flag that indicates an idle thread from the KSE to the thread. It was always referenced via the thread anyhow. Reviewed by: jhb (a LOOOOONG time ago)	2003-05-02 00:33:12 +00:00
John Baldwin	2056d0a168	Add lock assertions for various proc/thread/kse/ksegroup fields to the scheduler functions.	2003-04-23 18:51:05 +00:00
John Baldwin	0b5318c81a	- Assert that the proc lock and sched_lock are held in sched_nice(). - For the 4BSD scheduler, this means that all callers of the static function resetpriority() now always hold sched_lock, so don't lock sched_lock explicitly in that function.	2003-04-22 20:50:38 +00:00
John Baldwin	828e7683bf	Protect p_swtime with the sched_lock.	2003-04-22 19:48:25 +00:00
Jeff Roberson	7cd650a972	- Set the ke_cpu field in sched_add() for interrupt and realtime threads since they are going on the current cpu and not their previously assigned cpu. - sched_runnable() should only return true in the SMP case if the other processor has more than one thread that is runnable. We can not steal curthread. - Change kseq_print() to accept the cpuid instead of a kseq pointer. This makes use of this function in ddb much easier.	2003-04-18 05:24:10 +00:00
Jeff Roberson	a5f099d0c4	- Unbreak priority prop. for timeshare threads. Always place something on the current queue if its priority is really elevated. This needs more work as there are cases where a next queue kse could be holding up what would be a curr queue kse, and thus hurting interactivity. Also, when a thread with an elevated priority has its priority lowered it should be placed back on the next queue.	2003-04-12 22:33:24 +00:00
Jeff Roberson	9bca28a703	- Clean up some debug code left over from my earlier megacommit.	2003-04-12 07:28:36 +00:00
Jeff Roberson	b5c4c4a7e5	- We only care about the base priority. Ignore the SCHED_FIFO_BIT so that we dont get confused. Reported and debugged by: Steve Kargl <sgk@troutmask.apl.washington.edu>	2003-04-12 07:00:16 +00:00
Jeff Roberson	141ad61c78	- Add sched_exit_* - Call sched_exit_kse() from sched_exit() instead of implementing it here.	2003-04-11 19:24:00 +00:00
Jeff Roberson	58177de2de	- Only select kseqs with more than one kse to steal. The running kse is reflected in the load now and you can't very well migrate that.	2003-04-11 18:40:34 +00:00
Jeff Roberson	c36ccfa22b	- When migrating a kse from one kseq to the next actually insert it onto the second kseq's run queue so that it is referenced by the kse when it is switched out. - Spell ksq_rslices properly. Reported by: Ian Freislich <ianf@za.uu.net>	2003-04-11 18:37:34 +00:00
Jeff Roberson	15dc847e52	- Add a SYSCTL node for the ule scheduler. - Allow user adjustable min and max time slices (suggested by hiten). - Change the SLP_RUN_MAX to 100ms from 2 seconds so that we learn whether a process is interactive or not much more quickly. - Place a process on the current run queue if it is interactive or if it is running at an interrupt thread priority due to priority prop. - Use the 'current' timeshare queue for interrupt threads, realtime threads, and idle threads that are running at higher priority due to priority prop. This fixes problems where priorities would have been elevated but we would not check the timeshare run queue until other lower priority tasks were no longer runnable. - Keep an array of loads indexed by the priority class as well as a global load. - Keep an bucket of nice values with a count of the number of kses currently runnable with that nice value. - Keep track of the minimum nice value of any running thread. - Remove the unused short term sleep accounting. I was attempting to use this for load balancing but it didn't work out. - Define a kseq_print() for use with debugging. - Add KTR debugging at useful places so we can easily debug slice and priority assignment. - Decouple the runq assignment from the kseq assignment. kseq_add now keeps track of statistics. This is done so that the nice and load is still tracked for the currently running process. Previously if a niced process was added while a non nice process was running the niced process would still get a slice since it was not aware of the unnice process. - Make adjustments for the sched api changes.	2003-04-11 03:47:14 +00:00
Julian Elischer	060563ec50	Move the _oncpu entry from the KSE to the thread. The entry in the KSE still exists but it's purpose will change a bit when we add the ability to lock a KSE to a cpu.	2003-04-10 17:35:44 +00:00
Jeff Roberson	a8949de20e	- Keep seperate statistics and run queues for different scheduling classes. - Treat each class specially in kseq_{choose,add,rem}. Let the rest of the code be less aware of scheduling classes. - Skip the interactivity calculation for non TIMESHARE ksegrps. - Move slice and runq selection into kseq_add(). Uninline it now that it's big.	2003-04-03 00:29:28 +00:00
Jeff Roberson	5053d272c2	- Make the interactivity calculator decay faster. - Make the pcpu estimator update faster.	2003-04-02 08:22:33 +00:00
Jeff Roberson	98c9b132d1	- I meant divide by two and not shift by two in SCHED_PRI_NHALF.	2003-04-02 08:21:24 +00:00
Jeff Roberson	245f3abfd5	- Add in support for KSEs with 0 slice values on the run queue. If we try to select a KSE with a slice of 0 we will update its slice and insert it onto the next queue. - Pass the KSE instead of the ksegrp into sched_slice(). This more accurately reflects the behavior of the code. Slices are granted to kses. - Add a function kseq_nice_min() which finds the smallest nice value assigned to the kseg of any KSE on the queue. - Rewrite the logic in sched_slice(). Add a large comment describing the new slice selection scheme. To summarize, slices are assigned based on the nice value. Priorities are still calculated based on the nice and interactivity of a process. Slice sizes of 0 may be granted for KSEs whos nice is 20 or futher away from the lowest nice on the run queue. Other nice values are scaled across the range [min, min+20]. This fixes ULEs bad behavior with positively niced processes.	2003-04-02 06:46:43 +00:00
Jeff Roberson	e1f89c222b	- Create a function sched_interact_score() which decides on the interactivity of a kseg and assigns it a value of 0 through 100. - Use sched_interact_score() to determine the dynamic priority. - Define SCHED_CURR() in terms of sched_interact_score(). - Adjust the maximum slice back down to 100ms. - Remove redundant clearing of ke_runq in sched_wakeup() - Clean up #defines and comment them.	2003-03-04 02:45:59 +00:00
Jeff Roberson	65c8760dbf	- Shift the tick count by 10 and back around sched_pctcpu_update() calculations. Keep this changes local to the function so the tick count is in its natural form otherwise. Previously 1000 was added each time a tick fired and we divided by 1000 when it was reported. This is done to reduce rounding errors.	2003-03-03 05:29:09 +00:00
Jeff Roberson	a6ed41865b	- In sched_add() special case PRI_TIMESHARE and PRI_ITHD\|PRI_REALTIME. We always place ITHD & REALTIME threads on the current queue of the current cpu. Prior to this change an interrupt thread would only ever run on one cpu.	2003-03-03 04:28:07 +00:00
Jeff Roberson	f1e8dc4a3b	- Refrain from setting the td_priority in sched_wakeup(). It will be reset before we return to user space.	2003-03-03 04:11:40 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Julian Elischer	4a338afd7a	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@	2003-02-17 09:55:10 +00:00
Jeff Roberson	783caefbbf	- Enable STRICT_RESCHED until code that dynamically decides on resched strictness based on the current workload is finished.	2003-02-10 14:11:23 +00:00
Jeff Roberson	407b015791	- Add a new variable 'kg_runtime' that tracks the amount of time we've run. - Use the ratio of kg_runtime / kg_slptime to determine our dynamic priority. - Scale kg_runtime and kg_slptime back when the sum of the two exceeds SCHED_SLP_RUN_MAX. This allows us to slowly forget old behavior. - Scale back the runtime and slptime in fork so that the new process has the same ratio but much less accumulated time. This causes new behavior to be noticed more quickly.	2003-02-10 14:03:45 +00:00
Jeff Roberson	5d7ef00cfe	- Make some context switches conditional on SCHED_STRICT_RESCHED. This may have some negative effect on interactivity but it yields great perf. gains. This also brings the conditions under which ULE context switches inline with SCHED_4BSD. - Define some new kseq_* functions for manipulating the run queue. - Add a new kseq member ksq_rslices and ksq_bload. rslices is the sum of the slices of runnable kses. This will be used for push load balance decisions. bload is the number of threads blocked waiting on IO.	2003-02-03 05:30:07 +00:00
Jeff Roberson	cd6e33df1c	- Stop abusing oncpu for our cpu binding. Define a scheduler local element in the kse datastructure called ke_cpu. This is the cpu which we are currently bound to. Some flags may be added later to support hard binding.	2003-02-03 02:26:28 +00:00
Scott Long	7121cce58a	Use hz if stathz is zero. Adopted from sched_4bsd.	2003-02-02 08:24:32 +00:00
Jeff Roberson	0a016a05a4	- Use ksq_load as the authoritive count of kses on the pair of kseqs for sched_runnable() et all. - Remove some dead code in sched_clock(). - Define two macros KSEQ_SELF() and KSEQ_CPU() for getting the kseq of the current cpu or some alternate cpu. - Start introducing kseq_() functions, such as kseq_choose() and kseq_setup().	2003-01-29 07:00:51 +00:00
Jeff Roberson	bf857e69a2	- Remove debugging code that didn't work on UP.	2003-01-29 00:26:47 +00:00
Jeff Roberson	d465fb9589	- Allow idle's pctcpu time to be calculated.	2003-01-28 09:30:17 +00:00
Jeff Roberson	c9f25d8f92	- Fix the ksq_load calculation. It now reflects the number of entries on the run queue for each cpu. - Introduce kse stealing into the sched_choose() code. This helps balance cpus better in cases where process turnover is high. This implementation is fairly trivial and will likely be only a temporary measure until something more sophisticated has been written.	2003-01-28 09:28:20 +00:00
Jeff Roberson	35e6168fcd	- Add the ule scheduler. This is intended to be a general purpose process scheduler with many SMP benefits. It is still very experimental and should be used only in test environments.	2003-01-26 05:23:15 +00:00

... 3 4 5 6 7 ...

376 Commits