freebsd-dev

Author	SHA1	Message	Date
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes. The code is a no-op unless the priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
David Xu	36ec198bd5	Add scheduler API sched_relinquish(); the API is used to implement the yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish the CPU; the ULE and CORE schedulers have two internal run queues, and a timesharing thread which calls the yield() syscall should be moved to the inactive queue.	2006-06-15 06:37:39 +00:00
David Xu	b41f1452d9	Add scheduler CORE, work I did half a year ago and recently picked up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different from ULE's; it comes from the Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use the same word "score" for the priority boost as in the ULE scheduler. Briefly, the scheduler has the following characteristics: 1. A timesharing process's nice value is strictly respected; the timeslice and the interaction-detection algorithm are based on the nice value. 2. Per-CPU scheduling queues and load balancing. 3. O(1) scheduling. 4. Some CPU affinity code in the wakeup path. 5. Support for POSIX SCHED_FIFO and SCHED_RR. Unlike the 4BSD and ULE schedulers, which use a fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE, which uses both pull and push, the scheduler uses only the pull method; the main reason is to let a relatively idle CPU do the work. Currently, however, the whole scheduler is protected by the big sched_lock, so the benefit is not visible; it can really be worse than nothing, because all other CPUs are locked out while we are doing balancing work, a problem the 4BSD scheduler does not have. The scheduler does not support hyperthreading very well; in fact, it does not distinguish between physical and logical CPUs. This should be improved in the future. The scheduler has a priority inversion problem on MP machines; it is not good for realtime scheduling, as it can cause realtime processes to starve. As a result, it seems MySQL super-smack runs better on my Pentium-D machine when using libthr, whether on a UP or an SMP kernel.	2006-06-13 13:12:56 +00:00
David Xu	0ae716e5ee	Make ke_rqindex unsigned.	2006-06-06 12:26:17 +00:00
David Xu	9f8eb3cb52	Use variable i instead of variable cpus as an index to get correct kseq.	2005-12-27 12:02:03 +00:00
David Xu	a1d4fe69d2	Fix a bug in the slice calculation code: the current code uses hz, but sched_clock() is called by the stat clock. Submitted by: taku at tackymt dot homeip dot net	2005-12-19 08:26:09 +00:00
David Xu	a861574011	Temporarily disable nice threshold detection code, as it can starve a thread holding a critical resource, e.g. a mutex or other implicit synchronous flags. Give a thread which exceeds the nice threshold a minimum time slice. PR: kern/86087	2005-09-22 01:19:37 +00:00
David Xu	f8ec133ed0	Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectedly for PRI_ITHD and PRI_REALTIME threads.	2005-08-19 11:51:41 +00:00
David Xu	1278181c6c	Try our best to keep a preempted thread at the front of the run queue. This seems to improve performance a bit for some workloads, but we still see interactive lag unless the cpu idling race is fixed.	2005-08-08 14:20:10 +00:00
David Xu	3d16f519b6	If a thread was removed from system run queue, kse_assign shouldn't add it again.	2005-07-31 15:11:21 +00:00
Xin LI	05a6b7ad62	Cast to uintptr_t when the compiler complains. This unbreaks ULE scheduler breakage accompanied by the recent atomic_ptr() change.	2005-07-25 10:21:49 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Jeff Roberson	6680bbd529	- Fix the case where we're not preempting but there is already a newtd as this happens via thread_switchout(). I don't particularly like the structure of the code here. We twice call out to thread code when a thread is voluntarily switching. Once to thread_switchout() and once to slot_fill(), while sched_4BSD does even more work which is redundant to select another thread to use our remaining slice. This should be simplified in the future, but for now I'm only going to fix the bug not the bad design.	2005-06-07 02:59:16 +00:00
Jeff Roberson	9fe02f7e16	- It's 2005 already, I've been working on this for three years.	2005-06-04 09:24:15 +00:00
Jeff Roberson	21381d1b9e	- Don't SLOT_USE() in the preempt case, sched_add() has already taken the slot for us. Previously, we would take two slots on every preempt, and setrunqueue() would fix it up for us in the non threaded case. The threaded case was simply broken. - Clean up flags, prototypes, comments.	2005-06-04 09:23:28 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
Jeff Roberson	7a9507b60e	- A test in sched_switch() is no longer necessary and it is incorrect when td0 is preempted before it voluntarily switches. Discovered by: Arjan Van Leeuwen <avleeuwen@gmail.com>	2005-02-23 00:50:26 +00:00
Jeff Roberson	42a29039de	- Add ke_runq == NULL to the conditions which will cause us to abort adjusting timeshare loads in sched_class(). This is only important if the thread has never run, otherwise the state checks should work as expected.	2005-02-04 17:22:46 +00:00
John Baldwin	50aaa791ba	Fix a typo and two whitespace nits.	2004-12-30 22:17:00 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides whether the thread can go back to normal mode (if its normal priority is high enough to satisfy the pending lock requests) or if it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another thread's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread().
One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
Jeff Roberson	2ebf8eb132	- Unintentionally checked in a debugging panic. Remove that.	2004-12-26 23:21:48 +00:00
Jeff Roberson	598b368d6c	- Fix a long standing problem where an ithread would not honor sched_pin(). - Remove the sched_add wrapper that used sched_add_internal() as a backend. Its only purpose was to interpret one flag and turn it into an int. Do the right thing and interpret the flag in sched_add() instead. - Pass the flag argument to sched_add() to kseq_runq_add() so that we can get the SRQ_PREEMPT optimization too. - Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT counts, otherwise the slot counts are adjusted as soon as we enter sched_add() or sched_rem() rather than when the thread is actually placed on the run queue. This greatly simplifies the handling of slots. - Remove the explicit prevention of migration for ithreads on non-x86 platforms. This was never shown to have any real benefit. - Remove the unused class argument to KSE_CAN_MIGRATE(). - Add ktr points for thread migration events. - Fix a long standing bug on platforms which don't initialize the cpu topology. The ksg_maxid variable was never correctly set on these platforms which caused the long term load balancer to never inspect more than the first group or processor. - Fix another bug which prevented the long term load balancer from working properly. If stathz != hz we can't expect sched_clock() to be called on the exact tick count that we're anticipating. - Rearrange sched_switch() a bit to reduce indentation levels.	2004-12-26 22:56:08 +00:00
Jeff Roberson	81d47d3f4b	- Remove earlier KTR_ULE tracepoints. - Define new KTR_SCHED points so that we can graph the operation of the scheduler.	2004-12-26 00:15:33 +00:00
Jeff Roberson	7842f65e7f	- Garbage collect several unused members of struct kse and struct ksegrp. As best as I can tell, some of these were never used.	2004-12-14 10:53:55 +00:00
Jeff Roberson	8ffb8f5558	- In kseq_choose(), don't recalculate slice values for processes with a nice of 0. Doing so can cause an infinite loop because they should be running, but a nice -20 process could prevent them from doing so. - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority elevated due to priority propagation. If a thread has had its priority elevated, we assume that it must go on the current queue and it must get a slice. - In sched_userret() if our priority was elevated and we shouldn't have a timeslice, yield here until we should. Found/Tested by: glebius	2004-12-14 10:34:27 +00:00
Jeff Roberson	2d59a44dc0	- Take up a 'slot' while we're on the assigned queue, waiting to be posted to another processor. Otherwise, kern_switch() gets confused and tries to sched_add(NULL).	2004-12-13 13:09:33 +00:00
Jeff Roberson	3ba5c2faab	- Temporarily disable the nice -20 throttling code. It has some interaction with APM that I do not understand yet. Reported & Tested by: glebius	2004-11-11 19:48:57 +00:00
Jeff Roberson	0516c8dd4a	- When choosing a thread on the run queue, check to see if its nice is outside of the nice threshold due to a recently awoken thread with a lower nice value. This further reduces the amount of time a positively niced thread gets while running in conjunction with a workload that has many short sleeps (i.e. buildworld).	2004-10-30 12:19:15 +00:00
Jeff Roberson	6bd0c7fd53	- In sched_prio() check to see if the kse is assigned to a runq as the check for TD_ON_RUNQ() no longer means the thread is really on a run queue. I suspect this state should be re-evaluated as it must mean something else now. This fixes ULE+KSE+PREEMPTION on UP x86.	2004-10-30 07:35:53 +00:00
Julian Elischer	f8135176c9	Fix whitespace botch that only showed up in the commit message diff :-/ MFC after: 4 days	2004-10-05 22:14:02 +00:00
Julian Elischer	c20c691bed	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
Julian Elischer	c5c3fb335f	Oops. left out part of the diff. MFC after: 4 days	2004-10-05 21:26:27 +00:00
Julian Elischer	d39063f20d	Use some macros to track available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
Julian Elischer	14f0e2e9bf	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
Scott Long	1e7fad6b6a	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialized when the pmap layer tries to call sched_pin() early in the boot, which results in a quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
Julian Elischer	513efa5b39	Try committing from the right tree this time MFC after: 2 days	2004-09-11 00:11:09 +00:00
Julian Elischer	5c854accc1	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implementation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominate a sibling to run if they are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as libkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Scott Long	9923b511ed	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
Julian Elischer	2630e4c90c	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
Peter Wemm	91c1172a5a	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
Jeff Roberson	f2b74cbf28	- Introduce a new flag KEF_HOLD that prevents sched_add() from doing a migration. Use this in sched_prio() and sched_switch() to stop us from migrating threads that are in short term sleeps or are runnable. These extra migrations were added in the patches to support KSE. - Only set NEEDRESCHED if the thread we're adding in sched_add() is a lower priority and is being placed on the current queue. - Fix some minor whitespace problems.	2004-08-12 07:56:33 +00:00
Jeff Roberson	2454aaf51c	- Use a new flag, KEF_XFERABLE, to record with certainty that this kse had contributed to the transferable load count. This prevents any potential problems with sched_pin() being used around calls to setrunqueue(). - Change the sched_add() load balancing algorithm to try to migrate on wakeup. This attempts to place threads that communicate with each other on the same CPU. - Don't clear the idle counts in kseq_transfer(), let the cpus do that when they call sched_add() from kseq_assign(). - Correct a few out of date comments. - Make sure the ke_cpu field is correct when we preempt. - Call kseq_assign() from sched_clock() to catch any assignments that were done without IPI. Presently all assignments are done with an IPI, but I'm trying a patch that limits that. - Don't migrate a thread if it is still runnable in sched_add(). Previously, this could only happen for KSE threads, but due to changes to sched_switch() all threads went through this path. - Remove some code that was added with preemption but is not necessary.	2004-08-10 07:52:21 +00:00
Alexander Kabaev	00fbcda80d	Avoid casts as lvalues.	2004-07-28 06:42:41 +00:00
Scott Long	e038d35422	Clean up whitespace, increase consistency and correctness. Submitted by: bde	2004-07-23 23:09:00 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theoretically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happened. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
John Baldwin	52eb84641d	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).	2004-07-16 21:04:55 +00:00
Marcel Moolenaar	2c3490b1a8	Update for the KDB framework: o Call kdb_backtrace() instead of backtrace().	2004-07-10 21:38:22 +00:00

164 Commits