freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	0d2cf8374a	- Use __XSTRING where I want the define to be expanded. This resulted in sizeof("MAXCPU") being used to calculate a string length rather than something more reasonable such as sizeof("32"). This shouldn't have caused any ill effect until we run on machines with 1000000 or more cpus.	2009-01-25 07:35:10 +00:00
Jeff Roberson	8f51ad55e7	- Implement generic macros for producing KTR records that are compatible with src/tools/sched/schedgraph.py. This allows developers to quickly create a graphical view of ktr data for any resource in the system. - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly identifying records associated with a thread or cpu. - Reimplement the KTR_SCHED traces using the new generic facility. Obtained from: attilio Discussed with: jhb Sponsored by: Nokia	2009-01-17 07:17:57 +00:00
John Baldwin	c3ea337801	When choosing a CPU for a thread in a cpuset, prefer the last CPU that the thread ran on if there are no other CPUs in the set with a shorter per-CPU runqueue.	2008-07-28 20:39:21 +00:00
John Baldwin	f200843b72	Implement support for cpusets in the 4BSD scheduler. - When a cpuset is applied to a thread, walk the cpuset to see if it is a "full" cpuset (includes all available CPUs). If not, set a new TDS_AFFINITY flag to indicate that this thread can't run on all CPUs. When inheriting a cpuset from another thread during thread creation, the new thread also inherits this flag. It is in a new ts_flags field in td_sched rather than using one of the TDF_SCHEDx flags because fork() clears td_flags after invoking sched_fork(). - When placing a thread on a runqueue via sched_add(), if the thread is not pinned or bound but has the TDS_AFFINITY flag set, then invoke a new routine (sched_pickcpu()) to pick a CPU for the thread to run on next. sched_pickcpu() walks the cpuset and picks the CPU with the shortest per-CPU runqueue length. Note that the reason for the TDS_AFFINITY flag is to avoid having to walk the cpuset and examine runq lengths in the common case. - To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array of counters to hold the length of the per-CPU runqueues and update them when adding and removing threads to per-CPU runqueues. MFC after: 2 weeks	2008-07-28 17:25:24 +00:00
John Baldwin	8aa3d7ffc0	Various and sundry style and whitespace fixes.	2008-07-28 15:52:02 +00:00
John Birrell	6f5f25e521	Add the vtime (virtual time) hooks for DTrace.	2008-05-25 01:44:58 +00:00
Jeff Roberson	6c47aaae12	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
Jeff Roberson	8df78c41d6	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
Jeff Roberson	9727e63745	- Restore runq to manipulating threads directly by putting runq links and rqindex back in struct thread. - Compile kern_switch.c independently again and stop #include'ing it from schedulers. - Remove the ts_thread backpointers and convert most code to go from struct thread to struct td_sched. - Cleanup the ts_flags #define garbage that was causing us to sometimes do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD. - Export the kern.sched sysctl node in sysctl.h	2008-03-20 05:51:16 +00:00
Jeff Roberson	8b16c208e6	- ULE and 4BSD share only one line of code from sched_newthread() so implement the required pieces in sched_fork_thread(). The td_sched pointer is already setup by thread_init anyway.	2008-03-20 03:06:33 +00:00
Jeff Roberson	a90f3f2547	- Move maybe_preempt() from kern_switch.c to sched_4bsd.c. This is function is only used by 4bsd. - Create a new runq_choose_fuzz() function rather than polluting runq_choose() with 4BSD specific code. - Move the fuzz sysctl into sched_4bsd.c - Remove some dead code from kern_switch.c	2008-03-20 02:14:02 +00:00
Jeff Roberson	a564bfc7fa	- Directly include opt_sched.h in sched_4bsd.	2008-03-20 01:32:48 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Jeff Roberson	c5aa6b581d	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
Jeff Roberson	1e24c28f46	- Add a sched_preempt() routine to be called by md code after IPI_PREEMPT is delivered. - Add a simple implementation to 4bsd.	2008-03-10 01:30:35 +00:00
Marcel Moolenaar	f5a3ef99c2	Unbreak after cpuset: initialize td_cpuset in sched_fork_thread().	2008-03-02 21:34:57 +00:00
Jeff Roberson	885d51a38a	- Add a new sched_affinity() api to be used in the upcoming cpuset implementation. - Add empty implementations of sched_affinity() to 4BSD and ULE. Sponsored by: Nokia	2008-03-02 07:19:35 +00:00
Jeff Roberson	eea4f254fe	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia	2007-12-15 23:13:31 +00:00
David Xu	435806d31b	Fix LOR of thread lock and umtx's priority propagation mutex due to the reworking of scheduler lock. MFC: after 3 days	2007-12-11 08:25:36 +00:00
Julian Elischer	431f890614	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
Robert Watson	088f584961	Remove unused variable td from sched_idletd(). MFC after: 3 days Found with: Coverity Prevent(tm) CID: 3561	2007-11-05 12:01:12 +00:00
John Baldwin	9dddab6fc1	Change the roundrobin implementation in the 4BSD scheduler to trigger a userland preemption directly from hardclock() via sched_clock() when a thread uses up a full quantum instead of using a periodic timeout to cause a userland preemption every so often. This fixes a potential deadlock when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held by a thread pinned or bound to another CPU. The current thread on that CPU will never be preempted while softclock is blocked. Note that ULE already drives its round-robin userland preemption from sched_clock() as well and always enables IPI_PREEMPT. MFC after: 1 week	2007-10-27 22:07:40 +00:00
Julian Elischer	7ab24ea3b9	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.	2007-10-26 08:00:41 +00:00
Jeff Roberson	05dc0eb204	- Restore historical sched_yield() behavior by changing sched_relinquish() to simply switch rather than lowering priority and switching. This allows threads of equal priority to run but not lesser priority. Discussed with: davidxu Reported by: NIIMI Satoshi <sa2c@sa2c.net> Approved by: re	2007-10-08 23:45:24 +00:00
Jeff Roberson	54b0e65f84	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 04:10:23 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Jeff Roberson	6ea38de8aa	- Remove the global definition of sched_lock in mutex.h to break new code and third party modules which try to depend on it. - Initialize sched_lock in sched_4bsd.c. - Declare sched_lock in sparc64 pmap.c and assert that we're compiling with SCHED_4BSD to prevent accidental crashes from running ULE. This is the sole remaining file outside of the scheduler that uses the global sched_lock. Approved by: re	2007-07-18 20:46:06 +00:00
Jeff Roberson	fe54587ffa	- Move some common code out of sched_fork_exit() and back into fork_exit().	2007-06-12 07:47:09 +00:00
Jeff Roberson	710eacdc5f	- Placing the 'volatile' on the right side of the * in the td_lock declaration removes the need for __DEVOLATILE(). Pointed out by: tegge	2007-06-06 03:40:47 +00:00
Jeff Roberson	95e3a0bca3	- Better fix for previous error; use DEVOLATILE on the td_lock pointer it can actually sometimes be something other than sched_lock even on schedulers which rely on a global scheduler lock. Tested by: kan	2007-06-05 04:12:46 +00:00
Jeff Roberson	c219b097af	- Pass &sched_lock as the third argument to cpu_switch() as this will always be the correct lock and we don't get volatile warnings this way. Pointed out by: kan	2007-06-05 03:46:54 +00:00
Jeff Roberson	7b20fb19fb	Commit 1/14 of sched_lock decomposition. - Move all scheduler locking into the schedulers utilizing a technique similar to solaris's container locking. - A per-process spinlock is now used to protect the queue of threads, thread count, suspension count, p_sflags, and other process related scheduling fields. - The new thread lock is actually a pointer to a spinlock for the container that the thread is currently owned by. The container may be a turnstile, sleepqueue, or run queue. - thread_lock() is now used to protect access to thread related scheduling fields. thread_unlock() unlocks the lock and thread_set_lock() implements the transition from one lock to another. - A new "blocked_lock" is used in cases where it is not safe to hold the actual thread's lock yet we must prevent access to the thread. - sched_throw() and sched_fork_exit() are introduced to allow the schedulers to fix-up locking at these points. - Add some minor infrastructure for optionally exporting scheduler statistics that were invaluable in solving performance problems with this patch. Generally these statistics allow you to differentiate between different causes of context switches. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:50:30 +00:00
John Baldwin	4d70511ac3	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
Julian Elischer	c6226eea4c	Move the seting of the idle_mask bits to a place where they can't be wrong. Also use the IDLETD bit in the thread mask to test if its an idle thread rather than doing a PCPU access.	2007-02-02 05:14:22 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Robert Watson	2da78e3862	Prefer a more traditional spelling of inhibited in comments and panic messages.	2006-12-31 15:56:04 +00:00
David Xu	34e1241b9d	Fix typo, p_slptime should be td_slptime.	2006-12-24 01:52:27 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Julian Elischer	de38cd9d8b	whitespace fix only	2006-11-20 16:13:02 +00:00
David Xu	653385756c	Fix a copy-paste bug in NON-KSE case.	2006-11-14 05:48:27 +00:00
David Xu	5a21514727	Unbreak userland priority inheriting in NO_KSE case.	2006-11-11 13:11:29 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
Maxim Konovalov	1f36c876a1	o Fix grammar in the comment, indent macros. No functional changes.	2006-07-02 20:53:52 +00:00
Maxim Konovalov	75d960eb2e	o Remove rev. 1.57 leftover, not reached code.	2006-07-02 20:49:46 +00:00
David E. O'Brien	2e4db89cfc	Fix building with GCC 4.2: define data types before referring to them.	2006-06-29 19:37:31 +00:00
David Xu	36ec198bd5	Add scheduler API sched_relinquish(), the API is used to implement yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish cpu, the ULE and CORE schedulers have two internal run- queues, a timesharing thread which calls yield() syscall should be moved to inactive queue.	2006-06-15 06:37:39 +00:00
David Xu	b41f1452d9	Add scheduler CORE, the work I have done half a year ago, recent, I picked it up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different with ULE, it comes from Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use same word "score" as a priority boost in ULE scheduler. Briefly, the scheduler has following characteristic: 1. Timesharing process's nice value is seriously respected, timeslice and interaction detecting algorithm are based on nice value. 2. per-cpu scheduling queue and load balancing. 3. O(1) scheduling. 4. Some cpu affinity code in wakeup path. 5. Support POSIX SCHED_FIFO and SCHED_RR. Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE which using pull and push, the scheduelr uses pull method, the main reason is to let relative idle cpu do the work, but current the whole scheduler is protected by the big sched_lock, so the benefit is not visible, it really can be worse than nothing because all other cpu are locked out when we are doing balancing work, which the 4BSD scheduelr does not have this problem. The scheduler does not support hyperthreading very well, in fact, the scheduler does not make the difference between physical CPU and logical CPU, this should be improved in feature. The scheduler has priority inversion problem on MP machine, it is not good for realtime scheduling, it can cause realtime process starving. As a result, it seems the MySQL super-smack runs better on my Pentium-D machine when using libthr, despite on UP or SMP kernel.	2006-06-13 13:12:56 +00:00

1 2 3

131 Commits