freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	61b926921f	MFC	2011-05-31 21:22:44 +00:00
Nathan Whitehorn	d098f93019	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
Attilio Rao	a0a43452ae	Merge r221285 from largeSMP project: - Remove the following sysctl: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 Because they are absolutely obsolete. Probabilly the whole wakeup forward mechanism should be revisited for a better fitting in modern hw, in the future. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making more informative the msg and removing the arguments passing. Reviewed by: julian Tested by: several	2011-05-17 22:14:00 +00:00
Attilio Rao	d59dd76c22	Merge r221278 from largeSMP project: idle_cpus_mask is just used in sched_4bsd, thus make it private for it. Tested by: several	2011-05-16 23:20:12 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architecture. cpumask_t on the other side is implemented as an array of longs, and easilly extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, needs to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primirally done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to came soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	f0283a735c	- Remove the following sysctl: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 Because they are absolutely obsolete. Probabilly the whole wakeup forward mechanism should be revisited for a better fitting in modern hw. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making more informative the msg and removing the arguments passing Approved by: julian	2011-04-30 23:28:07 +00:00
Attilio Rao	3121f5347e	idle_cpus_mask is just used in the SMP case and within sched_4BSD. Declare appropriately.	2011-04-30 22:30:18 +00:00
Ryan Stone	60dd73b78b	If the 4BSD scheduler tries to schedule a thread that has been pinned or bound to an AP before SMP has started, the system will panic when we try to touch per-CPU state for that AP because that state has not been initialized yet. Fix this in the same way as ULE: place all threads in the global run queue before SMP has started. Reviewed by: jhb MFC after: 1 month	2011-04-26 20:34:30 +00:00
John Baldwin	e806d352d2	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
Fabien Thomas	586cb6ec77	Clearing the flag when preempting will let the preempted thread run too much time. This can finish in a scheduler deadlock with ping-pong between two threads. One sample of this is: - device lapic (to have a preemption point on critical_exit()) - options DEVICE_POLLING with HZ>1499 (to have lapic freq = hardclock freq) - running a cpu intensive task (that does not enter the kernel) - only one CPU on SMP or no SMP. As requested by jhb@ 4BSD have received the same type of fix instead of propagating the flag to the new thread. Reviewed by: jhb, jeff MFC after: 1 month	2011-03-31 13:59:47 +00:00
John Baldwin	2dc29adb9f	Rework realtime priority support: - Move the realtime priority range up above kernel sleep priorities and just below interrupt thread priorities. - Contract the interrupt and kernel sleep priority ranges a bit so that the timesharing priority band can be increased. The new timeshare range is now slightly larger than the old realtime + timeshare ranges. - Change the ULE scheduler to no longer use realtime priorities for interactive threads. Instead, the larger timeshare range is now split into separate subranges for interactive and non-interactive ("batch") threads. The end result is that interactive threads and non-interactive threads still use the same priority ranges as before, but realtime threads now have a separate, dedicated priority range. - Do not modify the priority of non-timeshare threads in sched_sleep() or via cv_broadcastpri(). Realtime and idle priority threads will no longer have their priorities affected by sleeping in the kernel. Reviewed by: jeff	2011-01-14 17:06:54 +00:00
Matthew D Fleming	52c0b557cc	One more sysctl(9) type-safety that I missed before.	2011-01-13 18:20:37 +00:00
John Baldwin	22d19207e9	- Move sched_fork() later in fork() after the various sections of the new thread and proc have been copied and zeroed from the old thread and proc. Otherwise attempts to modify thread or process data in sched_fork() could be undone. - Don't copy td_{base,}_user_pri from the old thread to the new thread in sched_fork_thread() in ULE. This is already done courtesy the bcopy() of the thread copy region. - Always initialize the real priority (td_priority) of new threads to the new thread's base priority (td_base_pri) to avoid bogusly inheriting a borrowed priority from the parent thread. MFC after: 2 weeks	2011-01-06 22:24:00 +00:00
David Xu	c8e368a933	- Follow r216313, the sched_unlend_user_prio is no longer needed, always use sched_lend_user_prio to set lent priority. - Improve pthread priority-inherit mutex, when a contender's priority is lowered, repropagete priorities, this may cause mutex owner's priority to be lowerd, in old code, mutex owner's priority is rise-only.	2010-12-29 09:26:46 +00:00
David Xu	acbe332a58	MFp4: It is possible a lower priority thread lending priority to higher priority thread, in old code, it is ignored, however the lending should always be recorded, add field td_lend_user_pri to fix the problem, if a thread does not have borrowed priority, its value is PRI_MAX. MFC after: 1 week	2010-12-09 02:42:02 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Alexander Motin	b722ad008b	Merge some SCHED_ULE features to SCHED_4BSD: - Teach SCHED_4BSD to inform cpu_idle() about high sleep/wakeup rate to choose optimized handler. In case of x86 it is MONITOR/MWAIT. Also it will be needed to bypass forthcoming idle tick skipping logic to not consume resources on events rescheduling when it won't give any benefits. - Teach SCHED_4BSD to wake up idle CPUs without using IPI. In case of x86, when MONITOR/MWAIT is active, it require just single memory write. This doubles performance on some heavily switching test loads.	2010-09-11 07:08:22 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
John Baldwin	3da35a0a52	Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock. MFC after: 1 month	2010-06-03 16:02:11 +00:00
John Baldwin	1d7830edd5	Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently. MFC after: 1 month	2010-05-21 17:15:56 +00:00
Attilio Rao	58060789e6	Split out an invariant in order to better check that newtd, when provided, must be on a runqueue. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC: 2 weeks X-MFC: r202889	2010-01-24 18:16:38 +00:00
Attilio Rao	b0b9dee5c9	- Fix a race in sched_switch() of sched_4bsd. In the case of the thread being on a sleepqueue or a turnstile, the sched_lock was acquired (without the aid of the td_lock interface) and the td_lock was dropped. This was going to break locking rules on other threads willing to access to the thread (via the td_lock interface) and modify his flags (allowed as long as the container lock was different by the one used in sched_switch). In order to prevent this situation, while sched_lock is acquired there the td_lock gets blocked. [0] - Merge the ULE's internal function thread_block_switch() into the global thread_lock_block() and make the former semantic as the default for thread_lock_block(). This means that thread_lock_block() will not disable interrupts when called (and consequently thread_unlock_block() will not re-enabled them when called). This should be done manually when necessary. Note, however, that ULE's thread_unblock_switch() is not reaped because it does reflect a difference in semantic due in ULE (the td_lock may not be necessarilly still blocked_lock when calling this). While asymmetric, it does describe a remarkable difference in semantic that is good to keep in mind. [0] Reported by: Kohji Okuno <okuno dot kohji at jp dot panasonic dot com> Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC: 2 weeks	2010-01-23 15:54:21 +00:00
Attilio Rao	6eac7e5752	- Fix a bug in sched_4bsd where the timestamp for the sleeping operation is not cleaned up on the wakeup but reset. This is harmless mostly because td_slptick (and ki_slptime from userland) should be analyzed only with the assumption that the thread is actually sleeping (thus while the td_slptick is correctly set) but without this invariant the number is nomore consistent. - Move td_slptick from u_int to int in order to follow 'ticks' signedness and wrap up accordingly [0] [0] Submitted by: emaste Sponsored by: Sandvine Incorporated MFC 1 week	2010-01-08 14:55:11 +00:00
Konstantin Belousov	17c4c3563c	Allow swap out of the kernel stack for the thread with priority greater or equial then PSOCK, not less or equial. Higher priority has lesser numerical value. Existing test does not allow for swapout of the thread waiting for advisory lock, for exiting child or sleeping for timeout. On the other hand, high-priority waiters of VFS/VM events can be swapped out. Tested by: pho Reviewed by: jhb MFC after: 1 week	2009-12-31 18:52:58 +00:00
Attilio Rao	1b9d701fee	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>	2009-11-03 16:46:52 +00:00
Jeff Roberson	0d2cf8374a	- Use __XSTRING where I want the define to be expanded. This resulted in sizeof("MAXCPU") being used to calculate a string length rather than something more reasonable such as sizeof("32"). This shouldn't have caused any ill effect until we run on machines with 1000000 or more cpus.	2009-01-25 07:35:10 +00:00
Jeff Roberson	8f51ad55e7	- Implement generic macros for producing KTR records that are compatible with src/tools/sched/schedgraph.py. This allows developers to quickly create a graphical view of ktr data for any resource in the system. - Add sched_tdname() and the pcpu field 'name' for quickly and uniformly identifying records associated with a thread or cpu. - Reimplement the KTR_SCHED traces using the new generic facility. Obtained from: attilio Discussed with: jhb Sponsored by: Nokia	2009-01-17 07:17:57 +00:00
John Baldwin	c3ea337801	When choosing a CPU for a thread in a cpuset, prefer the last CPU that the thread ran on if there are no other CPUs in the set with a shorter per-CPU runqueue.	2008-07-28 20:39:21 +00:00
John Baldwin	f200843b72	Implement support for cpusets in the 4BSD scheduler. - When a cpuset is applied to a thread, walk the cpuset to see if it is a "full" cpuset (includes all available CPUs). If not, set a new TDS_AFFINITY flag to indicate that this thread can't run on all CPUs. When inheriting a cpuset from another thread during thread creation, the new thread also inherits this flag. It is in a new ts_flags field in td_sched rather than using one of the TDF_SCHEDx flags because fork() clears td_flags after invoking sched_fork(). - When placing a thread on a runqueue via sched_add(), if the thread is not pinned or bound but has the TDS_AFFINITY flag set, then invoke a new routine (sched_pickcpu()) to pick a CPU for the thread to run on next. sched_pickcpu() walks the cpuset and picks the CPU with the shortest per-CPU runqueue length. Note that the reason for the TDS_AFFINITY flag is to avoid having to walk the cpuset and examine runq lengths in the common case. - To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array of counters to hold the length of the per-CPU runqueues and update them when adding and removing threads to per-CPU runqueues. MFC after: 2 weeks	2008-07-28 17:25:24 +00:00
John Baldwin	8aa3d7ffc0	Various and sundry style and whitespace fixes.	2008-07-28 15:52:02 +00:00
John Birrell	6f5f25e521	Add the vtime (virtual time) hooks for DTrace.	2008-05-25 01:44:58 +00:00
Jeff Roberson	6c47aaae12	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
Jeff Roberson	8df78c41d6	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
Jeff Roberson	9727e63745	- Restore runq to manipulating threads directly by putting runq links and rqindex back in struct thread. - Compile kern_switch.c independently again and stop #include'ing it from schedulers. - Remove the ts_thread backpointers and convert most code to go from struct thread to struct td_sched. - Cleanup the ts_flags #define garbage that was causing us to sometimes do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD. - Export the kern.sched sysctl node in sysctl.h	2008-03-20 05:51:16 +00:00
Jeff Roberson	8b16c208e6	- ULE and 4BSD share only one line of code from sched_newthread() so implement the required pieces in sched_fork_thread(). The td_sched pointer is already setup by thread_init anyway.	2008-03-20 03:06:33 +00:00
Jeff Roberson	a90f3f2547	- Move maybe_preempt() from kern_switch.c to sched_4bsd.c. This is function is only used by 4bsd. - Create a new runq_choose_fuzz() function rather than polluting runq_choose() with 4BSD specific code. - Move the fuzz sysctl into sched_4bsd.c - Remove some dead code from kern_switch.c	2008-03-20 02:14:02 +00:00
Jeff Roberson	a564bfc7fa	- Directly include opt_sched.h in sched_4bsd.	2008-03-20 01:32:48 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Jeff Roberson	c5aa6b581d	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
Jeff Roberson	1e24c28f46	- Add a sched_preempt() routine to be called by md code after IPI_PREEMPT is delivered. - Add a simple implementation to 4bsd.	2008-03-10 01:30:35 +00:00
Marcel Moolenaar	f5a3ef99c2	Unbreak after cpuset: initialize td_cpuset in sched_fork_thread().	2008-03-02 21:34:57 +00:00
Jeff Roberson	885d51a38a	- Add a new sched_affinity() api to be used in the upcoming cpuset implementation. - Add empty implementations of sched_affinity() to 4BSD and ULE. Sponsored by: Nokia	2008-03-02 07:19:35 +00:00
Jeff Roberson	eea4f254fe	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia	2007-12-15 23:13:31 +00:00
David Xu	435806d31b	Fix LOR of thread lock and umtx's priority propagation mutex due to the reworking of scheduler lock. MFC: after 3 days	2007-12-11 08:25:36 +00:00
Julian Elischer	431f890614	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
Robert Watson	088f584961	Remove unused variable td from sched_idletd(). MFC after: 3 days Found with: Coverity Prevent(tm) CID: 3561	2007-11-05 12:01:12 +00:00
John Baldwin	9dddab6fc1	Change the roundrobin implementation in the 4BSD scheduler to trigger a userland preemption directly from hardclock() via sched_clock() when a thread uses up a full quantum instead of using a periodic timeout to cause a userland preemption every so often. This fixes a potential deadlock when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held by a thread pinned or bound to another CPU. The current thread on that CPU will never be preempted while softclock is blocked. Note that ULE already drives its round-robin userland preemption from sched_clock() as well and always enables IPI_PREEMPT. MFC after: 1 week	2007-10-27 22:07:40 +00:00
Julian Elischer	7ab24ea3b9	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.	2007-10-26 08:00:41 +00:00
Jeff Roberson	05dc0eb204	- Restore historical sched_yield() behavior by changing sched_relinquish() to simply switch rather than lowering priority and switching. This allows threads of equal priority to run but not lesser priority. Discussed with: davidxu Reported by: NIIMI Satoshi <sa2c@sa2c.net> Approved by: re	2007-10-08 23:45:24 +00:00
Jeff Roberson	54b0e65f84	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 04:10:23 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Jeff Roberson	6ea38de8aa	- Remove the global definition of sched_lock in mutex.h to break new code and third party modules which try to depend on it. - Initialize sched_lock in sched_4bsd.c. - Declare sched_lock in sparc64 pmap.c and assert that we're compiling with SCHED_4BSD to prevent accidental crashes from running ULE. This is the sole remaining file outside of the scheduler that uses the global sched_lock. Approved by: re	2007-07-18 20:46:06 +00:00
Jeff Roberson	fe54587ffa	- Move some common code out of sched_fork_exit() and back into fork_exit().	2007-06-12 07:47:09 +00:00
Jeff Roberson	710eacdc5f	- Placing the 'volatile' on the right side of the * in the td_lock declaration removes the need for __DEVOLATILE(). Pointed out by: tegge	2007-06-06 03:40:47 +00:00
Jeff Roberson	95e3a0bca3	- Better fix for previous error; use DEVOLATILE on the td_lock pointer it can actually sometimes be something other than sched_lock even on schedulers which rely on a global scheduler lock. Tested by: kan	2007-06-05 04:12:46 +00:00
Jeff Roberson	c219b097af	- Pass &sched_lock as the third argument to cpu_switch() as this will always be the correct lock and we don't get volatile warnings this way. Pointed out by: kan	2007-06-05 03:46:54 +00:00
Jeff Roberson	7b20fb19fb	Commit 1/14 of sched_lock decomposition. - Move all scheduler locking into the schedulers utilizing a technique similar to solaris's container locking. - A per-process spinlock is now used to protect the queue of threads, thread count, suspension count, p_sflags, and other process related scheduling fields. - The new thread lock is actually a pointer to a spinlock for the container that the thread is currently owned by. The container may be a turnstile, sleepqueue, or run queue. - thread_lock() is now used to protect access to thread related scheduling fields. thread_unlock() unlocks the lock and thread_set_lock() implements the transition from one lock to another. - A new "blocked_lock" is used in cases where it is not safe to hold the actual thread's lock yet we must prevent access to the thread. - sched_throw() and sched_fork_exit() are introduced to allow the schedulers to fix-up locking at these points. - Add some minor infrastructure for optionally exporting scheduler statistics that were invaluable in solving performance problems with this patch. Generally these statistics allow you to differentiate between different causes of context switches. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:50:30 +00:00
John Baldwin	4d70511ac3	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
Julian Elischer	c6226eea4c	Move the seting of the idle_mask bits to a place where they can't be wrong. Also use the IDLETD bit in the thread mask to test if its an idle thread rather than doing a PCPU access.	2007-02-02 05:14:22 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Robert Watson	2da78e3862	Prefer a more traditional spelling of inhibited in comments and panic messages.	2006-12-31 15:56:04 +00:00
David Xu	34e1241b9d	Fix typo, p_slptime should be td_slptime.	2006-12-24 01:52:27 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Julian Elischer	de38cd9d8b	whitespace fix only	2006-11-20 16:13:02 +00:00
David Xu	653385756c	Fix a copy-paste bug in NON-KSE case.	2006-11-14 05:48:27 +00:00
David Xu	5a21514727	Unbreak userland priority inheriting in NO_KSE case.	2006-11-11 13:11:29 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
David Xu	3db720fdce	Add user priority loaning code to support priority propagation for 1:1 threading's POSIX priority mutexes, the code is no-op unless priority-aware umtx code is committed.	2006-08-25 06:12:53 +00:00
Maxim Konovalov	1f36c876a1	o Fix grammar in the comment, indent macros. No functional changes.	2006-07-02 20:53:52 +00:00
Maxim Konovalov	75d960eb2e	o Remove rev. 1.57 leftover, not reached code.	2006-07-02 20:49:46 +00:00
David E. O'Brien	2e4db89cfc	Fix building with GCC 4.2: define data types before referring to them.	2006-06-29 19:37:31 +00:00
David Xu	36ec198bd5	Add scheduler API sched_relinquish(), the API is used to implement yield() and sched_yield() syscalls. Every scheduler has its own way to relinquish cpu, the ULE and CORE schedulers have two internal run- queues, a timesharing thread which calls yield() syscall should be moved to inactive queue.	2006-06-15 06:37:39 +00:00
David Xu	b41f1452d9	Add scheduler CORE, the work I have done half a year ago, recent, I picked it up again. The scheduler is forked from ULE, but the algorithm to detect an interactive process is almost completely different with ULE, it comes from Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use same word "score" as a priority boost in ULE scheduler. Briefly, the scheduler has following characteristic: 1. Timesharing process's nice value is seriously respected, timeslice and interaction detecting algorithm are based on nice value. 2. per-cpu scheduling queue and load balancing. 3. O(1) scheduling. 4. Some cpu affinity code in wakeup path. 5. Support POSIX SCHED_FIFO and SCHED_RR. Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler uses 256 priority queues. Unlike ULE which using pull and push, the scheduelr uses pull method, the main reason is to let relative idle cpu do the work, but current the whole scheduler is protected by the big sched_lock, so the benefit is not visible, it really can be worse than nothing because all other cpu are locked out when we are doing balancing work, which the 4BSD scheduelr does not have this problem. The scheduler does not support hyperthreading very well, in fact, the scheduler does not make the difference between physical CPU and logical CPU, this should be improved in feature. The scheduler has priority inversion problem on MP machine, it is not good for realtime scheduling, it can cause realtime process starving. As a result, it seems the MySQL super-smack runs better on my Pentium-D machine when using libthr, despite on UP or SMP kernel.	2006-06-13 13:12:56 +00:00
David Xu	0ae716e5ee	Make ke_rqindex unsigned.	2006-06-06 12:26:17 +00:00
John-Mark Gurney	5c06d111b8	back out for now... revert ccpu to being kern.ccpu...	2006-04-27 17:57:59 +00:00
John-Mark Gurney	c71ce6a445	move remaining sysctl into the kern.sched tree...	2006-04-26 19:42:38 +00:00
John Baldwin	0f180a7cce	Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. This avoids a race where the calling thread could read a snapshot of it's current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway. The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.	2006-04-17 18:20:38 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Stephan Uphoff	a3f2d84279	Lots of whitespace cleanup. Fix for broken if condition. Submitted by: nate@	2005-06-09 19:43:08 +00:00
Stephan Uphoff	f3a0f87396	Fix some race conditions for pinned threads that may cause them to run on the wrong CPU. Add IPI support for preempting a thread on another CPU. MFC after:3 weeks	2005-06-09 18:26:31 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Maxim Konovalov	f305048664	Fix a typo in the comment. Noticed by: Samy Al Bahra	2005-04-15 14:01:43 +00:00
Stephan Uphoff	779186434a	Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts. Reviewed by: jhb	2005-04-08 03:37:53 +00:00
John Baldwin	f5c157d986	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides where the thread can go back to normal mode (if it's normal priority is high enough to satisfy the pending lock requests) or it it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another therad's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
Jeff Roberson	907bdbc288	- Wrap the thread count adjustment in sched_load_add() and sched_load_rem() so that we may place some ktr entries nearby. - Define other KTR_SCHED tracepoints so that we may graph the operation of the scheduler.	2004-12-26 00:16:24 +00:00
Jeff Roberson	7842f65e7f	- Garbage collect several unused members of struct kse and struce ksegrp. As best as I can tell, some of these were never used.	2004-12-14 10:53:55 +00:00
Stephan Uphoff	5656474145	Propagate TDF_NEEDRESCHED to replacement thread in sched_switch(). Reviewed by: julian, jhb (in October) Approved by: sam (mentor) MFC after: 4 weeks	2004-12-07 18:17:24 +00:00
Julian Elischer	c20c691bed	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
Julian Elischer	d39063f20d	Use some macros to trach available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
Julian Elischer	14f0e2e9bf	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
Julian Elischer	b2578c6c06	Add some kasserts	2004-09-13 23:02:52 +00:00
Scott Long	1e7fad6b6a	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialed when the pmap layer tries to call sched_pini() early in the boot and results in an quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
Julian Elischer	5c854accc1	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implimentation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominat a sibling to run if theyu are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
Julian Elischer	6a574b2afc	Don't do IPIs on behalf of interrupt threads. just punt straight on through to teh preemption code. Make a KASSSERT out of a condition that can no longer occur. MFC after: 1 week	2004-09-06 07:23:14 +00:00

1 2 3 4 5

209 Commits