freebsd-skq

Author	SHA1	Message	Date
kib	8de1718b60	Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
attilio	1c940ef4f4	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>	2009-11-03 16:46:52 +00:00
trhodes	f37865f7f0	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
jeff	9d30d1d7a4	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
alc	c29e755cdb	Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun". Approved by: re (hrs)	2007-07-14 19:00:44 +00:00
alc	a8415c5a0d	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
alc	909177e227	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
alc	4881bd38e2	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
jeff	474b917526	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
julian	396ed947f6	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
jb	f82c799735	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
alc	fe4d04c555	Add _vm_stats and _vm_stats_misc to the sysctl declarations in sysctl.h and eliminate their declarations from various source files.	2006-08-21 06:27:28 +00:00
alc	24209e2264	vm_page_zero_idle()'s return value serves no purpose. Eliminate it.	2006-08-21 00:55:05 +00:00
jhb	d535a5cb81	Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. This avoids a race where the calling thread could read a snapshot of it's current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway. The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.	2006-04-17 18:20:38 +00:00
netchild	507a9b3e93	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
jhb	7758b042a5	Trim a couple of unneeded includes.	2005-09-29 19:13:52 +00:00
ssouhlal	972ed7b626	Set the scheduling class of the zeroidle thread to PRI_IDLE. Reviewed by: jhb Approved by: grehan (mentor) MFC after: 1 week	2005-02-04 06:18:31 +00:00
phk	8b96099f88	Remove dangling variable	2004-11-06 11:33:11 +00:00
jhb	88680af3b7	- Set the priority of the page zeroing thread using sched_prio() when the thread is created rather than adjusting the priority in the main function. (kthread_create() should probably take the initial priority as an argument.) - Only yield the CPU in the !PREEMPTION case if there are any other runnable threads. Yielding when there isn't anything else better to do just wastes time in pointless context switches (albeit while the system is idle.)	2004-11-05 19:14:02 +00:00
alc	d9ced80d66	Introduce a Boolean variable wakeup_needed to avoid repeated, unnecessary calls to wakeup() by vm_page_zero_idle_wakeup().	2004-10-31 19:32:57 +00:00
alc	ee68591e10	Eliminate an unused but initialized variable.	2004-10-30 20:11:23 +00:00
scottl	d9af98161a	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
julian	e9d9514975	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
iedowse	e97493a743	Prevent vm_page_zero_idle_wakeup() from attempting to wake up the page zeroing thread before it has been created. It was possible for calls to free() very early in the boot process to panic here because the sleep queues were not yet initialised. Specifically, sysinit_add() running at SI_SUB_KLD would trigger this if the array of pointers became big enough to require uma_large_alloc() allocations. Submitted by: peter	2004-08-29 01:02:33 +00:00
jhb	696704716d	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
jhb	1b16b181d1	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
bde	557b43a4bb	Record exactly where this file was copied from. It wasn't repo-copied so this is not very obvious. Fixed some style bugs (mainly missing parentheses around return values).	2004-03-04 10:18:17 +00:00
jeff	60c153f457	- Use a seperate startup function for the zeroidle kthread. Use this to set P_NOLOAD prior to running the thread.	2004-02-02 07:51:03 +00:00
jeff	c85cdc3d0f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
imp	3bc162cfa3	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon	2003-08-12 23:24:05 +00:00
obrien	b0678d7a44	Use __FBSDID().	2003-06-11 23:50:51 +00:00
des	9dc45d46b9	Rename a static variable to avoid future conflicts.	2003-04-04 12:08:42 +00:00
jeff	ef4d4e378e	- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c Reviewed by: -arch	2002-10-12 05:32:24 +00:00
peter	febe98ab98	Set P_NOLOAD on the pagezero kthread so that it doesn't artificially skew the loadav. This is not real load. If you have a nice process running in the background, pagezero may sit in the run queue for ages and add one to the loadav, and thereby affecting other scheduling decisions.	2002-07-19 21:06:01 +00:00
alc	1ba951badc	o Remove the acquisition and release of Giant from the idle priority thread that pre-zeroes free pages. o Remove GIANT_REQUIRED from some low-level page queue functions. (Instead assertions on the page queue lock are being added to the higher-level functions, like vm_page_wire(), etc.) In collaboration with: peter	2002-07-18 17:40:07 +00:00
alc	fe71e4fa20	o Use vm_pageq_remove_nowakeup() and vm_pageq_enqueue() in vm_page_zero_idle() instead of partially duplicated implementations. In particular, this change guarantees that the number of free pages in the free queue(s) matches the global free page count when Giant is released. Submitted by: peter (via his p4 "pmap" branch)	2002-07-16 19:39:40 +00:00
dillon	dc5d856e71	Re-enable the idle page-zeroing code. Remove all IPIs from the idle page-zeroing code as well as from the general page-zeroing code and use a lazy tlb page invalidation scheme based on a callback made at the end of mi_switch. A number of people came up with this idea at the same time so credit belongs to Peter, John, and Jake as well. Two-way SMP buildworld -j 5 tests (second run, after stabilization) 2282.76 real 2515.17 user 704.22 sys before peter's IPI commit 2266.69 real 2467.50 user 633.77 sys after peter's commit 2232.80 real 2468.99 user 615.89 sys after this commit Reviewed by: peter, jhb Approved by: peter	2002-07-12 20:17:06 +00:00
peter	770030e679	vm_page_queue_free_mtx is a spin mutex, not a normal sleep mutex. I do not know why this didn't panic my box, but I have most certainly been using it: peter@overcee[3:14pm]~src/sys/i386/i386-110> sysctl -a \| grep zero vm.stats.misc.zero_page_count: 2235 vm.stats.misc.cnt_prezero: 638951 vm.idlezero_enable: 1 vm.idlezero_maxrun: 16 Submitted by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor's patches are never wrong. :-)	2002-07-08 23:12:37 +00:00
peter	94af63d292	Turn the zeroidle process off for SMP systems, there is still a possible TLB problem when bouncing from one cpu to another (the original cpu will not have purged its TLB if the it simply went idle). Pointed out by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor is never wrong. :-)	2002-07-08 23:09:11 +00:00
peter	62e40d1277	Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed. We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86. Use Luigi's hack for preserving the (lack of) priority. Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.	2002-07-08 04:24:26 +00:00
alc	ffff9f17c5	o Lock accesses to the free queue(s) in vm_page_zero_idle().	2002-07-07 19:27:57 +00:00
julian	aa2dc0a5d9	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..	2002-06-29 17:26:22 +00:00
peter	3d8c7d4cab	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibilitly.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)	2002-04-15 16:00:03 +00:00
julian	37369620df	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
julian	5596676e6c	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00

1 2

55 Commits