freebsd-skq

Author	SHA1	Message	Date
Alan Cox	da31e3aa04	Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks	2007-11-25 20:37:29 +00:00
Alan Cox	5dfc28704d	Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired. Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks	2007-11-23 00:30:19 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Alan Cox	4cd457233b	Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week	2007-09-15 18:30:28 +00:00
Alan Cox	14137dc045	In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode. In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock. This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong because if it examined a recycled vm object. Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-07-02 06:56:37 +00:00
Alan Cox	97824da382	Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(). Approved by: re (kensmith) MFC after: 3 weeks	2007-06-26 18:24:05 +00:00
Alan Cox	9e897b1bc6	Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED. Reviewed by: tegge	2007-06-18 02:04:38 +00:00
Alan Cox	2446e4f02c	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
Alan Cox	d076fbea58	Eliminate dead code: We have not performed pageouts on the kernel object in this millenium.	2007-06-13 06:10:10 +00:00
Attilio Rao	393a081d42	Optimize vmmeter locking. In particular: - Add an explicative table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unuseful comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Alan Cox	e9f995d824	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
Xin LI	f67af5c918	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 15:05:52 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Alan Cox	78985e424a	Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change. The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access. Reviewed by: marcel@ (ia64)	2006-08-01 19:06:06 +00:00
Tor Egge	625e6c0af4	Expand scope of marker to reduce the number of page queue scan restarts.	2006-02-17 21:02:39 +00:00
Tor Egge	db27dcc0f0	Check return value from nonblocking call to vn_start_write().	2006-02-17 18:22:19 +00:00
Alan Cox	3b7db47d7e	Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid.	2006-02-04 22:37:10 +00:00
Alan Cox	997e1c252b	Use the new macros abstracting the page coloring/queues implementation. (There are no functional changes.)	2006-01-27 07:28:51 +00:00
Alexander Leidinger	ef39c05baa	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
Alan Cox	7a35a21e7b	Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations. Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away. Reviewed by: tegge	2005-11-09 08:19:21 +00:00
Tor Egge	8dbca793a9	Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE due to the vm object being locked. When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects. Temporarily unlock the page queues lock while locking vm objects to avoid lock order violation. Detect and handle relevant page queue changes. This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable. Reviewed by: alc	2005-08-10 00:17:36 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Alan Cox	df2e33bf42	Revise the part of vm_pageout_scan() that moves pages from the cache queue to the free queue. With this change, if a page from the cache queue belongs to a locked object, it is simply skipped over rather than moved to the inactive queue.	2005-01-06 20:22:36 +00:00
Alan Cox	34d9e6fdae	During traversal of the inactive queue, try locking the page's containing object before accessing the page's flags or the object's reference count.	2004-11-05 06:24:05 +00:00
Alan Cox	d19ef81437	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
Alan Cox	b86e6ec007	During traversal of the active queue by vm_pageout_page_stats(), try locking the page's containing object before accessing the page's flags.	2004-10-30 23:30:53 +00:00
Alan Cox	4b8a5c4095	Add an assignment statement that I omitted from the previous revision.	2004-10-30 07:09:46 +00:00
Alan Cox	b08abf6cc0	During traversal of the active queue, try locking the page's containing object before accessing the page's flags or the object's reference count. If the trylock fails, handle the page as though it is busy.	2004-10-27 18:29:17 +00:00
Alan Cox	3e36afbe27	Remove the GIANT_REQUIRED preceding pmap_remove() in vm_pageout_map_deactivate_pages().	2004-07-18 04:38:11 +00:00
Alan Cox	3d2e54c317	Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.	2004-07-15 18:00:43 +00:00
Alan Cox	26354d4c08	Remove an unused and unimplemented sysctl. (For the record, it was marked as unimplemented in revision 1.129 nearly six years ago.)	2004-07-12 17:45:37 +00:00
Alan Cox	5e609009de	Call vm_pageout_page_stats() with the page queues lock held.	2004-06-24 04:08:43 +00:00
Alan Cox	1aab16a6b6	Remove spl calls.	2004-06-24 03:13:30 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Alan Cox	f651b12907	Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.	2004-05-12 04:10:35 +00:00
Bruce Evans	dcbcd518e0	Minor style fixes. In vm_daemon(), don't fetch the rss limit long before it is needed.	2004-03-04 09:36:46 +00:00
Alan Cox	9ea8d1a67c	Eliminate the second, unnecessary call to pmap_page_protect() near the end of vm_pageout_flush(). Instead, assert that the page is still write protected. Discussed with: tegge	2004-02-21 23:32:00 +00:00
Alan Cox	84d98bf699	- Correct a long-standing race condition in vm_page_try_to_cache() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251. - Simplify the code surrounding the fix to this same race condition in vm_pageout.c's revision 1.251. There should be no behavioral change. Reviewed by: tegge MFC after: 7 days	2004-02-14 08:54:37 +00:00
Alan Cox	a3dfacb51c	Correct a long-standing race condition in the inactive queue scan. (See the added comment for low-level details.) The effect of this race condition is a panic "vm_page_cache: caching a dirty page, ..." Reviewed by: tegge MFC after: 7 days	2004-02-10 18:34:27 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alan Cox	2e3b314d3a	- Push down Giant from vm_pageout() to vm_pageout_scan(), freeing vm_pageout_page_stats() from Giant. - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the vm object to be locked on entry. (All of the pager routines now expect this.)	2003-10-24 06:43:04 +00:00
Alan Cox	ab42316c2f	- Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from vm_pageout_scan(). Rationale: I don't like leaving a busy page in the cache queue with neither the vm object nor the vm page queues lock held. - Assert that the page is active in vm_pageout_page_stats().	2003-10-22 18:41:32 +00:00
Alan Cox	d3c09dd7db	- Assert that every page found in the active queue is an active page.	2003-10-22 03:08:24 +00:00
Alan Cox	7a93508274	- Increase the object lock's scope in vm_contig_launder() so that access to the object's type field and the call to vm_pageout_flush() are synchronized. - The above change allows for the eliminaton of the last parameter to vm_pageout_flush(). - Synchronize access to the page's valid field in vm_pageout_flush() using the containing object's lock.	2003-10-18 21:09:21 +00:00
Alan Cox	6989c456b3	- Synchronize access to a vm page's valid field using the containing vm object's lock. - Release the vm object and vm page queues locks around vput().	2003-10-17 05:07:17 +00:00

1 2 3 4 5 ...

293 Commits