freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic flags field. Updates to the atomic flags are performed using atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document that changes to the flags field require only the page lock. Introduce the vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs, which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Edward Tomasz Napierala	afcc55f318	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
Alan Cox	a8229fa37c	Initialize marker pages as held rather than fictitious/wired. Marking the page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER. Reviewed by: kib	2011-07-02 23:34:47 +00:00
Edward Tomasz Napierala	f497cda257	In vm_daemon(), do not skip processes stopped with SIGSTOP.	2011-04-06 16:27:04 +00:00
Edward Tomasz Napierala	099e7e950f	Add RACCT_RSS. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 16:24:24 +00:00
John Baldwin	8e6fa660f2	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked, so that they now use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week	2011-03-24 18:40:11 +00:00
Edward Tomasz Napierala	3fccbe4397	In vm_daemon(), when iterating over all processes in the system, skip those which are not yet fully initialized (i.e. ones with p_state == PRS_NEW). Without it, we could panic in _thread_lock_flags(). Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that require similar fix. Reported by: pho, keramida Discussed with: kib	2011-03-18 06:47:23 +00:00
Alan Cox	4c6a2e7a1f	Shift responsibility for synchronizing access to the page's act_count field to the object's lock. Reviewed by: kib@	2011-01-16 18:01:39 +00:00
Alan Cox	17f6a17bf7	Release the page lock early in vm_pageout_clean(). There is no reason to hold this lock until the end of the function. With the aforementioned change to vm_pageout_clean(), page locks don't need to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions. Reviewed by: kib	2011-01-03 00:41:56 +00:00
Konstantin Belousov	1e8a675c73	vm_pageout_flush() might cache the pages that finished writing to the backing storage. Such pages might then be reused, racing with the assert in vm_object_page_collect_flush() that verified that dirty pages from the run (most likely, pages with VM_PAGER_AGAIN status) are still write-protected. In fact, the page indexes for the pages that were removed from the object page list should be ignored by vm_object_page_clean(). Return the length of successfully written run from vm_pageout_flush(), that is, the count of pages between requested page and first page after requested with status VM_PAGER_AGAIN. Supply the requested page index in the array to vm_pageout_flush(). Use the returned run length to forward the index of next page to clean in vm_object_page_clean(). Reported by: avg Reviewed by: alc MFC after: 1 week	2010-11-18 21:09:02 +00:00
Nathan Whitehorn	42768fec0f	On architectures with non-tree-based page tables like PowerPC, every page in a range must be checked when calling pmap_remove(). Calling pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range of the map could result in attempting to demap an extraordinary number of pages (> 10^15), so iterate through each map entry and unmap each of them individually. MFC after: 6 weeks	2010-09-09 13:32:58 +00:00
Alan Cox	cdaba1f2be	Push down the acquisition of the page queues lock into vm_pageout_page_stats(). In particular, avoid acquiring the page queues lock unless iterating over the active queue.	2010-07-02 20:56:22 +00:00
Alan Cox	9cf5198832	With the demise of page coloring, the page queue macros no longer serve any useful purpose. Eliminate them. Reviewed by: kib	2010-07-02 15:02:51 +00:00
Alan Cox	95976f3f38	Simplify entry to vm_pageout_clean(). Expect the page to be locked. Previously, the caller unlocked the page, and vm_pageout_clean() immediately reacquired the page lock. Also, assert rather than test that the page is neither busy nor held. Since vm_pageout_clean() is called with the object and page locked, the page can't have changed state since the caller verified that the page is neither busy nor held.	2010-06-30 17:20:33 +00:00
Alan Cox	91b4f42767	Introduce vm_page_next() and vm_page_prev(), and use them in vm_pageout_clean(). When iterating over a range of pages, these functions can be cheaper than vm_page_lookup() because their implementation takes advantage of the vm_object's memq being ordered. Reviewed by: kib@ MFC after: 3 weeks	2010-06-21 23:27:24 +00:00
Alan Cox	9ee2165f5d	Eliminate checks for a page having a NULL object in vm_pageout_scan() and vm_pageout_page_stats(). These checks were recently introduced by the first page locking commit, r207410, but they are not needed. At the same time, eliminate some redundant accesses to the page's object field. (These accesses should have been eliminated by r207410.) Make the assertion in vm_page_flag_set() stricter. Specifically, only managed pages should have PG_WRITEABLE set. Add a comment documenting an assertion to vm_page_flag_clear(). It has long been the case that fictitious pages have their wire count permanently set to one. Add comments to vm_page_wire() and vm_page_unwire() documenting this. Add assertions to these functions as well. Update the comment describing vm_page_unwire(). Much of the old comment had little to do with vm_page_unwire(), but a lot to do with _vm_page_deactivate(). Move relevant parts of the old comment to _vm_page_deactivate(). Only pages that belong to an object can be paged out. Therefore, it is pointless for vm_page_unwire() to acquire the page queues lock and enqueue such pages in one of the paging queues. Generally speaking, such pages are immediately freed after the call to vm_page_unwire(). Previously, it was the call to vm_page_free() that reacquired the page queues lock and removed these pages from the paging queues. Now, we will never acquire the page queues lock for this case. (It is also worth noting that since both vm_page_unwire() and vm_page_free() occurred with the page locked, the page daemon never saw the page with its object field set to NULL.) Change the panic in vm_page_unwire() to provide a more precise message. Reviewed by: kib@	2010-06-14 19:54:19 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exists_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alan Cox	3c4a24406b	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Jung-uk Kim	af394cfa36	Fix a typo in the previous commit.	2010-05-07 21:06:52 +00:00
Konstantin Belousov	af4b86b949	One more use for vm_pageout_init_marker(). Reviewed by: alc	2010-05-07 18:57:26 +00:00
Konstantin Belousov	8c6162468b	Add a helper function vm_pageout_page_lock(), similar to tegge's vm_pageout_fallback_object_lock(), to obtain the page lock while holding the page queue lock, and still maintain the page position in a queue. Use the helper to lock the page in the pageout daemon and contig launder iterators instead of skipping the page if its lock is contested. Skipping locked pages easily causes the pagedaemon or launder to fail to make progress with page cleaning. Proposed and reviewed by: alc	2010-05-06 04:57:33 +00:00
Alan Cox	6c56db5c9e	Eliminate an assignment that was made redundant by r207410.	2010-05-02 21:04:59 +00:00
Alan Cox	447fe2a4c6	Defer the acquisition of the page and page queues locks in vm_pageout_object_deactivate_pages().	2010-05-02 20:46:17 +00:00
Kip Macy	7bec141b12	push up dropping of the page queue lock to avoid holding it in vm_pageout_flush	2010-04-30 22:31:37 +00:00
Kip Macy	e8f263195d	- don't check hold_count without the page lock held - don't leak the page lock if m->object is NULL (assuming that that check will in fact even be valid when m->object is protected by the page lock)	2010-04-30 19:40:37 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	82bfb965d1	Simplify the inner loop of vm_pageout_object_deactivate_pages(). Rather than checking each page for PG_UNMANAGED, check the vm object's type. Only OBJT_PHYS can have unmanaged pages. Eliminate a pointless counter. The vm object is locked, that lock is never released by the inner loop, and the set of pages contained by the vm object is not changed by the inner loop. Therefore, the counter serves no purpose.	2010-04-29 16:18:45 +00:00
Alan Cox	5d4a7b7945	Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to an object whose reference count is zero, then that page cannot possibly be mapped.	2010-04-20 04:16:39 +00:00
Alan Cox	b28889a2fc	Remove a nonsensical test from vm_pageout_clean(). A page can't be in the inactive queue and have a non-zero wire count. Reviewed by: kib MFC after: 3 weeks	2010-04-18 21:29:28 +00:00
Konstantin Belousov	3f1c4c4f31	When OOM searches for a process to kill, ignore the processes already killed by OOM. When a killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continuously selects the same victim process, which sleeps uninterruptibly waiting for a page. The killed process may still sleep if a page cannot be obtained immediately, but testing has shown that the system has a much higher chance to survive in an OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks	2010-04-06 10:43:01 +00:00
Konstantin Belousov	b9f180d1de	When a vnode-backed vm object is referenced, it increments the vnode reference count, and decrements it on dereference. If referenced object is deallocated, object type is reset to OBJT_DEAD. Consequently, all vnode references that are owned by object references are never released. vunref() the vnode in vm object deallocation code for OBJT_VNODE appropriate number of times to prevent leak. Add an assertion to the vm_pageout() to make sure that we never get reference on the vnode but then do not execute code to release it. In collaboration with: pho Reviewed by: alc MFC after: 3 weeks	2010-01-17 21:26:14 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Alan Cox	26f4eea53f	The bits set in a page's dirty mask are a subset of the bits set in its valid mask. Consequently, there is no need to perform a bit-wise and of the page's dirty and valid masks in order to determine which parts of a page are dirty and valid. Eliminate an unnecessary #include.	2009-06-24 04:45:03 +00:00
Alan Cox	b78ddb0b8a	Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically, rather than unconditionally making partially dirty pages fully dirty, only make partially dirty pages fully dirty if the pmap says that the page has been modified. (This change is also a small optimization. It eliminates an unnecessary call to pmap_is_modified() on pages that are mapped read only.) Suggested by: tegge	2009-05-28 06:52:14 +00:00
Alan Cox	47916d0c37	Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan(). If the page belongs to an object with a reference count of zero, then it can't have any managed mappings on which to clear a reference bit.	2009-05-17 20:40:41 +00:00
Konstantin Belousov	7981aa2431	Use the acquired reference to the vmspace instead of direct dereferencing of p->p_vmspace in a place where it was missed in r191277. Noted by: pluknet gmail com	2009-04-28 11:45:36 +00:00
Konstantin Belousov	6bed074cd2	In both the pageout OOM handler and vm_daemon, acquire a reference to the vmspace of the examined process instead of directly accessing its vmspace, which may change. Also, as an optimization, check for the P_INEXEC flag before examining the process. Reported and tested by: pho (previous version) Reviewed by: alc MFC after: 3 weeks	2009-04-19 20:53:47 +00:00
Alan Cox	f4b0c119c0	Calling pmap_clear_modify() after calling pmap_remove_write() is pointless. The latter function already clears the modified status from each of the page's mappings.	2009-04-19 07:18:08 +00:00
Konstantin Belousov	6129343d5d	Instead of forcing vn_start_write() to reset mp back to NULL for the failed calls with non-NULL vp, explicitly clear mp after failure. Tested by: stass Reviewed by: tegge PR: 123768 MFC after: 1 week	2008-11-16 21:57:54 +00:00
Konstantin Belousov	2025d69ba7	Move the out-of-memory handling code from vm_pageout_scan() into the separate function vm_pageout_oom(). Supply a parameter for vm_pageout_oom() describing the reason for the call. Call vm_pageout_oom() from swp_pager_meta_build() when the swap zone is exhausted. Reviewed by: alc Tested by: pho, jhb MFC after: 2 weeks	2008-09-29 19:45:12 +00:00
Alan Cox	8d28bf04e2	Prevent an integer overflow in vm_pageout_page_stats() on machines with a large number of physical pages. PR: 126158 Submitted by: Dmitry Tejblum MFC after: 3 days	2008-09-21 18:01:34 +00:00
Tom Rhodes	6bd9cb1c81	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
Alan Cox	e5b006ffca	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Alan Cox	da31e3aa04	Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks	2007-11-25 20:37:29 +00:00
Alan Cox	5dfc28704d	Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired. Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks	2007-11-23 00:30:19 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00


339 Commits