freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	9815066425	Make swapoff reliable. The swap_pager_swapoff() function uses trylock for the object lock before pagein, which means that either i/o to md(4) over swap, or intensive page faults over swap pager objects might prevent swapoff() from making any progress. Then the retry < 100 check fails and machine panics. If trylock fails, acquire the object lock in the blockable way and restart the hash bucket walk. Keep retries logic for now. Reported and tested by: pho Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7688	2016-08-31 14:49:58 +00:00
Mark Johnston	915d1b71cd	Restore swap pager readahead after r292373. The removal of vm_fault_additional_pages() meant that a hard fault on a swap-backed page would result in only that page being read in. This change implements readahead and readbehind for the swap pager in swap_pager_getpages(). swap_pager_haspage() is modified to return the largest contiguous non-resident range of pages containing the requested range. Reviewed by: alc, kib Tested by: pho MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7677	2016-08-30 05:56:21 +00:00
Alan Cox	ce3ee09b53	Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when neither vm_pager_has_page() nor vm_pager_get_pages() is called. Reviewed by: kib, markj MFC after: 3 weeks	2016-08-14 22:00:45 +00:00
Mark Johnston	842ee21e20	Strengthen assertions about the busy state of newly-allocated pages. Reviewed by: alc MFC after: 1 week	2016-08-13 19:49:32 +00:00
Mark Johnston	fc85a6f0c4	Initialize page busy lock state in vm_phys_add_page(). MFC after: 1 week	2016-08-13 19:48:43 +00:00
Alan Cox	791444089f	Correct errors and clean up the comments on the active queue scan. Eliminate some unnecessary blank lines. Reviewed by: kib, markj MFC after: 1 week	2016-08-12 03:22:58 +00:00
Edward Tomasz Napierala	411455a8fb	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
Alan Cox	f0edf3f806	Correct a spelling error.	2016-08-05 16:44:11 +00:00
Alan Cox	248fe642a7	Clean up the comments and code style in and around vm_pageout_cluster(). In particular, fix factual, grammatical, and spelling errors in various comments, and remove comments that are out of place in this function. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7410	2016-08-04 16:20:12 +00:00
Konstantin Belousov	0c657d22eb	Explain why swapgeom_close_ev() is delegated. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 07:11:19 +00:00
Alan Cox	87ff568c26	Restore the historical behavior of "sysctl vm.swap_idle_enabled=1". Prior to r254304, we had separate functions for reclamation and laundering (vm_pageout_scan) versus updating usage information, i.e., "reference bits", on active pages (vm_pageout_page_stats), and we only performed vm_req_vmdaemon(VM_SWAP_IDLE) if vm_pages_needed was true. However, since r254303, if vm_swap_idle_enabled was "1", we have performed vm_req_vmdaemon(VM_SWAP_IDLE) regardless of whether we are short of free pages. This was unintended and too aggressive, so I suspect no one uses this feature. With this change, we restore the historical behavior and only perform vm_req_vmdaemon(VM_SWAP_IDLE) when we are short of free pages. Reviewed by: kib, markj	2016-08-01 17:25:07 +00:00
Mark Johnston	897d0c6617	Use vm_page_undirty() instead of manually setting a page field. Reviewed by: alc MFC after: 3 days	2016-07-29 21:05:37 +00:00
Alan Cox	793172ea88	Remove a probe declaration that has been unused since r292469, when vm_pageout_grow_cache() was replaced. MFC after: 3 days	2016-07-29 16:43:51 +00:00
Alan Cox	f095d1bbc7	Remove any mention of cache (PG_CACHE) pages from the comments in vm_pageout_scan(). That function has not cached pages since r284376. MFC after: 3 days	2016-07-28 22:30:48 +00:00
Konstantin Belousov	88ad2d7b47	Do not delegate a work to geom event thread which can be done inline. In particular, swapongeom_ev() needed event thread context when swap pager configuration was performed under Giant and geom asserted that Giant is not owned. Now both of the reason went away. On the other hand, note that swpageom_release() is called from the bio_done context, and possible close cannot be performed inline. Also fix some minor issues. The swapgeom() function does not use the td argument, remove it. Recheck that the vnode passed is still VCHR and not reclaimed after the lock. Reviewed by: mav Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-28 15:57:01 +00:00
Konstantin Belousov	2174a0c607	Fix style and typo. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-28 15:49:51 +00:00
Mark Johnston	3ac8f842ea	De-pluralize "queues" where appropriate in the pagedaemon code. MFC after: 1 week	2016-07-27 17:11:03 +00:00
Alan Cox	a766ffd061	Update a comment to reflect r284376. MFC after: 3 days	2016-07-27 03:49:00 +00:00
Mark Johnston	44be0a8ea5	Correct a comment - each page queue has its own lock. Reviewed by: alc MFC after: 3 days	2016-07-23 21:03:25 +00:00
Mark Johnston	efe1ff4cf0	Update a comment in vm_page_advise() to match behaviour after r290529. Reviewed by: alc MFC after: 3 days	2016-07-23 21:02:36 +00:00
Alan Cox	8d67b8c863	Add a comment describing the 'fast path' that was introduced in r270011. Reviewed by: kib MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-07-20 17:20:22 +00:00
Mark Johnston	afa5d70339	Release the second critical section in uma_zfree_arg() slightly earlier. It is only needed when removing a full bucket from the per-CPU cache. The bucket cache (uz_buckets) is protected by the zone mutex and thus the critical section can be released before inserting into that list. MFC after: 1 week	2016-07-20 01:01:50 +00:00
Mark Johnston	20c58db95a	Make vm_pageout_wakeup_thresh a u_int rather than an int. It's a threshold for v_free_count, which is of type u_int. This also lets us get rid of a cast in vm_paging_needed(). Reviewed by: alc MFC after: 1 week	2016-07-20 00:09:22 +00:00
Alan Cox	0c3a489325	Break up vm_fault()'s implementation of the read-ahead and delete-behind optimizations into two distinct pieces. The first piece consists of the code that should only be performed once per page fault and requires the map to be locked. The second piece consists of the code that should be performed each time a pager is called on an object in the shadow chain. (This second piece expects the map to be unlocked.) Previously, the entire implementation could be executed multiple times. Moreover, the second and subsequent executions would occur with the map unlocked. Usually, the ensuing unsynchronized accesses to the map were harmless because the map was not changing. Nonetheless, it was possible for a use-after-free error to occur, where vm_fault() wrote to a freed map entry. This change corrects that problem. Reported by: avg Reviewed by: kib MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2016-07-18 04:20:26 +00:00
Konstantin Belousov	19efd8a5a8	In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called, if vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate(). The vnode_destroy_object(), when calling into vm_object_terminate(), must be able to flush buffers. BO_DEAD purpose is to quickly destroy buffers on write when the underlying vnode is not operable any more (one example is the devfs node after geom is gone). Setting BO_DEAD for reclaiming vnode before object is terminated is premature, and results in unability to flush buffers with live SU dependencies from vinvalbuf() in vm_object_terminate(). Reported by: David Cross <dcrosstech@gmail.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-07-11 14:19:09 +00:00
Robert Watson	0df4264748	When mmap(2) is used with a vnode, capture vnode attributes in the audit trail. This was not required for Common Criteria auditing (which requires only that the intent to read or write be audited at the time of open(2)), but is useful for contemporary live analysis and forensics. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 11:49:10 +00:00
Robert Watson	51d1f69069	Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2). This auditing is not required by the Common Criteria (and hence was not being performed), but is valuable in both contemporary live analysis and forensic use cases. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-07-10 08:04:02 +00:00
Alan Cox	381b724280	Change the type of the map entry's next_read field from a vm_pindex_t to a vm_offset_t. (This field is used to detect sequential access to the virtual address range represented by the map entry.) There are three reasons to make this change. First, a vm_offset_t is smaller on 32-bit architectures. Consequently, a struct vm_map_entry is now smaller on 32-bit architectures. Second, a vm_offset_t can be written atomically, whereas it may not be possible to write a vm_pindex_t atomically on a 32-bit architecture. Third, using a vm_pindex_t makes the next_read field dependent on which object in the shadow chain is being read from. Replace an "XXX" comment. Reviewed by: kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division	2016-07-07 20:58:16 +00:00
Colin Percival	34caa842a4	Autotune the number of pages set aside for UMA startup based on the number of CPUs present. On amd64 this unbreaks the boot for systems with 92 or more CPUs; the limit will vary on other systems depending on the size of their uma_zone and uma_cache structures. The major consumer of pages during UMA startup is the 19 zone structures which are set up before UMA has bootstrapped itself sufficiently to use the rest of the available memory: UMA Slabs, UMA Hash, 4 / 6 / 8 / 12 / 16 / 32 / 64 / 128 / 256 Bucket, vmem btag, VM OBJECT, RADIX NODE, MAP, KMAP ENTRY, MAP ENTRY, VMSPACE, and fakepg. If the zone structures occupy more than one page, they will not share pages and the number of pages currently needed for startup is 19 * pages_per_zone + N, where N is the number of pages used for allocating other structures; on amd64 N = 3 at present (2 pages are allocated for UMA Kegs, and one page for UMA Hash). This patch adds a new definition UMA_BOOT_PAGES_ZONES, currently set to 32, and if a zone structure does not fit into a single page sets boot_pages to UMA_BOOT_PAGES_ZONES * pages_per_zone instead of UMA_BOOT_PAGES (which remains at 64). Consequently this patch has no effect on systems where the zone structure fits into 2 or fewer pages (on amd64, 59 or fewer CPUs), but increases boot_pages sufficiently on systems where the large number of CPUs makes this structure larger. It seems safe to assume that systems with 60+ CPUs can afford to set aside an additional 128kB of memory per 32 CPUs. The vm.boot_pages tunable continues to override this computation, but is unlikely to be necessary in the future. Tested on: EC2 x1.32xlarge Relnotes: FreeBSD can now boot on 92+ CPU systems without requiring vm.boot_pages to be manually adjusted. Reviewed by: jeff, alc, adrian Approved by: re (kib)	2016-07-07 18:37:12 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Konstantin Belousov	90880a1b29	Clarify the vnode_destroy_vobject() logic handling for already terminated objects. Assert that there is no new waiters for the already terminated objects. Old waiters should have been notified by the termination calling vnode_pager_dealloc() (old/new are with regard of the lock acquisition interval). Only clear the vp->v_object for the case of already terminated object, since other branches call vnode_pager_dealloc(), which should clear the pointer. Assert this. Tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-07-05 11:21:02 +00:00
Konstantin Belousov	3f1c66b8d2	Change type of the 'dead' variable to boolean. Requested by: alc MFC after: 1 week Approved by: re (gjb)	2016-07-03 00:08:17 +00:00
Konstantin Belousov	725441f69b	If the vm_fault() handler raced with the vm_object_collapse() sleepable scan, iteration over the shadow chain looking for a page could find an OBJ_DEAD object. Such state of the mapping is only transient, the dead object will be terminated and removed from the chain shortly. We must not return KERN_PROTECTION_FAILURE unless the object type is changed to OBJT_DEAD in the chain, indicating that paging on this address is really impossible. Returning KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGV delivered to processes, or kernel accesses to UVA spuriously failing with EFAULT. If the object with OBJ_DEAD flag is found, only return KERN_PROTECTION_FAILURE when object type is already OBJT_DEAD. Otherwise, sleep a tick and retry the fault handling. Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by waiting until the paging on this object is finished. But to do so, we need to reference the dead object, while vm_object_collapse() insists on owning the final reference on the collapsed object. This could be fixed by e.g. changing the assert to shared reference release between vm_fault() and vm_object_collapse(), but it seems to be too much complications for rare boundary condition. PR: 204426 Tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation X-Differential revision: https://reviews.freebsd.org/D6085 MFC after: 2 weeks Approved by: re (gjb)	2016-06-27 21:54:19 +00:00
Konstantin Belousov	35e8002c58	In vm_page_xunbusy_maybelocked(), add fast path for unbusy when no waiters exist, same as for vm_page_xunbusy(). If previous value of busy_lock was VPB_SINGLE_EXCLUSIVER, no waiters existed and wakeup is not needed. Move common code from vm_page_xunbusy_maybelocked() and vm_page_xunbusy_hard() to vm_page_xunbusy_locked(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-23 08:28:13 +00:00
Konstantin Belousov	505cd5d13b	Add a comment noting locking regime for vm_page_xunbusy(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-23 08:27:38 +00:00
Konstantin Belousov	95e2409a33	Fix a LOR between vnode locks and allproc_lock. There is an order between covered vnode lock and allproc_lock, which is established by calling mountcheckdirs() while owning the covered vnode lock. mountcheckdirs() iterates over the processes, protected by allproc_lock. This order is needed and seems to be not avoidable. On the other hand, various VM daemons also need to iterate over all processes, and they lock and unlock user maps. Since unlock of the user map may trigger processing of the deferred map entries, it causes vnode locking to occur. Or, when vmspace is freed, dropping references on the vnode-backed object also lock vnodes. We get reverted order comparing with the mount/unmount order. For VM daemons, there is no need to own allproc_lock while we operate on vmspaces. If the process is held, it serves as the marker for allproc list, which allows to continue the iteration. Add _PHOLD_LITE() macro, similar to _PHOLD(), but not causing swap-in of the kernel stacks. It is used instead of _PHOLD() in vm code, since e.g. calling faultin() in OOM conditions only exaggerates the problem. Modernize comment describing PHOLD. Reported by: lists@yamagi.org Tested by: pho (previous version) Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 week Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6679	2016-06-22 20:15:37 +00:00
Konstantin Belousov	d3b9828d0d	The vmtotal sysctl handler marks active vm objects to calculate statistics. Marking is done by setting the OBJ_ACTIVE flag. The flags change is locked, but the problem is that many parts of system assume that vm object initialization ensures that no other code could change the object, and thus performed lockless. The end result is corrupted flags in vm objects, most visible is spurious OBJ_DEAD flag, causing random hangs. Avoid the active object marking, instead provide equally inexact but immutable is_object_alive() definition for the object mapped state. Avoid iterating over the processes mappings altogether by using arguably improved definition of the paging thread as one which sleeps on the v_free_count. PR: 204764 Diagnosed by: pho Tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2016-06-21 17:49:33 +00:00
Konstantin Belousov	eb4d6a1b3b	Fix inconsistent locking of the swap pager named objects list. Right now, all modifications of the list are locked by sw_alloc_mtx. But initial lookup of the object by the handle in swap_pager_alloc() is not protected by sw_alloc_mtx, which means that vm_pager_object_lookup() could follow freed pointer. Create a new named swap object with the OBJT_SWAP type, instead of OBJT_DEFAULT. With this change, swp_pager_meta_build() never need to upgrade named OBJT_DEFAULT to OBJT_SWAP (in the other place, we do not forbid for client code to create named OBJT_DEFAULT objects at all). That change allows to remove sw_alloc_mtx and make the list locked by sw_alloc_sx lock. Update swap_pager_copy() to new locking mode. Create helper swap_pager_alloc_init() to consolidate named and anonymous swap objects creation, while a caller ensures that the neccesary locks are held around the helper. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (hrs)	2016-06-13 03:42:46 +00:00
Konstantin Belousov	1571927369	Explicitely initialize sw_alloc_sx. Currently it is not initialized but works due to zeroed out bss on startup. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs)	2016-06-13 03:39:16 +00:00
Mark Johnston	0a1dc6e23c	Reset the page busy lock state after failing to insert into the object. Freeing a shared-busy page is not permitted. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D6670	2016-06-02 17:11:24 +00:00
Mark Johnston	e705296958	Don't preserve the page's object linkage in vm_page_insert_after(). Per the KASSERT at the beginning of the function, we expect that the page does not belong to any object, so its object and pindex fields are meaningless. Reset them in the rare case that vm_radix_insert() fails. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D6669	2016-06-02 16:58:47 +00:00
Mark Johnston	bc9d08e1cf	Fix memguard(9) in kernels with INVARIANTS enabled. With r284861, UMA zones use the trash ctor and dtor by default. This is incompatible with memguard, which frees the backing page when the item is freed. Modify the UMA debug functions to be no-ops if the item was allocated from memguard. This also fixes constructors such as mb_ctor_pack(), which invokes the trash ctor in addition to performing some initialization. Reviewed by: glebius MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D6562	2016-06-01 22:31:35 +00:00
Konstantin Belousov	e5f0191f20	If the fast path unbusy in vm_page_replace() fails, slow path needs to acquire the page lock, which recurses. Avoid the recursion by reusing the code from vm_page_remove() in a new helper vm_page_xunbusy_maybelocked(). Reviewed by: alc Sponsored by: The FreeBSD Foundation	2016-06-01 20:39:00 +00:00
Konstantin Belousov	9f790a1756	Do not leak the vm object lock when swap reservation failed, in vm_object_coalesce(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-29 15:46:19 +00:00
Alan Cox	56ce06907c	The flag "vm_pages_needed" has long served two distinct purposes: (1) to indicate that threads are waiting for free pages to become available and (2) to indicate whether a wakeup call has been sent to the page daemon. The trouble is that a single flag cannot really serve both purposes, because we have two distinct targets for when to wakeup threads waiting for free pages versus when the page daemon has completed its work. In particular, the flag will be cleared by vm_page_free() before the page daemon has met its target, and this can lead to the OOM killer being invoked prematurely. To address this problem, a new flag "vm_pageout_wanted" is introduced. Discussed with: jeff Reviewed by: kib, markj Tested by: markj Sponsored by: EMC / Isilon Storage Division	2016-05-27 19:15:45 +00:00
Alan Cox	bccdea450b	Use vm_page_replace_checked() instead of vm_page_rename() for implementing optimized copy-on-write faults. This has two advantages: (1) one less radix tree operation is performed and (2) vm_page_replace_checked() cannot fail, making the code simpler. Submitted by: Ryan Libby Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4478	2016-05-27 06:05:12 +00:00
Konstantin Belousov	aa9bc3b171	Prevent parallel object collapses. Both vm_object_collapse_scan() and swap_pager_copy() might unlock the object, which allows the parallel collapse to execute. Besides destroying the object, it also might move the reference from parent to the backing object, firing the assertion ref_count == 1. Collapses are prevented by bumping paging_in_progress counters on both the object and its backing object. Reported by: cem Tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D6085	2016-05-26 16:59:29 +00:00
Konstantin Belousov	98f139daef	Style changes to some most outrageous violations in vm_object_collapse(). Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-26 16:51:38 +00:00
Konstantin Belousov	0e38422096	In vm_page_cache(), only drop the vnode after radix insert failure for empty page cache when the object type if OBJT_VNODE. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-24 19:20:30 +00:00
Konstantin Belousov	30a8a5f7a6	In vm_page_alloc_contig(), on vm_page_insert() failure, mark each freed page as VPO_UNMANAGED. Otherwise vm_pge_free_toq() insists on owning the page lock. Previously, VPO_UNMANAGED was only set up to the last processed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-24 10:21:39 +00:00

1 2 3 4 5 ...

3515 Commits