Both vm_object_scan_all_shadowed() and vm_object_collapse_scan() might
observe an invalid page left in the default backing object by the
fault handler that retried. Check for the condition and refuse to collapse.
Reported and tested by: pho
Reviewed by: jeff
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23331
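A sketch of the shape of such a check inside the backing-object scan
(vm_page_none_valid() is the existing helper; the surrounding loop and
locking are elided):

    /*
     * Sketch: a retried fault can leave an invalid page in the
     * default backing object.  Seeing one means the scan cannot
     * prove the page is shadowed, so refuse to collapse.
     */
    if (vm_page_none_valid(p))
            return (false);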
Add an inline function, vm_map_lookup_clip_start(), that invokes them
both, and use it in places that invoke both. Drop a couple of local
variables made unnecessary by this function.
Reviewed by: markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22987
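A hypothetical reconstruction of the helper; the real signature in
vm_map.c may differ, but vm_map_lookup_entry() and vm_map_clip_start()
are presumably the two primitives it wraps:

    /*
     * Sketch: find the entry containing 'start' and clip it so an
     * entry boundary falls exactly at 'start'; otherwise return the
     * first entry past 'start'.
     */
    static inline vm_map_entry_t
    vm_map_lookup_clip_start(vm_map_t map, vm_offset_t start)
    {
            vm_map_entry_t entry;

            if (vm_map_lookup_entry(map, start, &entry))
                    vm_map_clip_start(map, entry, start);
            else
                    entry = entry->next;
            return (entry);
    }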
A submap can only be created from an entry spanning the entire request
range. In particular, vm_map_lookup_entry() must return true and the
returned entry must contain "end".
Since the only use of submaps in FreeBSD is for the static pipe and
execve argument KVA maps, this has no functional effect.
Github PR: https://github.com/freebsd/freebsd/pull/420
Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> (original)
Reviewed by: dougm, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23299
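In code, the acceptance test is presumably of this shape (a sketch, not
the literal vm_map_submap() source):

    /* Sketch: only a single entry spanning [start, end) qualifies. */
    if (vm_map_lookup_entry(map, start, &entry) && entry->end >= end) {
            /* 'entry' spans the whole range; mark it as a submap. */
            result = KERN_SUCCESS;
    } else {
            result = KERN_INVALID_ARGUMENT;
    }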
Add a new VM return code, KERN_RESTART, which means: deallocate and
restart the fault.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23303
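A sketch of how a caller consumes the new code; the helper names below
are hypothetical stand-ins for the fault handler's internals:

    RetryFault:
            result = fault_attempt(&fs);            /* hypothetical */
            if (result == KERN_RESTART) {
                    /* Deallocate fault state and start over. */
                    fault_deallocate(&fs);          /* hypothetical */
                    goto RetryFault;
            }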
This additionally fixes a potential bug/pessimization where we could fail to
reload the original fault_type on restart.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23301
UMA zone structures have two arrays at the end which are sized according
to the machine: an array of CPU count length, and an array of NUMA
domain count length. The CPU counting was wrong in the case where some
CPUs are disabled (when mp_ncpus != mp_maxid + 1), and this caused the
second array to be overlaid with the first.
Reported by: olivier
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23318
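The fix presumably amounts to sizing the per-CPU array by the highest
CPU ID rather than the CPU count; a simplified sketch of the sizing
expression:

    /*
     * CPU IDs may be sparse when some CPUs are disabled, so the
     * per-CPU array needs mp_maxid + 1 slots, not mp_ncpus, or the
     * trailing per-domain array lands on top of it.
     */
    size = sizeof(struct uma_zone) +
        (mp_maxid + 1) * sizeof(struct uma_cache) +
        vm_ndomains * sizeof(struct uma_zone_domain);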
Previously UMA had some false negatives in the leak report at keg
destruction time, where it only reported leaks if there were free items
in the slab layer (rather than allocated items), which notably would not
be true for single-item slabs (large items). Now, report a leak if
there are any allocated pages, and calculate and report the number of
allocated items rather than free items.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23275
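The new accounting presumably reduces to arithmetic of this shape (keg
field names assumed; a sketch only):

    /* Report a leak whenever pages are still allocated to the keg. */
    if (keg->uk_pages > 0) {
            /* allocated items = slabs * items-per-slab - free items */
            items = (keg->uk_pages / keg->uk_ppera) * keg->uk_ipers -
                keg->uk_free;
            printf("Freed UMA keg (%s) was not empty (%d items). "
                "Lost %d pages of memory.\n",
                keg->uk_name, items, keg->uk_pages);
    }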
no longer need an object lock. This reduces the longest hold times and
eliminates some trylock code blocks.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23034
The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here, add some
missing paging-in-progress calls and an assert. The object handle is now
protected
explicitly with pip.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033
Make collapse synchronization more explicit and allow it to complete
during paging.
Shadow objects are marked with a COLLAPSING flag while they are
collapsing with their backing object. This gives us an explicit test
rather than overloading paging-in-progress. While a split is ongoing,
we mark an object with SPLIT. These two operations modify the swap tree,
so they must be serialized, and swap_pager_getpages() can now directly
detect these conditions and page more conservatively.
Callers of vm_object_collapse() will now reliably wait for a collapse to
finish so that the backing chain is as short as possible before other
decisions are made that may inflate the object chain, for example split,
coalesce, etc.
It is now safe to run fault concurrently with collapse. It is safe to increase
or decrease paging in progress with no lock so long as there is another valid
ref on increase.
This change makes collapse more reliable as a secondary benefit. The primary
benefit is making it safe to drop the object lock much earlier in fault or
never acquire it at all.
This was tested with a new shadow chain test script that uncovered long
standing bugs and will be integrated with stress2.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22908
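A simplified sketch of the serialization (flag name per the log; the
real code also coordinates with paging-in-progress and the swap pager):

    VM_OBJECT_WLOCK(object);
    vm_object_set_flag(object, OBJ_COLLAPSING);
    VM_OBJECT_WUNLOCK(object);

    /* ... migrate pages and swap metadata from backing_object ... */

    VM_OBJECT_WLOCK(object);
    vm_object_clear_flag(object, OBJ_COLLAPSING);
    wakeup(object);         /* waiters re-check the flag */
    VM_OBJECT_WUNLOCK(object);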
Some systems, such as higher-end Threadripper, may have
NUMA domains with no physical memory. Don't allocate
from these domains.
This fixes a "panic: vm_wait in early boot" on my 2990WX desktop.
Reviewed by: jeff
Sponsored by: Netflix
When we copy-on-write a page that was previously mapped read-only, it
exists in the pmap until pmap_enter() returns. However, we held no
reference to the original page after the copy was complete. This allowed
vm_object_scan_all_shadowed() to collapse an object that still had pages
mapped. To resolve this, add another page pointer to the faultstate so
we can keep the page xbusy until we're done with pmap_enter(). Handle
busy pages in vm_object_scan_all_shadowed(), as is already done in
vm_object_collapse_scan().
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23155
Use careful ordering to allocate early pages in the same way boot pages
were, but only as needed. After the KVA allocator has started up, we
allocate the KVA that
we consumed during boot. This also makes the boot pages freeable since they
have vm_page structures allocated with the rest of memory.
Parts of this patch were written and tested by markj.
Reviewed by: glebius, markj
Differential Revision: https://reviews.freebsd.org/D23102
r355004 removed the return statement from this loop with the intention
of also calling uma_reclaim_wakeup(). But in the case of
vm.lowmem_period=0 it causes an infinite loop.
Reviewed by: markj
Sponsored by: iXsystems, Inc.
By allowing more items per slab, we can improve memory efficiency for
small allocs. If we were just to increase the bitmap size of the
slabzone, we would then waste slabzone memory. So, split slabzone into
two zones, one especially for 8-byte allocs (512 per slab). The
practical effect should be reduced memory usage for counter(9).
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23149
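Selection between the two header zones might look like this; the names
and set sizes here are hypothetical illustrations of the split:

    static uma_zone_t slabzones[2];     /* assumed global pair */

    #define SLABZONE0_SETSIZE   64      /* ordinary items */
    #define SLABZONE1_SETSIZE   512     /* tiny (8-byte) items */

    static uma_zone_t
    slabzone(int ipers)
    {
            /* Pick the header zone whose bitset is large enough. */
            return (slabzones[ipers > SLABZONE0_SETSIZE]);
    }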
respectively. The tunable controls the size of the per-CPU vm page
cache. Previously the value was split among all CPUs in the system, so
configuring the same value on machines with different CPU counts yielded
different cache sizes available to a particular CPU.
Reviewed by: markj
Obtained from: Netflix
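Roughly, the change in interpretation (variable and tunable names below
are assumptions, not the literal source):

    /* Old: one global budget divided among all CPUs. */
    maxcache = pgcache_zone_max / mp_ncpus;

    /* New: the tunable names the per-CPU cache size directly. */
    maxcache = pgcache_zone_max_pcpu;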
Some kernel subsystems, notably ZFS, will destroy UMA zones from a
shutdown eventhandler. This causes the zone to be drained. For slabs
that are mapped into KVA this can be very expensive and so it needlessly
delays the shutdown process.
Add a new state to the "booted" variable, BOOT_SHUTDOWN. Once
kern_reboot() starts invoking shutdown handlers, turn uma_zdestroy()
into a no-op, provided that the zone does not have a custom finalization
routine.
PR: 242427
Reviewed by: jeff, kib, rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23066
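From the description, the early return is presumably shaped like this
(a sketch; the real uma_zdestroy() may test for a custom finalizer
differently):

    void
    uma_zdestroy(uma_zone_t zone)
    {
            /*
             * Draining mapped slabs during shutdown is wasted work,
             * so skip teardown unless a custom finalization routine
             * has to run.
             */
            if (booted == BOOT_SHUTDOWN && zone->uz_fini == NULL)
                    return;
            /* ... normal zone teardown ... */
    }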
Unify the keg layout selection paths (keg_small_init, keg_large_init,
keg_cachespread_init), and slightly improve memory efficiency by:
- using the padding of the final item to store the slab header,
- not going OFFPAGE if we have a choice unless it improves efficiency.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23048
- Garbage collect UMA_ZONE_PAGEABLE & UMA_ZONE_STATIC.
- Move flag VTOSLAB from public to private.
- Introduce public NOTPAGE flag and make HASH private.
- Introduce public NOTOUCH flag and make OFFPAGE private.
- Update man page.
The net effect of this should be to make the contract with clients more
clear. Clients should choose constraints, UMA will figure out how to
implement them. This also breaks the confusing double meaning of
OFFPAGE.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23016
MD_UMA_SMALL_ALLOC. This is unusual but not impossible. Fix the alignment
of zones while here. This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23046
Linux mmap rejects mmap() on a write-only file with EACCES.
linux_mmap_common currently does a fun dance to grab the fp associated with
the passed in fd, validates it, then drops the reference and calls into
kern_mmap(). Doing so is perhaps both fragile and premature; there's still
plenty of chance for the request to get rejected with a more appropriate
error, and it's prone to a race where the file we ultimately mmap has
changed after it drops its reference.
This change alleviates the need to do this by providing a kern_mmap variant
that allows the caller to inspect the fp just before calling into the fileop
layer. The callback takes flags, prot, and maxprot as one could imagine
scenarios where any of these, in conjunction with the file itself, may
influence a caller's decision.
The file type check in the linux compat layer has been removed; EINVAL is
seemingly not an appropriate response to the file not being a vnode or
device. The fileop layer will reject the operation with ENODEV if it's not
supported, which more closely matches the common linux description of
mmap(2) return values.
If we discover that we're allowing an mmap() on a file type that Linux
normally wouldn't, we should restrict those explicitly.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22977
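The callback might be shaped like this sketch (the typedef and function
names are assumptions based on the description):

    /* Inspect the fp just before the fileop layer is invoked. */
    typedef int (*mmap_check_fp_fn)(struct file *fp, int prot,
        int maxprot, int flags);

    static int
    linux_mmap_check_fp(struct file *fp, int prot, int maxprot,
        int flags)
    {
            /* Linux rejects mmap() of a write-only file with EACCES. */
            if ((fp->f_flag & FREAD) == 0)
                    return (EACCES);
            return (0);
    }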
UMA_MD_SMALL_ALLOC vmem has a more complicated startup sequence that
violated the new assert. Resolve this by rewriting the COLD asserts to
look at the per-cpu allocation counts for evidence of API activity.
Discussed with: rlibby
Reviewed by: markj
Reported by: lwhsu
more consistent with other NUMA features as UMA_ZONE_FIRSTTOUCH and
UMA_ZONE_ROUNDROBIN. The system will now select a default depending
on kernel configuration. API users need only specify one if they want to
override the default.
Remove the UMA_XDOMAIN and UMA_FIRSTTOUCH kernel options and key only off
of NUMA. XDOMAIN is now fast enough in all cases to enable whenever NUMA
is.
Reviewed by: markj
Discussed with: rlibby
Differential Revision: https://reviews.freebsd.org/D22831
onto their respective bucket lists. This is a several-order-of-magnitude
improvement in contention on the keg lock under heavy free traffic while
requiring only an additional bucket per-domain worth of memory.
Discussed with: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22830
accounting for each NUMA domain. Independent keg domain locks are important
with cross-domain frees. Hashed zones are non-NUMA and use a single keg
lock to protect the hash table.
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D22829
between populating buckets from the slab layer and fetching full buckets
from the zone layer. Eliminate some nonsense locking patterns where
we lock to fetch a single variable.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22828
sleepq to serialize sleepers. This patch retains the existing sleep/wakeup
paradigm to limit 'thundering herd' wakeups. It resolves a missing wakeup
in one case but otherwise should be bug-for-bug compatible. In particular,
there are still various races surrounding adjusting the limit via sysctl
that are now documented.
Discussed with: markj
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D22827
Filesystems which want to use it in a limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
The page daemon loops may move pages back to the active queue if
references are detected. In this case we must take care to clear
existing queue operation flags. In particular, PGA_REQUEUE_HEAD may be
set, and that flag is only valid if the page belongs to the inactive
queue.
Also fix a bug in the active queue scan where we were updating "old"
instead of "new". This would only have been hit in rare cases where the
page moved out of the active queue after the beginning of the scan.
Reported by: Bob Prohaska, Idwer Vollering
Tested by: Idwer Vollering
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D23001
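A sketch of the corrected update in the atomic page-astate style used by
this series (names approximate; PGA_QUEUE_OP_MASK stands for the set of
pending queue-operation flags):

    vm_page_astate_t old, new;

    old = vm_page_astate_load(m);
    do {
            new = old;
            /*
             * Clear pending queue-op flags on the new state, not the
             * old one; PGA_REQUEUE_HEAD is only valid while the page
             * is in PQ_INACTIVE.
             */
            new.flags &= ~PGA_QUEUE_OP_MASK;
            new.queue = PQ_ACTIVE;
    } while (!vm_page_astate_fcmpset(m, &old, new));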
entry in the vm_map, making invariants related to the max_free entry
field invalid. Move the clipping work into vm_map_entry_link, so that
linking is okay when the new entry clips a current entry, and the
vm_map doesn't have to be briefly corrupted. Change assertions and
conditions in SPLAY_{LEFT,RIGHT}_STEP since the max_free invariants
can now be trusted in all cases.
Tested by: pho
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D22897
We now set PGA_DEQUEUE on a managed page when it is wired after
allocation, and vm_page_mvqueue() ignores pages with this flag set,
ensuring that they do not end up in the page queues. However, this is
not sufficient for managed fictitious pages or pages managed by the
TTM. In particular, the TTM makes use of the plinks.q queue linkage
fields for its own purposes.
PR: 242961
Reported and tested by: Greg V <greg@unrelenting.technology>
This fixes a regression in r356155, introduced at the last minute. In
particular, we must clear PGA_REQUEUE_HEAD before inserting into any
queue besides PQ_INACTIVE since that operation is implemented only for
PQ_INACTIVE.
Reported by: pho, Jenkins via lwhsu
The previous series of patches orphaned some vm_page functions, so
remove them.
Reviewed by: dougm, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22886
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885