freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	0e71f4f77c	vm: add unlocked page lookup before trying vm_fault_soft_fast Shaves a read lock + tryupgrade trip most of the time. Stats from doing a kernel build (counters not present in the tree): vm.fault_soft_fast_ok: 262653 vm.fault_soft_fast_failed_other: 41 vm.fault_soft_fast_failed_no_page: 39595772 vm.fault_soft_fast_failed_page_busy: 1929 vm.fault_soft_fast_failed_page_invalid: 22183 Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D39268	2023-03-25 22:14:59 +00:00
Mateusz Guzik	0a310c94ee	vm: consistently prefix fault helpers with vm_fault_ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D39029	2023-03-13 11:00:28 +00:00
Mateusz Guzik	3c3a434f8e	vm: avoid lock upgrade if possible in vm_fault_next In my tests during buildkernel fs->m was always NULL at that stage. Note the change has no impact on vm obj contention during said workload. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D39027	2023-03-11 21:52:01 +00:00
Mateusz Guzik	fdb1dbb1cc	vm: read-locked fault handling for backing objects This is almost the simplest patch which manages to avoid write locking for backing objects, as a result mostly fixing vm object contention problems. What is not fixed: 1. cacheline ping pong due to read-locks 2. cacheline ping pong due to pip 3. cacheling ping pong due to object busying 4. write locking on first object On top of it the use of VM_OBJECT_UNLOCK instead of explicitly tracking the state is slower multithreaded that it needs to be, done for simplicity for the time being. Sample lock profiling results doing -j 104 buildkernel on tmpfs: before: 71446200 (rw:vmobject) 14689706 (sx:vm map (user)) 4166251 (rw:pmap pv list) 2799924 (spin mutex:turnstile chain) after: 19940411 (rw:vmobject) 8166012 (rw:pmap pv list) 6017608 (sx:vm map (user)) 1151416 (sleep mutex:pipe mutex) Reviewed by: kib Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D38964	2023-03-11 11:08:21 +00:00
Mateusz Guzik	73b951cd39	vm: move up object lock asserts in fault functions No functional changes. Reviewed by: kib Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D38964	2023-03-11 11:08:21 +00:00
Mark Johnston	e08302f649	vm_fault: Update a comment to reflect the removal of the default pager Fixes: `5d32157d4e` ("vm_object: Modify vm_object_allocate_anon() to return OBJT_SWAP objects") Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D38985	2023-03-09 11:15:49 -05:00
Mark Johnston	d099194818	vm_fault: Fix a race in vm_fault_soft_fast() When vm_fault_soft_fast() creates a mapping, it release the VM map lock before unbusying the top-level object. Without the map lock, however, nothing prevents the VM object from being deallocated while still busy. Fix the problem by unbusying the object before releasing the VM map lock. If vm_fault_soft_fast() fails to create a mapping, the VM map lock is not released, so those cases don't need to change. Reported by: syzkaller Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D38527	2023-02-13 16:35:47 -05:00
Konstantin Belousov	ec201dddfb	vm_pager: add method to veto page allocation Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37097	2022-12-09 14:15:37 +02:00
Mark Johnston	5c50e900ad	vm_fault: Shoot down shared mappings in vm_fault_copy_entry() As in vm_fault_cow(), it's possible, albeit rare, for multiple vm_maps to share a shadow object. When copying a page from a backing object into the shadow, all mappings of the source page must therefore be removed. Otherwise, future operations on the object tree may detect that the source page is fully shadowed and thus can be freed. Approved by: so Security: FreeBSD-SA-22:11.vm Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35635	2022-08-09 15:44:45 -04:00
Mark Johnston	0cb2610ee2	vm: Remove handling for OBJT_DEFAULT objects Now that OBJT_DEFAULT objects can't be instantiated, we can simplify checks of the form object->type == OBJT_DEFAULT \|\| (object->flags & OBJ_SWAP) != 0. No functional change intended. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35788	2022-07-17 07:09:48 -04:00
Mark Johnston	5d32157d4e	vm_object: Modify vm_object_allocate_anon() to return OBJT_SWAP objects With this change, OBJT_DEFAULT objects are no longer allocated. Instead, anonymous objects are always of type OBJT_SWAP and always have OBJ_SWAP set. Modify the page fault handler to check the swap block radix tree in places where it checked for objects of type OBJT_DEFAULT. In particular, there's no need to invoke getpages for an OBJT_SWAP object with no swap blocks assigned. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35785	2022-07-17 07:09:48 -04:00
Mark Johnston	b57be759d0	vm_fault: Fix some nits in vm_fault_copy_entry() - Correct the description (vm_fault_copy_entry() does not create a shadow object). - Move some initialization and assertions out of the scope of the object locks, when doing so makes sense. - Merge a pair of conditional blocks. - Use __unused when appropriate. No functional change intended. Reviewed by: alc MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-07-11 15:58:42 -04:00
Mark Johnston	1f88394b7f	vm_fault: Avoid unnecessary object relocking in vm_fault_copy_entry() Suggested by: alc Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35485	2022-06-14 18:19:07 -04:00
Mark Johnston	d0443e2b98	vm_fault: Fix a racy copy of page valid bits We do not hold the object lock or a page busy lock when copying src_m's validity state. Prior to commit `45d72c7d7f` we marked dst_m as fully valid. Use the source object's read lock to ensure that valid bits are not concurrently cleared. Reviewed by: alc, kib Fixes: `45d72c7d7f` ("vm_fault_copy_entry: accept invalid source pages.") MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35471	2022-06-14 18:18:09 -04:00
John Baldwin	40cbcb996c	vm_fault_dontneed: Inline value of variable used once in an assertion.	2022-04-13 16:08:19 -07:00
Peter Jeremy	9a89977bf6	kern: Fix typo in kassert message. - s/unepxected/unexpected/ MFC after: 3 days	2022-04-02 21:36:17 +11:00
Mark Johnston	88642d978a	vm_fault: Fix vm_fault_populate()'s handling of VM_FAULT_WIRE vm_map_wire() works by calling vm_fault(VM_FAULT_WIRE) on each page in the rage. (For largepage mappings, it calls vm_fault() once per large page.) A pager's populate method may return more than one page to be mapped. If VM_FAULT_WIRE is also specified, we'd wire each page in the run, not just the fault page. Consider an object with two pages mapped in a vm_map_entry, and suppose vm_map_wire() is called on the entry. Then, the first vm_fault() would allocate and wire both pages, and the second would encounter a valid page upon lookup and wire it again in the regular fault handler. So the second page is wired twice and will be leaked when the object is destroyed. Fix the problem by modify vm_fault_populate() to wire only the fault page. Also modify the error handler for pmap_enter(psind=1) to not test fs->wired, since it must be false. PR: 260347 Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33416	2021-12-14 15:10:46 -05:00
Mark Johnston	d47d3a94bb	vm_fault: Factor out per-object operations into vm_fault_object() No functional change intended. Obtained from: jeff (object_concurrency patches) Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33018	2021-11-24 14:02:56 -05:00
Mark Johnston	f1b642c255	vm_fault: Introduce a fault_status enum for internal return types Rather than overloading the meanings of the Mach statuses, introduce a new set for use internally in the fault code. This makes the control flow easier to follow and provides some extra error checking when a fault status variable is used in a switch statement. vm_fault_lookup() and vm_fault_relookup() continue to use Mach statuses for now, as there isn't much benefit to converting them and they effectively pass through a status from vm_map_lookup(). Obtained from: jeff (object_concurrency patches) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33017	2021-11-24 14:02:55 -05:00
Mark Johnston	45c09a74d6	vm_fault: Move nera into faultstate This makes it easier to factor out pieces of vm_fault(). No functional change intended. Obtained from: jeff (object_concurrency patches) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33016	2021-11-24 14:02:55 -05:00
Mark Johnston	87b646630c	vm_page: Consolidate page busy sleep mechanisms - Modify vm_page_busy_sleep() and vm_page_busy_sleep_unlocked() to take a VM_ALLOC_* flag indicating whether to sleep on shared-busy, and fix up callers. - Modify vm_page_busy_sleep() to return a status indicating whether the object lock was dropped, and fix up callers. - Convert callers of vm_page_sleep_if_busy() to use vm_page_busy_sleep() instead. - Remove vm_page_sleep_if_(x)busy(). No functional change intended. Obtained from: jeff (object_concurrency patches) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D32947	2021-11-15 13:01:30 -05:00
Mark Johnston	b801c79dda	vm_fault: Stop specifying VM_ALLOC_ZERO Now vm_page_alloc() and friends will unconditionally preserve PG_ZERO, so there is no point in setting this flag. Eliminate a local variable and add a comment explaining why we prioritize the allocation when the process is doomed. No functional change intended. Reviewed by: kib, alc Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32036	2021-10-19 21:22:56 -04:00
Konstantin Belousov	174aad047e	vm_fault: do not trigger OOM too early Wakeup in vm_waitpfault() does not mean that the thread would get the page on the next vm_page_alloc() call, other thread might steal the free page we were waiting for. On the other hand, this wakeup might come much earlier than just vm_pfault_oom_wait seconds, if the rate of the page reclamation is high enough. If wakeups come fast and we loose the allocation race enough times, OOM could be undeservably triggered much earlier than vm_pfault_oom_attempts x vm_pfault_oom_wait seconds. Fix it by not counting the number of sleeps, but measuring the time to th first allocation failure, and triggering OOM when it was older than oom_attempts x oom_wait seconds. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D32287	2021-10-08 12:24:46 +03:00
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Mark Johnston	982693bb72	vm_fault: Shoot down multiply mapped COW source page mappings Reviewed by: kib, rlibby Discussed with: alc Approved by: so Security: CVE-2021-29626 Security: FreeBSD-SA-21:08.vm	2021-04-06 14:49:28 -04:00
Jason A. Harmening	8dc8feb53d	Clean up a couple of MD warts in vm_fault_populate(): --Eliminate a big ifdef that encompassed all currently-supported architectures except mips and powerpc32. This applied to the case in which we've allocated a superpage but the pager-populated range is insufficient for a superpage mapping. For platforms that don't support superpages the check should be inexpensive as we shouldn't get a superpage in the first place. Make the normal-page fallback logic identical for all platforms and provide a simple implementation of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc. --Apply the logic for handling pmap_enter() failure if a superpage mapping can't be supported due to additional protection policy. Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case, and note Intel PKU on amd64 as the first example of such protection policy. Reviewed by: kib, markj, bdragon Differential Revision: https://reviews.freebsd.org/D29439	2021-03-30 18:15:55 -07:00
Konstantin Belousov	c7b913aa47	vm_fault: handle KERN_PROTECTION_FAILURE pmap_enter(PMAP_ENTER_LARGEPAGE) may return KERN_PROTECTION_FAILURE due to PKRU inconsistency. Handle it in the call place from vm_fault_populate(), and in places which decode errors from vm_fault_populate()/ vm_fault_allocate(). Reviewed by: jah, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29442	2021-03-27 20:16:27 +02:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Leandro Lupori	e2d6c417e3	Implement superpages for PowerPC64 (HPT) This change adds support for transparent superpages for PowerPC64 systems using Hashed Page Tables (HPT). All pmap operations are supported. The changes were inspired by RISC-V implementation of superpages, by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture and existing MMU OEA64 code. While these changes are not better tested, superpages support is disabled by default. To enable it, use vm.pmap.superpages_enabled=1. In this initial implementation, when superpages are disabled, system performance stays at the same level as without these changes. When superpages are enabled, buildworld time increases a bit (~2%). However, for workloads that put a heavy pressure on the TLB the performance boost is much bigger (see HPC Challenge and pgbench on D25237). Reviewed by: jhibbits Sponsored by: Eldorado Research Institute (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D25237	2020-11-06 14:12:45 +00:00
Mark Johnston	f31695cc64	Implement sparse core dumps Currently we allocate and map zero-filled anonymous pages when dumping core. This can result in lots of needless disk I/O and page allocations. This change tries to make the core dumper more clever and represent unbacked ranges of virtual memory by holes in the core dump file. Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to clean up and return an error when it would otherwise map a zero-filled page. Then, in the core dumper code, prefault all user pages and handle errors by simply extending the size of the core file. This also fixes a bug related to the fact that vn_io_fault1() does not attempt partial I/O in the face of errors from vm_fault_quick_hold_pages(): if a truncated file is mapped into a user process, an attempt to dump beyond the end of the file results in an error, but this means that valid pages immediately preceding the end of the file might not have been dumped either. The change reduces the core dump size of trivial programs by a factor of ten simply by excluding unaccessed libc.so pages. PR: 249067 Reviewed by: kib Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26590	2020-10-02 17:50:22 +00:00
Mark Johnston	78257765f2	Add a vmparam.h constant indicating pmap support for large pages. Enable SHM_LARGEPAGE support on arm64. Reviewed by: alc, kib Sponsored by: Juniper Networks, Inc., Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26467	2020-09-23 19:34:21 +00:00
Konstantin Belousov	d301b3580f	Support for userspace non-transparent superpages (largepages). Created with shm_open2(SHM_LARGEPAGE) and then configured with FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee that all userspace mappings of it are served by superpage non-managed mappings. Only amd64 for now, both 2M and 1G superpages can be requested, the later requires CPU feature. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652	2020-09-09 22:12:51 +00:00
Mateusz Guzik	c3aa3bf97c	vm: clean up empty lines in .c and .h files	2020-09-01 21:20:45 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mark Johnston	0f1e6ec591	Add a helper function for validating VA ranges. Functions which take untrusted user ranges must validate against the bounds of the map, and also check for wraparound. Instead of having the same logic duplicated in a number of places, add a function to check. Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25328	2020-06-19 03:32:04 +00:00
Konstantin Belousov	fe0dcc402f	Simplify the condition to enable superpage mappings in vm_fault_soft_fast(). The list of arches list there matches the list of arches where default VM_NRESERVLEVEL > 0. Before sparc64 removal, that was the only arch that defined VM_NRESERVLEVEL > 0 to help with cache coloring, but did not implemented superpages. Now it can be simplified. Submitted by: alc Reviewed by: markj	2020-05-27 21:44:26 +00:00
Justin Hibbits	d4ed51f329	Properly sort ifdef archs in vm_fault_soft_fast superpage guards. Sort broken in r360887.	2020-05-27 01:35:46 +00:00
Justin Hibbits	65bbba25d2	powerpc64: Implement Radix MMU for POWER9 CPUs Summary: POWER9 supports two MMU formats: traditional hashed page tables, and Radix page tables, similar to what's presesnt on most other architectures. The PowerISA also specifies a process table -- a table of page table pointers-- which on the POWER9 is only available with the Radix MMU, so we can take advantage of it with the Radix MMU driver. Written by Matt Macy. Differential Revision: https://reviews.freebsd.org/D19516	2020-05-11 02:33:37 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Konstantin Belousov	b70f6e1513	Restore OOM logic on page fault after r357026. Right now OOM is initiated unconditionally on the page allocation failure, after the wait. Reported by: Mark Millard <marklmi@yahoo.com> Reviewed by: cy, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23409	2020-01-29 12:02:47 +00:00
Jeff Roberson	fb4d37eac1	(fault 9/9) Move zero fill into a dedicated function to make the object lock state more clear. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23326	2020-01-23 05:23:37 +00:00
Jeff Roberson	be9d4fd6b4	(fault 8/9) Restructure some code to reduce duplication and simplify flow control. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23321	2020-01-23 05:22:02 +00:00
Jeff Roberson	df794f5caf	(fault 7/9) Move fault population and allocation into a dedicated function Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23320	2020-01-23 05:19:39 +00:00
Jeff Roberson	5909dafea9	(fault 6/9) Move getpages and associated logic into a dedicated function. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23311	2020-01-23 05:18:00 +00:00
Jeff Roberson	91eb2e908f	(fault 5/9) Move the backing_object traversal into a dedicated function. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23310	2020-01-23 05:14:41 +00:00
Jeff Roberson	5936b6a8f1	(fault 4/9) Move copy-on-write into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23304	2020-01-23 05:11:01 +00:00
Jeff Roberson	fcb0475833	(fault 3/9) Move map relookup into a dedicated function. Add a new VM return code KERN_RESTART which means, deallocate and restart in fault. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23303	2020-01-23 05:07:01 +00:00
Jeff Roberson	c308a3a6c9	(fault 2/9) Move map lookup into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23302	2020-01-23 05:05:39 +00:00
Jeff Roberson	2c2f4413cc	(fault 1/9) Move a handful of stack variables into the faultstate. This additionally fixes a potential bug/pessimization where we could fail to reload the original fault_type on restart. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23301	2020-01-23 05:03:34 +00:00
Jeff Roberson	5949b1ca8c	Move readahead and dropbehind fault functionality into a helper routine for clarity. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23282	2020-01-21 00:12:57 +00:00

1 2 3 4 5 ...

495 Commits