freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Mark Johnston	982693bb72	vm_fault: Shoot down multiply mapped COW source page mappings Reviewed by: kib, rlibby Discussed with: alc Approved by: so Security: CVE-2021-29626 Security: FreeBSD-SA-21:08.vm	2021-04-06 14:49:28 -04:00
Jason A. Harmening	8dc8feb53d	Clean up a couple of MD warts in vm_fault_populate(): --Eliminate a big ifdef that encompassed all currently-supported architectures except mips and powerpc32. This applied to the case in which we've allocated a superpage but the pager-populated range is insufficient for a superpage mapping. For platforms that don't support superpages the check should be inexpensive as we shouldn't get a superpage in the first place. Make the normal-page fallback logic identical for all platforms and provide a simple implementation of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc. --Apply the logic for handling pmap_enter() failure if a superpage mapping can't be supported due to additional protection policy. Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case, and note Intel PKU on amd64 as the first example of such protection policy. Reviewed by: kib, markj, bdragon Differential Revision: https://reviews.freebsd.org/D29439	2021-03-30 18:15:55 -07:00
Konstantin Belousov	c7b913aa47	vm_fault: handle KERN_PROTECTION_FAILURE pmap_enter(PMAP_ENTER_LARGEPAGE) may return KERN_PROTECTION_FAILURE due to PKRU inconsistency. Handle it in the call place from vm_fault_populate(), and in places which decode errors from vm_fault_populate()/ vm_fault_allocate(). Reviewed by: jah, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29442	2021-03-27 20:16:27 +02:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Leandro Lupori	e2d6c417e3	Implement superpages for PowerPC64 (HPT) This change adds support for transparent superpages for PowerPC64 systems using Hashed Page Tables (HPT). All pmap operations are supported. The changes were inspired by RISC-V implementation of superpages, by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture and existing MMU OEA64 code. While these changes are not better tested, superpages support is disabled by default. To enable it, use vm.pmap.superpages_enabled=1. In this initial implementation, when superpages are disabled, system performance stays at the same level as without these changes. When superpages are enabled, buildworld time increases a bit (~2%). However, for workloads that put a heavy pressure on the TLB the performance boost is much bigger (see HPC Challenge and pgbench on D25237). Reviewed by: jhibbits Sponsored by: Eldorado Research Institute (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D25237	2020-11-06 14:12:45 +00:00
Mark Johnston	f31695cc64	Implement sparse core dumps Currently we allocate and map zero-filled anonymous pages when dumping core. This can result in lots of needless disk I/O and page allocations. This change tries to make the core dumper more clever and represent unbacked ranges of virtual memory by holes in the core dump file. Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to clean up and return an error when it would otherwise map a zero-filled page. Then, in the core dumper code, prefault all user pages and handle errors by simply extending the size of the core file. This also fixes a bug related to the fact that vn_io_fault1() does not attempt partial I/O in the face of errors from vm_fault_quick_hold_pages(): if a truncated file is mapped into a user process, an attempt to dump beyond the end of the file results in an error, but this means that valid pages immediately preceding the end of the file might not have been dumped either. The change reduces the core dump size of trivial programs by a factor of ten simply by excluding unaccessed libc.so pages. PR: 249067 Reviewed by: kib Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26590	2020-10-02 17:50:22 +00:00
Mark Johnston	78257765f2	Add a vmparam.h constant indicating pmap support for large pages. Enable SHM_LARGEPAGE support on arm64. Reviewed by: alc, kib Sponsored by: Juniper Networks, Inc., Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26467	2020-09-23 19:34:21 +00:00
Konstantin Belousov	d301b3580f	Support for userspace non-transparent superpages (largepages). Created with shm_open2(SHM_LARGEPAGE) and then configured with FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee that all userspace mappings of it are served by superpage non-managed mappings. Only amd64 for now, both 2M and 1G superpages can be requested, the later requires CPU feature. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652	2020-09-09 22:12:51 +00:00
Mateusz Guzik	c3aa3bf97c	vm: clean up empty lines in .c and .h files	2020-09-01 21:20:45 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mark Johnston	0f1e6ec591	Add a helper function for validating VA ranges. Functions which take untrusted user ranges must validate against the bounds of the map, and also check for wraparound. Instead of having the same logic duplicated in a number of places, add a function to check. Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25328	2020-06-19 03:32:04 +00:00
Konstantin Belousov	fe0dcc402f	Simplify the condition to enable superpage mappings in vm_fault_soft_fast(). The list of arches list there matches the list of arches where default VM_NRESERVLEVEL > 0. Before sparc64 removal, that was the only arch that defined VM_NRESERVLEVEL > 0 to help with cache coloring, but did not implemented superpages. Now it can be simplified. Submitted by: alc Reviewed by: markj	2020-05-27 21:44:26 +00:00
Justin Hibbits	d4ed51f329	Properly sort ifdef archs in vm_fault_soft_fast superpage guards. Sort broken in r360887.	2020-05-27 01:35:46 +00:00
Justin Hibbits	65bbba25d2	powerpc64: Implement Radix MMU for POWER9 CPUs Summary: POWER9 supports two MMU formats: traditional hashed page tables, and Radix page tables, similar to what's presesnt on most other architectures. The PowerISA also specifies a process table -- a table of page table pointers-- which on the POWER9 is only available with the Radix MMU, so we can take advantage of it with the Radix MMU driver. Written by Matt Macy. Differential Revision: https://reviews.freebsd.org/D19516	2020-05-11 02:33:37 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Konstantin Belousov	b70f6e1513	Restore OOM logic on page fault after r357026. Right now OOM is initiated unconditionally on the page allocation failure, after the wait. Reported by: Mark Millard <marklmi@yahoo.com> Reviewed by: cy, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23409	2020-01-29 12:02:47 +00:00
Jeff Roberson	fb4d37eac1	(fault 9/9) Move zero fill into a dedicated function to make the object lock state more clear. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23326	2020-01-23 05:23:37 +00:00
Jeff Roberson	be9d4fd6b4	(fault 8/9) Restructure some code to reduce duplication and simplify flow control. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23321	2020-01-23 05:22:02 +00:00
Jeff Roberson	df794f5caf	(fault 7/9) Move fault population and allocation into a dedicated function Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23320	2020-01-23 05:19:39 +00:00
Jeff Roberson	5909dafea9	(fault 6/9) Move getpages and associated logic into a dedicated function. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23311	2020-01-23 05:18:00 +00:00
Jeff Roberson	91eb2e908f	(fault 5/9) Move the backing_object traversal into a dedicated function. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23310	2020-01-23 05:14:41 +00:00
Jeff Roberson	5936b6a8f1	(fault 4/9) Move copy-on-write into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23304	2020-01-23 05:11:01 +00:00
Jeff Roberson	fcb0475833	(fault 3/9) Move map relookup into a dedicated function. Add a new VM return code KERN_RESTART which means, deallocate and restart in fault. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23303	2020-01-23 05:07:01 +00:00
Jeff Roberson	c308a3a6c9	(fault 2/9) Move map lookup into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23302	2020-01-23 05:05:39 +00:00
Jeff Roberson	2c2f4413cc	(fault 1/9) Move a handful of stack variables into the faultstate. This additionally fixes a potential bug/pessimization where we could fail to reload the original fault_type on restart. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23301	2020-01-23 05:03:34 +00:00
Jeff Roberson	5949b1ca8c	Move readahead and dropbehind fault functionality into a helper routine for clarity. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23282	2020-01-21 00:12:57 +00:00
Jeff Roberson	1e40fe41c5	Reduce object locking in vm_fault. Once we have an exclusively busied page we no longer need an object lock. This reduces the longest hold times and eliminates some trylock code blocks. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23034	2020-01-20 22:49:52 +00:00
Jeff Roberson	d6e13f3b4d	Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033	2020-01-19 23:47:32 +00:00
Jeff Roberson	5844774900	Fix a long standing bug that was made worse in r355765. When we are cowing a page that was previously mapped read-only it exists in pmap until pmap_enter() returns. However, we held no reference to the original page after the copy was complete. This allowed vm_object_scan_all_shadowed() to collapse an object that still had pages mapped. To resolve this, add another page pointer to the faultstate so we can keep the page xbusy until we're done with pmap_enter(). Handle busy pages in scan_all_shadowed. This is already done in vm_object_collapse_scan(). Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23155	2020-01-17 03:44:04 +00:00
Mark Johnston	9f5632e6c8	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
Jeff Roberson	7e1b379e1e	Don't unnecessarily relock the vm object after sleeps. This results in a surprising amount of object contention on loop restarts in fault. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22821	2019-12-24 18:38:06 +00:00
Jeff Roberson	419f0b1f95	Fix a bug introduced in r356002. Prior versions of this patchset had vm_page_remove() rather than !vm_page_wired() as the condition for free. When this changed back to wired the busy lock was leaked. Reported by: pho Reviewed by: markj	2019-12-22 20:35:50 +00:00
Jeff Roberson	3cf3b4e641	Make page busy state deterministic on free. Pages must be xbusy when removed from objects including calls to free. Pages must not be xbusy when freed and not on an object. Strengthen assertions to match these expectations. In practice very little code had to change busy handling to meet these rules but we can now make stronger guarantees to busy holders and avoid conditionally dropping busy in free. Refine vm_page_remove() and vm_page_replace() semantics now that we have stronger guarantees about busy state. This removes redundant and potentially problematic code that has proliferated. Discussed with: markj Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22822	2019-12-22 06:56:44 +00:00
Jeff Roberson	bef91632da	Move vm_fault busy logic into its own function for clarity and re-use by later changes. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22820	2019-12-22 04:21:16 +00:00
Jeff Roberson	4bf95d00ce	Previously we did not support invalid pages in default objects. This means that if fault fails to progress and needs to restart the loop it must free the page it is working on and allocate again on restart. Resolve the few places that need to be modified to support this condition and simply deactivate the page. Presently, we only permit this when fault restarts for busy contention. This has an added benefit of removing some object trylocking in this case. While here consolidate some page cleanup logic into fault_page_free() and fault_page_release() to reduce redundant code and automate some teardown. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22653	2019-12-15 04:08:24 +00:00
Jeff Roberson	a808177864	Add a deferred free mechanism for freeing swap space that does not require an exclusive object lock. Previously swap space was freed on a best effort basis when a page that had valid swap was dirtied, thus invalidating the swap copy. This may be done inconsistently and requires the object lock which is not always convenient. Instead, track when swap space is present. The first dirty is responsible for deleting space or setting PGA_SWAP_FREE which will trigger background scans to free the swap space. Simplify the locking in vm_fault_dirty() now that we can reliably identify the first dirty. Discussed with: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22654	2019-12-15 03:15:06 +00:00
Konstantin Belousov	67388836f3	Store the bottom of the shadow chain in OBJ_ANON object->handle member. The handle value is stable for all shadow objects in the inheritance chain. This allows to avoid descending the shadow chain to get to the bottom of it in vm_map_entry_set_vnode_text(), and eliminate corresponding object relocking which appeared to be contending. Change vm_object_allocate_anon() and vm_object_shadow() to handle more of the cred/charge initialization for the new shadow object, in addition to set up the handle. Reported by: jeff Reviewed by: alc (previous version), jeff (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differrential revision: https://reviews.freebsd.org/D22541	2019-12-01 20:43:04 +00:00
Jeff Roberson	639676877b	Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates reudundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119	2019-11-19 23:19:43 +00:00
Mark Johnston	be801aaaef	Fix a race in release_page(). Since r354156 we may call release_page() without the page's object lock held, specifically following the page copy during a CoW fault. release_page() must therefore unbusy the page only after scheduling the requeue, to avoid racing with a free of the page. Previously, the object lock prevented this race from occurring. Add some assertions that were helpful in tracking this down. Reported by: pho, syzkaller Tested by: pho Reviewed by: alc, jeff, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22234	2019-11-06 16:59:16 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Jeff Roberson	4b3e066539	Drop the object lock earlier in fault and don't relock it after pmap_enter(). Recent changes in object and page locking have enabled more lock pushdown. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22036	2019-10-29 20:46:25 +00:00
Mark Johnston	be2c561003	Modify release_page() to handle a missing fault page. r353890 introduced a case where we may call release_page() with fs.m == NULL, since the fault handler may now lock the vnode prior to allocating a page for a page-in. Reported by: jhb Reviewed by: kib MFC with: r353890 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22120	2019-10-23 20:39:21 +00:00
Konstantin Belousov	16b0c09225	Assert that vm_fault_lock_vnode() returns locked saved vnode. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22113	2019-10-23 07:36:26 +00:00
Konstantin Belousov	208b81bb05	Add VV_VMSIZEVNLOCK flag. The flag specifies that vm_fault() handler should check the vnode' vm_object size under the vnode lock. It is converted into the object' OBJ_SIZEVNLOCK flag in vnode_pager_alloc(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:09:25 +00:00
Konstantin Belousov	0ddd3082a4	vm_fault(): extract code to lock the vnode into a helper vn_fault_lock_vnode(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 15:59:16 +00:00
Jeff Roberson	fff5403f84	(5/6) Move the VPO_NOSYNC to PGA_NOSYNC to eliminate the dependency on the object lock in vm_page_set_validclean(). Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21595	2019-10-15 03:48:22 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Jeff Roberson	205be21d99	(3/6) Add a shared object busy synchronization mechanism that blocks new page busy acquires while held. This allows code that would need to acquire and release a very large number of page busy locks to use the old mechanism where busy is only checked and not held. This comes at the cost of false positives but never false negatives which the single consumer, vm_fault_soft_fast(), handles. Reviewed by: kib Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21592	2019-10-15 03:41:36 +00:00
Jeff Roberson	8da1c09853	(2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep(). This persists busy state across operations like rename and replace. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21549	2019-10-15 03:38:02 +00:00

1 2 3 4 5 ...

472 Commits