freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	100949103a	uma: Add KMSAN hooks For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives. Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example: panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe Sponsored by: The FreeBSD Foundation	2021-08-10 21:27:54 -04:00
Mark Johnston	8978608832	amd64: Populate the KMSAN shadow maps and integrate with the VM - During boot, allocate PDP pages for the shadow maps. The region above KERNBASE is currently not shadowed. - Create a dummy shadow for the vm page array. For now, this array is not protected by the shadow map to help reduce kernel memory usage. - Grow shadows when growing the kernel map. - Increase the default kernel stack size when KMSAN is enabled. As with KASAN, sanitizer instrumentation appears to create stack frames large enough that the default value is not sufficient. - Disable UMA's use of the direct map when KMSAN is configured. KMSAN cannot validate the direct map. - Disable unmapped I/O when KMSAN is configured. - Lower the limit on paging buffers when KMSAN is configured. Each buffer has a static MAXPHYS-sized allocation of KVA, which in turn eats 2*MAXPHYS of space in the shadow map. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31295	2021-08-10 21:27:53 -04:00
Ka Ho Ng	de2e152959	Add vnode_pager_purge_range(9) KPI This KPI is created in addition to the existing vnode_pager_setsize(9) KPI. The KPI is intended for file systems that are able to turn a range of file into sparse range, also known as hole-punching. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27194	2021-08-05 22:52:26 +08:00
Konstantin Belousov	0ef5eee9d9	Add vn_lktype_write() and remove repetitive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405	2021-08-04 19:40:13 +03:00
Konstantin Belousov	041b7317f7	Add pmap_vm_page_alloc_check() which is the place to put MD asserts about allocated pages. On amd64, verify that allocated page does not belong to the kernel (text, data) or early allocated pages. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31121	2021-07-31 16:53:42 +03:00
Mark Johnston	4e8e26a004	redzone: Raise a compile error if KASAN is configured redzone(9) does some munging of the allocation to insert redzones before and after a valid memory buffer, but KASAN does not know about this and will raise false positives if both are configured. Until this is fixed, do not allow both to be configured. Note that KASAN provides similar checking on its own but currently does not force the creation of redzones for all UMA allocations; this should be addressed as well. Sponsored by: The FreeBSD Foundation	2021-07-23 10:47:13 -04:00
Mark Johnston	b0dfc48684	uma: Fix a few problems with KASAN integration - Ensure that all items returned by UMA are aligned to KASAN_SHADOW_SCALE (8). This was true in practice since smaller alignments are not used by any consumers, but we should enforce it anyway. - Use a non-zero code for marking redzones that appear naturally in items that are not a multiple of the scale factor in size. Currently we do not modify keg layouts to force the creation of redzones. - Use a non-zero code for marking freed per-CPU items, otherwise accesses of freed per-CPU items are not detected by the runtime. Sponsored by: The FreeBSD Foundation	2021-07-09 20:38:50 -04:00
Konstantin Belousov	5b10e79edb	Un-staticise vm_page_init_page() Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30785	2021-06-17 16:58:44 +03:00
Mateusz Guzik	128e25842e	vm: add another pager private flag Move OBJ_SHADOWLIST around to let pager flags be next to each other. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30258	2021-05-15 20:47:29 +00:00
Konstantin Belousov	28bc23ab92	tmpfs: dynamically register tmpfs pager Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into tmpfs_subr.c. There is no longer any code to directly support tmpfs in sys/vm, most tmpfs knowledge is shared by non-anon swap object type implementation. The tmpfs-specific methods are provided by registered tmpfs pager, which inherits from the swap pager. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:13:34 +03:00
Konstantin Belousov	b730fd30b7	vm: Add KPI to dynamically register pagers Pager is allowed to inherit part of its implementation from the existing pager, which is done by copying non-NULL virtual method slots. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:12:29 +03:00
Konstantin Belousov	7079449b0b	sys/vm: remove several other uses of OBJT_SWAP_TMPFS Mostly in cases where OBJ_SWAP flag works as well, or by reversing the condition so that object types can be listed. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Konstantin Belousov	3e7a11ca21	vm_object_set_memattr(): handle all object types without listing them explicitly This avoids the need to know all existing object types in advance, at the cost of losing the assert that unknown object type is handled in a sane manner. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Konstantin Belousov	00a3fe968b	vm_object_kvme_type(): reimplement by embedding kvme_type into pagerops Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Mark Johnston	9246b3090c	fork: Suspend other threads if both RFPROC and RFMEM are not set Otherwise, a multithreaded parent process may trigger races in vm_forkproc() if one thread calls rfork() with RFMEM set and another calls rfork() without RFMEM. Also simplify vm_forkproc() a bit, vmspace_unshare() already checks to see if the address space is shared. Reported by: syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com Reported by: syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30220	2021-05-13 08:33:23 -04:00
Mark Johnston	06d1fd9f42	swap_pager: Zero swap info before exporting to userspace Otherwise padding bytes are leaked. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-05-12 12:52:05 -04:00
Konstantin Belousov	d474440ab3	Constify vm_pager-related virtual tables. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager and generic vm code have to explicitly handle swap objects which are tmpfs vnode v_object, in the special ways. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	0d2dfc6fed	pagertab: use designated initializers Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	838adc533f	Style enum obj_type Put each type into dedicated line, which makes addition of new types cleaner. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	a7c198a24b	Implement vm_object_vnode() using vm_pager_getvp() Allow vp_heldp argument to be NULL, in which case the returned vnode is not held for tmpfs swap objects. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	1390a5cbeb	Add pgo_freespace method Makes the code in vm_object collapse/page_remove cleaner Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	192112b74f	Add pgo_getvp method This eliminates the staircase of conditions in vm_map_entry_set_vnode_text(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	c23c555bc1	Add pgo_mightbedirty method Used to implement vm_object_mightbedirty() Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	180bcaa46c	vm_pager: add pgo_set_writeable_dirty method specialized for swap and vnode pagers, and used to implement vm_object_set_writeable_dirty(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Konstantin Belousov	ee4211bca6	vm_pager: style some wrappers Fill lines with the function definitions. Use local var to shorten repeated extra-long expressions. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:02 +03:00
Konstantin Belousov	a0850dd057	swappagerops: slightly more style-compliant formatting Remove excess spaces from comments. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:02 +03:00
Mark Johnston	9a7c2de364	realloc: Fix KASAN(9) shadow map updates When copying from the old buffer to the new buffer, we don't know the requested size of the old allocation, but only the size of the allocation provided by UMA. This value is "alloc". Because the copy may access bytes in the old allocation's red zone, we must mark the full allocation valid in the shadow map. Do so using the correct size. Reported by: kp Tested by: kp Sponsored by: The FreeBSD Foundation	2021-05-05 17:12:51 -04:00
Alexander Motin	2760658b21	Improve UMA cache reclamation. When estimating working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, ZFS on vm_lowmem event can free to UMA few gigabytes of memory in one call, but it does not mean it will request the same amount back that fast too, in fact it won't. Update working set size on every reclamation call, shrinking caches faster under pressure. Lack of this caused repeating vm_lowmem events squeezing more and more memory out of real consumers only to make it stuck in UMA caches. I saw ZFS drop ARC size in half before previous algorithm after periodic WSS update decided to reclaim UMA caches. Introduce voluntary reclamation of UMA caches not used for a long time. For each zdom track longterm minimal cache size watermark, freeing some unused items every UMA_TIMEOUT after first 15 minutes without cache misses. Freed memory can get better use by other consumers. For example, ZFS won't grow its ARC unless it sees free memory, since it does not know it is not really used. And even if memory is not really needed, periodic free during inactivity periods should reduce its fragmentation. Reviewed by: markj, jeff (previous version) MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D29790	2021-05-02 19:45:23 -04:00
Konstantin Belousov	ecfbddf0cd	sysctl vm.objects: report backing object and swap use For anonymous objects, provide a handle kvo_me naming the object, and report the handle of the backing object. This allows userspace to deconstruct the shadow chain. Right now the handle is the address of the object in KVA, but this is not guaranteed. For the same anonymous objects, report the swap space used for actually swapped out pages, in kvo_swapped field. I do not believe that it is useful to report full 64bit counter there, so only uint32_t value is returned, clamped to the max. For kinfo_vmentry, report anonymous object handle backing the entry, so that the shadow chain for the specific mapping can be deconstructed. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29771	2021-04-19 21:32:01 +03:00
Mark Johnston	aabe13f145	uma: Introduce per-domain reclamation functions Make it possible to reclaim items from a specific NUMA domain. - Add uma_zone_reclaim_domain() and uma_reclaim_domain(). - Permit parallel reclamations. Use a counter instead of a flag to synchronize with zone_dtor(). - Use the zone lock to protect cache_shrink() now that parallel reclaims can happen. - Add a sysctl that can be used to trigger reclamation from a specific domain. Currently the new KPIs are unused, so there should be no functional change. Reviewed by: mav MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29685	2021-04-14 13:03:34 -04:00
Mark Johnston	54f421f9e8	uma: Split bucket_cache_drain() to permit per-domain reclamation Note that the per-domain variant does not shrink the target bucket size. No functional change intended. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-04-14 13:03:34 -04:00
Mark Johnston	2b914b85dd	kmem: Add KASAN state transitions Memory allocated with kmem_* is unmapped upon free, so KASAN doesn't provide a lot of benefit, but since allocations are always a multiple of the page size we can create a redzone when the allocation request size is not a multiple of the page size. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29458	2021-04-13 17:42:21 -04:00
Mark Johnston	244f3ec642	kstack: Add KASAN state transitions We allocate kernel stacks using a UMA cache zone. Cache zones have KASAN disabled by default, but in this case it makes sense to enable it. Reviewed by: andrew MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29457	2021-04-13 17:42:21 -04:00
Mark Johnston	09c8cb717d	uma: Add KASAN state transitions - Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular zone should not be sanitized. This is applied implicitly for NOFREE and cache zones. - Add KASAN callbacks which get invoked: 1) when a slab is imported into a keg 2) when an item is allocated from a zone 3) when an item is freed to a zone 4) when a slab is freed back to the VM In state transitions 1 and 3, memory is poisoned so that accesses will trigger a panic. In state transitions 2 and 4, memory is marked valid. - Disable trashing if KASAN is enabled. It just adds extra CPU overhead to catch problems that are detected by KASAN. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29456	2021-04-13 17:42:21 -04:00
Mark Johnston	982693bb72	vm_fault: Shoot down multiply mapped COW source page mappings Reviewed by: kib, rlibby Discussed with: alc Approved by: so Security: CVE-2021-29626 Security: FreeBSD-SA-21:08.vm	2021-04-06 14:49:28 -04:00
Konstantin Belousov	89619b747b	Add sysctl debug.uma_reclaim Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-04-04 20:39:06 +03:00
Konstantin Belousov	51a7be5f60	Style Sponsored by: The FreeBSD Foundation MFC after: 3 days	2021-04-04 20:39:06 +03:00
Jason A. Harmening	8dc8feb53d	Clean up a couple of MD warts in vm_fault_populate(): --Eliminate a big ifdef that encompassed all currently-supported architectures except mips and powerpc32. This applied to the case in which we've allocated a superpage but the pager-populated range is insufficient for a superpage mapping. For platforms that don't support superpages the check should be inexpensive as we shouldn't get a superpage in the first place. Make the normal-page fallback logic identical for all platforms and provide a simple implementation of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc. --Apply the logic for handling pmap_enter() failure if a superpage mapping can't be supported due to additional protection policy. Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case, and note Intel PKU on amd64 as the first example of such protection policy. Reviewed by: kib, markj, bdragon Differential Revision: https://reviews.freebsd.org/D29439	2021-03-30 18:15:55 -07:00
Konstantin Belousov	c7b913aa47	vm_fault: handle KERN_PROTECTION_FAILURE pmap_enter(PMAP_ENTER_LARGEPAGE) may return KERN_PROTECTION_FAILURE due to PKRU inconsistency. Handle it in the call place from vm_fault_populate(), and in places which decode errors from vm_fault_populate()/ vm_fault_allocate(). Reviewed by: jah, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29442	2021-03-27 20:16:27 +02:00
Bryan Drewery	a771bf748f	Remove unused obj variable missed in r354870. Sponsored by: Dell EMC	2021-03-17 15:29:15 -07:00
Kristof Provost	b8f7267d49	uma: allow uma_zfree_pcpu(..., NULL) We already allow free(NULL) and uma_zfree(..., NULL). Make uma_zfree_pcpu(..., NULL) work as well. This also means that counter_u64_free(NULL) will work. These make cleanup code simpler. MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29189	2021-03-12 12:12:35 +01:00
Mark Johnston	968079f253	vm_reserv: Fix list locking in vm_reserv_reclaim_contig() The per-domain partpop queue is locked by the combination of the per-domain lock and individual reservation mutexes. vm_reserv_reclaim_contig() scans the queue looking for partially populated reservations that can be reclaimed in order to satisfy the caller's allocation. During the scan, we drop the per-domain lock. At this point, the rvn pointer may be invalidated. Take care to load rvn after re-acquiring the per-domain lock. While here, simplify the condition used to check whether a reservation was dequeued while the per-domain lock was dropped. Reviewed by: alc, kib Reported by: gallatin MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29203	2021-03-11 10:35:35 -05:00
Mark Johnston	0401989282	vm: Round up npages and alignment for contig reclamation When searching for runs to reclaim, we need to ensure that the entire run will be added to the buddy allocator as a single unit. Otherwise, it will not be visible to vm_phys_alloc_contig() as it is currently implemented. This is a problem for allocation requests that are not a power of 2 in size, as with 9KB jumbo mbuf clusters. Reported by: alc Reviewed by: alc MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28924	2021-03-02 10:21:02 -05:00
Max Laier	14b5a3c7d5	vm pqbatch: move unmanaged page assert under pagequeue lock This KASSERT is overzealous because of the following race condition: 1) A managed page which is currently in PQ_LAUNDRY is freed. vm_page_free_prep calls vm_page_dequeue_deferred() The page state is: PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED 2) The laundry worker comes around and picks up the page and calls vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the queue. We do a vm_page_astate_load and get PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED as per above. 3) The laundry worker is pre-empted and another thread allocates our page from the free pool. For example vm_page_alloc_domain_after calls vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for an OBJT_UNMANAGED object. The page state is: PQ_NONE, 0 - VPO_UNMANAGED 4) The laundry worker resumes, and processes vm_pageout_defer based on the stale astate which leads to a call to vm_page_pqbatch_submit, which will trip on the KASSERT. Submitted by: mlaier Reviewed by: markj, rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D28563	2021-02-24 15:56:16 -08:00
Mark Johnston	537f92cd35	uma: Update the comment above startup_alloc() to reflect reality The scheme used for early slab allocations changed in commit `a81c400e75`. Reported by: alc Reviewed by: alc MFC after: 1 week	2021-02-22 18:22:51 -05:00
Mark Johnston	23e875fd97	vm_kern: Avoid sign extension in the KVA_QUANTUM definition Otherwise, on a powerpc64 NUMA system with hashed page tables, the first-level superpage reservation size is large enough that the value of the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is negative and gets sign-extended when passed to vmem_set_import(). This results in a boot-time hang on such platforms. Reported by: bdragon MFC after: 3 days	2021-02-22 15:50:09 -05:00
Alex Richardson	fa2528ac64	Use atomic loads/stores when updating td->td_state KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569	2021-02-18 14:02:48 +00:00
John Baldwin	67932460c7	Add a VA_IS_CLEANMAP() macro. This macro returns true if a provided virtual address is contained in the kernel's clean submap. In CHERI kernels, the buffer cache and transient I/O map are allocated as separate regions. Abstracting this check reduces the diff relative to FreeBSD. It is perhaps slightly more readable as well. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D28710	2021-02-17 16:32:11 -08:00
Mark Johnston	5c18744ea9	vm: Honour the "noreuse" flag to vm_page_unwire_managed() This flag indicates that the page should be enqueued near the head of the inactive queue, skipping the LRU queue. It is used when unwiring pages from the buffer cache following direct I/O or after I/O when POSIX_FADV_NOREUSE or _DONTNEED advice was specified, or when sendfile(SF_NOCACHE) completes. For the direct I/O and sendfile cases we only enqueue the page if we decide not to free it, typically because it's mapped. Pass "noreuse" through to vm_page_release_toq() so that we actually honour the desired LRU policy for these scenarios. Reported by: bdrewery Reviewed by: alc, kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28555	2021-02-10 11:10:27 -05:00


4549 Commits