freebsd-dev

Author	SHA1	Message	Date
Bryan Drewery	a771bf748f	Remove unused obj variable missed in r354870. Sponsored by: Dell EMC	2021-03-17 15:29:15 -07:00
Kristof Provost	b8f7267d49	uma: allow uma_zfree_pcu(..., NULL) We already allow free(NULL) and uma_zfree(..., NULL). Make uma_zfree_pcpu(..., NULL) work as well. This also means that counter_u64_free(NULL) will work. These make cleanup code simpler. MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29189	2021-03-12 12:12:35 +01:00
Mark Johnston	968079f253	vm_reserv: Fix list locking in vm_reserv_reclaim_contig() The per-domain partpop queue is locked by the combination of the per-domain lock and individual reservation mutexes. vm_reserv_reclaim_contig() scans the queue looking for partially populated reservations that can be reclaimed in order to satisfy the caller's allocation. During the scan, we drop the per-domain lock. At this point, the rvn pointer may be invalidated. Take care to load rvn after re-acquiring the per-domain lock. While here, simplify the condition used to check whether a reservation was dequeued while the per-domain lock was dropped. Reviewed by: alc, kib Reported by: gallatin MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29203	2021-03-11 10:35:35 -05:00
Mark Johnston	0401989282	vm: Round up npages and alignment for contig reclamation When searching for runs to reclaim, we need to ensure that the entire run will be added to the buddy allocator as a single unit. Otherwise, it will not be visible to vm_phys_alloc_contig() as it is currently implemented. This is a problem for allocation requests that are not a power of 2 in size, as with 9KB jumbo mbuf clusters. Reported by: alc Reviewed by: alc MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28924	2021-03-02 10:21:02 -05:00
Max Laier	14b5a3c7d5	vm pqbatch: move unmanaged page assert under pagequeue lock This KASSERT is overzealous because of the following race condition: 1) A managed page which is currently in PQ_LAUNDRY is freed. vm_page_free_prep calls vm_page_dequeue_deferred() The page state is: PQ_LAUNDRY, PGA_DEQUEUE\|PGA_ENQUEUED 2) The laundry worker comes around and pick up the page and calls vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the queue. We do a vm_page_astate_load and get PQ_LAUNDRY, PGA_DEQUEUE\|PGA_ENQUEUED as per above. 3) The laundry worker is pre-empted and another thread allocates our page from the free pool. For example vm_page_alloc_domain_after calls vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for an OBJT_UNMANAGED object. The page state is: PQ_NONE, 0 - VPO_UNMANAGED 4) The laundry worker resumes, and processes vm_pageout_defer based on the stale astate which leads to a call to vm_page_pqbatch_submit, which will trip on the KASSERT. Submitted by: mlaier Reviewed by: markj, rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D28563	2021-02-24 15:56:16 -08:00
Mark Johnston	537f92cd35	uma: Update the comment above startup_alloc() to reflect reality The scheme used for early slab allocations changed in commit `a81c400e75`. Reported by: alc Reviewed by: alc MFC after: 1 week	2021-02-22 18:22:51 -05:00
Mark Johnston	23e875fd97	vm_kern: Avoid sign extension in the KVA_QUANTUM definition Otherwise, on a powerpc64 NUMA system with hashed page tables, the first-level superpage reservation size is large enough that the value of the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is negative and gets sign-extended when passed to vmem_set_import(). This results in a boot-time hang on such platforms. Reported by: bdragon MFC after: 3 days	2021-02-22 15:50:09 -05:00
Alex Richardson	fa2528ac64	Use atomic loads/stores when updating td->td_state KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569	2021-02-18 14:02:48 +00:00
John Baldwin	67932460c7	Add a VA_IS_CLEANMAP() macro. This macro returns true if a provided virtual address is contained in the kernel's clean submap. In CHERI kernels, the buffer cache and transient I/O map are allocated as separate regions. Abstracting this check reduces the diff relative to FreeBSD. It is perhaps slightly more readable as well. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D28710	2021-02-17 16:32:11 -08:00
Mark Johnston	5c18744ea9	vm: Honour the "noreuse" flag to vm_page_unwire_managed() This flag indicates that the page should be enqueued near the head of the inactive queue, skipping the LRU queue. It is used when unwiring pages from the buffer cache following direct I/O or after I/O when POSIX_FADV_NOREUSE or _DONTNEED advice was specified, or when sendfile(SF_NOCACHE) completes. For the direct I/O and sendfile cases we only enqueue the page if we decide not to free it, typically because it's mapped. Pass "noreuse" through to vm_page_release_toq() so that we actually honour the desired LRU policy for these scenarios. Reported by: bdrewery Reviewed by: alc, kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28555	2021-02-10 11:10:27 -05:00
Ryan Stone	660344ca44	Add a VM flag to prevent reclaim on a failed contig allocation If a M_WAITOK contig alloc fails, the VM subsystem will try to reclaim contiguous memory twice before actually failing the request. On a system with 64GB of RAM I've observed this take 400-500ms before it finally gives up, and I believe that this will only be worse on systems with even more memory. In certain contexts this delay is extremely harmful, so add a flag that will skip reclaim for allocation requests to allow those paths to opt-out of doing an expensive reclaim. Sponsored by: Dell Inc Differential Revision: https://reviews.freebsd.org/D28422 Reviewed by: markj, kib	2021-02-03 16:16:51 -05:00
Brooks Davis	7a1591c1b6	Rename kern_mmap_req to kern_mmap Replace all uses of kern_mmap with kern_mmap_req move the old kern_mmap. Reand rename kern_mmap_req to kern_mmap . The helper saved some code churn initially, but having multiple interfaces is sub-optimal. Obtained from: CheriBSD Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28292	2021-01-25 21:50:37 +00:00
Konstantin Belousov	420d4be3e4	vm_map_protect(): remove not needed recalculations of new_prot, new_maxprot Requested by: alc Sponsored by: The FreeBSD Foundation	2021-01-14 10:02:43 +02:00
Konstantin Belousov	0659df6fad	vm_map_protect: allow to set prot and max_prot in one go. This prevents a situation where other thread modifies map entries permissions between setting max_prot, then relocking, then setting prot, confusing the operation outcome. E.g. you can get an error that is not possible if operation is performed atomic. Also enable setting rwx for max_prot even if map does not allow to set effective rwx protection. Reviewed by: brooks, markj (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28117	2021-01-13 01:35:22 +02:00
Konstantin Belousov	9402bb44f1	vmspace_fork: preserve wx settings in the child vm map after fork Noted by: markj Sponsored by: The FreeBSD Foundation	2021-01-12 08:09:59 +02:00
Konstantin Belousov	2e1c94aa1f	Implement enforcing write XOR execute mapping policy. It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE \| PROT_EXEC are never specified together, if vm_map has MAP_WX flag set. FreeBSD control flag allows specific binary to request WX exempt, and there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/ disable globally. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28050	2021-01-12 01:15:43 +02:00
Mark Johnston	663de81f85	uma: Avoid unmapping direct-mapped slabs startup_alloc() uses pmap_map() to map slabs used for bootstrapping the VM. pmap_map() may ignore the hint address and simply return a range from the direct map. In this case we must not unmap the range in startup_free(). UMA uses bootstart and bootmem to track the range of KVA into which slabs are mapped if the direct map is not used. Unmap a startup slab only if it was mapped into that range. Reported by: alc Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27885	2021-01-03 11:50:31 -05:00
Ryan Libby	942951ba46	uma dbg: catch more corruption with atomics Use atomic testandset and testandclear to catch concurrent double free, and to reduce the number of atomic operations. Submitted by: jeff Reviewed by: cem, kib, markj (all previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22703	2020-12-31 13:02:45 -08:00
Mark Johnston	81846def34	vm: Fix some bugs in the page busying code In vm_page_busy_acquire(), load the object pointer using atomic_load_ptr() as we do elsewhere. Per the comment, the object identity must be consistent across sleeps. In vm_page_grab_sleep(), pass the correct pindex to _vm_page_busy_sleep(). The pindex is used to re-check the page's identity before going to sleep. In particular, vm_page_grab_sleep() is used in unlocked grab, so the object lock is not necessarily held when verifying the page's identity, and the pindex may change if the page is moved, or freed and re-allocated. I believe this can result in spurious VM_PAGER_FAILs from vm_page_grab_valid_unlocked() or early termination of vm_page_grab_pages_unlocked(). In vm_page_grab_pages(), pass the correct pindex to vm_page_grab_sleep(). Otherwise I believe vm_page_grab_pages() will effectively spin when attempting to busy a busy page after the first index in the range. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27607	2020-12-27 17:01:44 -05:00
Mark Johnston	d2f1c44bc9	uma: Remove the MINBUCKET flag from the flag name list This should have been done in r368399 / commit `f8b6c51538`. Reported by: rlibby Sponsored by: The FreeBSD Foundation	2020-12-27 17:01:33 -05:00
Bryan Drewery	5fee468e83	Revert r368523 which fixed contig allocs waiting forever. This needs to account for empty NUMA domains or domains which do not satisfy the requested range. Discussed with: markj	2020-12-15 19:38:16 +00:00
Bryan Drewery	bbfec1633b	contig allocs: Don't retry forever on M_WAITOK. This restores behavior from before domain iterators were added in r327895 and r327896. The vm_domainset_iter_policy() will do a vm_wait_doms() and then restart its iterator when M_WAITOK is set. It will also force the containing loop to have M_NOWAIT. So we get an unbounded retry loop rather than the intended bounded retries that kmem_alloc_contig_pages() already handles. This also restores M_WAITOK to the vmem_alloc() call in kmem_alloc_attr_domain() and kmem_alloc_contig_domain(). Reviewed by: markj, kib MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D27507	2020-12-10 20:44:29 +00:00
Mark Johnston	e574d407ae	uma: Make uma_zone_set_maxcache() work better with small limits The old implementation chose the largest bucket zone such that if the per-CPU caches are fully populated, the total number of items cached is no larger than the specified limit. If no such zone existed, UMA would not do any caching. We can now use uz_bucket_size_max to set a precise limit on the number of items in a zone's bucket, so the total size of per-CPU caches can be bounded more easily. Implement a new policy in uma_zone_set_maxcache(): choose a bucket size such that up to half of the limit can be cached in per-CPU caches, with the rest going to the full bucket cache. This fixes a problem with the kstack_cache zone: the limit of 4 * mp_ncpus items meant that the zone would not do any caching, defeating the whole purpose of the zone. That's because the smallest bucket size holds up to 2 items and we may cache up to 3 full buckets per CPU, and 2 * 3 * mp_ncpus > 4 * mp_ncpus. Reported by: mjg Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27168	2020-12-06 22:45:50 +00:00
Mark Johnston	f8b6c51538	uma: Enforce the use of uz_bucket_size_max in the free path uz_bucket_size_max is the maximum permitted bucket size. When filling a new bucket to satisfy uma_zalloc(), the bucket is populated with at most uz_bucket_size_max items. The maximum number of entries in the bucket may be larger. When freeing items, however, we will fill per-CPPU buckets up to their maximum number of entries, potentially exceeding uz_bucket_size_max. This makes it difficult to precisely limit the number of items that may be cached in a zone. For example, if one wants to limit buckets to 1 entry for a particular zone, that's not possible since the smallest bucket holds up to 2 entries. Try to solve the problem by using uz_bucket_size_max to limit the number of entries in a bucket. Note that the ub_entries field is initialized upon every bucket allocation. Most zones are not affected since they do not impose any specific limit on the maximum bucket size. While here, remove the UMA_ZONE_MINBUCKET flag. It was unused and we now have uma_zone_set_maxcache() to control the zone's cache size more precisely. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27167	2020-12-06 22:45:39 +00:00
Mark Johnston	8a6776ca0f	uma: Use atomic load for uz_sleepers This field is updated locklessly. Sponsored by: The FreeBSD Foundation	2020-12-06 22:45:22 +00:00
Mark Johnston	991f23ef20	uma: Avoid allocating buckets with the cross-domain lock held Allocation of a bucket can trigger a cross-domain free in the bucket zone, e.g., if the per-CPU alloc bucket is empty, we free it and get migrated to a remote domain. This can lead to deadlocks since a bucket zone may allocate buckets from itself or a pair of bucket zones could be allocating from each other. Fix the problem by dropping the cross-domain lock before allocating a new bucket and handling refill races. Use a list of empty buckets to ensure that we can make forward progress. Reported by: imp, mjg (witness(9) warnings) Discussed with: jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27341	2020-11-30 16:18:33 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Mark Johnston	1fea4b25c9	Wrap a long line in vm_pqbatch_process_page()	2020-11-19 15:41:42 +00:00
Mark Johnston	9e3e737608	Micro-optimize vm_page_pqbatch_submit() Avoid calling vm_page_domain() twice. Discussed with: alc (in D27207)	2020-11-19 15:40:58 +00:00
Mark Johnston	431fb8abd7	vm_phys: Try to clean up NUMA KPIs It can useful for code outside the VM system to look up the NUMA domain of a page backing a virtual or physical address, specifically when creating NUMA-aware data structures. We have _vm_phys_domain() for this, but the leading underscore implies that it's an internal function, and vm_phys.h has dependencies on a number of other headers. Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to vm_phys_domain(). Make the latter an inline function. Add _vm_phys.h and define struct vm_phys_seg there so that it's easier to use in other headers. Include it from vm_page.h so that vm_page_domain() can be defined there. Include machine/vmparam.h from _vm_phys.h since it depends directly on some constants defined there. Reviewed by: alc Reviewed by: dougm, kib (earlier versions) Differential Revision: https://reviews.freebsd.org/D27207	2020-11-19 03:59:21 +00:00
Mark Johnston	20f02659d6	vm_map: Handle kernel map entry allocator recursion On platforms without a direct map[], vm_map_insert() may in rare situations need to allocate a kernel map entry in order to allocate kernel map entries. This poses a problem similar to the one solved for vmem boundary tags by vmem_bt_alloc(). In fact the kernel map case is a bit more complicated since we must allocate entries with the kernel map locked, whereas vmem can recurse into itself because boundary tags are allocated up-front. The solution is to add a custom slab allocator for kmapentzone which allocates KVA directly from kernel_map, bypassing the kmem_ layer. This avoids mutual recursion with the vmem btag allocator. Then, when vm_map_insert() allocates a new kernel map entry, it avoids triggering allocation of a new slab with M_NOVM until after the insertion is complete. Instead, vm_map_insert() allocates from the reserve and sets a flag in kernel_map to trigger re-population of the reserve just before the map is unlocked. This places an implicit upper bound on the number of kernel map entries that may be allocated before the kernel map lock is released, but in general a bound of 1 suffices. [*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a configuration required by some kernel sanitizers. Discussed with: kib, rlibby Reported by: andrew Tested by: pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26851	2020-11-11 17:16:39 +00:00
Jonathan T. Looney	7b516613aa	When destroying a UMA zone which has a reserve (set with uma_zone_reserve()), messages like the following appear on the console: "Freed UMA keg (Test zone) was not empty (0 items). Lost 528 pages of memory." When keg_drain_domain() is draining the zone, it tries to keep the number of items specified in the reservation. However, when we are destroying the UMA zone, we do not need to keep those items. Therefore, when destroying a non-secondary and non-cache zone, we should reset the keg reservation to 0 prior to draining the zone. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27129	2020-11-10 18:12:09 +00:00
Mateusz Guzik	3a440a421d	Add more per-cpu zones. This covers powers of 2 up to 64. Example pending user is ZFS.	2020-11-09 00:34:23 +00:00
Leandro Lupori	e2d6c417e3	Implement superpages for PowerPC64 (HPT) This change adds support for transparent superpages for PowerPC64 systems using Hashed Page Tables (HPT). All pmap operations are supported. The changes were inspired by RISC-V implementation of superpages, by @markj (r344106), but heavily adapted to fit PPC64 HPT architecture and existing MMU OEA64 code. While these changes are not better tested, superpages support is disabled by default. To enable it, use vm.pmap.superpages_enabled=1. In this initial implementation, when superpages are disabled, system performance stays at the same level as without these changes. When superpages are enabled, buildworld time increases a bit (~2%). However, for workloads that put a heavy pressure on the TLB the performance boost is much bigger (see HPC Challenge and pgbench on D25237). Reviewed by: jhibbits Sponsored by: Eldorado Research Institute (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D25237	2020-11-06 14:12:45 +00:00
Mateusz Guzik	2dee296a3d	Rationalize per-cpu zones. The 2 provided zones had inconsistent naming between each other ("int" and "64") and other allocator zones (which use bytes). Follow malloc by naming them "pcpu-" + size in bytes. This is a step towards replacing ad-hoc per-cpu zones with general slabs.	2020-11-05 15:08:56 +00:00
Mark Johnston	f7db0c9532	vmspace: Convert to refcount(9) This is mostly mechanical except for vmspace_exit(). There, use the new refcount_release_if_last() to avoid switching to vmspace0 unless other processes are sharing the vmspace. In that case, upon switching to vmspace0 we can unconditionally release the reference. Remove the volatile qualifier from vm_refcnt now that accesses are protected using refcount(9) KPIs. Reviewed by: alc, kib, mmel MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27057	2020-11-04 16:30:56 +00:00
Alan Cox	ccfd886a1b	Conditionally compile struct vm_phys_seg's md_first field. This field is only used by arm64's pmap. Reviewed by: kib, markj, scottph Differential Revision: https://reviews.freebsd.org/D26907	2020-10-23 06:24:38 +00:00
Ed Maste	575a4437a9	uma: fix KTR message after r366840 Reported by: bz Sponsored by: The FreeBSD Foundation	2020-10-19 18:54:44 +00:00
Mark Johnston	f09cbea31a	uma: Respect uk_reserve in keg_drain() When a reserve of free items is configured for a zone, the reserve must not be reclaimed under memory pressure. Modify keg_drain() to simply respect the reserved pool. While here remove an always-false uk_freef == NULL check (kegs that shouldn't be drained should set _NOFREE instead), and make sure that the keg_drain() KTR statement does not reference an uninitialized variable. Reviewed by: alc, rlibby Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26772	2020-10-19 16:57:40 +00:00
Mark Johnston	1b2dcc8c54	uma: Avoid depleting keg reserves when filling a bucket zone_import() fetches a free or partially free slab from the keg and then uses its items to populate an array, typically filling a bucket. If a single allocation causes the keg to drop below its minimum reserve, the inner loop ends. However, if the bucket is still not full and M_USE_RESERVE is specified, the outer loop will continue to fetch items from the keg. If M_USE_RESERVE is specified and the number of free items is below the reserved limit, we should return only a single item. Otherwise, if the bucket size is larger than the reserve, all of the reserved items may end up in a single per-CPU bucket, invisible to other CPUs. Reviewed by: rlibby MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26771	2020-10-19 16:55:03 +00:00
Konstantin Belousov	6f3b523c9a	Avoid dump_avail[] redefinition. Move dump_avail[] extern declaration and inlines into a new header vm/vm_dumpset.h. This fixes default gcc build for mips. Reviewed by: alc, scottph Tested by: kevans (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26741	2020-10-14 22:51:40 +00:00
Bryan Drewery	c2c6fb90e0	Use unlocked page lookup for inmem() to avoid object lock contention Reviewed By: kib, markj Submitted by: mlaier Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D26653	2020-10-09 23:49:42 +00:00
Konstantin Belousov	42f96162c3	vm_page_dump_index_to_pa(): Add braces to the expression involving + and &. The precedence of the '&' operator is less than of '+'. Added braces do change the order of evaluation into the natural one, in my opinion. On the other hand, the value of the expression should not change since all elements should have page-aligned values. This fixes a gcc warning reported. Reported by: adrian Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-10-08 22:46:15 +00:00
Mark Johnston	2913cc4637	vm_pageout: Avoid rounding down the inactive scan target With helper page daemon threads, enabled by default in r364786, we divide the inactive target by the number of threads, rounding down, and sum the total number of pages freed by the threads. This sum is compared with the original target, but by rounding down we might lose pages, causing the page daemon control loop to conclude that inactive queue scanning isn't keeping up with demand for free pages. Typically this results in excessive swapping. Fix the problem by accounting for the error in the main pagedaemon thread's target. Note that by default the problem will manifest only in systems with >16 CPUs in a NUMA domain. Reviewed by: cem Discussed with: dougm Reported and tested by: dhw, glebius Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26610	2020-10-02 19:16:06 +00:00
Mark Johnston	06d8bdcbf7	uma: Use the bucket cache for cross-domain allocations uma_zalloc_domain() allocates from the requested domain instead of following a first-touch policy (the default for most zones). Currently it is only used by malloc_domainset(), and consumers free returned items with free(9) since r363834. Previously uma_zalloc_domain() worked by always going to the keg for an item. As a result, the use of UMA zone caches was unbalanced: we free items to the caches, but always allocate from the keg, skipping the caches. Make some effort to allocate from the UMA caches when performing a cross-domain allocation. This avoids blowing up the caches when something is performing many transient allocations with malloc_domainset(). Reported and tested by: dhw, glebius Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26427	2020-10-02 19:04:29 +00:00
Mark Johnston	5afdf5c1ca	uma: Use LIFO for non-SMR bucket caches When SMR was introduced, zone_put_bucket() was changed to always place full buckets at the end of the queue. However, it is generally preferable to use recently used buckets since their items are more likely to be resident in cache. So, for buckets that have no constraint on item reuse, use a last-in-first-out ordering as we did before. Reviewed by: rlibby Tested by: dhw, glebius Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26426	2020-10-02 19:04:09 +00:00
Mark Johnston	952c8964ba	uma: Remove newlines from panic messages Sponsored by: The FreeBSD Foundation	2020-10-02 19:03:42 +00:00
Mark Johnston	f31695cc64	Implement sparse core dumps Currently we allocate and map zero-filled anonymous pages when dumping core. This can result in lots of needless disk I/O and page allocations. This change tries to make the core dumper more clever and represent unbacked ranges of virtual memory by holes in the core dump file. Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to clean up and return an error when it would otherwise map a zero-filled page. Then, in the core dumper code, prefault all user pages and handle errors by simply extending the size of the core file. This also fixes a bug related to the fact that vn_io_fault1() does not attempt partial I/O in the face of errors from vm_fault_quick_hold_pages(): if a truncated file is mapped into a user process, an attempt to dump beyond the end of the file results in an error, but this means that valid pages immediately preceding the end of the file might not have been dumped either. The change reduces the core dump size of trivial programs by a factor of ten simply by excluding unaccessed libc.so pages. PR: 249067 Reviewed by: kib Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26590	2020-10-02 17:50:22 +00:00
Mark Johnston	114484b7ec	Flag vm_reserv and vm_phys sysctls as MPSAFE. Nothing in these subsystems relies on Giant. MFC after: 1 week	2020-09-23 19:36:07 +00:00
Mark Johnston	78257765f2	Add a vmparam.h constant indicating pmap support for large pages. Enable SHM_LARGEPAGE support on arm64. Reviewed by: alc, kib Sponsored by: Juniper Networks, Inc., Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26467	2020-09-23 19:34:21 +00:00

1 2 3 4 5 ...

4509 Commits