freebsd-nq

Author	SHA1	Message	Date
Mateusz Guzik	c3aa3bf97c	vm: clean up empty lines in .c and .h files	2020-09-01 21:20:45 +00:00
Vladimir Kondratyev	5d4bf0578f	LinuxKPI: Implement ksize() function. In Linux, ksize() gets the actual amount of memory allocated for a given object. This commit adds malloc_usable_size() to FreeBSD KPI which does the same. It also maps LinuxKPI ksize() to newly created function. ksize() function is used by drm-kmod. Reviewed by: hselasky, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26215	2020-08-29 19:26:31 +00:00
Eric van Gyzen	609de97e04	vm_pageout_scan_active: ensure ps_delta is initialized Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26212	2020-08-28 19:59:02 +00:00
Eric van Gyzen	a2e194654f	memstat_kvm_uma: fix reading of uma_zone_domain structures Coverity flagged the scaling by sizeof(uzd). That is the type of the pointer, so the scaling was already done by pointer arithmetic. However, this was also passing a stack frame pointer to kvm_read, so it was doubly wrong. Move ZDOM_GET into the !_KERNEL section and use it in libmemstat. Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26213	2020-08-28 19:50:40 +00:00
Mark Johnston	aea9103e06	Use a large kmem arena import size on NUMA systems. This helps minimize internal fragmentation that occurs when 2MB imports are interleaved across NUMA domains. Virtually all KVA allocations on direct map platforms consume more than one page, so the fragmentation manifests as runs of 511 4KB page mappings in the kernel. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26050	2020-08-26 14:31:48 +00:00
Conrad Meyer	74f5530d7a	vm_pageout: Scale worker threads with CPUs Autoscale vm_pageout worker threads from r364129 with CPU count. The default is arbitrarily chosen to be 16 CPUs per worker thread, but can be adjusted with the vm.pageout_cpus_per_thread tunable. There will never be less than 1 thread per populated NUMA domain, and the previous arbitrary upper limit (at most ncpus/2 threads per NUMA domain) is preserved. Care is taken to gracefully handle asymmetric NUMA nodes, such as empty node systems (e.g., AMD 2990WX) and systems with nodes of varying size (e.g., some larger >20 core Intel Haswell/Broadwell Xeon). Reviewed by: kib, markj Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D26152	2020-08-25 21:36:56 +00:00
Mark Johnston	411096d034	Permit vm_page_wire() to be called on pages not belonging to an object. For such pages ref_count is effectively a consumer-managed field, but there is no harm in calling vm_page_wire() on them. vm_page_unwire_noq() handles them as well. Relax the vm_page_wire() assertions to permit this case which is triggered by some out-of-tree code. [1] Also guard a conditional assertion with INVARIANTS. Otherwise the conditions are evaluated even though the result is unused. [2] Reported by: bz, cem [1], kib [2] Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26173	2020-08-25 13:45:06 +00:00
Matt Macy	9e5787d228	Merge OpenZFS support in to HEAD. The primary benefit is maintaining a completely shared code base with the community allowing FreeBSD to receive new features sooner and with less effort. I would advise against doing 'zpool upgrade' or creating indispensable pools using new features until this change has had a month+ to soak. Work on merging FreeBSD support in to what was at the time "ZFS on Linux" began in August 2018. I first publicly proposed transitioning FreeBSD to (new) OpenZFS on December 18th, 2018. FreeBSD support in OpenZFS was finally completed in December 2019. A CFT for downstreaming OpenZFS support in to FreeBSD was first issued on July 8th. All issues that were reported have been addressed or, for a couple of less critical matters there are pull requests in progress with OpenZFS. iXsystems has tested and dogfooded extensively internally. The TrueNAS 12 release is based on OpenZFS with some additional features that have not yet made it upstream. Improvements include: project quotas, encrypted datasets, allocation classes, vectorized raidz, vectorized checksums, various command line improvements, zstd compression. Thanks to those who have helped along the way: Ryan Moeller, Allan Jude, Zack Welch, and many others. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25872	2020-08-25 02:21:27 +00:00
Mateusz Guzik	feabaaf995	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho	2020-08-24 08:57:02 +00:00
Andrew Gallatin	791dda877f	uma: record allocation failures due to zone limits The zone limit mechanism was recently reworked, and allocation failures due to limits being exceeded were inadvertently no longer being recorded. This would lead to, for example, mbuf allocation failures not being indicated in netstat -m or vmstat -z Reviewed by: markj Sponsored by: Netflix	2020-08-21 18:31:57 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Mark Johnston	b21b022a81	Revert r364310. Some of the resulting fallout in CAM does not appear straightforward to fix, so simply revert the commit for now in the absence of a better solution. Discussed with: mjg Reported by: dhw	2020-08-18 14:09:49 +00:00
Gleb Smirnoff	1921bb7b68	With INVARIANTS panic immediately if M_WAITOK is requested in a non-sleepable context. Previously only _sleep() would panic. This will catch misuse of M_WAITOK at development stage rather than at stress load stage. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D26027	2020-08-17 15:37:08 +00:00
Mark Johnston	7efe14cb99	Commit a missing piece of r364302. This had failed to apply due to a merge conflict. Reported by: Jenkins MFC with: r364302	2020-08-17 14:06:51 +00:00
Mark Johnston	7dd979dfef	Remove the VM map zone. Today, the zone is only used to allocate a trio of kernel maps: the kernel map itself, and the exec and pipe submaps. Maps for user processes are dynamically allocated but are embedded in the vmspace structure, which is allocated from its own zone. Make the aforementioned kernel maps statically allocated and get rid of the zone. While here, remove a stale comment above vmspace_alloc() and change the names of locks initialized in vm_map_init() to match vmspace_zinit(). Reported by: alc Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26052	2020-08-17 13:02:01 +00:00
Konstantin Belousov	ffae7ea935	vm_object: allow paging_in_progress to be acquired after object termination. The vm objects are type-stable, and can be accessed even after the last reference is dropped, or in case of vnode objects, after vgone() destroyed it as well. Stop asserting that pip == 0 after vm_object_terminate() waited for existing owners to drop it, we only want to drain them before setting OBJ_DEAD flag. Also stop asserting pip == 0 in object destructor. Update comments explaining the interaction between paging_in_progress and termination. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 20:57:02 +00:00
Konstantin Belousov	419e5698a0	Atomically update vm_object vnp_size, where atomic is available. This will be used later, where it matters on 32bit arches. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 20:52:24 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Conrad Meyer	ea7b737a6f	vm_pageout: Correct threshold calculation on single-CPU systems Reported by: Michael Butler X-MFC-With: r364129	2020-08-14 18:48:48 +00:00
Conrad Meyer	b7883452d4	Back out unrelated change Reported by: kib, markj X-MFC-With: r364129	2020-08-12 00:21:30 +00:00
Conrad Meyer	0292c54bdb	Add support for multithreading the inactive queue pageout within a domain. In very high throughput workloads, the inactive scan can become overwhelmed as you have many cores producing pages and a single core freeing. Since Mark's introduction of batched pagequeue operations, we can now run multiple inactive threads working on independent batches. To avoid confusing the pid and other control algorithms, I (Jeff) do this in a mpi-like fan out and collect model that is driven from the primary page daemon. It decides whether the shortfall can be overcome with a single thread and if not dispatches multiple threads and waits for their results. The heuristic is based on timing the pageout activity and averaging a pages-per-second variable which is exponentially decayed. This is visible in sysctl and may be interesting for other purposes. I (Jeff) have verified that this does indeed double our paging throughput when used with two threads. With four we tend to run into other contention problems. For now I would like to commit this infrastructure with only a single thread enabled. The number of worker threads per domain can be controlled with the 'vm.pageout_threads_per_domain' tunable. Submitted by: jeff (earlier version) Discussed with: markj Tested by: pho Sponsored by: probably Netflix (based on contemporary commits) Differential Revision: https://reviews.freebsd.org/D21629	2020-08-11 20:37:45 +00:00
Mark Johnston	af32cefd7c	Check the UMA zone's full bucket cache before short-circuiting an alloc. The global "bucketdisable" flag indicates that we are in a low memory situation and should avoid allocating buckets. However, in the allocation path we were checking it before the full bucket cache and bailing even if the cache is non-empty. Defer the check so that we have a shot at allocating from the cache. This came up because M_NOWAIT allocations from the buf trie node zone must always succeed. In one scenario, all of the preallocated trie nodes were in the bucket list, and a new slab allocation could not succeed due to a memory shortage. The short-circuiting caused an allocation failure which triggered a panic. Reported by: pho Reviewed by: cem Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25980	2020-08-10 20:34:45 +00:00
Brooks Davis	9f9cc3f989	Preserve ASLR vm_map flags across fork In the most common case (fork+execve) this doesn't matter, but further attempts to apply entropy would fail in (e.g.) a pre-fork server. Reported by: Alfredo Mazzinghi Reviewed by: kib, markj Obtained from: CheriBSD MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D25966	2020-08-06 16:20:20 +00:00
Mark Johnston	efec381dd1	Remove most lingering references to the page lock in comments. Finish updating comments to reflect new locking protocols introduced over the past year. In particular, vm_page_lock is now effectively unused. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25868	2020-08-04 14:59:43 +00:00
Mark Johnston	96ad26eefb	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
Mark Johnston	958d8f527c	Remove the volatile qualifier from busy_lock. Use atomic(9) to load the lock state. Some places were doing this already, so it was inconsistent. In initialization code, the lock state is still initialized with plain stores. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25861	2020-07-29 19:38:49 +00:00
Mark Johnston	f72e5be58a	vm_page_xbusy_claim(): Use atomics to update busy lock state. vm_page_xbusy_claim() could clobber the waiter bit. For its original use, kernel memory pages, this was not a problem since nothing would ever block on the busy lock for such pages. r363607 introduced a new use where this could in principle be a problem. Fix the problem by using atomic_cmpset to update the lock owner. Since this macro is defined only for INVARIANTS kernels the extra overhead doesn't seem prohibitive. Reported by: vangyzen Reviewed by: alc, kib, vangyzen Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25859	2020-07-28 19:50:39 +00:00
Mark Johnston	782ebde52e	vm_page_free_invalid(): Relax the xbusy assertion. vm_page_assert_xbusied() asserts that the busying thread is the current thread. For some uses of vm_page_free_invalid() (e.g., error handling in vnode_pager_generic_getpages_done()), this condition might not hold. Reported by: Jenkins via trasz Reviewed by: chs, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25828	2020-07-27 14:25:10 +00:00
Doug Moore	00fd73d2da	Fix an overflow bug in the blist allocator that needlessly capped max swap size by dividing a value, which was always a multiple of 64, by 64. Remove the code that reduced max swap size down to that cap. Eliminate the distinction between BLIST_BMAP_RADIX and BLIST_META_RADIX. Call them both BLIST_RADIX. Make improvements to the blist self-test code to silence compiler warnings and to test larger blists. Reported by: jmallett Reviewed by: alc Discussed with: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D25736	2020-07-25 18:29:10 +00:00
Mateusz Guzik	ee74412269	vm: fix swap reservation leak and clean up surrounding code The code did not subtract from the global counter if per-uid reservation failed. Cleanup highlights: - load overcommit once - move per-uid manipulation to dedicated routines - don't fetch wire count if requested size is below the limit - convert return type from int to bool - ifdef the routines with _KERNEL to keep vm.h compilable by userspace Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D25787	2020-07-24 13:23:32 +00:00
Mateusz Guzik	126a2470b9	vm: annotate swap_reserved with __exclusive_cache_line The counter keeps being updated all the time and variables read afterwards share the cacheline. Note this still fundamentally does not scale and needs to be replaced, in the meantime gets a bandaid. brk1_processes -t 52 ops/s: before: 8598298 after: 9098080	2020-07-23 08:42:16 +00:00
Chuck Silvers	1bd12a3bb2	Fix vnode_pager handling of read ahead/behind pages when a disk read fails. Rather than marking the read ahead/behind pages valid even though they were not initialized, free them using the new function vm_page_free_invalid(). Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:10:35 +00:00
Chuck Silvers	4dfa06e114	Add a new function vm_page_free_invalid() for freeing invalid pages that might be wired. If the page is wired then it cannot be freed now, but the thread that eventually unwires it will free it at that point. Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:09:36 +00:00
Chuck Silvers	c3dbadc1fd	Revert my change from r361855 in favor of a better fix. Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:08:01 +00:00
Mark Johnston	a7752896f0	Add vm_map_valid_range_KBI(). This is required for standalone module builds. Reported by: hselasky Reviewed by: dougm, hselasky, kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D25650	2020-07-13 16:39:27 +00:00
Scott Long	ffc568ba8b	Revert r362998, r326999 while a better compatibility strategy is devised.	2020-07-09 22:38:36 +00:00
Scott Long	b302c2e5c9	Migrate the feature of excluding RAM pages to use "excludelist" as its nomenclature. MFC after: 1 week	2020-07-07 20:33:11 +00:00
Conrad Meyer	8a64110e43	vm: Add missing WITNESS warnings for M_WAITOK allocation vm_map_clip_{end,start} and lookup_clip_start allocate memory M_WAITOK for !system_map vm_maps. Add WITNESS warning annotation for !system_map callers who may be holding non-sleepable locks. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25283	2020-06-29 16:54:00 +00:00
Mark Johnston	8c277118d8	Fix UMA's first-touch policy on systems with empty domains. Suppose a thread is running on a CPU in a NUMA domain with no physical RAM. When an item is freed to a first-touch zone, it ends up in the cross-domain bucket. When the bucket is full, it gets placed in another domain's bucket queue. However, when allocating an item, UMA will always go to the keg upon a per-CPU cache miss because the empty domain's bucket queue will always be empty. This means that a non-empty domain's bucket queues can grow very rapidly on such systems. For example, it can easily cause mbuf allocation failures when the zone limit is reached. Change cache_alloc() to follow a round-robin policy when running on an empty domain. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25355	2020-06-28 21:35:04 +00:00
Konstantin Belousov	ee06cffcd2	vm_page_free_prep(): correct description of the required page and object state. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25482	2020-06-27 02:31:39 +00:00
Mark Johnston	84242cf68a	Call swap_pager_freespace() from vm_object_page_remove(). All vm_object_page_remove() callers, except linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when removing a range of pages from an object. The LinuxKPI case appears to be an unintentional omission that could result in leaked swap blocks, so unconditionally free swap space in vm_object_page_remove() to protect against similar bugs in the future. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25329	2020-06-25 15:21:21 +00:00
Jeff Roberson	c8b0a88b8d	Clarify some language. Favor primary where both master and primary were used in conjunction with secondary.	2020-06-20 20:21:04 +00:00
Edward Tomasz Napierala	52c81be11a	Add linux_madvise(2) instead of having Linux apps call the native FreeBSD madvise(2) directly. While some of the flag values match, most don't. PR: kern/230160 Reported by: markj Reviewed by: markj Discussed with: brooks, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25272	2020-06-20 18:29:22 +00:00
Mark Johnston	cdd02f43b9	Revert r362360. This commit was simply wrong since two different objects are locked. Reported by: lwhsu, pho Pointy hat: markj	2020-06-19 11:04:49 +00:00
Mark Johnston	f034074034	Restore a check unintentionally dropped in r362361. MFC with: r362361	2020-06-19 04:18:20 +00:00
Mark Johnston	0f1e6ec591	Add a helper function for validating VA ranges. Functions which take untrusted user ranges must validate against the bounds of the map, and also check for wraparound. Instead of having the same logic duplicated in a number of places, add a function to check. Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25328	2020-06-19 03:32:04 +00:00
Mark Johnston	61b006887e	Fix a double object unlock in vm_object_backing_collapse_wait(). Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25327	2020-06-19 03:31:46 +00:00
Conrad Meyer	a116b5d3e4	vm: Drop vm_map_clip_{start,end} macro wrappers No functional change. Reviewed by: dougm, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25282	2020-06-16 22:53:56 +00:00
Eric van Gyzen	8cc8c5864a	Honor db_pager_quit in some vm_object ddb commands These can be rather verbose. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2020-06-12 21:53:08 +00:00
Mateusz Guzik	7ce3a31286	vm: rework swap_pager_status to execute in constant time The lock-protected iteration is trivially avoidable. This removes a serialisation point from Linux binaries (which end up calling here from the sysinfo syscall).	2020-06-09 14:16:18 +00:00
Chuck Silvers	bd7d64f548	Don't mark pages as valid if reading the contents from disk fails. Instead, just skip marking pages valid if the read fails. Future attempts to access such pages will notice that they are not marked valid and try to read them from disk again. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25138	2020-06-06 00:47:59 +00:00
Ed Maste	4d13f78444	Correct terminology in vm.imply_prot_max sysctl description As with r361769 (man page), PROT_* are properly called protections, not permissions. MFC after: 1 week MFC with: r361769 Sponsored by: The FreeBSD Foundation	2020-06-04 01:49:29 +00:00
Mateusz Guzik	1c58c09f5a	uma: hide item_domain under ifdef NUMA Fixes build warnings on mips.	2020-05-29 08:30:35 +00:00
Mark Johnston	81302f1d77	Fix boot on systems where NUMA domain 0 is unpopulated. - Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to ensure that segments registered during hammer_time() are placed in the right domain. Otherwise, since the SRAT is not parsed at that point, we just add them to domain 0, which may be incorrect and results in a domain with only several MB worth of memory. - Fix uma_startup1() to try allocating memory for zones from any domain. If domain 0 is unpopulated, the allocation will simply fail, resulting in a page fault slightly later during boot. - Change _vm_phys_domain() to return -1 for addresses not covered by the affinity table, and change vm_phys_early_alloc() to handle wildcard domains. This is necessary on amd64, where the page array is dense and pmap_page_array_startup() may allocate page table pages for non-existent page frames. Reported and tested by: Rafael Kitover <rkitover@gmail.com> Reviewed by: cem (earlier version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25001	2020-05-28 19:41:00 +00:00
Konstantin Belousov	fe0dcc402f	Simplify the condition to enable superpage mappings in vm_fault_soft_fast(). The list of arches there matches the list of arches where default VM_NRESERVLEVEL > 0. Before sparc64 removal, that was the only arch that defined VM_NRESERVLEVEL > 0 to help with cache coloring, but did not implement superpages. Now it can be simplified. Submitted by: alc Reviewed by: markj	2020-05-27 21:44:26 +00:00
Justin Hibbits	d4ed51f329	Properly sort ifdef archs in vm_fault_soft_fast superpage guards. Sort broken in r360887.	2020-05-27 01:35:46 +00:00
Mark Johnston	dc2b320563	Allocate UMA per-CPU counters earlier. Otherwise anything counted before SI_SUB_VM_CONF is discarded. However, it is useful to be able to see stats from allocations done early during boot. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24756	2020-05-14 16:06:54 +00:00
Kyle Evans	c79cee7136	kernel: provide panicky version of __unreachable __builtin_unreachable doesn't raise any compile-time warnings/errors on its own, so problems with its usage can't be easily detected. While it would be nice for this situation to change and compilers to at least add a warning for trivial cases where local state means the instruction can't be reached, this isn't the case at the moment and likely will not happen. This commit adds an __assert_unreachable, whose intent is incredibly clear: it asserts that this instruction is unreachable. On INVARIANTS builds, it's a panic(), and on non-INVARIANTS it expands to __unreachable(). Existing users of __unreachable() are converted to __assert_unreachable, to improve debuggability if this assumption is violated. Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D23793	2020-05-13 18:07:37 +00:00
Justin Hibbits	65bbba25d2	powerpc64: Implement Radix MMU for POWER9 CPUs Summary: POWER9 supports two MMU formats: traditional hashed page tables, and Radix page tables, similar to what's present on most other architectures. The PowerISA also specifies a process table (a table of page table pointers) which on the POWER9 is only available with the Radix MMU, so we can take advantage of it with the Radix MMU driver. Written by Matt Macy. Differential Revision: https://reviews.freebsd.org/D19516	2020-05-11 02:33:37 +00:00
Mark Johnston	a9ea09e548	Re-check for wirings after busying the page in vm_page_release_locked(). A concurrent unlocked lookup can wire the page after vm_page_release_locked() releases the last wiring, in which case vm_page_release_locked() must not free the page. Once the xbusy lock is acquired, that, the object lock and the fact that the page is unmapped ensure that the wire count cannot increase, so re-check for new wirings after the page is xbusied. Update the comment above vm_page_wired() to reflect the new synchronization rules. Reported by: glebius Reviewed by: alc, jeff, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24592	2020-04-28 13:51:41 +00:00
Mark Johnston	f13fa9df05	Use a single VM object for kernel stacks. Previously we allocated a separate VM object for each kernel stack. However, fully constructed kernel stacks are cached by UMA, so there is no harm in using a single global object for all stacks. This reduces memory consumption and makes it easier to define a memory allocation policy for kernel stack pages, with the aim of reducing physical memory fragmentation. Add a global kstack_object, and use the stack KVA address to index into the object like we do with kernel_object. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24473	2020-04-26 20:08:57 +00:00
Mark Johnston	33655d9546	Factor out the kmem contig page alloc and reclamation code. kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each other's page allocation and reclamation logic. Place it in a single function to make it easier to add additional consumers. No functional change intended. Reviewed by: jeff, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24475	2020-04-21 16:01:44 +00:00
Mark Johnston	303b77029b	Minimize conditional compilation for handling of M_EXEC. This simplifies some planned changes. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24474	2020-04-21 15:55:28 +00:00
Mark Johnston	70e68b19a4	Handle trashed queue pointers in vm_page_acquire_unlocked(). vm_page_acquire_unlocked() relies on type-stability of vm_page structures and assumes that the listq linkage pointers always point to a vm_page or are NULL. QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so add an explicit check for a trashed queue pointer before dereferencing. Reported and tested by: pho Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24472	2020-04-20 14:45:17 +00:00
Bryan Drewery	adc0388117	Remove dead code leftover from r331018. Sponsored by: Dell EMC	2020-03-31 01:12:53 +00:00
Konstantin Belousov	abfdf76791	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038	2020-03-30 21:44:30 +00:00
Konstantin Belousov	a7c55b3e1b	ddb show pginfo: print pages reference value in hex. It is more useful this way after the VPRC_ flags were introduced. Sponsored by: The FreeBSD Foundation	2020-03-28 12:21:52 +00:00
Jeff Roberson	d1105e9441	Check for busy or wired in vm_page_relookup(). Some callers will only keep a page wired and expect it to still be present. Reported by: delphij@FreeBSD.org Reviewed by: kib	2020-03-11 22:25:45 +00:00
Mark Johnston	54007ce8ae	Clean up uma_int.h a bit. This makes it easier to write libkvm programs that access UMA data structures. - Remove a couple of unused slab functions and make others local to uma_core.c. Similarly move SLAB_BITSETS, which affects the layout of slab structures, to uma_core.c. - Stop defining the slab structures under _KERNEL. There's no real reason they can't be visible to userspace like the rest of UMA's structures are. - Group KEG_ASSERT_COLD with other keg macros. - Convert an assertion about MAXMEMDOM to use _Static_assert. No functional change intended. Discussed with: jeff Reviewed by: rlibby Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23980	2020-03-07 15:37:23 +00:00
Mark Johnston	3fba886874	Move SMR pointer type definition and access macros to smr_types.h. The intent is to provide a header that can be included by other headers without introducing too much pollution. smr.h depends on various headers and will likely grow over time, but is less likely to be required by system headers. Rename SMR_TYPE_DECLARE() to SMR_POINTER(): - One might use SMR to protect more than just pointers; it could be used for resizeable arrays, for example, so TYPE seems too generic. - It is useful to be able to define anonymous SMR-protected pointer types and the _DECLARE suffix makes that look wrong. Reviewed by: jeff, mjg, rlibby Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23988	2020-03-07 00:55:46 +00:00
Brooks Davis	3823a5990a	Remove an apparently incorrect assertion. Without this change mips64 fails to boot. Discussed with: markj Sponsored by: DARPA	2020-03-06 23:31:09 +00:00
Mark Johnston	d869a17e62	Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23978	2020-03-06 19:10:00 +00:00
Brooks Davis	d718de812f	Introduce kern_mmap_req(). This presents an extensible interface to the generic mmap(2) implementation via a struct pointer intended to use a designated initializer or compound literal. We take advantage of the mandatory zeroing of fields not listed in the initializer. Remove kern_mmap_fpcheck() and use kern_mmap_req(). The motivation for this change is a desire to keep the core implementation from growing an ever-increasing number of arguments that must be specified in the correct order for the lowest-level implementations. In CheriBSD we have already added two more arguments. Reviewed by: kib Discussed with: kevans Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D23164	2020-03-04 21:27:12 +00:00
Mark Johnston	1ed42f6fdd	Avoid doubly wiring a newly allocated page in vm_page_grab_valid(). This fixes a regression from r358363. Reported by: manu, jbeich Tested by: jbeich	2020-03-01 22:09:11 +00:00
Mateusz Guzik	7f746c9fcc	vm: add debug to uma_zone_set_smr Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D23902	2020-03-01 21:49:16 +00:00
Jeff Roberson	6be21eb778	Provide a lock free alternative to resolve bogus pages. This is not likely to be much of a perf win, just a nice code simplification. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23866	2020-02-28 21:42:48 +00:00
Jeff Roberson	7aaf252c96	Convert a few trivial consumers to the new unlocked grab API. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23847	2020-02-28 20:34:30 +00:00
Jeff Roberson	3f39f80ab3	Support the NOCREAT flag for grab_valid_unlocked. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23865	2020-02-28 20:32:35 +00:00
Jeff Roberson	1a0c234eb2	Simplify vref() code in object_reference. The local temporary is no longer necessary. Fix formatting errors. Reported by: mjg Discussed with: kib	2020-02-28 20:30:53 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Jeff Roberson	fe835cbf5f	A pair of performance improvements. Swap buckets on free as well as alloc so that alloc is always the most cache-hot data. When selecting a zone domain for the round-robin bucket cache use the local domain unless there is a severe imbalance. This does not affinitize memory, only locks and queues. Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D23824	2020-02-27 08:23:10 +00:00
Jeff Roberson	c49be4f1c6	Add unlocked grab* function variants that use lockless radix code to lookup pages. These variants will fall back to their locked counterparts if the page is not present. Discussed with: kib, markj Differential Revision: https://reviews.freebsd.org/D23449	2020-02-27 02:37:27 +00:00
Ed Maste	acb8858f05	Return ENOTSUP for mmap/mprotect if prot not subset of prot_max From POSIX, [ENOTSUP] The implementation does not support the combination of accesses requested in the prot argument. This fits the case that prot contains permissions which are not a subset of prot_max. Reviewed by: brooks, cem Relnotes: Yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23843	2020-02-26 20:03:43 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Doug Moore	36b01270d1	The last argument to swp_pager_getswapspace is always 1. Remove that argument. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23810	2020-02-24 04:01:09 +00:00
Mark Johnston	7ca5539285	Allow swap_pager_putpages() to allocate one block at a time. The minimum allocation size of 4 blocks is an old policy that came with the "new" swap pager in r42957. Since then the blist allocator has gotten better at reducing fragmentation; for example, with r349777 it can return a range that spans multiple leaves. When swap space is close to being exhausted, the minimum of 4 blocks most likely exacerbates memory pressure, so reduce it to 1. Reported by: alc Tested by: pho Reviewed by: alc, dougm, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23763	2020-02-23 17:59:51 +00:00
Ryan Libby	eaa17d4291	sys/vm: quiet -Wwrite-strings Discussed with: kib Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23796	2020-02-23 03:32:04 +00:00
Mark Johnston	0464f16e91	Constify uma_zcache_create() and uma_zsecond_create()'s "name" argument. It is already internally handled as a pointer to a const string, in particular by uma_zcreate(). Fix indentation while here. MFC after: 1 week	2020-02-22 17:44:28 +00:00
Kyle Evans	cef81f8f01	vm_radix: prefer __builtin_unreachable() to an unreachable panic() This provides the needed hint to GCC and offers an annotation for readers to observe that it's in fact impossible to hit this point. We'll get hit with a -Wswitch error if the enum applicable to the switch above were to get expanded without the new value(s) being handled.	2020-02-22 16:20:04 +00:00
Jeff Roberson	226dd6db47	Add an atomic-free tick moderated lazy update variant of SMR. This enables very cheap read sections with free-to-use latencies and memory overhead similar to epoch. On a recent AMD platform a read section cost 1ns vs 5ns for the default SMR. On Xeon the numbers should be more like 1 ns vs 11. The memory consumption should be proportional to the product of the free rate and 2*1/hz while normal SMR consumption is proportional to the product of free rate and maximum read section time. While here refactor the code to make future additions more straightforward. Name the overall technique Global Unbound Sequences (GUS) and adjust some comments accordingly. This helps distinguish discussions of the general technique (SMR) vs this specific implementation (GUS). Discussed with: rlibby, markj	2020-02-22 03:44:10 +00:00
Warner Losh	cafbf0c664	Don't convert all lower-layer errors to EIO. Don't convert all lower layer errors to EIO. Instead, pass the actual error up the stack. This will allow the upper layers that look for ENXIO to react properly to that signal from the lower layers and, for UFS, unmount the filesystem. Reviewed by: kib@ Differential Revision: https://reviews.freebsd.org/D23755	2020-02-20 01:33:01 +00:00
Warner Losh	65252dc903	Don't spam the console with an additional, and useless, error message. There's no need to spam the console with this error message. If there's an I/O error, the disk/cam driver will report it at the lower levels. If that's an actual problem, the upper layers will report that. Reviewed by: kib@ Differential Revision: https://reviews.freebsd.org/D23756	2020-02-20 00:34:46 +00:00
Jeff Roberson	4b3dac72b3	Silence a gcc warning about no return from a function that handles every possible enum in a switch statement. I verified that this emits nothing as expected on clang. radix relies on constant propagation to eliminate any branching from these access routines. Reported by: lwhsu/tinderbox	2020-02-19 22:34:22 +00:00
Jeff Roberson	1ddda2eb24	Use SMR to provide a safe unlocked lookup for vm_radix. The tree is kept correct for readers with store barriers and careful ordering. The existing object lock serializes writers. Consumers will be introduced in later commits. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23446	2020-02-19 19:58:31 +00:00
Jeff Roberson	c6fd3e23f7	Use per-domain locks for the bucket cache. This gives much better concurrency when there are a large number of cores per-domain and multiple domains. Avoid taking the lock entirely if it will not be productive. ROUNDROBIN domains will have mixed memory in each domain and will load balance to all domains. While here refactor the zone/domain separation and bucket limits to simplify callers. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23673	2020-02-19 18:48:46 +00:00
Jeff Roberson	e9ceb9dd11	Don't release xbusy on kmem pages. After lockless page lookup we will not be able to guarantee that they can be reacquired without blocking. Reviewed by: kib Discussed with: markj Differential Revision: https://reviews.freebsd.org/D23506	2020-02-19 09:10:11 +00:00
Jeff Roberson	6c5f36ff30	Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in virtual address or physical page allocation need to be marked with this flag. Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D23712	2020-02-19 08:17:27 +00:00
Mark Johnston	34e2051faf	Remove swblk_t. It was used only to store the bounds of each swap device. However, since swblk_t is a signed 32-bit int and daddr_t is a signed 64-bit int, swp_pager_isondev() may return an invalid result if swap devices are repeatedly added and removed and sw_end for a device ends up becoming a negative number. Note that the removed comment about maximum swap size still applies. Reviewed by: jeff, kib Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23666	2020-02-17 15:11:07 +00:00
Mark Johnston	725b4ff001	Fix a swap block allocation race. putpages' allocation of swap blocks is done under the global sw_dev lock. Previously it would drop that lock before inserting the allocated blocks into the object's trie, creating a window in which swap blocks are allocated but are not visible to swapoff. This can cause swp_pager_strategy() to fail and panic the system. Fix the problem bluntly, by allocating swap blocks under the object lock. Reviewed by: jeff, kib Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23665	2020-02-17 15:10:41 +00:00
Mark Johnston	c90d075be4	Fix object locking races in swapoff(2). swap_pager_swapoff_object()'s goal is to allocate pages for all valid swap blocks belonging to the object, for which there is no resident page. If the page corresponding to a block is already resident and valid, the block can simply be discarded. The existing implementation tries to minimize the number of I/Os used. For each cluster of swap blocks, it finds maximal runs of valid swap blocks not resident in memory, and valid resident pages. During this processing, the object lock may be dropped in several places: when calling getpages, or when blocking on a busy page in vm_page_grab_pages(). While the lock is dropped, another thread may free swap blocks, causing getpages to page in stale data. Fix the problem following a suggestion from Jeff: use getpages' readahead capability to perform clustering rather than doing it ourselves. This simplifies the code a bit without reintroducing the old behaviour of performing one I/O per page. Reviewed by: jeff Reported by: dhw, gallatin Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23664	2020-02-17 15:09:40 +00:00
Jeff Roberson	ed581bf68f	Add a simple accessor that returns the bytes of memory consumed by a zone.	2020-02-17 01:59:55 +00:00
Jeff Roberson	f212367b42	Refactor _vm_page_busy_sleep to reduce the delta between the various sleep routines and introduce a variant that supports lockless sleep. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23612	2020-02-17 01:08:00 +00:00
Jeff Roberson	70260874ac	UMA has become more particular about zone types. Use the right allocator calls in uma_zwait().	2020-02-17 01:06:18 +00:00
Jeff Roberson	6d88d784f8	Slightly restructure uma_zalloc* to generate better code from clang and reduce duplication among zalloc functions. Reviewed by: markj Discussed with: mjg Differential Revision: https://reviews.freebsd.org/D23672	2020-02-16 01:07:19 +00:00
Mateusz Guzik	3379d2f926	vm: use new capsicum helpers	2020-02-15 01:29:07 +00:00
Mateusz Guzik	23ed568caa	vm: remove no longer needed atomic_load_ptr casts	2020-02-14 23:16:29 +00:00
Mark Johnston	06ef60525f	Fix handling of WAITFAIL in vm_page_grab() and vm_page_grab_pages(). After sleeping through a memory shortage, we must return NULL rather than retry. Discussed with: jeff Reported by: pho Sponsored by: The FreeBSD Foundation	2020-02-13 23:18:35 +00:00
Mark Johnston	cefc92e1a2	Update the zone-global count of cached items in bucket_cache_reclaim(). This was missed in r351673. The count is used to enforce cache limits, which are rarely used. Discussed with: jeff Sponsored by: The FreeBSD Foundation	2020-02-13 23:15:21 +00:00
Jeff Roberson	543117bed8	Fix a case where ub_seq would fail to be set if the cross bucket was flushed due to memory pressure. Reviewed by: markj Differential Revision: http://reviews.freebsd.org/D23614	2020-02-13 20:58:51 +00:00
Mateusz Guzik	3acb6572fc	Store offset into zpcpu allocations in the per-cpu area. This shortens zpcpu_get and allows more optimizations. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23570	2020-02-12 11:11:22 +00:00
Mark Johnston	4ab3aee8fb	Reduce lock hold time in keg_drain(). Maintain a count of free slabs in the per-domain keg structure and use that to clear the free slab list in constant time for most cases. This helps minimize lock contention induced by reclamation, in preparation for proactive trimming of excesses of free memory. Reviewed by: jeff, rlibby Tested by: pho Differential Revision: https://reviews.freebsd.org/D23532	2020-02-11 20:06:33 +00:00
Jonathan T. Looney	3c200db9d2	Modify the vm.panic_on_oom sysctl to take a count of events. Currently, the vm.panic_on_oom sysctl is a boolean which controls the behavior of the VM system when it encounters an out-of-memory situation. If set to 0, the VM system kills the largest process. If set to any other value, the VM system will initiate a panic. This change makes the sysctl a count of events. If set to 0, the VM system kills the largest process. If set to any other value, the VM system will kill the largest process until it has seen the specified number of out-of-memory events. Once it reaches the specified number of events, it will initiate a panic. This change is helpful in capturing cores when the system is in a perpetual cycle of out-of-memory events (as opposed to just hitting one or two sporadic out-of-memory events). Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23601	2020-02-10 18:06:38 +00:00
Ryan Libby	bae55c4aec	uma: remove UMA_ZFLAG_CACHEONLY flag UMA_ZFLAG_CACHEONLY was essentially the same thing as UMA_ZONE_VM, but with a more confusing name. Remove the flag, make UMA_ZONE_VM an inherit flag, and replace all references. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23516	2020-02-06 08:32:25 +00:00
Ryan Libby	33e5a1ea3b	uma: multipage chicken switch Add a switch to allow disabling multipage slabs, in order to facilitate measuring memory usage and performance effects. The tunable vm.debug.uma_multipage_slabs defaults to 1 and can be set to 0 to disable. The name may change soon. Reviewed by: markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23487	2020-02-04 22:40:45 +00:00
Ryan Libby	27ca37acb7	uma: grow slabs to enforce minimum memory efficiency Memory efficiency can be poor with awkward item sizes (e.g. 1/2 or 1 page size + epsilon). In order to achieve a minimum memory efficiency, select a slab size with a potentially larger number of pages if it yields a lower portion of waste. This may mean using page_alloc instead of uma_small_alloc, which could be more costly. Discussed with: jeff, mckusick Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23239	2020-02-04 22:40:34 +00:00
Ryan Libby	ec0d828071	uma: add UMA_ZONE_CONTIG, and a default contig_alloc For now, copy the mbuf allocator. Reviewed by: jeff, markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23237	2020-02-04 22:40:11 +00:00
Ryan Libby	5ba16cf3d7	uma: pcpu_page_free needs to startup_free pages from startup_alloc After r357392, it is apparent that we do have some early-boot PCPU zones. Make it so we can safely free pages from them if they are actually used during early boot. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23496	2020-02-04 22:39:58 +00:00
Jeff Roberson	ee9e43f8dd	Add an explicit busy state for free pages. This improves behavior with potential bugs that access freed pages as well as providing a path towards lockless page lookup. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23444	2020-02-04 20:33:01 +00:00
Jeff Roberson	e84130a0c0	Use literal bucket sizes for smaller buckets rather than the rounding system. Small bucket sizes already pack well even if they are an odd number of words. This prevents any potential new instances of the problem fixed in r357463 as well as making the system easier to understand. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D23494	2020-02-04 20:28:06 +00:00
Konstantin Belousov	8d34a3bf7d	Enable vm_object_mightbedirty() and vm_object_page_clean() for swap objects backing tmpfs vnodes data. The clean scan is limited to only remove write permissions from the mapped pages of the objects. This fixes the issue that tmpfs vnode mtime is not updated from writes to the mmaped area after the initial page-in. Noted by: mjg Reviewed by: markj Discussed with: jeff Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23432	2020-02-04 19:03:37 +00:00
Jeff Roberson	dc3915c8c6	Use STAILQ instead of TAILQ for bucket lists. We only need FIFO behavior and this is more space efficient. Stop queueing recently used buckets to the head of the list. If the bucket goes to a different processor the cache coherency will be more expensive. We already try to encourage cache-hot behavior in the per-cpu layer. Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D23493	2020-02-04 02:41:24 +00:00
Mark Johnston	36cb95c736	Disable the smallest UMA bucket size on 32-bit platforms. With r357314, sizeof(struct uma_bucket) grew to 16 bytes on 32-bit platforms, so BUCKET_SIZE(4) is 0. This resulted in the creation of a bucket zone for buckets with zero capacity. A more general fix is planned, but for now this bandaid allows 32-bit platforms to boot again. PR: 243837 Discussed with: jeff Reported by: pho, Jenkins via lwhsu Tested by: pho Sponsored by: The FreeBSD Foundation	2020-02-03 19:29:02 +00:00
Warner Losh	58aa35d429	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Remove indirect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Jeff Roberson	f96d4157a7	Fix a bug in r356776 where the page allocator was not properly restored to the percpu page allocator after it had been temporarily overridden by startup_alloc. Reported by: pho, bdragon	2020-02-01 23:46:30 +00:00
Mark Johnston	f0a273c00f	Remove a couple of lingering usages of the page lock. Update vm_page_scan_contig() and vm_page_reclaim_run() to stop using vm_page_change_lock(). It has no use after r356157. Remove vm_page_change_lock() now that it has no users. Remove an unnecessary check for wirings in vm_page_scan_contig(), which was previously checking twice. The check is racy until vm_page_reclaim_run() ensures that the page is unmapped, so one check is sufficient. Reviewed by: jeff, kib (previous versions) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D23279	2020-02-01 18:23:51 +00:00
Mateusz Guzik	643656cfaf	vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422	2020-02-01 06:46:55 +00:00
Jeff Roberson	9e47b34110	Fix LINT build with MEMGUARD.	2020-01-31 02:03:22 +00:00
Jeff Roberson	d4665eaa66	Implement a safe memory reclamation feature that is tightly coupled with UMA. This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is a unique algorithm. This has 3x the performance of epoch in a write heavy workload with less than half of the read side cost. The memory overhead is significantly lessened by limiting the free-to-use latency. A synthetic test uses 1/20th of the memory vs Epoch. There is significant further discussion in the comments and code review. This code should be considered experimental. I will write a man page after it has settled. After further validation the VM will begin using this feature to permit lockless page lookups. Both markj and cperciva tested on arm64 at large core counts to verify fences on weaker ordering architectures. I will commit a stress testing tool in a follow-up. Reviewed by: mmacy, markj, rlibby, hselasky Discussed with: sbahara Differential Revision: https://reviews.freebsd.org/D22586	2020-01-31 00:49:51 +00:00
Konstantin Belousov	b70f6e1513	Restore OOM logic on page fault after r357026. Right now OOM is initiated unconditionally on the page allocation failure, after the wait. Reported by: Mark Millard <marklmi@yahoo.com> Reviewed by: cy, markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23409	2020-01-29 12:02:47 +00:00
Konstantin Belousov	cd0047f3a9	Handle a race of collapse with a retrying fault. Both vm_object_scan_all_shadowed() and vm_object_collapse_scan() might observe an invalid page left in the default backing object by the fault handler that retried. Check for the condition and refuse to collapse. Reported and tested by: pho Reviewed by: jeff Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23331	2020-01-24 19:42:53 +00:00
Doug Moore	c7b23459b2	Most uses of vm_map_clip_start follow a call to vm_map_lookup. Define an inline function vm_map_lookup_clip_start that invokes them both and use it in places that invoke both. Drop a couple of local variables made unnecessary by this function. Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22987	2020-01-24 07:48:11 +00:00
Mark Johnston	e6bd3a812d	vm_map_submap(): Avoid unnecessary clipping. A submap can only be created from an entry spanning the entire request range. In particular, if vm_map_lookup_entry() returns false or the returned entry contains "end". Since the only use of submaps in FreeBSD is for the static pipe and execve argument KVA maps, this has no functional effect. Github PR: https://github.com/freebsd/freebsd/pull/420 Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> (original) Reviewed by: dougm, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23299	2020-01-23 16:45:10 +00:00
Jeff Roberson	fb4d37eac1	(fault 9/9) Move zero fill into a dedicated function to make the object lock state more clear. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23326	2020-01-23 05:23:37 +00:00
Jeff Roberson	be9d4fd6b4	(fault 8/9) Restructure some code to reduce duplication and simplify flow control. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23321	2020-01-23 05:22:02 +00:00
Jeff Roberson	df794f5caf	(fault 7/9) Move fault population and allocation into a dedicated function Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23320	2020-01-23 05:19:39 +00:00
Jeff Roberson	5909dafea9	(fault 6/9) Move getpages and associated logic into a dedicated function. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23311	2020-01-23 05:18:00 +00:00
Jeff Roberson	91eb2e908f	(fault 5/9) Move the backing_object traversal into a dedicated function. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23310	2020-01-23 05:14:41 +00:00
Jeff Roberson	5936b6a8f1	(fault 4/9) Move copy-on-write into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23304	2020-01-23 05:11:01 +00:00
Jeff Roberson	fcb0475833	(fault 3/9) Move map relookup into a dedicated function. Add a new VM return code KERN_RESTART which means, deallocate and restart in fault. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23303	2020-01-23 05:07:01 +00:00
Jeff Roberson	c308a3a6c9	(fault 2/9) Move map lookup into a dedicated function. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23302	2020-01-23 05:05:39 +00:00
Jeff Roberson	2c2f4413cc	(fault 1/9) Move a handful of stack variables into the faultstate. This additionally fixes a potential bug/pessimization where we could fail to reload the original fault_type on restart. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23301	2020-01-23 05:03:34 +00:00
Ryan Libby	8d1c459ae5	uma: fix zone domain overlaying pcpu cache with disabled cpus UMA zone structures have two arrays at the end which are sized according to the machine: an array of CPU count length, and an array of NUMA domain count length. The CPU counting was wrong in the case where some CPUs are disabled (when mp_ncpus != mp_maxid + 1), and this caused the second array to be overlaid with the first. Reported by: olivier Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23318	2020-01-23 04:56:38 +00:00
Ryan Libby	7e2406774e	uma: report leaks more accurately Previously UMA had some false negatives in the leak report at keg destruction time, where it only reported leaks if there were free items in the slab layer (rather than allocated items), which notably would not be true for single-item slabs (large items). Now, report a leak if there are any allocated pages, and calculate and report the number of allocated items rather than free items. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23275	2020-01-23 04:56:34 +00:00
Jeff Roberson	91e31c3c08	Consistently use busy and vm_page_valid() rather than touching page bits directly. This improves API compliance, asserts, etc. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23283	2020-01-23 04:54:49 +00:00
Jeff Roberson	530cc6a25d	Some architectures with DMAP still consume boot kva. Simplify the test for claiming kva in uma_startup2() to handle this. Reported by: bdragon	2020-01-23 03:37:35 +00:00
Jeff Roberson	5949b1ca8c	Move readahead and dropbehind fault functionality into a helper routine for clarity. Reviewed by: dougm, kib, markj Differential Revision: https://reviews.freebsd.org/D23282	2020-01-21 00:12:57 +00:00
Jeff Roberson	1e40fe41c5	Reduce object locking in vm_fault. Once we have an exclusively busied page we no longer need an object lock. This reduces the longest hold times and eliminates some trylock code blocks. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23034	2020-01-20 22:49:52 +00:00
Jeff Roberson	d6e13f3b4d	Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033	2020-01-19 23:47:32 +00:00
Jeff Roberson	9c83ff2d86	It has not been possible to recursively terminate a vnode object for some time now. Eliminate the dead code that supports it. Approved by: kib, markj Differential Revision: https://reviews.freebsd.org/D22908	2020-01-19 18:36:03 +00:00


4541 Commits