freebsd-nq

Author	SHA1	Message	Date
Mark Johnston	8002c3a495	Initialize last_target in the laundry thread control loop. In practice it is always initialized because nfreed must be positive in order to trigger background laundering, but this isn't obvious. CID: 1387997 MFC after: 1 week	2018-11-06 02:52:54 +00:00
Mark Johnston	2203c46d87	Initialize the eflags field of vm_map headers. Initializing the eflags field of the map->header entry to a value with a unique new bit set makes a few comparisons to &map->header unnecessary. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: alc, kib Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14005	2018-11-02 16:26:44 +00:00
Mark Johnston	5d277a85ad	Revert r336984. It appears to be responsible for random segfaults observed when lots of paging activity is taking place, but the root cause is not yet understood. Requested by: alc MFC after: now	2018-10-30 22:40:40 +00:00
Mark Johnston	9978bd996b	Add malloc_domainset(9) and _domainset variants to other allocator KPIs. Remove malloc_domain(9) and most other _domain KPIs added in r327900. The new functions allow the caller to specify a general NUMA domain selection policy, rather than specifically requesting an allocation from a specific domain. The latter policy tends to interact poorly with M_WAITOK, resulting in situations where a caller is blocked indefinitely because the specified domain is depleted. Most existing consumers of the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy, in which we fall back to other domains to satisfy the allocation request. This change also defines a set of DOMAINSET_FIXED() policies, which only permit allocations from the specified domain. Discussed with: gallatin, jeff Reported and tested by: pho (previous version) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17418	2018-10-30 18:26:34 +00:00
Mark Johnston	920239efde	Fix some problems that manifest when NUMA domain 0 is empty. - In uma_prealloc(), we need to check for an empty domain before the first allocation attempt, not after. Fix this by switching uma_prealloc() to use a vm_domainset iterator, which addresses the secondary issue of using a signed domain identifier in round-robin iteration. - Don't automatically create a page daemon for domain 0. - In domainset_empty_vm(), recompute ds_cnt and ds_order after excluding empty domains; otherwise we may frequently specify an empty domain when calling in to the page allocator, wasting CPU time. Convert DOMAINSET_PREF() policies for empty domains to round-robin. - When freeing bootstrap pages, don't count them towards the per-domain total page counts for now: some vm_phys segments are created before the SRAT is parsed and are thus always identified as being in domain 0 even when they are not. Then, when bootstrap pages are freed, they are added to a domain that we had previously thought was empty. Until this is corrected, we simply exclude them from the per-domain page count. Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com> Reviewed by: gallatin MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17704	2018-10-30 17:57:40 +00:00
Alan Cox	9f1abe3df4	Eliminate typically pointless calls to vm_fault_prefault() on soft, copy- on-write faults. On a page fault, when we call vm_fault_prefault(), it probes the pmap and the shadow chain of vm objects to see if there are opportunities to create read and/or execute-only mappings to neighoring pages. For example, in the case of hard faults, such effort typically pays off, that is, mappings are created that eliminate future soft page faults. However, in the the case of soft, copy-on-write faults, the effort very rarely pays off. (See the review for some specific data.) Reviewed by: kib, markj MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D17367	2018-10-27 17:49:46 +00:00
Mark Johnston	17fbf3cf34	Add a !NUMA definition for vm_domainset_iter_policy_ref_init(). Pointy hat: markj X-MFC with: r339661 Sponsored by: The FreeBSD Foundation	2018-10-24 17:09:20 +00:00
Mark Johnston	7571e24901	Add an #include required after r339686. X-MFC with: r339686 Sponsored by: The FreeBSD Foundation	2018-10-24 16:49:16 +00:00
Mark Johnston	194a979ee9	Use a vm_domainset iterator in keg_fetch_slab(). Previously, it used a hand-rolled round-robin iterator. This meant that the minskip logic in r338507 didn't apply to UMA allocations, and also meant that we would call vm_wait() for individual domains rather than permitting an allocation from any domain with sufficient free pages. Discussed with: jeff Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17420	2018-10-24 16:41:47 +00:00
Mark Johnston	87ab1a10b1	Initialize static domainsets regardless of whether an SRAT is present. Reported by: yuripv X-MFC with: r339452 Sponsored by: The FreeBSD Foundation	2018-10-23 18:07:16 +00:00
Mark Johnston	4c29d2de67	Refactor domainset iterators for use by malloc(9) and UMA. Before this change we had two flavours of vm_domainset iterators: "page" and "malloc". The latter was only used for kmem_() and hard-coded its behaviour based on kernel_object's policy. Moreover, its use contained a race similar to that fixed by r338755 since the kernel_object's iterator was being run without the object lock. In some cases it is useful to be able to explicitly specify a policy (domainset) or policy+iterator (domainset_ref) when performing memory allocations. To that end, refactor the vm_dominset_ KPI to permit this, and get rid of the "malloc" domainset_iter KPI in the process. Reviewed by: jeff (previous version) Tested by: pho (part of a larger patch) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17417	2018-10-23 16:35:58 +00:00
Mark Johnston	b61f314290	Make it possible to disable NUMA support with a tunable. This provides a chicken switch for anyone negatively impacted by enabling NUMA in the amd64 GENERIC kernel configuration. With NUMA disabled at boot-time, information about the NUMA topology is not exposed to the rest of the kernel, and all of physical memory is viewed as coming from a single domain. This method still has some performance overhead relative to disabling NUMA support at compile time. PR: 231460 Reviewed by: alc, gallatin, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17439	2018-10-22 20:13:51 +00:00
Mark Johnston	2801dd08d7	Fix the build after r339601. I committed some patches out of order and didn't build-test one of them. Reported by: Jenkins, O. Hartmann <ohartmann@walstatt.org> X-MFC with: r339601	2018-10-22 17:19:48 +00:00
Mark Johnston	2a843ae7d9	Avoid a redundancy in a comment updated by r339601. Reported by: alc X-MFC with: r339601	2018-10-22 17:17:30 +00:00
Mark Johnston	b00581965d	Swap in processes unless there's a global memory shortage. On NUMA systems, we would not swap in processes unless all domains had some free pages. This is too conservative in general. Instead, permit swapins so long as at least one domain has free pages, and add a kernel stack NUMA policy which ensures that we will try to allocate kernel stack pages from any domain. Reported and tested by: pho, Jan Bramkamp <crest@bultmann.eu> Reviewed by: alc, kib Discussed with: jeff MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17304	2018-10-22 17:04:04 +00:00
Gleb Smirnoff	81c0d72c60	If we lost race or were migrated during bucket allocation for the per-CPU cache, then we put new bucket on generic bucket cache. However, code didn't honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache on a zone that clearly forbids that. Fix this. Reviewed by: markj	2018-10-22 15:48:07 +00:00
Konstantin Belousov	17afd2beec	Unindent vm_map_simplify_entry() after r339506. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17632	2018-10-21 00:11:56 +00:00
Konstantin Belousov	074244628b	Reduce code duplication in merging vm_entry neighbors. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17610	2018-10-20 23:08:04 +00:00
Mark Johnston	662e7fa8d9	Create some global domainsets and refactor NUMA registration. Pre-defined policies are useful when integrating the domainset(9) policy machinery into various kernel memory allocators. The refactoring will make it easier to add NUMA support for other architectures. No functional change intended. Reviewed by: alc, gallatin, jeff, kib Tested by: pho (part of a larger patch) MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17416	2018-10-20 17:36:00 +00:00
Matt Macy	e8bb589d56	eliminate locking surrounding ui_vmsize and swap reserve by using atomics Change swap_reserve and swap_total to be in units of pages so that swap reservations can be done using only atomics instead of using a single global mutex for swap_reserve and a single mutex for all processes running under the same uid for uid accounting. Results in mmap speed up and a 70% increase in brk calls / second. Reviewed by: alc@, markj@, kib@ Approved by: re (delphij@) Differential Revision: https://reviews.freebsd.org/D16273	2018-10-05 05:50:56 +00:00
Mark Johnston	93db904d19	Use an unsigned iterator for domain sets. Otherwise (iter % ds->ds_cnt) is not guaranteed to lie in the range [0, MAXMEMDOM). Reported by: pho Reviewed by: kib Approved by: re (rgrimes) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17374	2018-10-01 18:51:39 +00:00
Andrew Gallatin	30c5525b3c	Allow empty NUMA memory domains to support Threadripper2 The AMD Threadripper 2990WX is basically a slightly crippled Epyc. Rather than having 4 memory controllers, one per NUMA domain, it has only 2 memory controllers enabled. This means that only 2 of the 4 NUMA domains can be populated with physical memory, and the others are empty. Add support to FreeBSD for empty NUMA domains by: - creating empty memory domains when parsing the SRAT table, rather than failing to parse the table - not running the pageout deamon threads in empty domains - adding defensive code to UMA to avoid allocating from empty domains - adding defensive code to cpuset to avoid binding to an empty domain Thanks to Jeff for suggesting this strategy. Reviewed by: alc, markj Approved by: re (gjb@) Differential Revision: https://reviews.freebsd.org/D1683	2018-10-01 14:14:21 +00:00
Konstantin Belousov	c62637d679	Correct vm_fault_copy_entry() handling of backing file truncation after the file mapping was wired. if a wired map entry is backed by vnode and the file is truncated, corresponding pages are invalidated. vm_fault_copy_entry() should be aware of it and allow for invalid pages past end of file. Also, such pages should be not mapped into userspace. If userspace accesses the truncated part of the mapping later, it gets a signal, there is no way kernel can prevent the page fault. Reported by: andrew using syzkaller Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17323	2018-09-28 14:11:38 +00:00
Konstantin Belousov	9f25ab83f9	In vm_fault_copy_entry(), we should not assert that entry is charged if the dst_object is not of swap type. It can only happen when entry does not require copy, otherwise vm_map_protect() already adds the charge. So the assert was right for the case where swap object was allocated in the vm_fault_copy_entry(), but not when it was just copied from src_entry and its type is not swap. Reported by: andrew using syzkaller Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17323	2018-09-28 14:11:01 +00:00
Konstantin Belousov	a60d3db15e	In vm_fault_copy_entry(), collect the code to initialize a newly allocated dst_object in a single place. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17323	2018-09-28 14:10:12 +00:00
Mark Johnston	463406ac4a	Add more NUMA-specific low memory predicates. Use these predicates instead of inline references to vm_min_domains. Also add a global all_domains set, akin to all_cpus. Reviewed by: alc, jeff, kib Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17278	2018-09-24 19:24:17 +00:00
Alan Cox	f5fbe90de4	Passing UMA_ZONE_NOFREE to uma_zcreate() for swpctrie_zone and swblk_zone is redundant, because uma_zone_reserve_kva() is performed on both zones and it sets this same flag on the zone. (Moreover, the implementation of the swap pager does not itself require these zones to be UMA_ZONE_NOFREE.) Reviewed by: kib, markj Approved by: re (gjb) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17296	2018-09-24 16:49:02 +00:00
Mark Johnston	3d14a7bb43	Ensure that "domain" is initialized when vm_ndomains == 1. Reported by: alc Approved by: re (gjb)	2018-09-24 15:32:46 +00:00
Mark Johnston	969e147aff	Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned. The old code appears to assume that vmem_alloc() would import size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't provide this guarantee. Also remove the unused global RWX arena and add comments explaining why we have per-domain arenas. Reported by: alc Reviewed by: alc, kib (previous version) Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17249	2018-09-20 18:29:55 +00:00
Mark Johnston	25ed23cfbb	Change the domain selection policy in kmem_back(). Ensure that pages backing the same virtual large page come from the same physical domain, as kmem_malloc_domain() does. PR: 231038 Reviewed by: alc, kib Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17248	2018-09-20 15:45:12 +00:00
Mark Johnston	1aed6d48a8	Move kernel vmem arena initialization to vm_kern.c. This keeps the initialization coupled together with the kmem_* KPI implementation, which is the main user of these arenas. No functional change intended. Reviewed by: alc Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17247	2018-09-19 19:13:43 +00:00
Mateusz Guzik	c035292545	vm: check for empty kstack cache before locking The current cache logic checks the total number of stacks in the kernel, which even on small boxes significantly exceeds the 128 limit (e.g. an 8-way box with zfs has almost 800 stacks allocated). Stacks are cached earlier for each main thread. As a result the code is rarely executed, but when it is then (on boxes like the above) it always fails. Since there are no provisions made for NUMA and release time is approaching, just do a quick check to avoid acquiring the lock. Approved by: re (kib)	2018-09-19 16:02:33 +00:00
Mark Johnston	26fe2217bf	Only update the domain cursor once in keg_fetch_slab(). We drop the keg lock when we go to actually allocate the slab, allowing other threads to advance the cursor. This can cause us to exit the round-robin loop before having attempted allocations from all domains, resulting in a hang during a subsequent blocking allocation attempt from a depleted domain. Reported and tested by: Jan Bramkamp <crest@bultmann.eu> Reviewed by: alc, cem Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17209	2018-09-18 17:51:45 +00:00
Mateusz Guzik	2554f86a8d	vm: stop taking proc lock in mmap to satisfy racct if it is disabled Limits can be safely obtained with lim_cur from the thread. racct is compiled in but disabled by default. Note that racct enablement is a boot-only tunable. This eliminates second most common place of taking the lock while pkg building. While here don't take the lock in mlockall either. Reviewed by: kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D17210	2018-09-18 01:24:30 +00:00
Mark Johnston	7a364d458a	Split some checks in vm_page_activate() to make it easier to read. No functional change intended. Reviewed by: alc, kib Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17028	2018-09-10 18:59:23 +00:00
Mark Johnston	5a7f993702	Relax an assertion in vm_pqbatch_process_page(). While executing vm_pqbatch_process_page(m), m->queue may change to PQ_NONE if the page daemon is concurrently freeing the page. In this case m's queue state flags must be clear, so vm_pqbatch_process_page() will be a no-op, but the race could cause spurious assertion failures. Correct the assertion which assumed that m->queue's value does not change while the page queue lock is held. Reviewed by: alc, kib Reported and tested by: pho Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17027	2018-09-08 21:49:43 +00:00
Mark Johnston	c56c7299c2	Use the correct terminology. Reported by: kib Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D16191	2018-09-06 20:02:19 +00:00
Mark Johnston	23984ce5cd	Avoid resource deadlocks when one domain has exhausted its memory. Attempt other allowed domains if the requested domain is below the minimum paging threshold. Block in fork only if all domains available to the forking thread are below the severe threshold rather than any. Submitted by: jeff Reported by: mjg Reviewed by: alc, kib, markj Approved by: re (rgrimes) Differential Revision: https://reviews.freebsd.org/D16191	2018-09-06 19:28:52 +00:00
Mark Johnston	21f01f4584	Remove vm_page_remque(). Testing m->queue != PQ_NONE is not sufficient; see the commit log message for r338276. As of r332974 vm_page_dequeue() handles already-dequeued pages, so just replace vm_page_remque() calls with vm_page_dequeue() calls. Reviewed by: kib Tested by: pho Approved by: re (marius) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17025	2018-09-06 16:17:45 +00:00
Alan Cox	72aebdd742	Recent changes have created, for the first time, physical memory segments that can be coalesced. To be clear, fragmentation of phys_avail[] is not the cause. This fragmentation of vm_phys_segs[] arises from the "special" calls to vm_phys_add_seg(), in other words, not those that derive directly from phys_avail[], but those that we create for the initial kernel page table pages and now for the kernel and modules loaded at boot time. Since we sometimes iterate over the physical memory segments, coalescing these segments at initialization time is a worthwhile change. Reviewed by: kib, markj Approved by: re (rgrimes) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D16976	2018-09-02 18:29:38 +00:00
Konstantin Belousov	f0165b1ca6	Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines. Exposing max_offset and min_offset defines in public headers is causing clashes with variable names, for example when building QEMU. Based on the submission by: royger Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week Approved by: re (marius) Differential revision: https://reviews.freebsd.org/D16881	2018-08-29 12:24:19 +00:00
Mark Murray	19fa89e938	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898	2018-08-26 12:51:46 +00:00
Alan Cox	49bfa624ac	Eliminate the arena parameter to kmem_free(). Implicitly this corrects an error in the function hypercall_memfree(), where the wrong arena was being passed to kmem_free(). Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are mapped in kmem with execute permissions. Use this flag to determine which arena the kmem virtual addresses are returned to. Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it redundant. Update the nearby comment for UMA_SLAB_KERNEL. Reviewed by: kib, markj Discussed with: jeff Approved by: re (marius) Differential Revision: https://reviews.freebsd.org/D16845	2018-08-25 19:38:08 +00:00
Gleb Smirnoff	306abf0f35	Either "free" or "allocated" is misleading here, since an item in a bucket is free from perspective of UMA consumer, and it is allocated from perspective of keg. Discussed with: markj Approved by: re (kib)	2018-08-24 18:47:50 +00:00
Gleb Smirnoff	a307fb5b0c	Fix comment. The actual meaning of ub_cnt is the opposite.	2018-08-23 23:24:28 +00:00
Mark Johnston	899fe184c7	Add a per-pagequeue pdpages counter. Expose these counters under the vm.domain sysctl node. The existing vm.stats.vm.v_pdpages sysctl is preserved. Reviewed by: alc (previous version) Differential Revision: https://reviews.freebsd.org/D14666	2018-08-23 21:03:45 +00:00
Mark Johnston	99d92d732f	Ensure that queue state is cleared when vm_page_dequeue() returns. Per-page queue state is updated non-atomically, with either the page lock or the page queue lock held. When vm_page_dequeue() is called without the page lock, in rare cases a different thread may be concurrently dequeuing the page with the pagequeue lock held. Because of the non-atomic update, vm_page_dequeue() might return before queue state is completely updated, which can lead to race conditions. Restrict the vm_page_dequeue() interface so that it must be called either with the page lock held or on a free page, and busy wait when a different thread is concurrently updating queue state, which must happen in a critical section. While here, do some related cleanup: inline vm_page_dequeue_locked() into its only caller and delete a prototype for the unimplemented vm_page_requeue_locked(). Replace the volatile qualifier for "queue" added in r333703 with explicit uses of atomic_load_8() where required. Reported and tested by: pho Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D15980	2018-08-23 20:34:22 +00:00
Alan Cox	83a90bffd8	Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter became unused in FreeBSD 12.x as a side-effect of the NUMA-related changes.) Reviewed by: kib, markj Discussed with: jeff, re@ Differential Revision: https://reviews.freebsd.org/D16825	2018-08-21 16:43:46 +00:00
Alan Cox	44d0efb215	Eliminate kmem_alloc_contig()'s unused arena parameter. Reviewed by: hselasky, kib, markj Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D16799	2018-08-20 15:57:27 +00:00
Alan Cox	db7c2a4822	Eliminate the unused arena parameter from kmem_alloc_attr(). Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D16793	2018-08-18 22:07:48 +00:00

1 2 3 4 5 ...

3944 Commits