The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.
During the scan, we drop the per-domain lock. At this point, the rvn
pointer may be invalidated. Take care to load rvn after re-acquiring
the per-domain lock.
While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.
Reviewed by: alc, kib
Reported by: gallatin
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29203
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.
Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.
Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.
Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.
Reviewed by: alc
Reviewed by: dougm, kib (earlier versions)
Differential Revision: https://reviews.freebsd.org/D27207
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].
This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.
The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.
Reviewed by: markj, alc, kib
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26130
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
We were not properly handling the case where the trylock of the
reservaton fails, in which case we could leak reservation lock.
Introduce a marker reservation to implement precise scanning in
vm_reserv_reclaim_contig(). Before, a race could result in early
termination of the scan in rare situations. Use the marker's lock to
serialize scans of the partpop queue so that a global marker structure
can be used. Modify vm_reserv_reclaim_inactive() to handle the presence
of a marker while minimizing the hold time of domain-global locks.
Reviewed by: alc, jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22392
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.
Reviewed by: alc, dougm, kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22119
We currently have the per-domain partially populated reservation queues
and the per-domain queue locks. Define a new per-domain padded
structure to contain both of them. This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.
Also fix field packing in the reservation structure. In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well. This reduces the size of the structure by 8 bytes.
Update some comments while here. No functional change intended.
Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22391
NUMA domain that the pages describe. Patch original from gallatin.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21252
vm_reserv_break in r348484, and there was found to improve performance
minutely and reduce code size. This change applies a similar change to
vm_reserv_reclaim_config, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation. For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.
Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274
as part of a false start toward fine-grained reservation locking. In the
end, they were not needed, so eliminate them.
Order the parameters to vm_reserv_alloc_{contig,page}() consistently with
the vm_page functions that call them.
Update the comments about the locking requirements for
vm_reserv_alloc_{contig,page}(). They no longer require a free page
queues lock.
Wrap several lines that became too long after the "req" and "domain"
parameters were added to vm_reserv_alloc_{contig,page}().
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20492
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time. The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.
The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.
One beneficiary of this change is in breaking reservations. For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.
Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901
In order to allow single kernel to use PAE pagetables on i386 if
hardware supports it, and fall back to classic two-level paging
structures if not, superpage code should be able to adopt to either 2M
or 4M superpages size. There I make MI VM structures large enough to
track the biggest possible superpage, by allowing architecture to
define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX constants.
Corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols can be
defined as runtime values and must be less than the _MAX constants.
If architecture does not define _MAXs, it is assumed that _MAX ==
normal constant.
Reviewed by: markj
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18853
--- vm_reserv.o ---
In file included from /opt/src/svn-current/sys/vm/vm_reserv.c:48:
In file included from /opt/src/svn-current/sys/sys/counter.h:37:
./machine/counter.h:174:3: error: implicit declaration of function
'critical_enter' is invalid in C99 [-Werror,-Wimplicit-function-declarat
ion]
critical_enter();
Reviewed by: jeff@
vmd_free_count with atomics.
This allows us to allocate and free from reservations without the free lock
except where a superpage is allocated from the physical layer, which is
roughly 1/512 of the operations on amd64.
Use the counter api to eliminate cache conention on counters.
Reviewed by: markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14707
vmd_free_count manipulation. Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup. Only trigger
the pageout daemon on transitions between states. Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.
Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14612
there is a valid reservation. This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.
Reported by: pho, mjg
Sponsored by: Netflix, Dell/EMC Isilon
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver). Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting. Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld. In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.
Clean psind on vm_reserv_break() to avoid the situation.
Reported and tested by: Slava Shwartsman
Reviewed by: markj
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14335
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.
Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000
reservations by giving each memory domain its own KVA space in vmem that
is naturally aligned on superpage boundaries.
Reviewed by: alc, markj, kib (some objections)
Sponsored by: Netflix, Dell/EMC Isilon
Tested by; pho
Differential Revision: https://reviews.freebsd.org/D13289
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
Differential Revision discusses the benefits of this change.)
Add a function, vm_reserv_to_superpage(), that returns the superpage
containing the specified base page.
Reviewed by: kib, markj
Tested by: pho
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D11556
add support for object types that were previously prohibited because they
could contain PG_CACHED pages.
Roughly halve the number of radix trie operations performed by
vm_page_alloc_contig() using the same approach that is employed by
vm_page_alloc(). Also, eliminate the radix trie lookup performed with the
free page queues lock held.
Tidy up the handling of radix trie insert failures in vm_page_alloc() and
vm_page_alloc_contig().
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8878
independent layer of the virtual memory system. Update some of the nearby
comments to eliminate redundancy and improve clarity.
In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per
The Chicago Manual of Style.
Update the comment in vm/vm_page.h defining the four types of page queues to
reflect the elimination of PG_CACHED pages and the introduction of the
laundry queue.
Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8752
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments. Those changes will come later.)
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8497
address and use this mechanism when:
1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical
memory allocator's free page lists. This replaces the long-standing
approach of scanning the inactive and inactive queues, converting clean
pages into PG_CACHED pages and laundering dirty pages. In contrast, the
new mechanism does not use PG_CACHED pages nor does it trigger a large
number of I/O operations.
2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find
free pages in the physical memory allocator's free page lists that are
covered by the direct map. Tested by: adrian
3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable
free pages in the physical memory allocator's free page lists.
In the coming months, I expect that this new mechanism will be applied in
other places. For example, balloon drivers should use relocation to
minimize fragmentation of the guest physical address space.
Make vm_phys_alloc_contig() a little smarter (and more efficient in some
cases). Specifically, use vm_phys_segs[] earlier to avoid scanning free
page lists that can't possibly contain suitable pages.
Reviewed by: kib, markj
Glanced at: jhb
Discussed with: jeff
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4444
case that the reservation contained "low", the starting position in the
popmap for the free page search was incorrectly calculated. The most
likely (and visible) symptom of this error was the assertion failure,
"vm_reserv_reclaim_contig: pa is too low".
isn't being allocated for the last of the requested pages, because a
reservation won't fit in the gap between allocated pages, then the
reservation structure shouldn't be initialized.
While I'm here, improve the nearby comments.
Reported by: jeff, pho
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
machines. Specifically, there was a mismatch between how the routine
allocation and deallocation operations accessed the population map
and how the aggressively optimized reservation-breaking operation
accessed it. So, problems only occurred when reservations were broken.
This change makes the routine operations access the population map in
the same way as the reservation breaking operation.
This bug was introduced in r259999.
PR: 187080
Tested by: jmg (on an "armeb" machine)
Sponsored by: EMC / Isilon Storage Division
a partially populated reservation becomes fully populated, and decrease this
field when a fully populated reservation becomes partially populated.
Use this field to simplify the implementation of pmap_enter_object() on
amd64, arm, and i386.
On all architectures where we support superpages, the cost of creating a
superpage mapping is roughly the same as creating a base page mapping. For
example, both kinds of mappings entail the creation of a single PTE and PV
entry. With this in mind, use the page size field to make the
implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little
smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to
vm_map_pmap_enter(), that function would only map base pages. Now, it will
create up to 96 base page or superpage mappings.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Change the way that reservations keep track of which pages are in use.
Instead of using the page's PG_CACHED and PG_FREE flags, maintain a bit
vector within the reservation. This approach has a couple benefits.
First, it makes breaking reservations much cheaper because there are
fewer cache misses to identify the unused pages. Second, it is a pre-
requisite for supporting two or more reservation sizes.