Commit Graph

67 Commits

Author SHA1 Message Date
Mark Johnston
968079f253 vm_reserv: Fix list locking in vm_reserv_reclaim_contig()
The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.

During the scan, we drop the per-domain lock.  At this point, the rvn
pointer may be invalidated.  Take care to load rvn after re-acquiring
the per-domain lock.

While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.

Reviewed by:	alc, kib
Reported by:	gallatin
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29203
2021-03-11 10:35:35 -05:00
Mark Johnston
431fb8abd7 vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures.  We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain().  Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers.  Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by:	alc
Reviewed by:	dougm, kib (earlier versions)
Differential Revision:	https://reviews.freebsd.org/D27207
2020-11-19 03:59:21 +00:00
Mark Johnston
114484b7ec Flag vm_reserv and vm_phys sysctls as MPSAFE.
Nothing in these subsystems relies on Giant.

MFC after:	1 week
2020-09-23 19:36:07 +00:00
D Scott Phillips
7988971a99 vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].

This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.

The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.

Reviewed by:	markj, alc, kib
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26130
2020-09-21 22:22:53 +00:00
Mark Johnston
d869a17e62 Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23978
2020-03-06 19:10:00 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
a314aba874 vm: add missing CLTFLAG_MPSAFE annotations
This covers all vm/* files.
2020-01-12 05:08:57 +00:00
Mark Johnston
b378d29687 Fix locking in vm_reserv_reclaim_contig().
We were not properly handling the case where the trylock of the
reservaton fails, in which case we could leak reservation lock.

Introduce a marker reservation to implement precise scanning in
vm_reserv_reclaim_contig().  Before, a race could result in early
termination of the scan in rare situations.  Use the marker's lock to
serialize scans of the partpop queue so that a global marker structure
can be used.  Modify vm_reserv_reclaim_inactive() to handle the presence
of a marker while minimizing the hold time of domain-global locks.

Reviewed by:	alc, jeff, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22392
2019-11-22 16:28:52 +00:00
Jeff Roberson
639676877b Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory.  Introduce vm_object_allocate_anon() to create these
objects.  DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by:	alc, dougm, kib, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22119
2019-11-19 23:19:43 +00:00
Mark Johnston
fe6d5344c2 Group per-domain reservation data in the same structure.
We currently have the per-domain partially populated reservation queues
and the per-domain queue locks.  Define a new per-domain padded
structure to contain both of them.  This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.

Also fix field packing in the reservation structure.  In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well.  This reduces the size of the structure by 8 bytes.

Update some comments while here.  No functional change intended.

Reviewed by:	dougm, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22391
2019-11-18 18:25:51 +00:00
Jeff Roberson
3e5e1b5135 Allocate amd64's page array using pages and page directory pages from the
NUMA domain that the pages describe.  Patch original from gallatin.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21252
2019-08-18 23:07:56 +00:00
Aleksandr Rybalko
6b821a7455 Check paddr for overflow.
Fix panic on initialize of "vm reserv" per-superpage lock in case when RAM ends at upper boundary of address space.
Observed on ARM32 board BPI-R2 (2GB RAM 0x80000000-0xffffffff).

PR:		235362
Reviewed by:	kib, markj, alc
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21272
2019-08-16 19:27:05 +00:00
Doug Moore
f96e8a0bab The means of finding ranges of free pages was changed for
vm_reserv_break in r348484, and there was found to improve performance
minutely and reduce code size. This change applies a similar change to
vm_reserv_reclaim_config, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation.  For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.

Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274
2019-06-06 16:28:34 +00:00
Alan Cox
2d5039db18 Retire vm_reserv_extend_{contig,page}(). These functions were introduced
as part of a false start toward fine-grained reservation locking.  In the
end, they were not needed, so eliminate them.

Order the parameters to vm_reserv_alloc_{contig,page}() consistently with
the vm_page functions that call them.

Update the comments about the locking requirements for
vm_reserv_alloc_{contig,page}().  They no longer require a free page
queues lock.

Wrap several lines that became too long after the "req" and "domain"
parameters were added to vm_reserv_alloc_{contig,page}().

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20492
2019-06-03 05:15:36 +00:00
Doug Moore
b8590dae50 The function vm_phys_free_contig invokes vm_phys_free_pages for every
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time.  The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.

The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.

One beneficiary of this change is in breaking reservations.  For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.

Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901
2019-05-31 21:02:42 +00:00
Doug Moore
e67a5068ec Reduce the code size and number of ffsl calls in vm_reserv_break. Use
xor to find where free ranges begin and end.

Tested by: pho
Reviewed by:alc
Approved by:markj, kib (mentors)
Differential Revision:	https://reviews.freebsd.org/D20256
2019-05-28 00:51:23 +00:00
Konstantin Belousov
f2a496d667 MI VM: Make it possible to set size of superpage at boot instead of compile time.
In order to allow single kernel to use PAE pagetables on i386 if
hardware supports it, and fall back to classic two-level paging
structures if not, superpage code should be able to adopt to either 2M
or 4M superpages size.  There I make MI VM structures large enough to
track the biggest possible superpage, by allowing architecture to
define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX constants.
Corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols can be
defined as runtime values and must be less than the _MAX constants.
If architecture does not define _MAXs, it is assumed that _MAX ==
normal constant.

Reviewed by:	markj
Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18853
2019-01-18 13:35:06 +00:00
Jeff Roberson
2ef6727edd Use the ticks since the last update to reduce hysteresis in the partpopq and
contention on the vm_reserv_domain lock.

This gives a roughly 8x speedup on will-it-scale fault1 on a 16 core machine.

Reviewed by:	alc, kib, markj
2018-07-07 01:54:45 +00:00
Jeff Roberson
2d3f4181de Fix two compliation problems on non-amd64 architectures. 2018-03-23 18:24:02 +00:00
Cy Schubert
72346b2232 Fix build on i386 without INVARIANTS following r331369.
--- vm_reserv.o ---
In file included from /opt/src/svn-current/sys/vm/vm_reserv.c:48:
In file included from /opt/src/svn-current/sys/sys/counter.h:37:
./machine/counter.h:174:3: error: implicit declaration of function
'critical_enter' is invalid in C99 [-Werror,-Wimplicit-function-declarat
ion]
                critical_enter();

Reviewed by:	jeff@
2018-03-23 03:22:30 +00:00
Jeff Roberson
5c930c894d Lock reservations with a dedicated lock in each reservation. Protect the
vmd_free_count with atomics.

This allows us to allocate and free from reservations without the free lock
except where a superpage is allocated from the physical layer, which is
roughly 1/512 of the operations on amd64.

Use the counter api to eliminate cache conention on counters.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14707
2018-03-22 19:21:11 +00:00
Jeff Roberson
30fbfdda6c Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation.  Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup.  Only trigger
the pageout daemon on transitions between states.  Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.

Reviewed by:	markj, kib
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14612
2018-03-15 19:23:07 +00:00
Jeff Roberson
f4af595964 Don't assert that the domain free lock is held until we're certain that
there is a valid reservation.  This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.

Reported by:	pho, mjg
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-07 22:04:27 +00:00
Mark Johnston
5f70fb1425 Correct some comments after r328954.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14486
2018-02-23 23:27:53 +00:00
Konstantin Belousov
ada27a3bb8 Cleanup unused page argument for vm_reserv_break().
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14364
2018-02-14 00:34:02 +00:00
Konstantin Belousov
c4be9169c0 Do not leak rv->psind in some specific situations.
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver).  Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting.  Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld.  In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.

Clean psind on vm_reserv_break() to avoid the situation.

Reported and tested by:	Slava Shwartsman
Reviewed by:	markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14335
2018-02-13 15:36:28 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Jeff Roberson
7a469c8ef3 Implement NUMA policy for kmem_*(9). This maintains compatibility with
reservations by giving each memory domain its own KVA space in vmem that
is naturally aligned on superpage boundaries.

Reviewed by:	alc, markj, kib  (some objections)
Sponsored by:	Netflix, Dell/EMC Isilon
Tested by;	pho
Differential Revision:	https://reviews.freebsd.org/D13289
2018-01-12 23:13:55 +00:00
Jeff Roberson
ef435ae7de Move domain iterators into the page layer where domain selection should take
place.  This makes the majority of the phys layer explicitly domain specific.

Reviewed by:	markj, kib (some objections)
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix & Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13014
2017-11-28 23:18:35 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Alan Cox
8b5e1472d2 Utilize pmap_enter(..., psind=1) in vm_fault_soft_fast() on amd64. (The
Differential Revision discusses the benefits of this change.)

Add a function, vm_reserv_to_superpage(), that returns the superpage
containing the specified base page.

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D11556
2017-07-23 16:28:13 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Alan Cox
920da7e4d2 Relax the object type restrictions on vm_page_alloc_contig(). Specifically,
add support for object types that were previously prohibited because they
could contain PG_CACHED pages.

Roughly halve the number of radix trie operations performed by
vm_page_alloc_contig() using the same approach that is employed by
vm_page_alloc().  Also, eliminate the radix trie lookup performed with the
free page queues lock held.

Tidy up the handling of radix trie insert failures in vm_page_alloc() and
vm_page_alloc_contig().

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8878
2016-12-28 18:32:13 +00:00
Alan Cox
3453bca864 Eliminate every mention of PG_CACHED pages from the comments in the machine-
independent layer of the virtual memory system.  Update some of the nearby
comments to eliminate redundancy and improve clarity.

In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per
The Chicago Manual of Style.

Update the comment in vm/vm_page.h defining the four types of page queues to
reflect the elimination of PG_CACHED pages and the introduction of the
laundry queue.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8752
2016-12-12 17:47:09 +00:00
Alan Cox
7667839a7e Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments.  Those changes will come later.)

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8497
2016-11-15 18:22:50 +00:00
Alan Cox
c869e67208 Introduce a new mechanism for relocating virtual pages to a new physical
address and use this mechanism when:

1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical
   memory allocator's free page lists.  This replaces the long-standing
   approach of scanning the inactive and inactive queues, converting clean
   pages into PG_CACHED pages and laundering dirty pages.  In contrast, the
   new mechanism does not use PG_CACHED pages nor does it trigger a large
   number of I/O operations.

2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find
   free pages in the physical memory allocator's free page lists that are
   covered by the direct map.  Tested by: adrian

3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable
   free pages in the physical memory allocator's free page lists.

In the coming months, I expect that this new mechanism will be applied in
other places.  For example, balloon drivers should use relocation to
minimize fragmentation of the guest physical address space.

Make vm_phys_alloc_contig() a little smarter (and more efficient in some
cases).  Specifically, use vm_phys_segs[] earlier to avoid scanning free
page lists that can't possibly contain suitable pages.

Reviewed by:	kib, markj
Glanced at:	jhb
Discussed with:	jeff
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4444
2015-12-19 18:42:50 +00:00
Alan Cox
67b7e4345e Correct an error in vm_reserv_reclaim_contig(). In the highly unusual
case that the reservation contained "low", the starting position in the
popmap for the free page search was incorrectly calculated.  The most
likely (and visible) symptom of this error was the assertion failure,
"vm_reserv_reclaim_contig: pa is too low".
2015-11-26 19:12:18 +00:00
Alan Cox
e0a63baae4 Introduce a sysctl for reporting the number of fully populated reservations. 2015-08-06 21:27:50 +00:00
Alan Cox
966272ca33 Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.
Differential Revision:	https://reviews.freebsd.org/D2712
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-06-08 04:59:32 +00:00
Alan Cox
6a93e36b5f Correct an off-by-one error in vm_reserv_reclaim_contig() that results in
an infinite loop.

Submitted by:	Svatopluk Kraus
MFC after:	1 week
2015-04-11 22:57:13 +00:00
Ian Lepore
ed9dd64b8c Revert r279932; this is going to be fixed in the sbuf code instead.
PR:		195668
2015-03-14 13:00:37 +00:00
Ian Lepore
f3b9fcf251 Nullterminate strings returned via sysctl.
PR:		195668
2015-03-12 18:06:30 +00:00
Konstantin Belousov
79d7993d98 Fix function name in the panic message.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-08 02:13:46 +00:00
Alan Cox
09e5f3c4b8 By the time that vm_reserv_init() runs, vm_phys_segs[] is initialized. Use
it instead of phys_avail[].

Discussed with:	Svatopluk Kraus
2014-11-22 17:46:30 +00:00
Alan Cox
64f096eeb2 Fix a boundary case error in vm_reserv_alloc_contig(): If a reservation
isn't being allocated for the last of the requested pages, because a
reservation won't fit in the gap between allocated pages, then the
reservation structure shouldn't be initialized.

While I'm here, improve the nearby comments.

Reported by:	jeff, pho
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2014-09-10 05:52:30 +00:00
Alan Cox
3180f7573a Correct a bug in the management of the population map on big-endian
machines.  Specifically, there was a mismatch between how the routine
allocation and deallocation operations accessed the population map
and how the aggressively optimized reservation-breaking operation
accessed it.  So, problems only occurred when reservations were broken.
This change makes the routine operations access the population map in
the same way as the reservation breaking operation.

This bug was introduced in r259999.

PR:		187080
Tested by:	jmg (on an "armeb" machine)
Sponsored by:	EMC / Isilon Storage Division
2014-06-11 16:11:12 +00:00
Alan Cox
dd05fa1945 Add a page size field to struct vm_page. Increase the page size field when
a partially populated reservation becomes fully populated, and decrease this
field when a fully populated reservation becomes partially populated.

Use this field to simplify the implementation of pmap_enter_object() on
amd64, arm, and i386.

On all architectures where we support superpages, the cost of creating a
superpage mapping is roughly the same as creating a base page mapping.  For
example, both kinds of mappings entail the creation of a single PTE and PV
entry.  With this in mind, use the page size field to make the
implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little
smarter.  Previously, if MAP_PREFAULT_PARTIAL was specified to
vm_map_pmap_enter(), that function would only map base pages.  Now, it will
create up to 96 base page or superpage mappings.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2014-06-07 17:12:26 +00:00
Alan Cox
a08c151546 Add "popmap" assertions: The page being freed isn't already free, and the
page being allocated isn't already allocated.

Sponsored by:	EMC / Isilon Storage Division
2013-12-29 04:54:52 +00:00
Alan Cox
ec17932242 MFp4 alc_popmap
Change the way that reservations keep track of which pages are in use.
  Instead of using the page's PG_CACHED and PG_FREE flags, maintain a bit
  vector within the reservation.  This approach has a couple benefits.
  First, it makes breaking reservations much cheaper because there are
  fewer cache misses to identify the unused pages.  Second, it is a pre-
  requisite for supporting two or more reservation sizes.
2013-12-28 04:28:35 +00:00
Konstantin Belousov
9eab548476 PG_SLAB no longer serves a useful purpose, since m->object is no
longer abused to store pointer to slab. Remove it.

Reviewed by:    alc
Sponsored by:   The FreeBSD Foundation
Approved by:	re (hrs)
2013-09-17 07:35:26 +00:00