Commit Graph

4783 Commits

Author SHA1 Message Date
Konstantin Belousov
8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
a custom implementation is provided, since we maintain the bitmask of
active CPUs anyway.

Arm64 uses a somewhat naive iteration over the CPUs, matching the
current vmspace's pmap against the argument. It is not guaranteed that
vmspace->pmap is the same as the active pmap, but the inaccuracy should
be tolerable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
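
For illustration, a hedged sketch of the naive iteration described above for
arm64; this is not the committed code, and the helper name is made up:

    /*
     * Hedged sketch (illustrative only): walk every CPU and mark those
     * whose current thread's vmspace uses the given pmap.  As noted above,
     * vmspace->pmap may differ from the pmap that is actually active.
     */
    static void
    pmap_active_cpus_sketch(pmap_t pmap, cpuset_t *res)
    {
            struct pcpu *pc;
            struct thread *td;
            int c;

            CPU_ZERO(res);
            CPU_FOREACH(c) {
                    pc = pcpu_find(c);
                    td = pc->pc_curthread;
                    if (td == NULL || td->td_proc == NULL)
                            continue;
                    if (vmspace_pmap(td->td_proc->p_vmspace) == pmap)
                            CPU_SET(c, res);
            }
    }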
Konstantin Belousov
5f452214f2 vm_map.c: fix syntax
Fixes:	c718009884
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-08-18 16:37:16 +03:00
Konstantin Belousov
c718009884 vm_map.c: plug several more places which might modify entry->offset
for the GUARD entries protecting stack gaps.

syzkaller: https://syzkaller.appspot.com/bug?extid=c325d6a75e4fd0a68714
Reviewed by:	dougm, markj (previous version)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41475
2023-08-18 15:43:35 +03:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Dmitry Chagin
f3e11927dc vm: Allow MAP_32BIT for all architectures
Reviewed by:		alc, kib, markj
Differential revision:	https://reviews.freebsd.org/D41435
2023-08-14 20:20:20 +03:00
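
With MAP_32BIT accepted on all architectures, a minimal user-space example of
requesting an anonymous mapping below the 4 GiB boundary might look like this
(a hedged sketch; error handling trimmed):

    #include <sys/mman.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* Ask for an anonymous private mapping placed below 4 GiB. */
            void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return (1);
            }
            printf("mapped at %p\n", p);
            return (0);
    }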
Dmitry Chagin
0ddd32b617 vm: MAP_32BIT_MAX_ADDR defined in sys/mman.h
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D41434
2023-08-14 20:18:30 +03:00
Alan Cox
37e5d49e1e vm: Fix address hints of 0 with MAP_32BIT
Also, rename min_addr to default_addr, which better reflects what it
represents.  Unlike max_addr, which is a hard upper bound on the
addresses that can be allocated, min_addr is not a hard lower bound:
for example, a non-zero hint that is less than min_addr can still be
allocated.

Reported by:	dchagin
Reviewed by:	dchagin, kib, markj
Fixes:	d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision:	https://reviews.freebsd.org/D41397
2023-08-12 02:35:21 -05:00
Konstantin Belousov
9b65fa6940 linuxolator: implement Linux' PROT_GROWSDOWN
From the Linux man page for mprotect(2):
   PROT_GROWSDOWN
       Apply  the  protection  mode  down to the beginning of a mapping
       that grows downward (which should be a stack segment or a
       segment mapped with the MAP_GROWSDOWN flag set).

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
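
A hedged user-space sketch of how the flag is used on Linux, or under Linux
emulation; the helper name and the assumption that the address lies inside a
grows-down (stack) mapping are illustrative:

    #include <sys/mman.h>

    /*
     * Illustrative sketch: 'addr' is assumed to be a page-aligned address
     * inside a grows-down stack mapping.  PROT_GROWSDOWN asks the kernel
     * to apply the new protection from 'addr' down to the beginning of
     * that mapping.
     */
    int
    make_stack_executable(void *addr, size_t len)
    {
            return (mprotect(addr, len,
                PROT_READ | PROT_WRITE | PROT_EXEC | PROT_GROWSDOWN));
    }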
Konstantin Belousov
90049eabcf vm_map_protect(): add VM_MAP_PROTECT_GROWSDOWN flag
which requests propagating the lowest stack segment's protection to the grow gap.
This seems to be required for Linux emulation.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:14 +03:00
Konstantin Belousov
b6037edbd1 vm_map_growstack(): restore stack gap data if gap entry was removed
and then re-created.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
9d7ea6cff7 vm_map: do not allow to merge stack gap entries
At least, offset handling is wrong for them.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
55be6be12c vm_map_protect(): handle stack protection stored in the stack guard
mprotect(2) on the stack region needs to adjust the protection stored
in the guard, so that, for example, enabling execution on the stack
keeps working properly after stack growth.

Reported by:	dchagin
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
79169929f0 vm_map_protect(): move guard handling in the last phase into an empty dedicated helper
Restructure the first phase slightly, to facilitate further changes.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
aa928a5216 vm_map_growstack(): handle max protection for stacks
Do not assume that protection is the same as max_protection.  Store both in
offset, packed in the same way as the prot syscall parameter.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
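
The packing mentioned above follows the PROT_MAX() encoding that mmap(2)
already accepts in its prot argument; a hedged illustration (the helper names
are made up):

    #include <sys/mman.h>

    /*
     * Illustrative sketch: combine the current and maximum protections in
     * one integer using FreeBSD's PROT_MAX()/PROT_MAX_EXTRACT() encoding,
     * and split them back apart.
     */
    static int
    pack_prot(int prot, int max_prot)
    {
            return (prot | PROT_MAX(max_prot));
    }

    static void
    unpack_prot(int packed, int *prot, int *max_prot)
    {
            *prot = packed & (PROT_READ | PROT_WRITE | PROT_EXEC);
            *max_prot = PROT_MAX_EXTRACT(packed);
    }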
Konstantin Belousov
0fb6aae7f0 vm_map.c: add CONTAINS_BITS macro
Suggested by:	dougm
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
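
Such a macro is presumably the usual "is this set of bits a subset of that
one" test; a hedged sketch:

    /* Hedged sketch: true iff every bit set in 'bits' is also set in 'set'. */
    #define CONTAINS_BITS(set, bits)    (((set) & (bits)) == (bits))

    /* Example: does the entry allow at least read and write access? */
    /* CONTAINS_BITS(entry->protection, VM_PROT_READ | VM_PROT_WRITE) */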
Konstantin Belousov
ba41b0de3e Add vm_map_insert1(9)
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.

The comment update for vm_map_try_merge_entries() was suggested by dougm.

Suggested by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
3b44ee50be vm_map_insert(): update herald comment
Only a part of the object may be mapped.

Noted by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41099
2023-08-12 09:28:13 +03:00
Konstantin Belousov
9da33e8d10 Update comment describing struct vm_map
There is no longer a list connecting all entries, and correspondingly
no ordering of entries on such a list.

Reviewed by:	dougm
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D41405
2023-08-10 09:01:26 +03:00
Doug Moore
e77f4e7f59 vm_phys: tune vm_phys_enqueue_contig loop
Rewrite the final loop in vm_phys_enqueue_contig as a new function,
vm_phys_enq_beg, to reduce amd64 code size.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41289
2023-08-04 21:09:39 -05:00
Doug Moore
ccdb28275d vm_phys_enq_range: no alignment assert for npages==0
Do not assume, when vm_phys_enq_range is passed npages==0, that the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.

Reported by:	karels
Differential Revision:	https://reviews.freebsd.org/D41317
2023-08-04 13:41:59 -05:00
Doug Moore
c9b06fa527 vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enq_range for part of its work also saves a few
bytes.

The amd64 object code shrinks by 128 bytes.

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D41154
2023-08-03 09:19:48 -05:00
Doug Moore
b7370efade Revert "vm_phys_enqueue_contig: handle npages==0"
This reverts commit 1a7fcf6d51.

Peter Holm reported a problem, so I'm reverting now and looking for
the problem later.
2023-08-02 04:33:40 -05:00
Doug Moore
1a7fcf6d51 vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enq_range for part of its work also saves a few
bytes.

The amd64 object code shrinks by 80 bytes.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D41154
2023-08-01 22:12:00 -05:00
Mark Johnston
d0e4e53ebd vm_map: Add a macro to fetch a map entry's split boundary index
The resulting code is a bit more concise.  No functional change
intended.

Reviewed by:	alc, dougm, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D41249
2023-08-01 10:10:02 -04:00
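
A hedged sketch of such a macro (the macro name is illustrative; the eflags
mask and shift are the existing vm_map.h definitions):

    /*
     * Illustrative sketch: the split boundary index is kept in a bitfield
     * of entry->eflags, so fetching it is a single mask-and-shift.
     */
    #define MAP_SPLIT_BOUNDARY_INDEX(entry) \
            (((entry)->eflags & MAP_ENTRY_SPLIT_BOUNDARY_MASK) >> \
            MAP_ENTRY_SPLIT_BOUNDARY_SHIFT)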
Doug Moore
ac0572e660 radix_tree: compute slot from keybarr
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.

This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.

Reviewed by:	alc
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41235
2023-07-30 15:12:06 -05:00
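
A hedged sketch of the idea (not the committed code; names follow vm_radix.c
conventions): when the index falls inside the node's range, the same
subtraction and shift that validates it also yields the child slot, so the
caller gets the slot for free.

    static __inline bool
    vm_radix_keybarr_sketch(struct vm_radix_node *rnode, vm_pindex_t index,
        int *slot)
    {
            index = (index - rnode->rn_owner) >> rnode->rn_clev;
            if (index < VM_RADIX_COUNT) {
                    *slot = index;          /* reused directly by the caller */
                    return (false);         /* no barrier: descend via *slot */
            }
            return (true);                  /* key lies outside this subtree */
    }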
Doug Moore
38f5cb1bfb radix_tree: redefine the clev field
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.

For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has trimkey(), not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.

This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.

Reviewed by:	kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41226
2023-07-30 01:20:07 -05:00
Doug Moore
2d2bcba7ba Every path in a radix trie ends with a leaf or a NULL. By replacing
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.

This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.

Reviewed by:	alc, kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41171
2023-07-28 11:39:52 -05:00
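
A hedged fragment of what the search loop looks like once absent children are
NULL leaves (illustrative only; helper names as in vm_radix.c of that time,
and 'access' stands for the lookup's SMR access-mode parameter):

    /*
     * Illustrative fragment: the loop branches only on "is this a leaf?"
     * and on the key barrier; a NULL child is just a leaf that fails the
     * pindex comparison, so there is no per-iteration NULL pointer test.
     */
    for (;;) {
            if (vm_radix_isleaf(rnode)) {
                    m = vm_radix_topage(rnode);
                    return (m != NULL && m->pindex == index ? m : NULL);
            }
            if (vm_radix_keybarr(rnode, index))
                    return (NULL);
            slot = vm_radix_slot(index, rnode->rn_clev);
            rnode = vm_radix_node_load(&rnode->rn_child[slot], access);
    }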
Alan Cox
5ec2d94ade vm_mmap_object: Update the spelling of true/false
Since fitit is already a bool, use true/false instead of TRUE/FALSE.

MFC after:	2 weeks
2023-07-27 00:25:53 -05:00
Alan Cox
50d663b14b vm: Fix vm_map_find_min()
Fix the handling of address hints that are less than min_addr by
vm_map_find_min().

Reported by:	dchagin
Reviewed by:	kib
Fixes:	d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision:	https://reviews.freebsd.org/D41159
2023-07-26 00:24:50 -05:00
Konstantin Belousov
db6c7c7f8d vmspace_fork(): do not override offset for the guard entries
The offset field contains protection for the stack guards.

Reported by:	cy
Fixes:	21e45c30c3
MFC after:	1 week
2023-07-20 22:04:03 +03:00
Konstantin Belousov
21e45c30c3 mmap(MAP_STACK): on stack grow, use original protection
If mprotect(2) changed protection in the bottom of the currently grown
stack region, the changed protection would be used for the stack grow
on the next fault.  This is arguably unexpected.

Store the original protection for the entry at mmap(2) time in the
offset member of the gap vm_map_entry, and use it for protection of the
grown stack region.

PR:	272585
Reported by:	John F. Carr <jfc@mit.edu>
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41089
2023-07-20 17:11:42 +03:00
Doug Moore
6f251ef228 radix_trie: simplify ge, le lookups
Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.

The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced by 20% in a buildworld test.

Reviewed by:	alc, markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40936
2023-07-19 09:43:31 -05:00
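
The popmap turns "find the next non-NULL child at or after a given slot" into
a couple of bit operations instead of a scan; a hedged user-space model (not
the committed code):

    #include <stdint.h>
    #include <strings.h>    /* ffs() */

    /*
     * Illustrative model: 'popmap' has one bit per child slot.  Return the
     * lowest populated slot >= 'slot', or -1 if there is none.  lookup_le
     * would use the mirror image, fls() on the bits at or below 'slot'.
     */
    static int
    next_populated_slot(uint16_t popmap, int slot)
    {
            int masked = popmap & ~((1 << slot) - 1);

            return (masked != 0 ? ffs(masked) - 1 : -1);
    }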
Doug Moore
3e04ae433f vm_radix_init: use initializer
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D40971
2023-07-14 01:49:55 -05:00
Doug Moore
16e01c05c0 radix_trie: avoid code duplication in insert
Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.
Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40807
2023-07-09 15:06:02 -05:00
Doug Moore
8df38859d0 radix_trie: replace node count with popmap
Replace the 'count' field in a trie node with a bitmap that
identifies non-NULL children. Drop the 'last' field, and use the
last bit set in the bitmap instead.  In lookup_le, lookup_ge,
remove, and reclaim_all, use the bitmap to find the
previous/next/only/every non-null child in constant time by
examining the bitmask instead of looping across array elements
and null-checking them one-by-one.

A buildworld test suggests that, for the functions where some
null-checks are eliminated, this reduces the cycle counts by 4.9%,
1.5%, 0.0% and 13.3%, respectively.
Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40775
2023-07-07 11:09:36 -05:00
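
A hedged model of the node layout change (field names and widths are
illustrative, not the committed code):

    #include <stdint.h>

    #define TRIE_WIDTH      4
    #define TRIE_COUNT      (1 << TRIE_WIDTH)

    /*
     * Illustrative model: instead of a child count, 'popmap' has bit i set
     * iff child[i] is non-NULL, so "only child", "last child", and
     * "next/previous child" queries become single bit operations.
     */
    struct trie_node {
            uint64_t        owner;                  /* common key prefix */
            uint16_t        popmap;                 /* non-NULL children */
            uint8_t         clev;                   /* level of this node */
            void            *child[TRIE_COUNT];
    };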
Konstantin Belousov
ef747607ea vm_fault: move FAULT_* return codes out of range for Mach errors
This way a possible clash between FAULT_* and KERN_* numbering is
avoided, and panic checks for fault_status confusion become more
efficient.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D40771
2023-06-28 00:03:14 +03:00
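
A hedged illustration of the idea (the numeric base is made up and the names
may differ from the committed enum): the KERN_* codes are small non-negative
integers, so starting the internal fault-status values well above them means
neither set can be mistaken for the other, and a single range check can panic
on confusion.

    enum fault_status_sketch {
            FAULT_SUCCESS = 10000,  /* any base above the KERN_* range */
            FAULT_FAILURE,
            FAULT_CONTINUE,
            FAULT_RESTART,
            FAULT_OUT_OF_BOUNDS,
            FAULT_HARD,
            FAULT_SOFT,
            FAULT_PROTECTION_FAILURE,
    };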
Doug Moore
da72505f9c radix_trie: pass fewer params to node_get
Let node_get calculate its own owner value. Don't pass the count
parameter, since it's always 2. Save 16 bytes in insert(). Move,
without modifying, slot and trimkey to handle a use-before-declaration
problem.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D40723
2023-06-27 12:21:11 -05:00
Doug Moore
9cfed089ac radix_trie: clean up overlong lines
This is purely a cosmetic change. vm_radix.c has lines that reach past
column 80 and this change cleans that up. The associated changes to
subr_pctrie.c are just to keep mirroring vm_radix.c.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D40764
2023-06-27 12:01:33 -05:00
Doug Moore
72c3a43b16 radix_trie: skip compare in lookup_le, lookup_ge
In _lookup_ge, where a loop "looks for an available edge or val within
the current bisection node" (to quote the code comment), the value of
index has already been modified to guarantee that it is the least
value that can be found in the non-NULL child node being
examined. Therefore, if the non-NULL child is a leaf, there's no need
to compare 'index' to anything, and the value can just be returned.

The same is true for _lookup_le with 'most' replacing 'least'.
Reviewed by:	alc
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D40746
2023-06-27 00:42:41 -05:00
Alan Cox
d8e6f4946c vm: Fix anonymous memory clustering under ASLR
By default, our ASLR implementation is supposed to cluster anonymous
memory allocations, unless the application's mmap(..., MAP_ANON, ...)
call included a non-zero address hint.  Unfortunately, clustering
never occurred because kern_mmap() always replaced the given address
hint when it was zero.  So, the ASLR implementation always believed
that a non-zero hint had been provided and randomized the mapping's
location in the address space.  To fix this problem, I'm pushing down
the point at which we convert a hint of zero to the minimum allocatable
address from kern_mmap() to vm_map_find_min().

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40743
2023-06-26 23:42:48 -05:00
Doug Moore
a42d8fe001 radix_trie: simplify trimkey functions
Replacing a branch and two shifts with a single masking operation saves
64 bytes in the pair of functions lookup_le and lookup_ge on amd64.
Refresh the associated comments.
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D40722
2023-06-25 12:49:15 -05:00
Doug Moore
e8efee297c radix_trie: avoid reloading radix node
In the vm_radix:remove loop that searches for the last child, load
that child once, without loading it again after the search is over.
Change the KASSERTs from an index check to a NULL-node check.
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D40721
2023-06-23 18:47:23 -05:00
Doug Moore
1efa7dbc07 vm_radix: drop unused function; use bool.
Replace boolean_t with bool in vm_radix.c. Drop vm_radix_is_singleton,
which is unused and has no corresponding function in subr_pctrie.c.
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D40586
2023-06-20 23:52:27 -05:00
Doug Moore
05963ea4d1 radix_trie: eliminate iteration in keydiff
Use flsll(), instead of a loop, to find where two keys differ, and
then arithmetic to transform that to a trie level.
Approved by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D40585
2023-06-20 11:30:29 -05:00
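
A hedged model of that arithmetic (not the committed code): XOR the keys,
locate the highest differing bit with flsll(), and round down to the
per-level width.

    #include <stdint.h>
    #include <strings.h>    /* flsll() on FreeBSD */

    #define TRIE_WIDTH      4       /* bits consumed per trie level */

    /*
     * Illustrative model: the level at which two distinct keys first
     * differ, expressed as a bit position rounded down to a multiple of
     * the per-level width.  The caller guarantees k1 != k2, so flsll()
     * never sees zero.
     */
    static int
    keydiff_level(uint64_t k1, uint64_t k2)
    {
            return (((flsll(k1 ^ k2) - 1) / TRIE_WIDTH) * TRIE_WIDTH);
    }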
Alan Cox
58d4271721 vm_phys: Fix typo in 9e81742892
2023-06-16 03:12:42 -05:00
Doug Moore
9e81742892 vm_phys: add binary segment search
Replace several sequential searches for a segment that contains a
physical address with a call to a function that does it by binary
search.  In vm_page_reclaim_contig_domain_ext, find the first segment
to reclaim from, and reclaim from each subsequent appropriate segment.
Eliminate vm_phys_scan_contig.

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D40058
2023-06-16 01:43:45 -05:00
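
A hedged user-space model of the binary search (not the committed code; the
segment array is assumed sorted by start address):

    #include <stdint.h>

    struct seg {
            uint64_t        start;
            uint64_t        end;    /* exclusive */
    };

    /*
     * Illustrative model: return the index of the segment containing the
     * physical address 'pa', or -1 if no segment contains it.
     */
    static int
    seg_index(const struct seg *segs, int nsegs, uint64_t pa)
    {
            int lo = 0, hi = nsegs - 1;

            while (lo <= hi) {
                    int mid = lo + (hi - lo) / 2;

                    if (pa < segs[mid].start)
                            hi = mid - 1;
                    else if (pa >= segs[mid].end)
                            lo = mid + 1;
                    else
                            return (mid);
            }
            return (-1);
    }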
Mark Johnston
6062d9faf2 vm_phys: Change the return type of vm_phys_unfree_page() to bool
This is in keeping with the trend of removing uses of boolean_t, and the
sole caller was implicitly converting it to a "bool".

No functional change intended.

Reviewed by:	dougm, alc, imp, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40401
2023-06-05 12:22:11 -04:00
Colin Percival
45cc8519f5 tslog: Annotate parts of SYSINIT cpu
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
  * 535 us in kmem_malloc
    * 450 us in pmap_zero_page
  * 1720 us in pmap_growkernel
    * 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
  * 430 us in cpu_setregs calling load_cr0

Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40327
2023-06-04 10:16:35 -07:00