For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.
Arm64 uses a somewhat naive iteration over CPUs and matches the current
vmspace's pmap with the argument. It is not guaranteed that vmspace->pmap
is the same as the active pmap, but the inaccuracy should be tolerable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32360
Also, rename min_addr to default_addr, which better reflects what it
represents. The min_addr is not a minimum address in the same way that
max_addr is actually a maximum address that can be allocated. For
example, a non-zero hint can be less than min_addr and be allocated.
Reported by: dchagin
Reviewed by: dchagin, kib, markj
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41397
From the Linux man page for mprotect(2):
PROT_GROWSDOWN
Apply the protection mode down to the beginning of a mapping
that grows downward (which should be a stack segment or a
segment mapped with the MAP_GROWSDOWN flag set).
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
mprotect(2) on the stack region needs to adjust the protection stored in
the guard, so that e.g. enabling execution on the stack works properly on
stack growth.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Restructure the first phase slightly, to facilitate further changes.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Do not assume that protection is same as max_protection. Store both in
offset, packed in the same way as the prot syscall parameter.
Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
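A minimal sketch of the packing idea described above, keeping protection
and max_protection side by side in one integer. The names, the 3-bit mask,
and the shift of 16 are assumptions made for illustration; they are not
taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define SK_PROT_MASK	0x7u	/* read | write | execute bits */
#define SK_MAX_SHIFT	16	/* assumed position of the max_protection field */

/* Pack protection and max_protection into a single integer value. */
static inline uint64_t
sk_prot_pack(unsigned prot, unsigned max_prot)
{
	return ((uint64_t)(prot & SK_PROT_MASK) |
	    ((uint64_t)(max_prot & SK_PROT_MASK) << SK_MAX_SHIFT));
}

/* Recover the current protection from the packed value. */
static inline unsigned
sk_prot_extract(uint64_t packed)
{
	return ((unsigned)(packed & SK_PROT_MASK));
}

/* Recover the maximum protection from the packed value. */
static inline unsigned
sk_max_prot_extract(uint64_t packed)
{
	return ((unsigned)((packed >> SK_MAX_SHIFT) & SK_PROT_MASK));
}
```

With this layout the two values round-trip independently, so code that
grows the stack does not have to assume they are equal.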
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.
The comment update for vm_map_try_merge_entries() was suggested by dougm.
Suggested by: alc
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Only a part of the object may be mapped.
Noted by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
There is no list connecting all entries any more, and correspondingly no
order on the list entries.
Reviewed by: dougm
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D41405
Rewrite the final loop in vm_phys_enqueue_contig as a new function,
vm_phys_enq_beg, to reduce amd64 code size.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41289
Do not assume that, when vm_phys_enq_range is passed npages == 0, the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.
Reported by: karels
Differential Revision: https://reviews.freebsd.org/D41317
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.
The amd64 object code shrinks by 128 bytes.
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D41154
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.
The amd64 object code shrinks by 80 bytes.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D41154
The resulting code is a bit more concise. No functional change
intended.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41249
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.
This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.
Reviewed by: alc
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41235
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.
For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the callee, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.
This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.
Reviewed by: kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41226
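The effect of storing the level already multiplied by WIDTH can be
sketched as follows. This is an illustration under assumptions, not the
vm_radix code: WIDTH is taken to be 4, keys are 64 bits wide, and every
sk_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_WIDTH	4			/* assumed key bits per level */
#define SK_COUNT	(1 << SK_WIDTH)		/* children per node */
#define SK_MASK		(SK_COUNT - 1)

/* Slot lookup when the node stores a plain level number. */
static inline int
sk_slot_plain(uint64_t index, unsigned level)
{
	return ((int)((index >> (level * SK_WIDTH)) & SK_MASK));
}

/* Slot lookup when the node stores level * WIDTH: no multiplication. */
static inline int
sk_slot_scaled(uint64_t index, unsigned shift)
{
	return ((int)((index >> shift) & SK_MASK));
}

/*
 * Clear the key bits at and below the given (scaled) level.  Shifting
 * COUNT rather than 1 accounts for the "plus one level" without adding
 * WIDTH to the shift amount, so the shift never reaches 64.
 */
static inline uint64_t
sk_trimkey_scaled(uint64_t index, unsigned shift)
{
	return (index & -((uint64_t)SK_COUNT << shift));
}
```

At the top level of this sketch (shift 60) the shifted COUNT wraps to
zero in unsigned arithmetic, so the mask becomes zero and the whole key
is trimmed, with no special case needed.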
By replacing NULL (non-leaf) pointers with NULL leaves, a NULL test is
removed from every iteration of an index-based search loop.
This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.
Reviewed by: alc, kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41171
Fix the handling of address hints that are less than min_addr by
vm_map_find_min().
Reported by: dchagin
Reviewed by: kib
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41159
If mprotect(2) changed the protection at the bottom of the currently
grown stack region, the changed protection would be used for stack
growth on the next fault. This is arguably unexpected.
Store the original protection for the entry at mmap(2) time in the
offset member of the gap vm_map_entry, and use it for protection of the
grown stack region.
PR: 272585
Reported by: John F. Carr <jfc@mit.edu>
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41089
Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.
The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced 20% in a buildworld test.
Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40936
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971
Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40807
Replace the 'count' field in a trie node with a bitmap that
identifies non-NULL children. Drop the 'last' field, and use the
last bit set in the bitmap instead. In lookup_le, lookup_ge,
remove, and reclaim_all, use the bitmap to find the
previous/next/only/every non-null child in constant time by
examining the bitmask instead of looping across array elements
and null-checking them one-by-one.
A buildworld test suggests that this reduces the cycle count on
those functions that eliminate some null-checks by 4.9%, 1.5%,
0.0% and 13.3%.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40775
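The bitmap idea above can be sketched in a few lines. This is only an
illustration: it assumes 16 children per node, relies on the GCC/Clang
builtins __builtin_ctz and __builtin_clz, and uses hypothetical sk_* names
rather than the ones in vm_radix.c.

```c
#include <assert.h>
#include <stdint.h>

#define SK_COUNT	16	/* assumed number of children per node */

/* Lowest populated slot at or above 'slot', or -1 if there is none. */
static inline int
sk_next_child(uint16_t popmap, int slot)
{
	unsigned mask;

	if (slot >= SK_COUNT)
		return (-1);
	if (slot < 0)
		slot = 0;
	/* Keep only the bits for slots >= slot. */
	mask = (unsigned)popmap & ~((1u << slot) - 1);
	return (mask == 0 ? -1 : __builtin_ctz(mask));
}

/* Highest populated slot at or below 'slot', or -1 if there is none. */
static inline int
sk_prev_child(uint16_t popmap, int slot)
{
	unsigned mask;

	if (slot < 0)
		return (-1);
	if (slot >= SK_COUNT)
		slot = SK_COUNT - 1;
	/* Keep only the bits for slots <= slot. */
	mask = (unsigned)popmap & ((2u << slot) - 1);
	return (mask == 0 ? -1 : 31 - __builtin_clz(mask));
}

/* True when exactly one child is populated. */
static inline int
sk_only_child(uint16_t popmap)
{
	return (popmap != 0 && (popmap & (popmap - 1)) == 0);
}
```

Each query is a mask plus one bit-scan instruction, instead of a loop
that null-checks the child array one element at a time.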
This way a possible clash between FAULT_* and KERN_* numbering is
avoided, and panic checks for fault_status confusion become more
efficient.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40771
Let node_get calculate its own owner value. Don't pass the count
parameter, since it's always 2. Save 16 bytes in insert(). Move,
without modifying, slot and trimkey to handle use-before-declaration
problem.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D40723
This is purely a cosmetic change. vm_radix.c has lines that reach past
column 80 and this change cleans that up. The associated changes to
subr_pctrie.c are just to keep mirroring vm_radix.c.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D40764
In _lookup_ge, where a loop "looks for an available edge or val within
the current bisection node" (to quote the code comment), the value of
index has already been modified to guarantee that it is the least
value that can be found in the non-NULL child node being
examined. Therefore, if the non-NULL child is a leaf, there's no need
to compare 'index' to anything, and the value can just be returned.
The same is true for _lookup_le with 'most' replacing 'least'.
Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40746
By default, our ASLR implementation is supposed to cluster anonymous
memory allocations, unless the application's mmap(..., MAP_ANON, ...)
call included a non-zero address hint. Unfortunately, clustering
never occurred because kern_mmap() always replaced the given address
hint when it was zero. So, the ASLR implementation always believed
that a non-zero hint had been provided and randomized the mapping's
location in the address space. To fix this problem, I'm pushing down
the point at which we convert a hint of zero to the minimum allocatable
address from kern_mmap() to vm_map_find_min().
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40743
Replacing a branch and two shifts with a single masking operation saves
64 bytes in the pair of functions lookup_le and lookup_ge on amd64.
Refresh the associated comments.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40722
In the vm_radix:remove loop that searches for the last child, load
that child once, without loading it again after the search is over.
Change KASSERTS from index check to NULL node check.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40721
Replace boolean_t with bool in vm_radix.c. Drop the function
vm_radix_is_singleton, which is unused and has no corresponding
function in subr_pctrie.c.
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40586
Use flsll(), instead of a loop, to find where two keys differ, and
then arithmetic to transform that to a trie level.
Approved by: alc, markj
Differential Revision: https://reviews.freebsd.org/D40585
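A sketch of the loop-free computation described above: XOR the two keys,
find the highest differing bit, and divide by the per-level width. It
assumes a width of 4 bits per level and 64-bit keys, and it substitutes a
small helper built on the GCC/Clang builtin __builtin_clzll for flsll(),
which is not available on every libc. All sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_WIDTH	4	/* assumed key bits consumed per trie level */

/* Stand-in for flsll(): 1-based index of the highest set bit, 0 if none. */
static inline int
sk_fls64(uint64_t x)
{
	return (x == 0 ? 0 : 64 - __builtin_clzll(x));
}

/*
 * Level at which two distinct keys first differ, where level 0 covers
 * the least significant SK_WIDTH bits.
 */
static inline int
sk_keydiff_level(uint64_t a, uint64_t b)
{
	assert(a != b);
	return ((sk_fls64(a ^ b) - 1) / SK_WIDTH);
}
```

For example, 0x1234 and 0x1334 differ first in bit 8, which falls in
level 2 under these assumptions.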
Replace several sequential searches for a segment that contains a
physical address with a call to a function that does it by binary
search. In vm_page_reclaim_contig_domain_ext, find the first segment
to reclaim from, and reclaim from each subsequent appropriate segment.
Eliminate vm_phys_scan_contig.
Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D40058
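The binary search for the segment containing a physical address might
look like the sketch below. It assumes the segments are sorted by start
address, do not overlap, and cover half-open ranges [start, end). The
struct and function names are hypothetical and do not mirror vm_phys.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor covering [start, end). */
struct sk_seg {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the index of the segment containing 'pa', or -1 if the address
 * falls in a hole or outside every segment.
 */
static inline int
sk_seg_find(const struct sk_seg *segs, int nsegs, uint64_t pa)
{
	int lo, hi, mid;

	lo = 0;
	hi = nsegs;
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (pa < segs[mid].start)
			hi = mid;		/* look in the lower half */
		else if (pa >= segs[mid].end)
			lo = mid + 1;		/* look in the upper half */
		else
			return (mid);		/* start <= pa < end */
	}
	return (-1);
}
```

A caller that needs "the first segment at or after this address" rather
than an exact hit could use the final value of lo instead of returning -1.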
This is in keeping with the trend of removing uses of boolean_t, and the
sole caller was implicitly converting it to a "bool".
No functional change intended.
Reviewed by: dougm, alc, imp, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40401
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
* 535 us in kmem_malloc
* 450 us in pmap_zero_page
* 1720 us in pmap_growkernel
* 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
* 430 us in cpu_setregs calling load_cr0
Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40327