This matches the return type of pmap_mapdev/bios.
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.
Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
When attempting to promote 4KB user-space mappings to a 2MB user-space
mapping, the address of the struct vm_page representing the page table
page that contains the 4KB mappings is already known to the caller.
Pass that address to the promotion function rather than making the
promotion function recompute it, which on arm64 entails iteration over
the vm_phys_segs array by PHYS_TO_VM_PAGE(). And, while I'm here,
eliminate unnecessary arithmetic from the calculation of the first PTE's
address on arm64.
MFC after: 1 week
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG
Failure to map RSDP, XSLT and checksum failures are events that can't
happen unless something has gone wrong. As such, they should be reported
always, and not in bootverbose. This has been this way since it was
originally brought in to parse APIC tables.
Sponsored by: Netflix
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D36406
Add missing pmap_unmapbios() calls for when we return 0. Otherwise we
can leave the table mapped when it is of no use.
Sponsored by: Netflix
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D36405
for the "trap with interrupts disabled" warning.
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36302
This applies one of the changes from
5567d6b4419b02a2099527228b1a51cc55a5b47d to other architectures
besides arm64.
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36263
Reject attempts to map host physical address ranges that are not
subsets of a passthrough device's BAR into a guest.
Reviewed by: markj, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36238
It does not work with ULE, which is the default scheduler for over a
decade.
Reviewed by: emaste, kib
Differential Revision: https://reviews.freebsd.org/D36094
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Otherwise we are using whatever the value was left from the previous
thread run on kernel entry from usermode. Typically it would be the
desired value as is, but it is not guaranteed.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
On physical systems the ram isn't initialized on boot. So, coreboot uses
the cache as ram in this boot phase. When exiting cache as ram, coreboot
calls INVD for making the cache consistent.
In a virtual environment ram is always initialized and the cache is
always consistent. So, we can safely ignore this call.
Reviewed by: jhb, imp
Differential Revision: https://reviews.freebsd.org/D35620
Sponsored by: Beckhoff Automation GmbH & Co. KG
A replacement QAT driver will be imported, but this replacement does not
support Atom C2xxx hardware. So, the existing driver will be kept
around to provide opencrypto offload support for those chipsets.
Reviewed by: pauamma, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35817
With clang 15, the following -Werror warning is produced:
sys/amd64/amd64/pmap.c:8274:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
int allfree, field, freed, i, idx;
^
The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.
MFC after: 3 days
This is not completely exhaustive, but covers a large majority of
commands in the tree.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35393
Use a getter macro instead of fetching the sigcode address directly
from a sysent of a given process. It assumes that the sigcode is stored
in the shared page, which is true in all cases, except for a.out
binaries. This will be later useful when the shared page address
randomization is introduced.
No functional change intended.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35392
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap. (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)
Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses. As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35794
This is in preparation for removal of OBJT_DEFAULT. In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.
Reviewed by: alc, hselasky, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35779
While uart could be detected completely through plug and play means, add
it here for two reasons. First, we don't do that from the loader, so
it's not available as a console. Second, even if we did do it from the
loader, there's a limitation in the system today that console drivers
must be compiled into the kernel because the console is selected before
external modules are linked into the kernel. Adding it only increases
the kernel size by ~14k as well.
Sponsored by: Netflix
Idea liked by: des, rpokala, brooks, jhb
This patch fixes the AMD implementation for snapshotting. It removes
unnecessary vmcb fields that should not be saved and duplicates.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33431
Some tools (firecraker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.
Output from readelf -Wl for and amd64 kernel after the change:
Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R 0x1
[Requesting program interpreter: /red/herring]
LOAD 0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R 0x200000
LOAD 0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
LOAD 0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R 0x200000
LOAD 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x200000
LOAD 0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW 0x200000
DYNAMIC 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x8
GNU_RELRO 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R 0x4
Section to Segment mapping:
Segment Sections...
[...]
10 .note.gnu.build-id .note.Xen
Reported by: cperciva
Fixes: 1a9cdd373a6a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24fa ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.
This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.
Reviewed by: mjg
MFC after: 2 weeks
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map. In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().
This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory. In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader. But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.
It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time(). This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.
Reported by: mhorne
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35449
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde(). Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd110e04fc19729b4e9fb1c8bfbb8398a3.
Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.
Reported by: syzkaller
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35466
Install the i386 md_var.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a mostly kernel-only header required by procstat's ZFS support.
It is pulled in by the i386 machine/counter.h.
Reviewed by: jhb, imp
Install the i386 counter.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header required by procstat's ZFS support.
Reviewed by: jhb, imp
Install the i386 pcpu_aux.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header that is required by procstat's ZFS support.
Reviewed by: jhb, imp
Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header and should not be required, but
procstat's zfs support includes this with _KERNEL defined.
Reviewed by: jhb, imp
The contents of the amd64 version are kernel-only and incompatible with
other headers when compiled for i386 userspace with _KERNEL defined.
Just ifdef the whole file out in that case rather than giving this file
the full x86 treatment since it's not needed for current use cases
(procstat zfs support).
Reviewed by: jhb, imp