It does not work with ULE, which is the default scheduler for over a
decade.
Reviewed by: emaste, kib
Differential Revision: https://reviews.freebsd.org/D36094
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Otherwise we are using whatever the value was left from the previous
thread run on kernel entry from usermode. Typically it would be the
desired value as is, but it is not guaranteed.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
On physical systems the ram isn't initialized on boot. So, coreboot uses
the cache as ram in this boot phase. When exiting cache as ram, coreboot
calls INVD for making the cache consistent.
In a virtual environment ram is always initialized and the cache is
always consistent. So, we can safely ignore this call.
Reviewed by: jhb, imp
Differential Revision: https://reviews.freebsd.org/D35620
Sponsored by: Beckhoff Automation GmbH & Co. KG
A replacement QAT driver will be imported, but this replacement does not
support Atom C2xxx hardware. So, the existing driver will be kept
around to provide opencrypto offload support for those chipsets.
Reviewed by: pauamma, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35817
With clang 15, the following -Werror warning is produced:
sys/amd64/amd64/pmap.c:8274:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
int allfree, field, freed, i, idx;
^
The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.
MFC after: 3 days
This is not completely exhaustive, but covers a large majority of
commands in the tree.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35393
Use a getter macro instead of fetching the sigcode address directly
from a sysent of a given process. It assumes that the sigcode is stored
in the shared page, which is true in all cases, except for a.out
binaries. This will be later useful when the shared page address
randomization is introduced.
No functional change intended.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35392
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap. (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)
Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses. As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35794
This is in preparation for removal of OBJT_DEFAULT. In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.
Reviewed by: alc, hselasky, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35779
While uart could be detected completely through plug and play means, add
it here for two reasons. First, we don't do that from the loader, so
it's not available as a console. Second, even if we did do it from the
loader, there's a limitation in the system today that console drivers
must be compiled into the kernel because the console is selected before
external modules are linked into the kernel. Adding it only increases
the kernel size by ~14k as well.
Sponsored by: Netflix
Idea liked by: des, rpokala, brooks, jhb
This patch fixes the AMD implementation for snapshotting. It removes
unnecessary vmcb fields that should not be saved and duplicates.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33431
Some tools (firecraker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.
Output from readelf -Wl for and amd64 kernel after the change:
Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R 0x1
[Requesting program interpreter: /red/herring]
LOAD 0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R 0x200000
LOAD 0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
LOAD 0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R 0x200000
LOAD 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x200000
LOAD 0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW 0x200000
DYNAMIC 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x8
GNU_RELRO 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R 0x4
Section to Segment mapping:
Segment Sections...
[...]
10 .note.gnu.build-id .note.Xen
Reported by: cperciva
Fixes: 1a9cdd373a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24 ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.
This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.
Reviewed by: mjg
MFC after: 2 weeks
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map. In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().
This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory. In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader. But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.
It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time(). This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.
Reported by: mhorne
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35449
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde(). Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd11.
Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.
Reported by: syzkaller
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35466
Install the i386 md_var.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a mostly kernel-only header required by procstat's ZFS support.
It is pulled in by the i386 machine/counter.h.
Reviewed by: jhb, imp
Install the i386 counter.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header required by procstat's ZFS support.
Reviewed by: jhb, imp
Install the i386 pcpu_aux.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header that is required by procstat's ZFS support.
Reviewed by: jhb, imp
Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.
This is a kernel-only header and should not be required, but
procstat's zfs support includes this with _KERNEL defined.
Reviewed by: jhb, imp
The contents of the amd64 version are kernel-only and incompatible with
other headers when compiled for i386 userspace with _KERNEL defined.
Just ifdef the whole file out in that case rather than giving this file
the full x86 treatment since it's not needed for current use cases
(procstat zfs support).
Reviewed by: jhb, imp
Statistic for "number of vm exits handled in userspace" should be
increased in vm_run() instead of vmx_run() because in some cases
vm_run() doesn't exit to userspace and keeps entering the guest.
Also svm_run's implementation even wrongly misses that stat.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35350
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.
Pointy hat to: dchagin
MFC after: 2 weeks
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.
MFC after: 2 weeks
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.
Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.
In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.
Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.
Reviewed by: jhb
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35253
To solve y2k38 problem in the recvmsg syscall the new SO_TIMESTAMP
constant were added on v5.1 Linux kernel. So, old 32-bit binaries
that knows only 32-bit time_t uses the old value of the constant,
and binaries that knows 64-bit time_t uses the new constant.
To determine what size of time_t type is expected by the user-space,
store requested value (SO_TIMESTAMP) in the process emuldata structure.
MFC after: 2 weeks
This is an initial commit for RDMA FreeBSD driver for Intel(R) Ethernet
Controller E810, called irdma. Supporting both RoCEv2 and iWARP
protocols in per-PF manner, RoCEv2 being the default.
Testing has been done using krping tool, perftest, ucmatose, rping,
ud_pingpong, rc_pingpong and others.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: #manpages (pauamma_gundo.com) [documentation]
MFC after: 1 week
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D34690