Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.
Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
This register set contains the values of the fsbase and gsbase
registers. Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps. In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.
The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34650
This permits I/O devices on the host to directly access wired memory
dedicated to guests using passthru devices. Note that wired memory
belonging to guests that do not use passthru devices has always been
accessible by I/O devices on the host.
bhyve maps guest physical addresses into the user address space of
the bhyve process by mmap'ing /dev/vmm/<vmname>. Device models pass
pointers derived from this mapping directly to system calls such as
preadv() to minimize copies when emulating DMA. If the backing store
for a device model is a raw host device (e.g. when exporting a raw disk
device such as /dev/ada<n> as a drive in the guest), the host device
driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host
directly to the guest's memory. However, if the guest's memory is
not present in the host IOMMU domain, these DMA requests by the host
device will fail without raising an error visible to the host device
driver or to the guest resulting in non-working I/O in the guest.
It is unclear why guest addresses were removed from the IOMMU host domain
initially, especially only for VM's with a passthru device as the
host IOMMU domain does not affect the permissions of passthru devices,
only devices on the host.
A considered alternative was using bounce buffers instead (D34535
is a proof of concept), but that adds additional overhead for unclear
benefit.
This solves a long-standing problem when using passthru devices and
physical disks in the same VM.
Thanks to: grehan (patience and help)
Thanks to: jhb (for improving the commit message)
PR: 260178
Reviewed by: grehan, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34607
Introduce a helper to fetch the TSC frequency from CPUID when running
under Xen.
Since the TSC can also be initialized early when running as a Xen
guest pull out the call to tsc_init() from the
early_clock_source_init() handlers and place it in clock_init(), as
otherwise all handlers would call tsc_init() anyway.
Reviewed by: markj
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34581
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).
It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.
Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
We assume EFI_PAGE_SIZE is the same as PAGE_SIZE, however this may not
be the case. Use the former when working with a list of pages from the
UEFI firmware so the correct size is used.
This will be needed on arm64 where PAGE_SIZE could be 16k or 64k in the
future. The other architectures have been updated to be consistent.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34510
As in commit c3d830cf7c, we should finalize CPU identification before
probing the TSC frequency.
Fixes: 84369dd523 ("x86: Probe the TSC frequency earlier")
Reported by: khng
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.
PR: 262155
Reviewed by: emaste
Tested by: emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34367
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34212
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27463
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.
Switch to use the headers in sys/contrib/xen.
Reviewed by: royger
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: jhb, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Atomics have significant other use besides providing in-system
primitives for safe memory updates. They are used for implementing
communication with out of system software or hardware following some
protocols.
For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor. Or
real hardware might need atomic accesses as part of the proper
management protocol.
Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.
Requested by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: Elliott Mitchell, imp, jhb, markj, royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Eliminate shlq $3,address shift after masking of the va is done, which
is needed to convert pt_entry_t[] array index into byte offset.
Do it by preshifting the mask, and compensating the right shift of va.
Suggested by: alc
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33786
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
ASLR stack randomization will reappear in a forthcoming commit. Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.
No functional change intended, as the stack gap implementation is
currently disabled by default.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro. With stack address randomization the
ps_strings address is no longer fixed.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Some guests or driver might depend on MTRR to work properly. E.g. the
nvidia gpu driver won't work without MTRR.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33333
- Use ia32_freebsd4_* instead of ia32_*4.
- Use ia32_o* instead of ia32_*3.
Reviewed by: brooks, imp, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33882
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts.
Win10 does not use LAPIC timer to often and in most cases, and I see it
is disabled by writing 0 to Initial Count Register (for Timer).
During resume, restart timer only for enabled LAPIC and enabled timer
for that LAPIC.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33448
This removes one or two atomic update of the pte on access. Also it
makes the pte constant during its lifecycle.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33778
Pre-calculate masks and page table/page directory bases, for LA48 and
LA57 cases, trading a conditional for the memory accesses.
This shaves 672 bytes of .text, by the cost of 32 .data bytes. Note
that eliminated code contained one conditional, and several costly
movabs instructions, which are traded by two memory fetches per
vtop{t,d}e() inlines.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33776
Simplify control flow around handling of the execpath length and signal
trampoline. Cache the sysentvec pointer in a local variable.
No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33703
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE. And,
in some cases, when they currently return FALSE, they should have
returned TRUE. Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless. That is why it has gone unnoticed.
Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33659
Notably, the current compat_options only makes sense for native and
freebsd32 ABIs. For the others, it just adds cruft. Switch to having
sets of compat options, and default to the native set. Setup the other
ABIs where it doesn't make sense to opt-out of the native set.
This removes some redundant COMPAT_FREEBSD* stuff from Linuxolator bits.
line_expr in makesyscalls.lua is fixed to allow empty strings to be
specified, since they're harmless.
Reviewed by: brooks, kib (both earlier version)
Differential Revision: https://reviews.freebsd.org/D33356
Suggested by: Michael Pratt <mpratt@google.com>
Reviewed by: Michael Pratt <mpratt@google.com>, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D33423