Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.
Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
This register set contains the values of the fsbase and gsbase
registers. Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps. In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.
The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34650
This permits I/O devices on the host to directly access wired memory
dedicated to guests using passthru devices. Note that wired memory
belonging to guests that do not use passthru devices has always been
accessible by I/O devices on the host.
bhyve maps guest physical addresses into the user address space of
the bhyve process by mmap'ing /dev/vmm/<vmname>. Device models pass
pointers derived from this mapping directly to system calls such as
preadv() to minimize copies when emulating DMA. If the backing store
for a device model is a raw host device (e.g. when exporting a raw disk
device such as /dev/ada<n> as a drive in the guest), the host device
driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host
directly to the guest's memory. However, if the guest's memory is
not present in the host IOMMU domain, these DMA requests by the host
device will fail without raising an error visible to the host device
driver or to the guest resulting in non-working I/O in the guest.
It is unclear why guest addresses were removed from the IOMMU host domain
initially, especially only for VM's with a passthru device as the
host IOMMU domain does not affect the permissions of passthru devices,
only devices on the host.
A considered alternative was using bounce buffers instead (D34535
is a proof of concept), but that adds additional overhead for unclear
benefit.
This solves a long-standing problem when using passthru devices and
physical disks in the same VM.
Thanks to: grehan (patience and help)
Thanks to: jhb (for improving the commit message)
PR: 260178
Reviewed by: grehan, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34607
Introduce a helper to fetch the TSC frequency from CPUID when running
under Xen.
Since the TSC can also be initialized early when running as a Xen
guest pull out the call to tsc_init() from the
early_clock_source_init() handlers and place it in clock_init(), as
otherwise all handlers would call tsc_init() anyway.
Reviewed by: markj
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34581
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).
It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.
Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
We assume EFI_PAGE_SIZE is the same as PAGE_SIZE, however this may not
be the case. Use the former when working with a list of pages from the
UEFI firmware so the correct size is used.
This will be needed on arm64 where PAGE_SIZE could be 16k or 64k in the
future. The other architectures have been updated to be consistent.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34510
As in commit c3d830cf7c71, we should finalize CPU identification before
probing the TSC frequency.
Fixes: 84369dd52369 ("x86: Probe the TSC frequency earlier")
Reported by: khng
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.
PR: 262155
Reviewed by: emaste
Tested by: emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34367
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34212
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27463
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.
Switch to use the headers in sys/contrib/xen.
Reviewed by: royger
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: jhb, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Atomics have significant other use besides providing in-system
primitives for safe memory updates. They are used for implementing
communication with out of system software or hardware following some
protocols.
For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor. Or
real hardware might need atomic accesses as part of the proper
management protocol.
Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.
Requested by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: Elliott Mitchell, imp, jhb, markj, royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Eliminate shlq $3,address shift after masking of the va is done, which
is needed to convert pt_entry_t[] array index into byte offset.
Do it by preshifting the mask, and compensating the right shift of va.
Suggested by: alc
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33786
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
ASLR stack randomization will reappear in a forthcoming commit. Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.
No functional change intended, as the stack gap implementation is
currently disabled by default.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro. With stack address randomization the
ps_strings address is no longer fixed.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Some guests or driver might depend on MTRR to work properly. E.g. the
nvidia gpu driver won't work without MTRR.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33333
- Use ia32_freebsd4_* instead of ia32_*4.
- Use ia32_o* instead of ia32_*3.
Reviewed by: brooks, imp, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33882
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104.
This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts.
Win10 does not use LAPIC timer to often and in most cases, and I see it
is disabled by writing 0 to Initial Count Register (for Timer).
During resume, restart timer only for enabled LAPIC and enabled timer
for that LAPIC.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33448
This removes one or two atomic update of the pte on access. Also it
makes the pte constant during its lifecycle.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33778
Pre-calculate masks and page table/page directory bases, for LA48 and
LA57 cases, trading a conditional for the memory accesses.
This shaves 672 bytes of .text, by the cost of 32 .data bytes. Note
that eliminated code contained one conditional, and several costly
movabs instructions, which are traded by two memory fetches per
vtop{t,d}e() inlines.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33776