Commit Graph

9021 Commits

Author SHA1 Message Date
Dmitry Chagin
80d8a4a003 linux(4): Make struct stat64 to match Linux actual one 2023-04-28 11:55:04 +03:00
Dmitry Chagin
cd0fca82bb linux(4): Regen for mknod syscall changes 2023-04-28 11:55:04 +03:00
Dmitry Chagin
ca3333dd4a linux(4): Use Linux dev_t type for mknod syscalls dev argument
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Prior the 2.6 kernel dev_t type was an unsigned short.
However, since the firs commit of the Linuxulator, mknod syscall get int dev
argument.
Also, there is some confusion here, while the kernel declares a dev_t type
as a 32-bit sized, the user-space dev_t type can be size of 64 bits, e.g.,
in the Glibc library.
To avoid confusion and to help porting of the Linuxulator to other platforms
use explicit l_dev_t for dev argument of mknod syscalls.
2023-04-28 11:55:02 +03:00
Dmitry Chagin
19973638be linux(4): Move dev_t type declaration under /compat/linux
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Move it into the MI linux.h under /compat/linux.
2023-04-28 11:55:02 +03:00
Dmitry Chagin
e0bfe0d62c linux(4): Make struct newstat to match actual Linux one
In the struct stat the st_dev, st_rdev are unsigned long.
2023-04-28 11:55:01 +03:00
Dmitry Chagin
023e688496 linux(4): Regen for struct l_old_stat changes 2023-04-28 11:55:01 +03:00
Dmitry Chagin
2370c7321f linux(4): Update syscalls.master to reflect struct l_old_stat 2023-04-28 11:54:59 +03:00
Dmitry Chagin
391fd1e1a1 linux(4): Mark old fstat syscal as unimplemented
It looks like the old fstat system call never been implemented.
2023-04-28 11:54:59 +03:00
Dmitry Chagin
a408fc097f linux(4): Rename obsolete old struct l_stat to struct l_old_stat 2023-04-28 11:54:59 +03:00
Mark Johnston
74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
This avoids bloating the BSS when MAXCPU is large.

No functional change intended.

PR:		269572
Reviewed by:	corvink, rew
Tested by:	rew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39805
2023-04-26 10:08:42 -04:00
Vitaliy Gusev
0912408a28
vmm: fix HLT loop while vcpu has requested virtual interrupts
This fixes the detection of pending interrupts when pirval is 0 and the
pending bit is set

More information how this situation occurs, can be found here:
c5b5f2d808/sys/amd64/vmm/intel/vmx.c (L4016-L4031)

Reviewed by:		corvink, markj
Fixes:			02cc877968 ("Recognize a pending virtual interrupt while emulating the halt instruction.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39620
2023-04-26 10:38:46 +02:00
Mateusz Guzik
95e4f5ef7c x86: whack pmspcv from GENERIC
The driver is enormous and rarely used.

      text      data       bss        dec         hex   filename
  23076646   1870505   4415872   29363023   0x1c00b4f   kernel.before
  20017433   1870305   4416000   26303738   0x1915cfa   kernel.after

People using the driver will need to add pmspcv_load="YES" to
their loader.conf.

Reviewed by:	jhb
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39816
2023-04-25 18:09:44 +00:00
Mark Johnston
47cf1b37f4 vmm: Expose some more AVX512 CPUID bits to guests
This is required to announce support for some accelerated AES
operations.  AVX512BW indicates support for the AVX512-FP16 extension
and AVX512VL indicates support for the use of AVX512 instructions with
vector lengths smaller than 512 bits.

VAES and VPCLMULQDQ extensions indicate that VEX-prefixed AES-NI and
pclmulqdq instructions are supported.

All of these bits are needed for OpenSSL to use VAES to accelerate
AES-GCM transforms.

Reviewed by:	corvink, kib, jhb
MFC after:	2 weeks
Sponsored by:	Stormshield
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D39781
2023-04-25 13:35:14 -04:00
Dmitry Chagin
56c5230afd linux(4): Fix LINUX_AT_COUNT comments
Differential Revision:	https://reviews.freebsd.org/D39645
MFC after:		1 month
2023-04-22 22:16:43 +03:00
Dmitry Chagin
7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Warner Losh
559b94a122 syscall.master: Fix comments
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Dmitry Chagin
de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin
d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Konstantin Belousov
617a11eab6 x86: initialize use_xsave once
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global and that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to an "privileged instruction fault".

The fix is to protect "use_xsave" from being initialized more than once.

Reported and reviewed by:	stevek
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39660
2023-04-19 02:22:28 +03:00
Konstantin Belousov
1e0e335b0f amd64: fix PKRU and swapout interaction
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
2023-04-15 02:53:59 +03:00
Julien Grall
ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell
af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell
ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell
b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Henri Hennebert
71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Dmitry Chagin
50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin
1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Konstantin Belousov
cd137909c3 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-03-29 21:45:20 +03:00
Elliott Mitchell
9f3be3a6ec xen: switch to using core atomics for synchronization
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen.  A single core VM isn't too
unusual, but actual single core hardware is uncommon.

Replace an open-coding of evtchn_clear_port() with the inline.

Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:42 +02:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin
7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
Konstantin Belousov
2b4b3789f8 acpi_wakeup.c: apply the reviewer' editorial corrections to the comment text.
Fixes:	02904a06c7
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:47:19 +02:00
Konstantin Belousov
02904a06c7 amd64: properly recalculate mitigations knobs after resume
Revision r333125 AKA 986c4ca387 forced clear cpu_stdext_feature3
on suspend, since at that time microcode update was not reloaded
early on resume. Then, revision 050f5a8405 started re-reading
cpu_stdext_feature3 again. Since modern CPUs do not require mitigations
from the Skylake era, this went unnoticed for some time.

Keep zeroing cpu_stdext_feature3 on suspend, but re-read it in more
controlled way on resume after microcode is reloaded, and recalculate
active workarounds based on actual microcode capabilities.

Reported and tested by:	romain
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:40:05 +02:00
Konstantin Belousov
ff6d60946a amd64 acpi_wakeup.c: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-17 15:10:34 +02:00
Vitaliy Gusev
94a3876d7e
vmm: fix missing ipi statistic
ipi counters are missing in bhyvectl's output because vm_maxcpu is 0
when initializing them. That's because vmm_stat_register is executed
before vmm_init.

Instead of directly fixing it, there's a better solution in illumos
which is cherry picked:
65a3bc8373

It replaces the matrix statistic by two counters per vcpu. One for
counting the ipis to the vcpu and one counting the ipis received by the
vcpu. This has several advantages:

- A matrix statistic becomes huge when using many vcpus.
- A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit.
- Two counters are enough in most cases. DTrace can be used for more
  advanced debugging purposes.
- A matrix statistic wastes memory. The matrix size is determined by
  vm_maxcpu regardless of the number of vcpus assigned to the vm.

Reviewed by:		corvink, markj
Fixes:			ee98f99d7a ("vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39038
2023-03-17 13:50:08 +01:00
Dmitry Chagin
9e7f03e9c6 linux(4): Drop unncessary struct l_ifconf declaration from amd64/linux
Its needed only for amd64/linux32 Linuxulator.

Differential Revision:	https://reviews.freebsd.org/D38793
2023-03-04 12:11:38 +03:00
Dmitry Chagin
cabbfb60d0 linux(4): Reduce code duplication between MD files
Move struct ifnet definitions under compat/linux.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38791
2023-03-04 12:11:38 +03:00
John-Mark Gurney
2fee875629
abstract out the vm detection via smbios..
This makes the detection of VMs common between platforms that
have SMBios.

Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D38800
2023-03-02 16:54:21 -08:00
Vitaliy Gusev
8104fc31a2
bhyve: fix restore of kernel structs
vmx_snapshot() and svm_snapshot() do not save any data and error occurs at
resume:

Restoring kernel structs...
vm_restore_kern_struct: Kernel struct size was 0 for: vmx
Failed to restore kernel structs.

Reviewed by:		corvink, markj
Fixes:			39ec056e6d ("vmm: Rework snapshotting of CPU-specific per-vCPU data.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38476
2023-02-28 13:37:53 +01:00
Vitaliy Gusev
281b496f22
vmm: fix restore of TSC offset
After suspend/resume Ubuntu 20.04 and 22.04 installer can hang if
tsc-early clocksource has a big skew.

Reviewed by:		corvink, jhb
Fixes:			a7db532e3a ("vmm: Simplify saving of absolute TSC values in snapshots.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38474
2023-02-28 13:37:44 +01:00
Mike Karels
dd6f6030cc amd64 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces.  Make them consistent.
2023-02-24 08:36:28 -06:00
Mateusz Guzik
6b9acd1bfb Exclude MMCCAM kernels from make universe
They don't provide any value and are quite arbitrary.

Note arm64 GENERIC-MMCCAM was already excluded, just not the NODEBUG
variant.

The option is already build-tested with arm64 LINT kernel.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D38458
2023-02-16 07:29:53 +00:00
Dmitry Chagin
c8a79231a5 linux(4): Rename linux_timer.h to linux_time.h
To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains implementation of all stuff declared
in linux_time.h.

MFC after:		2 weeks
2023-02-14 17:46:33 +03:00
Dmitry Chagin
2456a45929 linux(4): Cleanup includes under amd64/linux
Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after:		2 weeks
2023-02-14 17:46:32 +03:00
Dmitry Chagin
31e938c531 linux(4): Cleanup vm includes from linux_util.h
Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.

MFC after:		2 weeks
2023-02-14 17:46:30 +03:00
Dmitry Chagin
06c07e1203 Complete removal of opt_compat.h
Since Linux emulation layer build options was removed there is no reason
to keep opt_compat.h.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38548
MFC after:		2 weeks
2023-02-13 19:07:38 +03:00
Dmitry Chagin
10d16789a3 linux(4): Get rid of the opt_compat.h include.
Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after:		2 weeks
2023-02-12 20:24:32 +03:00
Mark Johnston
b265a2e0d7 vmm: Fix AP startup compatibility for old bhyve executables
These changes unbreak AP startup when using a 13.1-RELEASE bhyve
executable with a newer kernel:
- Correct the destination mask for the VM_EXITCODE_IPI message generated
  by an INIT or STARTUP IPI in vlapic_icrlo_write_handler().
- Only initialize vlapics on active vCPUs.  13.1-RELEASE bhyve activates
  AP vCPUs only after the BSP starts them with an IPI, and vmm now
  allocates vcpu structures lazily, so the STARTUP handling in
  vm_handle_ipi() could trigger a page fault.
- Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP
  message.

Fixes:	7c326ab5bb ("vmm: don't lock a mtx in the icr_low write handler")
Reviewed by:	jhb, corvink
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Mark Johnston
ba34de1b3b vmm: Remove an unneeded initialization of "retu"
vm_handle_ipi() unconditionally initializes "retu".  No functional
change intended.

Reviewed by:	jhb, corvink
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00