Commit Graph

9043 Commits

Author SHA1 Message Date
Konstantin Belousov
231d75568f Move INVLPG to pmap_quick_enter_page() from pmap_quick_remove_page().
If processor prefetches neighboring TLB entries to the one being accessed
(as some have been reported to do), then the spin lock does not prevent
the situation described in the "AMD64 Architecture Programmer's Manual
Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency
Considerations".

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:46 +02:00
Konstantin Belousov
cde70e312c amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG
A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems.  The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.

Reviewed by:	alc (previous version)
Discussed with:	emaste, markj
Tested by:	karels
PR:	261169, 266145
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Konstantin Belousov
45ac7755a7 amd64: identify small cores
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Andrew Gallatin
1cac76c93f vm: reduce lock contention when processing vm batchqueues
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for for our busy sendfile() based web workload.
It also greadly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.

So that the system does not loose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.

This has been run for several months on a busy Netflix server, as well
as on my personal desktop.

Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
2022-12-14 14:34:07 -05:00
Alan Cox
f0878da03b pmap: standardize promotion conditions between amd64 and arm64
On amd64, don't abort promotion due to a missing accessed bit in a
mapping before possibly write protecting that mapping.  Previously,
in some cases, we might not repromote after madvise(MADV_FREE) because
there was no write fault to trigger the repromotion.  Conversely, on
arm64, don't pointlessly, yet harmlessly, write protect physical pages
that aren't part of the physical superpage.

Don't count aborted promotions due to explicit promotion prohibition
(arm64) or hardware errata (amd64) as ordinary promotion failures.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36916
2022-12-12 11:32:50 -06:00
John Baldwin
af3b48e101 vmm: Free vCPUs when destroying them.
Reported by:	andrew
Reviewed by:	corvink, andrew, markj
Differential Revision:	https://reviews.freebsd.org/D37649
2022-12-09 10:27:05 -08:00
John Baldwin
d212d6ebb4 vmm: Avoid infinite loop in vcpu_lock_all error case.
Reported by:	Coverity (CIDs 1501060,1501071)
Reviewed by:	corvink, markj, emaste
Differential Revision:	https://reviews.freebsd.org/D37648
2022-12-09 10:26:49 -08:00
John Baldwin
91980db1be vmm: Don't lock a vCPU for VM_PPTDEV_MSI[X].
These are manipulating state in a ppt(4) device none of which is
vCPU-specific.  Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37639
2022-12-09 10:26:23 -08:00
John Baldwin
62be9ffd82 vmm: VM_GET/SET_KERNEMU_DEV should run with the vCPU locked.
Reviewed by:	corvink, kib, markj
Differential Revision:	https://reviews.freebsd.org/D37638
2022-12-09 10:25:30 -08:00
John Baldwin
1f6db5d6b5 vmm: Remove stale comment for vm_rendezvous.
Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523d.

Reported by:	andrew
2022-11-30 13:06:46 -08:00
Bjoern A. Zeeb
4a8e4d1546 net80211: fix IEEE80211_DEBUG_REFCNT builds
Remove the KPI/KBI changes from ieee80211_node.h and always use the
macros to pass in __func__ and __LINE__ to the functions.
The actual implementations are prefixed by "_" rather than suffixed
by "_debug" as they no longer are "debug"-specific.

Some of the select functions were not actually using the passed in
func, line options; however they are calling other functions which
use them.  Directly call the internal implementation in those cases
passing the arguments on.

Use a file-local __debrefcnt_used define to mark the arguments __unused
in cases when we compile without IEEE80211_DEBUG_REFCNT and hope the
toolchain is intelligent enough to not pass them at all in those cases.

Also _ieee80211_free_node() now has a conflict so make the previous
_ieee80211_free_node() the new __ieee80211_free_node().

Add IEEE80211_DEBUG_REFCNT to the NOTES file on amd64 to keep exercising
the option.

Sponsored by:	The FreeBSD Foundation
X-MFC:		never
Discussed on:	freebsd-wireless
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D37529
2022-11-29 21:20:37 +00:00
Corvin Köhne
7c326ab5bb
vmm: don't lock a mtx in the icr_low write handler
x2apic accesses are handled by a wrmsr exit. This handler is called in a
critical section. So, we can't lock a mtx in the icr_low handler.

Reported by:		kp, pho
Tested by:		kp, pho
Approved by:		manu (mentor)
Fixes:			c0f35dbf19 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
MFC after:		1 week
MFC with:		c0f35dbf19
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37452
2022-11-23 09:00:04 +01:00
Corvin Köhne
fde8ce8892
vmm: remove unneccessary rendezvous assertion
When a vcpu sees that a rendezvous is in progress, it exits and tries to
handle the rendezvous. The vcpu doesn't check if it's part of the
rendezvous or not. If the vcpu isn't part of the rendezvous, the
rendezvous could be done before it reaches the assertion. This will
cause a panic.

The assertion isn't needed at all because vm_handle_rendezvous properly
handles a spurious rendezvous. So, we can just remove it.

PR:			267779
Reviewed by:		jhb, markj
Tested by:		bz
Approved by:		manu (mentor)
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37417
2022-11-21 08:19:36 +01:00
Dmitry Chagin
2ee1a18d51 vmm: Fix build w/o KDTRACE_HOOKS.
Reviewed by:		imp
Differential revision:	https://reviews.freebsd.org/D37446
2022-11-20 18:00:55 +03:00
Cy Schubert
d487cba33d vmm: Fix non-INVARIANTS build
Reported by:	O. Hartmann <freebsd@walstatt-de.de>
Reviewed by:	jhb
Fixes:		58eefc67a1
Differential Revision:	https://reviews.freebsd.org/D37444
2022-11-18 13:20:13 -08:00
Mark Johnston
ca6b48f080 vmm: Restore the correct vm_inject_*() prototypes
Fixes:	80cb5d845b ("vmm: Pass vcpu instead of vm and vcpuid...")
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D37443
2022-11-18 14:11:48 -05:00
John Baldwin
49fd5115a9 vmm: Trim some pointless #ifdef KTR.
Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37272
2022-11-18 10:25:39 -08:00
John Baldwin
ee98f99d7a vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.
The default is now the number of physical CPUs in the system rather
than 16.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37175
2022-11-18 10:25:39 -08:00
John Baldwin
98568a005a vmm: Allocate vCPUs on first use of a vCPU.
Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use.  This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use.  A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use.  However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.

Some ioctls need to lock all vCPUs.  To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37174
2022-11-18 10:25:38 -08:00
John Baldwin
c0f35dbf19 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs.  INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.

In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37173
2022-11-18 10:25:38 -08:00
John Baldwin
223de44c93 vmm devmem_mmap_single: Bump object reference under memsegs lock.
Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37273
2022-11-18 10:25:38 -08:00
John Baldwin
67b69e76e8 vmm: Use an sx lock to protect the memory map.
Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU.  This is now
replaced by a new per-VM sx lock.  Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37172
2022-11-18 10:25:38 -08:00
John Baldwin
08ebb36076 vmm: Destroy mutexes.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37171
2022-11-18 10:25:38 -08:00
John Baldwin
d5118d0fc4 vmm stat: Add a special nelems constant for arrays sized by vCPU count.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37170
2022-11-18 10:25:38 -08:00
John Baldwin
58eefc67a1 vmm vmx: Allocate vpids on demand as each vCPU is initialized.
Compared to the previous version this does mean that if the system as
a whole runs out of dedicated vPIDs you might end up with some vCPUs
within a single VM using dedicated vPIDs and others using shared
vPIDs, but this should not break anything.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37169
2022-11-18 10:25:38 -08:00
John Baldwin
3f0f4b1598 vmm: Lookup vcpu pointers in vmmdev_ioctl.
Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c.  For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37168
2022-11-18 10:25:38 -08:00
John Baldwin
0cbc39d53d vmm ppt: Remove unused vcpu arg from MSI setup handlers.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37167
2022-11-18 10:25:37 -08:00
John Baldwin
e42c24d56b vmm: Remove unused vcpuid argument from vioapic_process_eoi.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37166
2022-11-18 10:25:37 -08:00
John Baldwin
d8be3d523d vmm: Use struct vcpu in the rendezvous code.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37165
2022-11-18 10:25:37 -08:00
John Baldwin
949f0f47a4 vmm: Remove support for vm_rendezvous with a cpuid of -1.
This is not currently used.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37164
2022-11-18 10:25:37 -08:00
John Baldwin
9388bc1e3a vmm: Remove vcpuid from I/O port handlers.
No I/O ports are vCPU-specific (unlike memory which does have
vCPU-specific ranges such as the local APIC).

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37163
2022-11-18 10:25:37 -08:00
John Baldwin
80cb5d845b vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37162
2022-11-18 10:25:37 -08:00
John Baldwin
d3956e4673 vmm: Use struct vcpu in the instruction emulation code.
This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code.  To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37161
2022-11-18 10:25:37 -08:00
John Baldwin
28b561ad9d vmm: Add vm_gpa_hold_global wrapper function.
This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally.  Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37160
2022-11-18 10:25:36 -08:00
John Baldwin
0f435e6476 vmm: Add _KERNEL guards for io headers shared with userspace.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37159
2022-11-18 10:25:36 -08:00
John Baldwin
2b4fe856f4 bhyve: Remove unused vm and vcpu arguments from vm_copy routines.
The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37158
2022-11-18 10:25:36 -08:00
John Baldwin
3dc3d32ad6 vmm: Use struct vcpu with the vmm_stat API.
The function callbacks still use struct vm and and vCPU index.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37157
2022-11-18 10:25:36 -08:00
John Baldwin
950af9ffc6 vmm: Expose struct vcpu as an opaque type.
Pass a pointer to the current struct vcpu to the vcpu_init callback
and save this pointer in the CPU-specific vcpu structures.

Add routines to fetch a struct vcpu by index from a VM and to query
the VM and vcpuid from a struct vcpu.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37156
2022-11-18 10:25:36 -08:00
John Baldwin
d030f941e6 vmm: Use VLAPIC_CTR* in more places.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37155
2022-11-18 10:25:36 -08:00
John Baldwin
57e0119ef3 vmm vmx: Add VMX_CTR* wrapper macros.
These macros are similar to VCPU_CTR* but accept a single vmx_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37154
2022-11-18 10:25:36 -08:00
John Baldwin
fca494dad0 vmm svm: Add SVM_CTR* wrapper macros.
These macros are similar to VCPU_CTR* but accept a single svm_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37153
2022-11-18 10:25:36 -08:00
John Baldwin
869c8d1946 vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.
This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure.  Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends.  In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37152
2022-11-18 10:25:35 -08:00
John Baldwin
1aa5150479 vmm: Refactor storage of CPU-dependent per-vCPU data.
Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.

That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer.  This cookie pointer is now passed to other vmmops
callbacks in place of the integer index.  The index is now only used
in KTR traces and when calling back into the CPU-independent layer.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37151
2022-11-18 10:25:35 -08:00
John Baldwin
73abae4493 vmm vmx: Add a global bool to indicate if the host has the TSC_AUX MSR.
A future commit will remove direct access to vCPU structures from
struct vmx, so add a dedicated boolean for this rather than checking
the capabilities for vCPU 0.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37269
2022-11-18 10:25:35 -08:00
John Baldwin
39ec056e6d vmm: Rework snapshotting of CPU-specific per-vCPU data.
Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vcmx_snapshot.  Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'.  Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37150
2022-11-18 10:25:35 -08:00
John Baldwin
19b9dd2e08 vmm svm: Mark all VMCB state caches dirty on vCPU restore.
Mark Johnston noticed that this was missing VMCB_CACHE_LBR.  Just set
all the bits as is done in svm_run() rather than trying to clear
individual bits.

Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37259
2022-11-18 10:25:35 -08:00
John Baldwin
0f00260c67 vmm vmx: Refactor per-vCPU data.
Add a struct vmx_vcpu to hold per-vCPU data specific to VT-x and
move parallel arrays out of struct vmx into a single array of
this structure.

While here, dynamically allocate the VMCS, APIC page and PIR
descriptors for each vCPU rather than embedding them.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37149
2022-11-18 10:25:35 -08:00
John Baldwin
215d2fd53f vmm svm: Refactor per-vCPU data.
- Allocate VMCBs separately to avoid excessive padding in struct
  svm_vcpu.

- Allocate APIC pages dynamically directly in struct vlapic.

- Move vm_mtrr into struct svm_vcpu rather than using a separate
  parallel array.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37148
2022-11-18 10:25:35 -08:00
John Baldwin
35abc6c238 vmm: Use vm_get_maxcpus() instead of VM_MAXCPU in various places.
Mostly these are loops that iterate over all possible vCPU IDs for a
specific virtual machine.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37147
2022-11-18 10:25:34 -08:00
John Baldwin
a7db532e3a vmm: Simplify saving of absolute TSC values in snapshots.
Read the current "now" TSC value and use it to compute absolute time
saved value in vm_snapshot_vcpus rather than iterating over vCPUs
multiple times in vm_snapshot_vm.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37146
2022-11-18 10:25:34 -08:00
Mateusz Guzik
c3f1a13902 Retire broken GPROF support from the kernel
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.

Interested parties can use pmcstat or dtrace instead.
2022-11-15 14:17:10 +00:00
Mark Johnston
8b1adff8bc bhyve: Drop volatile qualifiers from snapshot code
They accomplish nothing since the qualifier is casted away in calls to
memcpy() and copyin()/copyout().  No functional change intended.

MFC after:	2 weeks
Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D37292
2022-11-11 10:02:26 -05:00
Elliott Mitchell
ccd9b49f20 sys: use .S for assembly language files that use the preprocessor
Reviewed by:	imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/609
Differential Revision: https://reviews.freebsd.org/D35908
2022-11-02 10:29:00 -04:00
Konstantin Belousov
4d447b30f7 vmm: do not leak halted_cpus bit after suspension
Reported by:	bz
PR:	267468
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37227
2022-11-01 20:44:42 +02:00
Mitchell Horne
aba921bd9e ddb: print the actual syscall name
Some architectures will pretty-print a system call trap in the
backtrace. Rather than printing the symbol, use the syscallname()
function to pull the string from the sv_syscallnames array corresponding
to the process. This simplifies the function somewhat.

Mostly, this will result in dropping the "sys" prefix, e.g. "sys_exit"
will now be printed simply as "exit".

Make two minor tweaks to the function signature: use a u_int for the
syscall number since this is a more correct type (see the 'code' member
of struct syscall_args), and make the thread pointer the first argument.
The latter is more natural and conventional.

Suggested by:   jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37200
2022-10-28 18:21:08 -03:00
Mitchell Horne
1da65dcb1c linux: populate sv_syscallnames in each sysentvec
This allows the syscallname() function to give a usable result for Linux
ABIs.

Reported by:	jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37199
2022-10-28 18:21:08 -03:00
Jung-uk Kim
19ee8335c5 acpica: Merge ACPICA 20221020 2022-10-27 22:04:32 -04:00
John Baldwin
769b884e2e vmm: Fix AP startup with old userspace binaries.
Older binaries that do not request IPI exits to userspace do not
start user threads for other vCPUs until a STARTUP IPI triggers a
VM_EXITCODE_SPINUP_AP exit to userland.  This means that those vcpus
are not yet active (in terms of vm_active_cpus) when the INIT and
STARTUP IPIs are delivered to the vCPUs.

The changes in commit 0bda8d3e9f changed the INIT and STARTUP IPIs
to reuse the existing vlapic_calcdest() function.  This function
silently ignores IPIs sent to inactive vCPUs.  As a result, when using
an old bhyve binary, the INIT and STARTUP IPIs sent to wakeup APs were
ignored.

To fix, restructure the compat code for the INIT and STARTUP IPIs to
ignore the results of vlapic_calcdest() and manually parse the APIC ID
and resulting vcpuid.  As part of this, make the compat code always
conditonal on the ipi_exit capability being disabled.

Reviewed by:	c.koehne_beckhoff.com, markj
Differential Revision:	https://reviews.freebsd.org/D37093
2022-10-26 14:22:56 -07:00
Mark Johnston
ed72168431 bhyve: Address some signed/unsigned comparison warnings
MFC after:	1 week
2022-10-25 11:16:57 -04:00
Konstantin Belousov
934bfc128e Add vm_page_any_valid()
Use it and several other vm_page_*_valid() functions in more places.

Suggested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Colin Percival
469ad86031 amd64: Add FIRECRACKER kernel configuration
This kernel configuration supports the Firecracker VMM environment.

Relnotes:	FreeBSD can now run inside the Firecracker VMM
		via the amd64 FIRECRACKER kernel configuration.
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36672
2022-10-17 23:02:22 -07:00
Corvin Köhne
2a2a64c4b9 vmm: validate icr value
Not all combinations of icr values are allowed. Neither Intel nor AMD
document what happens when an invalid value is written to the icr.
Ignore the IPI. So, the guest will note that the IPI wasn't delivered.

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D36946
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Corvin Köhne
f56801d6d9 vmm: increase vlapic version
Mac os panics on apic versions lower than 0x14.

See https://opensource.apple.com/source/xnu/xnu-7195.81.3/osfmk/i386/lapic_native.c.auto.html

Additionally, an upcoming commit will validate the icr values written by
the guest. Older intel processors allow some different combinations than
the newer ones. AMD documents that only the newer combinations are
allowed. So, bumping the version allows us to avoid a differentiation
between AMD and Intel.

Intel documents that newer processors than the P6 are using the new
combinations. Sadly, Intel does not document which apic version belongs
to those processors. Linux identifies newer apics by a version larger or
equal to 0x14. Intel and AMD allow apic version between 0x10 and 0x15.
So, using 0x14 seems to be fine.

See 3eba620e7b/arch/x86/kernel/apic/apic.c (L238)

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D36945
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Corvin Köhne
0bda8d3e9f vmm: permit some IPIs to be handled by userspace
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and STARTUP IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D35623
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Konstantin Belousov
e0612ed490 amd64 pmap: add comment explaining why INVLPG is functional for PCID config
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:17 +03:00
Konstantin Belousov
273d0715f6 amd64: remove useless addr2 variables in page range invalidation handlers
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:12 +03:00
Mark Johnston
98d920d9cf bhyve: Annotate unused function parameters
MFC after:	1 week
2022-10-08 11:33:21 -04:00
John Baldwin
4d90a5afc5 sys: Consolidate common implementation details of PV entries.
Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.

Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an used pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).

Suggested by:	kib
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36685
2022-10-07 10:14:03 -07:00
Mitchell Horne
b05b1ecbef amd64, arm64 pmap: fix a comment typo
There is no such error code.

Fixes:	1d5ebad06c ("pmap: optimize MADV_WILLNEED on existing superpages")
2022-10-06 19:04:54 -03:00
Konstantin Belousov
85b715baae amd64/db_trace.c: remove stray prototype
Sponsored by:	NVIDIA networking
MFC after:	1 week
2022-10-04 01:50:30 +03:00
Mitchell Horne
754cb545b6 ddb: de-duplicate decode_syscall()
Only i386 and amd64 print the decoded syscall name in the backtrace.
This de-duplication facilitates further changes and adoption by other
platforms.

Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36565
2022-10-03 13:49:54 -03:00
Alan Cox
1d5ebad06c pmap: optimize MADV_WILLNEED on existing superpages
Specifically, avoid pointless calls to pmap_enter_quick_locked() when
madvise(MADV_WILLNEED) is applied to an existing superpage mapping.

Reported by:	mhorne
Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36801
2022-09-30 12:14:05 -05:00
John Baldwin
a35572b16e linux32: binutils as requires %eflags instead of %flags for CFI.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36781
2022-09-29 15:06:01 -07:00
Konstantin Belousov
648fa3558c amd64: Initialize IPI scoreboard earlier
Scoreboard is needed a moment when smp_started == true.  If some kernel
daemon thread is started before scoreboard is inited, and does some pmap
operation that requires TLB maintanence, which races with SMP startup,
we might dereference NULL invl_scoreboard.  This is particularly easy
to trigger when EARLY_AP_STARTUP is not defined.

Reported by:	glebius
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36766
2022-09-28 16:23:52 +03:00
Mark Johnston
4551cbbe99 amd64: Ignore 1GB mappings in pmap_advise()
This assertion can be triggered by usermode since vm_map_madvise()
doesn't force advice to be applied to an entire largepage mapping.  I
can't see any reason not to permit it, however, since MADV_DONTNEED and
_FREE are advisory and we can simply do nothing when a 1GB mapping is
encountered.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36675
2022-09-24 09:28:41 -04:00
Mark Johnston
6c2e9f4c32 amd64: Handle 1GB mappings in pmap_enter_quick_locked()
This code path can be triggered by applying MADV_WILLNEED to a 1GB
mapping.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36674
2022-09-24 09:28:41 -04:00
Mark Johnston
0b29f5efcc amd64: Make it possible to grow the KERNBASE region of KVA
pmap_growkernel() may be called when mapping a region above KERNBASE,
typically for a kernel module.  If we have enough PTPs left over from
bootstrap, pmap_growkernel() does nothing.  However, it's possible to
run out, and in this case pmap_growkernel() will try to grow the kernel
map all the way from kernel_vm_end to somewhere past KERNBASE, which can
easily run the system out of memory.  This happens with large kernel
modules such as the nvidia GPU driver.  There is also a WIP dtrace
provider which needs to map KVA in the region above KERNBASE (to provide
trampolines which allow a copy of traced kernel instruction to be
executed), and its allocations could potentially trigger this scenario.

This change modifies pmap_growkernel() to manage the two regions
separately, allowing them to grow independently.  The end of the
KERNBASE region is tracked by modifying "nkpt".

PR:		265019
Reviewed by:	alc, imp, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36673
2022-09-24 09:27:50 -04:00
John Baldwin
f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin
7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Richard Scheffenegger
bb1d472d79 tcp: make CUBIC the default congestion control mechanism.
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.

Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
2022-09-13 12:09:21 +02:00
Alan Cox
8d7ee2047c pmap: don't recompute mpte during promotion
When attempting to promote 4KB user-space mappings to a 2MB user-space
mapping, the address of the struct vm_page representing the page table
page that contains the 4KB mappings is already known to the caller.
Pass that address to the promotion function rather than making the
promotion function recompute it, which on arm64 entails iteration over
the vm_phys_segs array by PHYS_TO_VM_PAGE().  And, while I'm here,
eliminate unnecessary arithmetic from the calculation of the first PTE's
address on arm64.

MFC after:	1 week
2022-09-11 01:19:22 -05:00
Emmanuel Vadot
3fc174845c Revert "vmm: permit some IPIs to be handled by userspace"
This reverts commit a5a918b7a9.

This cause some problem with vm using bhyveload.

Reported by:	pho, kp
2022-09-09 15:55:01 +02:00
Emmanuel Vadot
83b65d0ae1 Revert "vmm: Remove unneeded variable maxcpus"
This reverts commit 653c36179d.
2022-09-09 15:54:56 +02:00
Emmanuel Vadot
653c36179d vmm: Remove unneeded variable maxcpus
Reported by:	FreeBSD User <freebsd@walstatt-de.de>
Fixes:	a5a918b7a9 ("vmm: permit some IPIs to be handled by userspace")
2022-09-07 11:41:16 +02:00
Corvin Köhne
a5a918b7a9 vmm: permit some IPIs to be handled by userspace
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

MFC after:              2 weeks
Differential Revision:  https://reviews.freebsd.org/D35623
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-09-07 09:07:03 +02:00
Warner Losh
991aef9795 acpi: Move some errors with RSDP and XSLT out from under bootverbose
Failure to map RSDP, XSLT and checksum failures are events that can't
happen unless something has gone wrong. As such, they should be reported
always, and not in bootverbose. This has been this way since it was
originally brought in to parse APIC tables.

Sponsored by:		Netflix
Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D36406
2022-09-01 10:40:15 -06:00
Warner Losh
a14b26a6bd acpi: Unmap RSDP in more error cases
Add missing pmap_unmapbios() calls for when we return 0. Otherwise we
can leave the table mapped when it is of no use.

Sponsored by:		Netflix
Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D36405
2022-09-01 10:39:20 -06:00
Konstantin Belousov
c1a0ab5ec5 amd64: update comment for casueword/casueword32, mentioning return value 1
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-08-26 02:41:48 +03:00
Mateusz Guzik
e621cb0be2 amd64: dump standard registers when crashing
Sample output:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x2
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80556853
stack pointer           = 0x28:0xffffffff8141bf50
frame pointer           = 0x28:0xffffffff8141bfa0
code segment            = base 0x0, limit 0xfffff, type 0x1b
		        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
rdi: fffff80002c9c400 rsi: ffffffff80b89183 rdx:                0
rcx:                2  r8:               fe  r9:                1
rax: fffff80002c9c400 rbx:                1 rbp: ffffffff8141bfa0
r10:                0 r11: ffffffff80b97f8c r12:                0
r13:                0 r14:                0 r15:                0
trap number             = 12
panic: page fault
cpuid = 1
time = 1

Reviewed by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36348
2022-08-25 17:33:07 +00:00
Konstantin Belousov
ff32a05554 x86: improve machdep.uprintf_signal
Print %eax/%rax.
Use better format strings, like %#x.

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36302
2022-08-24 22:12:45 +03:00
Konstantin Belousov
01a33b2af5 x86: print trap name in addition of trap number
for the "trap with interrupts disabled" warning.

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36302
2022-08-24 22:12:37 +03:00
John Baldwin
e663907366 Define _NPCM and the last PC_FREEn constant in terms of _NPCPV.
This applies one of the changes from
5567d6b441 to other architectures
besides arm64.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36263
2022-08-23 13:31:02 -07:00
John Baldwin
c94f30ea85 bhyve: Validate host PAs used to map passthrough BARs.
Reject attempts to map host physical address ranges that are not
subsets of a passthrough device's BAR into a guest.

Reviewed by:	markj, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36238
2022-08-19 15:03:01 -07:00
Maxim Sobolev
6a70a0c8bf Document implicit dependencies of the mlx5(4) & friends.
MFC after:      2 weeks
2022-08-11 16:33:09 -07:00
Mateusz Guzik
648edd6378 x86: remove MP_WATCHDOG
It does not work with ULE, which is the default scheduler for over a
decade.

Reviewed by:	emaste, kib
Differential Revision:	https://reviews.freebsd.org/D36094
2022-08-11 21:35:32 +00:00
Konstantin Belousov
c6d31b8306 AST: rework
Make most AST handlers dynamically registered.  This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it.  For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit.  For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed.  There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it.  In fact, this
is already present behavior for hwpmc.ko and ufs.ko.  I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by:	markj
Tested by:	emaste (arm64), pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Konstantin Belousov
4a5ec55af6 amd64: expicitly re-init td_frame in copy_thread()
Otherwise we are using whatever the value was left from the previous
thread run on kernel entry from usermode. Typically it would be the
desired value as is, but it is not guaranteed.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Corvin Köhne
4eadbef924 vmm: emulate INVD by ignoring it
On physical systems the ram isn't initialized on boot. So, coreboot uses
the cache as ram in this boot phase. When exiting cache as ram, coreboot
calls INVD for making the cache consistent.

In a virtual environment ram is always initialized and the cache is
always consistent. So, we can safely ignore this call.

Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D35620
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2022-07-27 18:20:47 +02:00
Mark Johnston
f4f56ff43d qat: Rename to qat_c2xxx and remove support for modern chipsets
A replacement QAT driver will be imported, but this replacement does not
support Atom C2xxx hardware.  So, the existing driver will be kept
around to provide opencrypto offload support for those chipsets.

Reviewed by:	pauamma, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35817
2022-07-27 11:10:52 -04:00
Dimitry Andric
7a1f289bd2 Fix unused variable warning in amd64's pmap.c
With clang 15, the following -Werror warning is produced:

    sys/amd64/amd64/pmap.c:8274:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
            int allfree, field, freed, i, idx;
                                ^

The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.

MFC after:	3 days
2022-07-26 22:08:10 +02:00
Mitchell Horne
c84c5e00ac ddb: annotate some commands with DB_CMD_MEMSAFE
This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35583
2022-07-18 22:06:09 +00:00
Kornel Dulęba
361971fbca Rework how shared page related data is stored
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35393
2022-07-18 16:27:32 +02:00
Kornel Dulęba
f6ac79fb12 Introduce the PROC_SIGCODE() macro
Use a getter macro instead of fetching the sigcode address directly
from a sysent of a given process. It assumes that the sigcode is stored
in the shared page, which is true in all cases, except for a.out
binaries. This will be later useful when the shared page address
randomization is introduced.
No functional change intended.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35392
2022-07-18 16:27:26 +02:00
Colin Percival
07007f3147 uart: Don't check SPCR tables if !late_console
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap.  (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)

Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses.  As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35794
2022-07-13 23:17:44 -07:00
Mark Johnston
6b38974085 vm_object: Modify various drivers to allocate OBJT_SWAP objects
This is in preparation for removal of OBJT_DEFAULT.  In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.

Reviewed by:	alc, hselasky, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35779
2022-07-12 09:10:15 -04:00
Warner Losh
ea5b2d6242 MIMIMAL: add uart
While uart could be detected completely through plug and play means, add
it here for two reasons. First, we don't do that from the loader, so
it's not available as a console. Second, even if we did do it from the
loader, there's a limitation in the system today that console drivers
must be compiled into the kernel because the console is selected before
external modules are linked into the kernel. Adding it only increases
the kernel size by ~14k as well.

Sponsored by:		Netflix
Idea liked by:		des, rpokala, brooks, jhb
2022-07-01 11:24:51 -06:00
Mihai Burcea
9aa02d5120 vmm: Fix snapshots for AMD CPUs
This patch fixes the AMD implementation for snapshotting.  It removes
unnecessary vmcb fields that should not be saved and duplicates.

Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D33431
2022-06-30 16:13:55 -07:00
Vitaliy Gusev
5afcca138f vmm: Cherry pick illumos commit '13361 bhyve should mask RDT cpuid info'
Summary:
    commit  1a5f1879be09d3de900b2510692dd12003784d84
    Author: Patrick Mooney <pmooney@pfmooney.com>
    Date:   2020-12-16T20:02:23.000Z

        13361 bhyve should mask RDT cpuid info
        Reviewed by: Andy Fiddaman <andy@omnios.org>
        Reviewed by: Toomas Soome <tsoome@me.com>
        Approved by: Robert Mustacchi <rm@fingolfin.org>

    https://github.com/illumos/illumos-gate/commit/1a5f1879be09d3de900b2510692dd12003784d8

----

We saw similar warning of GP (on Intel Xeon CPU E5-2630 v4 and VM with Ubuntu 20.04 5.4.0-113-generic)  until this commit is applied:

```
[    1.658880] kernel: unchecked MSR access error: WRMSR to 0xc8f (tried to write 0x0000000000000000) at rIP: 0xffffffffacc735b4 (native_write_msr+0x4/0x30)
[    1.662734] kernel: Call Trace:
[    1.663885] kernel:  ? clear_closid_rmid.isra.0+0x36/0x40
[    1.665501] kernel:  resctrl_online_cpu+0xdc/0x3f0
[    1.666952] kernel:  ? __switch_to_asm+0x40/0x70
[    1.668358] kernel:  ? __switch_to+0x7f/0x480
[    1.669693] kernel:  ? cat_wrmsr+0x70/0x70
[    1.670970] kernel:  cpuhp_invoke_callback+0x9b/0x580
[    1.672541] kernel:  ? __schedule+0x2eb/0x740
[    1.673893] kernel:  cpuhp_thread_fun+0xb8/0x120
[    1.675304] kernel:  smpboot_thread_fn+0xd0/0x170
[    1.676685] kernel:  kthread+0x104/0x140
[    1.677948] kernel:  ? sort_range+0x30/0x30
[    1.679299] kernel:  ? kthread_park+0x90/0x90
[    1.680570] kernel:  ret_from_fork+0x35/0x40
[    1.682000] kernel: *** VALIDATE rdt ***
[    1.683454] kernel: resctrl: L3 monitoring detected
```

Reviewed by:	markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35442
2022-06-30 10:27:27 -07:00
Dmitry Chagin
050f5a8405 amd64: Reload CPU ext features after resume or cr4 changes
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D35555
MFC after:		2 weeks
2022-06-29 10:34:43 +03:00
Roger Pau Monné
881c145431 elfnote: place note in a PT_NOTE program header
Some tools (firecraker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.

Output from readelf -Wl for and amd64 kernel after the change:

Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R   0x1
      [Requesting program interpreter: /red/herring]
  LOAD           0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R   0x200000
  LOAD           0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
  LOAD           0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R   0x200000
  LOAD           0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x200000
  LOAD           0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW  0x200000
  DYNAMIC        0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x8
  GNU_RELRO      0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R   0x4

 Section to Segment mapping:
  Segment Sections...
[...]
   10     .note.gnu.build-id .note.Xen

Reported by: cperciva
Fixes: 1a9cdd373a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24 ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
2022-06-28 09:51:57 +02:00
Dmitry Chagin
d416ee86c7 linux(4): To reuse MD linux.h hide kernel dependencies unde _KERNEL constraint
MFC after:		2 weeks
2022-06-22 14:28:24 +03:00
Alexander Motin
6210ac95a1 amd64: Stop using REP MOVSB for backward memmove()s.
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.

This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.

Reviewed by:	mjg
MFC after:	2 weeks
2022-06-16 13:46:34 -04:00
Mark Johnston
756bc3adc5 kasan: Create a shadow for the bootstack prior to hammer_time()
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map.  In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().

This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory.  In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader.  But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.

It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time().  This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.

Reported by:	mhorne
Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35449
2022-06-15 11:39:10 -04:00
Mark Johnston
c6d092b510 pmap: Keep PTI page table pages busy
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde().  Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd11.

Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35466
2022-06-15 11:38:04 -04:00
Brooks Davis
5ea3094e6a amd64: -m32 support for machine/md_var.h
Install the i386 md_var.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a mostly kernel-only header required by procstat's ZFS support.
It is pulled in by the i386 machine/counter.h.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
8dc3fdfe69 amd64: -m32 support for machine/counter.h
Install the i386 counter.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header required by procstat's ZFS support.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
9f7588dd93 amd64: -m32 support for machine/pcpu_aux.h
Install the i386 pcpu_aux.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header that is required by procstat's ZFS support.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
f6fada5eed amd64: -m32 support for machine/pcpu.h
Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header and should not be required, but
procstat's zfs support includes this with _KERNEL defined.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
a29263b656 amd64: -m32 support for machine/sb_buf.h
The contents of the amd64 version are kernel-only and incompatible with
other headers when compiled for i386 userspace with _KERNEL defined.
Just ifdef the whole file out in that case rather than giving this file
the full x86 treatment since it's not needed for current use cases
(procstat zfs support).

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
a69511db18 amd64: -m32 support for machine/vmparam.h
Install the i386 vmparam.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
b3120c0aeb amd64: -m32 support for machine/proc.h
Install the i386 proc.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
3cd1b382c6 amd64: -m32 support for machine/pmap.h
Install the i386 pmap.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
c2c8157ebe amd64: -m32 support for machine/segments.h
Install the i386 segments.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
08f16287a5 amd64: -m32 support for machine/atomic.h
Install the i386 atomic.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
92a98611ca amd64: -m32 support for machine/asm(macros).h
Install the i386 versions under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
68049f6da8 amd64: -m32 support for machine/profile.h
Install the i386 profile.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
cca1927261 amd64: -m32 support for machine/cpufunc.h
Install the i386 cpufunc.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:38 +01:00
Brooks Davis
24983043bf x86: cleanup in machine/cpufunc.h
Reduce diffs between amd64 and i386 and improve whitespace

Reviewed by:	jhb, imp
2022-06-13 18:35:38 +01:00
Hans Petter Selasky
9fd0d9b16e ktls: Remove the KERN_TLS option from the i386 and amd64 LINT-NOIP kernel configurations.
Kernel TLS depends on INET or INET6 being enabled.

Reported by:	bz@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-11 21:31:28 +02:00
Vitaliy Gusev
e7d34aeda4 vmm: move bumping VMEXIT_USERSPACE stat to the right place
Statistic for "number of vm exits handled in userspace" should be
increased in vm_run() instead of vmx_run() because in some cases
vm_run() doesn't exit to userspace and keeps entering the guest.

Also svm_run's implementation even wrongly misses that stat.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35350
2022-06-09 08:57:25 -04:00
Gordon Bergling
faff37be46 fpu: Fix a typo in a source code comment
- s/choise/choice/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 13:15:53 +02:00
Dmitry Chagin
4a6c2d075d linux(4): Properly restore the thread signal mask after signal delivery on i386
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.

Pointy hat to:		dchagin
MFC after:		2 weeks
2022-05-30 20:03:49 +03:00
Dmitry Chagin
109fd18ad9 linux(4): Properly build argument list for the signal handler
Provide arguments 2 and 3 if signal handler installed with SA_SIGINFO.

MFC after:		2 weeks
2022-05-30 19:53:12 +03:00
Dmitry Chagin
c30a767c6f linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.

MFC after:		2 weeks
2022-05-30 19:49:45 +03:00
Dmitry Chagin
2ab9b59faa linux(4): Avoid direct manipulation of td_sigmask
Use kern_sigprocmask() instead of direct manipulation of td_sigmask
to reschedule newly blocked signals.

MFC after:		2 weeks
2022-05-30 19:48:20 +03:00
Dmitry Chagin
2ca34847e7 linux(4): Reduce duplication between MD parts of the Linuxulator
Move sigprocmask actions defines under compat/linux,
they are identical across all Linux architectures.

MFC after:		2 weeks
2022-05-30 19:47:26 +03:00
Corvin Köhne
3ba952e1a2 vmm: add tunable to trap WBINVD
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.

Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.

In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.

Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.

Reviewed by:	jhb
Sponsored by:	Beckhoff Automation GmbH & Co. KG
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35253
2022-05-30 10:04:22 +02:00
Dmitry Chagin
0e26e54bdf linux(4): Handle 64-bit SO_TIMESTAMP for 32-bit binaries
To solve y2k38 problem in the recvmsg syscall the new SO_TIMESTAMP
constant were added on v5.1 Linux kernel. So, old 32-bit binaries
that knows only 32-bit time_t uses the old value of the constant,
and binaries that knows 64-bit time_t uses the new constant.

To determine what size of time_t type is expected by the user-space,
store requested value (SO_TIMESTAMP) in the process emuldata structure.

MFC after:		2 weeks
2022-05-28 23:45:39 +03:00
Bartosz Sobczak
cdcd52d41e
irdma: Add RDMA driver for Intel(R) Ethernet Controller E810
This is an initial commit for RDMA FreeBSD driver for Intel(R) Ethernet
Controller E810, called irdma.  Supporting both RoCEv2 and iWARP
protocols in per-PF manner, RoCEv2 being the default.

Testing has been done using krping tool, perftest, ucmatose, rping,
ud_pingpong, rc_pingpong and others.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	#manpages (pauamma_gundo.com) [documentation]
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D34690
2022-05-23 16:52:49 -07:00
Dmitry Chagin
26700ac0c4 linux(4): Deduplicate execve
As linux_execve is common across archs, except amd64 32-bit Linuxulator,
move it under compat/linux.

Noted by:		andrew@
MFC after:		2 weeks
2022-05-23 13:18:41 +03:00
Dmitry Chagin
53726a1f1e linux(4): Fix execve() on amd64/linux32 after a125ed50
MFC after:		2 weeks
2022-05-23 13:17:39 +03:00
Dmitry Chagin
9016ec056a linux(4): Deduplicate bsd_to_linux_trapcode()
As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:58 +03:00
Dmitry Chagin
2434137f69 linux(4): Deduplicate translate_traps()
As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:26 +03:00
John Baldwin
7a0c23da4e vmm: Destroy character devices synchronously.
This fixes a userland race where bhyveload or bhyve can fail to reuse
a VM name after bhyvectl --destroy has returned.

Reported by:	Michael Dexter
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D35186
2022-05-20 09:53:43 -07:00
Dmitry Chagin
eca368ecb6 Retire sv_transtrap
Call translate_traps directly from sendsig().

MFC after:		2 weeks
2022-05-20 14:54:03 +03:00
Dmitry Chagin
8f9635dc99 linux(4): Retire handmade DWARF annotations from signal trampolines
The Linux exports __kernel_sigreturn and __kernel_rt_sigreturn from the
vdso. Modern glibc's sigaction sets the sa_restorer field of sigaction
to the corresponding vdso __sigreturn, and sets the SA_RESTORER.
Our signal trampolines uses the FreeBSD-way to call a signal handler,
so does not use the sigaction's sa_restorer.

However, as glibc's runtime linker depends on the existment of the vdso
__sigreturn symbols, for all Linuxulators was added separate trampolines
named __sigcode with DWARF anotations and left separate __sigreturn
methods, which are exported.

MFC after:		2 weeks
2022-05-15 21:08:12 +03:00
Dmitry Chagin
6e826d27c3 linux(4): Better naming for ucontext field of struct rt_sigframe
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after:		2 weeks
2022-05-15 21:06:47 +03:00
Dmitry Chagin
af557e649c linux(4): Rework the definition of struct siginfo to match Linux actual one
Rework the defintion of struct siginfo so that the array padding
struct siginfo to SI_MAX_SIZE can be placed in a union along side of the
rest of the struct siginfo members.  The result is that we no longer need
the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions.

Move struct siginfo definition under /compat/linux to reduce MD part.
To avoid headers polution include linux_siginfo.h in the MD linux.h

MFC after:		2 weeks
2022-05-15 21:05:01 +03:00
Dmitry Chagin
21f2461741 linux(4): Move sigframe definitions to separate headers
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after:		2 weeks
2022-05-15 21:03:01 +03:00
Dmitry Chagin
ba279bcd6d linux(4): Cleanup signal trampolines
This is the first stage of a signal trampolines refactoring.

From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.

The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction.  Wherefore handmade DWARD annotations
was used.

While here rephrased and removed excessive comments.

MFC after:		2 weeks
2022-05-15 21:00:05 +03:00