Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.
MFC after: 2 weeks
Since Linux emulation layer build options was removed there is no reason
to keep opt_compat.h.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D38548
MFC after: 2 weeks
These changes unbreak AP startup when using a 13.1-RELEASE bhyve
executable with a newer kernel:
- Correct the destination mask for the VM_EXITCODE_IPI message generated
by an INIT or STARTUP IPI in vlapic_icrlo_write_handler().
- Only initialize vlapics on active vCPUs. 13.1-RELEASE bhyve activates
AP vCPUs only after the BSP starts them with an IPI, and vmm now
allocates vcpu structures lazily, so the STARTUP handling in
vm_handle_ipi() could trigger a page fault.
- Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP
message.
Fixes: 7c326ab5bb ("vmm: don't lock a mtx in the icr_low write handler")
Reviewed by: jhb, corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38446
... regardless of the kernel config options.
It is reported that llvm16 ld.lld warns about undefined symbols
referenced by the VERSION script.
Reviewed by: emaste, val_packett.cool
Discussed with: jrtc27
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38392
Use native routines to fixup initial process stack. On Arm64 linux_elf_fixup() is
noop, as it do the stack fixup (room for argc) in the linux_copyout_strings().
MFC after: 1 week
A vcpu only checks if a rendezvous is in progress or not to decide if it
should handle a rendezvous. This could lead to spurios rendezvous where
a vcpu tries a handle a rendezvous it isn't part of. This situation is
properly handled by vm_handle_rendezvous but it could potentially
degrade the performance. Avoid that by an early check if the vcpu is
part of the rendezvous or not.
At the moment, rendezvous are only used to spin up application
processors and to send ioapic interrupts. Spinning up application
processors is done in the guest boot phase by sending INIT SIPI
sequences to single vcpus. This is known to cause spurious rendezvous
and only occurs in the boot phase. Sending ioapic interrupts is rare
because modern guest will use msi and the rendezvous is always send to
all vcpus.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37390
Observed on a couple Ice Lake-SP platforms (Intel Coyote Pass, Dell
R750), there are more than 8 DRHD sections enumerated in the DMAR ACPI
section. Since the previous limit was 8, this resulted in some of these
not being parsed by vtd when the iommu is initialized; in this case when
PCI devices are being passthru'd to a bhyve VM.
This omission later causes a kernel panic later in initialization when
devices could not be found in a valid DRHD scope because the DHRD
containing the device's scope was not added to vtd.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
PR: 268486
Sponsored by: Intel Corporation
Reviewed by: rew@, corvink@
MFC after: 1 day
Differential Revision: https://reviews.freebsd.org/D38285
This is done by reverting CR4_PKE bit, because we perform %CR4
initialization in initializecpu(), and the function is called before
xsave_mask is read. To not redo the whole early initialization
sequence for the corner case, this should be good enough.
Reported by: jhb
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38219
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically. This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.
Reviewed by: jhb, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37707
The vmm module destroys the host_domain before unloading the ppt module
causing a use after free. This can happen when kldunload'ing vmm.
Reviewed by: markj, jhb
Differential Revision: https://reviews.freebsd.org/D38072
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().
The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
vmmdev_ioctl()
vm_reinit()
vm_cleanup(destroy=false)
The call path for vm_destroy() is (mem_segs_lock not taken):
sysctl_vmm_destroy()
vmmdev_destroy()
vm_destroy()
vm_cleanup(destroy=true)
Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.
Reviewed by: corvink, markj, jhb
Fixes: 67b69e76e8 ("vmm: Use an sx lock to protect the memory map.")
Differential Revision: https://reviews.freebsd.org/D38071
Netlink is a communication protocol defined in RFC 3549. It is async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.
Netlink support was added in D36002. It has got a number of improvements and
first customers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink
Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
Loadable module makes life of the app delepers harder. For example, `net/bird2` can be
either build with netlink or rtsock support, but not both.
The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Othewise, switching all base to use netlink at once may be too big of a leap.
This change targets amd64, the other architectures will follow soon.
Differential Revision: https://reviews.freebsd.org/D37783
Guard pmap_invlpg() definition with checks that only provide it when
both sys/pcpu.h and machine/cpufunc.h were already included.
Requested by: Elliott Mitchell
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In particular, do not enable the workaround if INVPCID is not supported
by the core.
Reported by: "Chen, Alvin W" <Weike.Chen@Dell.com>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37940
If processor prefetches neighboring TLB entries to the one being accessed
(as some have been reported to do), then the spin lock does not prevent
the situation described in the "AMD64 Architecture Programmer's Manual
Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency
Considerations".
Reported and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems. The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.
Reviewed by: alc (previous version)
Discussed with: emaste, markj
Tested by: karels
PR: 261169, 266145
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for for our busy sendfile() based web workload.
It also greadly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.
So that the system does not loose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.
This has been run for several months on a busy Netflix server, as well
as on my personal desktop.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
On amd64, don't abort promotion due to a missing accessed bit in a
mapping before possibly write protecting that mapping. Previously,
in some cases, we might not repromote after madvise(MADV_FREE) because
there was no write fault to trigger the repromotion. Conversely, on
arm64, don't pointlessly, yet harmlessly, write protect physical pages
that aren't part of the physical superpage.
Don't count aborted promotions due to explicit promotion prohibition
(arm64) or hardware errata (amd64) as ordinary promotion failures.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36916
These are manipulating state in a ppt(4) device none of which is
vCPU-specific. Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37639
Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523d.
Reported by: andrew
Remove the KPI/KBI changes from ieee80211_node.h and always use the
macros to pass in __func__ and __LINE__ to the functions.
The actual implementations are prefixed by "_" rather than suffixed
by "_debug" as they no longer are "debug"-specific.
Some of the select functions were not actually using the passed in
func, line options; however they are calling other functions which
use them. Directly call the internal implementation in those cases
passing the arguments on.
Use a file-local __debrefcnt_used define to mark the arguments __unused
in cases when we compile without IEEE80211_DEBUG_REFCNT and hope the
toolchain is intelligent enough to not pass them at all in those cases.
Also _ieee80211_free_node() now has a conflict so make the previous
_ieee80211_free_node() the new __ieee80211_free_node().
Add IEEE80211_DEBUG_REFCNT to the NOTES file on amd64 to keep exercising
the option.
Sponsored by: The FreeBSD Foundation
X-MFC: never
Discussed on: freebsd-wireless
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D37529
x2apic accesses are handled by a wrmsr exit. This handler is called in a
critical section. So, we can't lock a mtx in the icr_low handler.
Reported by: kp, pho
Tested by: kp, pho
Approved by: manu (mentor)
Fixes: c0f35dbf19 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
MFC after: 1 week
MFC with: c0f35dbf19
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37452
When a vcpu sees that a rendezvous is in progress, it exits and tries to
handle the rendezvous. The vcpu doesn't check if it's part of the
rendezvous or not. If the vcpu isn't part of the rendezvous, the
rendezvous could be done before it reaches the assertion. This will
cause a panic.
The assertion isn't needed at all because vm_handle_rendezvous properly
handles a spurious rendezvous. So, we can just remove it.
PR: 267779
Reviewed by: jhb, markj
Tested by: bz
Approved by: manu (mentor)
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D37417