Commit Graph

8989 Commits

Author SHA1 Message Date
Konstantin Belousov
2b4b3789f8 acpi_wakeup.c: apply the reviewer' editorial corrections to the comment text.
Fixes:	02904a06c7
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:47:19 +02:00
Konstantin Belousov
02904a06c7 amd64: properly recalculate mitigations knobs after resume
Revision r333125 AKA 986c4ca387 forced clear cpu_stdext_feature3
on suspend, since at that time microcode update was not reloaded
early on resume. Then, revision 050f5a8405 started re-reading
cpu_stdext_feature3 again. Since modern CPUs do not require mitigations
from the Skylake era, this went unnoticed for some time.

Keep zeroing cpu_stdext_feature3 on suspend, but re-read it in more
controlled way on resume after microcode is reloaded, and recalculate
active workarounds based on actual microcode capabilities.

Reported and tested by:	romain
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:40:05 +02:00
Konstantin Belousov
ff6d60946a amd64 acpi_wakeup.c: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-17 15:10:34 +02:00
Vitaliy Gusev
94a3876d7e
vmm: fix missing ipi statistic
ipi counters are missing in bhyvectl's output because vm_maxcpu is 0
when initializing them. That's because vmm_stat_register is executed
before vmm_init.

Instead of directly fixing it, there's a better solution in illumos
which is cherry picked:
65a3bc8373

It replaces the matrix statistic by two counters per vcpu. One for
counting the ipis to the vcpu and one counting the ipis received by the
vcpu. This has several advantages:

- A matrix statistic becomes huge when using many vcpus.
- A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit.
- Two counters are enough in most cases. DTrace can be used for more
  advanced debugging purposes.
- A matrix statistic wastes memory. The matrix size is determined by
  vm_maxcpu regardless of the number of vcpus assigned to the vm.

Reviewed by:		corvink, markj
Fixes:			ee98f99d7a ("vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39038
2023-03-17 13:50:08 +01:00
Dmitry Chagin
9e7f03e9c6 linux(4): Drop unncessary struct l_ifconf declaration from amd64/linux
Its needed only for amd64/linux32 Linuxulator.

Differential Revision:	https://reviews.freebsd.org/D38793
2023-03-04 12:11:38 +03:00
Dmitry Chagin
cabbfb60d0 linux(4): Reduce code duplication between MD files
Move struct ifnet definitions under compat/linux.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38791
2023-03-04 12:11:38 +03:00
John-Mark Gurney
2fee875629
abstract out the vm detection via smbios..
This makes the detection of VMs common between platforms that
have SMBios.

Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D38800
2023-03-02 16:54:21 -08:00
Vitaliy Gusev
8104fc31a2
bhyve: fix restore of kernel structs
vmx_snapshot() and svm_snapshot() do not save any data and error occurs at
resume:

Restoring kernel structs...
vm_restore_kern_struct: Kernel struct size was 0 for: vmx
Failed to restore kernel structs.

Reviewed by:		corvink, markj
Fixes:			39ec056e6d ("vmm: Rework snapshotting of CPU-specific per-vCPU data.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38476
2023-02-28 13:37:53 +01:00
Vitaliy Gusev
281b496f22
vmm: fix restore of TSC offset
After suspend/resume Ubuntu 20.04 and 22.04 installer can hang if
tsc-early clocksource has a big skew.

Reviewed by:		corvink, jhb
Fixes:			a7db532e3a ("vmm: Simplify saving of absolute TSC values in snapshots.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38474
2023-02-28 13:37:44 +01:00
Mike Karels
dd6f6030cc amd64 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces.  Make them consistent.
2023-02-24 08:36:28 -06:00
Mateusz Guzik
6b9acd1bfb Exclude MMCCAM kernels from make universe
They don't provide any value and are quite arbitrary.

Note arm64 GENERIC-MMCCAM was already excluded, just not the NODEBUG
variant.

The option is already build-tested with arm64 LINT kernel.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D38458
2023-02-16 07:29:53 +00:00
Dmitry Chagin
c8a79231a5 linux(4): Rename linux_timer.h to linux_time.h
To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains implementation of all stuff declared
in linux_time.h.

MFC after:		2 weeks
2023-02-14 17:46:33 +03:00
Dmitry Chagin
2456a45929 linux(4): Cleanup includes under amd64/linux
Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after:		2 weeks
2023-02-14 17:46:32 +03:00
Dmitry Chagin
31e938c531 linux(4): Cleanup vm includes from linux_util.h
Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.

MFC after:		2 weeks
2023-02-14 17:46:30 +03:00
Dmitry Chagin
06c07e1203 Complete removal of opt_compat.h
Since Linux emulation layer build options was removed there is no reason
to keep opt_compat.h.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38548
MFC after:		2 weeks
2023-02-13 19:07:38 +03:00
Dmitry Chagin
10d16789a3 linux(4): Get rid of the opt_compat.h include.
Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after:		2 weeks
2023-02-12 20:24:32 +03:00
Mark Johnston
b265a2e0d7 vmm: Fix AP startup compatibility for old bhyve executables
These changes unbreak AP startup when using a 13.1-RELEASE bhyve
executable with a newer kernel:
- Correct the destination mask for the VM_EXITCODE_IPI message generated
  by an INIT or STARTUP IPI in vlapic_icrlo_write_handler().
- Only initialize vlapics on active vCPUs.  13.1-RELEASE bhyve activates
  AP vCPUs only after the BSP starts them with an IPI, and vmm now
  allocates vcpu structures lazily, so the STARTUP handling in
  vm_handle_ipi() could trigger a page fault.
- Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP
  message.

Fixes:	7c326ab5bb ("vmm: don't lock a mtx in the icr_low write handler")
Reviewed by:	jhb, corvink
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Mark Johnston
ba34de1b3b vmm: Remove an unneeded initialization of "retu"
vm_handle_ipi() unconditionally initializes "retu".  No functional
change intended.

Reviewed by:	jhb, corvink
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Mark Johnston
f3bbd0e818 vmm: Collapse identical case statements in vlapic_icrlo_write_handler()
No functional change intended.

Reviewed by:	jhb, corvink
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Dag-Erling Smørgrav
43d4680b39 MINIMAL: Update and clean up.
* Add GEOM_LABEL, required to boot a default UEFI install.

* Add enough of virtio to boot in bhyve.

* Reduce diff between amd64 and i386.

* Reduce diff to GENERIC.

MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D38468
2023-02-09 18:24:45 +01:00
Konstantin Belousov
ee84487120 amd64 ia32 vdso: always define some __vdso_ symbols
... regardless of the kernel config options.
It is reported that llvm16 ld.lld warns about undefined symbols
referenced by the VERSION script.

Reviewed by:	emaste, val_packett.cool
Discussed with:	jrtc27
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38392
2023-02-09 04:36:40 +02:00
Mateusz Guzik
b2c68dc6d9 amd64: ansify
Reported by:    clang 15
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-02-07 22:52:06 +00:00
Mateusz Guzik
819ed47204 amd64 pmap: patch up a comment in pmap_init_pv_table
Requested by:	jhb
2023-02-06 22:33:28 +00:00
Yuri
e4d3f1e40a hv_hid: Hyper-V HID driver
Hyper-V HID driver using hidbus/hms.

Reviewed by:	wulf
MFC after:	1 week
PR:		221074
Differential revision:	https://reviews.freebsd.org/D38140
2023-02-05 18:32:08 +03:00
Elliott Mitchell
d27d543c78 vmm: purge EOL release compatibility
Remove FreeBSD 11 support

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/603
Differential Revision: https://reviews.freebsd.org/D35560
2023-02-04 09:13:10 -07:00
Dmitry Chagin
ce20c00e85 linux(4): Remove stale comment that no longer applies.
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
6ad07a4b2b linux(4): Microoptimize rt_sendsig() on amd64.
Drop proc lock earlier, before copying user stuff.

Pointed out by:		kib
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38326
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
a95cb95e12 linux(4): Preserve fpu fxsave state across signal delivery on amd64.
PR:			240768
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38302
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
95b8603427 linux(4): Deduplicate linux_trans_osrel().
MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
6039e966ff linux(4): Deduplicate linux_copyout_strings().
It is still present in the 32-bit Linuxulator on amd64.

MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
9e550625f8 linux(4): Deduplicate linux_fixup_elf().
Use native routines to fixup initial process stack. On Arm64 linux_elf_fixup() is
noop, as it do the stack fixup (room for argc) in the linux_copyout_strings().

MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
7446514533 linux(4): Microoptimize linux_elf.h for future use.
In order to reduce code duplication move coredump support definitions
into the appropriate header and hide private definitions.

MFC after:		1 week
2023-02-02 17:58:06 +03:00
Konstantin Belousov
2555f175b3 Move kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38320
2023-02-02 00:59:26 +02:00
Dmitry Chagin
575e48f1c4 linux(4): Deduplicate MI futex structures.
MFC after:	1 week
2023-02-01 21:57:04 +03:00
Dmitry Chagin
5c32146723 amd64: Eliminate write only cpu_fxsr.
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38289
MFC after:		1 week
2023-02-01 18:17:06 +03:00
Corvin Köhne
892feec221
vmm: avoid spurious rendezvous
A vcpu only checks if a rendezvous is in progress or not to decide if it
should handle a rendezvous. This could lead to spurios rendezvous where
a vcpu tries a handle a rendezvous it isn't part of. This situation is
properly handled by vm_handle_rendezvous but it could potentially
degrade the performance. Avoid that by an early check if the vcpu is
part of the rendezvous or not.

At the moment, rendezvous are only used to spin up application
processors and to send ioapic interrupts. Spinning up application
processors is done in the guest boot phase by sending INIT SIPI
sequences to single vcpus. This is known to cause spurious rendezvous
and only occurs in the boot phase. Sending ioapic interrupts is rare
because modern guest will use msi and the rendezvous is always send to
all vcpus.

Reviewed by:		jhb
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37390
2023-02-01 12:36:36 +01:00
Eric Joyner
5354596764
vtd: Increase DRHD_MAX_UNITS
Observed on a couple Ice Lake-SP platforms (Intel Coyote Pass, Dell
R750), there are more than 8 DRHD sections enumerated in the DMAR ACPI
section.  Since the previous limit was 8, this resulted in some of these
not being parsed by vtd when the iommu is initialized; in this case when
PCI devices are being passthru'd to a bhyve VM.

This omission later causes a kernel panic later in initialization when
devices could not be found in a valid DRHD scope because the DHRD
containing the device's scope was not added to vtd.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

PR:		268486
Sponsored by:	Intel Corporation
Reviewed by:	rew@, corvink@
MFC after:	1 day
Differential Revision:	https://reviews.freebsd.org/D38285
2023-01-31 13:57:42 -08:00
Konstantin Belousov
153643a5bc amd64: do not enable PKRU if user disabled saving PKRU register in xsave mask
This is done by reverting CR4_PKE bit, because we perform %CR4
initialization in initializecpu(), and the function is called before
xsave_mask is read.  To not redo the whole early initialization
sequence for the corner case, this should be good enough.

Reported by:	jhb
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38219
2023-01-27 19:44:49 +02:00
Andrew Gallatin
9cb6ba29cb vm: centralize VM_BATCHQUEUE_SIZE definition
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically.  This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.

Reviewed by: jhb, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37707
2023-01-21 14:30:00 -05:00
Robert Wing
27029bc08f vmm: fix use after free in ppt_detach()
The vmm module destroys the host_domain before unloading the ppt module
causing a use after free. This can happen when kldunload'ing vmm.

Reviewed by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D38072
2023-01-20 11:25:27 +00:00
Robert Wing
c668e8173a vmm: take exclusive mem_segs_lock in vm_cleanup()
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().

The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
    vmmdev_ioctl()
    vm_reinit()
    vm_cleanup(destroy=false)

The call path for vm_destroy() is (mem_segs_lock not taken):
    sysctl_vmm_destroy()
    vmmdev_destroy()
    vm_destroy()
    vm_cleanup(destroy=true)

Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.

Reviewed by:	corvink, markj, jhb
Fixes:  67b69e76e8 ("vmm: Use an sx lock to protect the memory map.")
Differential Revision:	https://reviews.freebsd.org/D38071
2023-01-20 11:10:53 +00:00
Robert Wing
ccf32a68f8 vmm: take exclusive mem_segs_lock when (un)assigning ppt dev
PR:             268744
Reported by:    mmatalka@gmail.com
Reviewed by:	corvink, markj, jhb
Fixes:  67b69e76e8 ("vmm: Use an sx lock to protect the memory map.")
Differential Revision:	https://reviews.freebsd.org/D37962
2023-01-20 10:03:59 +00:00
Gordon Bergling
05187f2ffc amd64: Fix a common typo in source code comments
- s/comparision/comparison/

MFC after:	3 days
2023-01-19 14:27:18 +01:00
Alexander V. Chernikov
692e19cf51 netlink: add netlink to GENERIC@amd64
Netlink is a communication protocol defined in RFC 3549. It is async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.

Netlink support was added in D36002. It has got a number of improvements and
first customers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink

Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
Loadable module makes life of the app delepers harder. For example, `net/bird2` can be
either build with netlink or rtsock support, but not both.

The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Othewise, switching all base to use netlink at once may be too big of a leap.

This change targets amd64, the other architectures will follow soon.

Differential Revision: https://reviews.freebsd.org/D37783
2023-01-13 10:22:40 +00:00
Konstantin Belousov
ad97b9bbfc amd64 pmap.h: make it easier to use the header for other consumers
Guard pmap_invlpg() definition with checks that only provide it when
both sys/pcpu.h and machine/cpufunc.h were already included.

Requested by:	Elliott Mitchell
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-01-06 01:30:29 +02:00
Konstantin Belousov
a2c08eba43 amd64: be more precise when enabling the AlderLake small core PCID workaround
In particular, do not enable the workaround if INVPCID is not supported
by the core.

Reported by:	"Chen, Alvin W" <Weike.Chen@Dell.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37940
2023-01-06 01:30:29 +02:00
Konstantin Belousov
231d75568f Move INVLPG to pmap_quick_enter_page() from pmap_quick_remove_page().
If processor prefetches neighboring TLB entries to the one being accessed
(as some have been reported to do), then the spin lock does not prevent
the situation described in the "AMD64 Architecture Programmer's Manual
Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency
Considerations".

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:46 +02:00
Konstantin Belousov
cde70e312c amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG
A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems.  The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.

Reviewed by:	alc (previous version)
Discussed with:	emaste, markj
Tested by:	karels
PR:	261169, 266145
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Konstantin Belousov
45ac7755a7 amd64: identify small cores
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Andrew Gallatin
1cac76c93f vm: reduce lock contention when processing vm batchqueues
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for for our busy sendfile() based web workload.
It also greadly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.

So that the system does not loose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.

This has been run for several months on a busy Netflix server, as well
as on my personal desktop.

Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
2022-12-14 14:34:07 -05:00