freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	246c398145	bhyve: Do not remove guest physical addresses from IOMMU host domain This permits I/O devices on the host to directly access wired memory dedicated to guests using passthru devices. Note that wired memory belonging to guests that do not use passthru devices has always been accessible by I/O devices on the host. bhyve maps guest physical addresses into the user address space of the bhyve process by mmap'ing /dev/vmm/<vmname>. Device models pass pointers derived from this mapping directly to system calls such as preadv() to minimize copies when emulating DMA. If the backing store for a device model is a raw host device (e.g. when exporting a raw disk device such as /dev/ada<n> as a drive in the guest), the host device driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host directly to the guest's memory. However, if the guest's memory is not present in the host IOMMU domain, these DMA requests by the host device will fail without raising an error visible to the host device driver or to the guest resulting in non-working I/O in the guest. It is unclear why guest addresses were removed from the IOMMU host domain initially, especially only for VM's with a passthru device as the host IOMMU domain does not affect the permissions of passthru devices, only devices on the host. A considered alternative was using bounce buffers instead (D34535 is a proof of concept), but that adds additional overhead for unclear benefit. This solves a long-standing problem when using passthru devices and physical disks in the same VM. Thanks to: grehan (patience and help) Thanks to: jhb (for improving the commit message) PR: 260178 Reviewed by: grehan, jhb MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34607	2022-03-24 15:21:24 +00:00
Corvin Köhne	e47fe3183e	bhyve: add ROM emulation Some PCI devices especially GPUs require a ROM to work properly. The ROM is executed by boot firmware to initialize the device. To add a ROM to a device use the new ROM option for passthru device (e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>). It's necessary that the ROM is executed by the boot firmware. It won't be executed by any OS. Additionally, the boot firmware should be configured to execute the ROM file. For that reason, it's only possible to use a ROM when using OVMF with enabled bus enumeration. Differential Revision: https://reviews.freebsd.org/D33129 Sponsored by: Beckhoff Automation GmbH & Co. KG MFC after: 1 month	2022-03-10 12:30:37 +01:00
Robert Wing	2062ce996d	vmm: fix "set but not used" warnings	2022-02-28 15:09:32 -09:00
Robert Wing	39d87a0235	vmm: fix "set but not used" warnings	2022-02-28 14:55:37 -09:00
Robert Wing	73505a1076	vmm: fix "set but not used" warnings	2022-02-28 14:46:08 -09:00
John Baldwin	6426978617	Extend the VMM stats interface to support a dynamic count of statistics. - Add a starting index to 'struct vmstats' and change the VM_STATS ioctl to fetch the 64 stats starting at that index. A compat shim for <= 13 continues to fetch only the first 64 stats. - Extend vm_get_stats() in libvmmapi to use a loop and a static thread local buffer which grows to hold the stats needed. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27463	2022-02-07 14:11:10 -08:00
Corvin Köhne	6171e026be	bhyve: add support for MTRR Some guests or driver might depend on MTRR to work properly. E.g. the nvidia gpu driver won't work without MTRR. Reviewed by: markj MFC after: 2 weeks Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33333	2022-01-14 12:41:44 +01:00
Vitaliy Gusev	c72e914cf1	vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts. Win10 does not use LAPIC timer to often and in most cases, and I see it is disabled by writing 0 to Initial Count Register (for Timer). During resume, restart timer only for enabled LAPIC and enabled timer for that LAPIC. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33448	2022-01-11 09:27:45 -05:00
Stefan Eßer	e2650af157	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The differences between the FreeBSD version of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-adderess scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set allows to unconditionally provide functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451	2021-12-30 12:20:32 +01:00
Mark Johnston	4c812fe61b	vlapic: Schedule callouts on the local CPU The virtual LAPIC driver uses callouts to implement the LAPIC timer. Callouts are armed using callout_reset_sbt(), which currently puts everything on CPU 0. On systems running many bhyve VMs this results in a large amount of contention for CPU 0's callout lock. Modify vlapic to schedule callouts on the local CPU instead. This allows timer interrupts to be scheduled more evenly among CPUs where bhyve is running. Reviewed by: grehan, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32559	2021-10-19 21:22:57 -04:00
Mark Johnston	de8554295b	cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. This is a re-application of commit `9068f6ea69`. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2021-10-18 09:56:58 -04:00
Mark Johnston	bcdc599dc2	Revert "cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it" This reverts commit `9068f6ea69`. The underlying macro needs to be reworked to avoid problems with control flow statements. Reported by: rlibby	2021-09-21 13:51:42 -04:00
Mark Johnston	9068f6ea69	cpuset(9): Add CPU_FOREACH_IS(SET\|CLR) and modify consumers to use it This implementation is faster and doesn't modify the cpuset, so it lets us avoid some unnecessary copying as well. No functional change intended. Reviewed by: cem, kib, jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32029	2021-09-21 12:07:47 -04:00
Andrew Turner	b792434150	Create sys/reg.h for the common code previously in machine/reg.h Move the common kernel function signatures from machine/reg.h to a new sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2). Reviewed by: imp, markj Sponsored by: DARPA, AFRL (original work) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19830	2021-08-30 12:50:53 +01:00
Cyril Zhang	a85404906b	vmm: Add credential to cdev object Add a credential to the cdev object in sysctl_vmm_create(), then check that we have the correct credentials in sysctl_vmm_destroy(). This prevents a process in one jail from opening or destroying the /dev/vmm file corresponding to a VM in a sibling jail. Add regression tests. Reviewed by: jhb, markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31156	2021-08-18 13:41:33 -04:00
Ka Ho Ng	179bc5729d	vmm: Fix wrong assert in ivhd_dev_add_entry The correct condition is to check the number of ivhd entries fit into the array. Reported by: bz Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31514	2021-08-12 15:56:16 +08:00
Mark Johnston	41335c6b7f	vmm: Make iommu ops tables const While here, use designated initializers and rename some AMD iommu method implementations to match the corresponding op names. No functional change intended. Reviewed by: grehan MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31462	2021-08-09 13:28:27 -04:00
Mark Johnston	e54ae8258d	amd64: Fix output operand specs for the stmxcsr and vmread intrinsics This does not appear to affect code generation, at least with the default toolchain. Noticed because incorrect output specifications lead to false positives from KMSAN, as the instrumentation uses them to update shadow state for output operands. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31466	2021-08-09 13:28:08 -04:00
Ka Ho Ng	df95cc76af	vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1 In hw.vmm.create sysctl handler the maximum length of vm name is VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to allow the length of VM_MAX_NAMELEN for vm name. MFC after: 3 days Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31372	2021-08-02 17:55:08 +08:00
Ka Ho Ng	b5c74dfd64	vmm: Fix AMD-vi using wrong rid range The ACPI parsing code around rid range was wrong on assuming there is only one pair of start/end device id range. Besides, ivhd_dev_parse() never work as supposed. The start/end rid info was always zero. Restructure the code to build dynamic-sized tables for each IOMMU softc holding device entries. The device entries are enumerated to find a suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU unit itself) are no-op from now on. There are also a minor fix on wrong %b formatting string usage. Tested on my EPYC 7282. Sponsored by: The FreeBSD Foundation Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D30827	2021-07-14 01:53:10 +08:00
Ka Ho Ng	210e6aec4f	vmm: Fix ivrs_drv device_printf usage The original %b description string is wrong. Sponsored by: The FreeBSD Foundation Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D30805	2021-06-19 14:07:26 +08:00
Mark Johnston	4c599db71a	vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462	2021-05-26 09:34:52 -04:00
Ka Ho Ng	6fe60f1d5c	AMD-vi: Fortify IVHD device_identify process - Use malloc(9) to allocate ivhd_hdrs list. The previous assumption that there are at most 10 IVHDs in a system is not true. A counter example would be a system with 4 IOMMUs, and each IOMMU is related to IVHDs type 10h, 11h and 40h in the ACPI IVRS table. - Always scan through the whole ivhd_hdrs list to find IVHDs that has the same DeviceId but less prioritized IVHD type. Sponsored by: The FreeBSD Foundation MFC with: `74ada297e8` Reviewed by: grehan Approved by: lwhsu (mentor) Differential Revision: https://reviews.freebsd.org/D29525	2021-04-19 16:08:13 +08:00
Ka Ho Ng	03efa462b2	AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: `74ada297e8` Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521	2021-04-01 15:31:24 +08:00
Ka Ho Ng	cf76495e0a	AMD-vi: Fix mismatched NULL checking in amdiommu teardown path Sponsored by: The FreeBSD Foundation Approved by: lwhsu (mentor) MFC with: `74ada297e8`	2021-04-01 03:34:34 +08:00
Ka Ho Ng	be97fc8dce	bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862	2021-03-23 16:12:41 +08:00
Ed Maste	54399caa2f	Correct "Fondation" typo (missing "u")	2021-03-22 13:06:31 -04:00
Ka Ho Ng	74ada297e8	AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984	2021-03-22 17:33:43 +08:00
Ka Ho Ng	ede14736fd	ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945	2021-03-22 17:33:43 +08:00
D Scott Phillips	f8a6ec2d57	bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for addres updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066	2021-03-19 11:04:36 +08:00
Roger Pau Monné	5ea878684f	bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238	2021-02-02 09:47:00 +01:00
Roger Pau Monné	d7d067698a	bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237	2021-02-02 09:45:45 +01:00
Roger Pau Monné	49429cf9be	bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236	2021-02-02 09:44:20 +01:00
Konstantin Belousov	3c48106aaa	bhyve: limit max GPA to VM_MAXUSER_ADDRESS_LA48. We use 4-level EPT pages, correct the upper bound. Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27402	2020-11-29 10:32:38 +00:00
Peter Grehan	15add60d37	Convert vmm_ops calls to IFUNC There is no need for these to be function pointers since they are never modified post-module load. Rename AMD/Intel ops to be more consistent. Submitted by: adam_fenn.io Reviewed by: markj, grehan Approved by: grehan (bhyve) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27375	2020-11-28 01:16:59 +00:00
Peter Grehan	ddfc488c36	Remove manual instruction encodings for VMLOAD, VMRUN, and VMSAVE. This is a relic from when these instructions weren't supported by the toolchain. No functional change. Submitted by: adam_fenn.io Reviewed by: grehan Approved by: grehan (bhyve) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27130	2020-11-26 05:58:55 +00:00
John Baldwin	908dca3ef4	Pull the check for VM ownership into ppt_find(). This reduces some code duplication. One behavior change is that ppt_assign_device() will now only succeed if the device is unowned. Previously, a device could be assigned to the same VM multiple times, but each time it was assigned, the device's state was reset. Reviewed by: markj, grehan MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27301	2020-11-24 23:56:33 +00:00
John Baldwin	1925586e03	Honor the disabled setting for MSI-X interrupts for passthrough devices. Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough device and invoke it if a write to the MSI-X capability registers disables MSI-X. This avoids leaving MSI-X interrupts enabled on the host if a guest device driver has disabled them (e.g. as part of detaching a guest device driver). This was found by Chelsio QA when testing that a Linux guest could switch from MSI-X to MSI interrupts when using the cxgb4vf driver. While here, explicitly fail requests to enable MSI on a passthrough device if MSI-X is enabled and vice versa. Reported by: Sony Arpita Das @ Chelsio Reviewed by: grehan, markj MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27212	2020-11-24 23:18:52 +00:00
Mark Johnston	6f5a960678	vmm: Make pmap_invalidate_ept() wait synchronously for guest exits Currently EPT TLB invalidation is done by incrementing a generation counter and issuing an IPI to all CPUs currently running vCPU threads. The VMM inner loop caches the most recently observed generation on each host CPU and invalidates TLB entries before executing the VM if the cached generation number is not the most recent value. pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing guest instructions and reload the generation number. However, it does not actually wait for vCPUs to exit, potentially creating a window where guests may continue to reference stale TLB entries. Fix the problem by bracketing guest execution with an SMR read section which is entered before loading the invalidation generation. Then, pmap_invalidate_ept() increments the current write sequence before loading pm_active and sending IPIs, and polls readers to ensure that all vCPUs potentially operating with stale TLB entries have exited before pmap_invalidate_ept() returns. Also ensure that unsynchronized loads of the generation counter are wrapped with atomic(9), and stop (inconsistently) updating the invalidation counter and pm_active bitmask with acquire semantics. Reviewed by: grehan, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26910	2020-11-11 15:01:17 +00:00
Mark Johnston	8e2cbc5660	vmx: Implement pmap (de)activation in C Rewrite the code that maintains pm_active and invalidates EPTP-tagged TLB entries in C. Previously this work was done in vmx_enter_guest(), in assembly, but there is no good reason for that and it makes the TLB invalidation algorithm for nested page tables harder to review. No functional change intended. Now, an error from the invept instruction results in a kernel panic rather than a vmexit. Such errors should occur only as a result of VMM bugs. Reviewed by: grehan, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26830	2020-10-19 15:24:35 +00:00
Mark Johnston	494955366a	Remove svn:executable from a couple of vmm(4) source files. MFC after: 3 days	2020-10-01 22:20:29 +00:00
John Baldwin	a3f2a9c57e	Clear the upper 32-bits of registers in x86_emulate_cpuid(). Per the Intel manuals, CPUID is supposed to unconditionally zero the upper 32 bits of the involved (rax/rbx/rcx/rdx) registers. Previously, the emulation would cast pointers to the 64-bit register values down to `uint32_t`, which while properly manipulating the lower bits, would leave any garbage in the upper bits uncleared. While no existing guest OSes seem to stumble over this in practice, the bhyve emulation should match x86 expectations. This was discovered through alignment warnings emitted by gcc9, while testing it against SmartOS/bhyve. SmartOS bug: https://smartos.org/bugview/OS-8168 Submitted by: Patrick Mooney Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D24727	2020-10-01 16:45:11 +00:00
Ed Maste	09860d44e4	bhyve: do not permit write access to VMCB / VMCS Reported by: Patrick Mooney Submitted by: jhb Security: CVE-2020-24718	2020-09-15 21:04:27 +00:00
Konstantin Belousov	101d5b527a	bhyve: intercept AMD SVM instructions. Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an SVM instruction. Otherwise, SVM allows execution of them, and instructions operate on host physical addresses despite being executed in guest mode. Reported by: Maxime Villard <max@m00nbsd.net> admbug: 972 CVE: CVE-2020-7467 Reviewed by: grehan, markj Differential revision: https://reviews.freebsd.org/D26313	2020-09-15 20:22:50 +00:00
John Baldwin	385f4a5ac8	Use vmcb_read/write for the vmcb snapshot functions. This avoids some unnecessary layers of indirection.	2020-09-10 22:22:23 +00:00
Mateusz Guzik	543769bf83	amd64: clean up empty lines in .c and .h files	2020-09-01 21:16:54 +00:00
Konstantin Belousov	f3eb12e4a6	Add bhyve support for LA57 guest mode. Noted and reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:37:21 +00:00
Konstantin Belousov	9ce875d9b5	amd64 pmap: LA57 AKA 5-level paging Since LA57 was moved to the main SDM document with revision 072, it seems that we should have a support for it, and silicons are coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included into this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accomodate for EPT which has separate level control and in principle might not allow 5-leve EPT despite x86 paging supports it. Anyway, it does not seems critical to have 5-level EPT support now. Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:19:04 +00:00
Peter Grehan	3a3f1e9dfa	Export a routine to provide the TSC_AUX MSR value and use this in vmm. Also, drop an unnecessary set of braces. Requested by: kib Reviewed by: kib MFC after: 3 weeks	2020-08-18 11:36:38 +00:00
Peter Grehan	f5f5f1e7d6	Support guest rdtscp and rdpid instructions on Intel VT-x Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts that support the "enable RDTSCP" VM-execution control. Submitted by: adam_fenn.io Reported by: chuck Reviewed by: chuck, grehan, jhb Approved by: jhb (bhyve), grehan MFC after: 3 weeks Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D26003	2020-08-18 07:23:47 +00:00

1 2 3 4 5 ...

559 Commits