freebsd-skq

Author	SHA1	Message	Date
Mark Johnston	4c599db71a	vmm: Let guests enable SMEP/SMAP if the host supports it Reviewed by: kib, grehan, jhb Tested by: grehan (AMD) MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30462	2021-05-26 09:34:52 -04:00
Ka Ho Ng	6fe60f1d5c	AMD-vi: Fortify IVHD device_identify process - Use malloc(9) to allocate ivhd_hdrs list. The previous assumption that there are at most 10 IVHDs in a system is not true. A counter example would be a system with 4 IOMMUs, and each IOMMU is related to IVHDs type 10h, 11h and 40h in the ACPI IVRS table. - Always scan through the whole ivhd_hdrs list to find IVHDs that has the same DeviceId but less prioritized IVHD type. Sponsored by: The FreeBSD Foundation MFC with: `74ada297e8` Reviewed by: grehan Approved by: lwhsu (mentor) Differential Revision: https://reviews.freebsd.org/D29525	2021-04-19 16:08:13 +08:00
Ka Ho Ng	03efa462b2	AMD-vi: Mixed format IVHD block should replace fixed format IVHD block This fixes double IVHD_SETUP_INTR calls on the same IOMMU device. Sponsored by: The FreeBSD Foundation MFC with: `74ada297e8` Reported by: Oleg Ginzburg <olevole@olevole.ru> Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D29521	2021-04-01 15:31:24 +08:00
Ka Ho Ng	cf76495e0a	AMD-vi: Fix mismatched NULL checking in amdiommu teardown path Sponsored by: The FreeBSD Foundation Approved by: lwhsu (mentor) MFC with: `74ada297e8`	2021-04-01 03:34:34 +08:00
Ka Ho Ng	be97fc8dce	bhyve amd: Small cleanups in amdvi_dump_cmds Bump offset with MOD_INC instead in amdvi_dump_cmds. Reviewed by: jhb Approved by: philip (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28862	2021-03-23 16:12:41 +08:00
Ed Maste	54399caa2f	Correct "Fondation" typo (missing "u")	2021-03-22 13:06:31 -04:00
Ka Ho Ng	74ada297e8	AMD-vi: Fix IOMMU device interrupts being overridden Currently, AMD-vi PCI-e passthrough will lead to the following lines in dmesg: "kernel: CPU0: local APIC error 0x40 ivhd0: Error: completion failed tail:0x720, head:0x0." After some tracing, the problem is due to the interaction with amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the identification of AMD-vi IVHD is done by walking over the ACPI IVRS table and ivhdX device_ts are added under the acpi bus, while there are no driver handling the corresponding IOMMU PCI function. In amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is called on ivhdX. the IOMMU pci function device_t is only used for pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci function, the IOMMU PCI function device_t's dinfo->cfg.msi is never updated to reflect the supposed msi_data and msi_addr. So the msi_data and msi_addr stay in the value 0. When pci_driver_added() tried to loop over the children of a pci bus, and do pci_cfg_restore() on each of them, msi_addr and msi_data with value 0 will be written to the MSI capability of the IOMMU pci function, thus explaining the errors in dmesg. This change includes an amdiommu driver which currently does attaching, detaching and providing DEVMETHODs for setting up and tearing down interrupt. The purpose of the driver is to prevent pci_driver_added() from calling pci_cfg_restore() on the IOMMU PCI function device_t. The introduction of the amdiommu driver handles allocation of an IRQ resource within the IOMMU PCI function, so that the dinfo->cfg.msi is populated. This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU. Sponsored by: The FreeBSD Foundation Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28984	2021-03-22 17:33:43 +08:00
Ka Ho Ng	ede14736fd	ivrs_drv: Fix IVHDs with duplicated BaseAddress Reviewed by: jhb Approved by: philip (mentor) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28945	2021-03-22 17:33:43 +08:00
D Scott Phillips	f8a6ec2d57	bhyve: support relocating fbuf and passthru data BARs We want to allow the UEFI firmware to enumerate and assign addresses to PCI devices so we can boot from NVMe[1]. Address assignment of PCI BARs is properly handled by the PCI emulation code in general, but a few specific cases need additional support. fbuf and passthru map additional objects into the guest physical address space and so need to handle address updates. Here we add a callback to emulated PCI devices to inform them of a BAR configuration change. fbuf and passthru then watch for these BAR changes and relocate the frame buffer memory segment and passthru device mmio area respectively. We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls to vmm(4) to facilitate the unmapping needed for addres updates. [1]: https://github.com/freebsd/uefi-edk2/pull/9/ Originally by: scottph MFC After: 1 week Sponsored by: Intel Corporation Reviewed by: grehan Approved by: philip (mentor) Differential Revision: https://reviews.freebsd.org/D24066	2021-03-19 11:04:36 +08:00
Roger Pau Monné	5ea878684f	bhyve/ioapic: improve the tracking of IRR bit One common method of EOI'ing an interrupt at the IO-APIC level is to switch the pin to edge triggering mode and then back into level mode. That would cause the IRR bit to be cleared and thus further interrupts to be injected. FreeBSD does indeed use that method if the IO-APIC EOI register is not supported. The bhyve IO-APIC emulation code didn't clear the IRR bit when doing that switch, and was also missing acknowledging the IRR state when trying to inject an interrupt in vioapic_send_intr. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28238	2021-02-02 09:47:00 +01:00
Roger Pau Monné	d7d067698a	bhyve/ioapic: only account for asserted line in level mode After modifying a redirection entry only try to inject an interrupt if the pin is in level mode, pins in edge mode shouldn't take into account the line assert status as they are triggered by edge changes, not the line status itself. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28237	2021-02-02 09:45:45 +01:00
Roger Pau Monné	49429cf9be	bhyve/vioapic: remove an extra pin masked check vioapic_send_intr does already check whether the pin is masked before injecting the interrupt, there's no need to do it in vioapic_write also. No functional change intended. Reviewed by: grehan Differential revision: https://reviews.freebsd.org/D28236	2021-02-02 09:44:20 +01:00
Konstantin Belousov	3c48106aaa	bhyve: limit max GPA to VM_MAXUSER_ADDRESS_LA48. We use 4-level EPT pages, correct the upper bound. Reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27402	2020-11-29 10:32:38 +00:00
Peter Grehan	15add60d37	Convert vmm_ops calls to IFUNC There is no need for these to be function pointers since they are never modified post-module load. Rename AMD/Intel ops to be more consistent. Submitted by: adam_fenn.io Reviewed by: markj, grehan Approved by: grehan (bhyve) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D27375	2020-11-28 01:16:59 +00:00
Peter Grehan	ddfc488c36	Remove manual instruction encodings for VMLOAD, VMRUN, and VMSAVE. This is a relic from when these instructions weren't supported by the toolchain. No functional change. Submitted by: adam_fenn.io Reviewed by: grehan Approved by: grehan (bhyve) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27130	2020-11-26 05:58:55 +00:00
John Baldwin	908dca3ef4	Pull the check for VM ownership into ppt_find(). This reduces some code duplication. One behavior change is that ppt_assign_device() will now only succeed if the device is unowned. Previously, a device could be assigned to the same VM multiple times, but each time it was assigned, the device's state was reset. Reviewed by: markj, grehan MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27301	2020-11-24 23:56:33 +00:00
John Baldwin	1925586e03	Honor the disabled setting for MSI-X interrupts for passthrough devices. Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough device and invoke it if a write to the MSI-X capability registers disables MSI-X. This avoids leaving MSI-X interrupts enabled on the host if a guest device driver has disabled them (e.g. as part of detaching a guest device driver). This was found by Chelsio QA when testing that a Linux guest could switch from MSI-X to MSI interrupts when using the cxgb4vf driver. While here, explicitly fail requests to enable MSI on a passthrough device if MSI-X is enabled and vice versa. Reported by: Sony Arpita Das @ Chelsio Reviewed by: grehan, markj MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27212	2020-11-24 23:18:52 +00:00
Mark Johnston	6f5a960678	vmm: Make pmap_invalidate_ept() wait synchronously for guest exits Currently EPT TLB invalidation is done by incrementing a generation counter and issuing an IPI to all CPUs currently running vCPU threads. The VMM inner loop caches the most recently observed generation on each host CPU and invalidates TLB entries before executing the VM if the cached generation number is not the most recent value. pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing guest instructions and reload the generation number. However, it does not actually wait for vCPUs to exit, potentially creating a window where guests may continue to reference stale TLB entries. Fix the problem by bracketing guest execution with an SMR read section which is entered before loading the invalidation generation. Then, pmap_invalidate_ept() increments the current write sequence before loading pm_active and sending IPIs, and polls readers to ensure that all vCPUs potentially operating with stale TLB entries have exited before pmap_invalidate_ept() returns. Also ensure that unsynchronized loads of the generation counter are wrapped with atomic(9), and stop (inconsistently) updating the invalidation counter and pm_active bitmask with acquire semantics. Reviewed by: grehan, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26910	2020-11-11 15:01:17 +00:00
Mark Johnston	8e2cbc5660	vmx: Implement pmap (de)activation in C Rewrite the code that maintains pm_active and invalidates EPTP-tagged TLB entries in C. Previously this work was done in vmx_enter_guest(), in assembly, but there is no good reason for that and it makes the TLB invalidation algorithm for nested page tables harder to review. No functional change intended. Now, an error from the invept instruction results in a kernel panic rather than a vmexit. Such errors should occur only as a result of VMM bugs. Reviewed by: grehan, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26830	2020-10-19 15:24:35 +00:00
Mark Johnston	494955366a	Remove svn:executable from a couple of vmm(4) source files. MFC after: 3 days	2020-10-01 22:20:29 +00:00
John Baldwin	a3f2a9c57e	Clear the upper 32-bits of registers in x86_emulate_cpuid(). Per the Intel manuals, CPUID is supposed to unconditionally zero the upper 32 bits of the involved (rax/rbx/rcx/rdx) registers. Previously, the emulation would cast pointers to the 64-bit register values down to `uint32_t`, which while properly manipulating the lower bits, would leave any garbage in the upper bits uncleared. While no existing guest OSes seem to stumble over this in practice, the bhyve emulation should match x86 expectations. This was discovered through alignment warnings emitted by gcc9, while testing it against SmartOS/bhyve. SmartOS bug: https://smartos.org/bugview/OS-8168 Submitted by: Patrick Mooney Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D24727	2020-10-01 16:45:11 +00:00
Ed Maste	09860d44e4	bhyve: do not permit write access to VMCB / VMCS Reported by: Patrick Mooney Submitted by: jhb Security: CVE-2020-24718	2020-09-15 21:04:27 +00:00
Konstantin Belousov	101d5b527a	bhyve: intercept AMD SVM instructions. Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an SVM instruction. Otherwise, SVM allows execution of them, and instructions operate on host physical addresses despite being executed in guest mode. Reported by: Maxime Villard <max@m00nbsd.net> admbug: 972 CVE: CVE-2020-7467 Reviewed by: grehan, markj Differential revision: https://reviews.freebsd.org/D26313	2020-09-15 20:22:50 +00:00
John Baldwin	385f4a5ac8	Use vmcb_read/write for the vmcb snapshot functions. This avoids some unnecessary layers of indirection.	2020-09-10 22:22:23 +00:00
Mateusz Guzik	543769bf83	amd64: clean up empty lines in .c and .h files	2020-09-01 21:16:54 +00:00
Konstantin Belousov	f3eb12e4a6	Add bhyve support for LA57 guest mode. Noted and reviewed by: grehan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:37:21 +00:00
Konstantin Belousov	9ce875d9b5	amd64 pmap: LA57 AKA 5-level paging Since LA57 was moved to the main SDM document with revision 072, it seems that we should have a support for it, and silicons are coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included into this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accomodate for EPT which has separate level control and in principle might not allow 5-leve EPT despite x86 paging supports it. Anyway, it does not seems critical to have 5-level EPT support now. Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273	2020-08-23 20:19:04 +00:00
Peter Grehan	3a3f1e9dfa	Export a routine to provide the TSC_AUX MSR value and use this in vmm. Also, drop an unnecessary set of braces. Requested by: kib Reviewed by: kib MFC after: 3 weeks	2020-08-18 11:36:38 +00:00
Peter Grehan	f5f5f1e7d6	Support guest rdtscp and rdpid instructions on Intel VT-x Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts that support the "enable RDTSCP" VM-execution control. Submitted by: adam_fenn.io Reported by: chuck Reviewed by: chuck, grehan, jhb Approved by: jhb (bhyve), grehan MFC after: 3 weeks Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D26003	2020-08-18 07:23:47 +00:00
Peter Grehan	46567b4f5e	Allow guest device MMIO access from bootmem memory segments. Recent versions of UEFI have moved local APIC timer initialization into the early SEC phase which runs out of ROM, prior to self-relocating into RAM. This results in a hypervisor exit. Currently bhyve prevents instruction emulation from segments that aren't marked as "sysmem" aka guest RAM, with the vm_gpa_hold() routine failing. However, there is no reason for this restriction: the hypervisor already controls whether EPT mappings are marked as executable. Fix by dropping the redundant check of sysmem. MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D25955	2020-08-18 07:08:17 +00:00
Conrad Meyer	4daa95f85d	bhyve(8): For prototyping, reattempt decode in userspace If userspace has a newer bhyve than the kernel, it may be able to decode and emulate some instructions vmm.ko is unaware of. In this scenario, reset decoder state and try again. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24464	2020-06-25 00:18:42 +00:00
Conrad Meyer	f4ce062964	vmm(4): Add 12 user ABI compat after r349948 Reported by: kp Reviewed by: jhb, kp Tested by: kp Differential Revision: https://reviews.freebsd.org/D24929	2020-05-20 17:27:54 +00:00
Conrad Meyer	8a68ae80f6	vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace for use in, e.g., fallback instruction emulation (when userspace has a newer instruction decode/emulation layer than the kernel vmm(4)). Plumb the ioctl through libvmmapi and register the memory ranges in bhyve(8). Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24525	2020-05-15 15:54:22 +00:00
Peter Grehan	ec048c7550	Hide host CPUID 0x15 TSC/Crystal ratio/freq info from guest In recent Linux (5.3+) and OpenBSD (6.6+) kernels, and with hosts that support CPUID 0x15, the local APIC frequency is determined directly from the reported crystal clock to avoid calibration against the 8254 timer. However, the local APIC frequency implemented by bhyve is 128MHz, where most h/w systems report frequencies around 25MHz. This shows up on OpenBSD guests as repeated keystrokes on the emulated PS2 keyboard when using VNC, since the kernel's timers are now much shorter. Fix by reporting all-zeroes for CPUID 0x15. This allows guests to fall back to using the 8254 to calibrate the local APIC frequency. Future work could be to compute values returned for 0x15 that would match the host TSC and bhyve local APIC frequency, though all dependencies on this would need to be examined (for example, Linux will start using 0x16 for some hosts). PR: 246321 Reported by: Jason Tubnor (and tested) Reviewed by: jhb Approved by: jhb, bz (mentor) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D24837	2020-05-14 22:18:12 +00:00
John Baldwin	483d953a86	Initial support for bhyve save and restore. Save and restore (also known as suspend and resume) permits a snapshot to be taken of a guest's state that can later be resumed. In the current implementation, bhyve(8) creates a UNIX domain socket that is used by bhyvectl(8) to send a request to save a snapshot (and optionally exit after the snapshot has been taken). A snapshot currently consists of two files: the first holds a copy of guest RAM, and the second file holds other guest state such as vCPU register values and device model state. To resume a guest, bhyve(8) must be started with a matching pair of command line arguments to instantiate the same set of device models as well as a pointer to the saved snapshot. While the current implementation is useful for several uses cases, it has a few limitations. The file format for saving the guest state is tied to the ABI of internal bhyve structures and is not self-describing (in that it does not communicate the set of device models present in the system). In addition, the state saved for some device models closely matches the internal data structures which might prove a challenge for compatibility of snapshot files across a range of bhyve versions. The file format also does not currently support versioning of individual chunks of state. As a result, the current file format is not a fixed binary format and future revisions to save and restore will break binary compatiblity of snapshot files. The goal is to move to a more flexible format that adds versioning, etc. and at that point to commit to providing a reasonable level of compatibility. As a result, the current implementation is not enabled by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option for userland builds, and the kernel option BHYVE_SHAPSHOT. Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz Relnotes: yes Sponsored by: University Politehnica of Bucharest Sponsored by: Matthew Grooms (student scholarships) Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D19495	2020-05-05 00:02:04 +00:00
Conrad Meyer	47332982bc	vmm(4): Decode and emulate BEXTR Clang 10 -march=native kernels on znver1 emit BEXTR for APIC reads, apparently. Decode and emulate the instruction. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24463	2020-04-21 21:34:24 +00:00
Conrad Meyer	cfdea69d24	vmm(4): Decode 3-byte VEX-prefixed instructions Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24462	2020-04-21 21:33:06 +00:00
Conrad Meyer	00d3723fb4	vmm(4): Bump VM_MAX_MEMMAPS for vmgenid As a short term solution for the problem reported by Shawn Webb re: r359950, bump the maximum number of memmaps per VM. This structure is 40 bytes, and the additional four (fixed array embedded in the struct vm) members increase the size of struct vm by 3%. (The vast majority of struct vm is the embedded struct vcpu array, which accounts for 84% of the size -- over 4 kB.) Reported by: Shawn Webb <shawn.webb AT hardenedbsd.org> Reviewed by: grehan X-MFC-With: r359950 Differential Revision: https://reviews.freebsd.org/D24507	2020-04-19 23:53:47 +00:00
Conrad Meyer	b645fd4531	vmm(4): Expose instruction decode to userspace build Permit instruction decoding logic to be compiled outside of the kernel for rapid iteration and validation. Reviewed by: grehan Differential Revision: https://reviews.freebsd.org/D24439	2020-04-16 16:50:33 +00:00
Jung-uk Kim	3ee58df503	Merge ACPICA 20200326.	2020-03-27 00:29:33 +00:00
Michael Reifenberger	1bc51bad2b	Untangle TPR shadowing and APIC virtualization. This speeds up Windows guests tremendously. The patch does: Add a new tuneable 'hw.vmm.vmx.use_tpr_shadowing' to disable TLP shadowing. Also add 'hw.vmm.vmx.cap.tpr_shadowing' to be able to query if TPR shadowing is used. Detach the initialization of TPR shadowing from the initialization of APIC virtualization. APIC virtualization still needs TPR shadowing, but not vice versa. Any CPU that supports APIC virtualization should also support TPR shadowing. When TPR shadowing is used, the APIC page of each vCPU is written to the VMCS_VIRTUAL_APIC field of the VMCS so that the CPU can write directly to the page without intercept. On vm exit, vlapic_update_ppr() is called to update the PPR. Submitted by: Yamagi Burmeister MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22942	2020-03-10 16:53:49 +00:00
Pawel Biernacki	b40598c539	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Reviewed by: kib Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D23625 X-Generally looks fine: jhb	2020-02-15 18:57:49 +00:00
Konstantin Belousov	caab504277	vmm: Add Hygon Dhyana support. Submitted by: Pu Wen <puwen@hygon.cn> Discussed with: grehan Reviewed by: jhb (previous version) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23553	2020-02-13 19:03:12 +00:00
Konstantin Belousov	b837dadd87	bhyve: terminate waiting loops if thread suspension is requested. PR: 242724 Reviewed by: markj Reported and tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22881	2020-01-02 22:37:04 +00:00
John Baldwin	cbd03a9df2	Support software breakpoints in the debug server on Intel CPUs. - Allow the userland hypervisor to intercept breakpoint exceptions (BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to enable this feature. These exceptions are reported to userland via a new VM_EXITCODE_BPT that includes the length of the original breakpoint instruction. If userland wishes to pass the exception through to the guest, it must be explicitly re-injected via vm_inject_exception(). - Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH pseudo-register. Injecting a BP# on Intel requires setting this to the length of the breakpoint instruction. AMD SVM currently ignores writes to this register (but reports success) and fails to read it. - Rework the per-vCPU state tracked by the debug server. Rather than a single 'stepping_vcpu' global, add a structure for each vCPU that tracks state about that vCPU ('stepping', 'stepped', and 'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is currently reporting an event. Event handlers for MTRAP and breakpoint exits loop until the associated event is reported to the debugger. Breakpoint events are discarded if the breakpoint is not present when a vCPU resumes in the breakpoint handler to retry submitting the breakpoint event. - Maintain a linked-list of active breakpoints in response to the GDB 'Z0' and 'z0' packets. Reviewed by: markj (earlier version) MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D20309	2019-12-13 19:21:58 +00:00
Anish Gupta	84474332d3	bhyve amd: amdvi_dump_cmds() log the command for which the command completion failed. Completion is checked in poll mode although it can be done using interrupts. No need to log all the commands in command ring but only the last one for which completion failed. Reported by: np@freebsd.org Reviewed by: np, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22566	2019-12-01 04:00:08 +00:00
Konstantin Belousov	a7af4a3e7d	amd64: move GDT into PCPU area. Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22302	2019-11-12 15:51:47 +00:00
Andriy Gapon	869dbab7ba	vmm: remove a wmb() call After removing wmb(), vm_set_rendezvous_func() became super trivial, so there was no point in keeping it. The wmb (sfence on amd64, lock nop on i386) was not needed. This can be explained from several points of view. First, wmb() is used for store-store ordering (although, the primitive is undocumented). There was no obvious subsequent store that needed the barrier. Second, x86 has a memory model with strong ordering including total store order. An explicit store barrier may be needed only when working with special memory (device, special caching mode) or using special instructions (non-temporal stores). That was not the case for this code. Third, I believe that there is a misconception that sfence "flushes" the store buffer in a sense that it speeds up the propagation of stores from the store buffer to the global visibility. I think that such propagation always happens as fast as possible. sfence only makes subsequent stores wait for that propagation to complete. So, sfence is only useful for ordering of stores and only in the situations described above. Reviewed by: jhb MFC after: 23 days Differential Revision: https://reviews.freebsd.org/D21978	2019-10-19 07:10:15 +00:00
Mark Johnston	d3588766e1	Correct the scope of several global variables. They are accessed from multiple compilation units. No functional change intended. MFC after: 1 week Sponsored by: Netflix	2019-09-27 21:04:33 +00:00
Konstantin Belousov	df08823d07	Improve MD page fault handlers. Centralize calculation of signal and ucode delivered on unhandled page fault in new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in consistent MI way. This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations. Change the delivered signal for accesses to mapped area after the backing object was truncated. According to POSIX description for mmap(2): The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition. Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures. For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation. vm_fault_hold() is renamed to vm_fault(). Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware. PR: 211924 Reviewed by: alc Discussed with: jilles, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21566	2019-09-27 18:43:36 +00:00

1 2 3 4 5 ...

538 Commits