freebsd-skq

Author	SHA1	Message	Date
jhb	49cf19e56b	Fix missed posted interrupts in VT-x in bhyve. When a vCPU is HLTed, interrupts with a priority below the processor priority (PPR) should not resume the vCPU while interrupts at or above the PPR should. With posted interrupts, bhyve maintains a bitmap of pending interrupts in PIR descriptor along with a single 'pending' bit. This bit is checked by a CPU running in guest mode at various places to determine if it should be checked. In addition, another CPU can force a CPU in guest mode to check for pending interrupts by sending an IPI to a special IDT vector reserved for this purpose. bhyve had a bug in that it would only notify a guest vCPU of an interrupt (e.g. by sending the special IPI or by resuming it if it was idle due to HLT) if an interrupt arrived that was higher priority than PPR and no interrupts were currently pending. This assumed that if the 'pending' bit was set, any needed notification was already in progress. However, if the first interrupt sent to a HLTed vCPU was lower priority than PPR and the second was higher than PPR, the first interrupt would set 'pending' but not notify the vCPU, and the second interrupt would not notify the vCPU because 'pending' was already set. To fix this, track the priority of pending interrupts in a separate per-vCPU bitmask and notify a vCPU anytime an interrupt arrives that is above PPR and higher than any previously-received interrupt. This was found and debugged in the bhyve port to SmartOS maintained by Joyent. Relevant SmartOS bugs with more background: https://smartos.org/bugview/OS-6829 https://smartos.org/bugview/OS-6930 https://smartos.org/bugview/OS-7354 Submitted by: Patrick Mooney <pmooney@pfmooney.com> Reviewed by: tychon, rgrimes Obtained from: SmartOS / Joyent MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19299	2019-03-01 20:43:48 +00:00
cem	e87da0fa26	vmm(4): Mask Spectre feature bits on AMD hosts For parity with Intel hosts, which already mask out the CPUID feature bits that indicate the presence of the SPEC_CTRL MSR, do the same on AMD. Eventually we may want to have a better support story for guests, but for now, limit the damage of incorrectly indicating an MSR we do not yet support. Eventually, we may want a generic CPUID override system for administrators, or for minimum supported feature set in heterogenous environments with failover. That is a much larger scope effort than this bug fix. PR: 235010 Reported by: Rys Sommefeldt <rys AT sommefeldt.com> Sponsored by: Dell EMC Isilon	2019-01-18 23:54:51 +00:00
kib	0dd0b09e39	Trim whitespace at EoL, use tabs instead of spaces for indent. PR: 235004 Submitted by: Jose Luis Duran <jlduran@gmail.com> MFC after: 3 days	2019-01-17 05:15:25 +00:00
cem	8a1e4c2437	vmm(4): Take steps towards multicore bhyve AMD support vmm's CPUID emulation presented Intel topology information to the guest, but disabled AMD topology information and in some cases passed through garbage. I.e., CPUID leaves 0x8000_001[de] were passed through to the guest, but guest CPUs can migrate between host threads, so the information presented was not consistent. This could easily be observed with 'cpucontrol -i 0xfoo /dev/cpuctl0'. Slightly improve this situation by enabling the AMD topology feature flag and presenting at least the CPUID fields used by FreeBSD itself to probe topology on more modern AMD64 hardware (Family 15h+). Older stuff is probably less interesting. I have not been able to empirically confirm it is sufficient, but it should not regress anything either. Reviewed by: araujo (previous version) Relnotes: sure	2019-01-16 02:19:04 +00:00
kib	3cd0168095	Align IA32_ARCH_CAP MSR definitions and use with SDM rev. 068. SDM rev. 068 was released yesterday and it contains the description of the MSR 0x10a IA32_ARCH_CAP. This change adds symbolic definitions for all bits present in the document, and decode them in the CPU identification lines printed on boot. But also, the document defines SSB_NO as bit 4, while FreeBSD used but 2 to detect the need to work-around Speculative Store Bypass issue. Change code to use the bit from SDM. Similarly, the document describes bit 3 as an indicator that L1TF issue is not present, in particular, no L1D flush is needed on VMENTRY. We used RDCL_NO to avoid flushing, and again I changed the code to follow new spec from SDM. In fact my Apollo Lake machine with latest ucode shows this: IA32_ARCH_CAPS=0x19<RDCL_NO,SKIP_L1DFL_VME,SSB_NO> Reviewed by: bwidawsk Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D18006	2018-11-16 21:27:11 +00:00
araujo	4ba03c8057	Merge cases with upper block. This is a cosmetic change only to simplify code. Reported by: anish Sponsored by: iXsystems Inc.	2018-10-31 01:27:44 +00:00
araujo	12d6a6c7c6	Emulate machine check related MSR_EXTFEATURES to allow guest OSes to boot on AMD FX Series. PR: 224476 Submitted by: Keita Uchida <m@jgz.jp> Reviewed by: rgrimes Sponsored by: iXsystems Inc. Differential Revision: https://reviews.freebsd.org/D17713	2018-10-30 10:02:23 +00:00
yuripv	c7e9d7202a	Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3"). Add the document to SEE ALSO in bhyve.8 (and pet manlint here a bit). Reviewed by: jhb, rgrimes, 0mp Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D17531	2018-10-27 21:24:28 +00:00
jhb	b13c842313	Reload the LDT selector after an AMD-v #VMEXIT. cpu_switch() always reloads the LDT, so this can only affect the hypervisor process itself. Fix this by explicitly reloading the host LDT selector after each #VMEXIT. The stock bhyve process on FreeBSD never uses a custom LDT, so this change is cosmetic. Reviewed by: kib Tested by: Mike Tancsa <mike@sentex.net> Approved by: re (gjb) MFC after: 2 weeks	2018-10-15 18:12:25 +00:00
kib	304ffe7ed0	bhyve: emulate CLFLUSH and CLFLUSHOPT. Apparently CLFLUSH on mmio can cause VM exit, as reported in the PR. I do not see that anything useful can be done except emulating page faults on invalid addresses. Due to the instruction encoding pecularity, also emulate SFENCE. PR: 232081 Reported by: phk Reviewed by: araujo, avg, jhb (all: previous version) Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17482	2018-10-12 15:30:15 +00:00
jhb	f410335413	Fully restore the GDTR, IDTR, and LDTR after VT-x VM exits. The VT-x VMCS only stores the base address of the GDTR and IDTR. As a result, VM exits use a fixed limit of 0xffff for the host GDTR and IDTR losing the smaller limits set in when the initial GDT is loaded on each CPU during boot. Explicitly save and restore the full GDTR and IDTR contents around VM entries and exits to restore the correct limit. Similarly, explicitly save and restore the LDT selector. VM exits always clear the host LDTR as if the LDT was loaded with a NULL selector and a userspace hypervisor is probably using a NULL selector anyway, but save and restore the LDT explicitly just to be safe. PR: 230773 Reported by: John Levon <levon@movementarian.org> Reviewed by: kib Tested by: araujo Approved by: re (rgrimes) MFC after: 1 week	2018-10-11 18:27:19 +00:00
andrew	f81459bd83	Handle a guest executing a vm instruction by trapping and raising an undefined instruction exception. Previously we would exit the guest, however an unprivileged user could execute these. Found with: syzkaller Reviewed by: araujo, tychon (previous version) Approved by: re (kib) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17192	2018-09-27 11:16:19 +00:00
kib	04480b8572	Update L1TF workaround to sustain L1D pollution from NMI. Current mitigation for L1TF in bhyve flushes L1D either by an explicit WRMSR command, or by software reading enough uninteresting data to fully populate all lines of L1D. If NMI occurs after either of methods is completed, but before VM entry, L1D becomes polluted with the cache lines touched by NMI handlers. There is no interesting data which NMI accesses, but something sensitive might be co-located on the same cache line, and then L1TF exposes that to a rogue guest. Use VM entry MSR load list to ensure atomicity of L1D cache and VM entry if updated microcode was loaded. If only software flush method is available, try to help the bhyve sw flusher by also flushing L1D on NMI exit to kernel mode. Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16790	2018-08-19 18:47:16 +00:00
kib	77ff24342c	Provide part of the mitigation for L1TF-VMM. On the guest entry in bhyve, flush L1 data cache, using either L1D flush command MSR if available, or by reading enough uninteresting data to fill whole cache. Flush is automatically enabled on CPUs which do not report RDCL_NO, and can be disabled with the hw.vmm.l1d_flush tunable/kenv. Security: CVE-2018-3646 Reviewed by: emaste. jhb, Tony Luck <tony.luck@intel.com> Sponsored by: The FreeBSD Foundation	2018-08-14 17:29:41 +00:00
araujo	15d1271a22	- Add the ability to run bhyve(8) within a jail(8). This patch adds a new sysctl(8) knob "security.jail.vmm_allowed", by default this option is disable. Submitted by: Shawn Webb <shawn.webb____hardenedbsd.org> Reviewed by: jamie@ and myself. Relnotes: Yes. Sponsored by: HardenedBSD and G2, Inc. Differential Revision: https://reviews.freebsd.org/D16057	2018-08-01 00:39:21 +00:00
araujo	7e47182377	Add SPDX tags to vmm(4). MFC after: 4 weeks. Sponsored by: iXsystems Inc.	2018-06-13 07:02:58 +00:00
jhb	31acfe0f07	Cleanups related to debug exceptions on x86. - Add constants for fields in DR6 and the reserved fields in DR7. Use these constants instead of magic numbers in most places that use DR6 and DR7. - Refer to T_TRCTRAP as "debug exception" rather than a "trace trap" as it is not just for trace exceptions. - Always read DR6 for debug exceptions and only clear TF in the flags register for user exceptions where DR6.BS is set. - Clear DR6 before returning from a debug exception handler as recommended by the SDM dating all the way back to the 386. This allows debuggers to determine the cause of each exception. For kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value to other parts of the handler (namely, user_dbreg_trap()). For user traps, wait until after trapsignal to clear DR6 so that userland debuggers can read DR6 via PT_GETDBREGS while the thread is stopped in trapsignal(). Reviewed by: kib, rgrimes MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D15189	2018-05-22 00:45:00 +00:00
antoine	8156c46154	vmmdev: return EFAULT when trying to read beyond VM system memory max address Currently, when using dd(1) to take a VM memory image, the capture never ends, reading zeroes when it's beyond VM system memory max address. Return EFAULT when trying to read beyond VM system memory max address. Reviewed by: imp, grehan, anish Approved by: grehan Differential Revision: https://reviews.freebsd.org/D15156	2018-05-15 17:20:58 +00:00
grehan	d0b707de44	Use PCI power-mgmt to reset a device if FLR fails. A large number of devices don't support PCIe FLR, in particular graphics adapters. Use PCI power management to perform the reset if FLR fails or isn't available, by cycling the device through the D3 state. This has been tested by a number of users with Nvidia and AMD GPUs. Submitted and tested by: Matt Macy Reviewed by: jhb, imp, rgrimes MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D15268	2018-05-02 17:41:00 +00:00
kib	dd08c073d0	Correct undesirable interaction between caching of %cr4 in bhyve and invltlb_glob(). Reviewed by: grehan, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D15138	2018-04-24 13:44:19 +00:00
tychon	d53228c4ee	Add SDT probes to vmexit on Intel. Submitted by: domagoj.stolfa_gmail.com Reviewed by: grehan, tychon Sponsored by: DARPA/AFRL Differential Revision: https://reviews.freebsd.org/D14656	2018-04-13 17:23:05 +00:00
rgrimes	f4d1671ab3	Add the ability to control the CPU topology of created VMs from userland without the need to use sysctls, it allows the old sysctls to continue to function, but deprecates them at FreeBSD_version 1200060 (Relnotes for deprecate). The command line of bhyve is maintained in a backwards compatible way. The API of libvmmapi is maintained in a backwards compatible way. The sysctl's are maintained in a backwards compatible way. Added command option looks like: bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n] The optional parts can be specified in any order, but only a single integer invokes the backwards compatible parse. [,maxcpus=n] is hidden by #ifdef until kernel support is added, though the api is put in place. bhyvectl --get-cpu-topology option added. Reviewed by: grehan (maintainer, earlier version), Reviewed by: bcr (manpages) Approved by: bde (mentor), phk (mentor) Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd) MFC after: 1 week Relnotes: Y Differential Revision: https://reviews.freebsd.org/D9930	2018-04-08 19:24:49 +00:00
jhb	a8d4adc969	Add a way to temporarily suspend and resume virtual CPUs. This is used as part of implementing run control in bhyve's debug server. The hypervisor now maintains a set of "debugged" CPUs. Attempting to run a debugged CPU will fail to execute any guest instructions and will instead report a VM_EXITCODE_DEBUG exit to the userland hypervisor. Virtual CPUs are placed into the debugged state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl). Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl). The debug server suspends virtual CPUs when it wishes them to stop executing in the guest (for example, when a debugger attaches to the server). The debug server can choose to resume only a subset of CPUs (for example, when single stepping) or it can choose to resume all CPUs. The debug server must explicitly mark a CPU as resumed via vm_resume_cpu() before the virtual CPU will successfully execute any guest instructions. Reviewed by: avg, grehan Tested on: Intel (jhb), AMD (avg) Differential Revision: https://reviews.freebsd.org/D14466	2018-04-06 22:03:43 +00:00
tychon	6cbf64d8bf	Fix a lock recursion introduced in r327065. Reported by: kmacy Reviewed by: grehan, jhb Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14548	2018-03-07 18:03:22 +00:00
anish	919edd6e24	Move the new AMD-Vi IVHD [ACPI_IVRS_HARDWARE_NEW]definitions added in r329360 in contrib ACPI to local files till ACPI code adds new definitions reported by jkim. Rename ACPI_IVRS_HARDWARE_NEW to ACPI_IVRS_HARDWARE_EFRSUP, since new definitions add Extended Feature Register support. Use IvrsType to distinguish three types of IVHD - 0x10(legacy), 0x11 and 0x40(with EFR). IVHD 0x40 is also called mixed type since it supports HID device entries. Fix 2 coverity bugs reported by cem. Reported by:jkim, cem Approved by:grehan Differential Revision://reviews.freebsd.org/D14501	2018-03-05 02:28:25 +00:00
jhb	3e51017144	Add a new variant of the GLA2GPA ioctl for use by the debug server. Unlike the existing GLA2GPA ioctl, GLA2GPA_NOFAULT does not modify the guest. In particular, it does not inject any faults or modify PTEs in the guest when performing an address space translation. This is used by bhyve's debug server to read and write memory for the remote debugger. Reviewed by: grehan MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D14075	2018-02-26 19:19:05 +00:00
jhb	4c5555dd19	Add two new ioctls to bhyve for batch register fetch/store operations. These are a convenience for bhyve's debug server to use a single ioctl for 'g' and 'G' rather than a loop of individual get/set ioctl requests. Reviewed by: grehan MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D14074	2018-02-22 00:39:25 +00:00
avg	f05cb3b6cf	move vintr_intercept_enabled under INVARIANTS The function is not used outside of INVARIANTS since r328622. MFC after: 1 week	2018-02-16 07:02:14 +00:00
anish	046b55b4d5	This change fixes duplicate detection of same IOMMU/AMD-Vi device for Ryzen with EFR support. IVRS can have entry of type legacy and non-legacy present at same time for same AMD-Vi device. ivhd driver will ignore legacy if new IVHD type is present as specified in AMD-Vi specification. Earlier both of IVHD entries used and two ivhd devices were created. Add support for new IVHD type 0x11 and 0x40 in ACPI. Create new struct of type acpi_ivrs_hardware_new for these new type of IVHDs. Legacy type 0x10 will continue to use acpi_ivrs_hardware. Reviewed by: avg Approved by: grehan Differential Revision:https://reviews.freebsd.org/D13160	2018-02-16 05:17:00 +00:00
tychon	e942535b5d	Provide further mitigation against CVE-2017-5715 by flushing the return stack buffer (RSB) upon returning from the guest. This was inspired by this linux commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=117cc7a908c83697b0b737d15ae1eb5943afe35b Reviewed by: grehan Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14272	2018-02-12 14:45:27 +00:00
avg	bedef9326b	vmm/svm: post LAPIC interrupts using event injection, not virtual interrupts The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR fields of VMCB to inject a virtual interrupt into a guest VM. This method has many advantages over the direct event injection as it offloads all decisions of whether and when the interrupt can be delivered to the guest. But with a purely software emulated vAPIC the advantage is also a problem. The problem is that the hypervisor does not have any precise control over when the interrupt is actually delivered to the guest (or a notification about that). Because of that the hypervisor cannot update the interrupt vector in IRR and ISR in the same way as real hardware would. The hypervisor becomes aware that the interrupt is being serviced only upon the first VMEXIT after the interrupt is delivered. This creates a window between the actual interrupt delivery and the update of IRR and ISR. That means that IRR and ISR might not be correctly set up to the point of the end-of-interrupt signal. The described deviation has been observed to cause an interrupt loss in the following scenario. vCPU0 posts an inter-processor interrupt to vCPU1. The interrupt is injected as a virtual interrupt by the hypervisor. The interrupt is delivered to a guest and an interrupt handler is invoked. The handler performs a requested action and acknowledges the request by modifying a global variable. So far, there is no VMEXIT and the hypervisor is unaware of the events. Then, vCPU0 notices the acknowledgment and sends another IPI with the same vector. The IPI gets collapsed into the previous IPI in the IRR of vCPU1. Only after that a VMEXIT of vCPU1 occurs. At that time the vector is cleared in the IRR and is set in the ISR. vCPU1 has vAPIC state as if the second IPI has never been sent. The scenario is impossible on the real hardware because IRR and ISR are updated just before the interrupt handler gets started. I saw several possibilities of fixing the problem. One is to intercept the virtual interrupt delivery to update IRR and ISR at the right moment. The other is to deliver the LAPIC interrupts using the event injection, same as legacy interrupts. I opted to use the latter approach for several reasons. It's equivalent to what VMM/Intel does (in !VMX case). It appears to be what VirtualBox and KVM do. The code is already there (to support legacy interrupts). Another possibility was to use a special intermediate state for a vector after it is injected using a virtual interrupt and before it is known whether it was accepted or is still pending. That approach was implemented in https://reviews.freebsd.org/D13828 That method is more complex and does not have any clear advantage. Please see sections 15.20 and 15.21.4 of "AMD64 Architecture Programmer's Manual Volume 2: System Programming" (publication 24593, revision 3.29) for comparison between event injection and virtual interrupt injection. PR: 215972 Reported by: ajschot@hotmail.com, grehan Tested by: anish, grehan, Nils Beyer <nbe@renzel.net> Reviewed by: anish, grehan MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13780	2018-01-31 11:14:26 +00:00
jhb	391a83c86b	Save and restore guest debug registers. Currently most of the debug registers are not saved and restored during VM transitions allowing guest and host debug register values to leak into the opposite context. One result is that hardware watchpoints do not work reliably within a guest under VT-x. Due to differences in SVM and VT-x, slightly different approaches are used. For VT-x: - Enable debug register save/restore for VM entry/exit in the VMCS for DR7 and MSR_DEBUGCTL. - Explicitly save DR0-3,6 of the guest. - Explicitly save DR0-3,6-7, MSR_DEBUGCTL, and the trap flag from %rflags for the host. Note that because DR6 is "software" managed and not stored in the VMCS a kernel debugger which single steps through VM entry could corrupt the guest DR6 (since a single step trap taken after loading the guest DR6 could alter the DR6 register). To avoid this, explicitly disable single-stepping via the trace flag before loading the guest DR6. A determined debugger could still defeat this by setting a breakpoint after the guest DR6 was loaded and then single-stepping. For SVM: - Enable debug register caching in the VMCB for DR6/DR7. - Explicitly save DR0-3 of the guest. - Explicitly save DR0-3,6-7, and MSR_DEBUGCTL for the host. Since SVM saves the guest DR6 in the VMCB, the race with single-stepping described for VT-x does not exist. For both platforms, expose all of the guest DRx values via --get-drX and --set-drX flags to bhyvectl. Discussed with: avg, grehan Tested by: avg (SVM), myself (VT-x) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D13229	2018-01-17 23:11:25 +00:00
kib	c35d24e497	PTI for amd64. The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1. The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped: kernel text (but not modules text) PCPU GDT/IDT/user LDT/task structures IST stacks for NMI and doublefault handlers. Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables. then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed. On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occured on the kstack, see the comment in doret_iret detection code in trap(). Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity. PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over the all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386. The need to flush hidden kernel translations on the switch to user mode make global tables (PG_G) meaningless and even harming, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit. MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline did not switched from PTI stack is fatal. The fix is pending. Reviewed by: markj (partially) Tested by: pho (previous version) Discussed with: jeff, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2018-01-17 11:44:21 +00:00
tychon	03cbc447ee	Provide some mitigation against CVE-2017-5715 by clearing registers upon returning from the guest which aren't immediately clobbered by the host. This eradicates any remaining guest contents limiting their usefulness in an exploit gadget. This was inspired by this linux commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b6c02f38315b720c593c6079364855d276886aa Reviewed by: grehan, rgrimes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13573	2018-01-15 18:37:03 +00:00
avg	67ddd57974	vmm/svm: contigmalloc of the whole svm_softc is excessive This is a followup to r307903. struct svm_softc takes more than 200 kilobytes while what we really need is 3 contiguous pages for I/O permission map and 2 contiguous pages for MSR permission map. Other physically mapped structures have a size of a single page, so a proper alignment is sufficient for their correct mapping. Thus, only the permission maps are allocated with contigmalloc now, the softc is allocated with a regular malloc. Additionally, this commit adds a check that malloc returns memory with the expected page alignment and that contigmalloc does not fail. Unfortunately, at present svm_vminit() is expected to always succeed and there is no way to report an error. So, a contigmalloc failure leads to a panic. We should probably fix this. MFC after: 2 weeks	2018-01-09 14:22:18 +00:00
avg	85a273200a	Fix a couple of comments in AMD Virtual Machine Control Block structure MFC after: 1 week	2018-01-05 19:15:24 +00:00
tychon	02cc877968	Recognize a pending virtual interrupt while emulating the halt instruction. Reviewed by: grehan, rgrimes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D13573	2017-12-21 18:30:11 +00:00
avg	209d01f38b	amd-vi: set iommu msi configuration using pci_enable_msi method This is better than directly changing PCI configuration space of the device because it makes the PCI bus aware of the configuration. Also, the change allows to drop a bunch of code that duplicated pci_enable_msi() functionality. I wonder if it's possible to further simplify the code by using pci_alloc_msi().	2017-12-04 17:10:52 +00:00
avg	057e143faa	vmm/amd: add ivhd device with a higher order ivhd should attach after the root PCI bus and, thus, after the ACPI Host-PCI bridge off which the bus hangs. This is because ivhd changes PCI configuration of a PCI IOMMU device that is located on the root bus. If the bus attaches after ivhd it clears the MSI portion of the configuration. As a result IOMMU event interrupts would never be delivered. For regular ACPI devices the order is calculated as ACPI_DEV_BASE_ORDER + level * 10 where level is a depth of the device in the ACPI namespace. I expect the depth of the Host-PCI bridge to be two or three, so ACPI_DEV_BASE_ORDER + 10 * 10 should be a sufficiently safe order for ivhd. This should fix the setup of the AMD-Vi event interrupt when vmm is preloaded (as opposed to kldload-ed).	2017-12-04 17:08:03 +00:00
avg	a4b48b00e7	amd-vi: clear event interrupt and overflow bits upon handling the interrupt This ensures that we can receive further event interrupts. See the description of the bits in the specification for MMIO Offset 2020h IOMMU Status Register. The bits are defined as set-by-hardware write-1-to-clear, same as all the bits in the status register. Discussed with: anish	2017-12-04 17:02:53 +00:00
pfg	d754734f5c	sys/amd64: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:03:07 +00:00
avg	993cc07930	amd-vi: a small whitespace cleanup Reviewed by: anish	2017-11-24 11:37:41 +00:00
avg	d56e6db67b	amd-vi: use correct type for pci_rid, start_dev_rid, end_dev_rid sysctls Previously, the values could look confusing because of unrelated bits from adjacent memory. Reviewed by: anish	2017-11-24 11:36:35 +00:00
avg	bc48907278	amd-vi: small improvements to event printing Ensure that an opening bracket always has a matching closing one. Ensure that there is always a new-line at the end of a report line. Also, add a space before the printed event flag. Reviewed by: anish	2017-11-24 11:35:43 +00:00
avg	17a7ee6279	amd-vi: print some additional details for INVALID_DEVICE_REQUEST event Namely, the type of the hardware event and whether the transaction was a translation request. Reviewed by: anish	2017-11-24 11:34:46 +00:00
avg	88db7a2c1c	amd-vi: fix up r326152, the new width requires a wider type This is my brain-o from extending the width at the last moment.	2017-11-24 11:25:06 +00:00
avg	3cf07fb3d3	amd-vi: fix and extend definition of Command and Event Status Register (0x2020) The defined bits are the lower bits, not the higher ones. Also, the specification has been extended to define bits 0:18 and they all could potentially be interesting to us, so extend the width of the field accordingly. Reviewed by: anish	2017-11-24 11:20:10 +00:00
avg	58c07c5a95	vmm/amd: improve iteration over IVHD (type 10h) entries in IVRS table Many 8-byte entries have zero at byte 4, so the second 4-byte part is skipped as a 4-byte padding entry. But not all 8-byte entries have that property and they get misinterpreted. A real example: 48 00 00 00 ff 01 00 01 This an 8-byte ACPI_IVRS_TYPE_SPECIAL entry for IOAPIC with ID 255 (bogus). It is reported as: ivhd0: Unknown dev entry:0xff Fortunately, it was completely harmless. Also, bail out early if we encounter an entry of a variable length type. We do not have proper handling for those yet. Reviewed by: anish	2017-11-24 11:10:36 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
grehan	02db9e4f6d	Emulate the "OR reg, r/m" instruction (opcode 0BH). This is needed for the HDA emulation with FreeBSD guests. Reviewed by: marcelo MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D12832	2017-11-01 03:26:53 +00:00

1 2 3 4 5 ...

466 Commits