freebsd-dev

Author	SHA1	Message	Date
Ed Maste	c75da776c2	Disable amd64 TLB Context ID (pcid) by default for now There are a number of reports of userspace application crashes that are "solved" by setting vm.pmap.pcid_enabled=0, including Java and the x11/mate-terminal port (PR ports/184362). I originally planned to disable this only in stable/10 (in r262753), but it has been pointed out that additional crash reports on HEAD are not likely to provide new insight into the problem. The feature can easily be enabled for testing.	2014-03-05 01:34:10 +00:00
Jung-uk Kim	1d22d877b8	Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c. Inspired by: jhb	2014-03-04 21:35:57 +00:00
Jung-uk Kim	be2d4fcf68	Revert accidentally committed changes in 262748.	2014-03-04 20:16:00 +00:00
Jung-uk Kim	603bc162fb	Properly save and restore CR0. MFC after: 3 days	2014-03-04 20:07:36 +00:00
Jung-uk Kim	05acaa9f85	Remove dead code since r230426, fix a comment, and tidy up. Reported by: jhb MFC after: 3 days	2014-03-04 19:41:16 +00:00
Neel Natu	ef39d7e910	Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu' being updated outside of the vcpu_lock(). The race is benign and could potentially result in a missed notification about a pending interrupt to a vcpu. The interrupt would not be lost but rather delayed until the next VM exit. The vcpu's hostcpu is now updated concurrently with the vcpu state change. When the vcpu transitions to the RUNNING state the hostcpu is set to 'curcpu'. It is set to 'NOCPU' in all other cases. Reviewed by: grehan	2014-03-01 03:17:58 +00:00
John Baldwin	ad3e368726	Correct VMware capitalization. Submitted by: joeld	2014-02-28 21:33:40 +00:00
John Baldwin	722b6744a7	Workaround an apparent bug in VMWare Fusion's nested VT support where it triggers a VM exit with the exit reason of an external interrupt but without a valid interrupt set in the exit interrupt information. Tested by: Michael Dexter Reviewed by: neel MFC after: 1 week	2014-02-28 19:07:55 +00:00
Neel Natu	dc50650607	Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest. The order of event injection into the guest is: - hardware exception - NMI - maskable interrupt In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective. Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR. Discussed with: grehan, jhb Reviewed by: jhb	2014-02-26 00:52:05 +00:00
Alan Cox	f438547338	When the kernel is running in a virtual machine, it cannot rely upon the processor family to determine if the workaround for AMD Family 10h Erratum 383 should be enabled. To enable virtual machine migration among a heterogeneous collection of physical machines, the hypervisor may have been configured to report an older processor family with a reduced feature set. Effectively, the reported processor family and its features are like a "least common denominator" for the collection of machines. Therefore, when the kernel is running in a virtual machine, instead of relying upon the processor family, we now test for features that prove that the underlying processor is not affected by the erratum. (The features that we test for are unlikely to ever be emulated in software on an affected physical processor.) PR: 186061 Tested by: Simon Matter Discussed with: jhb, neel MFC after: 2 weeks	2014-02-22 18:53:42 +00:00
Neel Natu	159dd56f94	Add support for x2APIC virtualization assist in Intel VT-x. The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode is switched to x2APIC. The VT-x implementation of this handler turns off the APIC-access virtualization and enables the x2APIC virtualization in the VMCS. The x2APIC virtualization is done by allowing guest read access to a subset of MSRs in the x2APIC range. In non-root operation the processor will satisfy an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead. The guest is also given write access to TPR, EOI and SELF_IPI MSRs which get special treatment in non-root operation. This is documented in the Intel SDM section titled "Virtualizing MSR-Based APIC Accesses". Enforce that APIC-write and APIC-access VM-exits are handled only if APIC-access virtualization is enabled. The one exception to this is SELF_IPI virtualization which may result in an APIC-write VM-exit.	2014-02-21 06:03:54 +00:00
Neel Natu	52e5c8a2ec	Simplify APIC mode switching from MMIO to x2APIC. In part this is done to simplify the implementation of the x2APIC virtualization assist in VT-x. Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot. Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest. Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode. The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead. Discussed with: grehan@	2014-02-20 01:48:25 +00:00
John Baldwin	a0efd3fb34	A first pass at adding support for injecting hardware exceptions for emulated instructions. - Add helper routines to inject interrupt information for a hardware exception from the VM exit callback routines. - Use the new routines to inject GP and UD exceptions for invalid operations when emulating the xsetbv instruction. - Don't directly manipulate the entry interrupt info when a user event is injected. Instead, store the event info in the vmx state and only apply it during a VM entry if a hardware exception or NMI is not already pending. - While here, use HANDLED/UNHANDLED instead of 1/0 in a couple of routines. Reviewed by: neel	2014-02-18 03:07:36 +00:00
Neel Natu	294d0d88fc	Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured in x2apic mode. Reads to this MSR are currently ignored but should cause a general proctection exception to be injected into the vcpu. All accesses to the corresponding offset in xAPIC mode are ignored. Also, do not panic the host if there is mismatch between the trigger mode programmed in the TMR and the actual interrupt being delivered. Instead the anomaly is logged to aid debugging and to prevent a misbehaving guest from panicking the host.	2014-02-17 23:07:16 +00:00
Neel Natu	9c43cd07ec	Use spinlocks to lock accesses to the vioapic. This is necessary because if the vlapic is configured in x2apic mode the vioapic_process_eoi() function is called inside the critical section established by vm_run().	2014-02-17 22:57:51 +00:00
Dimitry Andric	f785676f2a	Upgrade our copy of llvm/clang to 3.4 release. This version supports all of the features in the current working draft of the upcoming C++ standard, provisionally named C++1y. The code generator's performance is greatly increased, and the loop auto-vectorizer is now enabled at -Os and -O2 in addition to -O3. The PowerPC backend has made several major improvements to code generation quality and compile time, and the X86, SPARC, ARM32, Aarch64 and SystemZ backends have all seen major feature work. Release notes for llvm and clang can be found here: <http://llvm.org/releases/3.4/docs/ReleaseNotes.html> <http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html> MFC after: 1 month	2014-02-16 19:44:07 +00:00
Christian Brueffer	7f47cbd3ce	Retire the nve(4) driver; nfe(4) has been the default driver for NVIDIA nForce MCP adapters for a long time. Yays: jhb, remko, yongari Nays: none on the current and stable lists	2014-02-16 12:22:43 +00:00
Andriy Gapon	f25e50cf0b	provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64 Reviewed by: jhb MFC after: 10 days X-MFC note: consider thirdparty modules depending on these symbols Sponsored by: HybridCluster	2014-02-14 15:18:37 +00:00
John Baldwin	4edef187b8	Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type. - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain. - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device. - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range. - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib). - Define a PCI_RES_BUS resource type for amd64 and i386. Reviewed by: imp MFC after: 1 month	2014-02-12 04:30:37 +00:00
John Baldwin	4f67a8c5e9	Don't waste a page of KVA for the boot-time memory test on x86. For amd64, reuse the first page of the crashdumpmap as CMAP1/CADDR1. For i386, remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test. Reviewed by: alc, peter MFC after: 2 weeks	2014-02-11 22:02:40 +00:00
John Baldwin	abb023fb95	Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and XSAVE-enabled features like AVX. - Store a per-cpu guest xcr0 register. When switching to the guest FPU state, switch to the guest xcr0 value. Note that the guest FPU state is saved and restored using the host's xcr0 value and xcr0 is saved/restored "inside" of saving/restoring the guest FPU state. - Handle VM exits for the xsetbv instruction by updating the guest xcr0. - Expose the XSAVE feature to the guest only if the host has enabled XSAVE, and only advertise XSAVE features enabled by the host to the guest. This ensures that the guest will only adjust FPU state that is a subset of the guest FPU state saved and restored by the host. Reviewed by: grehan	2014-02-08 16:37:54 +00:00
Neel Natu	bf73979dd9	Add a counter to differentiate between VM-exits due to nested paging faults and instruction emulation faults.	2014-02-08 06:22:09 +00:00
Neel Natu	62fbd7c27a	Fix a bug in the handling of VM-exits caused by non-maskable interrupts (NMI). If a VM-exit is caused by an NMI then "blocking by NMI" is in effect on the CPU when the VM-exit is completed. No more NMIs will be recognized until the execution of an "iret". Prior to this change the NMI handler was dispatched via a software interrupt with interrupts enabled. This meant that an interrupt could be recognized by the processor before the NMI handler completed its execution. The "iret" issued by the interrupt handler would then cause the "blocking by NMI" to be cleared prematurely. This is now fixed by handling the NMI with interrupts disabled in addition to "blocking by NMI" already established by the VM-exit.	2014-02-08 05:04:34 +00:00
John Baldwin	00f3efe1bd	Add support for FreeBSD/i386 guests under bhyve. - Similar to the hack for bootinfo32.c in userboot, define _MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot. This allows userboot to load 32-bit kernels and modules. - Copy the SMAP generation code out of bootinfo64.c and into its own file so it can be shared with bootinfo32.c to pass an SMAP to the i386 kernel. - Use uint32_t instead of u_long when aligning module metadata in bootinfo32.c in userboot, as otherwise the metadata used 64-bit alignment which corrupted the layout. - Populate the basemem and extmem members of the bootinfo struct passed to 32-bit kernels. - Fix the 32-bit stack in userboot to start at the top of the stack instead of the bottom so that there is room to grow before the kernel switches to its own stack. - Push a fake return address onto the 32-bit stack in addition to the arguments normally passed to exec() in the loader. This return address is needed to convince recover_bootinfo() in the 32-bit locore code that it is being invoked from a "new" boot block. - Add a routine to libvmmapi to setup a 32-bit flat mode register state including a GDT and TSS that is able to start the i386 kernel and update bhyveload to use it when booting an i386 kernel. - Use the guest register state to determine the CPU's current instruction mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long mode) in the instruction emulation code. Update the gla2gpa() routine used when fetching instructions to handle flat mode, 32-bit paging, and PAE paging in addition to long mode paging. Don't look for a REX prefix when the CPU is in 32-bit mode, and use the detected mode to enable the existing 32-bit mode code when decoding the mod r/m byte. Reviewed by: grehan, neel MFC after: 1 month	2014-02-05 04:39:03 +00:00
Tycho Nightingale	54e03e07b3	Add support for emulating the byte move and zero extend instructions: "mov r/m8, r32" and "mov r/m8, r64". Approved by: neel (co-mentor)	2014-02-05 02:01:08 +00:00
Neel Natu	953c2c47eb	Avoid doing unnecessary nested TLB invalidations. Prior to this change the cached value of 'pm_eptgen' was tracked per-vcpu and per-hostcpu. In the degenerate case where 'N' vcpus were sharing a single hostcpu this could result in 'N - 1' unnecessary TLB invalidations. Since an 'invept' invalidates mappings for all VPIDs the first 'invept' is sufficient. Fix this by moving the 'eptgen[MAXCPU]' array from 'vmxctx' to 'struct vmx'. If it is known that an 'invept' is going to be done before entering the guest then it is safe to skip the 'invvpid'. The stat VPU_INVVPID_SAVED counts the number of 'invvpid' invalidations that were avoided because they were subsumed by an 'invept'. Discussed with: grehan	2014-02-04 02:45:08 +00:00
John Baldwin	3cbf3585cb	Enhance the support for PCI legacy INTx interrupts and enable them in the virtio backends. - Add a new ioctl to export the count of pins on the I/O APIC from vmm to the hypervisor. - Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for ISA interrupts. - Populate the MP Table with I/O interrupt entries for any PCI INTx interrupts. - Create a _PRT table under the PCI root bridge in ACPI to route any PCI INTx interrupts appropriately. - Track which INTx interrupts are in use per-slot so that functions that share a slot attempt to distribute their INTx interrupts across the four available pins. - Implicitly mask INTx interrupts if either MSI or MSI-X is enabled and when the INTx DIS bit is set in a function's PCI command register. Either assert or deassert the associated I/O APIC pin when the state of one of those conditions changes. - Add INTx support to the virtio backends. - Always advertise the MSI capability in the virtio backends. Submitted by: neel (7) Reviewed by: neel MFC after: 2 weeks	2014-01-29 14:56:48 +00:00
John Baldwin	3d2ec11759	Add support for 'clac' and 'stac' to DDB's disassembler on amd64.	2014-01-27 18:53:18 +00:00
Neel Natu	30b94db8c0	Support level triggered interrupts with VT-x virtual interrupt delivery. The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector. If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger an EOI-induced VM-exit when it is doing EOI virtualization. The EOI-induced VM-exit results in the EOI being forwarded to the vioapic so that level triggered interrupts can be properly handled. Tested by: Anish Gupta (akgupt3@gmail.com)	2014-01-25 20:58:05 +00:00
Peter Grehan	062eef4911	Change RWX to XWR in comments to match intent and bit patterns in discussion of valid EPT pte protections. Discussed with: neel MFC after: 3 days	2014-01-25 06:58:41 +00:00
John Baldwin	e07ef9b0f6	Move <machine/apicvar.h> to <x86/apicvar.h>.	2014-01-23 20:10:22 +00:00
Neel Natu	36736912b6	Set "Interrupt Window Exiting" in the case where there is a vector to be injected into the vcpu but the VM-entry interruption information field already has the valid bit set. Pointed out by: David Reed (david.reed@tidalscale.com)	2014-01-23 06:06:50 +00:00
Neel Natu	c308b23b7a	Handle a VM-exit due to a NMI properly by vectoring to the host's NMI handler via a software interrupt. This is safe to do because the logical processor is already cognizant of the NMI and further NMIs are blocked until the host's NMI handler executes "iret".	2014-01-22 04:03:11 +00:00
Neel Natu	51f45d0146	There is no need to initialize the IOMMU if no passthru devices have been configured for bhyve to use. Suggested by: grehan@	2014-01-21 03:01:34 +00:00
Ed Maste	80f9f1580e	Add VT kernel configuration to ease testing of vt(9), aka Newcons	2014-01-19 18:46:38 +00:00
Neel Natu	48b2d828a2	Some processor's don't allow NMI injection if the STI_BLOCKING bit is set in the Guest Interruptibility-state field. However, there isn't any way to figure out which processors have this requirement. So, inject a pending NMI only if NMI_BLOCKING, MOVSS_BLOCKING, STI_BLOCKING are all clear. If any of these bits are set then enable "NMI window exiting" and inject the NMI in the VM-exit handler.	2014-01-18 21:47:12 +00:00
Bryan Venteicher	10c4018057	Add very simple virtio_random(4) driver to harvest entropy from host Reviewed by: markm (random bits only)	2014-01-18 06:14:38 +00:00
Neel Natu	e5a1d95089	If the guest exits due to a fault while it is executing IRET then restore the state of "Virtual NMI blocking" in the guest's interruptibility-state field before resuming the guest.	2014-01-18 02:20:10 +00:00
Neel Natu	160471d264	If a VM-exit happens during an NMI injection then clear the "NMI Blocking" bit in the Guest Interruptibility-state VMCS field. If we fail to do this then a subsequent VM-entry will fail because it is an error to inject an NMI into the guest while "NMI Blocking" is turned on. This is described in "Checks on Guest Non-Register State" in the Intel SDM. Submitted by: David Reed (david.reed@tidalscale.com)	2014-01-17 04:21:39 +00:00
Neel Natu	5b8a8cd1fe	Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous can be initiated in the context of a vcpu thread or from the bhyve(8) control process. The first use of this functionality is to update the vlapic trigger-mode register when the IOAPIC pin configuration is changed. Prior to this change we would update the TMR in the virtual-APIC page at the time of interrupt delivery. But this doesn't work with Posted Interrupts because there is no way to program the EOI_exit_bitmap[] in the VMCS of the target at the time of interrupt delivery. Discussed with: grehan@	2014-01-14 01:55:58 +00:00
Gavin Atkinson	56c63f28ed	Remove spaces from boot messages when we print the CPU ID/Family/Stepping to match the rest of the CPU identification lines, and once again fit into 80 columns in the usual case.	2014-01-11 22:41:10 +00:00
Neel Natu	176666c2c9	Enable "Posted Interrupt Processing" if supported by the CPU. This lets us inject interrupts into the guest without causing a VM-exit. This feature can be disabled by setting the tunable "hw.vmm.vmx.use_apic_pir" to "0". The following sysctls provide information about this feature: - hw.vmm.vmx.posted_interrupts (0 if disabled, 1 if enabled) - hw.vmm.vmx.posted_interrupt_vector (vector number used for vcpu notification) Tested on a Intel Xeon E5-2620v2 courtesy of Allan Jude at ScaleEngine.	2014-01-11 04:22:00 +00:00
Neel Natu	f7d4742540	Enable the "Acknowledge Interrupt on VM exit" VM-exit control. This control is needed to enable "Posted Interrupts" and is present in all the Intel VT-x implementations supported by bhyve so enable it as the default. With this VM-exit control enabled the processor will acknowledge the APIC and store the vector number in the "VM-Exit Interruption Information" field. We now call the interrupt handler "by hand" through the IDT entry associated with the vector.	2014-01-11 03:14:05 +00:00
Neel Natu	add611fd4c	Don't expose 'vmm_ipinum' as a global.	2014-01-09 03:25:54 +00:00
Neel Natu	88c4b8d145	Use the 'Virtual Interrupt Delivery' feature of Intel VT-x if supported by hardware. It is possible to turn this feature off and fall back to software emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0. We now start handling two new types of VM-exits: APIC-access: This is a fault-like VM-exit and is triggered when the APIC register access is not accelerated (e.g. apic timer CCR). In response to this we do emulate the instruction that triggered the APIC-access exit. APIC-write: This is a trap-like VM-exit which does not require any instruction emulation but it does require the hypervisor to emulate the access to the specified register (e.g. icrlo register). Introduce 'vlapic_ops' which are function pointers to vector the various vlapic operations into processor-dependent code. The 'Virtual Interrupt Delivery' feature installs 'ops' for setting the IRR bits in the virtual APIC page and to return whether any interrupts are pending for this vcpu. Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.	2014-01-07 21:04:49 +00:00
Neel Natu	79c596309c	Fix a bug introduced in r260167 related to VM-exit tracing. Keep a copy of the 'rip' and the 'exit_reason' and use that when calling vmx_exit_trace(). This is because both the 'rip' and 'exit_reason' can be changed by 'vmx_exit_process()' and can lead to very misleading traces.	2014-01-07 18:53:14 +00:00
Neel Natu	4d1e82a88e	Allow vlapic_set_intr_ready() to return a value that indicates whether or not the vcpu should be kicked to process a pending interrupt. This will be useful in the implementation of the Posted Interrupt APICv feature. Change the return value of 'vlapic_pending_intr()' to indicate whether or not an interrupt is available to be delivered to the vcpu depending on the value of the PPR. Add KTR tracepoints to debug guest IPI delivery.	2014-01-07 00:38:22 +00:00
Neel Natu	c847a5062c	Split the VMCS setup between 'vmcs_init()' that does initialization and 'vmx_vminit()' that does customization. This makes it easier to turn on optional features (e.g. APICv) without having to keep adding new parameters to 'vmcs_set_defaults()'. Reviewed by: grehan@	2014-01-06 23:16:39 +00:00
Jens Schweikhardt	aa27ed4569	Correct a grammo in a comment; remove white space at EOL.	2014-01-06 17:23:22 +00:00
Neel Natu	5f8e2dfcb5	Use the same label name for ENTRY() and END() macros for 'vmx_enter_guest'. Pointed out by: rmh@	2014-01-03 19:29:33 +00:00

1 2 3 4 5 ...

6609 Commits