Commit Graph

50 Commits

Author SHA1 Message Date
Neel Natu
970388bf8d IFC @r272185 2014-09-27 22:15:50 +00:00
Neel Natu
9d8d8e3ee7 Add some more KTR events to help debugging. 2014-09-20 05:13:03 +00:00
Neel Natu
79ad53fba3 Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the
processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector
and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes
care of injecting this vector when the guest is able to receive it.

Legacy PIC interrupts are still delivered via the event injection mechanism.
This is because the vector injected by the PIC must reflect the state of its
pins at the time the CPU is ready to accept the interrupt.

Accesses to the TPR via %CR8 are handled entirely in hardware. This requires
that the emulated TPR must be synced to V_TPR after a #VMEXIT.

The guest can also modify the TPR via the memory mapped APIC. This requires
that the V_TPR must be synced with the emulated TPR before a VMRUN.

Reviewed by:	Anish Gupta (akgupt3@gmail.com)
2014-09-16 03:31:40 +00:00
Neel Natu
051f2bd19d Add reserved bit checking when doing %CR8 emulation and inject #GP if required.
Pointed out by:	grehan
Reviewed by:	tychon
2014-06-09 20:51:08 +00:00
Tycho Nightingale
594db0024e Support guest accesses to %cr8.
Reviewed by:	neel
2014-06-06 18:23:49 +00:00
Neel Natu
95ebc360ef Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
Neel Natu
c5d216b786 Change the vlapic timer frequency to be in the ballpark of contemporary
hardware. This also decouples the vlapic emulation from the host's TSC
frequency.

Requested by:	grehan@
2014-04-23 16:50:40 +00:00
Tycho Nightingale
0775fbb475 Fix a race wherein the source of an interrupt vector is wrongly
attributed if an ExtINT arrives during interrupt injection.

Also, fix a spurious interrupt if the PIC tries to raise an interrupt
before the outstanding one is accepted.

Finally, improve the PIC interrupt latency when another interrupt is
raised immediately after the outstanding one is accepted by creating a
vmexit rather than waiting for one to occur by happenstance.

Approved by:	neel (co-mentor)
2014-03-15 23:09:34 +00:00
Tycho Nightingale
1ed19b835a Don't try to return a vector to a caller that only cares if a vector
is pending or not.

Approved by:	neel (co-mentor)
2014-03-11 22:12:12 +00:00
Tycho Nightingale
762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Neel Natu
159dd56f94 Add support for x2APIC virtualization assist in Intel VT-x.
The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode
is switched to x2APIC. The VT-x implementation of this handler turns off the
APIC-access virtualization and enables the x2APIC virtualization in the VMCS.

The x2APIC virtualization is done by allowing guest read access to a subset
of MSRs in the x2APIC range. In non-root operation the processor will satisfy
an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.

The guest is also given write access to TPR, EOI and SELF_IPI MSRs which
get special treatment in non-root operation. This is documented in the
Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".

Enforce that APIC-write and APIC-access VM-exits are handled only if
APIC-access virtualization is enabled. The one exception to this is
SELF_IPI virtualization which may result in an APIC-write VM-exit.
2014-02-21 06:03:54 +00:00
Neel Natu
52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
Neel Natu
294d0d88fc Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured
in x2apic mode. Reads to this MSR are currently ignored but should cause a
general proctection exception to be injected into the vcpu.

All accesses to the corresponding offset in xAPIC mode are ignored.

Also, do not panic the host if there is mismatch between the trigger mode
programmed in the TMR and the actual interrupt being delivered. Instead the
anomaly is logged to aid debugging and to prevent a misbehaving guest from
panicking the host.
2014-02-17 23:07:16 +00:00
Neel Natu
30b94db8c0 Support level triggered interrupts with VT-x virtual interrupt delivery.
The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector.
If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger
an EOI-induced VM-exit when it is doing EOI virtualization.

The EOI-induced VM-exit results in the EOI being forwarded to the vioapic
so that level triggered interrupts can be properly handled.

Tested by:	Anish Gupta (akgupt3@gmail.com)
2014-01-25 20:58:05 +00:00
Neel Natu
5b8a8cd1fe Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous
can be initiated in the context of a vcpu thread or from the bhyve(8) control
process.

The first use of this functionality is to update the vlapic trigger-mode
register when the IOAPIC pin configuration is changed.

Prior to this change we would update the TMR in the virtual-APIC page at
the time of interrupt delivery. But this doesn't work with Posted Interrupts
because there is no way to program the EOI_exit_bitmap[] in the VMCS of
the target at the time of interrupt delivery.

Discussed with:	grehan@
2014-01-14 01:55:58 +00:00
Neel Natu
add611fd4c Don't expose 'vmm_ipinum' as a global. 2014-01-09 03:25:54 +00:00
Neel Natu
88c4b8d145 Use the 'Virtual Interrupt Delivery' feature of Intel VT-x if supported by
hardware. It is possible to turn this feature off and fall back to software
emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0.

We now start handling two new types of VM-exits:

APIC-access: This is a fault-like VM-exit and is triggered when the APIC
register access is not accelerated (e.g. apic timer CCR). In response to
this we do emulate the instruction that triggered the APIC-access exit.

APIC-write: This is a trap-like VM-exit which does not require any instruction
emulation but it does require the hypervisor to emulate the access to the
specified register (e.g. icrlo register).

Introduce 'vlapic_ops' which are function pointers to vector the various
vlapic operations into processor-dependent code. The 'Virtual Interrupt
Delivery' feature installs 'ops' for setting the IRR bits in the virtual
APIC page and to return whether any interrupts are pending for this vcpu.

Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.
2014-01-07 21:04:49 +00:00
Neel Natu
4d1e82a88e Allow vlapic_set_intr_ready() to return a value that indicates whether or not
the vcpu should be kicked to process a pending interrupt. This will be useful
in the implementation of the Posted Interrupt APICv feature.

Change the return value of 'vlapic_pending_intr()' to indicate whether or not
an interrupt is available to be delivered to the vcpu depending on the value
of the PPR.

Add KTR tracepoints to debug guest IPI delivery.
2014-01-07 00:38:22 +00:00
Neel Natu
7c05bc3124 Modify handling of writes to the vlapic LVT registers.
The handler is now called after the register value is updated in the virtual
APIC page. This will make it easier to handle APIC-write VM-exits with APIC
register virtualization turned on.

This also implies that we need to keep a snapshot of the last value written
to a LVT register. We can no longer rely on the LVT registers in the APIC
page to be "clean" because the guest can write anything to it before the
hypervisor has had a chance to sanitize it.
2013-12-28 00:20:55 +00:00
Neel Natu
fafe884473 Modify handling of writes to the vlapic ICR_TIMER, DCR_TIMER, ICRLO and ESR
registers.

The handler is now called after the register value is updated in the virtual
APIC page. This will make it easier to handle APIC-write VM-exits with APIC
register virtualization turned on.

We can no longer rely on the value of 'icr_timer' on the APIC page
in the callout handler. With APIC register virtualization the value of
'icr_timer' will be updated by the processor in guest-context before an
APIC-write VM-exit.

Clear the 'delivery status' bit in the ICRLO register in the write handler.
With APIC register virtualization the write happens in guest-context and
we cannot prevent a (buggy) guest from setting this bit.
2013-12-27 20:18:19 +00:00
Neel Natu
2c52dcd9a8 Modify handling of write to the vlapic SVR register.
The handler is now called after the register value is updated in the virtual
APIC page. This will make it easier to handle APIC-write VM-exits with APIC
register virtualization turned on.

Additionally, mask all the LVT entries when the vlapic is software-disabled.
2013-12-27 07:01:42 +00:00
Neel Natu
3f0ddc7c5c Modify handling of writes to the vlapic ID, LDR and DFR registers.
The handlers are now called after the register value is updated in the virtual
APIC page. This will make it easier to handle APIC-write VM-exits with APIC
register virtualization turned on.

Additionally, we need to ensure that the value of these registers is always
correctly reflected in the virtual APIC page, because there is no VM exit
when the guest reads these registers with APIC register virtualization.
2013-12-26 19:58:30 +00:00
Neel Natu
de5ea6b65e vlapic code restructuring to make it easy to support hardware-assist for APIC
emulation.

The vlapic initialization and cleanup is done via processor specific vmm_ops.
This will allow the VT-x/SVM modules to layer any hardware-assist for APIC
emulation or virtual interrupt delivery on top of the vlapic device model.

Add a parameter to 'vcpu_notify_event()' to distinguish between vlapic
interrupts versus other events (e.g. NMI). This provides an opportunity to
use hardware-assists like Posted Interrupts (VT-x) or doorbell MSR (SVM)
to deliver an interrupt to a guest without causing a VM-exit.

Get rid of lapic_pending_intr() and lapic_intr_accepted() and use the
vlapic_xxx() counterparts directly.

Associate an 'Apic Page' with each vcpu and reference it from the 'vlapic'.
The 'Apic Page' is intended to be referenced from the Intel VMCS as the
'virtual APIC page' or from the AMD VMCB as the 'vAPIC backing page'.
2013-12-25 06:46:31 +00:00
John Baldwin
330baf58c6 Extend the support for local interrupts on the local APIC:
- Add a generic routine to trigger an LVT interrupt that supports both
  fixed and NMI delivery modes.
- Add an ioctl and bhyvectl command to trigger local interrupts inside a
  guest.  In particular, a global NMI similar to that raised by SERR# or
  PERR# can be simulated by asserting LINT1 on all vCPUs.
- Extend the LVT table in the vCPU local APIC to support CMCI.
- Flesh out the local APIC error reporting a bit to cache errors and
  report them via ESR when ESR is written to.  Add support for asserting
  the error LVT when an error occurs.  Raise illegal vector errors when
  attempting to signal an invalid vector for an interrupt or when sending
  an IPI.
- Ignore writes to reserved bits in LVT entries.
- Export table entries the MADT and MP Table advertising the stock x86
  config of LINT0 set to ExtInt and LINT1 wired to NMI.

Reviewed by:	neel (earlier version)
2013-12-23 19:29:07 +00:00
Neel Natu
a783578566 Consolidate the virtual apic initialization in a single function: vlapic_reset() 2013-12-22 00:08:00 +00:00
Neel Natu
4f8be175d5 Add an API to deliver message signalled interrupts to vcpus. This allows
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.

Submitted by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by:	grehan@
2013-12-16 19:59:31 +00:00
Neel Natu
a83011d2e7 Fix typo when initializing the vlapic version register ('<<' instead of '<'). 2013-12-11 06:28:44 +00:00
Neel Natu
becd984900 Fix x2apic support in bhyve.
When the guest is bringing up the APs in the x2APIC mode a write to the
ICR register will now trigger a return to userspace with an exitcode of
VM_EXITCODE_SPINUP_AP. This gets SMP guests working again with x2APIC.

Change the vlapic timer lock to be a spinlock because the vlapic can be
accessed from within a critical section (vm run loop) when guest is using
x2apic mode.

Reviewed by:	grehan@
2013-12-10 22:56:51 +00:00
Neel Natu
fb03ca4e42 Use callout(9) to drive the vlapic timer instead of clocking it on each VM exit.
This decouples the guest's 'hz' from the host's 'hz' setting. For e.g. it is
now possible to have a guest run at 'hz=1000' while the host is at 'hz=100'.

Discussed with:	grehan@
Tested by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-12-07 23:11:12 +00:00
Neel Natu
1c05219285 If a vcpu disables its local apic and then executes a 'HLT' then spin down the
vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore
pending interrupts in the IRR if interrupts have been disabled by the guest.
The interrupt cannot be injected into the guest in any case so resuming it
is futile.

With this change "halt" from a Linux guest works correctly.

Reviewed by:	grehan@
Tested by:	Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
2013-12-07 22:18:36 +00:00
Neel Natu
b5b28fc9dc Add support for level triggered interrupt pins on the vioapic. Prior to this
commit level triggered interrupts would work as long as the pin was not shared
among multiple interrupt sources.

The vlapic now keeps track of level triggered interrupts in the trigger mode
register and will forward the EOI for a level triggered interrupt to the
vioapic. The vioapic in turn uses the EOI to sample the level on the pin and
re-inject the vector if the pin is still asserted.

The vhpet is the first consumer of level triggered interrupts and advertises
that it can generate interrupts on pins 20 through 23 of the vioapic.

Discussed with:	grehan@
2013-11-27 22:18:08 +00:00
Neel Natu
03cd05011f Remove the 'vdev' abstraction that was meant to sit on top of device models
in the kernel. This abstraction was redundant because the only device emulated
inside vmm.ko is the local apic and it is always at a fixed guest physical
address.

Discussed with:	grehan
2013-11-04 23:25:07 +00:00
Neel Natu
513c8d338d Rename the VMM_CTRx() family of macros to VCPU_CTRx() to highlight that these
tracepoints are vcpu-specific.

Add support for tracepoints that are global to the virtual machine - these
tracepoints are called VM_CTRx().
2013-10-31 05:20:11 +00:00
Sergey Kandaurov
1e2751ddeb Fix a gcc warning uncovered after r251745.
Reported by:	Sergey V. Dyatko
Reviewed by:	neel
2013-06-18 23:31:09 +00:00
Sergey Kandaurov
82f2974a69 Replace cpusetffs_obj with CPU_FFS, missed in r251703.
Reported by:	bdrewery, O. Hartmann
2013-06-14 10:26:38 +00:00
Neel Natu
0acb0d84c5 Support array-type of stats in bhyve.
An array-type stat in vmm.ko is defined as follows:
VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");

It is incremented as follows:
vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1);

And output of 'bhyvectl --get-stats' looks like:
ipis sent to vcpu[0]     3114
ipis sent to vcpu[1]     0

Reviewed by:	grehan
Obtained from:	NetApp
2013-05-10 02:59:49 +00:00
Peter Grehan
117e8f378e Don't panic when a valid divisor of 1 has been requested.
Obtained from:	NetApp
2013-04-05 22:16:31 +00:00
Neel Natu
77d8fd9bb3 Add counter to keep track of the number of timer interrupts generated by
the local apic for each virtual cpu.
2013-03-31 03:56:48 +00:00
Neel Natu
485f986ac9 Modify the default behavior of bhyve such that it no longer forces the use of
x2apic mode on the guest.

The guest can decide whether or not it wants to use legacy mmio or x2apic
access to the APIC by writing to the MSR_APICBASE register.

Obtained from:	NetApp
2012-12-16 01:20:08 +00:00
Neel Natu
2e25737a49 Calculate the number of host ticks until the next guest timer interrupt.
This information will be used in conjunction with guest "HLT exiting" to
yield the thread hosting the virtual cpu.

Obtained from:	NetApp
2012-10-20 08:23:05 +00:00
Neel Natu
73820fb0a4 Add an option "-a" to present the local apic in the XAPIC mode instead of the
default X2APIC mode to the guest.
2012-09-26 00:06:17 +00:00
Neel Natu
a2da7af6bc Add support for trapping MMIO writes to local apic registers and emulating them.
The default behavior is still to present the local apic to the guest in the
x2apic mode.
2012-09-25 22:31:35 +00:00
Neel Natu
edf89256dd Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an
AP needs to be activated by spinning up an execution context for it.

The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.

Reviewed by: grehan
2012-09-25 02:33:25 +00:00
Neel Natu
2d3a73ed6d Restructure the x2apic access code in preparation for supporting memory mapped
access to the local apic.

The vlapic code is now aware of the mode that the guest is using to access the
local apic.

Reviewed by: grehan@
2012-09-21 03:09:23 +00:00
Peter Grehan
cd942e0f25 MSI-x interrupt support for PCI pass-thru devices.
Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.

Submitted by:	ryan dot berryhill at sandvine dot com
Reviewed by:	grehan
Obtained from:	Sandvine
2012-04-28 16:28:00 +00:00
Ed Maste
2a5bbbe380 Remove duplicated license text. 2012-03-06 21:13:12 +00:00
Neel Natu
14ddf164ba Get rid of redundant initialization of 'dmask'. It was being re-initialized
shortly afterwards.
2011-07-06 21:40:48 +00:00
Peter Grehan
a5615c9044 IFC @ r222830 2011-06-28 06:26:03 +00:00
John Baldwin
34a6b2d627 First cut at porting the kernel portions of 221828 and 221905 from the
BHyVe reference branch to HEAD.
2011-05-14 20:35:01 +00:00
Peter Grehan
366f60834f Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
  bhyve  - user-space sequencer and i/o emulation
  vmmctl - dump of hypervisor register state
  libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
	Joe CaraDonna
	Peter Snyder
	Jeff Heller
	Sandeep Mann
	Steve Miller
	Brian Pawlowski
2011-05-13 04:54:01 +00:00