Allow the ppt driver to attach to devices that were hinted to be
passthrough devices by the PCI code creating them with a driver
name of "ppt".
Add a tunable that allows the IOMMU to be forced to be used. With
SR-IOV passthrough devices the VFs may be created after vmm.ko is
loaded. The current code will not initialize the IOMMU in that
case, meaning that the passthrough devices can't actually be used.
Differential Revision: https://reviews.freebsd.org/D73
Reviewed by: neel
MFC after: 1 month
Sponsored by: Sandvine Inc.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.
Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.
The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks
OpenBSD guests always enable "special mask mode" during boot. As a result of
r275952 this is flagged as an error and the guest cannot boot.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D1384
MFC after: 1 week
- implement 8259 "polled" mode.
- set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization.
- report error if guest tries to enable the "special mask" mode.
Differential Revision: https://reviews.freebsd.org/D1328
Reviewed by: tychon
Reported by: grehan
Tested by: grehan
MFC after: 1 week
Initialize the 8259 such that IRQ7 is the lowest priority.
Reviewed by: tychon
Differential Revision: https://reviews.freebsd.org/D1322
MFC after: 1 week
is deasserted. Prior to this change each assertion on a level triggered irq
pin resulted in two interrupts being delivered to the CPU.
Differential Revision: https://reviews.freebsd.org/D1310
Reviewed by: tychon
MFC after: 1 week
'struct vm *'. Previously it used to be a 'void *' but there is no reason
to hide the actual type from the handler.
Discussed with: tychon
MFC after: 1 week
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).
Discussed with: grehan
As of git submit e179f6914152eca9, the Linux kernel does a simple
probe of the PIC by writing a pattern to the IMR and then reading it
back, prior to the init sequence of ICW words.
The bhyve PIC emulation wasn't allowing the IMR to be read until
the ICW sequence was complete. This limitation isn't required so
relax the test.
With this change, Linux kernels 3.15-rc2 and later won't hang
on boot when calibrating the local APIC.
Reviewed by: tychon
MFC after: 3 days
processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector
and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes
care of injecting this vector when the guest is able to receive it.
Legacy PIC interrupts are still delivered via the event injection mechanism.
This is because the vector injected by the PIC must reflect the state of its
pins at the time the CPU is ready to accept the interrupt.
Accesses to the TPR via %CR8 are handled entirely in hardware. This requires
that the emulated TPR must be synced to V_TPR after a #VMEXIT.
The guest can also modify the TPR via the memory mapped APIC. This requires
that the V_TPR must be synced with the emulated TPR before a VMRUN.
Reviewed by: Anish Gupta (akgupt3@gmail.com)
it implicitly in vmm.ko.
Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.
This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
the proper ICWx initialization sequence. It assumes, probably correctly, that
the boot firmware has done the 8259 initialization.
Since grub-bhyve does not initialize the 8259 this write to the mask register
takes a code path in which 'error' remains uninitialized (ready=0,icw_num=0).
Fix this by initializing 'error' at the start of the function.
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.
Reviewed by: tycho
Status and Control register at port 0x61.
Be more conservative about "catching up" callouts that were supposed
to fire in the past by skipping an interrupt if it was
scheduled too far in the past.
Restore the PIT ACPI DSDT entries and add an entry for NMISC too.
Approved by: neel (co-mentor)
- remove redundant code
- remove erroneous setting of the error return
in vmmdev_ioctl()
- use style(9) initialization
- in vmx_inject_pir(), document the race condition
that the final conditional statement was detecting,
Tested with both gcc and clang builds.
Reviewed by: neel
correct for the pirbase test (since I'd have thought we'd need to do
something even when the offset is 0 and that test looks like a
misguided attempt to not use an uninitialized variable), but it is at
least the same as today.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it. I may have accidentally applied a patch to a non-clean
working tree. Revert everything while I figure out what went wrong.
Pointy hat to: rstone
attributed if an ExtINT arrives during interrupt injection.
Also, fix a spurious interrupt if the PIC tries to raise an interrupt
before the outstanding one is accepted.
Finally, improve the PIC interrupt latency when another interrupt is
raised immediately after the outstanding one is accepted by creating a
vmexit rather than waiting for one to occur by happenstance.
Approved by: neel (co-mentor)
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.
Reviewed by: jhb, neel
Approved by: neel (co-mentor)
The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode
is switched to x2APIC. The VT-x implementation of this handler turns off the
APIC-access virtualization and enables the x2APIC virtualization in the VMCS.
The x2APIC virtualization is done by allowing guest read access to a subset
of MSRs in the x2APIC range. In non-root operation the processor will satisfy
an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.
The guest is also given write access to TPR, EOI and SELF_IPI MSRs which
get special treatment in non-root operation. This is documented in the
Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".
Enforce that APIC-write and APIC-access VM-exits are handled only if
APIC-access virtualization is enabled. The one exception to this is
SELF_IPI virtualization which may result in an APIC-write VM-exit.
simplify the implementation of the x2APIC virtualization assist in VT-x.
Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.
Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.
Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.
The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.
Discussed with: grehan@
in x2apic mode. Reads to this MSR are currently ignored but should cause a
general proctection exception to be injected into the vcpu.
All accesses to the corresponding offset in xAPIC mode are ignored.
Also, do not panic the host if there is mismatch between the trigger mode
programmed in the TMR and the actual interrupt being delivered. Instead the
anomaly is logged to aid debugging and to prevent a misbehaving guest from
panicking the host.
This is necessary because if the vlapic is configured in x2apic mode the
vioapic_process_eoi() function is called inside the critical section
established by vm_run().
The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector.
If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger
an EOI-induced VM-exit when it is doing EOI virtualization.
The EOI-induced VM-exit results in the EOI being forwarded to the vioapic
so that level triggered interrupts can be properly handled.
Tested by: Anish Gupta (akgupt3@gmail.com)
can be initiated in the context of a vcpu thread or from the bhyve(8) control
process.
The first use of this functionality is to update the vlapic trigger-mode
register when the IOAPIC pin configuration is changed.
Prior to this change we would update the TMR in the virtual-APIC page at
the time of interrupt delivery. But this doesn't work with Posted Interrupts
because there is no way to program the EOI_exit_bitmap[] in the VMCS of
the target at the time of interrupt delivery.
Discussed with: grehan@
hardware. It is possible to turn this feature off and fall back to software
emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0.
We now start handling two new types of VM-exits:
APIC-access: This is a fault-like VM-exit and is triggered when the APIC
register access is not accelerated (e.g. apic timer CCR). In response to
this we do emulate the instruction that triggered the APIC-access exit.
APIC-write: This is a trap-like VM-exit which does not require any instruction
emulation but it does require the hypervisor to emulate the access to the
specified register (e.g. icrlo register).
Introduce 'vlapic_ops' which are function pointers to vector the various
vlapic operations into processor-dependent code. The 'Virtual Interrupt
Delivery' feature installs 'ops' for setting the IRR bits in the virtual
APIC page and to return whether any interrupts are pending for this vcpu.
Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.
the vcpu should be kicked to process a pending interrupt. This will be useful
in the implementation of the Posted Interrupt APICv feature.
Change the return value of 'vlapic_pending_intr()' to indicate whether or not
an interrupt is available to be delivered to the vcpu depending on the value
of the PPR.
Add KTR tracepoints to debug guest IPI delivery.
guest disables the HPET.
The HPET timer interrupt is triggered from the callout handler associated with
the timer. It is possible for the callout handler to be delayed before it gets
a chance to execute. If the guest disables the HPET during this window then the
handler never gets a chance to execute and the timer interrupt is lost.
This is now fixed by injecting a timer interrupt into the guest if the callout
time is detected to be in the past when the HPET is disabled.
The handler is now called after the register value is updated in the virtual
APIC page. This will make it easier to handle APIC-write VM-exits with APIC
register virtualization turned on.
This also implies that we need to keep a snapshot of the last value written
to a LVT register. We can no longer rely on the LVT registers in the APIC
page to be "clean" because the guest can write anything to it before the
hypervisor has had a chance to sanitize it.