New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.
Reviewed by: jhb, neel
Approved by: neel (co-mentor)
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.
Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks
- Implement the PM1_EVT and PM1_CTL registers required by ACPI.
The PM1_EVT register is mostly a dummy as bhyve doesn't support any
of the hardware-initiated events. The only bit of PM1_CNT that is
implemented are the sleep request bits (SPL_EN and SLP_TYP) which
request a graceful power off for S5. In particular, for S5, bhyve
exits with a non-zero value which terminates the loop in vmrun.sh.
- Emulate the Reset Control register at I/O port 0xcf9 and advertise
it as the reset register via ACPI.
- Advertise an _S5 package.
- Extend the in/out interface to allow an in/out handler to request
that the hypervisor trigger a reset or power-off.
- While here, note that all vCPUs in a guest support C1 ("hlt").
Reviewed by: neel (earlier version)
upcoming in-kernel device emulations like the HPET.
The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.
Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
to a virtual machine then we implicitly create COM1 and COM2 ISA devices.
Prior to this change the only way of attaching a COM port to the virtual
machine was by presenting it as a PCI device that is mapped at the legacy
I/O address 0x3F8 or 0x2F8.
There were some issues with the original approach:
- It did not work at all with UEFI because UEFI will reprogram the PCI device
BARs and remap the COM1/COM2 ports at non-legacy addresses.
- OpenBSD GENERIC kernel does not create a /dev/console because it expects
the uart device at the legacy 0x3F8/0x2F8 address to be an ISA device.
- It was functional with a FreeBSD guest but caused the console to appear
on /dev/ttyu2 which was not intuitive.
The uart emulation is now independent of the bus on which it resides. Thus it
is possible to have uart devices on the PCI bus in addition to the legacy
COM1/COM2 devices behind the LPC bus.
The command line option to attach ISA COM1/COM2 ports to a virtual machine is
"-s <bus>,lpc -l com1,stdio".
The command line option to create a PCI-attached uart device is:
"-s <bus>,uart[,stdio]"
The command line option to create PCI-attached COM1/COM2 device is:
"-S <bus>,uart[,stdio]". This style of creating COM ports is deprecated.
Discussed with: grehan
Reviewed by: grehan
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
M share/examples/bhyve/vmrun.sh
AM usr.sbin/bhyve/legacy_irq.c
AM usr.sbin/bhyve/legacy_irq.h
M usr.sbin/bhyve/Makefile
AM usr.sbin/bhyve/uart_emul.c
M usr.sbin/bhyve/bhyverun.c
AM usr.sbin/bhyve/uart_emul.h
M usr.sbin/bhyve/pci_uart.c
M usr.sbin/bhyve/pci_emul.c
M usr.sbin/bhyve/inout.c
M usr.sbin/bhyve/pci_emul.h
M usr.sbin/bhyve/inout.h
AM usr.sbin/bhyve/pci_lpc.c
AM usr.sbin/bhyve/pci_lpc.h
bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.
(renaming internal routines will come later)
Reviewed by: neel
Obtained from: NetApp
On a nested page table fault the hypervisor will:
- fetch the instruction using the guest %rip and %cr3
- decode the instruction in 'struct vie'
- emulate the instruction in host kernel context for local apic accesses
- any other type of mmio access is punted up to user-space (e.g. ioapic)
The decoded instruction is passed as collateral to the user-space process
that is handling the PAGING exit.
The emulation code is fleshed out to include more addressing modes (e.g. SIB)
and more types of operands (e.g. imm8). The source code is unified into a
single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well
as /usr/sbin/bhyve.
Reviewed by: grehan
Obtained from: NetApp
The -A option will create the minimal set of required ACPI tables in
guest memory. Since ACPI mandates an IOAPIC, the -I option must also
be used.
Template ASL files are created, and then passed to the iasl compiler
to generate AML files. These are then loaded into guest physical mem.
In support of this, the ACPI PM timer is implemented, in 32-bit mode.
Tested on 7.4/8.*/9.*/10-CURRENT.
Reviewed by: neel
Obtained from: NetApp
Discussed with: jhb (a long while back)
Firmware tables require too much knowledge of system configuration,
and it's difficult to pass that information in general terms to a library.
The upcoming ACPI work exposed this - it will also livein bhyve.
Also, remove code specific to NetApp from the mptable name, and remove
the -n option from bhyve.
Reviewed by: neel
Obtained from: NetApp
- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.
- Support memory-mapped BARs in PCI emulation.
mem.c/h - memory region interface
instruction_emul.c/h - remove old region interface.
Use gpa from EPT exit to avoid a tablewalk to
determine operand address. Determine operand size
and use when calling through to region handler.
fbsdrun.c - call into region interface on paging
exit. Distinguish between instruction emul error
and region not found
pci_emul.c/h - implement new BAR callback api.
Split BAR alloc routine into routines that
require/don't require the BAR phys address.
ioapic.c
pci_passthru.c
pci_virtio_block.c
pci_virtio_net.c
pci_uart.c - update to new BAR callback i/f
Reviewed by: neel
Obtained from: NetApp
AP needs to be activated by spinning up an execution context for it.
The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.
Reviewed by: grehan
be activated as part of the slot config options.
The syntax is:
-s <slotnum>,uart[,stdio]
The stdio parameter instructs the code to perform i/o using
stdin/stdout. It can only be used for one instance.
To allow legacy i/o ports/irqs to be used, a new variant of
the slot command, -S, is introduced. When used to specify a
slot, the device will use legacy resources if it supports
them; otherwise it will be treated the same as the '-s' option.
Specifying the -S option with the uart will first use the 0x3f8/irq 4
config, and the second -S will use 0x2F8/irq 3.
Interrupt delivery is awaiting the arrival of the i/o apic code,
but this works fine in uart(4)'s polled mode.
This code was written by Cynthia Lu @ MIT while an intern at NetApp,
with further work from neel@ and grehan@.
Obtained from: NetApp
Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.
Submitted by: ryan dot berryhill at sandvine dot com
Reviewed by: grehan
Obtained from: Sandvine
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface
bhyve was designed and implemented by Neel Natu.
Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski