Don't emulate anything yet. Just check if the user would like to pass an
Intel GPU to the guest.
Reviewed by: jhb, markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40038
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Intel GPUs have two special memory regions. They are called Graphics
Stolen Memory and OpRegion. bhyve has to emulate both of them. In order
to keep track of those special regions, add generic mmio ranges to the
passthru emulation.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40036
The GVT-d emulation requires access to this selector to read from the
device.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D40035
Most register of the PCI header are either constant values or require
emulation anyway. The command and status register are the only exception which
require hardware access. So, we're adding an emulation handler for all
other register.
As this emulation handler will be reused by some future features like
GPU passthrough, we directly export it.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33010
GPU passthrough requires a special handling of some PCI config register.
Therefore, we need a flexible approach for implementing it. Adding an
array of handler meets this condition.
Start by using the default handler for all accesses to the PCI config
space. In upcoming commits, we can start to split the default handler
into several handler for each register that requires emulation.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39291
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.
struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.
As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.
Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.
While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().
Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124
Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096
These ioctls are not vCPU-specific and the ioctl now ignores the vCPU
ID. 0 is used instead of -1 to provide limited forwards
compatibility.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37651
When initializing the device model for a PCI pass through device that
uses MSI-X, bhyve reads the MSI-X capability from the real device to
save a copy in the emulated PCI config space. It also saves a copy in
a local struct msixcap on the stack. Since struct msixcap is packed,
GCC complains that casting a pointer to the struct to a uint32_t
pointer may result in an unaligned pointer.
This path is not performance critical, so to appease the compiler,
simply change the pointer to a char * and use memcpy to copy the 4
bytes read in each iteration of the loop.
Reviewed by: corvink, bz, markj
Differential Revision: https://reviews.freebsd.org/D37490
Permit naming pass through devices using the syntax accepted by
pciconf (pci[<domain>:]<bus>:<slot>:<func>) as well as by device name
(e.g. "ppt0").
While here, fix an error in the manpage that had the bus and slot
arguments for the original /-delimited scheme swapped.
Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36147
pci_parse_legacy_config splits the options string by comma characters.
strchr returns a pointer to the first occurence of a character. In that
case, it's a comma. So, pci_parse_legacy_config will stop at the first
character and creates a new config node with a name of NULL.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34600
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).
It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.
Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
Export functions for reading and writing the pci config space from passthru
device to be used by other devices.
This is required for lpc devices to set their vendor/device ids to their
physical values.
Otherwise, GPU passthrough for integrated Intel GPUs won't work properly.
Differential Revision: https://reviews.freebsd.org/D33769
Reviewed by: markj
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
The starting address passed to mprotect was wrong, so in the case where
the last page containing the table is not the last page of the BAR, the
wrong region would be unmapped.
Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa2335347 ("bhyve: Map the MSI-X table unconditionally for passthrough")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33739
The PBA and MSI-X table can reside in different BARs.
Reported by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: jhb
Fixes: 7fa2335347 ("bhyve: Map the MSI-X table unconditionally for passthrough")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33739
Some passthru devices only support MSI instead of MSI-X. For those
devices the initialization of MSI-X table will fail. Re-add the
check erroneously removed in f1442847c9.
MFC after: 3 days
X-MFC with: f1442847c9
PR: 260148
Reviewed by: manu, bz
Differential Revision: https://reviews.freebsd.org/D33728
The first time we start bhyve with a passthru device everything is fine
as on boot we do enable BARs. If a driver (unload) inside bhyve disables
the BAR(s) as some Linux drivers do, we need to make sure we re-enable
them on next bhyve start.
If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP)
the kernel will give us an EBUSY.
While we were re-enabling the BAR(s) in the current code loop
cfginit() was writing the changes out too late to the real hardware.
Move the call to init_msix_table() after the register on the real
hardware was updated. That way the kernel will be happy and the
mmap will succeed and bhyve will start.
Also simplify the code given the last argument to init_msix_table()
is unused we do not need to do checks for each bar. [1]
MFC after: 3 days
PR: 260148
Pointed out by: markj [1]
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33628
Reads of the MSI-X capabilites aren't emulated by passthru devices
yet. The guest will read the host MSI-X capabilites which could
cause issues.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D32686
Sponsored by: Beckhoff Automation GmbH & Co. KG
On startup all virtual BARs are registered.
Additionally, the encoding bit in the virtual cmd register is set.
After that, the passthru emulation overwrites the virtual cmd register with
the physical one.
This could lead to a mismatch between registered BARs and the encoding
bits in the cmd register.
Instead of writing the physical to the virtual cmd register,
write the virtual to the physical cmd register to solve this issue.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D32687
Sponsored by: Beckhoff Automation GmbH & Co. KG
Tell the guest whether a BAR uses prefetched memory or not for
passthru devices by using the same lobits as the physical device.
Reviewed by: grehan
Sponsored by: Beckhoff Autmation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D32685
It is possible for the PBA to reside in the same page as the MSI-X
table. And, while devices are not supposed to do this, at least some
Intel wifi devices place registers in a page shared with the MSI-X
table. To handle the first case we currently map the PBA page using
/dev/mem, and the second case is not handled.
Kill two birds with one stone: map the MSI-X table BAR using the
PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that
accesses beyond the bounds of the table can be emulated. Regions of the
BAR not containing the table are left unmapped.
Reviewed by: bz, grehan, jhb
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32359
This removes the dependency on /dev/io.
PR: 251046
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31308
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.
The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.
All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.
A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.
A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.
In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.
A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.
Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:
- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).
- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.
After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.
The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.
- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.
Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).
This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.
While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212
- Return 2 x 16-bit registers in the correct byte order
for a 4-byte read that spans the CMD/STATUS register.
This reversal was hiding the capabilities-list, which prevented
the MSI capability from being found for XHCI passthru.
- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
capability offset would have the read-only portion skipped.
This prevented MSI interrupts from being enabled.
Reported and extensively tested by Anatoli (me at anatoli dot ws)
PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951
VFs return zero for the memory enable bit even if it has been set by a
prior write. After r348779 this caused the annoying behavior that a
guest OS would unintentionally disable memory decoding on a future
read-modify-write operation on the command register. Instead, return
the shadow value of the command register for reads. This ensures that
the guest will only toggle the state of the memory enable bit when it
specifically intends to do so.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531
bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to. For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.
bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table. However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.
However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer. Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.
Reported by: gallatin
Tested by: gallatin
Reviewed by: markj, Patrick Mooney
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20523
will return success when the kernel is built without support of
the capability mode.
It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.
Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
Previously, many errors (such as the PCI device not being attached
to the ppt(4) driver) resulted in bhyve silently exiting without
starting the virtual machine. Now any errors encountered when
configuring a virtual slot for a PCI passthru device should be noted
on stderr.
Reviewed by: neel
Differential Revision: https://reviews.freebsd.org/D5990
If the PBA shares a page with the MSI-X table, map the shared page via
/dev/mem and emulate accesses to the portion of the PBA in the shared
page by accessing the mapped page.
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5919
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks