Add the System Management BIOS Baseboard (or Module) Information
a.k.a. Type 2 structure to the SMBIOS emulation.
Reviewed by: rgrimes, bcran, grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29657
Commit 621b509048 introduced a regression
in legacy virtio-9p config parsing by not initializing *sharename to
NULL. As a result, "sharename != NULL" check in the first iteration fails
and bhyve exits with "virtio-9p: more than one share name given".
Fix by adding NULL back.
Approved by: grehan
This allows the xhci tablet device to be recognized and a PCI device
instantiated.
Reviewed by: jhb
Fixes: 621b509048 Refactor configuration management in bhyve.
MFC after: 3 months.
The old prototype requires callers to inspect flags of each descriptors
to get the starting position of host-writable iovecs.
vq_getchain() is changed to return a virtio request with the number of
host-readable iovecs and host-writable iovecs instead. Callers can avoid
boilerplate code of getting the start offset of host-writable iovecs.
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Reviewed by: afedorov
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29433
The previous commit added the handler to parse the command line
options for virtio-scsi devices but forgot to set the correct function
pointer to point to the handler.
Reported by: vangyzen
Reviewed by: vangyzen
Fixes: 621b509048
Differential Revision: https://reviews.freebsd.org/D29438
"device" is already used as the generic PCI-level name of the device
model to use (e.g. "hostbridge"). The result was that parsing
"hostbridge" as an integer failed and the host bridge used a device ID
of 0. The EFI ROM asserts that the device ID of the hostbridge is not
0, so booting with the current EFI ROM was failing during the ROM
boot.
Fixes: 621b509048
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.
The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.
All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.
A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.
A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.
In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.
A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.
Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:
- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).
- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.
After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.
The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.
- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.
Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035
Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.
This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
The save/restore feature uses a unix domain socket to send messages
from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice
for this.
An added benefit of using a datagram socket is simplified code. For
bhyve, the listen/accept calls are dropped; and for bhyvectl, the
connect() call is dropped.
EPRINTLN handles raw mode for bhyve(8), use it to print error messages.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28983
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character
buffer that stores a filename or the path to a file - this file is used
by the save/restore feature.
Since the file doesn't have anything to do with a vm name, rename
MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX
while here.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28879
Generalize the naming here since the domain socket that uses these codes
might be used for purposes other than the save/restore feature.
- rename checkpoint_opcodes to ipc_opcode
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28877
Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when
creating the unix domain socket for a given bhyve vm.
The path to the unix domain socket for a bhyve vm will now be
/var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname
Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared
to bhyvectl(8).
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28783
struct checkpoint_op, enum checkpoint_opcodes, and
MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi
header.
They are used for the save/restore functionality that bhyve(8)
provides and are better suited in usr.sbin/bhyve/snapshot.h
Since bhyvectl(8) requires these, the Makefile for bhyvectl has been
modified to include usr.sbin/bhyve/snapshot.h
Reviewed by: kevans, grehan
Differential Revision: https://reviews.freebsd.org/D28410
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.
This also removes the '-b' and '-g' flag from bhyve(8). These two flags were
marked deprecated in r368519.
Reviewed by: grehan, kevans
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D27490
- support VNC version 3.3 (macos "Screen Sharing" builtin client)
- wait until client has requested an update prior to sending framebuffer data
- don't send an update if no framebuffer updates detected
- increase framebuffer poll frequency to 30Hz, and double that when
kbd/mouse input detected
- zero uninitialized array elements in rfb_send_server_init_msg()
- fix overly large allocation in rfb_init()
- use atomics for flags shared between input and output threads
- use #defines for constants
This work was contributed by Marko Kiiskila, with reuse of some earlier
work by Henrik Gulbrandsen.
Clients tested :
FreeBSD-current
- tightvnc
- tigervnc
- krdc
- vinagre
Linux (Ubuntu)
- krdc
- vinagre
- tigervnc
- xtightvncviewer
- remmina
MacOS
- VNC Viewer
- TigerVNC
- Screen Sharing (builtin client)
Windows 10
- Tiger VNC
- VNC Viewer (cursor lag)
- UltraVNC (cursor lag)
o/s independent
- noVNC (browser) using websockify relay
PR: 250795
Submitted by: Marko Kiiskila <marko@apache.org>
Reviewed by: jhb (bhyve)
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27605
Enforce the requirement that the RX callback cannot be called after a reset until the features have been negotiated.
This fixes a race condition where the receive callback is called during a device reset.
Reviewed by: vmaffione, grehan
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D27381
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.
Mark the '-b' and '-g' flag as deprecated for bhyve(8).
These will be removed in 13.
Reviewed by: jhb, grehan
Approved by: kevans (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27519
This uses the same snapshot routine as other VirtIO devices.
Submitted by: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Differential Revision: https://reviews.freebsd.org/D26265
Permit suspend/resume of a XHCI device model that has not been
attached to by a driver in a guest OS.
Submitted by: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Differential Revision: https://reviews.freebsd.org/D26264
This fixes the amount of memory displayed in the EDK2 UiApp to be the same
as passed on the bhyve command line. Otherwise, 8GB is displayed as 4GB,
32GB as 28GB etc.
Reviewed by: jhb, kib, rgrimes
Differential Revision: https://reviews.freebsd.org/D27348
Fix a couple of style issues introduced in my previous commit.
Add a comment explaining that the SMBIOS specification defines the date
format to be mm/dd/yyyy, which is why we don't use ISO 8601.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).
This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.
While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212
Add update to RIP after a userspace instruction decode (as is done for
the in-kernel counterpart of this case).
Submitted by: adam_fenn.io
Reviewed by: cem, markj
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27243
When the AHCI code was reworked to use FreeBSD struct
definitions, the valid element was mis-transcribed resulting
in the UMDA capability being hidden. This prevented Illumos
from using AHCI disk/cdrom drives.
Fix by using definitions that match the code pre-rework.
PR: 250924
Submitted by: Rolf Stalder
Reported by: Rolf Stalder
MFC after: 3 days
Since lots of work has been done on bhyve since 2014, increase the version
to 13.0 to match 13-CURRENT, and update the release date.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D27147
Read CPUID leaf 0x8000008 to determine max supported phys address and
create BAR region right below it, reserving 1/4 of the supported guest
physical address space to the 64bit BARs mappings.
PR: 250802 (although the issue from PR is not fixed by the change)
Noted and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27095
If bhyve is used to emulate 512e access in guest OS, then discard addresses should be properly aligned.
Otherwise ioctl DIOCGDELETE fails for 512b requires on devices with 4K sector size.
see g_dev_ioctl() in sys/geom/geom_dev.c
Submitted by: Vitaliy Gusev <gusev.vitaliy@gmail.com>
MFC after: 1 week
Sponsored by: vStack.com
Differential Revision: https://reviews.freebsd.org/D27075
"smbios.system.family" as " ".
This presents challenges for both humans and tools when trying to parse output
that uses those results.
The new values reported are now:
smbios.system.family="Virtual Machine"
smbios.system.maker="FreeBSD"
PR: 250728
Approved by: grehan@FreeBSD.org
Sponsored by: Netflix
bhyve sometimes segfaults when using an e1000 NIC with a Windows guest.
We are only updating our tdba and cached host mapping when the low address
register is written and when tx is set enabled, but not when the high address
or length registers are written. It is observed that Windows 10 is occasionally
enabling tx first then writing the registers in the order low, high, len. This
leaves us with a bogus base address and mapping, which causes a segfault later
when we try to copy from a descriptor that has unpredictable garbage in a
pointer.
Updating the address and mapping when any of those registers change seems to fix
that particular issue.
Reviewed by: mav, grehan (bhyve)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26798
VirtFS allows sharing an arbitrary directory tree between bhyve virtual
machine and the host. Current implementation has a fairly complete support
for 9P2000.L protocol, except for the extended attribute support. It has
been verified to work with the qemu-kvm hypervisor.
Reviewed by: rgrimes, emaste, jhb, trasz
Approved by: trasz (mentor)
MFC after: 1 month
Relnotes: yes
Sponsored by: Conclusive Engineering (development), vStack.com (funding)
Differential Revision: https://reviews.freebsd.org/D10335
'ident' was replaced with 'ata_ident' in revision r363596.
Submitted by: Vitaliy Gusev <gusev.vitaliy_gmail.com>
Reviewed by: Darius Mihai
Differential Revision: https://reviews.freebsd.org/D26263
The language id of String Descriptors in usb mouse is
0x0904, while the spec require 0x0409 (English - United States)
Submitted by: Wanpeng Qian
Reviewed by: grehan
Approved by: grehan (#bhyve)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26472
The NVMe emulation code did not explicitly initialize queue head and
tail pointers on queue creation. As these pointers are part of
calloc()'ed memory, this only becomes a problem if the queues are
deleted and then recreated.
This error can manifest with messages about completions not matching a
command.
Some operating systems believe bhyve's emulated NVMe drive is failing
based on certain values in the SMART / Health Information log page being
zero. Fix is to set the reported temperature and available spare values
to reasonable defaults.
Submitted by: wanpengqian@gmail.com
Reviewed by: grehan
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24202
Allow the serial number, firmware revision, model number and nominal media
rotation rate (nmrr) parameters to be set from the command line.
Note that setting the nmrr value can be used to indicate the AHCI
device is an SSD.
Submitted by: Wanpeng Qian
Reviewed by: jhb, grehan (#bhyve)
Approved by: jhb, grehan
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D24174
This fixes a coredump with NetBSD guests when XHCI is configured.
On seeing the AC64 flag clear, the NetBSD XHCI driver was only writing
to the lower 32-bits of 64-bit physical address registers. The emulation
relies on a write to the hi 32-bits to calculate a host virtual address
for internal use, and has always supported 64-bit addressing.
All other guests were seen to write to both the lo- and hi- address
registers, regardless of the AC64 setting.
Discussed with: Leon Dang (author)
Tested with: Ubuntu 16/18/20, Windows10, OpenBSD UEFI guests.
MFC after: 2 weeks.
Allow guests to set the RTC bit in the ACPI PM control register.
This eliminates an annoying (and harmless) Linux kernel boot message.
PR: 244721
Submitted by: Jose Luis Duran
MFC after: 1 week
The NVMe specification requires unused entries in the Identify, Active
Namespace ID data to be zero. Fix is bzero the provided page, similar to
what is done for the Namespace Descriptors list.
Fixes UNH Tests 2.6 and 2.9
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24901
Dataset Management range specifications may have a zero length (a.k.a.
an empty range definition). Handle the case of all ranges being empty by
completing with Success (DSM commands are advisory only). For
Deallocate, skip empty range definitions when sending TRIM's to the
backing storage.
Fixes UNH Test 2.2.4
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24900
If the Predictable Latency Mode is not supported, NVMe Controllers must
return Invalid Field in Command status for the Get Features command
with IDs:
- Predictable Latency Mode Config
- Predictable Latency Mode Window
Fixes UNH Tests 3.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24899
This adds support for NVMe Get Features, Interrupt Vector Config
parameter error checking done by the UNH compliance tests.
Fixes UNH Tests 1.6.8 and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24898
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).
Fixes UNH Test 5.5.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24897
This adds support to bhyve's NVMe device emulation for processing Async
Event Requests but not returning them (i.e. Async Event Notifications).
Fixes UNH Test 5.5.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24896
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.
Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.
Fixes UNH Tests 2.2.3, 2.3.2, and 2.4.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24895
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).
This patch implements counters for:
- Data Units Read
- Data Units Written
- Host Read Commands
- Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.
Fixes UNH Test 1.3.8
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24894
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.
Fixes UNH Test 2.2.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24893
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.
Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24891
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24890
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.
Fixes UNH Test 1.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24889
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.
Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.
Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.
Fixes UNH Tests 1.2, 5.4, and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24887
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.
Fixes UNH Test 1.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24885
Consolidate the code which writes Completion Queue entries and updates
the CQ doorbell value. While in the neighborhood, convert the "toggle CQ
phase bit" code to use an XOR operation instead of an "if/else" branch.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24882
The NVMe code attempted to ensure thread safety through a combination of
using atomics and a "busy" flag. But this approach leads to unavoidable
race conditions.
Fix is to use per-queue mutex locks to ensure thread safety within the
queue processing code. While in the neighborhood, move all the queue
initialization code to a common function.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19841
This adds support for the NVMe I/O command Flush. For block-based
devices, submit a DIOCGFLUSH to the backing storage. Otherwise, command
is treated like a NOP and completes with a Successful status.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24880
This refactors the NVMe I/O command processing function to make adding
new commands easier. The main change is to move command specific
processing (i.e. Read/Write) to separate functions for each NVMe I/O
command and leave the common per-command processing in the existing
pci_nvme_handle_io_cmd() function.
While here, add checks for some common errors (invalid Namespace ID,
invalid opcode, LBA out of range).
Add myself to the Copyright holders
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24879
Convert the debug and warning logging macros to be parameterized and
correctly use bhyve's PRINTLN macro.
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24878
The TRB processing loop could potentially call a back-end twice
with the same status transaction. While this was generally benign,
some code paths in the tablet backend weren't set up to handle
this case, resulting in a NULL dereference.
Fix by
- returning a STALL error when an invalid request was seen in the backend
- skipping a call to the backend if the number of packets in a status
transaction was zero (this code fragment was taken from the Intel ACRN
xhci backend)
PR: 246964
Reported by: Ali Abdallah
Discussed with: Leon Dang (author)
Reviewed by: jhb (#bhyve), Leon Dang
Approved by: jhb
Obtained from: Intel ACRN (partially)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25228
Introduce -D flag that allows for the VM to be destroyed on guest initiated
power-off by the bhyve(8) process itself.
This is quality of life change that allows for simpler deployments without
the need for bhyvectl --destroy.
Requested by: swills
Reviewed by: 0mp (manpages), grehan, kib, swills
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25414
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464
- Return 2 x 16-bit registers in the correct byte order
for a 4-byte read that spans the CMD/STATUS register.
This reversal was hiding the capabilities-list, which prevented
the MSI capability from being found for XHCI passthru.
- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
capability offset would have the read-only portion skipped.
This prevented MSI interrupts from being enabled.
Reported and extensively tested by Anatoli (me at anatoli dot ws)
PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).
Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525
After r360820, additional parameters are passed through the argument 'opts', and the name of the backend through the argument 'devname'. So, there is no need to skip the backend name from the 'opts' argument.
The backend uses the socket API with the PF_NETGRAPH protocol family, which is provided by the ng_socket(4).
To use the new backend, provide the following bhyve option:
-s X:Y:Z,[virtio-net|e1000],netgraph,socket=[ng_socket name],path=[destination node],hook=[our socket src hook],peerhook=[dst node hook]
Reviewed by: vmaffione, lutz_donnerhacke.de
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D24620
r359704 introduced an 'mtu' option for the virtio-net device emulation.
Update the man page to describe the new option.
Reviewed by: bcr
Differential Revision: https://reviews.freebsd.org/D24723
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
bhyve uses cached copies of the MSI capability registers to generate
MSI interrupts for device models. Previously, these cached fields
were only set when the MSI capability control register was updated.
The Linux kernel recently adopted a change to deal with races in MSI
interrupt delivery that writes to the MSI capability address and data
registers to alter the destination of MSI interrupts without writing
to the MSI capability control register. bhyve was not updating its
cached registers for these writes and continued to send interrupts
with the old data value to the old address. Fix this by recomputing
the cached values for every write to any MSI capability register.
Reported by: Jason Tubnor, Ryan Moeller
Reported by: Marc Dionne (bisected the Linux kernel commit)
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24593
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.
Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.
Submitted by: Jason King <jason.king@joyent.com> (improvements)
Reviewed by: jhb
Obtained from: illumos-joyent (improvements)
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21707
This patch is about sorting the arguments and using proper mdoc(7) macros
to stylize arguments and command modifiers for much better readability.
Further style fixes in other sections within the bhyve manual page are
going to be worked on in upcoming patches.
Reviewed by: rgrimes
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24526
Print the failed instruction stream as a contiguous stream of hex. This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.
Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
line
effect.
Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve. The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.
A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back). At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.
Suggested by: rpokala
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D23165
To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.
The bootrom segment is the high 16MB of the low 4GB region.
Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range. However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.
The existing ROM-file loading code is refactored to use the new interface.
Reviewed by: grehan (earlier version)
Differential Revision: https://reviews.freebsd.org/D24422
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000
Reported by: vmaffione, jhb
Approved by: vmaffione (mentor)
Differential Revision: https://reviews.freebsd.org/D23971
According to the SMBIOS specification (revision 2.7 or newer), the
extended module size field should only be used for sizes that can't
fit in the older size field.
Reviewed by: rgrimes, grehan, jhb
Differential Revision: https://reviews.freebsd.org/D24107
The SQHD field of a Completion Queue entry indicates the current
Submission Queue head pointer value. The head pointer represents the
next entry to be consumed and is updated after consuming the current
entry.
In the Admin queue processing, the current code updates the head pointer
after reporting the value to the host via the SQHD. This gives the
impression that the Controller is perpetually one command behind in its
processing of the Admin SQ. And while this doesn't appear to bother some
initiators, it is wrong.
Fix is to update the SQ head pointer prior to writing the SQHD value in
the completion.
While here, fix missed update of dword 0 (cdw0) in the completion
message.
Reported by: khng300
Reviewed by: jhb, imp
Approved by: jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24083
The bhyve NVMe emulation has a race in the logic which generates command
completion interrupts. On FreeBSD guests, this manifests as kernel log
messages similar to:
nvme0: Missing interrupt
The NVMe emulation code sets a per-submission queue "busy" flag while
processing the submission queue, and only generates an interrupt when
the submission queue is not busy.
Aside from being counter to the NVMe design (i.e. interrupt properties
are tied to the completion queue) and adding complexity (e.g. exceptions
to not generating an interrupt when "busy"), it causes a race condition
under the following conditions:
- guest OS has no outstanding interrupts
- guest OS submits a single NVMe IO command
- bhyve emulation processes the SQ and sets the "busy" flag
- bhyve emulation submits the asynchronous IO to the backing storage
- IO request to the backing storage completes before the SQ processing
loop exits and doesn't generate an interrupt because the SQ is "busy"
- bhyve emulation finishes processing the SQ and clears the "busy" flag
Fix is to remove the "busy" flag and generate an interrupt when the CQ
head and tail pointers do not match.
Reported by: khng300
Reviewed by: jhb, imp
Approved by: jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24082
This adds support for the Dataset Management (DSM) command to the NVMe
emulation in general, and more specifically, for the deallocate
attribute (a.k.a. trim in the ATA protocol). If the backing storage for
the namespace supports delete (i.e. deallocate), setting the deallocate
attribute in a DSM will trim/delete the requested LBA ranges in the
underlying storage.
Reviewed by: jhb, araujo, imp
Approved by: jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21839
Pass the struct pci_nvme_blockstore pointer for this namespace to the
namespace initialization function instead of only the desired eui64
value.
Minor functional change in that the code updates the eui64 value in the
blockstore.
Reviewed by: jhb, araujo
Approved by: jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21838
Add a "copy direction" parameter to nvme_prp_memcpy such that data can
be copied to the memory specified by the PRP entries (current behavior)
or copied from the PRP entries (new behavior). The upcoming deallocate
functionality will use the copy from capability.
Reviewed by: jhb, araujo
Approved by: jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21837