The NVMe emulation code did not explicitly initialize queue head and
tail pointers on queue creation. As these pointers are part of
calloc()'ed memory, this only becomes a problem if the queues are
deleted and then recreated.
This error can manifest with messages about completions not matching a
command.
Some operating systems believe bhyve's emulated NVMe drive is failing
based on certain values in the SMART / Health Information log page being
zero. Fix is to set the reported temperature and available spare values
to reasonable defaults.
Submitted by: wanpengqian@gmail.com
Reviewed by: grehan
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24202
Allow the serial number, firmware revision, model number and nominal media
rotation rate (nmrr) parameters to be set from the command line.
Note that setting the nmrr value can be used to indicate the AHCI
device is an SSD.
Submitted by: Wanpeng Qian
Reviewed by: jhb, grehan (#bhyve)
Approved by: jhb, grehan
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D24174
This fixes a coredump with NetBSD guests when XHCI is configured.
On seeing the AC64 flag clear, the NetBSD XHCI driver was only writing
to the lower 32-bits of 64-bit physical address registers. The emulation
relies on a write to the hi 32-bits to calculate a host virtual address
for internal use, and has always supported 64-bit addressing.
All other guests were seen to write to both the lo- and hi- address
registers, regardless of the AC64 setting.
Discussed with: Leon Dang (author)
Tested with: Ubuntu 16/18/20, Windows10, OpenBSD UEFI guests.
MFC after: 2 weeks.
Allow guests to set the RTC bit in the ACPI PM control register.
This eliminates an annoying (and harmless) Linux kernel boot message.
PR: 244721
Submitted by: Jose Luis Duran
MFC after: 1 week
The NVMe specification requires unused entries in the Identify, Active
Namespace ID data to be zero. Fix is bzero the provided page, similar to
what is done for the Namespace Descriptors list.
Fixes UNH Tests 2.6 and 2.9
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24901
Dataset Management range specifications may have a zero length (a.k.a.
an empty range definition). Handle the case of all ranges being empty by
completing with Success (DSM commands are advisory only). For
Deallocate, skip empty range definitions when sending TRIM's to the
backing storage.
Fixes UNH Test 2.2.4
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24900
If the Predictable Latency Mode is not supported, NVMe Controllers must
return Invalid Field in Command status for the Get Features command
with IDs:
- Predictable Latency Mode Config
- Predictable Latency Mode Window
Fixes UNH Tests 3.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24899
This adds support for NVMe Get Features, Interrupt Vector Config
parameter error checking done by the UNH compliance tests.
Fixes UNH Tests 1.6.8 and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24898
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).
Fixes UNH Test 5.5.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24897
This adds support to bhyve's NVMe device emulation for processing Async
Event Requests but not returning them (i.e. Async Event Notifications).
Fixes UNH Test 5.5.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24896
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.
Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.
Fixes UNH Tests 2.2.3, 2.3.2, and 2.4.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24895
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).
This patch implements counters for:
- Data Units Read
- Data Units Written
- Host Read Commands
- Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.
Fixes UNH Test 1.3.8
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24894
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.
Fixes UNH Test 2.2.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24893
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.
Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24891
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24890
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.
Fixes UNH Test 1.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24889
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.
Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.
Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.
Fixes UNH Tests 1.2, 5.4, and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24887
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.
Fixes UNH Test 1.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24885
Consolidate the code which writes Completion Queue entries and updates
the CQ doorbell value. While in the neighborhood, convert the "toggle CQ
phase bit" code to use an XOR operation instead of an "if/else" branch.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24882
The NVMe code attempted to ensure thread safety through a combination of
using atomics and a "busy" flag. But this approach leads to unavoidable
race conditions.
Fix is to use per-queue mutex locks to ensure thread safety within the
queue processing code. While in the neighborhood, move all the queue
initialization code to a common function.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19841
This adds support for the NVMe I/O command Flush. For block-based
devices, submit a DIOCGFLUSH to the backing storage. Otherwise, command
is treated like a NOP and completes with a Successful status.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24880
This refactors the NVMe I/O command processing function to make adding
new commands easier. The main change is to move command specific
processing (i.e. Read/Write) to separate functions for each NVMe I/O
command and leave the common per-command processing in the existing
pci_nvme_handle_io_cmd() function.
While here, add checks for some common errors (invalid Namespace ID,
invalid opcode, LBA out of range).
Add myself to the Copyright holders
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24879
Convert the debug and warning logging macros to be parameterized and
correctly use bhyve's PRINTLN macro.
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24878
The TRB processing loop could potentially call a back-end twice
with the same status transaction. While this was generally benign,
some code paths in the tablet backend weren't set up to handle
this case, resulting in a NULL dereference.
Fix by
- returning a STALL error when an invalid request was seen in the backend
- skipping a call to the backend if the number of packets in a status
transaction was zero (this code fragment was taken from the Intel ACRN
xhci backend)
PR: 246964
Reported by: Ali Abdallah
Discussed with: Leon Dang (author)
Reviewed by: jhb (#bhyve), Leon Dang
Approved by: jhb
Obtained from: Intel ACRN (partially)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25228
Introduce -D flag that allows for the VM to be destroyed on guest initiated
power-off by the bhyve(8) process itself.
This is quality of life change that allows for simpler deployments without
the need for bhyvectl --destroy.
Requested by: swills
Reviewed by: 0mp (manpages), grehan, kib, swills
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25414
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464
- Return 2 x 16-bit registers in the correct byte order
for a 4-byte read that spans the CMD/STATUS register.
This reversal was hiding the capabilities-list, which prevented
the MSI capability from being found for XHCI passthru.
- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
capability offset would have the read-only portion skipped.
This prevented MSI interrupts from being enabled.
Reported and extensively tested by Anatoli (me at anatoli dot ws)
PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).
Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525
After r360820, additional parameters are passed through the argument 'opts', and the name of the backend through the argument 'devname'. So, there is no need to skip the backend name from the 'opts' argument.
The backend uses the socket API with the PF_NETGRAPH protocol family, which is provided by the ng_socket(4).
To use the new backend, provide the following bhyve option:
-s X:Y:Z,[virtio-net|e1000],netgraph,socket=[ng_socket name],path=[destination node],hook=[our socket src hook],peerhook=[dst node hook]
Reviewed by: vmaffione, lutz_donnerhacke.de
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D24620
r359704 introduced an 'mtu' option for the virtio-net device emulation.
Update the man page to describe the new option.
Reviewed by: bcr
Differential Revision: https://reviews.freebsd.org/D24723
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
bhyve uses cached copies of the MSI capability registers to generate
MSI interrupts for device models. Previously, these cached fields
were only set when the MSI capability control register was updated.
The Linux kernel recently adopted a change to deal with races in MSI
interrupt delivery that writes to the MSI capability address and data
registers to alter the destination of MSI interrupts without writing
to the MSI capability control register. bhyve was not updating its
cached registers for these writes and continued to send interrupts
with the old data value to the old address. Fix this by recomputing
the cached values for every write to any MSI capability register.
Reported by: Jason Tubnor, Ryan Moeller
Reported by: Marc Dionne (bisected the Linux kernel commit)
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24593
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.
Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.
Submitted by: Jason King <jason.king@joyent.com> (improvements)
Reviewed by: jhb
Obtained from: illumos-joyent (improvements)
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21707
This patch is about sorting the arguments and using proper mdoc(7) macros
to stylize arguments and command modifiers for much better readability.
Further style fixes in other sections within the bhyve manual page are
going to be worked on in upcoming patches.
Reviewed by: rgrimes
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24526
Print the failed instruction stream as a contiguous stream of hex. This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.
Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
line
effect.
Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve. The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.
A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back). At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.
Suggested by: rpokala
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D23165
To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.
The bootrom segment is the high 16MB of the low 4GB region.
Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range. However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.
The existing ROM-file loading code is refactored to use the new interface.
Reviewed by: grehan (earlier version)
Differential Revision: https://reviews.freebsd.org/D24422
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000
Reported by: vmaffione, jhb
Approved by: vmaffione (mentor)
Differential Revision: https://reviews.freebsd.org/D23971