Instead of skipping the NVMe Completion Queue update based on the
opcode, define a synthetic status value which indicates the completion
queue entry is invalid. This will also allow deferred completion queue
updates for other commands.
Also returns the correct status for unrecognized opcodes ("invalid
opcode").
Reviewed by: imp, jhb, araujo
Approved by: imp (mentor), jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20945
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
- the FreeBSD IEEE OUI
- PCI bus, device/slot, function values
- Namespace ID
Reviewed by: imp, araujo, jhb, rgrimes
Approved by: imp (mentor), jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
Follow-up work to improve the handling of unsupported/invalid opcodes
is being developed by chuck@.
Coverity CID: 1398928
Reviewed by: chuck
Approved by: araujo, imp
Differential Revision: https://reviews.freebsd.org/D20914
The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:
nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link
The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.
Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.
Reviewed by: imp, ken, araujo, jhb, rgrimes
Approved by: imp (mentor), ken (mentor), jhb (maintainer)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19904
bhyve's NVMe emulation was transferring Identify data back to the guest
incorrectly causing memory corruptions. These corruptions resulted in
core dumps and other system level errors in the guest.
In their simplest form, NVMe Physical Region Page (PRP) values in
commands indicate which physical pages to use for data transfer. The
first PRP value is not required to be page aligned but does not cross a
page boundary. The second PRP value must be page aligned, does not cross
a page boundary, and need not be contiguous with PRP1.
The code was copying Identify data past the end of PRP1. This happens to
work if PRP1 and PRP2 are physically contiguous but will corrupt guest
memory in unpredictable ways if they are not.
Fix is to copy the Identify data back to the guest piecewise (i.e. for
each PRP entry). Also fix a similarly wrong problem when copying back
Log page data.
Reviewed by: imp (mentor), araujo, jhb, rgrimes, bhyve
Approved by: imp (mentor), bhyve (jhb)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19695
The NVMe specification defines bits 13:4 of BAR0 as Reserved (i.e. 0x0).
Most drivers do not enforce this, but the Windows NVMe driver does and
will refuse to start the device (i.e. error 10) if any of these bits are
set.
The current BAR size calculation tries to minimize the amount of memory
the device reserves by scaling the BAR size by the maximum number of
queues supported by the device. But unless the device supports a large
number of queue pairs (over 1536), it will reserve too little memory.
The fix is to allocate a minimum of 16K bytes for BAR0.
Tested on Windows Server 2016 and 2019
Reviewed by: imp (mentor), araujo, jhb, bhyve
Approved by: imp (mentor), bhyve (jhb)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19676
The NVMe Identify Namespace data structure's Number of LBA Formats
(NLBAF) field is a 0's based value (i.e. 0x0 means 1). Since the
emulation only supports a single format, set NLBAF to 0x0, not 1.
Reviewed by: imp, araujo, rgrimes
Approved by: imp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19579
The function which processes Admin commands was not returning the
Command Specific value in Completion Queue Entry, Dword 0 (CDW0). This
effects commands such as Set Features, Number of Queues which returns
the number of queues supported by the device in CDW0. In this case, the
host will only create 1 queue pair (Number of Queues is zero based).
This also masked a bug in the queue counting logic.
Reviewed by: imp, araujo
Approved by: imp (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D18703
Many size / length parameters in NVMe are "0's based", meaning, a value
of 0x0 represents 1, 0x1 represents 2, etc.. While this leads to an
efficient encoding, it can lead to subtle bugs. With respect to queues,
these parameters include:
- Maximum number of queue entries
- Maximum number of queues
- Number of Completion Queues
- Number of Submission Queues
To be consistent, convert all 0's based values from the host to 1's
based value internally. Likewise, covert internal 1's based values to
0's based values when returned to the host. This fixes an off-by-one bug
when creating IO queues and simplifies some of the code. Note that this
bug is masked by another bug.
While in the neighborhood,
- fix an erroneous queue ID check (checking CQ count when deleting SQ)
- check for queue ID of 0x0 in a few places where this is illegal
- clean up the Set Features, Number of Queues command and check for
illegal values
Reviewed by: imp, araujo
Approved by: imp (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D18702
Also switch from int to size_t to keep portability.
Reviewed by: brooks
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D17795
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).
This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.
This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.
Bump __FreeBSD_version to 1200081 due to API change
Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
The initial work on bhyve NVMe device emulation was done by the GSoC student
Shunsuke Mie and was heavily modified in performan, functionality and
guest support by Leon Dang.
bhyve:
-s <n>,nvme,devpath,maxq=#,qsz=#,ioslots=#,sectsz=#,ser=A-Z
accepted devpath:
/dev/blockdev
/path/to/image
ram=size_in_MiB
Tested with guest OS: FreeBSD Head, Linux Fedora fc27, Ubuntu 18.04,
OpenSuse 15.0, Windows Server 2016 Datacenter.
Tested with all accepted device paths: Real nvme, zdev and also with ram.
Tested on: AMD Ryzen Threadripper 1950X 16-Core Processor and
Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz.
Tests at: https://people.freebsd.org/~araujo/bhyve_nvme/nvme.txt
Submitted by: Shunsuke Mie <sux2mfgj_gmail.com>,
Leon Dang <leon_digitalmsx.com>
Reviewed by: chuck (early version), grehan
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D14022