Certain operations such as checksum insertion and VLAN insertion
require the device model to rewrite the packet header. The first step
in rewriting the packet header is to copy the existing packet header
from the source packet. This copy is done by copying data from an
iovec array that corresponds to the S/G entries described by transmit
descriptors. However, if the total packet length is smaller than the
headers that need to be copied as the initial template, this copy can
overflow the iovec array and use garbage values as the source pointer
to memcpy. The PR used a single descriptor with a length of 0 in its
PoC.
To fix, track the total packet length and drop requests to transmit
packets whose payload is smaller than the required header length.
While here, fix another issue where the final descriptor could have an
invalid length (too short) that could underflow 'len' when stripping
the checksum. Skip those requests instead, too.
PR: 264372
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: grehan, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36182
This avoids type confusion where a malicious guest could rewrite the
MaxPStreams field in an endpoint context after the endpoint was
initialized causing the device model to interpret a guest provided
address (stored in ep_ringaddr of the "software" endpoint state) as a
bhyve host process address (ep_sctx_trbs). It also prevents a malicious
guest from triggering overflows of ep_sctx_trbs[] by increasing the
number of streams after the endpoint has been initialized.
Rather than re-reading the MaxPStreams value out of the endpoint context
in guest memory on subsequent operations, cache the value in the software
endpoint state. Possibly the device model should raise errors if the
value of MaxPStreams changes while an endpoint is running. This approach
simply ignores any such changes by the guest.
PR: 264294, 264347
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36181
Controllers must support the Identify Controller list if they support
Namespace Management. But the UNH NVMe tests use this command regardless
of whether the device under test supports Namespace Management.
This implementation returns an empty Controller list (i.e., Number of
Identifiers is zero).
Fixes UNH Test 1.1.2
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36193
The NVMe specification only allows Controllers compliant with the
revision 1.3 and earlier specification to report a value of 0x0 in the
No-Deallocate Modifies Media After Sanitize (NODMMAS) field.
For our revision 1.4 Controller, report that media is not modified after
Sanitize as the implementation does not implement Sanitize.
Fixes UNH Test 1.1.2
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36192
Summary:
Code was using the mask value without the shift.
Test Plan: Within FreeBSD/Linux guest, Identify NVMe controller to check the result.
Reviewed by: chuck, imp
MFC after: 2 weeks
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Differential Revision: https://reviews.freebsd.org/D32659
Summary:
Currently Active Firmware Info is not initialized.
Fix is to initialize the Active Firmware Info to Slot 1.
Test Plan: Within FreeBSD/Linux guests, show the Firmware Logpage to confirm.
Reviewed By: chuck
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32658
Summary:
Set Number of Power States Supported to indicate 1 power state. Keep the
Power State Descriptor data structures as zero to indicate "Not
reported".
Test Plan:
Within FreeBSD/Linux guests, list the number of power states and check
the Max Power value.
Reviewed By: markj, chuck
MFC after: 2 weeks
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Differential Revision: https://reviews.freebsd.org/D32657
The debug print in nvme_opc_get_log_page() would print an uninitialized
local variable.
In nvme_opc_write_read(), a failed LBA bounds check would cause
pci_nvme_stats_write_read_update() to be called with an uninitialized
variable as a parameter. Although the parameter is unused when the
check fails (and so status != 0), LLVM 14 emits some bogus machine code
in this path, which happens to result in a segfault when it gets
executed.
PR: 265749
Reviewed by: chuck, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36119
Currently these are not reported because bhyve is compiled with WARNS=2.
Let's start taking small steps towards enabling more warnings.
No functional change intended.
Reviewed by: chuck, imp, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36118
The Dataset Management code could free an uninitialized pointer if the
device doesn't support the Dataset Management command.
PR: 264548
Reported by: Robert Morris <rtm@lcs.mit.edu>
Fuzzing of bhyve uncovered an assertion failure in the NVMe emulation.
Investigation uncovered several corner cases the code did not handle.
This change handles several Controller initialization errors, including
- bad AQ sizes
- bad AQ vm_map_gpa
- doorbell writes prior to RDY
- doorbell writes to uninitialized queue
- CSTS.RDY if CFS set
PR: 256317,256319,256320,256322
Reported by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35453
Fuzzing of bhyve using hyfuzz discovered a way to cause a segmentation
fault in the NVMe emulation. If a guest specifies a physical address in
either the PRP1 or PRP2 field of a command that cannot be mapped from
guest to host, the function paddr_guest2host() returns a NULL pointer.
The NVMe emulation did not check for this error case, which allowed for
the segmentation fault to occur.
Fix is to check for a return value of NULL and indicate an error back to
the guest (Data Transfer error). While in the area, slightly refactor
the write/read blockif function to use a common error exit path.
PR: 256321
Reported by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D35452
Summary:
NVMe operations indicate the memory region(s) associated with a command
via physical region pages (PRPs). Since each PRP has a fixed size,
contiguous memory regions larger than the PRP size require multiple PRP
entries.
Instead of issuing a blockif call for each PRP, the NVMe emulation
concatenates multiple contiguous PRP entries into a single blockif
request. The test for contiguous regions has a bug such that it
mistakenly treats an initial PRP address of zero as a contiguous range
and concatenates it with the previous. But because there is no previous
IOV, the concatenation code corrupts the IO request structure and leads
to a segmentation fault when the blockif request completes.
Fix is to test for the existence of a previous range before trying to
concatenate the current range with the previous one.
While in the area, rename pci_nvme_append_iov_req()'s lba parameter to
offset to match its usage.
PR: 264177
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35328
Surrently virtio-net uses the prefix of the backing interface to
choose the backend. This patch adds an additional option "type" to
choose the backend type explicitly. This allows greater flexibility
for end users to manage bhyve specific resources (such as by naming
the tap interfaces to more descriptive names). The option "type" is
optional. When it is not presented, the backend is derived from the
name of the backend interface.
For example, the line `-s 3,virtio-net,bsdvm0,type=tap` will create a
virtio-net device for the guest using the tap interface "bsdvm0".
Adding a new "type" option preserves the current legacy format in which
the first value after virtio-net names an instance of a backend.
Note that tap interfaces not following the pattern "tap*" will not be
created on demand via devfs cloning but must be created explicitly.
Reviewed by: vmaffione, jhb
Differential Revision: https://reviews.freebsd.org/D35143
When pausing a block I/O device model as part of suspending a VM, wait
for all active block I/O requests to finish before saving snapshot
data. This avoids having to save information about in-flight requests
both in the block_if layer and in storage device models.
For the AHCI device model, the queues are now guaranteed to be idle
when taking a snapshot, so remove the code to save queue state and
rely on the initial state in a resumed VM having all queues already
idle.
This will also simplify adding NVMe snapshot support in the future.
Reviewed by: jhb
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D26267
Some software uses SMBIOS entries to identify the system on which it's
running. In order to make it possible to use such software inside a VM,
SMBIOS entries should be configurable. Therefore, bhyve_config can be
used. While only a few SMBIOS entries might be of interest, it makes
sense that all SMBIOS entries are configurable. This way all SMBIOS
tables are build the same way and there's no special handling for some
tables.
Reviewed by: jhb
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34465
virtio-console is currently missing .pe_legacy_config, which prevents any
portN configuration from being parsed, and therefore no sockets will be
created.
Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35142
Now that nvlist_send()/nvlist_recv() are being used, ditch the datagram
socket.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34863
pci_parse_legacy_config splits the options string by comma characters.
strchr returns a pointer to the first occurence of a character. In that
case, it's a comma. So, pci_parse_legacy_config will stop at the first
character and creates a new config node with a name of NULL.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34600
At the moment, writes to BAR registers that aren't 4 byte aligned are
ignored. So, there's no overflow yet. Nevertheless, if this behaviour
changes in the future, it could unintentionally, introduce a buffer
overflow. Additionally, some compiler or tools will detect this
potential overflow and complain about it.
Reviewed by: markj
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reported-by: Andy Fiddaman <andy@omniosce.org>
Differential Revision: https://reviews.freebsd.org/D34689
NAME_MAX is a better fit since strcat_extension() constructs the
filename of the snapshot file.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34291
Allows callers of vm_get_name() to retrieve the vm name without having
to allocate a buffer.
While in the vicinity, do minor cleanup in vm_snapshot_basic_metadata().
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34290
Explicitly initialize the mutex that a PCI virtio module passes back to
virtio.
It so happens that these mutexes were being initialized regardless, no
functional change intended.
Reviewed by: chuck, jhb
Differential Revision: https://reviews.freebsd.org/D34372
The Li macro is deprecated. Also, the Cm macro should be used here
instead for consistency with the rest of the manual and style.mdoc(5).
Fixes: e47fe3183e1f bhyve: add ROM emulation
MFC after: 1 month
For backward compatibility, the memory size will be interpreted in MB if
it's smaller than1 MB and has no suffix. Nowadays, the -m switch accepts
more than just MB. Respect it in the usage message.
Differential Revision: https://reviews.freebsd.org/D34506
Reviewed by: grehan
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).
It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.
Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
Export functions for reading and writing the pci config space from passthru
device to be used by other devices.
This is required for lpc devices to set their vendor/device ids to their
physical values.
Otherwise, GPU passthrough for integrated Intel GPUs won't work properly.
Differential Revision: https://reviews.freebsd.org/D33769
Reviewed by: markj
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
Even today it is possible to specify pinning for a vCPU higher than
the configured number of CPUs but lower than VM_MAXCPU without raising
an error.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34492
Use basl_ncpu instead of VM_MAXCPU in MADT_SIZE. Since several of the
offsets are no longer compile time constants, unroll the loop
generating ACPI tables.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34490
Use seperate nvlist entries for the romfile and the varfile.
While here, don't leak varfd in bootrom_loadrom().
Reviewed by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D33433
Advertise Namespace Attribute Notices events in the Optional
Asynchronous Events Supported (OAES) field of the Identify Controller
data structure. Additionally, rename the enums and macros to clarify
these are AEN's related to Notices and not generic information.
Reported by: andy@omniosce.org
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D34331