The netmap-specific data stored at be->opaque is freed by the caller
on error as part of freeing be.
Reviewed by: markj
Reported by: GCC -Wfree-nonheap-object
Differential Revision: https://reviews.freebsd.org/D36828
- Add const and __unused qualifiers where appropriate.
- Localize some global variables.
- Consistently spell vmexit state as "vme" in vmexit handlers, to avoid
shadowing the global vm_exit state array.
- Similarly, avoid shadowing "optarg".
MFC after: 2 weeks
This is easier to read and addresses some compiler warnings.
One might expect these tables to be read-only but it seems that the
snapshot/restore code may modify them.
MFC after: 2 weeks
- Avoid arithmetic on void pointers.
- Avoid a signed/unsigned comparison in loops which write or fill audio
data buffers.
Convert while loops to for loops while here.
MFC after: 2 weeks
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG
vcpus could be restarted by the guest by sending an INIT SIPI SIPI
sequence to a vcpu. That's not supported by bhyve yet but it will be
supported in a future commit. So, create the vcpu threads only once on
startup to make restarting a vcpu easier.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35621
Sponsored by: Beckhoff Automation GmbH & Co. KG
- Ignore I/O requests with insufficiently sized input or output
buffers (those not containing compete request headers).
- Ignore control requests with improperly sized buffers.
- While here, explicitly zero the output header of an I/O request to
avoid leaking malloc garbage from the host if the header is not
fully populated.
PR: 264521
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mav, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36271
Use a consistent prefix ("virtio-scsi: ") similar to the e1000 device
model.
Reviewed by: mav, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36270
When preparing to transmit pending packets, ensure that the head (TDH)
and tail (TDT) indices are in bounds. Note that validating values
when they are written is not sufficient along as the transmit length
(TDLEN) could be changed turning a value that was valid when written
into an out of bounds value.
While here, add further restrictions to the head register (TDH). The
manual states that writing to this value while transmit is enabled can
cause unexpected behavior and that it should only be written after a
reset. As such, ignore attempts to write while transmit is active,
and also ignore writes of non-zero values. Later e1000 chipsets have
this register as read-only.
Also ignore any attempts to transmit packets if the transmit ring's
size is zero.
PR: 264567
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36269
Permit naming pass through devices using the syntax accepted by
pciconf (pci[<domain>:]<bus>:<slot>:<func>) as well as by device name
(e.g. "ppt0").
While here, fix an error in the manpage that had the bus and slot
arguments for the original /-delimited scheme swapped.
Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36147
Certain operations such as checksum insertion and VLAN insertion
require the device model to rewrite the packet header. The first step
in rewriting the packet header is to copy the existing packet header
from the source packet. This copy is done by copying data from an
iovec array that corresponds to the S/G entries described by transmit
descriptors. However, if the total packet length is smaller than the
headers that need to be copied as the initial template, this copy can
overflow the iovec array and use garbage values as the source pointer
to memcpy. The PR used a single descriptor with a length of 0 in its
PoC.
To fix, track the total packet length and drop requests to transmit
packets whose payload is smaller than the required header length.
While here, fix another issue where the final descriptor could have an
invalid length (too short) that could underflow 'len' when stripping
the checksum. Skip those requests instead, too.
PR: 264372
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: grehan, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36182
This avoids type confusion where a malicious guest could rewrite the
MaxPStreams field in an endpoint context after the endpoint was
initialized causing the device model to interpret a guest provided
address (stored in ep_ringaddr of the "software" endpoint state) as a
bhyve host process address (ep_sctx_trbs). It also prevents a malicious
guest from triggering overflows of ep_sctx_trbs[] by increasing the
number of streams after the endpoint has been initialized.
Rather than re-reading the MaxPStreams value out of the endpoint context
in guest memory on subsequent operations, cache the value in the software
endpoint state. Possibly the device model should raise errors if the
value of MaxPStreams changes while an endpoint is running. This approach
simply ignores any such changes by the guest.
PR: 264294, 264347
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36181
Controllers must support the Identify Controller list if they support
Namespace Management. But the UNH NVMe tests use this command regardless
of whether the device under test supports Namespace Management.
This implementation returns an empty Controller list (i.e., Number of
Identifiers is zero).
Fixes UNH Test 1.1.2
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36193
The NVMe specification only allows Controllers compliant with the
revision 1.3 and earlier specification to report a value of 0x0 in the
No-Deallocate Modifies Media After Sanitize (NODMMAS) field.
For our revision 1.4 Controller, report that media is not modified after
Sanitize as the implementation does not implement Sanitize.
Fixes UNH Test 1.1.2
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36192
Summary:
Code was using the mask value without the shift.
Test Plan: Within FreeBSD/Linux guest, Identify NVMe controller to check the result.
Reviewed by: chuck, imp
MFC after: 2 weeks
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Differential Revision: https://reviews.freebsd.org/D32659
Summary:
Currently Active Firmware Info is not initialized.
Fix is to initialize the Active Firmware Info to Slot 1.
Test Plan: Within FreeBSD/Linux guests, show the Firmware Logpage to confirm.
Reviewed By: chuck
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32658
Summary:
Set Number of Power States Supported to indicate 1 power state. Keep the
Power State Descriptor data structures as zero to indicate "Not
reported".
Test Plan:
Within FreeBSD/Linux guests, list the number of power states and check
the Max Power value.
Reviewed By: markj, chuck
MFC after: 2 weeks
Signed-off-by: Wanpeng Qian <wanpengqian@gmail.com>
Differential Revision: https://reviews.freebsd.org/D32657
The debug print in nvme_opc_get_log_page() would print an uninitialized
local variable.
In nvme_opc_write_read(), a failed LBA bounds check would cause
pci_nvme_stats_write_read_update() to be called with an uninitialized
variable as a parameter. Although the parameter is unused when the
check fails (and so status != 0), LLVM 14 emits some bogus machine code
in this path, which happens to result in a segfault when it gets
executed.
PR: 265749
Reviewed by: chuck, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36119
Currently these are not reported because bhyve is compiled with WARNS=2.
Let's start taking small steps towards enabling more warnings.
No functional change intended.
Reviewed by: chuck, imp, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36118
The Dataset Management code could free an uninitialized pointer if the
device doesn't support the Dataset Management command.
PR: 264548
Reported by: Robert Morris <rtm@lcs.mit.edu>
Fuzzing of bhyve uncovered an assertion failure in the NVMe emulation.
Investigation uncovered several corner cases the code did not handle.
This change handles several Controller initialization errors, including
- bad AQ sizes
- bad AQ vm_map_gpa
- doorbell writes prior to RDY
- doorbell writes to uninitialized queue
- CSTS.RDY if CFS set
PR: 256317,256319,256320,256322
Reported by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35453
Fuzzing of bhyve using hyfuzz discovered a way to cause a segmentation
fault in the NVMe emulation. If a guest specifies a physical address in
either the PRP1 or PRP2 field of a command that cannot be mapped from
guest to host, the function paddr_guest2host() returns a NULL pointer.
The NVMe emulation did not check for this error case, which allowed for
the segmentation fault to occur.
Fix is to check for a return value of NULL and indicate an error back to
the guest (Data Transfer error). While in the area, slightly refactor
the write/read blockif function to use a common error exit path.
PR: 256321
Reported by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D35452
Summary:
NVMe operations indicate the memory region(s) associated with a command
via physical region pages (PRPs). Since each PRP has a fixed size,
contiguous memory regions larger than the PRP size require multiple PRP
entries.
Instead of issuing a blockif call for each PRP, the NVMe emulation
concatenates multiple contiguous PRP entries into a single blockif
request. The test for contiguous regions has a bug such that it
mistakenly treats an initial PRP address of zero as a contiguous range
and concatenates it with the previous. But because there is no previous
IOV, the concatenation code corrupts the IO request structure and leads
to a segmentation fault when the blockif request completes.
Fix is to test for the existence of a previous range before trying to
concatenate the current range with the previous one.
While in the area, rename pci_nvme_append_iov_req()'s lba parameter to
offset to match its usage.
PR: 264177
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35328
Surrently virtio-net uses the prefix of the backing interface to
choose the backend. This patch adds an additional option "type" to
choose the backend type explicitly. This allows greater flexibility
for end users to manage bhyve specific resources (such as by naming
the tap interfaces to more descriptive names). The option "type" is
optional. When it is not presented, the backend is derived from the
name of the backend interface.
For example, the line `-s 3,virtio-net,bsdvm0,type=tap` will create a
virtio-net device for the guest using the tap interface "bsdvm0".
Adding a new "type" option preserves the current legacy format in which
the first value after virtio-net names an instance of a backend.
Note that tap interfaces not following the pattern "tap*" will not be
created on demand via devfs cloning but must be created explicitly.
Reviewed by: vmaffione, jhb
Differential Revision: https://reviews.freebsd.org/D35143
When pausing a block I/O device model as part of suspending a VM, wait
for all active block I/O requests to finish before saving snapshot
data. This avoids having to save information about in-flight requests
both in the block_if layer and in storage device models.
For the AHCI device model, the queues are now guaranteed to be idle
when taking a snapshot, so remove the code to save queue state and
rely on the initial state in a resumed VM having all queues already
idle.
This will also simplify adding NVMe snapshot support in the future.
Reviewed by: jhb
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D26267
Some software uses SMBIOS entries to identify the system on which it's
running. In order to make it possible to use such software inside a VM,
SMBIOS entries should be configurable. Therefore, bhyve_config can be
used. While only a few SMBIOS entries might be of interest, it makes
sense that all SMBIOS entries are configurable. This way all SMBIOS
tables are build the same way and there's no special handling for some
tables.
Reviewed by: jhb
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34465
virtio-console is currently missing .pe_legacy_config, which prevents any
portN configuration from being parsed, and therefore no sockets will be
created.
Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35142
Now that nvlist_send()/nvlist_recv() are being used, ditch the datagram
socket.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34863
pci_parse_legacy_config splits the options string by comma characters.
strchr returns a pointer to the first occurence of a character. In that
case, it's a comma. So, pci_parse_legacy_config will stop at the first
character and creates a new config node with a name of NULL.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34600
At the moment, writes to BAR registers that aren't 4 byte aligned are
ignored. So, there's no overflow yet. Nevertheless, if this behaviour
changes in the future, it could unintentionally, introduce a buffer
overflow. Additionally, some compiler or tools will detect this
potential overflow and complain about it.
Reviewed by: markj
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reported-by: Andy Fiddaman <andy@omniosce.org>
Differential Revision: https://reviews.freebsd.org/D34689
NAME_MAX is a better fit since strcat_extension() constructs the
filename of the snapshot file.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34291