576 Commits

Author SHA1 Message Date
vmaffione
2ad2dfa579 bhyve: uniform printf format string newlines
Some of the printf statements only use LF to get a newline. However, a CR character is also required for the serial console to print debug logs in a nice way.
Fix those code locations that only use LF, by adding a CR character.

Reviewed by:	markj, aleksandr.fedorov@itglobal.com
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22552
2019-12-02 20:51:46 +00:00
vmaffione
9efa1d846d bhyve: virtio-net: disable receive until features are negotiated
This patch fixes a race condition where the receive callback is called
while the device is being reset. Since the rx_merge variable may change
during reset, the receive callback may operate inconsistently with what
the guest expects.
Also, get rid of the unused rx_vhdrlen variable.

PR:	242023
Reported by:	aleksandr.fedorov@itglobal.com
Reviewed by:	markj, jhb
MFC with:	r354552
Differential Revision:	https://reviews.freebsd.org/D22440
2019-11-19 21:10:44 +00:00
vmaffione
ce475b8a78 bhyve: rework mevent processing to fix a race condition
At the end of both mevent_add() and mevent_update(), mevent_notify()
is called to wakeup the I/O thread, that will call kevent(changelist)
to update the kernel.
A race condition is possible where the client calls mevent_add() and
mevent_update(EV_ENABLE) before the I/O thread has the chance to wake
up and call mevent_build()+kevent(changelist) in response to mevent_add().
The mevent_add() is therefore ignored by the I/O thread, and
kevent(fd, EV_ENABLE) is called before kevent(fd, EV_ADD), resuliting
in a failure of the kevent(fd, EV_ENABLE) call.

PR:	241808
Reviewed by:	jhb, markj
MFC with:	r354288
Differential Revision:	https://reviews.freebsd.org/D22286
2019-11-12 21:07:51 +00:00
vmaffione
bbda4a35c2 bhyve: add support for virtio-net mergeable rx buffers
Mergeable rx buffers is a virtio-net feature that allows the hypervisor
to use multiple RX descriptor chains to receive a single receive packet.
Without this feature, a TSO-enabled guest is compelled to publish only
64K (or 32K) long chains, and each of these large buffers is consumed
to receive a single packet, even a very short one. This is a waste of
memory, as a RX queue has room for 256 chains, which means up to 16MB
of buffer memory for each (single-queue) vtnet device.
With the feature on, the guest can publish 2K long chains, and the
hypervisor will merge them as needed.

This change also enables the feature in the netmap backend, which
supports virtio-net offloads. We plan to add support for the
tap backend too.
Note that differently from QEMU/KVM, here we implement one-copy receive,
while QEMU uses two copies.

Reviewed by:    jhb
MFC after:      3 weeks
Differential Revision:	https://reviews.freebsd.org/D21007
2019-11-08 17:57:03 +00:00
vmaffione
621be0aeda bhyve: add backend rx backpressure to virtio-net
If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by:	bryanv, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20987
2019-11-03 19:02:32 +00:00
vmaffione
e1bd0b5308 bhyve: fix mistake introduced by r352841
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20973
2019-11-03 18:53:42 +00:00
jkim
e7c86565cd Catch up with ACPICA 20191018.
PR:		241467
XMFC with:	r353764
2019-10-24 22:33:46 +00:00
vmaffione
fc21e0e751 bhyve: support for enabling/disabling the net backend
Extend the net backend interface with two functions, namely netbe_rx_disable()
and netbe_rx_enable(), which can be used by the net device emulators to stop
the backend from invoking the receive callback. This is useful for device
emulators, i.e., on hardware resets or to implement receive backpressure.
The mevent module has been extendede to support the addition of a disabled
event. To prevent race conditions, the net backends will start with receive
operation disabled. A follow-up patch will use the new functionalities in
the virtio-net device.

Reviewed by:	jhb, markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D20973
2019-09-28 12:02:43 +00:00
jhb
242d0be1ff Validate guest-supplied length of headers for TSO transmit requests.
When transmitting a large TCP packet, the final transmit descriptor
includes the length of the protocol headers to be duplicated on each
segment.  The device model was trusting the guest-supplied value
without validating it.  A value of zero would result in the guest
being able to indirect a garbage pointer on the stack to overwrite
arbitrary memory in the bhyve process.  A value that was non-zero but
too small for the requested parameters resulted in the device model
reading and writing values beyond the end of the on-stack buffer used
to hold the template header.

To fix, validate the supplied length and drop requests to transmit
packets that would overflow the header buffer.  While here, initialize
the header pointer to NULL as a preventive measure so that any access
to an unallocated template header crashes they hypervisor
deterministically.

While here, only read the TCP sequence number if the packet being
split is a TCP packet.  The e1000 logic supports a segmentation of UDP
frames, and while UDP segmentation requires this part of the header to
be valid (so there is no buffer overflow), only reading the field when
needed is cleaner.

admbugs:	918
Reported by:	Reno Robert <renorobert@gmail.com>
Reviewed by:	markj
Approved by:	so
Security:	CVE-2019-5609
2019-08-05 21:39:55 +00:00
scottl
b151e337d8 Fix the register layout for the Buffer Descript List Entry. It
got jumbled around during some other cleanups and was causing
audio failures on some guests.

PR:		239341
Reported by:	shamaz.mazum@gmail.com
2019-07-23 18:40:07 +00:00
emaste
baac2f0930 bhyve: correct out-of-bounds read in XHCI device emulation
Add appropriate bounds checks on the epid and streamid fields in the
device doorbell registers.

admbugs:	919
Submitted by:	jhb
Reported by:	Reno Robert <renorobert@gmail.com>
Reviewed by:	markj
Approved by:	so
Security:	out-of-bounds read
2019-07-23 16:27:36 +00:00
chuck
b9ac5b2377 bhyve: update the NVMe CQ based on the status
Instead of skipping the NVMe Completion Queue update based on the
opcode, define a synthetic status value which indicates the completion
queue entry is invalid. This will also allow deferred completion queue
updates for other commands.

Also returns the correct status for unrecognized opcodes ("invalid
opcode").

Reviewed by:	imp, jhb, araujo
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D20945
2019-07-17 03:19:30 +00:00
chuck
2449e50de0 bhyve: Create EUI64 for NVMe namespaces
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
 - the FreeBSD IEEE OUI
 - PCI bus, device/slot, function values
 - Namespace ID

Reviewed by:	imp, araujo, jhb, rgrimes
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
2019-07-13 12:48:28 +00:00
seanc
32757e2797 usr.sbin/bhyve: close backend file descriptor during tap init error
Coverity CID:	1402953
Reviewed by:	scottl, markj, aleksandr.fedorov -at- itglobal.com
Approved by:	vmaffione, jhb
Differential Revision:	https://reviews.freebsd.org/D20913
2019-07-12 18:50:46 +00:00
seanc
37baa108b8 usr.sbin/bhyve: only unassign a pt device after obtaining bus/slot/func
Coverity CID:	1194302, 1194303, 1194304
Approved by:	jhb, markj
Differential Revision:	https://reviews.freebsd.org/D20933
2019-07-12 18:33:58 +00:00
seanc
c1d34a7231 usr.sbin/bhyve: free resources when erroring out of pci_vtcon_sock_add()
Coverity CID:	1362880
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20916
2019-07-12 18:20:56 +00:00
seanc
862e8ad26b usr.sbin/bhyve: prevent use-after-free in virtio scsi request handling
Coverity CID:	1393377
Approved by:	araujo, jhb
Differential Revision:	https://reviews.freebsd.org/D20915
2019-07-12 18:17:35 +00:00
seanc
e8c1d8680e usr.sbin/bhyve: don't leak a FD if the device is not a tty
Coverity CID:	1194193
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20934
2019-07-12 18:13:58 +00:00
seanc
9a5b5c49bc usr.sbin/bhyve: unconditionally initialize the NVMe completion status
Follow-up work to improve the handling of unsupported/invalid opcodes
is being developed by chuck@.

Coverity CID:	1398928
Reviewed by:	chuck
Approved by:	araujo, imp
Differential Revision:	https://reviews.freebsd.org/D20914
2019-07-12 05:53:13 +00:00
seanc
d582c0105f usr.sbin/bhyve: free resources when erroring out of pci_vtnet_init()
Coverity CID:	1402978
Approved by:	vmaffione
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D20912
2019-07-12 05:19:37 +00:00
seanc
7d876eb100 usr.sbin/bhyve: send an initialized value to wake up blocking kqueue
This is a no-op initialization because nothing reads this value.  "This
wasn't wrong previously, but this is more correct now." -imp

Coverity CID:	1194307
Approved by:	markj, imp, scottl
Differential Revision:	https://reviews.freebsd.org/D20921
2019-07-11 23:54:50 +00:00
seanc
6186c641de usr.sbin/bhyve: commit miss from r349918
Submitted by:	markj
Approved by:	markj
Differential Revision:	https://reviews.freebsd.org/D20918
2019-07-11 19:51:33 +00:00
seanc
1c38da32d4 usr.sbin/bhyve: free leaked memory during option parsing
Also update to use strsep(3) instead of strtok(3).

Most of this commit inadvertently ended up in r349914.

Coverity CID:	1357337
Approved by:	markj
PR:		233038
Differential Revision:	https://reviews.freebsd.org/D20918
2019-07-11 19:41:14 +00:00
seanc
e2b54523d9 usr.sbin/bhyve: initialize return value in xhci device interrupt handler
Coverity CID:	1357340
Approved by:	scottl, markj
Differential Revision:	https://reviews.freebsd.org/D20917
2019-07-11 19:26:35 +00:00
seanc
ef5561a2c4 usr.sbin/bhyve: free resources if there is an initialization error in rfb
Coverity CID:	1357335
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20919
2019-07-11 19:07:45 +00:00
vmaffione
c098642c9b bhyve: net_backends.c: add missing __FBSDID
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20883
2019-07-09 22:05:58 +00:00
vmaffione
fa786d0d8c bhyve: add missing license identifiers in net_utils and net_backend
Reviewed by:	jhb, markj, imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20874
2019-07-09 22:04:33 +00:00
vmaffione
c8edafdabc bhyve: abstraction for network backends
Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Reviewed by:	jhb, bryanv
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20659
2019-07-07 12:15:24 +00:00
seanc
88bb3225d5 bhyve/audio: don't leak resources on failed initialization.
Coverity CID:	1402793
Approved by:	markj, jhb, bhyve
Differential Revision:	https://reviews.freebsd.org/D20841
2019-07-03 17:24:24 +00:00
wblock
e7d5018503 Correct name of vmm(4) pptdevs variable.
Reported by:	nwolff@ixsystems.com
2019-07-02 14:53:51 +00:00
jhb
1e772efaed Use __FBSDID() and sort #includes.
No functional change.
2019-06-27 21:45:40 +00:00
emaste
aa773f3e23 bhyve: avoid theoretical stack buffer overflow from integer overflow
Use the proper size_t type to match strlen's return type.  This is not
exploitable in practice as this parses command line arguments, which
are limited to well below 2^31 bytes.

This is a minimal change to address the reported issue; hda_parse_config
and the rest of this file will benefit from further review.

Reported by:	Fakhri Zulkifli
Reviewed by:	jhb, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-06-25 19:06:43 +00:00
araujo
5030363470 Add SPDX tags to bhyve(8) HD Audio device.
Reviewed by:	bcran
Differential Revision:	https://reviews.freebsd.org/D20750
2019-06-25 06:24:56 +00:00
imp
b86e15505e Replay r349336 by scottl accidentally reverted by r349352
Add a section about the HD Audio module support
2019-06-25 06:14:11 +00:00
imp
650d07809a Replay r349335 by scottl accidentally reverted by r349352
Add the PCI HDAudio device model from the 2016 GSoC.  Detailed information
can be found at

https://wiki.freebsd.org/SummerOfCode2016/HDAudioEmulationForBhyve

This commit has evolved from the original work to include Capsicum
integration.  As part of that, it only opens the host audio devices once
and leaves them open, instead of opening and closing them on each guest
access.  Thanks to Peter Grehan and Marcelo Araujo for their help in
bringing the work forward and providing some of the final techncial push.

Submitted by:	Alex Teaca <iateaca@freebsd.org>
Differential Revision:	D7840, D12419
2019-06-25 06:14:05 +00:00
imp
0ea6c510f8 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
scottl
8d6977f94a Add a section about the HD Audio module support 2019-06-24 19:42:32 +00:00
scottl
9a56ca1b83 Add the PCI HDAudio device model from the 2016 GSoC. Detailed information
can be found at

https://wiki.freebsd.org/SummerOfCode2016/HDAudioEmulationForBhyve

This commit has evolved from the original work to include Capsicum
integration.  As part of that, it only opens the host audio devices once
and leaves them open, instead of opening and closing them on each guest
access.  Thanks to Peter Grehan and Marcelo Araujo for their help in
bringing the work forward and providing some of the final techncial push.

Submitted by:	Alex Teaca <iateaca@freebsd.org>
Differential Revision:	D7840, D12419
2019-06-24 19:31:32 +00:00
vangyzen
4c3575eb30 bhyve: Fix vtscsi maximum segment config
The seg_max value reported to the guest should be two less than the
host's maximum, in order to leave room for the request and the
response.  This is analogous to r347033 for virtio_block.

We hit the "too many segments to enqueue" assertion on OneFS because
we increase MAXPHYS to 256 KB.

Reviewed by:	bryanv
Discussed with:	cem jhb rgrimes
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20529
2019-06-21 18:57:33 +00:00
markj
395e29f3c0 Make zlib encoding messages idempotent.
Otherwise duplicate messages can trigger a reinitialization of the
compression stream while the update thread is running.  Also ensure
that the stream is initialized before the update thread may attempt
to use it.

PR:		238333
Reviewed by:	cem, rgrimes
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20673
2019-06-19 16:09:20 +00:00
vmaffione
9fc6ef7193 bhyve: vtnet: fix locking on receive
The vsc_rx_ready and the RX virtqueue is protected by the rx_mtx lock.
However, pci_vtnet_ping_rxq() (currently called only once after each
device reset) accesses those without acquiring the lock.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20609
2019-06-18 17:51:30 +00:00
vmaffione
6ca4e46aa6 bhyve: move common code to net_utils.c
Both virtio_net and e82545 network frontends have code to validate and
generate MAC addresses. These functionalities are replicated in the two
files, so we move them in a separate compilation unit.

Reviewed by:	rgrimes, bryanv, imp, kevans
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20626
2019-06-13 17:39:32 +00:00
jhb
244c928774 Remove a spurious break when setting up a 64-bit memory BAR.
This was causing 'enbit' to not be initialized in this case.

CID:		1401924
Reported by:	Coverity
MFC after:	1 week
2019-06-12 16:49:01 +00:00
vmaffione
59105dcf07 bhyve: virtio: introduce vq_kick_enable() and vq_kick_disable()
The VirtIO standard supports two schemes for notification suppression:
a notification enable bit and a more sophisticated one (event_idx) that
also supports delayed notifications. Currently bhyve fully supports
only the first scheme. This patch hides the notification suppression
internals by means of two inline routines, vq_kick_enable() and
vq_kick_disable(), and makes the code more readable.
Moreover, further improve readability by replacing the call to mb()
with a call to atomic_thread_fence_seq_cst(), which is already used
in virtio.c

Reviewed by:	pmooney_pfmooney.com, bryanv
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20581
2019-06-11 15:52:41 +00:00
vmaffione
e5906c2293 bhyve: vtnet: simplify thread synchronization
On vtnet device reset it is necessary to wait for threads to stop TX and
RX processing. However, the rx_in_progress variable (used for to wait for
RX processing to stop) is actually useless, and can be removed. Acquiring
and releasing the RX lock is enough to synchronize correctly. Moreover,
it is possible to reset the device while holding both TX and RX locks, so
that the "resetting" variable becomes unnecessary for the RX thread, and
can be protected by the TX lock (instead of being volatile).

Reviewed by:	jhb, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20543
2019-06-09 12:41:21 +00:00
chuck
f04fbdcb9d bhyve: Add PCIe Integrated Endpoint capability
The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:

nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link

The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.

Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.

Reviewed by:	imp, ken, araujo, jhb, rgrimes
Approved by:	imp (mentor), ken (mentor), jhb (maintainer)
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D19904
2019-06-07 17:09:49 +00:00
jhb
8e4b03df46 Keep the shadow PCIR_COMMAND synced with the real one for pass through.
This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices.  To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.

Reviewed by:	rgrimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20531
2019-06-07 15:53:27 +00:00
jhb
44305ec379 Enable memory and I/O decoding in PCI devices on demand.
Rather than uncoditionally setting the MEMEN and PORTEN bits in
PCIR_COMMAND for PCI devices, set the respective bit when the first
BAR of a given type is added to the device.  This more closely matches
what firmware does on bare metal.

BUSMASTEREN is still set unconditionally.  Eventually this bit should
move into the device models as not all device models need this set.

Reviewed by:	rgrimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20530
2019-06-07 15:48:12 +00:00
jhb
a9755a09c9 Use parse_integer to avoid sign extension.
Coverity warned about gdb_write_mem sign extending the result of
parse_byte shifted left by 24 bits when generating a 32-bit memory
write value for MMIO.  Simplify the code by using parse_integer
instead of unrolled parse_byte calls.

CID:		1401600
Reviewed by:	cem
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D20508
2019-06-05 23:37:50 +00:00
jhb
004c60515d Don't simulate PBA access if the PBA is in a separate BAR.
bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to.  For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.

bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table.  However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.

However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer.  Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.

Reported by:	gallatin
Tested by:	gallatin
Reviewed by:	markj, Patrick Mooney
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20523
2019-06-05 19:29:02 +00:00