Commit Graph

598 Commits

Author SHA1 Message Date
Rebecca Cran
e23fe873b2 Bhyve: fix SMBIOS Type 17 table generation
According to the SMBIOS specification (revision 2.7 or newer), the
extended module size field should only be used for sizes that can't
fit in the older size field.

Reviewed by:	rgrimes, grehan, jhb
Differential Revision:	https://reviews.freebsd.org/D24107
2020-03-31 02:36:39 +00:00
Chuck Tuffli
1264a2b909 bhyve: fix NVMe emulation update of SQHD
The SQHD field of a Completion Queue entry indicates the current
Submission Queue head pointer value. The head pointer represents the
next entry to be consumed and is updated after consuming the current
entry.

In the Admin queue processing, the current code updates the head pointer
after reporting the value to the host via the SQHD. This gives the
impression that the Controller is perpetually one command behind in its
processing of the Admin SQ. And while this doesn't appear to bother some
initiators, it is wrong.

Fix is to update the SQ head pointer prior to writing the SQHD value in
the completion.

While here, fix missed update of dword 0 (cdw0) in the completion
message.

Reported by:	khng300
Reviewed by:	jhb, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24083
2020-03-27 15:28:27 +00:00
Chuck Tuffli
961be12f6a bhyve: fix NVMe emulation missed interrupts
The bhyve NVMe emulation has a race in the logic which generates command
completion interrupts. On FreeBSD guests, this manifests as kernel log
messages similar to:
    nvme0: Missing interrupt

The NVMe emulation code sets a per-submission queue "busy" flag while
processing the submission queue, and only generates an interrupt when
the submission queue is not busy.

Aside from being counter to the NVMe design (i.e. interrupt properties
are tied to the completion queue) and adding complexity (e.g. exceptions
to not generating an interrupt when "busy"), it causes a race condition
under the following conditions:
 - guest OS has no outstanding interrupts
 - guest OS submits a single NVMe IO command
 - bhyve emulation processes the SQ and sets the "busy" flag
 - bhyve emulation submits the asynchronous IO to the backing storage
 - IO request to the backing storage completes before the SQ processing
   loop exits and doesn't generate an interrupt because the SQ is "busy"
 - bhyve emulation finishes processing the SQ and clears the "busy" flag

Fix is to remove the "busy" flag and generate an interrupt when the CQ
head and tail pointers do not match.

Reported by:	khng300
Reviewed by:	jhb, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24082
2020-03-27 15:28:22 +00:00
Chuck Tuffli
f3e46ff932 bhyve: use STAILQ in NVMe emulation
Use the standard queue(3) macros instead of hand-crafted linked list
code.

Reviewed by:	imp, jhb
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24081
2020-03-27 15:28:16 +00:00
Chuck Tuffli
cd65e08916 bhyve: implement NVMe deallocate command
This adds support for the Dataset Management (DSM) command to the NVMe
emulation in general, and more specifically, for the deallocate
attribute (a.k.a. trim in the ATA protocol). If the backing storage for
the namespace supports delete (i.e. deallocate), setting the deallocate
attribute in a DSM will trim/delete the requested LBA ranges in the
underlying storage.

Reviewed by:	jhb, araujo, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21839
2020-03-27 15:28:11 +00:00
Chuck Tuffli
d31d525ef5 bhyve: refactor NVMe namespace initialization
Pass the struct pci_nvme_blockstore pointer for this namespace to the
namespace initialization function instead of only the desired eui64
value.

Minor functional change in that the code updates the eui64 value in the
blockstore.

Reviewed by:	jhb, araujo
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21838
2020-03-27 15:28:05 +00:00
Chuck Tuffli
da8de3e9a8 bhyve: refactor NVMe PRP memcpy
Add a "copy direction" parameter to nvme_prp_memcpy such that data can
be copied to the memory specified by the PRP entries (current behavior)
or copied from the PRP entries (new behavior). The upcoming deallocate
functionality will use the copy from capability.

Reviewed by:	jhb, araujo
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21837
2020-03-27 15:28:00 +00:00
Rebecca Cran
a717adb5a0 Bhyve: log message when rfb client connects
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D24098
2020-03-18 03:17:15 +00:00
Rebecca Cran
cbd7ddcf65 Bhyve: DPRINTF already includes newline, so don't add another
Reviewed by:	jhb, vmaffione, emaste
Differential Revision:	https://reviews.freebsd.org/D24099
2020-03-18 03:15:57 +00:00
John Baldwin
98ee12e64e Use stream_read() to read all 12 bytes of the RFB client version.
read() can return a short read, whereas stream_read() waits until the
full version string is read.

Submitted by:	Ka Ho Ng <khng300_gmail.com>
Reviewed by:	grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23591
2020-02-27 16:51:41 +00:00
Vincenzo Maffione
f92bb8c19a bhyve: enable virtio-net mergeable rx buffers for tap(4)
This patch adds a new netbe_peek_recvlen() function to the net
backend API. The new function allows the virtio-net receive code
to know in advance how many virtio descriptors chains will be
needed to receive the next packet. As a result, the implementation
of the virtio-net mergeable rx buffers feature becomes efficient,
so that we can enable it also with the tap(4) backend. For the
tap(4) backend, a bounce buffer is introduced to implement the
peeck_recvlen() callback, which implies an additional packet copy
on the receive datapath. In the future, it should be possible to
remove the bounce buffer (and so the additional copy), by
obtaining the length of the next packet from kevent data.

Reviewed by:    grehan, aleksandr.fedorov@itglobal.com
MFC after:      1 week
Differential Revision:	https://reviews.freebsd.org/D23472
2020-02-20 21:07:23 +00:00
Konstantin Belousov
5a6d45d015 bhyve, bhyvectl: Add Hygon Dhyana support.
Submitted by:	Pu Wen <puwen@hygon.cn>
Reviewed by:	jhb
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23554
2020-02-13 19:05:14 +00:00
Vincenzo Maffione
66c662b005 bhyve: move virtio-net header processing to pci_virtio_net
This patch cleans up the API between the net frontends (e1000,
virtio-net) and the net backends (tap and netmap).
We move the virtio-net header stripping/prepending to the
virtio-net code, where this functionality belongs.
In this way, the netbe_send() and netbe_recv() signatures
can have const struct iov * rather than struct iov *.

Reviewed by:	grehan, bcr, aleksandr.fedorov@itglobal.com
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23342
2020-02-12 22:44:18 +00:00
Vincenzo Maffione
332eff95e3 bhyve: add wrapper for debug printf statements
Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by:	grehan, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22657
2020-01-08 22:55:22 +00:00
John Baldwin
9b078661c4 Trim a spurious carriage return from the RFB signature string added in r355301.
Submitted by:	Yamagi <lists@yamagi.org>
2019-12-19 15:36:00 +00:00
John Baldwin
cbd03a9df2 Support software breakpoints in the debug server on Intel CPUs.
- Allow the userland hypervisor to intercept breakpoint exceptions
  (BP#) in the guest.  A new capability (VM_CAP_BPT_EXIT) is used to
  enable this feature.  These exceptions are reported to userland via
  a new VM_EXITCODE_BPT that includes the length of the original
  breakpoint instruction.  If userland wishes to pass the exception
  through to the guest, it must be explicitly re-injected via
  vm_inject_exception().

- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
  pseudo-register.  Injecting a BP# on Intel requires setting this to
  the length of the breakpoint instruction.  AMD SVM currently ignores
  writes to this register (but reports success) and fails to read it.

- Rework the per-vCPU state tracked by the debug server.  Rather than
  a single 'stepping_vcpu' global, add a structure for each vCPU that
  tracks state about that vCPU ('stepping', 'stepped', and
  'hit_swbreak').  A global 'stopped_vcpu' tracks which vCPU is
  currently reporting an event.  Event handlers for MTRAP and
  breakpoint exits loop until the associated event is reported to the
  debugger.

  Breakpoint events are discarded if the breakpoint is not present
  when a vCPU resumes in the breakpoint handler to retry submitting
  the breakpoint event.

- Maintain a linked-list of active breakpoints in response to the GDB
  'Z0' and 'z0' packets.

Reviewed by:	markj (earlier version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D20309
2019-12-13 19:21:58 +00:00
John Baldwin
976ba8c6b2 Document that the debug server supports writing to guest memory.
This was added in r348212.
2019-12-13 02:18:44 +00:00
John Baldwin
dd58314395 Fix a mismerge in r355683 and remove the local gdb_port from main. 2019-12-13 02:15:34 +00:00
John Baldwin
cd333f156c Don't call into the debug server if it isn't configured.
Reviewed by:	markj (as part of a larger diff)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D20309
2019-12-13 01:17:20 +00:00
John Baldwin
c7ba149dba Emulate reads of the PCI command register for passthrough devices.
VFs return zero for the memory enable bit even if it has been set by a
prior write.  After r348779 this caused the annoying behavior that a
guest OS would unintentionally disable memory decoding on a future
read-modify-write operation on the command register.  Instead, return
the shadow value of the command register for reads.  This ensures that
the guest will only toggle the state of the memory enable bit when it
specifically intends to do so.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2019-12-11 23:41:39 +00:00
Simon J. Gerraty
2c9a9dfc18 Update Makefile.depend files
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22494
2019-12-11 17:37:53 +00:00
Simon J. Gerraty
5ab1c5846f Add Makefile.depend.options
Leaf directories that have dependencies impacted
by options need a Makefile.depend.options file
to avoid churn in Makefile.depend

DIRDEPS for cases such as OPENSSL, TCP_WRAPPERS etc
can be set in local.dirdeps-options.mk
which can add to those set in Makefile.depend.options

See share/mk/dirdeps-options.mk

Reviewed by:	 bdrewery
MFC after:	1 week
Sponsored by:   Juniper Networks
Differential Revision:  https://reviews.freebsd.org/D22469
2019-12-11 17:37:37 +00:00
Vincenzo Maffione
79c1428ed6 bhyve: uniform printf format string newlines
Some of the printf statements only use LF to get a newline. However, a CR character is also required for the serial console to print debug logs in a nice way.
Fix those code locations that only use LF, by adding a CR character.

Reviewed by:	markj, aleksandr.fedorov@itglobal.com
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22552
2019-12-02 20:51:46 +00:00
Vincenzo Maffione
d70b206955 bhyve: virtio-net: disable receive until features are negotiated
This patch fixes a race condition where the receive callback is called
while the device is being reset. Since the rx_merge variable may change
during reset, the receive callback may operate inconsistently with what
the guest expects.
Also, get rid of the unused rx_vhdrlen variable.

PR:	242023
Reported by:	aleksandr.fedorov@itglobal.com
Reviewed by:	markj, jhb
MFC with:	r354552
Differential Revision:	https://reviews.freebsd.org/D22440
2019-11-19 21:10:44 +00:00
Vincenzo Maffione
07b35f77c0 bhyve: rework mevent processing to fix a race condition
At the end of both mevent_add() and mevent_update(), mevent_notify()
is called to wakeup the I/O thread, that will call kevent(changelist)
to update the kernel.
A race condition is possible where the client calls mevent_add() and
mevent_update(EV_ENABLE) before the I/O thread has the chance to wake
up and call mevent_build()+kevent(changelist) in response to mevent_add().
The mevent_add() is therefore ignored by the I/O thread, and
kevent(fd, EV_ENABLE) is called before kevent(fd, EV_ADD), resuliting
in a failure of the kevent(fd, EV_ENABLE) call.

PR:	241808
Reviewed by:	jhb, markj
MFC with:	r354288
Differential Revision:	https://reviews.freebsd.org/D22286
2019-11-12 21:07:51 +00:00
Vincenzo Maffione
d55e0373f1 bhyve: add support for virtio-net mergeable rx buffers
Mergeable rx buffers is a virtio-net feature that allows the hypervisor
to use multiple RX descriptor chains to receive a single receive packet.
Without this feature, a TSO-enabled guest is compelled to publish only
64K (or 32K) long chains, and each of these large buffers is consumed
to receive a single packet, even a very short one. This is a waste of
memory, as a RX queue has room for 256 chains, which means up to 16MB
of buffer memory for each (single-queue) vtnet device.
With the feature on, the guest can publish 2K long chains, and the
hypervisor will merge them as needed.

This change also enables the feature in the netmap backend, which
supports virtio-net offloads. We plan to add support for the
tap backend too.
Note that differently from QEMU/KVM, here we implement one-copy receive,
while QEMU uses two copies.

Reviewed by:    jhb
MFC after:      3 weeks
Differential Revision:	https://reviews.freebsd.org/D21007
2019-11-08 17:57:03 +00:00
Vincenzo Maffione
3e11768ee1 bhyve: add backend rx backpressure to virtio-net
If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by:	bryanv, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20987
2019-11-03 19:02:32 +00:00
Vincenzo Maffione
14d726374b bhyve: fix mistake introduced by r352841
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20973
2019-11-03 18:53:42 +00:00
Jung-uk Kim
412d13d559 Catch up with ACPICA 20191018.
PR:		241467
XMFC with:	r353764
2019-10-24 22:33:46 +00:00
Vincenzo Maffione
d12c5ef640 bhyve: support for enabling/disabling the net backend
Extend the net backend interface with two functions, namely netbe_rx_disable()
and netbe_rx_enable(), which can be used by the net device emulators to stop
the backend from invoking the receive callback. This is useful for device
emulators, i.e., on hardware resets or to implement receive backpressure.
The mevent module has been extendede to support the addition of a disabled
event. To prevent race conditions, the net backends will start with receive
operation disabled. A follow-up patch will use the new functionalities in
the virtio-net device.

Reviewed by:	jhb, markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D20973
2019-09-28 12:02:43 +00:00
John Baldwin
ed9ffd2f09 Validate guest-supplied length of headers for TSO transmit requests.
When transmitting a large TCP packet, the final transmit descriptor
includes the length of the protocol headers to be duplicated on each
segment.  The device model was trusting the guest-supplied value
without validating it.  A value of zero would result in the guest
being able to indirect a garbage pointer on the stack to overwrite
arbitrary memory in the bhyve process.  A value that was non-zero but
too small for the requested parameters resulted in the device model
reading and writing values beyond the end of the on-stack buffer used
to hold the template header.

To fix, validate the supplied length and drop requests to transmit
packets that would overflow the header buffer.  While here, initialize
the header pointer to NULL as a preventive measure so that any access
to an unallocated template header crashes they hypervisor
deterministically.

While here, only read the TCP sequence number if the packet being
split is a TCP packet.  The e1000 logic supports a segmentation of UDP
frames, and while UDP segmentation requires this part of the header to
be valid (so there is no buffer overflow), only reading the field when
needed is cleaner.

admbugs:	918
Reported by:	Reno Robert <renorobert@gmail.com>
Reviewed by:	markj
Approved by:	so
Security:	CVE-2019-5609
2019-08-05 21:39:55 +00:00
Scott Long
88880fd4cf Fix the register layout for the Buffer Descript List Entry. It
got jumbled around during some other cleanups and was causing
audio failures on some guests.

PR:		239341
Reported by:	shamaz.mazum@gmail.com
2019-07-23 18:40:07 +00:00
Ed Maste
61db163fd0 bhyve: correct out-of-bounds read in XHCI device emulation
Add appropriate bounds checks on the epid and streamid fields in the
device doorbell registers.

admbugs:	919
Submitted by:	jhb
Reported by:	Reno Robert <renorobert@gmail.com>
Reviewed by:	markj
Approved by:	so
Security:	out-of-bounds read
2019-07-23 16:27:36 +00:00
Chuck Tuffli
31b67520d4 bhyve: update the NVMe CQ based on the status
Instead of skipping the NVMe Completion Queue update based on the
opcode, define a synthetic status value which indicates the completion
queue entry is invalid. This will also allow deferred completion queue
updates for other commands.

Also returns the correct status for unrecognized opcodes ("invalid
opcode").

Reviewed by:	imp, jhb, araujo
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D20945
2019-07-17 03:19:30 +00:00
Chuck Tuffli
409a80e5a4 bhyve: Create EUI64 for NVMe namespaces
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
 - the FreeBSD IEEE OUI
 - PCI bus, device/slot, function values
 - Namespace ID

Reviewed by:	imp, araujo, jhb, rgrimes
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
2019-07-13 12:48:28 +00:00
Sean Chittenden
2d5fe36980 usr.sbin/bhyve: close backend file descriptor during tap init error
Coverity CID:	1402953
Reviewed by:	scottl, markj, aleksandr.fedorov -at- itglobal.com
Approved by:	vmaffione, jhb
Differential Revision:	https://reviews.freebsd.org/D20913
2019-07-12 18:50:46 +00:00
Sean Chittenden
dbb1521165 usr.sbin/bhyve: only unassign a pt device after obtaining bus/slot/func
Coverity CID:	1194302, 1194303, 1194304
Approved by:	jhb, markj
Differential Revision:	https://reviews.freebsd.org/D20933
2019-07-12 18:33:58 +00:00
Sean Chittenden
ba1ca6d2e3 usr.sbin/bhyve: free resources when erroring out of pci_vtcon_sock_add()
Coverity CID:	1362880
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20916
2019-07-12 18:20:56 +00:00
Sean Chittenden
cb84aeda17 usr.sbin/bhyve: prevent use-after-free in virtio scsi request handling
Coverity CID:	1393377
Approved by:	araujo, jhb
Differential Revision:	https://reviews.freebsd.org/D20915
2019-07-12 18:17:35 +00:00
Sean Chittenden
ae2c5fe32f usr.sbin/bhyve: don't leak a FD if the device is not a tty
Coverity CID:	1194193
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20934
2019-07-12 18:13:58 +00:00
Sean Chittenden
e47c192236 usr.sbin/bhyve: unconditionally initialize the NVMe completion status
Follow-up work to improve the handling of unsupported/invalid opcodes
is being developed by chuck@.

Coverity CID:	1398928
Reviewed by:	chuck
Approved by:	araujo, imp
Differential Revision:	https://reviews.freebsd.org/D20914
2019-07-12 05:53:13 +00:00
Sean Chittenden
c7cb7db87d usr.sbin/bhyve: free resources when erroring out of pci_vtnet_init()
Coverity CID:	1402978
Approved by:	vmaffione
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D20912
2019-07-12 05:19:37 +00:00
Sean Chittenden
fe1329e446 usr.sbin/bhyve: send an initialized value to wake up blocking kqueue
This is a no-op initialization because nothing reads this value.  "This
wasn't wrong previously, but this is more correct now." -imp

Coverity CID:	1194307
Approved by:	markj, imp, scottl
Differential Revision:	https://reviews.freebsd.org/D20921
2019-07-11 23:54:50 +00:00
Sean Chittenden
bf51e078b6 usr.sbin/bhyve: commit miss from r349918
Submitted by:	markj
Approved by:	markj
Differential Revision:	https://reviews.freebsd.org/D20918
2019-07-11 19:51:33 +00:00
Sean Chittenden
bab8915c94 usr.sbin/bhyve: free leaked memory during option parsing
Also update to use strsep(3) instead of strtok(3).

Most of this commit inadvertently ended up in r349914.

Coverity CID:	1357337
Approved by:	markj
PR:		233038
Differential Revision:	https://reviews.freebsd.org/D20918
2019-07-11 19:41:14 +00:00
Sean Chittenden
cdd80cac4a usr.sbin/bhyve: initialize return value in xhci device interrupt handler
Coverity CID:	1357340
Approved by:	scottl, markj
Differential Revision:	https://reviews.freebsd.org/D20917
2019-07-11 19:26:35 +00:00
Sean Chittenden
2a1950b9cc usr.sbin/bhyve: free resources if there is an initialization error in rfb
Coverity CID:	1357335
Approved by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D20919
2019-07-11 19:07:45 +00:00
Vincenzo Maffione
8cd0c1ac32 bhyve: net_backends.c: add missing __FBSDID
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20883
2019-07-09 22:05:58 +00:00
Vincenzo Maffione
90db4ba908 bhyve: add missing license identifiers in net_utils and net_backend
Reviewed by:	jhb, markj, imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20874
2019-07-09 22:04:33 +00:00
Vincenzo Maffione
0ff7076bdb bhyve: abstraction for network backends
Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Reviewed by:	jhb, bryanv
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20659
2019-07-07 12:15:24 +00:00