Commit Graph

741 Commits

Author SHA1 Message Date
Chuck Tuffli
0c6282e842 bhyve: add SMBIOS Baseboard Information
Add the System Management BIOS Baseboard (or Module) Information
a.k.a. Type 2 structure to the SMBIOS emulation.

Reviewed by:	rgrimes, bcran, grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29657
2021-04-12 08:09:52 -07:00
Roman Bogorodskiy
f2ecc0d1b7 bhyve: fix regression in legacy virtio-9p config parsing
Commit 621b509048 introduced a regression
in legacy virtio-9p config parsing by not initializing *sharename to
NULL. As a result, "sharename != NULL" check in the first iteration fails
and bhyve exits with "virtio-9p: more than one share name given".

Fix by adding NULL back.

Approved by:	grehan
2021-04-08 18:44:58 +04:00
Peter Grehan
ab899f8937 Fix typo in xhci nvlist node name, and also increment device counter.
This allows the xhci tablet device to be recognized and a PCI device
instantiated.

Reviewed by:	jhb
Fixes:		621b509048 Refactor configuration management in bhyve.
MFC after:	3 months.
2021-04-03 14:32:54 +10:00
Ka Ho Ng
b013912772 bhyve: change vq_getchain to return iovecs in both directions
The old prototype requires callers to inspect flags of each descriptors
to get the starting position of host-writable iovecs.

vq_getchain() is changed to return a virtio request with the number of
host-readable iovecs and host-writable iovecs instead. Callers can avoid
boilerplate code of getting the start offset of host-writable iovecs.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Reviewed by:	afedorov
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D29433
2021-03-30 16:44:07 +08:00
John Baldwin
4d5460a720 bhyve: Enable virtio-scsi legacy config parsing.
The previous commit added the handler to parse the command line
options for virtio-scsi devices but forgot to set the correct function
pointer to point to the handler.

Reported by:	vangyzen
Reviewed by:	vangyzen
Fixes:		621b509048
Differential Revision:	https://reviews.freebsd.org/D29438
2021-03-29 10:25:45 -07:00
John Baldwin
9f40a3be3d bhyve hostbridge: Rename "device" property to "devid".
"device" is already used as the generic PCI-level name of the device
model to use (e.g. "hostbridge").  The result was that parsing
"hostbridge" as an integer failed and the host bridge used a device ID
of 0.  The EFI ROM asserts that the device ID of the hostbridge is not
0, so booting with the current EFI ROM was failing during the ROM
boot.

Fixes:		621b509048
2021-03-24 09:29:15 -07:00
D Scott Phillips
f8a6ec2d57 bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by:	scottph
MFC After:	1 week
Sponsored by:	Intel Corporation
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D24066
2021-03-19 11:04:36 +08:00
John Baldwin
621b509048 Refactor configuration management in bhyve.
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs.  The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key.  Values are always stored as strings.  The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists.  Leaf
nodes hold string values.  Configuration values are permitted to
reference other configuration values using '%(name)'.  This permits
constructing template configurations.

All existing command line arguments now set configuration values.  For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable.  The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable.  Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values.  Once this is
complete, bhyve then begins initializing its state based on the
configuration values.  This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues.  When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g.  '-w' sets the 'x86.strictmsr' value to false.  A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
  list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
  The first argument to '-s' (the device type) is used as the value of
  a "device" variable.  Additional comma-separated arguments are then
  parsed into 'key=value' pairs and used to set additional variables
  under the device node.  A PCI device emulation driver can provide
  its own hook to override the parsing of the additonal '-s' arguments
  after the device type.

  After the configuration phase as completed, the init_pci hook
  then walks the "pci.<bus>.<slot>.<func>" nodes.  It uses the
  "device" value to find the device model to use.  The device
  model's init routine is passed a reference to its nvlist node
  in the configuration tree which it can query for specific
  variables.

  The result is that a lot of the string parsing is removed from
  the device models and centralized.  In addition, adding a new
  variable just requires teaching the model to look for the new
  variable.

- For '-l' options, a similar model is used where the string is
  parsed into values that are later read during initialization.
  One key note here is that the serial ports use the commonly
  used lowercase names from existing documentation and examples
  (e.g. "lpc.com1") instead of the uppercase names previously
  used internally in bhyve.

Reviewed by:	grehan
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D26035
2021-03-18 16:30:26 -07:00
Ka Ho Ng
54ac6f721e bhyve: virtio shares definitions between sys/dev/virtio
Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.

This reduces code duplication.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D29084
2021-03-16 19:29:39 +08:00
Robert Wing
38dfb0626f bhyve/snapshot: use SOCK_DGRAM instead of SOCK_STREAM
The save/restore feature uses a unix domain socket to send messages
from bhyvectl(8) to a bhyve(8) process. A datagram socket will suffice
for this.

An added benefit of using a datagram socket is simplified code. For
bhyve, the listen/accept calls are dropped; and for bhyvectl, the
connect() call is dropped.

EPRINTLN handles raw mode for bhyve(8), use it to print error messages.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D28983
2021-03-07 15:23:29 -09:00
Robert Wing
d656ce199d bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character
buffer that stores a filename or the path to a file - this file is used
by the save/restore feature.

Since the file doesn't have anything to do with a vm name, rename
MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX
while here.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D28879
2021-02-27 12:07:35 -09:00
Robert Wing
b7fd9c4e5e bhyve/snapshot: rename checkpoint_opcodes to be more generic
Generalize the naming here since the domain socket that uses these codes
might be used for purposes other than the save/restore feature.

- rename checkpoint_opcodes to ipc_opcode

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D28877
2021-02-27 12:03:03 -09:00
Robert Wing
5ce2d4a1c2 bhyve/snapshot: drop mkdir when creating the unix domain socket
Add /var/run/bhyve/ to BSD.var.dist so we don't have to call mkdir when
creating the unix domain socket for a given bhyve vm.

The path to the unix domain socket for a bhyve vm will now be
/var/run/bhyve/vmname instead of /var/run/bhyve/checkpoint/vmname

Move BHYVE_RUN_DIR from snapshot.c to snapshot.h so it can be shared
to bhyvectl(8).

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28783
2021-02-22 11:31:07 -09:00
Robert Wing
4f4065e0a2 libvmm: clean up vmmapi.h
struct checkpoint_op, enum checkpoint_opcodes, and
MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi
header.

They are used for the save/restore functionality that bhyve(8)
provides and are better suited in usr.sbin/bhyve/snapshot.h

Since bhyvectl(8) requires these, the Makefile for bhyvectl has been
modified to include usr.sbin/bhyve/snapshot.h

Reviewed by:    kevans, grehan
Differential Revision:  https://reviews.freebsd.org/D28410
2021-02-17 17:46:42 -09:00
Allan Jude
e6d795d154 Fix manpage markup in 2c8bb126de 2021-01-21 20:32:15 +00:00
Allan Jude
2c8bb126de bhyve: Add missing man page section on the nodelete block-device-option
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D28272
2021-01-21 20:30:55 +00:00
Peter Grehan
eed1cc6cdf Support COM3 and COM4 serial ports.
Submitted by: Jan Poctavek <janci@binaryparadise.com>, otis
Reviewed by: grehan (bhyve), imp, 0mp (manpages)
Differential Revision: https://reviews.freebsd.org/D28207
2021-01-20 03:30:22 +10:00
Mariusz Zaborski
845b273728 bhyve: fix build without casper/capsicum support
Fix typo introduced in 966026246e.

Pointed out by:	jbeich
2021-01-03 18:17:04 +01:00
Mariusz Zaborski
966026246e bhyve: fix build without casper/capsicum support
PR:		252353
2021-01-03 17:21:28 +01:00
Robert Wing
c4df8cbfde Remove bvmconsole and bvmdebug.
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.

This also removes the '-b' and '-g' flag from bhyve(8). These two flags were
marked deprecated in r368519.

Reviewed by:    grehan, kevans
Approved by:    kevans (mentor)
Differential Revision:  https://reviews.freebsd.org/D27490
2020-12-23 17:15:23 -09:00
Peter Grehan
2bb4be0f86 Fix issues with various VNC clients.
- support VNC version 3.3 (macos "Screen Sharing" builtin client)
- wait until client has requested an update prior to sending framebuffer data
- don't send an update if no framebuffer updates detected
- increase framebuffer poll frequency to 30Hz, and double that when
  kbd/mouse input detected
- zero uninitialized array elements in rfb_send_server_init_msg()
- fix overly large allocation in rfb_init()
- use atomics for flags shared between input and output threads
- use #defines for constants

 This work was contributed by Marko Kiiskila, with reuse of some earlier
work by Henrik Gulbrandsen.

Clients tested :
FreeBSD-current
 - tightvnc
 - tigervnc
 - krdc
 - vinagre

Linux (Ubuntu)
 - krdc
 - vinagre
 - tigervnc
 - xtightvncviewer
 - remmina

MacOS
 - VNC Viewer
 - TigerVNC
 - Screen Sharing (builtin client)

Windows 10
 - Tiger VNC
 - VNC Viewer (cursor lag)
 - UltraVNC (cursor lag)

o/s independent
 - noVNC (browser) using websockify relay

PR: 250795
Submitted by:	Marko Kiiskila <marko@apache.org>
Reviewed by:	jhb (bhyve)
MFC after:	3 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D27605
2020-12-18 00:38:48 +00:00
Aleksandr Fedorov
ea57b2d7a8 [bhyve] virtio-net: Do not allow receiving packets until features have been negotiated.
Enforce the requirement that the RX callback cannot be called after a reset until the features have been negotiated.
This fixes a race condition where the receive callback is called during a device reset.

Reviewed by:	vmaffione, grehan
Approved by:	vmaffione (mentor)
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D27381
2020-12-17 16:52:40 +00:00
Robert Wing
92f7309929 Add deprecation notice for bvmconsole and bvmdebug
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.

Mark the '-b' and '-g' flag as deprecated for bhyve(8).

These will be removed in 13.

Reviewed by:    jhb, grehan
Approved by:    kevans (mentor)
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D27519
2020-12-10 18:07:25 +00:00
John Baldwin
1b9c78611d Suspend I/O on ahci-cd devices during a snapshot.
Submitted by:	Vitaliy Gusev <gusev.vitaliy@gmail.com>
2020-11-28 04:21:22 +00:00
John Baldwin
bb481f6718 bhyve: Add snapshot support for virtio-rnd.
This uses the same snapshot routine as other VirtIO devices.

Submitted by:	Vitaliy Gusev <gusev.vitaliy@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D26265
2020-11-28 04:06:09 +00:00
John Baldwin
57b0a3aaca bhyve: 'xhci,tablet' snapshot fixes
Permit suspend/resume of a XHCI device model that has not been
attached to by a driver in a guest OS.

Submitted by:	Vitaliy Gusev <gusev.vitaliy@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D26264
2020-11-28 03:54:48 +00:00
Rebecca Cran
866db2fef0 Fix bhyve SMBIOS type 19 handling to avoid misreporting total RAM amount
This fixes the amount of memory displayed in the EDK2 UiApp to be the same
as passed on the bhyve command line. Otherwise, 8GB is displayed as 4GB,
32GB as 28GB etc.

Reviewed by:	jhb, kib, rgrimes
Differential Revision:	https://reviews.freebsd.org/D27348
2020-11-27 08:00:32 +00:00
Rebecca Cran
5285d5e8e1 bhyve: fix smbiostbl.c style issues and add comment about date format
Fix a couple of style issues introduced in my previous commit.
Add a comment explaining that the SMBIOS specification defines the date
format to be mm/dd/yyyy, which is why we don't use ISO 8601.
2020-11-27 07:53:15 +00:00
John Baldwin
1925586e03 Honor the disabled setting for MSI-X interrupts for passthrough devices.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X.  This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	grehan, markj
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27212
2020-11-24 23:18:52 +00:00
Peter Grehan
887d46ef5b Advance RIP after userspace instruction decode
Add update to RIP after a userspace instruction decode (as is done for
the in-kernel counterpart of this case).

Submitted by:	adam_fenn.io
Reviewed by:	cem, markj
Approved by:	grehan (bhyve)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D27243
2020-11-19 07:23:39 +00:00
Peter Grehan
2f40fc6ff3 Add legacy debug/test interfaces for kvm unit tests.
Implement the legacy debug/test interfaces expected by KVM-unit-tests'
realmode, emulator, and ioapic tests.

Submitted by:	adam_fenn.io
Reviewed by:	markj, grehan
Approved by:	grehan (bhyve)
MFC after:	3 weeks
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D27130
2020-11-17 13:14:04 +00:00
Peter Grehan
cd5b6d16ca Fix regression in AHCI controller settings.
When the AHCI code was reworked to use FreeBSD struct
definitions, the valid element was mis-transcribed resulting
in the UMDA capability being hidden. This prevented Illumos
from using AHCI disk/cdrom drives.

Fix by using definitions that match the code pre-rework.

PR:	250924
Submitted by:	Rolf Stalder
Reported by:	Rolf Stalder
MFC after:	3 days
2020-11-15 12:59:24 +00:00
Rebecca Cran
a2fe464c81 bhyve: update smbiostbl.c to bump the version and release date
Since lots of work has been done on bhyve since 2014, increase the version
to 13.0 to match 13-CURRENT, and update the release date.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D27147
2020-11-13 19:47:16 +00:00
Konstantin Belousov
038f5c7bfe bhyve: remove a hack to map all 8G BARs 1:1
Suggested and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27186
2020-11-12 02:52:01 +00:00
Konstantin Belousov
670b364b76 bhyve: increase allowed size for 64bit BAR allocation below 4G from 32 to 128 MB.
Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27095
2020-11-12 00:51:53 +00:00
Konstantin Belousov
9922872ba2 bhyve: avoid allocating BARs above the end of supported physical addresses.
Read CPUID leaf 0x8000008 to determine max supported phys address and
create BAR region right below it, reserving 1/4 of the supported guest
physical address space to the 64bit BARs mappings.

PR:    250802 (although the issue from PR is not fixed by the change)
Noted and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27095
2020-11-12 00:46:53 +00:00
Olivier Cochard
c4fd0cc9ee Return the same value for smbios.chassis.maker as smbios.system.maker (and prevents returning a space character).
Reviewed by:	grehan
Approved by:	grehan
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27123
2020-11-08 07:49:39 +00:00
Allan Jude
cc3568c1d0 VirtIO: Make sure the guest knows the TRIM alignment requirements
If bhyve is used to emulate 512e access in guest OS, then discard addresses should be properly aligned.
Otherwise ioctl DIOCGDELETE fails for 512b requires on devices with 4K sector size.
see g_dev_ioctl() in sys/geom/geom_dev.c

Submitted by:	Vitaliy Gusev <gusev.vitaliy@gmail.com>
MFC after:	1 week
Sponsored by:	vStack.com
Differential Revision:	https://reviews.freebsd.org/D27075
2020-11-05 17:10:14 +00:00
Olivier Cochard
ac8f506b85 bhyve currently reports each of "smbios.system.maker" and
"smbios.system.family" as " ".
This presents challenges for both humans and tools when trying to parse output
that uses those results.
The new values reported are now:
smbios.system.family="Virtual Machine"
smbios.system.maker="FreeBSD"

PR:		250728
Approved by:	grehan@FreeBSD.org
Sponsored by:	Netflix
2020-10-30 00:03:59 +00:00
Ryan Moeller
60dc6bee1f bhyve: Update TX descriptor base address and host mapping on change
bhyve sometimes segfaults when using an e1000 NIC with a Windows guest.

We are only updating our tdba and cached host mapping when the low address
register is written and when tx is set enabled, but not when the high address
or length registers are written. It is observed that Windows 10 is occasionally
enabling tx first then writing the registers in the order low, high, len. This
leaves us with a bogus base address and mapping, which causes a segfault later
when we try to copy from a descriptor that has unpredictable garbage in a
pointer.

Updating the address and mapping when any of those registers change seems to fix
that particular issue.

Reviewed by:	mav, grehan (bhyve)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26798
2020-10-16 20:27:20 +00:00
Jakub Wojciech Klama
100353cfbf Add virtio-9p (aka VirtFS) filesystem sharing to bhyve.
VirtFS allows sharing an arbitrary directory tree between bhyve virtual
machine and the host. Current implementation has a fairly complete support
for 9P2000.L protocol, except for the extended attribute support. It has
been verified to work with the qemu-kvm hypervisor.

Reviewed by:	rgrimes, emaste, jhb, trasz
Approved by:	trasz (mentor)
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Conclusive Engineering (development), vStack.com (funding)
Differential Revision:	https://reviews.freebsd.org/D10335
2020-10-03 19:05:13 +00:00
John Baldwin
6f64e4f361 bhyve: Fix build with option BHYVE_SNAPSHOT
'ident' was replaced with 'ata_ident' in revision r363596.

Submitted by:	Vitaliy Gusev <gusev.vitaliy_gmail.com>
Reviewed by:	Darius Mihai
Differential Revision:	 https://reviews.freebsd.org/D26263
2020-10-01 17:16:05 +00:00
Peter Grehan
285e35e6f1 Fix byte-reversal of language ID in string descriptor.
The language id of String Descriptors in usb mouse is
0x0904, while the spec require 0x0409 (English - United States)

Submitted by:	Wanpeng Qian
Reviewed by:	grehan
Approved by:	grehan (#bhyve)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26472
2020-09-18 05:54:59 +00:00
Chuck Tuffli
71a51f69a4 bhyve: NVMe queue create must init head/tail
The NVMe emulation code did not explicitly initialize queue head and
tail pointers on queue creation. As these pointers are part of
calloc()'ed memory, this only becomes a problem if the queues are
deleted and then recreated.

This error can manifest with messages about completions not matching a
command.
2020-08-24 01:51:21 +00:00
Chuck Tuffli
c4a86c1fc0 bhyve: NVMe set nominal health values
Some operating systems believe bhyve's emulated NVMe drive is failing
based on certain values in the SMART / Health Information log page being
zero. Fix is to set the reported temperature and available spare values
to reasonable defaults.

Submitted by:	wanpengqian@gmail.com
Reviewed by:    grehan
MFC after:      2 weeks
Differential Revision: https://reviews.freebsd.org/D24202
2020-08-24 01:51:17 +00:00
Konstantin Belousov
f3eb12e4a6 Add bhyve support for LA57 guest mode.
Noted and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:37:21 +00:00
Peter Grehan
f1c3dac414 Replace magic numbers in Identify page register 0 with ATA definitions.
No functional change. Verified with objdump output before/after.

Requested by:	rpokala
Reviewed by:	rpokala
MFC after:	3 weeks
2020-07-31 12:10:28 +00:00
Peter Grehan
9af3bcd7c9 Support the setting of additional AHCI controller parameters.
Allow the serial number, firmware revision, model number and nominal media
rotation rate (nmrr) parameters to be set from the command line.

Note that setting the nmrr value can be	used to	indicate the AHCI
device is an SSD.

Submitted by:	Wanpeng Qian
Reviewed by:	jhb, grehan (#bhyve)
Approved by:	jhb, grehan
MFC after:	3 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D24174
2020-07-27 07:56:55 +00:00
Peter Grehan
fb5f5a17ef Advertise 64-bit physical-address capability.
This fixes a coredump with NetBSD guests when XHCI is configured.
On seeing the AC64 flag clear, the NetBSD XHCI driver was only writing
to the lower 32-bits of 64-bit physical address registers. The emulation
relies on a write to the hi 32-bits to calculate a host virtual address
for internal use, and has always supported 64-bit addressing.

All other guests were seen to write to both the lo- and hi- address
registers, regardless of the AC64 setting.

Discussed with:  Leon Dang (author)
Tested with:  Ubuntu 16/18/20, Windows10, OpenBSD UEFI guests.

MFC after:	2 weeks.
2020-07-10 07:26:50 +00:00
Peter Grehan
6a7ff0600b Silence ACPI RTC error/warning in Linux guests.
Allow guests to	set the	RTC bit	in the ACPI PM control register.
This eliminates an annoying	(and harmless) Linux kernel boot message.

PR:	244721
Submitted by:	Jose Luis Duran
MFC after:	1 week
2020-07-06 08:36:14 +00:00
Chuck Tuffli
0ed1d2e484 bhyve: fix NVMe Active Namespace list
The NVMe specification requires unused entries in the Identify, Active
Namespace ID data to be zero. Fix is bzero the provided page, similar to
what is done for the Namespace Descriptors list.

Fixes UNH Tests 2.6 and 2.9

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24901
2020-06-29 00:32:24 +00:00
Chuck Tuffli
a104b18c52 bhyve: NVMe handle zero length DSM ranges
Dataset Management range specifications may have a zero length (a.k.a.
an empty range definition). Handle the case of all ranges being empty by
completing with Success (DSM commands are advisory only). For
Deallocate, skip empty range definitions when sending TRIM's to the
backing storage.

Fixes UNH Test 2.2.4

Reviewed by:	imp
Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24900
2020-06-29 00:32:21 +00:00
Chuck Tuffli
7669ea7bb0 bhyve: fix NVMe Get Features, Predictable Latency
If the Predictable Latency Mode is not supported, NVMe Controllers must
return Invalid Field in Command status for the Get Features command
with IDs:
 - Predictable Latency Mode Config
 - Predictable Latency Mode Window

Fixes UNH Tests 3.6

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24899
2020-06-29 00:32:18 +00:00
Chuck Tuffli
f97ed15123 bhyve: add NVMe Feature Interrupt Vector Config
This adds support for NVMe Get Features, Interrupt Vector Config
parameter error checking done by the UNH compliance tests.

Fixes UNH Tests 1.6.8 and 5.5.6

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24898
2020-06-29 00:32:15 +00:00
Chuck Tuffli
46ea627374 bhyve: add basic NVMe Firmware Commit support
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).

Fixes UNH Test 5.5.3

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24897
2020-06-29 00:32:11 +00:00
Chuck Tuffli
106329ef33 bhyve: Add AER support to NVMe emulation
This adds support to bhyve's NVMe device emulation for processing Async
Event Requests but not returning them (i.e. Async Event Notifications).

Fixes UNH Test 5.5.2

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24896
2020-06-29 00:32:08 +00:00
Chuck Tuffli
8bba8666e2 bhyve: validate the NVMe LBA start and count
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.

Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.

Fixes UNH Tests 2.2.3, 2.3.2, and 2.4.2

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24895
2020-06-29 00:32:04 +00:00
Chuck Tuffli
7d248cffd9 bhyve: implement NVMe SMART data I/O statistics
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).

This patch implements counters for:
 - Data Units Read
 - Data Units Written
 - Host Read Commands
 - Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.

Fixes UNH Test 1.3.8

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24894
2020-06-29 00:32:01 +00:00
Chuck Tuffli
ae638f2bd5 bhyve: validate NVMe deallocate range values
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.

Fixes UNH Test 2.2.3

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24893
2020-06-29 00:31:58 +00:00
Chuck Tuffli
73cd73c0b8 bhyve: base pci_nvme_ioreq size on advertised MDTS
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.

Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24891
2020-06-29 00:31:54 +00:00
Chuck Tuffli
206edceb73 bhyve: refactor NVMe I/O read/write
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24890
2020-06-29 00:31:51 +00:00
Chuck Tuffli
a0900f46d1 bhyve: implement NVMe Format NVM command
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.

Fixes UNH Test 1.6

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24889
2020-06-29 00:31:47 +00:00
Chuck Tuffli
45cf82682c bhyve: make unsupported NVMe commands a debug message
Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24888
2020-06-29 00:31:44 +00:00
Chuck Tuffli
e3ebd4210a bhyve: add more compliant NVMe Get/Set Features
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.

Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.

Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.

Fixes UNH Tests 1.2, 5.4, and 5.5.6

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24887
2020-06-29 00:31:41 +00:00
Chuck Tuffli
d708ced617 bhyve: fix NVMe queue creation and deletion
Add checks for various types of invalid I/O Queue Create and Delete
command parameters, including:
 - QID=0
 - QID>MAX
 - QID already in use
 - Delete an Active CQ
 - Invalid QSIZE
 - Invalid CQID (SQ creation)
 - Invalid interrupt vector (CQ creation)

Fixes UNH Tests 1.4.2-5,7-8

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24886
2020-06-29 00:31:37 +00:00
Chuck Tuffli
f6f02911b6 bhyve: fix NVMe Get Log Page command
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.

Fixes UNH Test 1.3

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24885
2020-06-29 00:31:34 +00:00
Chuck Tuffli
f8fa74679c bhyve: implement NVMe Namespace Identification Descriptor
NVMe 1.3 compliant controllers must implement the Namespace
Identification Descriptor structure (i.e. CNS=3). Previously this was
unimplemented.

Fixes UNH Test 1.1.4-0

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24884
2020-06-29 00:31:30 +00:00
Chuck Tuffli
064ca48f57 bhyve: Consolidate NVMe CQ update
Consolidate the code which writes Completion Queue entries and updates
the CQ doorbell value. While in the neighborhood, convert the "toggle CQ
phase bit" code to use an XOR operation instead of an "if/else" branch.

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24882
2020-06-29 00:31:27 +00:00
Chuck Tuffli
d7e180feb7 bhyve: add locks around NVMe queue accesses
The NVMe code attempted to ensure thread safety through a combination of
using atomics and a "busy" flag. But this approach leads to unavoidable
race conditions.

Fix is to use per-queue mutex locks to ensure thread safety within the
queue processing code. While in the neighborhood, move all the queue
initialization code to a common function.

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D19841
2020-06-29 00:31:24 +00:00
Chuck Tuffli
cf20131a15 bhyve: add a comment explaining NVME dsm option
Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24881
2020-06-29 00:31:20 +00:00
Chuck Tuffli
9963f1805c bhyve: implement NVMe Flush command
This adds support for the NVMe I/O command Flush. For block-based
devices, submit a DIOCGFLUSH to the backing storage. Otherwise, command
is treated like a NOP and completes with a Successful status.

Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24880
2020-06-29 00:31:17 +00:00
Chuck Tuffli
a43ab8d253 bhyve: refactor NVMe IO command handling
This refactors the NVMe I/O command processing function to make adding
new commands easier. The main change is to move command specific
processing (i.e. Read/Write) to separate functions for each NVMe I/O
command and leave the common per-command processing in the existing
pci_nvme_handle_io_cmd() function.

While here, add checks for some common errors (invalid Namespace ID,
invalid opcode, LBA out of range).

Add myself to the Copyright holders

Reviewed by:	imp
Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24879
2020-06-29 00:31:14 +00:00
Chuck Tuffli
0220a2aeed bhyve: convert NVMe logging statements
Convert the debug and warning logging macros to be parameterized and
correctly use bhyve's PRINTLN macro.

Reviewed by:	imp
Tested by:	Jason Tubnor
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24878
2020-06-29 00:31:11 +00:00
Peter Grehan
71ab6f9708 Prevent calling USB backends multiple times.
The TRB processing loop could potentially call a back-end twice
with the same status transaction. While this was generally benign,
some code paths in the tablet backend weren't set up to handle
this case, resulting in a NULL dereference.

Fix by
 - returning a STALL error when an invalid request was seen in the backend
 - skipping a call to the backend if the number of packets in a status
   transaction was zero (this code fragment was taken from the Intel ACRN
   xhci backend)

PR:	246964
Reported by:  Ali Abdallah
Discussed with: Leon Dang (author)
Reviewed by: jhb (#bhyve), Leon Dang
Approved by: jhb
Obtained from:  Intel ACRN (partially)
MFC after: 2 weeks
Differential Revision:	https://reviews.freebsd.org/D25228
2020-06-26 08:20:38 +00:00
Pawel Biernacki
0a1016f9e8 bhyve: allow for automatic destruction on power-off
Introduce -D flag that allows for the VM to be destroyed on guest initiated
power-off by the bhyve(8) process itself.
This is quality of life change that allows for simpler deployments without
the need for bhyvectl --destroy.

Requested by:	swills
Reviewed by:	0mp (manpages), grehan, kib, swills
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25414
2020-06-25 12:35:20 +00:00
Conrad Meyer
4daa95f85d bhyve(8): For prototyping, reattempt decode in userspace
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of.  In this scenario,
reset decoder state and try again.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24464
2020-06-25 00:18:42 +00:00
Peter Grehan
2136849868 Fix pci-passthru MSI issues with OpenBSD guests
- Return 2 x 16-bit registers in the correct byte order
 for a 4-byte read that spans the CMD/STATUS register.
  This reversal was hiding the capabilities-list, which prevented
 the MSI capability from being found for XHCI passthru.

- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
 capability offset would have the read-only portion skipped.
  This prevented MSI interrupts from being enabled.

 Reported and extensively tested by Anatoli (me at anatoli dot ws)

PR:	245392
Reported by:	Anatoli (me at anatoli dot ws)
Reviewed by:	jhb (bhyve)
Approved by:	jhb, bz (mentor)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24951
2020-05-25 06:25:31 +00:00
Aleksandr Fedorov
e90337e48f bhyve(8): Add the netgraph network backend decription to the manpage.
Reviewed by:	vmaffione, bcr
Approved by:	vmaffione (mentor)
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D24846
2020-05-18 15:03:52 +00:00
Conrad Meyer
8a68ae80f6 vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).

Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24525
2020-05-15 15:54:22 +00:00
Aleksandr Fedorov
8ffb1c8ce1 bhyve: Fix processing of netgraph backend options.
After r360820, additional parameters are passed through the argument 'opts', and the name of the backend through the argument 'devname'. So, there is no need to skip the backend name from the 'opts' argument.
2020-05-15 11:03:27 +00:00
Aleksandr Fedorov
2cd7735d92 Add a new bhyve network backend that allow to connect the VM to the netgraph(4) network.
The backend uses the socket API with the PF_NETGRAPH protocol family, which is provided by the ng_socket(4).

To use the new backend, provide the following bhyve option:
-s X:Y:Z,[virtio-net|e1000],netgraph,socket=[ng_socket name],path=[destination node],hook=[our socket src hook],peerhook=[dst node hook]

Reviewed by:	vmaffione, lutz_donnerhacke.de
Approved by:	vmaffione (mentor)
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D24620
2020-05-12 11:18:14 +00:00
Vincenzo Maffione
692dbfe930 bhyve: update man page to describe the virtio-net mtu option
r359704 introduced an 'mtu' option for the virtio-net device emulation.
Update the man page to describe the new option.

Reviewed by:	bcr
Differential Revision:	https://reviews.freebsd.org/D24723
2020-05-09 07:57:41 +00:00
Aleksandr Fedorov
5bebe92327 bhyve: Pass the full string of options to the network backends.
Reviewed by:	vmaffione
Approved by:	vmaffione (mentor)
Sponsored by:	vstack.com
Differential Revision:	https://reviews.freebsd.org/D24735
2020-05-08 17:15:54 +00:00
John Baldwin
483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
John Baldwin
7840d1c45f Update the cached MSI state when any MSI capability register is written.
bhyve uses cached copies of the MSI capability registers to generate
MSI interrupts for device models.  Previously, these cached fields
were only set when the MSI capability control register was updated.
The Linux kernel recently adopted a change to deal with races in MSI
interrupt delivery that writes to the MSI capability address and data
registers to alter the destination of MSI interrupts without writing
to the MSI capability control register.  bhyve was not updating its
cached registers for these writes and continued to send interrupts
with the old data value to the old address.  Fix this by recomputing
the cached values for every write to any MSI capability register.

Reported by:	Jason Tubnor, Ryan Moeller
Reported by:	Marc Dionne (bisected the Linux kernel commit)
Reviewed by:	grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24593
2020-04-27 22:27:35 +00:00
Allan Jude
22769bbe30 Add VIRTIO_BLK_T_DISCARD (TRIM) support to the bhyve virtio-blk backend
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.

Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.

Submitted by:	Jason King <jason.king@joyent.com> (improvements)
Reviewed by:	jhb
Obtained from:	illumos-joyent (improvements)
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D21707
2020-04-23 19:20:58 +00:00
Mateusz Piotrowski
77d208a3ae Improve formatting of synopsis section
This patch is about sorting the arguments and using proper mdoc(7) macros
to stylize arguments and command modifiers for much better readability.

Further style fixes in other sections within the bhyve manual page are
going to be worked on in upcoming patches.

Reviewed by:	rgrimes
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24526
2020-04-22 06:32:51 +00:00
Conrad Meyer
38e6153f75 bhyve(8): Correct copyright boilerplate for r359950
Use the text from the canonical sys/copyright.h 2-clause FreeBSD License.

Reported by:	grehan (thanks!)
2020-04-15 05:55:14 +00:00
Conrad Meyer
52c39ee643 bhyve(8): Minor cosmetic niceties in instemul failure
Print the failed instruction stream as a contiguous stream of hex.  This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.

Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
         line
             effect.
2020-04-15 02:34:44 +00:00
Conrad Meyer
9cb339cc7b bhyve(8): Add VM Generation Counter ACPI device
Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve.  The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.

A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back).  At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.

Suggested by:	rpokala
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D23165
2020-04-15 02:00:17 +00:00
Conrad Meyer
bb30b08e76 bhyve(8): Add bootrom allocation abstraction
To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.

The bootrom segment is the high 16MB of the low 4GB region.

Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range.  However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.

The existing ROM-file loading code is refactored to use the new interface.

Reviewed by:	grehan (earlier version)
Differential Revision:	https://reviews.freebsd.org/D24422
2020-04-15 01:58:51 +00:00
Rodney W. Grimes
9d3fd86663 In the past changes have been made to smbios->minor without updating the
smbios->bcdrev value.
Correct that by calculating bcdrev from the major/minor values.

Reported by:	bcran
Reviewed by:	bcran, jhb
Approved by:	jhb (maintainer)
2020-04-07 23:17:44 +00:00
Aleksandr Fedorov
1ff57e3a25 Add VIRTIO_NET_F_MTU flag support for the bhyve virtio-net device.
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000

Reported by:	vmaffione, jhb
Approved by:	vmaffione (mentor)
Differential Revision:	https://reviews.freebsd.org/D23971
2020-04-07 17:06:33 +00:00
Rebecca Cran
e23fe873b2 Bhyve: fix SMBIOS Type 17 table generation
According to the SMBIOS specification (revision 2.7 or newer), the
extended module size field should only be used for sizes that can't
fit in the older size field.

Reviewed by:	rgrimes, grehan, jhb
Differential Revision:	https://reviews.freebsd.org/D24107
2020-03-31 02:36:39 +00:00
Chuck Tuffli
1264a2b909 bhyve: fix NVMe emulation update of SQHD
The SQHD field of a Completion Queue entry indicates the current
Submission Queue head pointer value. The head pointer represents the
next entry to be consumed and is updated after consuming the current
entry.

In the Admin queue processing, the current code updates the head pointer
after reporting the value to the host via the SQHD. This gives the
impression that the Controller is perpetually one command behind in its
processing of the Admin SQ. And while this doesn't appear to bother some
initiators, it is wrong.

Fix is to update the SQ head pointer prior to writing the SQHD value in
the completion.

While here, fix missed update of dword 0 (cdw0) in the completion
message.

Reported by:	khng300
Reviewed by:	jhb, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24083
2020-03-27 15:28:27 +00:00
Chuck Tuffli
961be12f6a bhyve: fix NVMe emulation missed interrupts
The bhyve NVMe emulation has a race in the logic which generates command
completion interrupts. On FreeBSD guests, this manifests as kernel log
messages similar to:
    nvme0: Missing interrupt

The NVMe emulation code sets a per-submission queue "busy" flag while
processing the submission queue, and only generates an interrupt when
the submission queue is not busy.

Aside from being counter to the NVMe design (i.e. interrupt properties
are tied to the completion queue) and adding complexity (e.g. exceptions
to not generating an interrupt when "busy"), it causes a race condition
under the following conditions:
 - guest OS has no outstanding interrupts
 - guest OS submits a single NVMe IO command
 - bhyve emulation processes the SQ and sets the "busy" flag
 - bhyve emulation submits the asynchronous IO to the backing storage
 - IO request to the backing storage completes before the SQ processing
   loop exits and doesn't generate an interrupt because the SQ is "busy"
 - bhyve emulation finishes processing the SQ and clears the "busy" flag

Fix is to remove the "busy" flag and generate an interrupt when the CQ
head and tail pointers do not match.

Reported by:	khng300
Reviewed by:	jhb, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24082
2020-03-27 15:28:22 +00:00
Chuck Tuffli
f3e46ff932 bhyve: use STAILQ in NVMe emulation
Use the standard queue(3) macros instead of hand-crafted linked list
code.

Reviewed by:	imp, jhb
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D24081
2020-03-27 15:28:16 +00:00
Chuck Tuffli
cd65e08916 bhyve: implement NVMe deallocate command
This adds support for the Dataset Management (DSM) command to the NVMe
emulation in general, and more specifically, for the deallocate
attribute (a.k.a. trim in the ATA protocol). If the backing storage for
the namespace supports delete (i.e. deallocate), setting the deallocate
attribute in a DSM will trim/delete the requested LBA ranges in the
underlying storage.

Reviewed by:	jhb, araujo, imp
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21839
2020-03-27 15:28:11 +00:00
Chuck Tuffli
d31d525ef5 bhyve: refactor NVMe namespace initialization
Pass the struct pci_nvme_blockstore pointer for this namespace to the
namespace initialization function instead of only the desired eui64
value.

Minor functional change in that the code updates the eui64 value in the
blockstore.

Reviewed by:	jhb, araujo
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21838
2020-03-27 15:28:05 +00:00
Chuck Tuffli
da8de3e9a8 bhyve: refactor NVMe PRP memcpy
Add a "copy direction" parameter to nvme_prp_memcpy such that data can
be copied to the memory specified by the PRP entries (current behavior)
or copied from the PRP entries (new behavior). The upcoming deallocate
functionality will use the copy from capability.

Reviewed by:	jhb, araujo
Approved by:	jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D21837
2020-03-27 15:28:00 +00:00