Commit Graph

38 Commits

Author SHA1 Message Date
Robert Wing
08cb63a12f bhyve/block_if: allow DIOCGMEDIASIZE ioctl
This is needed to get mediasize of the device after a resize event.

I missed this earlier as I was building WITH_BHYVE_SNAPSHOT, which
disables capsicum.

Reviewed by:	khng, markj
Fixes: ae9ea22e14 ("bhyve: get mediasize for character devices when ...")
Differential Revision:	https://reviews.freebsd.org/D34013
2022-01-25 07:44:13 -09:00
Robert Wing
ae9ea22e14 bhyve: get mediasize for character devices when resizing virtio-blk
Reviewed by:	imp, allanjude, jhb
Differential Revision:	https://reviews.freebsd.org/D33403
2022-01-18 11:26:49 -09:00
Toomas Soome
c2fa905cf6 bhyve: clean up trailing whitespaces
Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681
2021-12-27 19:58:10 +02:00
Chuck Tuffli
d8c1d7b652 bhyve blockif: fix blockif_candelete with Capsicum
NVMe conformance tests for the Format command failed if the
backing-storage for the bhyve device was a file instead of a Zvol. The
tests (and the specification) expect a Format to destroy all previously
written data. The bhyve NVMe emulation implements this by trimming /
deallocating all data from the backing-storage.

The blockif_candelete() function indicated the file did not support
deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE)
even though the kernel supported file hole punching. This occurs on
builds with Capsicum enabled because blockif did not allow the
fpathconf(2) right.

Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.

PR:		260081
Reviewed by:	allanjude, markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33203
2021-11-30 21:49:34 -08:00
Ka Ho Ng
3676512b60 bhyve: Use fspacectl(2) for BOP_DELETE on regular file images
bhyve can also make use of fspacectl(2) to implement BOP_DELETE with
hole-punching. Since it is not desirable to do zero-filling for large
DEALLOCATE/UNMAP range, candelete is not set if pathconf(2) indicates
that the underlying file system does not support native
VOP_DEALLOCATE(9).

Sponsored by:	The FreeBSD Foundation
Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D28880
2021-08-07 17:10:30 +08:00
John Baldwin
8794846a91 bhyve: Add support for handling disk resize events to block_if.
Allow clients of blockif to register a resize callback handler.  When
a callback is registered, register an EVFILT_VNODE kevent watching the
backing store for a change in the file's attributes.  If the size has
changed when the kevent fires, invoke the clients' callback.

Currently resize detection is limited to backing stores that support
EVFILT_VNODE kevents such as regular files.

Reviewed by:	grehan, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D30504
2021-06-11 18:00:24 -07:00
John Baldwin
621b509048 Refactor configuration management in bhyve.
Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs.  The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key.  Values are always stored as strings.  The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists.  Leaf
nodes hold string values.  Configuration values are permitted to
reference other configuration values using '%(name)'.  This permits
constructing template configurations.

All existing command line arguments now set configuration values.  For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable.  The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable.  Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values.  Once this is
complete, bhyve then begins initializing its state based on the
configuration values.  This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues.  When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g.  '-w' sets the 'x86.strictmsr' value to false.  A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
  list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
  The first argument to '-s' (the device type) is used as the value of
  a "device" variable.  Additional comma-separated arguments are then
  parsed into 'key=value' pairs and used to set additional variables
  under the device node.  A PCI device emulation driver can provide
  its own hook to override the parsing of the additonal '-s' arguments
  after the device type.

  After the configuration phase as completed, the init_pci hook
  then walks the "pci.<bus>.<slot>.<func>" nodes.  It uses the
  "device" value to find the device model to use.  The device
  model's init routine is passed a reference to its nvlist node
  in the configuration tree which it can query for specific
  variables.

  The result is that a lot of the string parsing is removed from
  the device models and centralized.  In addition, adding a new
  variable just requires teaching the model to look for the new
  variable.

- For '-l' options, a similar model is used where the string is
  parsed into values that are later read during initialization.
  One key note here is that the serial ports use the commonly
  used lowercase names from existing documentation and examples
  (e.g. "lpc.com1") instead of the uppercase names previously
  used internally in bhyve.

Reviewed by:	grehan
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D26035
2021-03-18 16:30:26 -07:00
John Baldwin
483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
Allan Jude
22769bbe30 Add VIRTIO_BLK_T_DISCARD (TRIM) support to the bhyve virtio-blk backend
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.

Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.

Submitted by:	Jason King <jason.king@joyent.com> (improvements)
Reviewed by:	jhb
Obtained from:	illumos-joyent (improvements)
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D21707
2020-04-23 19:20:58 +00:00
Vincenzo Maffione
332eff95e3 bhyve: add wrapper for debug printf statements
Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by:	grehan, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22657
2020-01-08 22:55:22 +00:00
John Baldwin
8c74ade848 Increase the VirtIO segment count to support modern Windows guests.
The Windows virtio driver ignores the advertized seg_max field and
assumes the host can accept up to 67 segments in indirect descriptors,
triggering an assert in the bhyve process.

This brings back r282922 but with a couple of changes:
- It raises the block interface segment limit to 128 instead of 67.
- Linux's virtio driver assumes that the segment limit is no
  larger than the ring size.  To avoid breaking Linux guests,
  raise the VirtIO ring size to 128, and cap the VirtIO segment
  limit at ring size - 2 (effectively 126).

Reviewed by:	rgrimes, Patrick Mooney <pmooney@pfmooney.com>
Obtained from:	Joyent (Linux workaround)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18831
2019-05-02 22:46:37 +00:00
Rodney W. Grimes
85cf19adc5 In r340042 an attempt to quiet coverity warning cid 1305412 was overdone.
nopt is the only allocated space,
xopt and cp are aliases into that allocated space.
Remove the 2 unneeded free's

Reported by:	Patrick Mooney (@pmooney_pfmooney.com)
Reviewed by:	jhb (maintainer), Patrick Mooney (joyent/illumos)
Approved by:	bde (mentor)
CID:		1305412
MFC after:	3 days
MFC with:	340042
Differential Revision:	https://reviews.freebsd.org/D19200
2019-02-15 16:20:21 +00:00
Marcelo Araujo
abfa3c39e7 Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by:	jhb
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D18744
2019-01-16 00:39:23 +00:00
Marcelo Araujo
ea2c655dd9 Fix resource leak, variables cp, xopts and nopt going out of scope.
Reported by:	Coverity
CID:		1305412
Sponsored by:	iXsystems Inc.
2018-11-02 07:57:28 +00:00
Marcelo Araujo
f7224b709f Fix style(9) space vs tab.
Reviewed by:	jhb
MFC after:	3 weeks.
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D15768
2018-06-14 01:34:53 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bartek Rutkowski
00ef17befe Capsicum support for bhyve(8).
Adds Capsicum sandboxing to bhyve.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by:	grehan, oshogbo
Approved by:	emaste, grehan
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D8290
2017-02-14 13:35:59 +00:00
Baptiste Daroussin
86bdfe1565 Improve error message when failing to open a backing file
When bhyve cannot open a backing file, it now says explicitly which file
could not be opened

Note that the change has only be maed in block_if.c and not in
pci_virtio_block.c as the error will always be catched by the first

PR:		202321 (different patch)
Reviewed by:	grehan
MFC after:	3 day
Sponsored by:	Gandi.net
Differential Revision:	https://reviews.freebsd.org/D6576
2016-05-27 11:46:54 +00:00
Marcelo Araujo
305b5a14e4 Cleanup unused-but-set-variable spotted by gcc-4.9.
Reviewed by:	neel
Approved by:	rodrigc (mentor)
Differential Revision:	https://reviews.freebsd.org/D5042
2016-01-26 07:17:21 +00:00
Neel Natu
4e43c1e8b5 Allow configuration of the sector size advertised to the guest.
The default behavior is to infer the logical and physical sector sizes from
the block device backend. However older versions of Windows only work with
specific logical/physical combinations:
- Vista and Windows 7:	512/512
- Windows 7 SP1:	512/512 or 512/4096

For this reason allow the sector size to be specified using the following
block device option: sectorsize=logical[/physical]

Reported by:	Leon Dang (ldang@nahannisys.com)
Reviewed by:	grehan
MFC after:	2 weeks
2015-05-12 00:30:39 +00:00
Alexander Motin
bb1524af0c Workaround bhyve virtual disks operation on top of GEOM providers.
GEOM does not support scatter/gather lists in its I/Os.  Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size.  If such case is detected,
move the data through temporary sequential buffer.

MFC after:	2 weeks
2015-04-18 20:10:19 +00:00
Alexander Motin
e365f36c32 Pre-allocate one extra request per processing thread.
Processing threads call callbacks before freeing requests.  As result,
new requests may arrive before old ones are freed.

MFC after:	2 weeks
2015-03-15 22:44:53 +00:00
Alexander Motin
f2e62de7d9 Close potential race on blockif_close().
Reported by:	vangyzen
MFC after:	2 weeks
2015-03-15 16:18:03 +00:00
Alexander Motin
066a8f1411 Rewrite virtio block device driver to work asynchronously and use the block
I/O interface.

Asynchronous operation, based on r280026 change, allows to not block virtual
CPU during I/O processing, that on slow/busy storage can take seconds.
Use of recently improved block I/O interface allows to process multiple
requests same time, that improves random I/O performance on wide storages.

Benchmarks of virtual disk, backed by ZVOL on RAID10 pool of 4 HDDs, show
~3.5 times random read performance improvements, while no degradation on
linear I/O.  Guest CPU usage during test dropped from 100% to almost zero.

MFC after:	2 weeks
2015-03-15 14:57:11 +00:00
Alexander Motin
7e8e553940 Block delete capability for read-only devices.
Submitted by:	neel
MFC after:	2 weeks
2015-03-15 08:09:56 +00:00
Alexander Motin
79565afed8 Give block I/O interface multiple (8) execution threads.
On parallel random I/O this allows better utilize wide storage pools.
To not confuse prefetcher on linear I/O, consecutive requests are executed
sequentially, following the same logic as was earlier implemented in CTL.

Benchmarks of virtual AHCI disk, backed by ZVOL on RAID10 pool of 4 HDDs,
show ~3.5 times random read performance improvements, while no degradation
on linear I/O.

MFC after:	2 weeks
2015-03-14 21:15:45 +00:00
Alexander Motin
0b9d25c935 Add DSM TRIM command support for virtual AHCI disks.
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE.  Virtual disks backed by files won't report this capability.

MFC after:	2 weeks
Relnotes:	yes
2015-03-13 16:43:52 +00:00
Alexander Motin
2d678f1f4f Implement cache flush for ahci-hd and for virtio-blk over device.
MFC after:	2 weeks
2015-03-05 15:29:18 +00:00
Alexander Motin
94682383d9 Report logical/physical sector sizes for virtual SATA disk.
MFC after:	2 weeks
2015-03-05 12:21:12 +00:00
Tycho Nightingale
48a9d8f214 To allow a request to be submitted from within the callback routine of
a completing one increase the total by 1 but don't advertise it.

Reviewed by:	grehan
2014-11-09 21:08:52 +00:00
Tycho Nightingale
ae45750d6c Improve the ability to cancel an in-flight request by using an
interrupt, via SIGCONT, to force the read or write system call to
return prematurely.

Reviewed by:	grehan
2014-11-04 01:06:33 +00:00
Tycho Nightingale
3ef05c4677 Support stopping and restarting the AHCI command list via toggling
PxCMD.ST from '1' to '0' and back.  This allows the driver a chance to
recover if for instance a timeout occurred due to activity on the
host.

Reviewed by:	grehan
2014-10-17 11:37:50 +00:00
Neel Natu
1aba8e7ff8 Initialize 'bc_rdonly' to the right value.
Note that independent of this change a readonly disk file would still be
opened O_RDONLY and protected from writes by the guest.

Reviewed by:	grehan
2014-09-11 21:15:20 +00:00
Peter Grehan
c4813fadf1 Add a call to synthesize a C/H/S value for block emulations
that require it (ahci). The algorithm used is from the VHD
specification.
2014-07-15 00:25:54 +00:00
Xin LI
994f858a8b Use calloc() in favor of malloc + memset.
Reviewed by:	neel
2014-04-22 18:55:21 +00:00
Tycho Nightingale
40eb53f232 Increase the block-layer backend maximum number of requests to match
the AHCI command queue depth.  This allows a slew of commands issued
by a Linux guest to be absorbed without error.

Approved by:	grehan (co-mentor)
2014-01-22 01:56:49 +00:00
Peter Grehan
7f5487aca1 Add the VM name to the process name with setproctitle().
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.

This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.

Requested by:	Michael Dexter
Reviewed by:	neel
MFC after:	4 weeks
2013-11-06 00:25:17 +00:00
Peter Grehan
7cf5a7eeb0 Block-layer backend interface for bhyve block-io device emulations.
Approved by:	re@ (blanket)
2013-10-04 16:52:03 +00:00