Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.
Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.
Submitted by: Jason King <jason.king@joyent.com> (improvements)
Reviewed by: jhb
Obtained from: illumos-joyent (improvements)
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21707
Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.
Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657
The Windows virtio driver ignores the advertized seg_max field and
assumes the host can accept up to 67 segments in indirect descriptors,
triggering an assert in the bhyve process.
This brings back r282922 but with a couple of changes:
- It raises the block interface segment limit to 128 instead of 67.
- Linux's virtio driver assumes that the segment limit is no
larger than the ring size. To avoid breaking Linux guests,
raise the VirtIO ring size to 128, and cap the VirtIO segment
limit at ring size - 2 (effectively 126).
Reviewed by: rgrimes, Patrick Mooney <pmooney@pfmooney.com>
Obtained from: Joyent (Linux workaround)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18831
nopt is the only allocated space,
xopt and cp are aliases into that allocated space.
Remove the 2 unneeded free's
Reported by: Patrick Mooney (@pmooney_pfmooney.com)
Reviewed by: jhb (maintainer), Patrick Mooney (joyent/illumos)
Approved by: bde (mentor)
CID: 1305412
MFC after: 3 days
MFC with: 340042
Differential Revision: https://reviews.freebsd.org/D19200
will return success when the kernel is built without support of
the capability mode.
It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.
Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
When bhyve cannot open a backing file, it now says explicitly which file
could not be opened
Note that the change has only be maed in block_if.c and not in
pci_virtio_block.c as the error will always be catched by the first
PR: 202321 (different patch)
Reviewed by: grehan
MFC after: 3 day
Sponsored by: Gandi.net
Differential Revision: https://reviews.freebsd.org/D6576
The default behavior is to infer the logical and physical sector sizes from
the block device backend. However older versions of Windows only work with
specific logical/physical combinations:
- Vista and Windows 7: 512/512
- Windows 7 SP1: 512/512 or 512/4096
For this reason allow the sector size to be specified using the following
block device option: sectorsize=logical[/physical]
Reported by: Leon Dang (ldang@nahannisys.com)
Reviewed by: grehan
MFC after: 2 weeks
GEOM does not support scatter/gather lists in its I/Os. Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size. If such case is detected,
move the data through temporary sequential buffer.
MFC after: 2 weeks
I/O interface.
Asynchronous operation, based on r280026 change, allows to not block virtual
CPU during I/O processing, that on slow/busy storage can take seconds.
Use of recently improved block I/O interface allows to process multiple
requests same time, that improves random I/O performance on wide storages.
Benchmarks of virtual disk, backed by ZVOL on RAID10 pool of 4 HDDs, show
~3.5 times random read performance improvements, while no degradation on
linear I/O. Guest CPU usage during test dropped from 100% to almost zero.
MFC after: 2 weeks
On parallel random I/O this allows better utilize wide storage pools.
To not confuse prefetcher on linear I/O, consecutive requests are executed
sequentially, following the same logic as was earlier implemented in CTL.
Benchmarks of virtual AHCI disk, backed by ZVOL on RAID10 pool of 4 HDDs,
show ~3.5 times random read performance improvements, while no degradation
on linear I/O.
MFC after: 2 weeks
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE. Virtual disks backed by files won't report this capability.
MFC after: 2 weeks
Relnotes: yes
PxCMD.ST from '1' to '0' and back. This allows the driver a chance to
recover if for instance a timeout occurred due to activity on the
host.
Reviewed by: grehan
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.
This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.
Requested by: Michael Dexter
Reviewed by: neel
MFC after: 4 weeks