Commit Graph

16 Commits

Author SHA1 Message Date
John Baldwin
483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
Vincenzo Maffione
07b35f77c0 bhyve: rework mevent processing to fix a race condition
At the end of both mevent_add() and mevent_update(), mevent_notify()
is called to wakeup the I/O thread, that will call kevent(changelist)
to update the kernel.
A race condition is possible where the client calls mevent_add() and
mevent_update(EV_ENABLE) before the I/O thread has the chance to wake
up and call mevent_build()+kevent(changelist) in response to mevent_add().
The mevent_add() is therefore ignored by the I/O thread, and
kevent(fd, EV_ENABLE) is called before kevent(fd, EV_ADD), resuliting
in a failure of the kevent(fd, EV_ENABLE) call.

PR:	241808
Reviewed by:	jhb, markj
MFC with:	r354288
Differential Revision:	https://reviews.freebsd.org/D22286
2019-11-12 21:07:51 +00:00
Vincenzo Maffione
3e11768ee1 bhyve: add backend rx backpressure to virtio-net
If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by:	bryanv, jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20987
2019-11-03 19:02:32 +00:00
Vincenzo Maffione
d12c5ef640 bhyve: support for enabling/disabling the net backend
Extend the net backend interface with two functions, namely netbe_rx_disable()
and netbe_rx_enable(), which can be used by the net device emulators to stop
the backend from invoking the receive callback. This is useful for device
emulators, i.e., on hardware resets or to implement receive backpressure.
The mevent module has been extendede to support the addition of a disabled
event. To prevent race conditions, the net backends will start with receive
operation disabled. A follow-up patch will use the new functionalities in
the virtio-net device.

Reviewed by:	jhb, markj
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D20973
2019-09-28 12:02:43 +00:00
Sean Chittenden
fe1329e446 usr.sbin/bhyve: send an initialized value to wake up blocking kqueue
This is a no-op initialization because nothing reads this value.  "This
wasn't wrong previously, but this is more correct now." -imp

Coverity CID:	1194307
Approved by:	markj, imp, scottl
Differential Revision:	https://reviews.freebsd.org/D20921
2019-07-11 23:54:50 +00:00
Marcelo Araujo
abfa3c39e7 Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by:	jhb
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D18744
2019-01-16 00:39:23 +00:00
Marcelo Araujo
f7224b709f Fix style(9) space vs tab.
Reviewed by:	jhb
MFC after:	3 weeks.
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D15768
2018-06-14 01:34:53 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
Bartek Rutkowski
00ef17befe Capsicum support for bhyve(8).
Adds Capsicum sandboxing to bhyve.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by:	grehan, oshogbo
Approved by:	emaste, grehan
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D8290
2017-02-14 13:35:59 +00:00
Neel Natu
09fd42cb88 Re-adding an event to a kqueue modifies the parameters of the original event.
However, if the original knote had been disabled then it is not automatically
re-enabled.

Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it.

Adding a kevent for the first time implicitly enables it so existing callers
of mevent_add() don't need to change.

Reviewed by:	grehan
2014-05-05 16:30:03 +00:00
Xin LI
994f858a8b Use calloc() in favor of malloc + memset.
Reviewed by:	neel
2014-04-22 18:55:21 +00:00
John Baldwin
058e24d34b Extend the ACPI power management support to wire a virtual power button up
to SIGTERM when ACPI is enabled.  Sending SIGTERM to the hypervisor when an
ACPI-aware OS is running will now trigger a soft-off allowing for a graceful
shutdown of the guest.
- Move constants for ACPI-related registers to acpi.h.
- Implement an SMI_CMD register with commands to enable and disable ACPI.
  Currently the only change when ACPI is enabled is to enable the virtual
  power button via SIGTERM.
- Implement a fixed-feature power button when ACPI is enabled by asserting
  PWRBTN_STS in PM1_EVT when SIGTERM is received.
- Add support for EVFILT_SIGNAL events to mevent.
- Implement support for the ACPI system command interrupt (SCI) and assert
  it when needed based on the values in PM1_EVT.  Mark the SCI as active-low
  and level triggered in the MADT and MP Table.
- Mark PCI interrupts in the MP Table as active-low in addition to level
  triggered.

Reviewed by:	neel
2013-12-28 04:01:05 +00:00
Peter Grehan
7f5487aca1 Add the VM name to the process name with setproctitle().
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.

This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.

Requested by:	Michael Dexter
Reviewed by:	neel
MFC after:	4 weeks
2013-11-06 00:25:17 +00:00
Peter Grehan
151dba4a87 Add simplistic periodic timer support to mevent using kqueue's
timer support. This should be enough for the emulation of
h/w periodic timers (and no more) e.g. some of the 8254's
more esoteric modes that happen to be used by non-FreeBSD o/s's.

Approved by:	re@ (blanket)
2013-09-19 04:48:26 +00:00
Peter Grehan
4e8c7465ad Change thread name for the main kqueue event loop to "<vmname> mevent" so
it can be easily distinguished from other non-vCPU threads in forthcoming
changes.

Obtained from:	NetApp
2012-12-20 23:01:53 +00:00
Peter Grehan
366f60834f Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
  bhyve  - user-space sequencer and i/o emulation
  vmmctl - dump of hypervisor register state
  libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
	Joe CaraDonna
	Peter Snyder
	Jeff Heller
	Sandeep Mann
	Steve Miller
	Brian Pawlowski
2011-05-13 04:54:01 +00:00