2016-05-27 06:22:24 +00:00
|
|
|
/*-
|
2023-05-10 15:40:58 +00:00
|
|
|
* SPDX-License-Identifier: BSD-2-Clause
|
2017-11-27 15:37:16 +00:00
|
|
|
*
|
2016-05-27 06:22:24 +00:00
|
|
|
* Copyright (c) 2011 NetApp, Inc.
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <sys/cdefs.h>
|
|
|
|
#include <sys/types.h>
|
2017-02-14 13:35:59 +00:00
|
|
|
#ifndef WITHOUT_CAPSICUM
|
|
|
|
#include <sys/capsicum.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <sys/mman.h>
|
Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several use cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatibility of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SNAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/stat.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <sys/time.h>
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
#include <sys/un.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2018-10-27 21:24:28 +00:00
|
|
|
#include <amd64/vmm/intel/vmcs.h>
|
2022-09-07 07:07:03 +00:00
|
|
|
#include <x86/apicreg.h>
|
2018-10-27 21:24:28 +00:00
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <machine/atomic.h>
|
|
|
|
#include <machine/segments.h>
|
|
|
|
|
2017-02-14 13:35:59 +00:00
|
|
|
#ifndef WITHOUT_CAPSICUM
|
|
|
|
#include <capsicum_helpers.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include <string.h>
|
|
|
|
#include <err.h>
|
2017-02-14 13:35:59 +00:00
|
|
|
#include <errno.h>
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
#include <fcntl.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <libgen.h>
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <assert.h>
|
|
|
|
#include <pthread.h>
|
|
|
|
#include <pthread_np.h>
|
|
|
|
#include <sysexits.h>
|
|
|
|
#include <stdbool.h>
|
Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls. It allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctls are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
2018-04-08 19:24:49 +00:00
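The `-c` syntax described in the commit message above can be illustrated with a small standalone parser. This is only a sketch under stated assumptions, not the code bhyve uses: `struct topo_spec`, `parse_count()` and `topo_spec_parse()` are hypothetical names, every field defaults to 1, and values are limited to 1..65535 to fit the `uint16_t` topology variables. Unlike the parser in this file, which stores raw strings with `set_config_value()`, the sketch converts each value to an integer and rejects empty tokens such as `,,`.

```c
#define _DEFAULT_SOURCE	/* assumption: exposes strsep()/strdup() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed values; not a bhyve type. */
struct topo_spec {
	int cpus, sockets, cores, threads;
};

/* Parse a decimal count in 1..65535; return -1 on malformed input. */
static int
parse_count(const char *s)
{
	char *end;
	long v;

	if (*s == '\0')
		return (-1);
	v = strtol(s, &end, 10);
	if (*end != '\0' || v < 1 || v > 65535)
		return (-1);
	return ((int)v);
}

/*
 * Parse "[[cpus=]n][,sockets=n][,cores=n][,threads=n]".
 * Returns 0 on success and -1 on error.
 */
static int
topo_spec_parse(const char *opt, struct topo_spec *out)
{
	char *cp, *str, *tofree;
	const char *val = NULL;
	int *field = NULL;
	int error = 0, v;

	assert(opt != NULL && out != NULL);
	out->cpus = out->sockets = out->cores = out->threads = 1;
	if (*opt == '\0')
		return (0);		/* empty specification: 1 vCPU */

	tofree = str = strdup(opt);	/* strsep() modifies its input */
	if (str == NULL)
		return (-1);

	while ((cp = strsep(&str, ",")) != NULL) {
		if (strncmp(cp, "cpus=", 5) == 0) {
			field = &out->cpus; val = cp + 5;
		} else if (strncmp(cp, "sockets=", 8) == 0) {
			field = &out->sockets; val = cp + 8;
		} else if (strncmp(cp, "cores=", 6) == 0) {
			field = &out->cores; val = cp + 6;
		} else if (strncmp(cp, "threads=", 8) == 0) {
			field = &out->threads; val = cp + 8;
		} else if (strchr(cp, '=') != NULL) {
			error = -1;	/* unknown key */
			break;
		} else {
			field = &out->cpus; val = cp;	/* bare integer */
		}
		if ((v = parse_count(val)) < 0) {
			error = -1;
			break;
		}
		*field = v;
	}
	free(tofree);
	return (error);
}
```

With this sketch, `topo_spec_parse("cpus=8,sockets=2,cores=2,threads=2", &t)` fills all four fields, while a bare `"4"` sets only `cpus`, mirroring the backwards compatible form.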
|
|
|
#include <stdint.h>
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
#include <ucl.h>
|
|
|
|
#include <unistd.h>
|
|
|
|
|
|
|
|
#include <libxo/xo.h>
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
#include <machine/vmm.h>
|
2017-02-14 13:35:59 +00:00
|
|
|
#ifndef WITHOUT_CAPSICUM
|
|
|
|
#include <machine/vmm_dev.h>
|
|
|
|
#endif
|
2020-06-25 00:18:42 +00:00
|
|
|
#include <machine/vmm_instruction_emul.h>
|
2016-05-27 06:22:24 +00:00
|
|
|
#include <vmmapi.h>
|
|
|
|
|
|
|
|
#include "bhyverun.h"
|
|
|
|
#include "acpi.h"
|
Initial bhyve native graphics support.
This adds emulations for a raw framebuffer device, PS2 keyboard/mouse,
XHCI USB controller and a USB tablet.
A simple VNC server is provided for keyboard/mouse input, and graphics
output.
A VGA emulation is included, but is currently disconnected until an
additional bhyve change to block out VGA memory is committed.
Credits:
- raw framebuffer, VNC server, XHCI controller, USB bus/device emulation
and UEFI f/w support by Leon Dang
- VGA, console/g, initial VNC server by tychon@
- PS2 keyboard/mouse jointly done by tychon@ and Leon Dang
- hypervisor framebuffer mem support by neel@
Tested by: Michael Dexter, in a number of revisions of this code.
With the appropriate UEFI image, FreeBSD, Windows and Linux guests can
be installed and run in graphics mode using the UEFI/GOP framebuffer.
2016-05-27 06:30:35 +00:00
|
|
|
#include "atkbdc.h"
|
2020-04-15 01:58:51 +00:00
|
|
|
#include "bootrom.h"
|
2019-06-26 20:30:41 +00:00
|
|
|
#include "config.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "inout.h"
|
2020-04-15 02:34:44 +00:00
|
|
|
#include "debug.h"
|
2021-09-09 09:37:04 +00:00
|
|
|
#include "e820.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "fwctl.h"
|
2018-05-01 15:17:46 +00:00
|
|
|
#include "gdb.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "ioapic.h"
|
2020-05-15 15:54:22 +00:00
|
|
|
#include "kernemu_dev.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "mem.h"
|
|
|
|
#include "mevent.h"
|
|
|
|
#include "mptbl.h"
|
|
|
|
#include "pci_emul.h"
|
|
|
|
#include "pci_irq.h"
|
|
|
|
#include "pci_lpc.h"
|
2021-08-18 07:31:59 +00:00
|
|
|
#include "qemu_fwcfg.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "smbiostbl.h"
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
#include "snapshot.h"
|
|
|
|
#endif
|
2021-10-07 14:20:37 +00:00
|
|
|
#include "tpm_device.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
#include "xmsr.h"
|
|
|
|
#include "spinup_ap.h"
|
|
|
|
#include "rtc.h"
|
2020-04-15 02:00:17 +00:00
|
|
|
#include "vmgenc.h"
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
#define MB (1024UL * 1024)
|
|
|
|
#define GB (1024UL * MB)
|
|
|
|
|
2018-10-27 21:24:28 +00:00
|
|
|
static const char * const vmx_exit_reason_desc[] = {
|
|
|
|
[EXIT_REASON_EXCEPTION] = "Exception or non-maskable interrupt (NMI)",
|
|
|
|
[EXIT_REASON_EXT_INTR] = "External interrupt",
|
|
|
|
[EXIT_REASON_TRIPLE_FAULT] = "Triple fault",
|
|
|
|
[EXIT_REASON_INIT] = "INIT signal",
|
|
|
|
[EXIT_REASON_SIPI] = "Start-up IPI (SIPI)",
|
|
|
|
[EXIT_REASON_IO_SMI] = "I/O system-management interrupt (SMI)",
|
|
|
|
[EXIT_REASON_SMI] = "Other SMI",
|
|
|
|
[EXIT_REASON_INTR_WINDOW] = "Interrupt window",
|
|
|
|
[EXIT_REASON_NMI_WINDOW] = "NMI window",
|
|
|
|
[EXIT_REASON_TASK_SWITCH] = "Task switch",
|
|
|
|
[EXIT_REASON_CPUID] = "CPUID",
|
|
|
|
[EXIT_REASON_GETSEC] = "GETSEC",
|
|
|
|
[EXIT_REASON_HLT] = "HLT",
|
|
|
|
[EXIT_REASON_INVD] = "INVD",
|
|
|
|
[EXIT_REASON_INVLPG] = "INVLPG",
|
|
|
|
[EXIT_REASON_RDPMC] = "RDPMC",
|
|
|
|
[EXIT_REASON_RDTSC] = "RDTSC",
|
|
|
|
[EXIT_REASON_RSM] = "RSM",
|
|
|
|
[EXIT_REASON_VMCALL] = "VMCALL",
|
|
|
|
[EXIT_REASON_VMCLEAR] = "VMCLEAR",
|
|
|
|
[EXIT_REASON_VMLAUNCH] = "VMLAUNCH",
|
|
|
|
[EXIT_REASON_VMPTRLD] = "VMPTRLD",
|
|
|
|
[EXIT_REASON_VMPTRST] = "VMPTRST",
|
|
|
|
[EXIT_REASON_VMREAD] = "VMREAD",
|
|
|
|
[EXIT_REASON_VMRESUME] = "VMRESUME",
|
|
|
|
[EXIT_REASON_VMWRITE] = "VMWRITE",
|
|
|
|
[EXIT_REASON_VMXOFF] = "VMXOFF",
|
|
|
|
[EXIT_REASON_VMXON] = "VMXON",
|
|
|
|
[EXIT_REASON_CR_ACCESS] = "Control-register accesses",
|
|
|
|
[EXIT_REASON_DR_ACCESS] = "MOV DR",
|
|
|
|
[EXIT_REASON_INOUT] = "I/O instruction",
|
|
|
|
[EXIT_REASON_RDMSR] = "RDMSR",
|
|
|
|
[EXIT_REASON_WRMSR] = "WRMSR",
|
|
|
|
[EXIT_REASON_INVAL_VMCS] =
|
|
|
|
"VM-entry failure due to invalid guest state",
|
|
|
|
[EXIT_REASON_INVAL_MSR] = "VM-entry failure due to MSR loading",
|
|
|
|
[EXIT_REASON_MWAIT] = "MWAIT",
|
|
|
|
[EXIT_REASON_MTF] = "Monitor trap flag",
|
|
|
|
[EXIT_REASON_MONITOR] = "MONITOR",
|
|
|
|
[EXIT_REASON_PAUSE] = "PAUSE",
|
|
|
|
[EXIT_REASON_MCE_DURING_ENTRY] =
|
|
|
|
"VM-entry failure due to machine-check event",
|
|
|
|
[EXIT_REASON_TPR] = "TPR below threshold",
|
|
|
|
[EXIT_REASON_APIC_ACCESS] = "APIC access",
|
|
|
|
[EXIT_REASON_VIRTUALIZED_EOI] = "Virtualized EOI",
|
|
|
|
[EXIT_REASON_GDTR_IDTR] = "Access to GDTR or IDTR",
|
|
|
|
[EXIT_REASON_LDTR_TR] = "Access to LDTR or TR",
|
|
|
|
[EXIT_REASON_EPT_FAULT] = "EPT violation",
|
|
|
|
[EXIT_REASON_EPT_MISCONFIG] = "EPT misconfiguration",
|
|
|
|
[EXIT_REASON_INVEPT] = "INVEPT",
|
|
|
|
[EXIT_REASON_RDTSCP] = "RDTSCP",
|
|
|
|
[EXIT_REASON_VMX_PREEMPT] = "VMX-preemption timer expired",
|
|
|
|
[EXIT_REASON_INVVPID] = "INVVPID",
|
|
|
|
[EXIT_REASON_WBINVD] = "WBINVD",
|
|
|
|
[EXIT_REASON_XSETBV] = "XSETBV",
|
|
|
|
[EXIT_REASON_APIC_WRITE] = "APIC write",
|
|
|
|
[EXIT_REASON_RDRAND] = "RDRAND",
|
|
|
|
[EXIT_REASON_INVPCID] = "INVPCID",
|
|
|
|
[EXIT_REASON_VMFUNC] = "VMFUNC",
|
|
|
|
[EXIT_REASON_ENCLS] = "ENCLS",
|
|
|
|
[EXIT_REASON_RDSEED] = "RDSEED",
|
|
|
|
[EXIT_REASON_PM_LOG_FULL] = "Page-modification log full",
|
|
|
|
[EXIT_REASON_XSAVES] = "XSAVES",
|
|
|
|
[EXIT_REASON_XRSTORS] = "XRSTORS"
|
|
|
|
};
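A table built with designated initializers, like the one above, can be sparse: any index that is not named is implicitly NULL, and the array length is the highest index plus one. A lookup therefore needs both a bounds check and a NULL check. The following is a minimal sketch of that pattern; the `DEMO_*` codes, `demo_exit_desc` and `demo_exit_name()` are hypothetical and are not the real VMX values or a bhyve function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit codes, deliberately sparse; not the real VMX values. */
enum {
	DEMO_EXIT_FAULT = 0,
	DEMO_EXIT_HLT = 2,
	DEMO_EXIT_INOUT = 5
};

/* Unnamed slots (1, 3 and 4 here) are implicitly initialized to NULL. */
static const char * const demo_exit_desc[] = {
	[DEMO_EXIT_FAULT] = "Fault",
	[DEMO_EXIT_HLT] = "HLT",
	[DEMO_EXIT_INOUT] = "I/O instruction"
};

#define	DEMO_NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

/* Map a code to its description, falling back to "Unknown". */
static const char *
demo_exit_name(unsigned int reason)
{
	/* The array is sized by its highest designated index. */
	assert(DEMO_NITEMS(demo_exit_desc) == DEMO_EXIT_INOUT + 1);

	if (reason < DEMO_NITEMS(demo_exit_desc) &&
	    demo_exit_desc[reason] != NULL)
		return (demo_exit_desc[reason]);
	return ("Unknown");
}
```

Skipping either check would be a bug in this sketch: an out-of-range code would read past the array, and a gap would hand a NULL pointer to the caller.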
|
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
typedef int (*vmexit_handler_t)(struct vmctx *, struct vcpu *, struct vm_run *);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
int guest_ncpus;
|
2022-09-09 00:40:02 +00:00
|
|
|
uint16_t cpu_cores, cpu_sockets, cpu_threads;
|
2018-04-08 19:24:49 +00:00
|
|
|
|
2020-01-08 22:55:22 +00:00
|
|
|
int raw_stdio = 0;
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
static char *progname;
|
|
|
|
static const int BSP = 0;
|
|
|
|
|
|
|
|
static cpuset_t cpumask;
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
static void vm_loop(struct vmctx *ctx, struct vcpu *vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
static struct vcpu_info {
|
|
|
|
struct vmctx *ctx;
|
|
|
|
struct vcpu *vcpu;
|
|
|
|
int vcpuid;
|
|
|
|
} *vcpu_info;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2022-03-09 23:39:08 +00:00
|
|
|
static cpuset_t **vcpumap;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
static void
|
|
|
|
usage(int code)
|
|
|
|
{
|
|
|
|
|
2023-07-17 15:11:20 +00:00
|
|
|
fprintf(stderr,
|
2021-04-18 18:04:12 +00:00
|
|
|
"Usage: %s [-AaCDeHhPSuWwxY]\n"
|
2018-04-08 19:24:49 +00:00
|
|
|
" %*s [-c [[cpus=]numcpus][,sockets=n][,cores=n][,threads=n]]\n"
|
2021-10-12 19:49:43 +00:00
|
|
|
" %*s [-G port] [-k config_file] [-l lpc] [-m mem] [-o var=value]\n"
|
2021-04-18 18:04:12 +00:00
|
|
|
" %*s [-p vcpu:hostcpu] [-r file] [-s pci] [-U uuid] vmname\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -A: create ACPI tables\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -a: local apic is in xAPIC mode (deprecated)\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -C: include guest memory in core file\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -c: number of CPUs and/or topology specification\n"
|
2020-06-25 12:35:20 +00:00
|
|
|
" -D: destroy on power-off\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -e: exit on unhandled I/O access\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -G: start a debug server\n"
|
|
|
|
" -H: vmexit from the guest on HLT\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -h: help\n"
|
2019-06-26 20:30:41 +00:00
|
|
|
" -k: key=value flat config file\n"
|
2022-01-20 22:44:04 +00:00
|
|
|
" -K: PS2 keyboard layout\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -l: LPC device configuration\n"
|
2022-03-10 10:30:17 +00:00
|
|
|
" -m: memory size\n"
|
2019-06-26 20:30:41 +00:00
|
|
|
" -o: set config 'var' to 'value'\n"
|
|
|
|
" -P: vmexit from the guest on pause\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -p: pin 'vcpu' to 'hostcpu'\n"
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
" -r: path to checkpoint file\n"
|
|
|
|
#endif
|
2016-05-27 06:22:24 +00:00
|
|
|
" -S: guest memory cannot be swapped\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -s: <slot,driver,configinfo> PCI slot config\n"
|
|
|
|
" -U: UUID\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -u: RTC keeps UTC time\n"
|
|
|
|
" -W: force virtio to use single-vector MSI\n"
|
2021-04-18 18:13:54 +00:00
|
|
|
" -w: ignore unimplemented MSRs\n"
|
|
|
|
" -x: local APIC is in x2APIC mode\n"
|
2016-05-27 06:22:24 +00:00
|
|
|
" -Y: disable MPtable generation\n",
|
2019-02-01 03:09:11 +00:00
|
|
|
progname, (int)strlen(progname), "", (int)strlen(progname), "",
|
|
|
|
(int)strlen(progname), "");
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
exit(code);
|
|
|
|
}
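The `usage()` function above passes `(int)strlen(progname), ""` for each `%*s` conversion. Printing an empty string with a field width equal to the program name's length emits exactly that many spaces, so the continuation lines line up under the first option group whatever the binary is called. A minimal sketch of the idiom follows; `format_usage()` and its two option groups are hypothetical and much shorter than the real usage text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a two-line synopsis into buf.  The second line is padded with
 * strlen(prog) spaces by printing "" with a '*' field width.
 * Returns the snprintf() result.
 */
static int
format_usage(char *buf, size_t len, const char *prog)
{
	int pad;

	assert(buf != NULL && prog != NULL);
	pad = (int)strlen(prog);	/* '*' consumes an int argument */
	return (snprintf(buf, len,
	    "Usage: %s [-A]\n"
	    "       %*s [-m mem] vmname\n",
	    prog, pad, ""));
}
```

The cast to `int` matters because the `*` width is read as an `int`, while `strlen()` returns a `size_t`.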
|
|
|
|
|
2018-04-08 19:24:49 +00:00
|
|
|
/*
|
|
|
|
* XXX This parser is known to have the following issues:
|
2019-06-26 20:30:41 +00:00
|
|
|
* 1. It accepts null key=value tokens ",," as setting "cpus" to an
|
|
|
|
* empty string.
|
2018-04-08 19:24:49 +00:00
|
|
|
*
|
|
|
|
* The acceptance of a null specification ('-c ""') is by design to match the
|
|
|
|
* manual page syntax specification; this results in a topology of 1 vCPU.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
topology_parse(const char *opt)
|
|
|
|
{
|
2022-02-24 17:36:41 +00:00
|
|
|
char *cp, *str, *tofree;
|
2018-04-08 19:24:49 +00:00
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (*opt == '\0') {
|
|
|
|
set_config_value("sockets", "1");
|
|
|
|
set_config_value("cores", "1");
|
|
|
|
set_config_value("threads", "1");
|
|
|
|
set_config_value("cpus", "1");
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2022-02-24 17:36:41 +00:00
|
|
|
tofree = str = strdup(opt);
|
2018-05-25 18:54:40 +00:00
|
|
|
if (str == NULL)
|
2019-06-26 20:30:41 +00:00
|
|
|
errx(4, "Failed to allocate memory");
|
2018-04-08 19:24:49 +00:00
|
|
|
|
|
|
|
while ((cp = strsep(&str, ",")) != NULL) {
|
2019-06-26 20:30:41 +00:00
|
|
|
if (strncmp(cp, "cpus=", strlen("cpus=")) == 0)
|
|
|
|
set_config_value("cpus", cp + strlen("cpus="));
|
|
|
|
else if (strncmp(cp, "sockets=", strlen("sockets=")) == 0)
|
|
|
|
set_config_value("sockets", cp + strlen("sockets="));
|
|
|
|
else if (strncmp(cp, "cores=", strlen("cores=")) == 0)
|
|
|
|
set_config_value("cores", cp + strlen("cores="));
|
|
|
|
else if (strncmp(cp, "threads=", strlen("threads=")) == 0)
|
|
|
|
set_config_value("threads", cp + strlen("threads="));
|
|
|
|
else if (strchr(cp, '=') != NULL)
|
2018-05-25 02:07:05 +00:00
|
|
|
goto out;
|
2019-06-26 20:30:41 +00:00
|
|
|
else
|
|
|
|
set_config_value("cpus", cp);
|
2018-04-08 19:24:49 +00:00
|
|
|
}
|
2022-02-24 17:36:41 +00:00
|
|
|
free(tofree);
|
2019-06-26 20:30:41 +00:00
|
|
|
return (0);
|
2018-05-25 02:07:05 +00:00
|
|
|
|
|
|
|
out:
|
2022-02-24 17:36:41 +00:00
|
|
|
free(tofree);
|
2018-05-25 02:07:05 +00:00
|
|
|
return (-1);
|
2018-04-08 19:24:49 +00:00
|
|
|
}
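
The comma-separated `-c` syntax handled by topology_parse() can be illustrated with a standalone sketch. This is not the bhyve implementation: `struct topo_spec`, `copy_field()` and `topo_spec_parse()` are hypothetical names, the values are stored in fixed buffers instead of the configuration tree, and the tokenizing uses strchr() rather than strsep() so that it stays within ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: each field holds the raw value text, "" if unset. */
struct topo_spec {
	char cpus[16];
	char sockets[16];
	char cores[16];
	char threads[16];
};

/* Copy len bytes of src into dst, truncating to fit, and terminate dst. */
static void
copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len >= dstsz)
		len = dstsz - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Parse "[[cpus=]n][,sockets=n][,cores=n][,threads=n]".
 * Returns 0 on success, -1 if a token carries an unknown "key=" prefix.
 */
int
topo_spec_parse(const char *opt, struct topo_spec *ts)
{
	const char *tok, *end;
	size_t len;

	assert(opt != NULL && ts != NULL);
	memset(ts, 0, sizeof(*ts));
	if (*opt == '\0') {
		/* An empty specification means a topology of one vCPU. */
		strcpy(ts->cpus, "1");
		strcpy(ts->sockets, "1");
		strcpy(ts->cores, "1");
		strcpy(ts->threads, "1");
		return (0);
	}
	for (tok = opt; ; tok = end + 1) {
		end = strchr(tok, ',');
		len = (end != NULL) ? (size_t)(end - tok) : strlen(tok);
		/* A prefix match cannot span a ',' so len covers the prefix. */
		if (strncmp(tok, "cpus=", 5) == 0)
			copy_field(ts->cpus, sizeof(ts->cpus), tok + 5, len - 5);
		else if (strncmp(tok, "sockets=", 8) == 0)
			copy_field(ts->sockets, sizeof(ts->sockets), tok + 8, len - 8);
		else if (strncmp(tok, "cores=", 6) == 0)
			copy_field(ts->cores, sizeof(ts->cores), tok + 6, len - 6);
		else if (strncmp(tok, "threads=", 8) == 0)
			copy_field(ts->threads, sizeof(ts->threads), tok + 8, len - 8);
		else if (memchr(tok, '=', len) != NULL)
			return (-1);	/* unknown key */
		else
			copy_field(ts->cpus, sizeof(ts->cpus), tok, len);	/* bare "n" */
		if (end == NULL)
			break;
	}
	return (0);
}
```

With this sketch, `topo_spec_parse("4,sockets=2,cores=2", &ts)` leaves "4" in `ts.cpus` and an empty string in `ts.threads`; as in the function above, the values are kept as text and numeric validation is left to a later step.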
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
static int
|
|
|
|
parse_int_value(const char *key, const char *value, int minval, int maxval)
|
|
|
|
{
|
|
|
|
char *cp;
|
|
|
|
long lval;
|
|
|
|
|
|
|
|
errno = 0;
|
|
|
|
lval = strtol(value, &cp, 0);
|
|
|
|
if (errno != 0 || *cp != '\0' || cp == value || lval < minval ||
|
|
|
|
lval > maxval)
|
|
|
|
errx(4, "Invalid value for %s: '%s'", key, value);
|
|
|
|
return (lval);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set the cpu_sockets, cpu_cores, cpu_threads, and guest_ncpus variables based on
|
|
|
|
* the configured topology.
|
|
|
|
*
|
|
|
|
* The limits of UINT16_MAX are due to the types passed to
|
|
|
|
* vm_set_topology(). vmm.ko may enforce tighter limits.
|
|
|
|
*/
|
|
|
|
static void
|
2022-10-25 13:22:12 +00:00
|
|
|
calc_topology(void)
|
2019-06-26 20:30:41 +00:00
|
|
|
{
|
|
|
|
const char *value;
|
|
|
|
bool explicit_cpus;
|
|
|
|
uint64_t ncpus;
|
|
|
|
|
|
|
|
value = get_config_value("cpus");
|
|
|
|
if (value != NULL) {
|
|
|
|
guest_ncpus = parse_int_value("cpus", value, 1, UINT16_MAX);
|
|
|
|
explicit_cpus = true;
|
|
|
|
} else {
|
|
|
|
guest_ncpus = 1;
|
|
|
|
explicit_cpus = false;
|
|
|
|
}
|
|
|
|
value = get_config_value("cores");
|
|
|
|
if (value != NULL)
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_cores = parse_int_value("cores", value, 1, UINT16_MAX);
|
2019-06-26 20:30:41 +00:00
|
|
|
else
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_cores = 1;
|
2019-06-26 20:30:41 +00:00
|
|
|
value = get_config_value("threads");
|
|
|
|
if (value != NULL)
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_threads = parse_int_value("threads", value, 1, UINT16_MAX);
|
2019-06-26 20:30:41 +00:00
|
|
|
else
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_threads = 1;
|
2019-06-26 20:30:41 +00:00
|
|
|
value = get_config_value("sockets");
|
|
|
|
if (value != NULL)
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_sockets = parse_int_value("sockets", value, 1, UINT16_MAX);
|
2019-06-26 20:30:41 +00:00
|
|
|
else
|
2022-09-09 00:40:02 +00:00
|
|
|
cpu_sockets = guest_ncpus;
|
2019-06-26 20:30:41 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Compute sockets * cores * threads avoiding overflow. The
|
|
|
|
* range check above ensures these are 16-bit values.
|
|
|
|
*/
|
2022-09-09 00:40:02 +00:00
|
|
|
ncpus = (uint64_t)cpu_sockets * cpu_cores * cpu_threads;
|
2019-06-26 20:30:41 +00:00
|
|
|
if (ncpus > UINT16_MAX)
|
|
|
|
errx(4, "Computed number of vCPUs too high: %ju",
|
|
|
|
(uintmax_t)ncpus);
|
|
|
|
|
|
|
|
if (explicit_cpus) {
|
2022-10-23 14:32:45 +00:00
|
|
|
if (guest_ncpus != (int)ncpus)
|
2019-06-26 20:30:41 +00:00
|
|
|
errx(4, "Topology (%d sockets, %d cores, %d threads) "
|
2022-09-09 00:40:02 +00:00
|
|
|
"does not match %d vCPUs",
|
|
|
|
cpu_sockets, cpu_cores, cpu_threads,
|
2019-06-26 20:30:41 +00:00
|
|
|
guest_ncpus);
|
|
|
|
} else
|
|
|
|
guest_ncpus = ncpus;
|
|
|
|
}
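
The consistency check performed by calc_topology() can be sketched in isolation. The helper below is hypothetical (`topo_vcpu_count()` is not a bhyve function) and returns an error code instead of calling errx(); it assumes the caller has already range-checked each value, as parse_int_value() does above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the vCPU count implied by the topology, or -1
 * if the product exceeds UINT16_MAX or disagrees with an explicit cpus value.
 * Pass cpus == 0 when no explicit count was given, and sockets == 0 when no
 * socket count was given (it then defaults to cpus, or to 1 without cpus).
 */
int
topo_vcpu_count(int cpus, int sockets, int cores, int threads)
{
	uint64_t n;

	if (sockets == 0)
		sockets = (cpus != 0) ? cpus : 1;

	/* Callers are expected to have range-checked every value. */
	assert(cpus >= 0 && cpus <= UINT16_MAX);
	assert(sockets >= 1 && sockets <= UINT16_MAX);
	assert(cores >= 1 && cores <= UINT16_MAX);
	assert(threads >= 1 && threads <= UINT16_MAX);

	/* Three 16-bit factors need at most 48 bits, so this cannot wrap. */
	n = (uint64_t)sockets * (uint64_t)cores * (uint64_t)threads;
	if (n > UINT16_MAX)
		return (-1);
	if (cpus != 0 && (uint64_t)cpus != n)
		return (-1);
	return ((int)n);
}
```

For example, `topo_vcpu_count(0, 2, 2, 2)` yields 8, while `topo_vcpu_count(4, 2, 2, 2)` reports a mismatch because the explicit count disagrees with the product.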
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
static int
|
|
|
|
pincpu_parse(const char *opt)
|
|
|
|
{
|
2019-06-26 20:30:41 +00:00
|
|
|
const char *value;
|
|
|
|
char *newval;
|
|
|
|
char key[16];
|
2016-05-27 06:22:24 +00:00
|
|
|
int vcpu, pcpu;
|
|
|
|
|
|
|
|
if (sscanf(opt, "%d:%d", &vcpu, &pcpu) != 2) {
|
|
|
|
fprintf(stderr, "invalid format: %s\n", opt);
|
|
|
|
return (-1);
|
|
|
|
}
|
|
|
|
|
2022-03-09 23:39:16 +00:00
|
|
|
if (vcpu < 0) {
|
|
|
|
fprintf(stderr, "invalid vcpu '%d'\n", vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (-1);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (pcpu < 0 || pcpu >= CPU_SETSIZE) {
|
|
|
|
fprintf(stderr, "hostcpu '%d' outside valid range from "
|
|
|
|
"0 to %d\n", pcpu, CPU_SETSIZE - 1);
|
|
|
|
return (-1);
|
|
|
|
}
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
snprintf(key, sizeof(key), "vcpu.%d.cpuset", vcpu);
|
|
|
|
value = get_config_value(key);
|
|
|
|
|
|
|
|
if (asprintf(&newval, "%s%s%d", value != NULL ? value : "",
|
|
|
|
value != NULL ? "," : "", pcpu) == -1) {
|
|
|
|
perror("failed to build new cpuset string");
|
|
|
|
return (-1);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
2019-06-26 20:30:41 +00:00
|
|
|
|
|
|
|
set_config_value(key, newval);
|
|
|
|
free(newval);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
static void
|
|
|
|
parse_cpuset(int vcpu, const char *list, cpuset_t *set)
|
|
|
|
{
|
|
|
|
char *cp, *token;
|
|
|
|
int pcpu, start;
|
|
|
|
|
|
|
|
CPU_ZERO(set);
|
|
|
|
start = -1;
|
|
|
|
token = __DECONST(char *, list);
|
|
|
|
for (;;) {
|
|
|
|
pcpu = strtoul(token, &cp, 0);
|
|
|
|
if (cp == token)
|
|
|
|
errx(4, "invalid cpuset for vcpu %d: '%s'", vcpu, list);
|
|
|
|
if (pcpu < 0 || pcpu >= CPU_SETSIZE)
|
|
|
|
errx(4, "hostcpu '%d' outside valid range from 0 to %d",
|
|
|
|
pcpu, CPU_SETSIZE - 1);
|
|
|
|
switch (*cp) {
|
|
|
|
case ',':
|
|
|
|
case '\0':
|
|
|
|
if (start >= 0) {
|
|
|
|
if (start > pcpu)
|
|
|
|
errx(4, "Invalid hostcpu range %d-%d",
|
|
|
|
start, pcpu);
|
|
|
|
while (start < pcpu) {
|
2022-12-21 18:33:04 +00:00
|
|
|
CPU_SET(start, set);
|
2019-06-26 20:30:41 +00:00
|
|
|
start++;
|
|
|
|
}
|
|
|
|
start = -1;
|
|
|
|
}
|
2022-12-21 18:33:04 +00:00
|
|
|
CPU_SET(pcpu, set);
|
2019-06-26 20:30:41 +00:00
|
|
|
break;
|
|
|
|
case '-':
|
|
|
|
if (start >= 0)
|
|
|
|
errx(4, "invalid cpuset for vcpu %d: '%s'",
|
|
|
|
vcpu, list);
|
|
|
|
start = pcpu;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
errx(4, "invalid cpuset for vcpu %d: '%s'", vcpu, list);
|
|
|
|
}
|
|
|
|
if (*cp == '\0')
|
|
|
|
break;
|
|
|
|
token = cp + 1;
|
|
|
|
}
|
|
|
|
}
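
The list syntax accepted by parse_cpuset() (single CPUs and inclusive ranges, separated by commas) can be tried out with a portable sketch. Everything named here is an assumption for illustration: `cpulist_parse()` and `SKETCH_NCPU` are hypothetical, a 64-bit mask stands in for cpuset_t, and errors are returned rather than reported with errx().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_NCPU	64	/* hypothetical limit standing in for CPU_SETSIZE */

/*
 * Parse a list such as "0,2-4" into a bitmask. Returns 0 on success, -1 on
 * malformed input or an out-of-range CPU number. On failure *mask may hold
 * a partial result.
 */
int
cpulist_parse(const char *list, uint64_t *mask)
{
	const char *token;
	char *cp;
	unsigned long pcpu;
	long start;

	assert(list != NULL && mask != NULL);
	*mask = 0;
	start = -1;		/* -1: not inside a "a-b" range */
	token = list;
	for (;;) {
		pcpu = strtoul(token, &cp, 0);
		if (cp == token || pcpu >= SKETCH_NCPU)
			return (-1);
		switch (*cp) {
		case ',':
		case '\0':
			if (start >= 0) {
				/* Close the range: set start..pcpu-1 here. */
				if ((unsigned long)start > pcpu)
					return (-1);
				while ((unsigned long)start < pcpu) {
					*mask |= UINT64_C(1) << start;
					start++;
				}
				start = -1;
			}
			*mask |= UINT64_C(1) << pcpu;
			break;
		case '-':
			if (start >= 0)
				return (-1);	/* "a-b-c" is not allowed */
			start = (long)pcpu;
			break;
		default:
			return (-1);
		}
		if (*cp == '\0')
			break;
		token = cp + 1;
	}
	return (0);
}
```

Parsing "0,2-4" with this sketch sets bits 0, 2, 3 and 4, giving a mask of 0x1d; a reversed range such as "5-3" is rejected.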
|
|
|
|
|
|
|
|
static void
|
|
|
|
build_vcpumaps(void)
|
|
|
|
{
|
|
|
|
char key[16];
|
|
|
|
const char *value;
|
|
|
|
int vcpu;
|
|
|
|
|
2022-03-09 23:39:08 +00:00
|
|
|
vcpumap = calloc(guest_ncpus, sizeof(*vcpumap));
if (vcpumap == NULL)
err(4, "Failed to allocate vcpumap");
|
2019-06-26 20:30:41 +00:00
|
|
|
for (vcpu = 0; vcpu < guest_ncpus; vcpu++) {
|
|
|
|
snprintf(key, sizeof(key), "vcpu.%d.cpuset", vcpu);
|
|
|
|
value = get_config_value(key);
|
|
|
|
if (value == NULL)
|
|
|
|
continue;
|
|
|
|
vcpumap[vcpu] = malloc(sizeof(cpuset_t));
|
|
|
|
if (vcpumap[vcpu] == NULL)
|
|
|
|
err(4, "Failed to allocate cpuset for vcpu %d", vcpu);
|
|
|
|
parse_cpuset(vcpu, value, vcpumap[vcpu]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
void
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_inject_fault(struct vcpu *vcpu, int vector, int errcode_valid,
|
2016-05-27 06:22:24 +00:00
|
|
|
int errcode)
|
|
|
|
{
|
|
|
|
int error, restart_instruction;
|
|
|
|
|
|
|
|
restart_instruction = 1;
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_inject_exception(vcpu, vector, errcode_valid, errcode,
|
2016-05-27 06:22:24 +00:00
|
|
|
restart_instruction);
|
|
|
|
assert(error == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
void *
|
|
|
|
paddr_guest2host(struct vmctx *ctx, uintptr_t gaddr, size_t len)
|
|
|
|
{
|
|
|
|
|
|
|
|
return (vm_map_gpa(ctx, gaddr, len));
|
|
|
|
}
|
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
uintptr_t
|
|
|
|
paddr_host2guest(struct vmctx *ctx, void *addr)
|
|
|
|
{
|
|
|
|
return (vm_rev_map_gpa(ctx, addr));
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
int
|
|
|
|
fbsdrun_virtio_msix(void)
|
|
|
|
{
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
return (get_config_bool_default("virtio_msix", true));
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void *
|
|
|
|
fbsdrun_start_thread(void *param)
|
|
|
|
{
|
|
|
|
char tname[MAXCOMLEN + 1];
|
2023-03-24 18:49:06 +00:00
|
|
|
struct vcpu_info *vi = param;
|
|
|
|
int error;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
snprintf(tname, sizeof(tname), "vcpu %d", vi->vcpuid);
|
|
|
|
pthread_set_name_np(pthread_self(), tname);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
if (vcpumap[vi->vcpuid] != NULL) {
|
|
|
|
error = pthread_setaffinity_np(pthread_self(),
|
|
|
|
sizeof(cpuset_t), vcpumap[vi->vcpuid]);
|
2022-12-21 18:33:18 +00:00
|
|
|
assert(error == 0);
|
|
|
|
}
|
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-03-24 18:49:06 +00:00
|
|
|
checkpoint_cpu_add(vi->vcpuid);
|
2020-05-05 00:02:04 +00:00
|
|
|
#endif
|
2023-03-24 18:49:06 +00:00
|
|
|
gdb_cpu_add(vi->vcpu);
|
2018-05-01 15:17:46 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_loop(vi->ctx, vi->vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
/* not reached */
|
|
|
|
exit(1);
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
2022-09-07 07:05:36 +00:00
|
|
|
static void
|
2023-03-24 18:49:06 +00:00
|
|
|
fbsdrun_addcpu(struct vcpu_info *vi)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-03-24 18:49:06 +00:00
|
|
|
pthread_t thr;
|
2016-05-27 06:22:24 +00:00
|
|
|
int error;
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_activate_cpu(vi->vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error != 0)
|
2023-03-24 18:49:06 +00:00
|
|
|
err(EX_OSERR, "could not activate CPU %d", vi->vcpuid);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
CPU_SET_ATOMIC(vi->vcpuid, &cpumask);
|
2022-09-07 07:05:36 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_suspend_cpu(vi->vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
error = pthread_create(&thr, NULL, fbsdrun_start_thread, vi);
|
2016-05-27 06:22:24 +00:00
|
|
|
assert(error == 0);
|
|
|
|
}
|
|
|
|
|
2023-06-19 19:46:32 +00:00
|
|
|
static void
|
2022-09-08 23:08:10 +00:00
|
|
|
fbsdrun_deletecpu(int vcpu)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-06-19 19:46:32 +00:00
|
|
|
static pthread_mutex_t resetcpu_mtx = PTHREAD_MUTEX_INITIALIZER;
|
|
|
|
static pthread_cond_t resetcpu_cond = PTHREAD_COND_INITIALIZER;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-06-19 19:46:32 +00:00
|
|
|
pthread_mutex_lock(&resetcpu_mtx);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (!CPU_ISSET(vcpu, &cpumask)) {
|
|
|
|
fprintf(stderr, "Attempting to delete unknown cpu %d\n", vcpu);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2023-06-19 19:46:32 +00:00
|
|
|
CPU_CLR(vcpu, &cpumask);
|
|
|
|
|
|
|
|
if (vcpu != BSP) {
|
|
|
|
pthread_cond_signal(&resetcpu_cond);
|
|
|
|
pthread_mutex_unlock(&resetcpu_mtx);
|
|
|
|
pthread_exit(NULL);
|
|
|
|
/* NOTREACHED */
|
|
|
|
}
|
|
|
|
|
|
|
|
while (!CPU_EMPTY(&cpumask)) {
|
|
|
|
pthread_cond_wait(&resetcpu_cond, &resetcpu_mtx);
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&resetcpu_mtx);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_inout(struct vmctx *ctx, struct vcpu *vcpu, struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
2016-05-27 06:22:24 +00:00
|
|
|
int error;
|
2023-06-15 16:12:25 +00:00
|
|
|
int bytes, port, in;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vme = vmrun->vm_exit;
|
2016-05-27 06:22:24 +00:00
|
|
|
port = vme->u.inout.port;
|
|
|
|
bytes = vme->u.inout.bytes;
|
|
|
|
in = vme->u.inout.in;
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
error = emulate_inout(ctx, vcpu, vme);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error) {
|
|
|
|
fprintf(stderr, "Unhandled %s%c 0x%04x at 0x%lx\n",
|
|
|
|
in ? "in" : "out",
|
|
|
|
bytes == 1 ? 'b' : (bytes == 2 ? 'w' : 'l'),
|
2022-12-21 18:32:45 +00:00
|
|
|
port, vme->rip);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (VMEXIT_ABORT);
|
|
|
|
} else {
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_rdmsr(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
2016-05-27 06:22:24 +00:00
|
|
|
uint64_t val;
|
|
|
|
uint32_t eax, edx;
|
|
|
|
int error;
|
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vme = vmrun->vm_exit;
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
val = 0;
|
2023-03-24 18:49:06 +00:00
|
|
|
error = emulate_rdmsr(vcpu, vme->u.msr.code, &val);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error != 0) {
|
|
|
|
fprintf(stderr, "rdmsr to register %#x on vcpu %d\n",
|
2023-03-24 18:49:06 +00:00
|
|
|
vme->u.msr.code, vcpu_id(vcpu));
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool("x86.strictmsr")) {
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_inject_gp(vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
eax = val;
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_set_register(vcpu, VM_REG_GUEST_RAX, eax);
|
2016-05-27 06:22:24 +00:00
|
|
|
assert(error == 0);
|
|
|
|
|
|
|
|
edx = val >> 32;
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_set_register(vcpu, VM_REG_GUEST_RDX, edx);
|
2016-05-27 06:22:24 +00:00
|
|
|
assert(error == 0);
|
|
|
|
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
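
vmexit_rdmsr() returns the 64-bit MSR value to the guest as two 32-bit halves, low in RAX and high in RDX, which is how RDMSR delivers its result in EDX:EAX. A minimal sketch of that split, using the hypothetical helpers `msr_split()` and `msr_join()` (neither exists in bhyve):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit MSR value into the halves RDMSR returns in EDX:EAX. */
void
msr_split(uint64_t val, uint32_t *eax, uint32_t *edx)
{
	assert(eax != NULL && edx != NULL);
	*eax = (uint32_t)val;		/* low 32 bits */
	*edx = (uint32_t)(val >> 32);	/* high 32 bits */
}

/* Inverse operation: rebuild the 64-bit value from EDX:EAX. */
uint64_t
msr_join(uint32_t eax, uint32_t edx)
{
	return (((uint64_t)edx << 32) | eax);
}
```

Assigning the halves to uint32_t before widening them again, as the function above does with `eax` and `edx`, guarantees that the upper 32 bits written to each guest register are zero.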
|
|
|
|
|
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_wrmsr(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
2016-05-27 06:22:24 +00:00
|
|
|
int error;
|
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vme = vmrun->vm_exit;
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
error = emulate_wrmsr(vcpu, vme->u.msr.code, vme->u.msr.wval);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error != 0) {
|
|
|
|
fprintf(stderr, "wrmsr to register %#x(%#lx) on vcpu %d\n",
|
2023-03-24 18:49:06 +00:00
|
|
|
vme->u.msr.code, vme->u.msr.wval, vcpu_id(vcpu));
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool("x86.strictmsr")) {
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_inject_gp(vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
#define DEBUG_EPT_MISCONFIG
|
|
|
|
#ifdef DEBUG_EPT_MISCONFIG
|
|
|
|
#define VMCS_GUEST_PHYSICAL_ADDRESS 0x00002400
|
|
|
|
|
|
|
|
static uint64_t ept_misconfig_gpa, ept_misconfig_pte[4];
|
|
|
|
static int ept_misconfig_ptenum;
|
|
|
|
#endif
|
|
|
|
|
2018-10-27 21:24:28 +00:00
|
|
|
static const char *
|
|
|
|
vmexit_vmx_desc(uint32_t exit_reason)
|
|
|
|
{
|
|
|
|
|
|
|
|
if (exit_reason >= nitems(vmx_exit_reason_desc) ||
|
|
|
|
vmx_exit_reason_desc[exit_reason] == NULL)
|
|
|
|
return ("Unknown");
|
|
|
|
return (vmx_exit_reason_desc[exit_reason]);
|
|
|
|
}
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_vmx(struct vmctx *ctx, struct vcpu *vcpu, struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
|
|
|
|
|
|
|
vme = vmrun->vm_exit;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
fprintf(stderr, "vm exit[%d]\n", vcpu_id(vcpu));
|
2016-05-27 06:22:24 +00:00
|
|
|
fprintf(stderr, "\treason\t\tVMX\n");
|
2022-09-08 23:08:10 +00:00
|
|
|
fprintf(stderr, "\trip\t\t0x%016lx\n", vme->rip);
|
|
|
|
fprintf(stderr, "\tinst_length\t%d\n", vme->inst_length);
|
|
|
|
fprintf(stderr, "\tstatus\t\t%d\n", vme->u.vmx.status);
|
|
|
|
fprintf(stderr, "\texit_reason\t%u (%s)\n", vme->u.vmx.exit_reason,
|
|
|
|
vmexit_vmx_desc(vme->u.vmx.exit_reason));
|
2016-05-27 06:22:24 +00:00
|
|
|
fprintf(stderr, "\tqualification\t0x%016lx\n",
|
2022-09-08 23:08:10 +00:00
|
|
|
vme->u.vmx.exit_qualification);
|
|
|
|
fprintf(stderr, "\tinst_type\t\t%d\n", vme->u.vmx.inst_type);
|
|
|
|
fprintf(stderr, "\tinst_error\t\t%d\n", vme->u.vmx.inst_error);
|
2016-05-27 06:22:24 +00:00
|
|
|
#ifdef DEBUG_EPT_MISCONFIG
|
2022-09-08 23:08:10 +00:00
|
|
|
if (vme->u.vmx.exit_reason == EXIT_REASON_EPT_MISCONFIG) {
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_get_register(vcpu,
|
2016-05-27 06:22:24 +00:00
|
|
|
VMCS_IDENT(VMCS_GUEST_PHYSICAL_ADDRESS),
|
|
|
|
&ept_misconfig_gpa);
|
|
|
|
vm_get_gpa_pmap(ctx, ept_misconfig_gpa, ept_misconfig_pte,
|
|
|
|
&ept_misconfig_ptenum);
|
|
|
|
fprintf(stderr, "\tEPT misconfiguration:\n");
|
|
|
|
fprintf(stderr, "\t\tGPA: %#lx\n", ept_misconfig_gpa);
|
|
|
|
fprintf(stderr, "\t\tPTE(%d): %#lx %#lx %#lx %#lx\n",
|
|
|
|
ept_misconfig_ptenum, ept_misconfig_pte[0],
|
|
|
|
ept_misconfig_pte[1], ept_misconfig_pte[2],
|
|
|
|
ept_misconfig_pte[3]);
|
|
|
|
}
|
|
|
|
#endif /* DEBUG_EPT_MISCONFIG */
|
|
|
|
return (VMEXIT_ABORT);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_svm(struct vmctx *ctx __unused, struct vcpu *vcpu, struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
|
|
|
|
|
|
|
vme = vmrun->vm_exit;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
fprintf(stderr, "vm exit[%d]\n", vcpu_id(vcpu));
|
2016-05-27 06:22:24 +00:00
|
|
|
fprintf(stderr, "\treason\t\tSVM\n");
|
2022-09-08 23:08:10 +00:00
|
|
|
fprintf(stderr, "\trip\t\t0x%016lx\n", vme->rip);
|
|
|
|
fprintf(stderr, "\tinst_length\t%d\n", vme->inst_length);
|
|
|
|
fprintf(stderr, "\texitcode\t%#lx\n", vme->u.svm.exitcode);
|
|
|
|
fprintf(stderr, "\texitinfo1\t%#lx\n", vme->u.svm.exitinfo1);
|
|
|
|
fprintf(stderr, "\texitinfo2\t%#lx\n", vme->u.svm.exitinfo2);
|
2016-05-27 06:22:24 +00:00
|
|
|
return (VMEXIT_ABORT);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_bogus(struct vmctx *ctx __unused, struct vcpu *vcpu __unused,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
assert(vmrun->vm_exit->inst_length == 0);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_reqidle(struct vmctx *ctx __unused, struct vcpu *vcpu __unused,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
assert(vmrun->vm_exit->inst_length == 0);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_hlt(struct vmctx *ctx __unused, struct vcpu *vcpu __unused,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun __unused)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Just continue execution with the next instruction. We use
|
|
|
|
* the HLT VM exit as a way to be friendly with the host
|
|
|
|
* scheduler.
|
|
|
|
*/
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_pause(struct vmctx *ctx __unused, struct vcpu *vcpu __unused,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun __unused)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_mtrap(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
assert(vmrun->vm_exit->inst_length == 0);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-03-24 18:49:06 +00:00
|
|
|
checkpoint_cpu_suspend(vcpu_id(vcpu));
|
2020-05-05 00:02:04 +00:00
|
|
|
#endif
|
2023-03-24 18:49:06 +00:00
|
|
|
gdb_cpu_mtrap(vcpu);
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-03-24 18:49:06 +00:00
|
|
|
checkpoint_cpu_resume(vcpu_id(vcpu));
|
2020-05-05 00:02:04 +00:00
|
|
|
#endif
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_inst_emul(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
2016-05-27 06:22:24 +00:00
|
|
|
struct vie *vie;
|
2023-05-24 01:13:33 +00:00
|
|
|
int err, i, cs_d;
|
2020-06-25 00:18:42 +00:00
|
|
|
enum vm_cpu_mode mode;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vme = vmrun->vm_exit;
|
|
|
|
|
2022-09-08 23:08:10 +00:00
|
|
|
vie = &vme->u.inst_emul.vie;
|
2020-06-25 00:18:42 +00:00
|
|
|
if (!vie->decoded) {
|
|
|
|
/*
|
|
|
|
* Attempt to decode in userspace as a fallback. This allows
|
|
|
|
* updating instruction decode in bhyve without rebooting the
|
|
|
|
* kernel (rapid prototyping), albeit with much slower
|
|
|
|
* emulation.
|
|
|
|
*/
|
|
|
|
vie_restart(vie);
|
2022-09-08 23:08:10 +00:00
|
|
|
mode = vme->u.inst_emul.paging.cpu_mode;
|
|
|
|
cs_d = vme->u.inst_emul.cs_d;
|
2020-11-19 07:23:39 +00:00
|
|
|
if (vmm_decode_instruction(mode, cs_d, vie) != 0)
|
|
|
|
goto fail;
|
2023-03-24 18:49:06 +00:00
|
|
|
if (vm_set_register(vcpu, VM_REG_GUEST_RIP,
|
2022-09-08 23:08:10 +00:00
|
|
|
vme->rip + vie->num_processed) != 0)
|
2020-11-19 07:23:39 +00:00
|
|
|
goto fail;
|
2020-06-25 00:18:42 +00:00
|
|
|
}
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
err = emulate_mem(vcpu, vme->u.inst_emul.gpa, vie,
|
|
|
|
&vme->u.inst_emul.paging);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == ESRCH) {
|
2020-04-15 02:34:44 +00:00
|
|
|
EPRINTLN("Unhandled memory access to 0x%lx",
|
2022-09-08 23:08:10 +00:00
|
|
|
vme->u.inst_emul.gpa);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
2020-11-19 07:23:39 +00:00
|
|
|
goto fail;
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
return (VMEXIT_CONTINUE);
|
2020-11-19 07:23:39 +00:00
|
|
|
|
|
|
|
fail:
|
|
|
|
fprintf(stderr, "Failed to emulate instruction sequence [ ");
|
|
|
|
for (i = 0; i < vie->num_valid; i++)
|
|
|
|
fprintf(stderr, "%02x", vie->inst[i]);
|
2022-09-08 23:08:10 +00:00
|
|
|
FPRINTLN(stderr, " ] at 0x%lx", vme->rip);
|
2020-11-19 07:23:39 +00:00
|
|
|
return (VMEXIT_ABORT);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-05-24 01:13:33 +00:00
|
|
|
vmexit_suspend(struct vmctx *ctx, struct vcpu *vcpu, struct vm_run *vmrun)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
2016-05-27 06:22:24 +00:00
|
|
|
enum vm_suspend_how how;
|
2023-03-24 18:49:06 +00:00
|
|
|
int vcpuid = vcpu_id(vcpu);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vme = vmrun->vm_exit;
|
|
|
|
|
2022-09-08 23:08:10 +00:00
|
|
|
how = vme->u.suspended.how;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
fbsdrun_deletecpu(vcpuid);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
switch (how) {
|
|
|
|
case VM_SUSPEND_RESET:
|
|
|
|
exit(0);
|
|
|
|
case VM_SUSPEND_POWEROFF:
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("destroy_on_poweroff", false))
|
2020-06-25 12:35:20 +00:00
|
|
|
vm_destroy(ctx);
|
2016-05-27 06:22:24 +00:00
|
|
|
exit(1);
|
|
|
|
case VM_SUSPEND_HALT:
|
|
|
|
exit(2);
|
|
|
|
case VM_SUSPEND_TRIPLEFAULT:
|
|
|
|
exit(3);
|
|
|
|
default:
|
|
|
|
fprintf(stderr, "vmexit_suspend: invalid reason %d\n", how);
|
|
|
|
exit(100);
|
|
|
|
}
|
|
|
|
return (0); /* NOTREACHED */
|
|
|
|
}
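The exit statuses chosen by the `switch` in `vmexit_suspend` can be summarized as a pure function. This is a minimal sketch rather than code from this file: `sketch_suspend_how` and `sketch_exit_status` are hypothetical names standing in for `enum vm_suspend_how` and the handler's `switch`, and the enum values are illustrative only. The real handler also removes the vCPU from the active set and, on poweroff, may destroy the VM before exiting.

```c
#include <assert.h>

/* Hypothetical stand-in for enum vm_suspend_how; the values are illustrative. */
enum sketch_suspend_how {
	SKETCH_SUSPEND_RESET = 1,
	SKETCH_SUSPEND_POWEROFF,
	SKETCH_SUSPEND_HALT,
	SKETCH_SUSPEND_TRIPLEFAULT,
};

/*
 * Map a suspend reason to the process exit status that the handler
 * passes to exit(). Unknown reasons map to 100, as in the default case.
 */
static int
sketch_exit_status(int how)
{
	switch (how) {
	case SKETCH_SUSPEND_RESET:
		return (0);
	case SKETCH_SUSPEND_POWEROFF:
		return (1);
	case SKETCH_SUSPEND_HALT:
		return (2);
	case SKETCH_SUSPEND_TRIPLEFAULT:
		return (3);
	default:
		return (100);
	}
}
```

A supervising script could, for instance, restart the guest when the status is 0 (reset) and stop on any other value; that policy is an assumption about the caller, not something this file enforces.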
|
|
|
|
|
2018-05-01 15:17:46 +00:00
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_debug(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun __unused)
|
2018-05-01 15:17:46 +00:00
|
|
|
{
|
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-03-24 18:49:06 +00:00
|
|
|
checkpoint_cpu_suspend(vcpu_id(vcpu));
|
2020-05-05 00:02:04 +00:00
|
|
|
#endif
|
2023-03-24 18:49:06 +00:00
|
|
|
gdb_cpu_suspend(vcpu);
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-03-24 18:49:06 +00:00
|
|
|
checkpoint_cpu_resume(vcpu_id(vcpu));
|
2020-05-05 00:02:04 +00:00
|
|
|
#endif
|
2023-03-22 13:02:54 +00:00
|
|
|
/*
|
|
|
|
* XXX-MJ sleep for a short period to avoid chewing up the CPU in the
|
|
|
|
* window between activation of the vCPU thread and the STARTUP IPI.
|
|
|
|
*/
|
|
|
|
usleep(1000);
|
2018-05-01 15:17:46 +00:00
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
2019-12-13 19:21:58 +00:00
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_breakpoint(struct vmctx *ctx __unused, struct vcpu *vcpu,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2019-12-13 19:21:58 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
gdb_cpu_breakpoint(vcpu, vmrun->vm_exit);
|
2019-12-13 19:21:58 +00:00
|
|
|
return (VMEXIT_CONTINUE);
|
|
|
|
}
|
|
|
|
|
2022-09-07 07:07:03 +00:00
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
vmexit_ipi(struct vmctx *ctx __unused, struct vcpu *vcpu __unused,
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run *vmrun)
|
2022-09-07 07:07:03 +00:00
|
|
|
{
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_exit *vme;
|
|
|
|
cpuset_t *dmask;
|
2022-09-07 07:07:03 +00:00
|
|
|
int error = -1;
|
|
|
|
int i;
|
2023-05-24 01:13:33 +00:00
|
|
|
|
|
|
|
dmask = vmrun->cpuset;
|
|
|
|
vme = vmrun->vm_exit;
|
|
|
|
|
2022-10-22 17:34:00 +00:00
|
|
|
switch (vme->u.ipi.mode) {
|
2022-09-07 07:07:03 +00:00
|
|
|
case APIC_DELMODE_INIT:
|
2023-05-24 01:13:33 +00:00
|
|
|
CPU_FOREACH_ISSET(i, dmask) {
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_suspend_cpu(vcpu_info[i].vcpu);
|
2022-09-07 07:07:03 +00:00
|
|
|
if (error) {
|
|
|
|
warnx("%s: failed to suspend cpu %d",
|
|
|
|
__func__, i);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case APIC_DELMODE_STARTUP:
|
2023-05-24 01:13:33 +00:00
|
|
|
CPU_FOREACH_ISSET(i, dmask) {
|
2023-03-24 18:49:06 +00:00
|
|
|
spinup_ap(vcpu_info[i].vcpu,
|
|
|
|
vme->u.ipi.vector << PAGE_SHIFT);
|
2022-09-07 07:07:03 +00:00
|
|
|
}
|
|
|
|
error = 0;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
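In the `APIC_DELMODE_STARTUP` case, the value handed to `spinup_ap` is the IPI vector shifted left by `PAGE_SHIFT`, which turns the 8-bit vector into a page-aligned address. The sketch below shows that arithmetic in isolation. It assumes 4 KiB pages (a shift of 12, as on x86); `SKETCH_PAGE_SHIFT` and `sketch_sipi_start_addr` are hypothetical names, not part of this file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the shift is 12 (the usual x86 value). */
#define	SKETCH_PAGE_SHIFT	12

/*
 * Convert a STARTUP IPI vector into the page-aligned address derived
 * from it. The cast widens the vector before shifting so the result
 * is computed in 64 bits.
 */
static uint64_t
sketch_sipi_start_addr(uint8_t vector)
{
	return ((uint64_t)vector << SKETCH_PAGE_SHIFT);
}
```

With this shift, vector `0x9f` corresponds to address `0x9f000`, and the largest vector `0xff` to `0xff000`, so the target always lies below 1 MiB.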
|
|
|
|
|
2023-06-19 19:46:02 +00:00
|
|
|
static const vmexit_handler_t handler[VM_EXITCODE_MAX] = {
|
2016-05-27 06:22:24 +00:00
|
|
|
[VM_EXITCODE_INOUT] = vmexit_inout,
|
|
|
|
[VM_EXITCODE_INOUT_STR] = vmexit_inout,
|
|
|
|
[VM_EXITCODE_VMX] = vmexit_vmx,
|
|
|
|
[VM_EXITCODE_SVM] = vmexit_svm,
|
|
|
|
[VM_EXITCODE_BOGUS] = vmexit_bogus,
|
|
|
|
[VM_EXITCODE_REQIDLE] = vmexit_reqidle,
|
|
|
|
[VM_EXITCODE_RDMSR] = vmexit_rdmsr,
|
|
|
|
[VM_EXITCODE_WRMSR] = vmexit_wrmsr,
|
|
|
|
[VM_EXITCODE_MTRAP] = vmexit_mtrap,
|
|
|
|
[VM_EXITCODE_INST_EMUL] = vmexit_inst_emul,
|
|
|
|
[VM_EXITCODE_SUSPENDED] = vmexit_suspend,
|
|
|
|
[VM_EXITCODE_TASK_SWITCH] = vmexit_task_switch,
|
2018-05-01 15:17:46 +00:00
|
|
|
[VM_EXITCODE_DEBUG] = vmexit_debug,
|
2019-12-13 19:21:58 +00:00
|
|
|
[VM_EXITCODE_BPT] = vmexit_breakpoint,
|
2022-09-07 07:07:03 +00:00
|
|
|
[VM_EXITCODE_IPI] = vmexit_ipi,
|
2023-06-19 19:46:02 +00:00
|
|
|
[VM_EXITCODE_HLT] = vmexit_hlt,
|
|
|
|
[VM_EXITCODE_PAUSE] = vmexit_pause,
|
2016-05-27 06:22:24 +00:00
|
|
|
};
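The `handler` array is a dispatch table built with C99 designated initializers: each exit code indexes its handler, and any slot that is not named stays `NULL`. A minimal, self-contained sketch of the same pattern follows. All names in it (`sketch_exitcode`, `sketch_handlers`, `sketch_dispatch`) are hypothetical, and the handlers take a plain `int` instead of the `vmctx`, `vcpu`, and `vm_run` arguments used in this file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit codes; SKETCH_EXIT_UNHANDLED has no handler on purpose. */
enum sketch_exitcode {
	SKETCH_EXIT_INOUT,
	SKETCH_EXIT_HLT,
	SKETCH_EXIT_UNHANDLED,
	SKETCH_EXIT_MAX
};

typedef int (*sketch_handler_t)(int arg);

static int
sketch_inout(int arg)
{
	return (arg + 1);
}

static int
sketch_hlt(int arg)
{
	return (arg + 2);
}

/* Slots without a designated initializer are zero-initialized, i.e. NULL. */
static const sketch_handler_t sketch_handlers[SKETCH_EXIT_MAX] = {
	[SKETCH_EXIT_INOUT] = sketch_inout,
	[SKETCH_EXIT_HLT] = sketch_hlt,
};

/*
 * Look up and call the handler for 'code'. Returns -1 when the code is
 * out of range or has no handler, so the table is never indexed blindly.
 */
static int
sketch_dispatch(unsigned int code, int arg)
{
	if (code >= SKETCH_EXIT_MAX || sketch_handlers[code] == NULL)
		return (-1);
	return (sketch_handlers[code](arg));
}
```

The two-part guard in `sketch_dispatch` matters: the range check protects against indexing past the array, and the `NULL` check protects against codes that are valid but were never given a handler.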
|
|
|
|
|
|
|
|
static void
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_loop(struct vmctx *ctx, struct vcpu *vcpu)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2022-12-21 18:32:45 +00:00
|
|
|
struct vm_exit vme;
|
2023-05-24 01:13:33 +00:00
|
|
|
struct vm_run vmrun;
|
2016-05-27 06:22:24 +00:00
|
|
|
int error, rc;
|
|
|
|
enum vm_exitcode exitcode;
|
2023-05-24 01:13:33 +00:00
|
|
|
cpuset_t active_cpus, dmask;
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
error = vm_active_cpus(ctx, &active_cpus);
|
2023-03-24 18:49:06 +00:00
|
|
|
assert(CPU_ISSET(vcpu_id(vcpu), &active_cpus));
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
vmrun.vm_exit = &vme;
|
|
|
|
vmrun.cpuset = &dmask;
|
|
|
|
vmrun.cpusetsize = sizeof(dmask);
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
while (1) {
|
2023-05-24 01:13:33 +00:00
|
|
|
error = vm_run(vcpu, &vmrun);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
|
2022-12-21 18:32:45 +00:00
|
|
|
exitcode = vme.exitcode;
|
2016-05-27 06:22:24 +00:00
|
|
|
if (exitcode >= VM_EXITCODE_MAX || handler[exitcode] == NULL) {
|
|
|
|
fprintf(stderr, "vm_loop: unexpected exitcode 0x%x\n",
|
|
|
|
exitcode);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2023-05-24 01:13:33 +00:00
|
|
|
rc = (*handler[exitcode])(ctx, vcpu, &vmrun);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
switch (rc) {
|
|
|
|
case VMEXIT_CONTINUE:
|
|
|
|
break;
|
|
|
|
case VMEXIT_ABORT:
|
|
|
|
abort();
|
|
|
|
default:
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
fprintf(stderr, "vm_run error %d, errno %d\n", error, errno);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2023-03-24 18:49:06 +00:00
|
|
|
num_vcpus_allowed(struct vmctx *ctx, struct vcpu *vcpu)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
2022-03-09 23:39:23 +00:00
|
|
|
uint16_t sockets, cores, threads, maxcpus;
|
2016-05-27 06:22:24 +00:00
|
|
|
int tmp, error;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The guest is allowed to spinup more than one processor only if the
|
|
|
|
* UNRESTRICTED_GUEST capability is available.
|
|
|
|
*/
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vm_get_capability(vcpu, VM_CAP_UNRESTRICTED_GUEST, &tmp);
|
2022-03-09 23:39:23 +00:00
|
|
|
if (error != 0)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
error = vm_get_topology(ctx, &sockets, &cores, &threads, &maxcpus);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (error == 0)
|
2022-03-09 23:39:23 +00:00
|
|
|
return (maxcpus);
|
2016-05-27 06:22:24 +00:00
|
|
|
else
|
|
|
|
return (1);
|
|
|
|
}
|
|
|
|
|
2022-12-21 18:31:16 +00:00
|
|
|
static void
|
2023-06-19 19:46:02 +00:00
|
|
|
fbsdrun_set_capabilities(struct vcpu *vcpu)
|
2016-05-27 06:22:24 +00:00
|
|
|
{
|
|
|
|
int err, tmp;
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("x86.vmexit_on_hlt", false)) {
|
2023-03-24 18:49:06 +00:00
|
|
|
err = vm_get_capability(vcpu, VM_CAP_HALT_EXIT, &tmp);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (err < 0) {
|
|
|
|
fprintf(stderr, "VM exit on HLT not supported\n");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_set_capability(vcpu, VM_CAP_HALT_EXIT, 1);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("x86.vmexit_on_pause", false)) {
|
2016-05-27 06:22:24 +00:00
|
|
|
/*
|
|
|
|
* pause exit support required for this mode
|
|
|
|
*/
|
2023-03-24 18:49:06 +00:00
|
|
|
err = vm_get_capability(vcpu, VM_CAP_PAUSE_EXIT, &tmp);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (err < 0) {
|
|
|
|
fprintf(stderr,
|
|
|
|
"SMP mux requested, no pause support\n");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_set_capability(vcpu, VM_CAP_PAUSE_EXIT, 1);
|
2023-07-17 15:11:20 +00:00
|
|
|
}
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("x86.x2apic", false))
|
2023-03-24 18:49:06 +00:00
|
|
|
err = vm_set_x2apic_state(vcpu, X2APIC_ENABLED);
|
2016-05-27 06:22:24 +00:00
|
|
|
else
|
2023-03-24 18:49:06 +00:00
|
|
|
err = vm_set_x2apic_state(vcpu, X2APIC_DISABLED);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
if (err) {
|
|
|
|
fprintf(stderr, "Unable to set x2apic state (%d)\n", err);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_set_capability(vcpu, VM_CAP_ENABLE_INVPCID, 1);
|
2022-12-21 18:31:16 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
err = vm_set_capability(vcpu, VM_CAP_IPI_EXIT, 1);
|
2022-12-21 18:31:16 +00:00
|
|
|
assert(err == 0);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct vmctx *
|
|
|
|
do_open(const char *vmname)
|
|
|
|
{
|
|
|
|
struct vmctx *ctx;
|
|
|
|
int error;
|
|
|
|
bool reinit, romboot;
|
|
|
|
|
|
|
|
reinit = romboot = false;
|
|
|
|
|
|
|
|
if (lpc_bootrom())
|
|
|
|
romboot = true;
|
|
|
|
|
|
|
|
error = vm_create(vmname);
|
|
|
|
if (error) {
|
|
|
|
if (errno == EEXIST) {
|
|
|
|
if (romboot) {
|
|
|
|
reinit = true;
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* The virtual machine has been setup by the
|
|
|
|
* userspace bootloader.
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
perror("vm_create");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
if (!romboot) {
|
|
|
|
/*
|
|
|
|
* If the virtual machine was just created then a
|
|
|
|
* bootrom must be configured to boot it.
|
|
|
|
*/
|
|
|
|
fprintf(stderr, "virtual machine cannot be booted\n");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ctx = vm_open(vmname);
|
|
|
|
if (ctx == NULL) {
|
|
|
|
perror("vm_open");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2017-02-14 13:35:59 +00:00
|
|
|
#ifndef WITHOUT_CAPSICUM
|
2022-10-24 21:32:04 +00:00
|
|
|
if (vm_limit_rights(ctx) != 0)
|
|
|
|
err(EX_OSERR, "vm_limit_rights");
|
2017-02-14 13:35:59 +00:00
|
|
|
#endif
|
2021-12-26 07:52:38 +00:00
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
if (reinit) {
|
|
|
|
error = vm_reinit(ctx);
|
|
|
|
if (error) {
|
|
|
|
perror("vm_reinit");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
}
|
2023-06-12 10:47:35 +00:00
|
|
|
error = vm_set_topology(ctx, cpu_sockets, cpu_cores, cpu_threads, 0);
|
Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls. The old
sysctls continue to function, but are deprecated as of
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctls are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
2018-04-08 19:24:49 +00:00
|
|
|
if (error)
|
|
|
|
errx(EX_OSERR, "vm_set_topology");
|
2016-05-27 06:22:24 +00:00
|
|
|
return (ctx);
|
|
|
|
}
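The branching at the top of `do_open` depends on two inputs: the result of `vm_create` and whether a boot ROM is configured. The sketch below restates that decision as a pure function so the four outcomes are easy to see. It is an illustration only: `sketch_open_action` and `sketch_decide_open` are hypothetical names, and the function takes the `vm_create` return value and `errno` as parameters instead of calling into libvmmapi.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what do_open does after vm_create returns. */
enum sketch_open_action {
	SKETCH_OPEN,		/* open the VM as-is */
	SKETCH_OPEN_AND_REINIT,	/* open the VM, then reinitialize it */
	SKETCH_FAIL		/* print an error and exit */
};

static enum sketch_open_action
sketch_decide_open(int create_error, int create_errno, bool romboot)
{
	if (create_error == 0) {
		/* A freshly created VM can only be booted from a boot ROM. */
		return (romboot ? SKETCH_OPEN : SKETCH_FAIL);
	}
	if (create_errno == EEXIST) {
		/*
		 * The VM already exists. With a boot ROM it is reset first;
		 * without one it is assumed to have been set up by a
		 * userspace bootloader and is used unchanged.
		 */
		return (romboot ? SKETCH_OPEN_AND_REINIT : SKETCH_OPEN);
	}
	/* Any other vm_create failure is fatal. */
	return (SKETCH_FAIL);
}
```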
|
|
|
|
|
2022-09-07 07:05:36 +00:00
|
|
|
static void
|
2023-03-24 18:49:06 +00:00
|
|
|
spinup_vcpu(struct vcpu_info *vi, bool bsp)
|
{
	int error;

	if (!bsp) {
		fbsdrun_set_capabilities(vi->vcpu);

		/*
		 * Enable the 'unrestricted guest' mode for APs.
		 *
		 * APs start up in power-on 16-bit mode.
		 */
		error = vm_set_capability(vi->vcpu, VM_CAP_UNRESTRICTED_GUEST, 1);
		assert(error == 0);
	}

	fbsdrun_addcpu(vi);
}

static bool
parse_config_option(const char *option)
{
	const char *value;
	char *path;

	value = strchr(option, '=');
	if (value == NULL || value[1] == '\0')
		return (false);
	path = strndup(option, value - option);
	if (path == NULL)
		err(4, "Failed to allocate memory");
	set_config_value(path, value + 1);
	return (true);
}

static void
parse_simple_config_file(const char *path)
{
	FILE *fp;
	char *line, *cp;
	size_t linecap;
	unsigned int lineno;

	fp = fopen(path, "r");
	if (fp == NULL)
		err(4, "Failed to open configuration file %s", path);
	line = NULL;
	linecap = 0;
	lineno = 1;
	for (lineno = 1; getline(&line, &linecap, fp) > 0; lineno++) {
		if (*line == '#' || *line == '\n')
			continue;
		cp = strchr(line, '\n');
		if (cp != NULL)
			*cp = '\0';
		if (!parse_config_option(line))
			errx(4, "%s line %u: invalid config option '%s'", path,
			    lineno, line);
	}
	free(line);
	fclose(fp);
}

static void
parse_gdb_options(const char *opt)
{
	const char *sport;
	char *colon;

	if (opt[0] == 'w') {
		set_config_bool("gdb.wait", true);
		opt++;
	}

	colon = strrchr(opt, ':');
	if (colon == NULL) {
		sport = opt;
	} else {
		*colon = '\0';
		colon++;
		sport = colon;
		set_config_value("gdb.address", opt);
	}

	set_config_value("gdb.port", sport);
}

static void
set_defaults(void)
{

	set_config_bool("acpi_tables", false);
	set_config_bool("acpi_tables_in_memory", true);
	set_config_value("memory.size", "256M");
	set_config_bool("x86.strictmsr", true);
	set_config_value("lpc.fwcfg", "bhyve");
}

int
main(int argc, char *argv[])
{
	int c, error;
	int max_vcpus, memflags;
	struct vcpu *bsp;
	struct vmctx *ctx;
	struct qemu_fwcfg_item *e820_fwcfg_item;
	size_t memsize;
	const char *optstr, *value, *vmname;
#ifdef BHYVE_SNAPSHOT
	char *restore_file;
	struct restore_state rstate;

	restore_file = NULL;
#endif

	init_config();
	set_defaults();
	progname = basename(argv[0]);

#ifdef BHYVE_SNAPSHOT
	optstr = "aehuwxACDHIPSWYk:f:o:p:G:c:s:m:l:K:U:r:";
#else
	optstr = "aehuwxACDHIPSWYk:f:o:p:G:c:s:m:l:K:U:";
#endif
	while ((c = getopt(argc, argv, optstr)) != -1) {
		switch (c) {
		case 'a':
			set_config_bool("x86.x2apic", false);
			break;
		case 'A':
			set_config_bool("acpi_tables", true);
			break;
		case 'D':
			set_config_bool("destroy_on_poweroff", true);
			break;
		case 'p':
			if (pincpu_parse(optarg) != 0) {
				errx(EX_USAGE, "invalid vcpu pinning "
				    "configuration '%s'", optarg);
			}
			break;
		case 'c':
			if (topology_parse(optarg) != 0) {
				errx(EX_USAGE, "invalid cpu topology "
				    "'%s'", optarg);
			}
			break;
		case 'C':
			set_config_bool("memory.guest_in_core", true);
			break;
		case 'f':
			if (qemu_fwcfg_parse_cmdline_arg(optarg) != 0) {
				errx(EX_USAGE, "invalid fwcfg item '%s'", optarg);
			}
			break;
		case 'G':
			parse_gdb_options(optarg);
			break;
		case 'k':
			parse_simple_config_file(optarg);
			break;
		case 'K':
			set_config_value("keyboard.layout", optarg);
			break;
		case 'l':
			if (strncmp(optarg, "help", strlen(optarg)) == 0) {
				lpc_print_supported_devices();
				exit(0);
			} else if (lpc_device_parse(optarg) != 0) {
				errx(EX_USAGE, "invalid lpc device "
				    "configuration '%s'", optarg);
			}
			break;
#ifdef BHYVE_SNAPSHOT
		case 'r':
			restore_file = optarg;
			break;
#endif
		case 's':
			if (strncmp(optarg, "help", strlen(optarg)) == 0) {
				pci_print_supported_devices();
				exit(0);
			} else if (pci_parse_slot(optarg) != 0)
				exit(4);
			else
				break;
		case 'S':
			set_config_bool("memory.wired", true);
			break;
		case 'm':
			set_config_value("memory.size", optarg);
			break;
		case 'o':
			if (!parse_config_option(optarg))
				errx(EX_USAGE, "invalid configuration option '%s'", optarg);
			break;
		case 'H':
			set_config_bool("x86.vmexit_on_hlt", true);
			break;
		case 'I':
			/*
			 * The "-I" option was used to add an ioapic to the
			 * virtual machine.
			 *
			 * An ioapic is now provided unconditionally for each
			 * virtual machine and this option is now deprecated.
			 */
			break;
		case 'P':
			set_config_bool("x86.vmexit_on_pause", true);
			break;
		case 'e':
			set_config_bool("x86.strictio", true);
			break;
		case 'u':
			set_config_bool("rtc.use_localtime", false);
			break;
		case 'U':
			set_config_value("uuid", optarg);
			break;
		case 'w':
			set_config_bool("x86.strictmsr", false);
			break;
		case 'W':
			set_config_bool("virtio_msix", false);
			break;
		case 'x':
			set_config_bool("x86.x2apic", true);
			break;
		case 'Y':
			set_config_bool("x86.mptable", false);
			break;
		case 'h':
			usage(0);
		default:
			usage(1);
		}
	}
	argc -= optind;
	argv += optind;
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (argc > 1)
|
Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
|
|
|
usage(1);
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
|
|
|
if (restore_file != NULL) {
|
|
|
|
error = load_restore_file(restore_file, &rstate);
|
|
|
|
if (error) {
|
|
|
|
fprintf(stderr, "Failed to read checkpoint info from "
|
|
|
|
"file: '%s'.\n", restore_file);
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
vmname = lookup_vmname(&rstate);
|
2019-06-26 20:30:41 +00:00
|
|
|
if (vmname != NULL)
|
|
|
|
set_config_value("name", vmname);
|
2020-05-05 00:02:04 +00:00
|
|
|
}
|
2019-06-26 20:30:41 +00:00
|
|
|
#endif
|
|
|
|
|
|
|
|
if (argc == 1)
|
|
|
|
set_config_value("name", argv[0]);
|
|
|
|
|
|
|
|
vmname = get_config_value("name");
|
|
|
|
if (vmname == NULL)
|
2016-05-27 06:22:24 +00:00
|
|
|
usage(1);
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("config.dump", false)) {
|
|
|
|
dump_config();
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
2022-10-25 13:22:12 +00:00
|
|
|
calc_topology();
|
2019-06-26 20:30:41 +00:00
|
|
|
build_vcpumaps();
|
|
|
|
|
|
|
|
value = get_config_value("memory.size");
|
|
|
|
error = vm_parse_memsize(value, &memsize);
|
|
|
|
if (error)
|
|
|
|
errx(EX_USAGE, "invalid memsize '%s'", value);
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
ctx = do_open(vmname);
|
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
if (restore_file != NULL) {
|
|
|
|
guest_ncpus = lookup_guest_ncpus(&rstate);
|
|
|
|
memflags = lookup_memflags(&rstate);
|
|
|
|
memsize = lookup_memsize(&rstate);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (guest_ncpus < 1) {
|
|
|
|
fprintf(stderr, "Invalid guest vCPUs (%d)\n", guest_ncpus);
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
bsp = vm_vcpu_open(ctx, BSP);
|
|
|
|
max_vcpus = num_vcpus_allowed(ctx, bsp);
|
2016-05-27 06:22:24 +00:00
|
|
|
if (guest_ncpus > max_vcpus) {
|
|
|
|
fprintf(stderr, "%d vCPUs requested but only %d available\n",
|
|
|
|
guest_ncpus, max_vcpus);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2023-06-19 19:46:02 +00:00
|
|
|
fbsdrun_set_capabilities(bsp);
|
2023-03-24 18:49:06 +00:00
|
|
|
|
|
|
|
/* Allocate per-VCPU resources. */
|
|
|
|
vcpu_info = calloc(guest_ncpus, sizeof(*vcpu_info));
|
|
|
|
for (int vcpuid = 0; vcpuid < guest_ncpus; vcpuid++) {
|
|
|
|
vcpu_info[vcpuid].ctx = ctx;
|
|
|
|
vcpu_info[vcpuid].vcpuid = vcpuid;
|
|
|
|
if (vcpuid == BSP)
|
|
|
|
vcpu_info[vcpuid].vcpu = bsp;
|
|
|
|
else
|
|
|
|
vcpu_info[vcpuid].vcpu = vm_vcpu_open(ctx, vcpuid);
|
|
|
|
}
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
memflags = 0;
|
|
|
|
if (get_config_bool_default("memory.wired", false))
|
|
|
|
memflags |= VM_MEM_F_WIRED;
|
|
|
|
if (get_config_bool_default("memory.guest_in_core", false))
|
|
|
|
memflags |= VM_MEM_F_INCORE;
|
2016-05-27 06:22:24 +00:00
|
|
|
vm_set_memflags(ctx, memflags);
|
2023-03-01 07:45:46 +00:00
|
|
|
error = vm_setup_memory(ctx, memsize, VM_MMAP_ALL);
|
|
|
|
if (error) {
|
2016-05-27 06:22:24 +00:00
|
|
|
fprintf(stderr, "Unable to setup memory (%d)\n", errno);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
error = init_msr();
|
|
|
|
if (error) {
|
|
|
|
fprintf(stderr, "init_msr error %d\n", error);
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
2022-03-09 23:38:49 +00:00
|
|
|
init_mem(guest_ncpus);
|
2016-05-27 06:22:24 +00:00
|
|
|
init_inout();
|
2020-05-15 15:54:22 +00:00
|
|
|
kernemu_dev_init();
|
2020-04-15 01:58:51 +00:00
|
|
|
init_bootrom(ctx);
|
Initial bhyve native graphics support.
This adds emulations for a raw framebuffer device, PS2 keyboard/mouse,
XHCI USB controller and a USB tablet.
A simple VNC server is provided for keyboard/mouse input, and graphics
output.
A VGA emulation is included, but is currently disconnected until an
additional bhyve change to block out VGA memory is committed.
Credits:
- raw framebuffer, VNC server, XHCI controller, USB bus/device emulation
and UEFI f/w support by Leon Dang
- VGA, console/g, initial VNC server by tychon@
- PS2 keyboard/mouse jointly done by tychon@ and Leon Dang
- hypervisor framebuffer mem support by neel@
Tested by: Michael Dexter, in a number of revisions of this code.
With the appropriate UEFI image, FreeBSD, Windows and Linux guests can
be installed and run in graphics mode using the UEFI/GOP framebuffer.
2016-05-27 06:30:35 +00:00
|
|
|
atkbdc_init(ctx);
|
2016-05-27 06:22:24 +00:00
|
|
|
pci_irq_init(ctx);
|
|
|
|
ioapic_init(ctx);
|
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
rtc_init(ctx);
|
2016-05-27 06:22:24 +00:00
|
|
|
sci_init(ctx);
|
|
|
|
|
2021-08-18 07:31:59 +00:00
|
|
|
if (qemu_fwcfg_init(ctx) != 0) {
|
|
|
|
fprintf(stderr, "qemu fwcfg initialization error\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (qemu_fwcfg_add_file("opt/bhyve/hw.ncpu", sizeof(guest_ncpus),
|
|
|
|
&guest_ncpus) != 0) {
|
|
|
|
fprintf(stderr, "Could not add qemu fwcfg opt/bhyve/hw.ncpu\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
|
|
|
|
2021-09-09 09:37:04 +00:00
|
|
|
if (e820_init(ctx) != 0) {
|
|
|
|
fprintf(stderr, "Unable to set up E820\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
/*
|
2023-06-01 23:41:31 +00:00
|
|
|
* Exit if a device emulation finds an error in its initialization
|
2016-05-27 06:22:24 +00:00
|
|
|
*/
|
2018-07-11 03:23:09 +00:00
|
|
|
if (init_pci(ctx) != 0) {
|
|
|
|
perror("device emulation initialization error");
|
|
|
|
exit(4);
|
|
|
|
}
|
2021-10-07 14:20:37 +00:00
|
|
|
if (init_tpm(ctx) != 0) {
|
|
|
|
fprintf(stderr, "Failed to init TPM device\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2020-04-15 02:00:17 +00:00
|
|
|
/*
|
|
|
|
* Initialize after PCI, to allow a bootrom file to reserve the high
|
|
|
|
* region.
|
|
|
|
*/
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool("acpi_tables"))
|
2020-04-15 02:00:17 +00:00
|
|
|
vmgenc_init(ctx);
|
|
|
|
|
2021-08-19 17:52:12 +00:00
|
|
|
init_gdb(ctx);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
|
|
|
if (lpc_bootrom()) {
|
2023-03-24 18:49:06 +00:00
|
|
|
if (vm_set_capability(bsp, VM_CAP_UNRESTRICTED_GUEST, 1)) {
|
2016-05-27 06:22:24 +00:00
|
|
|
fprintf(stderr, "ROM boot failed: unrestricted guest "
|
|
|
|
"capability not available\n");
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
2023-03-24 18:49:06 +00:00
|
|
|
error = vcpu_reset(bsp);
|
2016-05-27 06:22:24 +00:00
|
|
|
assert(error == 0);
|
|
|
|
}
|
|
|
|
|
2023-02-28 10:28:40 +00:00
|
|
|
/*
|
|
|
|
* Add all vCPUs.
|
|
|
|
*/
|
2023-03-24 18:49:06 +00:00
|
|
|
for (int vcpuid = 0; vcpuid < guest_ncpus; vcpuid++)
|
|
|
|
spinup_vcpu(&vcpu_info[vcpuid], vcpuid == BSP);
|
2023-02-28 10:28:40 +00:00
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-06-21 06:55:34 +00:00
|
|
|
if (restore_file != NULL) {
|
|
|
|
fprintf(stdout, "Pausing pci devs...\r\n");
|
2023-05-15 14:28:14 +00:00
|
|
|
if (vm_pause_devices() != 0) {
|
2020-05-05 00:02:04 +00:00
|
|
|
fprintf(stderr, "Failed to pause PCI device state.\n");
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
2023-06-21 06:55:34 +00:00
|
|
|
fprintf(stdout, "Restoring vm mem...\r\n");
|
|
|
|
if (restore_vm_mem(ctx, &rstate) != 0) {
|
|
|
|
fprintf(stderr, "Failed to restore VM memory.\n");
|
|
|
|
exit(1);
|
|
|
|
}
|
2023-06-19 06:46:28 +00:00
|
|
|
|
2023-06-21 06:55:34 +00:00
|
|
|
fprintf(stdout, "Restoring pci devs...\r\n");
|
|
|
|
if (vm_restore_devices(&rstate) != 0) {
|
|
|
|
fprintf(stderr, "Failed to restore PCI device state.\n");
|
|
|
|
exit(1);
|
2020-05-05 00:02:04 +00:00
|
|
|
}
|
|
|
|
|
2023-06-21 06:55:34 +00:00
|
|
|
fprintf(stdout, "Restoring kernel structs...\r\n");
|
|
|
|
if (vm_restore_kern_structs(ctx, &rstate) != 0) {
|
|
|
|
fprintf(stderr, "Failed to restore kernel structs.\n");
|
|
|
|
exit(1);
|
2020-05-05 00:02:04 +00:00
|
|
|
}
|
|
|
|
|
2023-06-21 06:55:34 +00:00
|
|
|
fprintf(stdout, "Resuming pci devs...\r\n");
|
2023-05-15 14:28:14 +00:00
|
|
|
if (vm_resume_devices() != 0) {
|
2020-05-05 00:02:04 +00:00
|
|
|
fprintf(stderr, "Failed to resume PCI device state.\n");
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
}
|
2023-06-21 06:55:34 +00:00
|
|
|
#endif
|
2020-05-05 00:02:04 +00:00
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
/*
|
|
|
|
* build the guest tables, MP etc.
|
|
|
|
*/
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool_default("x86.mptable", true)) {
|
2016-05-27 06:22:24 +00:00
|
|
|
error = mptable_build(ctx, guest_ncpus);
|
2018-07-11 03:23:09 +00:00
|
|
|
if (error) {
|
|
|
|
perror("failed to build the guest tables");
|
|
|
|
exit(4);
|
|
|
|
}
|
2016-05-27 06:22:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
error = smbios_build(ctx);
|
2022-06-16 20:17:44 +00:00
|
|
|
if (error != 0)
|
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2019-06-26 20:30:41 +00:00
|
|
|
if (get_config_bool("acpi_tables")) {
|
2016-05-27 06:22:24 +00:00
|
|
|
error = acpi_build(ctx, guest_ncpus);
|
|
|
|
assert(error == 0);
|
|
|
|
}
|
|
|
|
|
2021-09-09 09:37:04 +00:00
|
|
|
e820_fwcfg_item = e820_get_fwcfg_item();
|
|
|
|
if (e820_fwcfg_item == NULL) {
|
|
|
|
fprintf(stderr, "invalid e820 table\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
|
|
|
if (qemu_fwcfg_add_file("etc/e820", e820_fwcfg_item->size,
|
|
|
|
e820_fwcfg_item->data) != 0) {
|
|
|
|
fprintf(stderr, "could not add qemu fwcfg etc/e820\n");
|
|
|
|
exit(4);
|
|
|
|
}
|
|
|
|
free(e820_fwcfg_item);
|
|
|
|
|
2021-08-18 07:31:59 +00:00
|
|
|
if (lpc_bootrom() && strcmp(lpc_fwcfg(), "bhyve") == 0) {
|
2016-05-27 06:22:24 +00:00
|
|
|
fwctl_init();
|
2021-08-18 07:31:59 +00:00
|
|
|
}
|
2016-05-27 06:22:24 +00:00
|
|
|
|
2018-08-02 21:54:34 +00:00
|
|
|
/*
|
|
|
|
* Change the proc title to include the VM name.
|
|
|
|
*/
|
|
|
|
setproctitle("%s", vmname);
|
|
|
|
|
2023-03-06 12:35:21 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
|
|
|
/* initialize mutex/cond variables */
|
|
|
|
init_snapshot();
|
|
|
|
|
|
|
|
/*
|
|
|
|
* checkpointing thread for communication with bhyvectl
|
|
|
|
*/
|
|
|
|
if (init_checkpoint_thread(ctx) != 0)
|
|
|
|
errx(EX_OSERR, "Failed to start checkpoint thread");
|
|
|
|
#endif
|
|
|
|
|
2017-02-14 13:35:59 +00:00
|
|
|
#ifndef WITHOUT_CAPSICUM
|
|
|
|
caph_cache_catpages();
|
|
|
|
|
|
|
|
if (caph_limit_stdout() == -1 || caph_limit_stderr() == -1)
|
|
|
|
errx(EX_OSERR, "Unable to apply rights for sandbox");
|
|
|
|
|
2018-06-19 23:43:14 +00:00
|
|
|
if (caph_enter() == -1)
|
2017-02-14 13:35:59 +00:00
|
|
|
errx(EX_OSERR, "cap_enter() failed");
|
|
|
|
#endif
|
|
|
|
|
2020-05-05 00:02:04 +00:00
|
|
|
#ifdef BHYVE_SNAPSHOT
|
2023-06-21 06:55:34 +00:00
|
|
|
if (restore_file != NULL) {
|
2023-02-28 10:28:40 +00:00
|
|
|
destroy_restore_state(&rstate);
|
2023-03-06 12:30:54 +00:00
|
|
|
if (vm_restore_time(ctx) < 0)
|
|
|
|
err(EX_OSERR, "Unable to restore time");
|
2022-03-09 23:39:08 +00:00
|
|
|
|
2023-03-24 18:49:06 +00:00
|
|
|
for (int vcpuid = 0; vcpuid < guest_ncpus; vcpuid++)
|
|
|
|
vm_resume_cpu(vcpu_info[vcpuid].vcpu);
|
|
|
|
} else
|
2023-02-28 10:28:40 +00:00
|
|
|
#endif
|
2023-03-24 18:49:06 +00:00
|
|
|
vm_resume_cpu(bsp);
|
2020-05-05 00:02:04 +00:00
|
|
|
|
2016-05-27 06:22:24 +00:00
|
|
|
/*
|
|
|
|
* Head off to the main event dispatch loop
|
|
|
|
*/
|
|
|
|
mevent_dispatch();
|
|
|
|
|
2018-07-11 03:23:09 +00:00
|
|
|
exit(4);
|
2016-05-27 06:22:24 +00:00
|
|
|
}
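
The restore path in the function above runs a fixed sequence: pause devices, restore guest memory, restore device state, restore kernel structures, resume devices. It stops at the first step that fails. The sketch below expresses that fail-fast ordering as a table of steps. It is only an illustration under stated assumptions: `restore_step`, `run_restore_steps`, `step_ok`, and `step_fail` are hypothetical names made up for this example, and the stub callbacks stand in for the real bhyve calls, which take different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: returns 0 on success, non-zero on failure. */
typedef int (*restore_step_fn)(void *arg);

/* One named step in the restore sequence. */
struct restore_step {
	const char	*label;
	restore_step_fn	 fn;
};

/*
 * Run the steps in order and stop at the first failure.
 * Returns -1 if every step succeeded, otherwise the index of the
 * step that failed.
 */
static int
run_restore_steps(const struct restore_step *steps, size_t nsteps, void *arg)
{
	for (size_t i = 0; i < nsteps; i++) {
		assert(steps[i].fn != NULL);
		if (steps[i].fn(arg) != 0) {
			fprintf(stderr, "Failed to %s.\n", steps[i].label);
			return ((int)i);
		}
	}
	return (-1);
}

/* Stub that counts how many steps ran and reports success. */
static int
step_ok(void *arg)
{
	int *count = arg;

	(*count)++;
	return (0);
}

/* Stub that always reports failure without touching the counter. */
static int
step_fail(void *arg)
{
	(void)arg;
	return (-1);
}
```

A caller would build an array of steps in the order listed above and exit when `run_restore_steps` returns a non-negative index. A table keeps the ordering visible in one place, at the cost of forcing every step into one callback signature.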
|