/*-
 * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
 *
 * Copyright (c) 2011 NetApp, Inc.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD$
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include <sys/linker_set.h>
#include <sys/select.h>
#include <sys/uio.h>
#include <sys/ioctl.h>
#include <machine/vmm_snapshot.h>
#include <net/ethernet.h>
#include <net/if.h>	/* IFNAMSIZ */

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <pthread_np.h>

#include "bhyverun.h"
#include "config.h"
#include "debug.h"
#include "pci_emul.h"
#include "mevent.h"
#include "virtio.h"
#include "net_utils.h"
#include "net_backends.h"
#include "iov.h"

#define VTNET_RINGSZ	1024

#define VTNET_MAXSEGS	256

#define VTNET_MAX_PKT_LEN	(65536 + 64)

#define VTNET_MIN_MTU	ETHERMIN
#define VTNET_MAX_MTU	65535

#define VTNET_S_HOSTCAPS      \
  ( VIRTIO_NET_F_MAC | VIRTIO_NET_F_STATUS | \
    VIRTIO_F_NOTIFY_ON_EMPTY | VIRTIO_RING_F_INDIRECT_DESC)
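
/*
 * VTNET_S_HOSTCAPS names what this frontend offers before feature
 * negotiation: a MAC address and a link status field in config space
 * (VIRTIO_NET_F_MAC, VIRTIO_NET_F_STATUS), an interrupt when a queue
 * runs empty even if notifications are suppressed
 * (VIRTIO_F_NOTIFY_ON_EMPTY), and indirect descriptor chains
 * (VIRTIO_RING_F_INDIRECT_DESC).
 */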

/*
 * PCI config-space "registers"
 */
struct virtio_net_config {
	uint8_t  mac[6];
	uint16_t status;
	uint16_t max_virtqueue_pairs;
	uint16_t mtu;
} __packed;
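
/*
 * The guest reads this structure through the virtio PCI config window;
 * those accesses land in pci_vtnet_cfgread()/pci_vtnet_cfgwrite(),
 * declared below. The layout must match the virtio-net device config
 * defined by the virtio specification, hence __packed.
 */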

/*
 * Queue definitions.
 */
#define VTNET_RXQ	0
#define VTNET_TXQ	1
#define VTNET_CTLQ	2	/* NB: not yet supported */

#define VTNET_MAXQ	3

/*
 * Debug printf
 */
static int pci_vtnet_debug;
#define DPRINTF(params) if (pci_vtnet_debug) PRINTLN params
#define WPRINTF(params) PRINTLN params
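
/*
 * Both macros take a parenthesized argument list, e.g.
 * DPRINTF(("vtnet: got %zd bytes", len)), so the whole list can be
 * forwarded to PRINTLN without variadic macro machinery.
 */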

/*
 * Per-device softc
 */
struct pci_vtnet_softc {
	struct virtio_softc vsc_vs;
	struct vqueue_info vsc_queues[VTNET_MAXQ - 1];
	pthread_mutex_t vsc_mtx;

	net_backend_t	*vsc_be;

	bool    features_negotiated;	/* protected by rx_mtx */

	int		resetting;	/* protected by tx_mtx */

	uint64_t	vsc_features;	/* negotiated features */

	pthread_mutex_t	rx_mtx;
	int		rx_merge;	/* merged rx bufs in use */

	pthread_t 	tx_tid;
	pthread_mutex_t	tx_mtx;
	pthread_cond_t	tx_cond;
	int		tx_in_progress;

	size_t		vhdrlen;
	size_t		be_vhdrlen;

	struct virtio_net_config vsc_config;
	struct virtio_consts vsc_consts;
};
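
/*
 * Locking model: rx_mtx serializes the receive path (the backend read
 * callback and RX queue kicks) and guards features_negotiated, while
 * tx_mtx hands work off to the dedicated TX thread via tx_cond and
 * tx_in_progress. vhdrlen and be_vhdrlen hold the virtio-net header
 * lengths used by the frontend and the backend respectively, which may
 * differ (see iov_trim_hdr() below).
 */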

static void pci_vtnet_reset(void *);
/* static void pci_vtnet_notify(void *, struct vqueue_info *); */
static int pci_vtnet_cfgread(void *, int, int, uint32_t *);
static int pci_vtnet_cfgwrite(void *, int, int, uint32_t);
static void pci_vtnet_neg_features(void *, uint64_t);
#ifdef BHYVE_SNAPSHOT
static void pci_vtnet_pause(void *);
static void pci_vtnet_resume(void *);
static int pci_vtnet_snapshot(void *, struct vm_snapshot_meta *);
#endif

static struct virtio_consts vtnet_vi_consts = {
	"vtnet",		/* our name */
	VTNET_MAXQ - 1,		/* we currently support 2 virtqueues */
	sizeof(struct virtio_net_config), /* config reg size */
	pci_vtnet_reset,	/* reset */
	NULL,			/* device-wide qnotify -- not used */
	pci_vtnet_cfgread,	/* read PCI config */
	pci_vtnet_cfgwrite,	/* write PCI config */
	pci_vtnet_neg_features,	/* apply negotiated features */
	VTNET_S_HOSTCAPS,	/* our capabilities */
#ifdef BHYVE_SNAPSHOT
	pci_vtnet_pause,	/* pause rx/tx threads */
	pci_vtnet_resume,	/* resume rx/tx threads */
	pci_vtnet_snapshot,	/* save / restore device state */
#endif
};
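
/*
 * Note that the initializers above are positional, so their order must
 * track the member order of struct virtio_consts in virtio.h; the inline
 * comments name the member each entry is meant for.
 */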

static void
pci_vtnet_reset(void *vsc)
{
	struct pci_vtnet_softc *sc = vsc;

	DPRINTF(("vtnet: device reset requested !"));

	/* Acquire the RX lock to block RX processing. */
	pthread_mutex_lock(&sc->rx_mtx);

	/*
	 * Make sure receive operation is disabled at least until we
	 * re-negotiate the features, since receive operation depends
	 * on the value of sc->rx_merge and the header length, which
	 * are both set in pci_vtnet_neg_features().
	 * Receive operation will be enabled again once the guest adds
	 * the first receive buffers and kicks us.
	 */
	sc->features_negotiated = false;
	netbe_rx_disable(sc->vsc_be);

	/* Set sc->resetting and give a chance to the TX thread to stop. */
	pthread_mutex_lock(&sc->tx_mtx);
	sc->resetting = 1;
	while (sc->tx_in_progress) {
		pthread_mutex_unlock(&sc->tx_mtx);
		usleep(10000);
		pthread_mutex_lock(&sc->tx_mtx);
	}

	/*
	 * Now reset rings, MSI-X vectors, and negotiated capabilities.
	 * Do that with the TX lock held, since we need to reset
	 * sc->resetting.
	 */
	vi_reset_dev(&sc->vsc_vs);

	sc->resetting = 0;
	pthread_mutex_unlock(&sc->tx_mtx);
	pthread_mutex_unlock(&sc->rx_mtx);
}

static __inline struct iovec *
iov_trim_hdr(struct iovec *iov, int *iovcnt, unsigned int hlen)
{
	struct iovec *riov;

	if (iov[0].iov_len < hlen) {
		/*
		 * Not enough header space in the first fragment.
		 * That's not ok for us.
		 */
		return NULL;
	}

	iov[0].iov_len -= hlen;
	if (iov[0].iov_len == 0) {
		*iovcnt -= 1;
		if (*iovcnt == 0) {
			/*
			 * Only space for the header. That's not
			 * enough for us.
			 */
			return NULL;
		}
		riov = &iov[1];
	} else {
		iov[0].iov_base = (void *)((uintptr_t)iov[0].iov_base + hlen);
		riov = &iov[0];
	}

	return (riov);
}
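
/*
 * Illustrative example (hypothetical values): with *iovcnt == 2,
 * iov[0] = { base, 10 } and hlen == 10, the first fragment is consumed
 * entirely, *iovcnt drops to 1 and &iov[1] is returned; with
 * iov[0] = { base, 1514 } and hlen == 10, iov[0] is advanced in place
 * to { base + 10, 1504 } and &iov[0] is returned.
 */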

struct virtio_mrg_rxbuf_info {
	uint16_t idx;
	uint16_t pad;
	uint32_t len;
};
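
/*
 * One entry per descriptor chain collected for a single received packet
 * when mergeable rx buffers are in use: idx is the chain's head index
 * (needed to release it later) and len its buffer capacity; pad is
 * apparently just explicit alignment padding.
 */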

static void
pci_vtnet_rx(struct pci_vtnet_softc *sc)
{
	int prepend_hdr_len = sc->vhdrlen - sc->be_vhdrlen;
	struct virtio_mrg_rxbuf_info info[VTNET_MAXSEGS];
	struct iovec iov[VTNET_MAXSEGS + 1];
	struct vqueue_info *vq;
	struct vi_req req;

	vq = &sc->vsc_queues[VTNET_RXQ];

	/* Features must be negotiated */
	if (!sc->features_negotiated) {
		return;
	}

	for (;;) {
		struct virtio_net_rxhdr *hdr;
		uint32_t riov_bytes;
		struct iovec *riov;
		uint32_t ulen;
		int riov_len;
		int n_chains;
		ssize_t rlen;
		ssize_t plen;

		plen = netbe_peek_recvlen(sc->vsc_be);
		if (plen <= 0) {
			/*
			 * No more packets (plen == 0), or backend errored
			 * (plen < 0). Interrupt if needed and stop.
			 */
			vq_endchains(vq, /*used_all_avail=*/0);
			return;
		}
		plen += prepend_hdr_len;

		/*
		 * Get a descriptor chain to store the next ingress
		 * packet. In case of mergeable rx buffers, get as
		 * many chains as necessary in order to make room
		 * for plen bytes.
		 */
		riov_bytes = 0;
		riov_len = 0;
		riov = iov;
		n_chains = 0;
		do {
			int n = vq_getchain(vq, riov, VTNET_MAXSEGS - riov_len,
			    &req);
			info[n_chains].idx = req.idx;

			if (n == 0) {
				/*
				 * No rx buffers. Enable RX kicks and double
				 * check.
				 */
				vq_kick_enable(vq);
				if (!vq_has_descs(vq)) {
					/*
					 * Still no buffers. Return the unused
					 * chains (if any), interrupt if needed
					 * (including for NOTIFY_ON_EMPTY), and
					 * disable the backend until the next
					 * kick.
					 */
					vq_retchains(vq, n_chains);
					vq_endchains(vq, /*used_all_avail=*/1);
					netbe_rx_disable(sc->vsc_be);
					return;
				}

				/* More rx buffers found, so keep going. */
				vq_kick_disable(vq);
				continue;
			}
			assert(n >= 1 && riov_len + n <= VTNET_MAXSEGS);
			riov_len += n;
			if (!sc->rx_merge) {
				n_chains = 1;
				break;
			}
			info[n_chains].len = (uint32_t)count_iov(riov, n);
			riov_bytes += info[n_chains].len;
			riov += n;
			n_chains++;
		} while (riov_bytes < plen && riov_len < VTNET_MAXSEGS);
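
		/*
		 * At this point iov[0..riov_len-1] covers n_chains guest
		 * chains (exactly one when merging is off). With merging
		 * on, collection stopped once riov_bytes >= plen or the
		 * VTNET_MAXSEGS fragment limit was reached.
		 */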

		riov = iov;
		hdr = riov[0].iov_base;
		if (prepend_hdr_len > 0) {
			/*
			 * The frontend uses a virtio-net header, but the
			 * backend does not. We need to prepend a zeroed
			 * header.
			 */
			riov = iov_trim_hdr(riov, &riov_len, prepend_hdr_len);
			if (riov == NULL) {
				/*
				 * The first collected chain is nonsensical,
				 * as it is not even enough to store the
				 * virtio-net header. Just drop it.
				 */
				vq_relchain(vq, info[0].idx, 0);
				vq_retchains(vq, n_chains - 1);
				continue;
			}
			memset(hdr, 0, prepend_hdr_len);
		}

		rlen = netbe_recv(sc->vsc_be, riov, riov_len);
		if (rlen != plen - prepend_hdr_len) {
			/*
			 * If this happens it means there is something
			 * wrong with the backend (e.g., some other
			 * process is stealing our packets).
			 */
			WPRINTF(("netbe_recv: expected %zd bytes, "
			    "got %zd", plen - prepend_hdr_len, rlen));
			vq_retchains(vq, n_chains);
			continue;
		}

		ulen = (uint32_t)plen;

		/*
		 * Publish the used buffers to the guest, reporting the
		 * number of bytes that we wrote.
		 */
		if (!sc->rx_merge) {
			vq_relchain(vq, info[0].idx, ulen);
		} else {
			uint32_t iolen;
			int i = 0;

			do {
				iolen = info[i].len;
				if (iolen > ulen) {
					iolen = ulen;
				}
				vq_relchain_prepare(vq, info[i].idx, iolen);
				ulen -= iolen;
				i++;
			} while (ulen > 0);

			hdr->vrh_bufs = i;
			vq_relchain_publish(vq);
			assert(i == n_chains);
		}
	}

}

/*
 * Called when there is read activity on the backend file descriptor.
 * Each buffer posted by the guest is assumed to be able to contain
 * an entire ethernet frame + rx header.
 */
static void
pci_vtnet_rx_callback(int fd, enum ev_type type, void *param)
{
	struct pci_vtnet_softc *sc = param;

	pthread_mutex_lock(&sc->rx_mtx);
	pci_vtnet_rx(sc);
	pthread_mutex_unlock(&sc->rx_mtx);

}

/* Called on RX kick. */
static void
pci_vtnet_ping_rxq(void *vsc, struct vqueue_info *vq)
{
	struct pci_vtnet_softc *sc = vsc;

	/*
	 * A qnotify means that the rx process can now begin.
	 * Enable RX only if features are negotiated.
	 */
	pthread_mutex_lock(&sc->rx_mtx);
	if (!sc->features_negotiated) {
		pthread_mutex_unlock(&sc->rx_mtx);
		return;
	}

	vq_kick_disable(vq);
	netbe_rx_enable(sc->vsc_be);
	pthread_mutex_unlock(&sc->rx_mtx);
}
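
/*
 * RX flow: a guest kick lands in pci_vtnet_ping_rxq(), which disables
 * further kicks and enables the backend; the event loop then invokes
 * pci_vtnet_rx_callback() whenever the backend descriptor is readable
 * (presumably registered through mevent by the backend code), and
 * pci_vtnet_rx() drains packets until either the backend or the guest
 * runs out of buffers.
 */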

/* TX virtqueue processing, called by the TX thread. */
static void
pci_vtnet_proctx(struct pci_vtnet_softc *sc, struct vqueue_info *vq)
{
	struct iovec iov[VTNET_MAXSEGS + 1];
	struct iovec *siov = iov;
	struct vi_req req;
	ssize_t len;
	int n;

	/*
	 * Obtain chain of descriptors. The first descriptor also
	 * contains the virtio-net header.
	 */
	n = vq_getchain(vq, iov, VTNET_MAXSEGS, &req);
	assert(n >= 1 && n <= VTNET_MAXSEGS);

	if (sc->vhdrlen != sc->be_vhdrlen) {
		/*
		 * The frontend uses a virtio-net header, but the backend
		 * does not. We simply strip the header and ignore it, as
		 * it should be zero-filled.
		 */
		siov = iov_trim_hdr(siov, &n, sc->vhdrlen);
	}

	if (siov == NULL) {
		/* The chain is nonsensical. Just drop it. */
		len = 0;
	} else {
		len = netbe_send(sc->vsc_be, siov, n);
		if (len < 0) {
			/*
			 * If send failed, report that 0 bytes
			 * were read.
			 */
			len = 0;
		}
	}

	/*
	 * Return the processed chain to the guest, reporting
	 * the number of bytes that we read.
	 */
	vq_relchain(vq, req.idx, len);
}

/* Called on TX kick. */
static void
pci_vtnet_ping_txq(void *vsc, struct vqueue_info *vq)
{
	struct pci_vtnet_softc *sc = vsc;

	/*
	 * Any ring entries to process?
	 */
	if (!vq_has_descs(vq))
		return;

	/* Signal the tx thread for processing */
	pthread_mutex_lock(&sc->tx_mtx);
	vq_kick_disable(vq);
	if (sc->tx_in_progress == 0)
		pthread_cond_signal(&sc->tx_cond);
	pthread_mutex_unlock(&sc->tx_mtx);
}
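
/*
 * Kick suppression: pci_vtnet_ping_txq() disables further guest
 * notifications before waking the TX thread, and pci_vtnet_tx_thread()
 * only re-enables them once the queue has drained, so the guest does
 * not keep notifying while the host is already processing the ring.
 */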

/*
 * Thread which handles processing of TX descriptors.
 */
static void *
pci_vtnet_tx_thread(void *param)
{
	struct pci_vtnet_softc *sc = param;
	struct vqueue_info *vq;
	int error;

	vq = &sc->vsc_queues[VTNET_TXQ];

	/*
	 * Wait until the tx queue pointers have been initialised and
	 * the first tx has been signaled.
	 */
	pthread_mutex_lock(&sc->tx_mtx);
	error = pthread_cond_wait(&sc->tx_cond, &sc->tx_mtx);
	assert(error == 0);

	for (;;) {
		/* note - tx mutex is locked here */
		while (sc->resetting || !vq_has_descs(vq)) {
			vq_kick_enable(vq);
			if (!sc->resetting && vq_has_descs(vq))
				break;

			sc->tx_in_progress = 0;
			error = pthread_cond_wait(&sc->tx_cond, &sc->tx_mtx);
			assert(error == 0);
		}
		vq_kick_disable(vq);
		sc->tx_in_progress = 1;
		pthread_mutex_unlock(&sc->tx_mtx);

		do {
			/*
			 * Run through entries, placing them into
			 * iovecs and sending when an end-of-packet
			 * is found.
			 */
			pci_vtnet_proctx(sc, vq);
		} while (vq_has_descs(vq));

		/*
		 * Generate an interrupt if needed.
		 */
		vq_endchains(vq, /*used_all_avail=*/1);

		pthread_mutex_lock(&sc->tx_mtx);
	}
}
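
/*
 * The enable-then-recheck sequence in the loop above closes a wakeup
 * race: after vq_kick_enable() re-arms guest notifications, the guest
 * may already have queued descriptors without kicking (kicks were
 * disabled), so the thread must test vq_has_descs() once more before
 * committing to sleep on tx_cond. As an interleaving sketch:
 *
 *	tx thread                        guest
 *	---------                        -----
 *	sees ring empty
 *	                                 posts descriptor (no kick yet)
 *	vq_kick_enable()
 *	recheck vq_has_descs() -> true   packet sent, no sleep
 *
 * Without the recheck, that descriptor could sit unprocessed until
 * the next guest kick.
 */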

#ifdef notyet
static void
pci_vtnet_ping_ctlq(void *vsc, struct vqueue_info *vq)
{

	DPRINTF(("vtnet: control qnotify!"));
}
#endif

static int
pci_vtnet_init(struct vmctx *ctx, struct pci_devinst *pi, nvlist_t *nvl)
{
	struct pci_vtnet_softc *sc;
	const char *value;
	char tname[MAXCOMLEN + 1];
	unsigned long mtu = ETHERMTU;
	int err;

	/*
	 * Allocate data structures for further virtio initializations.
	 * sc also contains a copy of vtnet_vi_consts, since capabilities
	 * change depending on the backend.
	 */
	sc = calloc(1, sizeof(struct pci_vtnet_softc));

	sc->vsc_consts = vtnet_vi_consts;
	pthread_mutex_init(&sc->vsc_mtx, NULL);

	sc->vsc_queues[VTNET_RXQ].vq_qsize = VTNET_RINGSZ;
	sc->vsc_queues[VTNET_RXQ].vq_notify = pci_vtnet_ping_rxq;
	sc->vsc_queues[VTNET_TXQ].vq_qsize = VTNET_RINGSZ;
	sc->vsc_queues[VTNET_TXQ].vq_notify = pci_vtnet_ping_txq;
#ifdef notyet
	sc->vsc_queues[VTNET_CTLQ].vq_qsize = VTNET_RINGSZ;
	sc->vsc_queues[VTNET_CTLQ].vq_notify = pci_vtnet_ping_ctlq;
#endif

	value = get_config_value_node(nvl, "mac");
	if (value != NULL) {
		err = net_parsemac(value, sc->vsc_config.mac);
		if (err) {
			free(sc);
			return (err);
		}
	} else
		net_genmac(pi, sc->vsc_config.mac);

	value = get_config_value_node(nvl, "mtu");
	if (value != NULL) {
		err = net_parsemtu(value, &mtu);
		if (err) {
			free(sc);
			return (err);
		}

		if (mtu < VTNET_MIN_MTU || mtu > VTNET_MAX_MTU) {
			err = EINVAL;
			errno = EINVAL;
			free(sc);
			return (err);
		}
		sc->vsc_consts.vc_hv_caps |= VIRTIO_NET_F_MTU;
	}
	sc->vsc_config.mtu = mtu;
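
	/*
	 * For reference, the "mac" and "mtu" values above arrive via the
	 * nvlist config tree, typically from a command line such as
	 * (illustrative values, not taken from this file):
	 *
	 *	bhyve -s 5:0,virtio-net,tap0,mac=58:9c:fc:00:00:01,mtu=9000 ...
	 *
	 * where the positional backend name is translated into config
	 * keys by netbe_legacy_config() (wired up in pci_de_vnet below).
	 */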

	/* Permit interfaces without a configured backend. */
	if (get_config_value_node(nvl, "backend") != NULL) {
		err = netbe_init(&sc->vsc_be, nvl, pci_vtnet_rx_callback, sc);
		if (err) {
			free(sc);
			return (err);
		}
	}

	sc->vsc_consts.vc_hv_caps |= VIRTIO_NET_F_MRG_RXBUF |
	    netbe_get_cap(sc->vsc_be);

	/*
	 * Since we do not actually support multiqueue,
	 * set the maximum virtqueue pairs to 1.
	 */
	sc->vsc_config.max_virtqueue_pairs = 1;

	/* Initialize config space */
	pci_set_cfgdata16(pi, PCIR_DEVICE, VIRTIO_DEV_NET);
	pci_set_cfgdata16(pi, PCIR_VENDOR, VIRTIO_VENDOR);
	pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_NETWORK);
	pci_set_cfgdata16(pi, PCIR_SUBDEV_0, VIRTIO_ID_NETWORK);
	pci_set_cfgdata16(pi, PCIR_SUBVEND_0, VIRTIO_VENDOR);

	/* Link is always up. */
	sc->vsc_config.status = 1;

	vi_softc_linkup(&sc->vsc_vs, &sc->vsc_consts, sc, pi, sc->vsc_queues);
	sc->vsc_vs.vs_mtx = &sc->vsc_mtx;

	/* Use BAR 1 to map MSI-X table and PBA, if we're using MSI-X. */
	if (vi_intr_init(&sc->vsc_vs, 1, fbsdrun_virtio_msix())) {
		free(sc);
		return (1);
	}

	/* Use BAR 0 to map config regs in IO space. */
	vi_set_io_bar(&sc->vsc_vs, 0);

	sc->resetting = 0;

	sc->rx_merge = 0;
	sc->vhdrlen = sizeof(struct virtio_net_rxhdr) - 2;
	pthread_mutex_init(&sc->rx_mtx, NULL);

	/*
	 * Initialize tx semaphore & spawn TX processing thread.
	 * As of now, only one thread for TX desc processing is
	 * spawned.
	 */
	sc->tx_in_progress = 0;
	pthread_mutex_init(&sc->tx_mtx, NULL);
	pthread_cond_init(&sc->tx_cond, NULL);
	pthread_create(&sc->tx_tid, NULL, pci_vtnet_tx_thread, (void *)sc);
	snprintf(tname, sizeof(tname), "vtnet-%d:%d tx", pi->pi_slot,
	    pi->pi_func);
	pthread_set_name_np(sc->tx_tid, tname);

	return (0);
}

static int
pci_vtnet_cfgwrite(void *vsc, int offset, int size, uint32_t value)
{
	struct pci_vtnet_softc *sc = vsc;
	void *ptr;

	if (offset < (int)sizeof(sc->vsc_config.mac)) {
		assert(offset + size <= (int)sizeof(sc->vsc_config.mac));
		/*
		 * The driver is allowed to change the MAC address.
		 */
		ptr = &sc->vsc_config.mac[offset];
		memcpy(ptr, &value, size);
	} else {
		/* Silently ignore other writes. */
		DPRINTF(("vtnet: write to readonly reg %d", offset));
	}

	return (0);
}
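
/*
 * Note that only the first six bytes of the device config space (the
 * MAC) accept guest writes above: a write of `size` bytes at byte
 * `offset` is copied directly out of the 32-bit `value`, so e.g. a
 * 2-byte write at offset 4 updates mac[4] and mac[5]. Everything past
 * the MAC is read-only from the guest's point of view and is logged
 * and dropped.
 */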

static int
pci_vtnet_cfgread(void *vsc, int offset, int size, uint32_t *retval)
{
	struct pci_vtnet_softc *sc = vsc;
	void *ptr;

	ptr = (uint8_t *)&sc->vsc_config + offset;
	memcpy(retval, ptr, size);
	return (0);
}

static void
pci_vtnet_neg_features(void *vsc, uint64_t negotiated_features)
{
	struct pci_vtnet_softc *sc = vsc;

	sc->vsc_features = negotiated_features;

	if (negotiated_features & VIRTIO_NET_F_MRG_RXBUF) {
		sc->vhdrlen = sizeof(struct virtio_net_rxhdr);
		sc->rx_merge = 1;
	} else {
		/*
		 * Without mergeable rx buffers, the virtio-net header
		 * is 2 bytes shorter than sizeof(struct virtio_net_rxhdr).
		 */
		sc->vhdrlen = sizeof(struct virtio_net_rxhdr) - 2;
		sc->rx_merge = 0;
	}

	/* Tell the backend to enable some capabilities it has advertised. */
	netbe_set_cap(sc->vsc_be, negotiated_features, sc->vhdrlen);
	sc->be_vhdrlen = netbe_get_vnet_hdr_len(sc->vsc_be);
	assert(sc->be_vhdrlen == 0 || sc->be_vhdrlen == sc->vhdrlen);

	pthread_mutex_lock(&sc->rx_mtx);
	sc->features_negotiated = true;
	pthread_mutex_unlock(&sc->rx_mtx);
}
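
/*
 * For reference, the 2-byte difference handled above is the
 * num_buffers field that VIRTIO_NET_F_MRG_RXBUF appends to the
 * virtio-net header. Per the virtio spec (layout shown here for
 * illustration, not taken from this file):
 *
 *	struct virtio_net_hdr {
 *		uint8_t  flags;
 *		uint8_t  gso_type;
 *		uint16_t hdr_len;
 *		uint16_t gso_size;
 *		uint16_t csum_start;
 *		uint16_t csum_offset;
 *	};				10 bytes without merging
 *	uint16_t num_buffers;		12 bytes with MRG_RXBUF
 *
 * which matches the vhdrlen values of sizeof(struct virtio_net_rxhdr)
 * and sizeof(struct virtio_net_rxhdr) - 2 set above.
 */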

#ifdef BHYVE_SNAPSHOT
static void
pci_vtnet_pause(void *vsc)
{
	struct pci_vtnet_softc *sc = vsc;

	DPRINTF(("vtnet: device pause requested !\n"));

	/* Acquire the RX lock to block RX processing. */
	pthread_mutex_lock(&sc->rx_mtx);

	/* Wait for the transmit thread to finish its processing. */
	pthread_mutex_lock(&sc->tx_mtx);
	while (sc->tx_in_progress) {
		pthread_mutex_unlock(&sc->tx_mtx);
		usleep(10000);
		pthread_mutex_lock(&sc->tx_mtx);
	}
}
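
/*
 * pci_vtnet_pause() deliberately returns with both rx_mtx and tx_mtx
 * held: RX processing is blocked outright, while TX is drained by
 * polling tx_in_progress (dropping and retaking tx_mtx around a 10 ms
 * sleep so the tx thread can finish its pass). pci_vtnet_resume()
 * below simply releases the two locks again, which is what makes the
 * pair safe to wrap around a snapshot.
 */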

static void
pci_vtnet_resume(void *vsc)
{
	struct pci_vtnet_softc *sc = vsc;

	DPRINTF(("vtnet: device resume requested !\n"));

	pthread_mutex_unlock(&sc->tx_mtx);
	/* The RX lock should have been acquired in vtnet_pause. */
	pthread_mutex_unlock(&sc->rx_mtx);
}

static int
pci_vtnet_snapshot(void *vsc, struct vm_snapshot_meta *meta)
{
	int ret;
	struct pci_vtnet_softc *sc = vsc;

	DPRINTF(("vtnet: device snapshot requested !\n"));

	/*
	 * Queues and consts should have been saved by the more generic
	 * vi_pci_snapshot function. We need to save only our features
	 * and config.
	 */

	SNAPSHOT_VAR_OR_LEAVE(sc->vsc_features, meta, ret, done);

	/* Force a reapply of the negotiated features at restore time. */
	if (meta->op == VM_SNAPSHOT_RESTORE) {
		pci_vtnet_neg_features(sc, sc->vsc_features);
		netbe_rx_enable(sc->vsc_be);
	}

	SNAPSHOT_VAR_OR_LEAVE(sc->vsc_config, meta, ret, done);
	SNAPSHOT_VAR_OR_LEAVE(sc->rx_merge, meta, ret, done);

	SNAPSHOT_VAR_OR_LEAVE(sc->vhdrlen, meta, ret, done);
	SNAPSHOT_VAR_OR_LEAVE(sc->be_vhdrlen, meta, ret, done);

done:
	return (ret);
}
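
/*
 * SNAPSHOT_VAR_OR_LEAVE(), from bhyve's snapshot framework, serializes
 * the named variable on save and deserializes it on restore (the
 * direction is selected by meta->op), jumping to the `done` label with
 * `ret` set if the buffer operation fails; hence the single error path
 * at the bottom of pci_vtnet_snapshot().
 */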
#endif

static struct pci_devemu pci_de_vnet = {
	.pe_emu =	"virtio-net",
	.pe_init =	pci_vtnet_init,
	.pe_legacy_config = netbe_legacy_config,
	.pe_barwrite =	vi_pci_write,
	.pe_barread =	vi_pci_read,
#ifdef BHYVE_SNAPSHOT
	.pe_snapshot =	vi_pci_snapshot,
	.pe_pause =	vi_pci_pause,
	.pe_resume =	vi_pci_resume,
#endif
};
PCI_EMUL_SET(pci_de_vnet);
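
/*
 * PCI_EMUL_SET() places pci_de_vnet into a linker set (hence the
 * <sys/linker_set.h> include at the top of the file), so the
 * "virtio-net" emulation registers itself at link time; the PCI
 * emulation core walks this set when matching the device names given
 * via -s slot,<emul>,... options.
 */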