The vhost_net_device_ops indirection is unnecessary because there is only
one implementation of the vhost common code.
Removing it makes the code more readable.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The common vhost code only supported a single mmap per device. vhost-user
worked around this by saving the address/length/fd of each mmap after the end
of the rte_virtio_memory struct. This only works if the vhost-user code frees
dev->mem, since the common code is unaware of the extra info. The
VHOST_USER_RESET_OWNER message is one situation where the common code frees
dev->mem and leaks the fds and mappings. This happens every time I shut down a
VM.
The new code calls back into the implementation (vhost-user or vhost-cuse) to
clean up these resources.
The vhost-cuse changes are only compile tested.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
It's actually a feature already enabled in Linux kernel (since v3.5).
What we need to do is simply to claim that we support such feature,
and nothing else.
With that, the guest will send an ARP message after live migration
to notify the switches about the new location of migrated VM.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Introduce vhost_log_write() helper function to log the dirty pages we
touched. Page size is harded code to 4096 (VHOST_LOG_PAGE), and each
log is presented by 1 bit.
Therefore, vhost_log_write() simply finds the right bit for related
page we are gonna change, and set it to 1. dev->log_base denotes the
start of the dirty page bitmap.
Every time we update virtio used ring, we need to log it. And it's
been done by a new vhost_log_write() wrapper, vhost_log_used_vring().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Add guest offload setting in vhost lib.
Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says:
1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so,
the packet checksum at offset csum_offset from csum_start
and any preceding checksums have been validated. The checksum
on the packet is incomplete and csum_start and csum_offset
indicate how to calculate it (see Packet Transmission point 1).
2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then gso_type MAY be something other than
VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the
desired MSS (see Packet Transmission point 2).
In order to support these features, the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.
2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.
There are more explanations for the implementation.
For VM2VM case, there is no need to do checksum, for we think the
data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM
at RX side will let the TCP layer to bypass the checksum validation,
so that the RX side could receive the packet in the end.
In terms of us-vhost, at vhost RX side, the offload information is
inherited from mbuf, which is in turn inherited from TX side. If we
can still get those info at RX side, it means the packet is from
another VM at same host. So, it's safe to set the
VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib.
In order to support these features, and the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features
negotiation.
2. Dequeue TX offload: convert the fileds in virtio_net_hdr to the
related fileds in mbuf.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The patch fixes reset_owner message handling not to clear callfd,
because callfd will be valid while connection is established.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Currently, we reset all fields of a device to zero when reset
happens, which is wrong, since for some fields like device_fh,
ifname, and virt_qp_nb, they should be same and be kept after
reset until the device is removed. And this is what's the new
helper function reset_device() for.
And use rte_zmalloc() instead of rte_malloc, so that we could
avoid init_device(), which basically dose zero reset only so far.
Hence, init_device() is dropped in this patch.
This patch also removes a hack of using the offset a specific
field (which is virtqueue now) inside of `virtio_net' structure
to do reset, which could be broken easily if someone changed the
field order without caution.
Cc: Tetsuya Mukawa <mukawa@igel.co.jp>
Cc: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
QEMU sends VHOST_RESET_OWNER first when shutting down.
There was previously no way for the dataplane to know that the
virtio_net instance had become unusable and it would segfault
when trying to do RX/TX.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix build error:
virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here
rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here
Above two virtio-net MQ macros are introduced since kernel v3.8.
For older kernel, we should not reference them directly, hence,
this patch introduced two wrapper macros, with proper values
being set depending on we support MQ or not.
Fixes: b09b198bfb5c ("vhost-user: announce queue number in message")
Reported-by: Yongjie Gu <yongjiex.gu@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: David Marchand <david.marchand@6wind.com>
This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.
This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.
This patch is based on Changchun's patch.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The two protocol features messages are introduced by qemu vhost
maintainer(Michael) for extendting vhost-user interface. Here is
an excerpta from the vhost-user spec:
Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.
The vhost-user multiple queue features will be treated as a vhost-user
extension, hence, we have to implement the two messages first.
VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry.
Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.
In user_get_vring_base, get_device return is not checked and may cause
segfault when device is not found.
Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.
Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
According to eventfd man page:
typedef uint64_t eventfd_t;
int eventfd_read(int fd, eventfd_t *value);
int eventfd_write(int fd, eventfd_t value);
eventfd_t is defined for the second arg(value), but not for fd.
Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR
message, will try to reallocate vhost device and virt queue to the same
numa node.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
When we migrate VM, without this feature, qemu will report error:
"migrate: Migration disabled: vhost lacks VHOST_F_LOG_ALL feature".
Signed-off-by: Krishna Murthy <krishna.j.murthy@intel.com>
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
for vhost-cuse, ifname is the name of the tap device
for vhost-user, ifname is the name of the unix domain socket path
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
and added to polled(based on select) fdset.
In rte_vhost_driver_session_start(), fds in the fdset are checked for
processing. If there is new connection from qemu, connection fd accepted is
added to polled fdset. The listener and connection fds in the fdset are
then both checked. When there is message on the connection fd, its
callback vserver_message_handler is called to process vhost-user messages.
To support identifying which virtio is from which guest VM, we could call
rte_vhost_driver_register with different socket path. Virtio devices from
same VM will connect to VM specific socket. The socket path information is
stored in the virtio_net structure.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
remove set_memory_table ops
vhost-cuse or vhost-user will both implement their own set_memory_region handler.
In current vhost-cuse implementation, guest numa memory isn't supported.
Assume that guest memory is backed by only one file.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
File descriptor is copied from qemu process into vhost process.
vhost-user doesn't need eventfd kernel module to copy fds between processes.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
Rename vhost-net-cdev.h to vhost-net.h.
This file defines common operations provided by virtio-net(.c).
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ.
Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
In virtnet_send_command:
/* Caller should know better */
BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
This patch fixes the issue whereby when using userspace vhost ports
in the context of vSwitching, the name provided to the hypervisor/QEMU
of the vhost tap device needs to be exposed in the library, in order
for the vSwitch to be able to direct packets to the correct device.
This patch introduces an 'ifname' member to the virtio-net structure
which is populated with the tap device name when QEMU is brought up
with a vhost device.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Anthony Fee <anthonyx.fee@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
This is to enable user space vhost receiving and forwarding broadcast
and multicast packets:
Use new option in command line to enable promisc mode;
Enable 2 bits in VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Commit aec8283d47 fixes the compilation issue, but it leads to
one runtime issue: early exit wrongly. In some case, 'path' is NULL, but
'resolved_path' has effective path, it should continue going ahead rather
than exit.
This is due to that qemu unlink the file after it maps the huge page file.
In this special case, it is ok to check the resolved path
when path is NULL if errno indicates "No such file or directory".
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Fix alignment issues, lengthy lines, misordered type and other coding style issues.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
It fixes this compilation complain: "error: ignoring return value of 'realpath',
declared with attribute warn_unused_result [-Werror=unused-result]"
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
VHOST_SUPPORTED_FEATURES is the feature mask that vhost lib supports.
VHOST_FEATURES is the feature mask vhost currently supports after some features are turned on/off.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: split patch]
Rename init_virtio_net as rte_vhost_callback_register API.
rte_vhost_callback_register register the callbacks called when a
vhost device is created and ready to be added to data processing core
or is de-actived by guest.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
The structure virtio_net_config_ll is moved to virtio_net.c.
It is related to internal virtio device management,
so it should not be exposed to other files.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Currently zero copy feature isn't generic as it couples closely with nic.
It isn't put in the vhost lib in this version.
gpa(guest physical address) to hpa(host physical address) mapping region
logic is removed.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
The following logics will be moved to vhost example:
1. mac learning, which is used to learn the mac address from the first
transmitted packet of guest and bind the vhost device to a queue in a
pool of VMDQ.
2. VMDQ mac/vlan filter: Each pool the vhost device is bind to is
assigned a mac/vlan filter.
3. num_devices is used to specify the maximum vhost devices the nic supports.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Those files will be refactored in subsequent patches to form user space
vhost library.
Makefile and main.h are removed.
main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API.
virtio-net.h is renamed to rte_virtio_net.h which is the API header file.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove from examples Makefile and merge file renaming]