Introduce vhost_log_write() helper function to log the dirty pages we
touched. Page size is harded code to 4096 (VHOST_LOG_PAGE), and each
log is presented by 1 bit.
Therefore, vhost_log_write() simply finds the right bit for related
page we are gonna change, and set it to 1. dev->log_base denotes the
start of the dirty page bitmap.
Every time we update virtio used ring, we need to log it. And it's
been done by a new vhost_log_write() wrapper, vhost_log_used_vring().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
VHOST_USER_SET_LOG_BASE request is used to tell the backend (dpdk
vhost-user) where we should log dirty pages, and how big the log
buffer is.
This request introduces a new payload:
typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
} VhostUserLog;
Also, a fd is delivered from QEMU by ancillary data.
With those info given, an area of memory is mmaped, assigned
to dev->log_base, for logging dirty pages.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Commit d0cf91303d73 added dependency on librte_net headers to vhost
but did not add this to the Makefile, which makes builds
non-deterministic. Curiously it is non-parallel build that is
consistently broken by this missing dependency, usually it's the other
way around, but trying to build without -j(n) fails with:
lib/librte_vhost/vhost_rxtx.c:41:20:
fatal error: rte_ip.h: No such file or directory
Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add guest offload setting in vhost lib.
Virtio 1.0 spec (5.1.6.4 Processing of Incoming Packets) says:
1. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so,
the packet checksum at offset csum_offset from csum_start
and any preceding checksums have been validated. The checksum
on the packet is incomplete and csum_start and csum_offset
indicate how to calculate it (see Packet Transmission point 1).
2. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then gso_type MAY be something other than
VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the
desired MSS (see Packet Transmission point 2).
In order to support these features, the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.
2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.
There are more explanations for the implementation.
For VM2VM case, there is no need to do checksum, for we think the
data should be reliable enough, and setting VIRTIO_NET_HDR_F_NEEDS_CSUM
at RX side will let the TCP layer to bypass the checksum validation,
so that the RX side could receive the packet in the end.
In terms of us-vhost, at vhost RX side, the offload information is
inherited from mbuf, which is in turn inherited from TX side. If we
can still get those info at RX side, it means the packet is from
another VM at same host. So, it's safe to set the
VIRTIO_NET_HDR_F_NEEDS_CSUM, to skip checksum validation.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add vhost TX offload (CSUM and TSO) support capabilities in vhost lib.
In order to support these features, and the following changes are added,
1. Extend 'VHOST_SUPPORTED_FEATURES' macro to add the offload features
negotiation.
2. Dequeue TX offload: convert the fileds in virtio_net_hdr to the
related fileds in mbuf.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
When guest is booting (or any othertime guest is busy) it is possible
for the small receive ring (256) to get full. If this happens the
vhost library should just return normally. It's current behavior
of logging just creates massive log spew/overflow which could even
act as a DoS attack against host.
Reported-by: Nathan Law <nlaw@brocade.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
So that we will not break ABI in future extension by adding few more
fields.
Struct vhost_virtqueue is reserved with 16 qwords (the later vhost-live
migration support would at least consume 3 of them), and struct virtio_net
is reserved with a bit more, 64 qwords, as there is only one instance for
a virtio nic instance.
Note that both reservation are not placed at the end of the struct, but
instead before the last field, since both the last field at the two struct
take a lot spaces. Putting the reservation after it would divide those
reserved fields to another cacheline. (we might need fix them in future, btw)
Suggested-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Problem:if I firstly insert my kmod_test.ko, then insert eventfd_link.ko,
error will happen with hint "Device or resource busy". This is because
the default minor device number, 0, has been occupied by my kmod_test.ko .
root@distro:~/test$ lsmod
Module Size Used by
kmod_test 927 0
vboxsf 35930 4
vboxguest 222130 1 vboxsf
microcode 10315 0
autofs4 25051 0
root@distro:~/test$ insmod ./eventfd_link.ko
insmod: ERROR: could not insert module ./eventfd_link.ko: Device or
resource busy
Explanation: For miscdevices, the major device_no is same, so the minor
device_no should be set to ditinguish different misc devices; if not set
the minor, it may fail while insmod due to the default minor value, 0, has
been used by other miscdevice. MISC_DYNAMIC_MINOR means to let Linux
kernel dynamically assign one minor devide number while loading.
Signed-off-by: Xiaobo Chi <xiaobo.chi@nokia.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The VHOST_USER_SET_VRING_ENABLE request was sent for each queue-pair.
However, it's changed to be sent per queue in the queue-pair by QEMU
commit dc3db6ad ("vhost-user: start/stop all rings"). The change
is reasonable, as we send all other request per queue, instead of
queue-pair.
Hence we should do proper changes to adapt to the QEMU change here.
Otherwise, a segfault will be triggered when last TX queue was enabled.
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This patch fixes a bug under lower version linux kernel, mmap()
fails when length is not aligned with hugepage size. mmap()
without flag of MAP_ANONYMOUS, should be called with length
argument aligned with hugepagesz at older longterm version
Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL.
This bug was fixed in Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid failure, make sure in caller to keep length aligned.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The patch fixes reset_owner message handling not to clear callfd,
because callfd will be valid while connection is established.
Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Currently, we reset all fields of a device to zero when reset
happens, which is wrong, since for some fields like device_fh,
ifname, and virt_qp_nb, they should be same and be kept after
reset until the device is removed. And this is what's the new
helper function reset_device() for.
And use rte_zmalloc() instead of rte_malloc, so that we could
avoid init_device(), which basically dose zero reset only so far.
Hence, init_device() is dropped in this patch.
This patch also removes a hack of using the offset a specific
field (which is virtqueue now) inside of `virtio_net' structure
to do reset, which could be broken easily if someone changed the
field order without caution.
Cc: Tetsuya Mukawa <mukawa@igel.co.jp>
Cc: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
QEMU sends VHOST_RESET_OWNER first when shutting down.
There was previously no way for the dataplane to know that the
virtio_net instance had become unusable and it would segfault
when trying to do RX/TX.
Signed-off-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Commit 15e9ee6982a4822ce395fd597dd500a61ceafa7c
uses the VIRTIO_F_VERSION_1 macro existing only in newer kernels.
Fixed it by manually defining it for older kernels.
Fixes: 15e9ee6982a4 ("vhost: enable virtio 1.0")
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Make vhost-user virtio 1.0 compatible by adding it to the
supported features and keeping the header length
the same as for mergeable RX buffers.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
* Move ioctl `EVENTFD_COPY' code to a separate function
* Remove extra #includes
* Introduce function fget_from_files
* Fix ioctl return values
Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Fix build error:
virtio-net.c:80:89: error: ‘VIRTIO_NET_F_MQ’ undeclared here
rte_virtio_net.h:109: error: ‘VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX’ undeclared here
Above two virtio-net MQ macros are introduced since kernel v3.8.
For older kernel, we should not reference them directly, hence,
this patch introduced two wrapper macros, with proper values
being set depending on we support MQ or not.
Fixes: b09b198bfb5c ("vhost-user: announce queue number in message")
Reported-by: Yongjie Gu <yongjiex.gu@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: David Marchand <david.marchand@6wind.com>
This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.
For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.
This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.
This patch is based on Changchun's patch.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Add VHOST_USER_GET_QUEUE_NUM message
to tell the frontend (qemu) how many queue pairs we support.
And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The two protocol features messages are introduced by qemu vhost
maintainer(Michael) for extendting vhost-user interface. Here is
an excerpta from the vhost-user spec:
Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.
The vhost-user multiple queue features will be treated as a vhost-user
extension, hence, we have to implement the two messages first.
VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Huawei Xie <huawei.xie@intel.com>
virtio-net search for it's device in reset_owner.
The function don't check the return result of get_config_ll_entry.
Using get_config_ll_entry in reset_owner don't show any error when the
device is not found. This patch fix this by using get_device instead
instead of get_config_ll_entry.
In user_get_vring_base, get_device return is not checked and may cause
segfault when device is not found.
Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
virtio-net clean and init device after a VHOST_USER_RESET_OWNER.
This reset device identifier to 0 and break ll_root listing logic.
This patch keep the old device identifier and re-write it on the cleaned
device.
Signed-off-by: Jerome Jutteau <jerome.jutteau@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
In merge-able RX path, vhost injects interrupts to guest for each packet.
This should degrade performance a lot.
This patch fixes this issue by injecting one interrupt for a batch of packets.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
According to eventfd man page:
typedef uint64_t eventfd_t;
int eventfd_read(int fd, eventfd_t *value);
int eventfd_write(int fd, eventfd_t value);
eventfd_t is defined for the second arg(value), but not for fd.
Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The vHost eventlink driver is a kernel module that requires a kernel
source/build directory to build the ko. Convert the fixed kernel build
directory specifier to one which may be user specified on the command-line.
Signed-off-by: Aaron Conole <aconole@redhat.com>
This patch originates from the patch:
"Patch for Qemu wrapper for US-VHost to ensure Qemu process ends when
VM is shutdown", http://dpdk.org/ml/archives/dev/2014-June/003606.html
Also update the vhost sample guide doc.
Signed-off-by: Claire Murphy <claire.k.murphy@intel.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
This commit removes the dev-index, so update the doc for this change:
Fixes: 17b8320a3e11 ("vhost: remove index parameter")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
It adds more readable log info if a socket fails to bind to
local socket file name.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
rte_vhost_driver_unregister API will remove the listenfd from event list,
and then close it.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Peng Sun <peng.a.sun@intel.com>
In the event handler of connection fd, the connection fd could be possibly
closed. The event dispatch loop would then try to remove the fd from fdset.
Between these two actions, another thread might register a new listenfd
reusing the val of just closed fd, so we couldn't call fdset_del which would
wrongly clean up the new listenfd. A new function fdset_del_slot is provided
to cleanup the fd at the specified location.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
When we get the address of vring descriptor table in VHOST_SET_VRING_ADDR
message, will try to reallocate vhost device and virt queue to the same
numa node.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
This patch simply applies the transform previously committed in
scripts/cocci/mtod-offset.cocci. No other modifications have been
made here.
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Remove these unnecessary vring descriptor length updating, vhost should
not change them.
virtio in front end should assign value to desc.len for both rx and tx.
Test report: http://dpdk.org/ml/archives/dev/2015-June/018610.html
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>