52 Commits

Author SHA1 Message Date
Maxime Coquelin
b1cce26af1 vhost: add notification for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
37f5e79a27 vhost: add shadow used ring support for packed rings
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Yuanhan Liu
2d1541e2b6 vhost: add vring address setup for packed queues
Add code to set up packed queues when enabled.

Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Tonghao Zhang
2396806765 vhost: introduce new function helper
Introduce an new common helper to avoid redundancy.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 12:27:25 +02:00
Tiwei Bie
d90cf7d111 vhost: support host notifier
When a vDPA device is attached, vhost user will try to
register host notifiers to QEMU to allow notifications
to be delivered between the driver in the guest and the
vDPA device in the host directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:39 +02:00
Tonghao Zhang
76b1e4cec7 vhost: refine new device function
Make sure find avalid device id before allocating
virtio_net, if not, return directly. It may avoid
allocating and freeing virtio_net when there is
not valid device id.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:09 +02:00
Maxime Coquelin
070aceda33 vhost: check all range is mapped when translating GPAs
There is currently no check done on the length when translating
guest addresses into host virtual addresses. Also, there is no
guanrantee that the guest addresses range is contiguous in
the host virtual address space.

This patch prepares vhost_iova_to_vva() and its callers to
return and check the mapped size. If the mapped size is smaller
than the requested size, the caller handle it as an error.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Junjie Chen
3f8ff12821 vhost: support interrupt mode
In some cases we want vhost dequeue work in interrupt mode to
release cpus to others when no data to transmit. So we install
interrupt handler of vhost device and interrupt vectors for each
rx queue when creating new backend according to vhost interrupt
configuration. Thus, applications could register a epoll event fd
to associate rx queues with interrupt vectors.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
9b91fbd6ec vhost/crypto: add vhost-user message handlers
Previously, vhost library lacks the support to the vhost backend
other than net such as adding private data or registering vhost-user
message handlers. This patch fills the gap by adding data pointer and
vhost-user pre and post message handlers to vhost library.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Zhihong Wang
bd2e0c3fe5 vhost: add APIs for live migration
This patch adds APIs to enable live migration for non-builtin data paths.

At src side, last_avail/used_idx from the device need to be set into the
virtio_net structure, and the log_base and log_size from the virtio_net
structure need to be set into the device.

At dst side, last_avail/used_idx need to be read from the virtio_net
structure and set into the device.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
07718b4f87 vhost: adapt library for selective datapath
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
b4953225ce vhost: add APIs for datapath configuration
This patch adds APIs for datapath configuration.

The did of the vhost-user socket can be set to identify the backend device,
in this case each vhost-user socket can have only 1 connection. The did is
set to -1 by default when the software datapath is used.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Jianfeng Tan
ae034edaa6 vhost: avoid function call in data path
Previously, get_device() is a function call. It's OK for slow path
configuration, but takes some cycles for data path.

To avoid that, we turn this function to inline type.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Maxime Coquelin
82b9c15403 vhost: remove pending IOTLB entry if miss request failed
In case vhost_user_iotlb_miss returns an error, the pending IOTLB
entry has to be removed from the list as no IOTLB update will be
received.

Fixes: fed67a20ac94 ("vhost: introduce guest IOVA to backend VA helper")
Cc: stable@dpdk.org

Suggested-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-02-05 19:56:04 +01:00
Stefan Hajnoczi
1c717af4c6 vhost: add flag for built-in virtio driver
The librte_vhost API is used in two ways:
1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().
2. As a library for implementing vhost device backends.

There is no distinction between the two at the API level or in the
librte_vhost implementation.  For example, device state is kept in
"struct virtio_net" regardless of whether this is actually a net device
backend or whether the built-in virtio_net.c driver is in use.

The virtio_net.c driver should be a librte_vhost API client just like
the vhost-scsi code and have no special access to vhost.h internals.
Unfortunately, fixing this requires significant librte_vhost API
changes.

This patch takes a different approach: keep the librte_vhost API
unchanged but track whether the built-in virtio_net.c driver is in use.
See the next patch for a bug fix that requires knowledge of whether
virtio_net.c is in use.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-02-05 15:11:07 +01:00
Victor Kaplansky
a368804699 vhost: protect active rings from async ring changes
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by message handler
done independently from vring usage by PMD threads.

This causes for example segfaults during live-migration
with MQ enable, but in general virtually any request
sent by qemu changing the state of device can cause
problems.

These patches fixes all above issues by adding a spinlock
to every vring and requiring message handler to start operation
only after ensuring that all PMD threads related to the device
are out of critical section accessing the vring data.

Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.

See https://bugzilla.redhat.com/show_bug.cgi?id=1450680

Cc: stable@dpdk.org
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-21 15:51:52 +01:00
Junjie Chen
e37ff95440 vhost: support virtqueue interrupt/notification suppression
The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is
negotiated. The driver set vring flags to 0, and MAY use used_event in
available ring to advise device interrupt util reach an index specified
by used_event. The device ignore the lower bit of vring flags, and send
an interrupt when index reach used_event.

The device can suppress notification in a manner analogous to the ways
driver suppress interrupt. The device manipulates flags or avail_event in
the used ring in the same way the driver manipulates flags or used_event in
available ring.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Maxime Coquelin
467fe22df9 vhost: extract virtqueue cleaning and freeing functions
This patch extracts needed code for vhost_user.c to be able
to clean and free virtqueues unitary.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Stefan Hajnoczi
6c299bb732 vhost: introduce vring call API
Users of librte_vhost currently implement the vring call operation
themselves.  Each caller performs the operation slightly differently.

This patch introduces a new librte_vhost API called
rte_vhost_vring_call() that performs the operation so that vhost-user
applications don't have to duplicate it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Bruce Richardson
369991d997 lib: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-04 22:41:39 +01:00
Maxime Coquelin
1aadb2f6b1 vhost: fix deadlock on IOTLB miss
An optimization was done to only take the iotlb cache lock
once per packet burst instead of once per IOVA translation.

With this, IOTLB miss requests are sent to Qemu with the lock
held, which can cause a deadlock if the socket buffer is full,
and if Qemu is waiting for an IOTLB update to be done.

Holding the lock is not necessary when sending an IOTLB miss
request, as it is not manipulating the IOTLB cache list, which
the lock protects. Let's just release it while sending the
IOTLB miss.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-13 22:08:21 +02:00
Maxime Coquelin
36031f80cc vhost: invalidate vring in case of matching IOTLB invalidate
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
eefac9536a vhost: postpone device creation until rings are mapped
Translating the start addresses of the rings is not enough, we need to
be sure all the ring is made available by the guest.

It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be be safe against vring pages
invalidates.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
321203a54b vhost: enable rings at the right time
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
fed67a20ac vhost: introduce guest IOVA to backend VA helper
This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
76e99bfc4c vhost: initialize vrings IOTLB caches
The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
d012d1f293 vhost: add IOTLB helper functions
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
275c3f9447 vhost: support slave requests channel
Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Tiwei Bie
e5c494a7a2 vhost: batch small guest memory copies
This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts for small packets PVP test.

This patch improves the performance for small packets, and has
distinguished the packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
do some special optimizations for the big packets too.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:48:53 +02:00
Ilya Maximets
185a883597 vhost: print reason of NUMA node query failure
syscall always returns '-1' on failure and there is no point
in printing that value. 'errno' is much more informative.

Fixes: 586e39001317 ("vhost: export numa node")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-07 02:17:56 +02:00
Maxime Coquelin
02f62392ff vhost: fix MTU device feature check
The MTU feature support check has to be done against MTU
feature bit mask, and not bit position.

Fixes: 72e8543093df ("vhost: add API to get MTU value")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:39:29 +02:00
Dariusz Stojaczyk
d1b2842a9d vhost: fix malloc size too small
Amount of allocated memory was too small, causing buffer overflow.

Fixes: eb32247457fe ("vhost: export guest memory regions")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreiman@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Zhihong Wang
29150b70ab vhost: support Rx queue count request
This patch implements the ops rx_queue_count for vhost PMD by adding
a helper function rte_vhost_rx_queue_count in vhost lib.

The ops rx_queue_count gets vhost RX queue avail count and helps to
understand the queue fill level.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Yuanhan Liu
a798beb47c vhost: rename header file
Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
52f8091f05 vhost: export APIs for live migration support
Export few APIs for the vhost-user driver to log the guest memory writes,
which is a must for live migration support.

This patch basically moves vhost_log_write() and vhost_log_used_vring()
into vhost.h and then add an wrapper (the public API) to them.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
b50a203986 vhost: export the number of vrings
We used to use rte_vhost_get_queue_num() for telling how many vrings.
However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, virtio-net driver could turn it to the number of queue pairs by
dividing 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ab4d7b9f1a vhost: turn queue pair to vring
The queue pair is very virtio-net specific, other devices don't have
such concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.

This patch just does a simple convert, a later patch would export the
number of vrings to applications.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
40ef286f23 vhost: export vhost vring info
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ca33faf9ef vhost: introduce API to fetch negotiated features
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:41:54 +02:00
Yuanhan Liu
eb32247457 vhost: export guest memory regions
Some vhost-user driver may need this info to setup its own page tables
for GPA (guest physical addr) to HPA (host physical addr) translation.
SPDK (Storage Performance Development Kit) is one example.

Besides, by exporting this memory info, we could also export the
gpa_to_vva() as an inline function, which helps for performance.
Otherwise, it has to be referenced indirectly by a "vid".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
93433b639d vhost: make notify ops per vhost driver
Assume there is an application both support vhost-user net and
vhost-user scsi, the callback should be different. Making notify
ops per vhost driver allow application define different set of
callbacks for different driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
0917f9d1f0 vhost: use new APIs to handle features
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Maxime Coquelin
72e8543093 vhost: add API to get MTU value
This patch implements the function for the application to
get the MTU value.

rte_vhost_get_mtu() fills the mtu parameter with the MTU value
set in QEMU if VIRTIO_NET_F_MTU has been negotiated and returns 0,
-ENOTSUP otherwise.

The function returns -EAGAIN if Virtio feature negotiation
didn't happened yet.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Maxime Coquelin
3d3c6590b5 vhost: enable virtio MTU feature
This patch enables the new VIRTIO_NET_F_MTU feature,
which makes possible for the host to advise the guest
with its maximum supported MTU.

MTU value is set via QEMU parameters, either via Libvirt XML, or
directly in virtio-net device command line arguments.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Yuanhan Liu
8d286dbeb8 vhost: fix multiple queue not enabled for old kernels
Some macros (say VIRTIO_NET_F_MQ) are needed for enabling multiple queue,
however they are introduced since kernel v3.8, meaning build error happens
if we build DPDK vhost on those platforms.

71dfdbe66a66 ("vhost: fix build with kernel < 3.8") meant to fix it, but
in a wrong way: it completely disables the MQ features for those kernels.
However, the MQ feature doesn't depend on the kernel at all (except the
macros dependency stated above), that we could still enable the MQ feature
even the host kernel has no such support.

The right fix is to define the macro if it's not defined.

Fixes: 71dfdbe66a66 ("vhost: fix build with kernel < 3.8")
Cc: stable@dpdk.org

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 08:58:54 +02:00
Yong Wang
5731d6acc7 vhost: fix memory leak
In function vhost_new_device(), current code dose not free 'dev'
in "i == MAX_VHOST_DEVICE" condition statements. It will lead to a
memory leak.

Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")
Cc: stable@dpdk.org

Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-17 09:20:17 +01:00
Yuanhan Liu
164a601b52 vhost: remove references to vhost-cuse
vhost-cuse is removed, update corresponding comments that are still
referencing it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2016-11-07 15:59:21 +01:00
Zhihong Wang
f689586bc0 vhost: shadow used ring update
The basic idea is to shadow the used ring update: update them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.

And since we do avail ring reservation before enqueuing data, we would
know which and how many descs will be used. Which means we could update
the shadow used ring at the reservation time. It also introduce another
slight advantage: we don't need access the desc->flag any more inside
copy_mbuf_to_desc_mergeable().

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
9ba1e744ab vhost: add a flag to enable dequeue zero copy
Dequeue zero copy is disabled by default. Here add a new flag
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` to explictily enable it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-13 10:29:20 +02:00
Yuanhan Liu
b0a985d1f3 vhost: add dequeue zero copy
The basic idea of dequeue zero copy is, instead of copying data from
the desc buf, here we let the mbuf reference the desc buf addr directly.

Doing so, however, has one major issue: we can't update the used ring
at the end of rte_vhost_dequeue_burst. Because we don't do the copy
here, an update of the used ring would let the driver to reclaim the
desc buf. As a result, DPDK might reference a stale memory region.

To update the used ring properly, this patch does several tricks:

- when mbuf references a desc buf, refcnt is added by 1.

  This is to pin lock the mbuf, so that a mbuf free from the DPDK
  won't actually free it, instead, refcnt is subtracted by 1.

- We chain all those mbuf together (by tailq)

  And we check it every time on the rte_vhost_dequeue_burst entrance,
  to see if the mbuf is freed (when refcnt equals to 1). If that
  happens, it means we are the last user of this mbuf and we are
  safe to update the used ring.

- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
  right desc idx.

Dequeue zero copy is introduced for performance reason, and some rough
tests show about 50% perfomance boost for packet size 1500B. For small
packets, (e.g. 64B), it actually slows a bit down (well, it could up to
15%). That is expected because this patch introduces some extra works,
and it outweighs the benefit from saving few bytes copy.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-12 09:45:14 +02:00