When live migration starts, QEMU will set ring addrs again for
each virtqueue. In this case, we should try to translate ring
addrs after we invalidating the ring, otherwise virtqueues can
be enabled with the addrs untranslated. Besides, also leverage
the access_ok flag in non-IOMMU case to prevent the data path
accessing invalidated virtqueues.
Fixes: 5a4933e56b ("vhost: postpone ring address translations at kick time only")
Cc: stable@dpdk.org
Reported-by: Yilong Lv <lvyilong.lyl@alibaba-inc.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When the device has been started, don't do the reallocation anymore.
Otherwise the pointers used in application threads can be invalidated
without proper protection. Instead of introducing a global lock to
protect the change of device pointers which will hurt the performance,
let's just do the reallocation during setup.
Fixes: af295ad469 ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This function is listed under EXPERIMENTAL in the
rte_vhost_version.map, so it needs to be marked
with __rte_experimental in the header file as well.
Found by check-experimental-syms.sh when trying to compile
DPDK with -finstrument-functions. This script didn't
catch this in the normal case, since the function is
declared __rte_always_inline.
This also requires updating the vhost_scsi example to allow
use of this newly marked experimental API.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We need to close the old slave request fd if any first
before taking the new one.
Fixes: 275c3f9447 ("vhost: support slave requests channel")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds an operation callback which gets called every time
the library is waking up the guest trough an eventfd_write() call.
This can be used by 3rd party application, like OVS, to track the
number of times interrupts where generated. This might be of
interest to find out system-call were called in the fast path.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Having this info logged by default when analysing bug reports
has proved to be useful.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
For each library where we optionally disable it, add in the reason why it's
being disabled, so the user knows how to fix it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Putting a '__attribute__((deprecated))' in the middle of a function
prototype does not result in the expected result with gcc (while clang
is fine with this syntax).
$ cat deprecated.c
void * __attribute__((deprecated)) incorrect() { return 0; }
__attribute__((deprecated)) void *correct(void) { return 0; }
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
$ gcc -o deprecated.o -c deprecated.c
deprecated.c: In function ‘main’:
deprecated.c:3:1: warning: ‘correct’ is deprecated (declared at
deprecated.c:2) [-Wdeprecated-declarations]
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
^
Move the tag on a separate line and make it the first thing of function
prototypes.
This is not perfect but we will trust reviewers to catch the other not
so easy to detect patterns.
sed -i \
-e '/^\([^#].*\)\?__rte_experimental */{' \
-e 's//\1/; s/ *$//; i\' \
-e __rte_experimental \
-e '/^$/d}' \
$(git grep -l __rte_experimental -- '*.h')
Special mention for rte_mbuf_data_addr_default():
There is either a bug or a (not yet understood) issue with gcc.
gcc won't drop this inline when unused and rte_mbuf_data_addr_default()
calls rte_mbuf_buf_addr() which itself is experimental.
This results in a build warning when not accepting experimental apis
from sources just including rte_mbuf.h.
For this specific case, we hide the call to rte_mbuf_buf_addr() under
the ALLOW_EXPERIMENTAL_API flag.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
We had some inconsistencies between functions prototypes and actual
definitions.
Let's avoid this by only adding the experimental tag to the prototypes.
Tests with gcc and clang show it is enough.
git grep -l __rte_experimental |grep \.c$ |while read file; do
sed -i -e '/^__rte_experimental$/d' $file;
sed -i -e 's/ *__rte_experimental//' $file;
sed -i -e 's/__rte_experimental *//' $file;
done
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This patch fixes the inferred misuse of enum of crypto algorithms.
Coverity issue: 325879
Fixes: e80a987081 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes a few same class bugs that causes the
logically dead code in vhost_crypto.
Coverity issue: 277236, 277233, 277220, 277214
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a missing include with the defines for vhost-user driver features.
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Cc: stable@dpdk.org
Signed-off-by: Noa Ezra <noae@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
Now that we have a single function to map the descriptors
buffers, let's prefetch them there as it is the earliest
place we can do it.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Handling of fragmented virtio-net header and indirect descriptors
tables was implemented to fix CVE-2018-1059. It should never
happen with healthy guests and so is already considered as
unlikely code path.
This patch moves these bits into non-inline dedicated functions
to reduce the I-cache pressure.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
At runtime either packed Tx/Rx functions will always be called,
or split Tx/Rx functions will always be called.
This patch removes the forced inlining in order to reduce
the I-cache pressure.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
In order to reduce the I-cache pressure, this patch removes
the inlining of the dirty pages logging functions, that we
can consider as cold path.
Indeed, these functions are only called while doing live
migration, so not called most of the time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Since we now always use _FILE_OFFSET_BITS=64 flag when building
DPDK, we can remove the Makefile and C-file #defines setting it
individually for parts of the build.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we change these macros, we might as well avoid triggering complaints
from checkpatch because of mixed case.
old=RTE_IPv4
new=RTE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv4
new=RTE_ETHER_TYPE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv6
new=RTE_ETHER_TYPE_IPV6
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
vhost should notify the application in case of all vring state changes.
In general, application should not care about negotiation of
VHOST_USER_F_PROTOCOL_FEATURES. Protocol details like this should
be hidden by the vhost library.
With this patch applications like OVS will be able to assume that
all vrings disabled by default and only process 'vring_state_changed'
events.
Fixes: 321203a54b ("vhost: enable rings at the right time")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Application should be able to obtain information like 'ifname' from
the 'vid' passed to 'destroy_connection' callback. Currently, all the
API calls with passed 'vid' fails with 'device not found'.
Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Null value for parameters will cause segfault.
Fixes: d7280c9fff ("vhost: support selective datapath")
Fixes: 72e8543093 ("vhost: add API to get MTU value")
Fixes: a277c71598 ("vhost: refactor code structure")
Fixes: ca33faf9ef ("vhost: introduce API to fetch negotiated features")
Fixes: eb32247457 ("vhost: export guest memory regions")
Fixes: 40ef286f23 ("vhost: export vhost vring info")
Fixes: bd2e0c3fe5 ("vhost: add APIs for live migration")
Fixes: 0b8572a0c1 ("vhost: add external message handling to the API")
Fixes: b4953225ce ("vhost: add APIs for datapath configuration")
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Fixes: 292959c719 ("vhost: cleanup unix socket")
Cc: stable@dpdk.org
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Need to destroy allocated device if application fails to
add new connection or we have fdset failure.
Fixes: acbff5c67e ("vhost: fix crash when exceeding file descriptors")
Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Define variables for "is_linux", "is_freebsd" and "is_windows"
to make the code shorter for comparisons and more readable.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
External backends may have specific requests to handle, and so
we don't want the vhost-user lib to handle these requests as
errors.
This patch also changes the experimental API by introducing
RTE_VHOST_MSG_RESULT_NOT_HANDLED so that vhost-user lib
can report an error if a message is handled neither by
the vhost-user library nor by the external backend.
The logic changes a bit so that if the callback returns
with ERR, OK or REPLY, it is considered the message
is handled by the external backend so it won't be
handled by the vhost-user library.
It is still possible for an external backend to listen
to requests that have to be handled by the vhost-user
library like SET_MEM_TABLE, but the callback have to
return NOT_HANDLED in that case.
Vhost-crypto backend is also adapted to this API change.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
rte_vhost_driver_set_protocol_features API is to be used
by external backends to advertise vhost-user protocol
features it supports.
It has to be called after rte_vhost_driver_register() and
before rte_vhost_driver_start().
Example of usage to advertize VHOST_USER_PROTOCOL_F_FOOBAR
protocol feature:
const char *path = "/tmp/vhost-user";
uint64_t protocol_features;
rte_vhost_driver_register(path, 0);
rte_vhost_driver_get_protocol_features(path, &protocol_features);
protocol_features |= VHOST_USER_PROTOCOL_F_FOOBAR;
rte_vhost_driver_set_protocol_features(path, protocol_features);
rte_vhost_driver_start(path);
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The VIRTIO_RING_F_EVENT_IDX feature of split ring might
be broken, as the value of signalled_used is invalid
after live migration, start up and virtio driver reload.
This patch fixes it by using signalled_used_valid.
In addition, this patch makes the VIRTIO_RING_F_EVENT_IDX
implementation of split ring match kernel backend to suppress
more interrupts.
Fixes: e37ff95440 ("vhost: support virtqueue interrupt/notification suppression")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The vhost-user spec says that once the vring is disabled, the
client has to stop processing it. But it can happen when
dequeue zero-copy is enabled if outstanding descriptors buffers
are still being processed by an external NIC or another guest.
The fix consists in draining the zmbufs list to ensure no more
descriptors buffers are in the wild.
Note that this fix is only working in the case REPLY_ACK
protocol feature is enabled, which is not the case by default
for now (it is only enabled when IOMMU feature is enabled in
the vhost library).
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The rte_vhost API to put data into virtqueues operates
on mbufs and hence it is strictly vhost-net specific.
External backends need to implement virtqueue handling
from scratch and that's just not possible without APIs
to get/set vring base addresses.
Those relevant APIs are there, but they have a check that
prevents them from working with any non-vhost-net device.
This patch removes those checks.
rte_vhost_get_log_base() is not necessarily needed for
external backends, as other, higher level vhost APIs for
live migration are available and could be used instead.
We remove the extra check from it anyway for consistency.
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reclaim outstanding zmbufs first before freeing memory regions,
otherwise there could be use-after-free.
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Don't free the zero copy mbufs before they have been consumed,
otherwise there could be use-after-free.
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The mbufs should also be restored in free_zmbufs().
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Fixes: 3ebd930588 ("vhost: fix mbuf free")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
sprintf function is not secure as it doesn't check the length of string.
More secure function snprintf is used.
Fixes: d7280c9fff ("vhost: support selective datapath")
Cc: stable@dpdk.org
Signed-off-by: Pallantla Poornima <pallantlax.poornima@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
In rte_vhost_driver_unregister(), the connection fd is
removed from the fdset using fdset_try_del(). Call to
this function may fail if the corresponding fd is in
busy state, indicating that event dispatcher is
executing the read or write callback on this fd.
When it happens, rte_vhost_driver_unregister() keeps
trying to remove the fd from the set until it is no
more busy.
This situation is causing a deadlock, because
rte_vhost_driver_unregister() keeps trying to remove
the fd from the set with vhost_user.mutex held, while
the callback executed by the dispatcher,
vhost_user_read_cb(), also takes this mutex at
numerous places.
The fix consists in releasing vhost_user.mutex between
each retry in vhost_driver_unregister().
Fixes: 8b4b949144 ("vhost: fix dead lock on closing in server mode")
Cc: stable@dpdk.org
Signed-off-by: Wenjie Sun <findtheonlyway@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
External message callbacks are used e.g. by vhost crypto
to parse crypto-specific vhost-user messages.
We are now publishing the API to register those callbacks,
so that other backends outside of DPDK can use them as well.
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We don't need to relay available ring and check the desc, vdpa device
can access the available ring in the guest directly. With this patch,
we can achieve better throughput and lower CPU usage.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fix a possible out of bound access which may happen when handling
indirect descs in split ring.
Fixes: 1be4ebb1c4 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org
Reported-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_host_notifier_ctrl is not existed anymore, its statement in
header file should be removed accordingly.
Fixes: 43f34e3566 ("vhost: provide helper for host notifier ctrl")
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
As qemu will only send VHOST_USER_SET_VRING_ENABLE message for guest
enabled vrings (only first queue pair will be enabled at initialized
stage), this will cause trouble for multiqueue case, vDPA's dev_conf
callback will get no chance be invoked. Decouple the dev_conf callback from
VHOST_USER_SET_VRING_ENABLE solves this issue.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes a out of bound access possbility in vhost
crypto. Originally the incorrect next descriptor index may
cause the library read invalid memory content and crash
the application.
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes a possible infinite loop caused by incorrect
descriptor chain created by the driver.
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fixes: 30920b1e2b ("vhost: ensure all range is mapped when translating QVAs")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fix a possible dead loop which may happen, e.g. when driver
created a loop in the desc list and lens in descs are zero.
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fixes: 7f74b95c44 ("vhost: pre update used ring for Tx and Rx")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fix a possible dead loop which may happen, e.g. when
driver created a loop in the desc list.
Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Descs in desc table should be indexed using the desc idx
instead of the idx of avail ring and used ring.
Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch uses the two session mempool approach to vhost crypto.
One mempool is for session header objects, and the other is for
session private data.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.
Also, move the write barrier in logging cache sync, so that it
is done only when logging is enabled. It means there is now
one more barrier for split ring when logging is enabled.
With Kernel's pktgen benchmark, ~3% performance gain is measured.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
This prefetch does not show any performance improvement.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
This patch moves the prefetch after the available index
is read to avoid prefetching a descriptor not available yet.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
A read barrier is required to ensure that the ordering between
descriptor's flags and content reads is enforced.
1. read flags = desc->flags
if (flags & AVAIL_BIT)
2. read desc->id
There is a control dependency between steps 1 and step 2.
2 could be speculatively executed before 1, which could result
in 'id' to not be updated yet.
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
A read barrier is required to ensure the ordering between
available index and the descriptor reads is enforced.
1. read avail_head = avail->idx
2. read cur_idx = last_avail_idx
if (cur_idx != avail_head) {
3. read idx = avail->ring[cur_idx]
4. read desc[idx]
}
There is a control dependency between step 1 and steps 3 & 4,
3 could be speculatively executed before 1, which could result
in 'idx' to not being updated yet.
Fixes: 4796ad63ba ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.
The available ring relay will synchronize the available entries, and
help to do desc validity checking.
The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.
The later patch will leverage these two helpers.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
VDPA driver can decide if it needs to enable/disable the host notifier
mapping, so exposing a API can allow flexibility. A later patch will
base on this.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_detach_vdpa_device() is internally defined but not used, remove
it in this patch.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
fdset_add can call fdset_shrink_nolock which call fdset_move
concurrently to poll that is call in fdset_event_dispatch.
This patch add a mutex to protect poll from been call at the same time
fdset_add call fdset_shrink_nolock.
Fixes: 1b815b8959 ("vhost: try to shrink pfdset when fdset_add fails")
Cc: stable@dpdk.org
Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Flags could be updated in a separate process leading to the
inconsistent check.
Additionally, read marked as 'volatile' to highlight the shared
nature of the variable and avoid such issues in the future.
Fixes: d3211c98c4 ("vhost: add helpers for packed virtqueues")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If mmap() call fails in vhost_user_set_mem_table, dev->mem
is set to NULL. If later, qva_to_vva() is called, a segfault
occurs.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
The packed ring defines were declared only if kernel
header does not declare them.
The problem is that they are not applied in upstream kernel,
and some changes in the names have been required.
This patch declares the defines unconditionally, which
fixes potential build issues.
Fixes: 297b1e7350 ("vhost: add virtio packed virtqueue defines")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The caller will guarantee that msg won't be null. Remove
the unneeded null pointer check which caused a Coverity
warning.
Coverity issue: 323484
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the incorrect packet content copy in the
chaining mode. Originally the content before cipher offset is
overwritten by all zeros. This patch fixes the problem by
making sure the correct write back source and destination
settings during set up.
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We should apply for RO access when receiving packets from the
VM and apply for RW access when sending packets to the VM.
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
For packed ring layout, we need save avail index and its wrap
counter value. At restore time, the used index and its wrap counter
are set to available's ones, as the ring procressing is stopped
at vring base get time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Currently, postcopy_ufd is initialized to 0 implicitly, so fd 0
could be closed unexpectedly by vhost_backend_cleanup(). Fix this
issue by initializing postcopy_ufd to -1 explicitly.
Fixes: 9eefef3b59 ("vhost: introduce postcopy advise message")
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In both split and packed dequeue paths, flush_shadow_used_ring
and vhost_ring_call variants gets called even if not packets
have been dequeued, and so no descriptors updates happened.
It has an impact on CPU pipeline, as memory barriers are used
in these functions.
This patch don't call these functions if no descriptors have
been dequeued. The performance gain with split ring when
dequeue zero-copy is disabled should be null, but should be
noticeable with packed ring or dequeue zero-copy enabled.
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Fixes: 915cf94042 ("vhost: use shadow used ring in dequeue path")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
We should return the length of the buffers described by
the current descriptor chain after filling the buffer
vector. So we need to zero the *len first.
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently it's not possible to build DPDK as shared library with
cryptodev disabled since vhost is trying to link with rte_crypto,
but rte_crypto and rte_hash are only needed when you build vhost_crypto
and so only when cryptodev is enabled.
This patch fix this by linking rte_vhost with rte_crypto and rte_hash
only when cryptodev is enabled.
Fixes: b4ca812986 ("vhost/crypto: fix build without cryptodev")
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Postcopy live-migration feature requires the application to
not populate the guest memory. As the vhost library cannot
prevent the application to that (e.g. preventing the
application to call mlockall()), the feature is disabled by
default.
The application should only enable the feature if it does not
force the guest memory to be populated.
In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled,
registration fails.
For the same reason, postcopy and dequeue zero copy features
are not compatible, so don't advertize postcopy support if
dequeue zero copy is requested.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.
The master waits for the slave to acknowledge the request
with an empty 64bits payload for synchronization purpose.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.
This is not very important, but next, we'll need to update the
payload and send it back to Qemu.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Postcopy live-migration features relies on userfaultfd,
which was only introduced in kernel v4.3.
This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.
With legacy build system, user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.
With Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Passing userfault fds to Qemu will be required for postcopy
live-migration feature.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy will be supported.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As soon as some ancillary data (fds) are received, it is copied
without checking its length.
This patch adds the number of fds received to the message,
which is set in read_vhost_message().
This is preliminary work to support sending fds to Qemu.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When the memory table gets updated, the rings addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.
Fixes: d5022533c2 ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.
Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.
The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.
Fixes: 0bff510b5e ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Return of message handling has now changed to an enum that can
take non-negative value that is not zero in case a reply is
needed. But the code checking the variable afterwards has not
been updated, leading to success messages handling being
treated as errors.
External post and pre callbacks return type needs also to be
changed to the new enum, so that its handling is consistent.
This is done in this patch alongside with the convertion of
its only user, vhost-crypto backend.
Fixes: 0bff510b5e ("vhost: unify message handling function signature")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The notification can't be disabled in packed ring when
application tries to disable notification, because the
device event flags field is overwritten by an unexpected
value. This patch fixes this issue.
Fixes: b1cce26af1 ("vhost: add notification for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
It's used to get number of available registered vDPA devices.
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the returned length by fill_vec_buf_split/_packed()
overflows. This patch is to avoid this corner case.
Fixes: f689586bc0 ("vhost: shadow used ring update")
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a27 ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.
Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As VhostUserMsg structure is reused to generate the reply, move the
relevant fields update into the respective message handling functions.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.
Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
There are a lot of cases where vhost-user massage handling
could fail and end up in a fully not recoverable state. For
example, allocation failures of shadow used ring and batched
copy array are not recoverable and leads to the segmentation
faults like this on the receiving/transmission path:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f913fecf0 (LWP 43625)]
in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
760 batch_copy[vq->batch_copy_nb_elems].dst =
This could be easily reproduced in case of low memory or big
number of vhost-user ports.
Fix that by propagating error to the upper layer which will
end up with disconnection in case we can not report to
the message sender when the error happens.
Fixes: f689586bc0 ("vhost: shadow used ring update")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When VIRTIO_RING_F_EVENT_IDX is negotiated, we need to
update the avail event to enable the notification.
Fixes: 3f8ff12821 ("vhost: support interrupt mode")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to memory leak, because array
is freed only for zero copy case.
Fixes: 2651726def ("vhost: do deep copy while reallocating queue")
CC: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
It's a common case that 'get_mempolicy' fails on systems
without NUMA support. No need to flag an error in log for
this situation.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
IOTLB entries contain the host virtual address of the guest
pages. When receiving a new VHOST_USER_SET_MEM_TABLE request,
the previous regions get unmapped, so the IOTLB entries, if any,
will be invalid. It does cause the vhost-user process to
segfault.
This patch introduces a new function to flush the IOTLB cache,
and call it as soon as the backend handles a VHOST_USER_SET_MEM
request.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
The shadow used ring's size is the same as the vq's size,
so we shouldn't try more than "vq size" times. Besides,
the element pointed by avail->idx isn't available to the
device, so we will return error when try "vq size" times.
Fixes: 24e4844048 ("vhost: unify Rx mergeable and non-mergeable paths")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
This patch fixes the incorrect return value for rte_vhost_dequeue_burst()
when virtqueue is not enabled or virtqueue address translation fails.
Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Add code to set up packed queues when enabled.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Add some helper functions to check descriptor flags
and check if a vring is of type packed.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
This is an optimization to prefetch next buffer while the
current one is being processed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
This is an optimization to prefetch next buffer while the
current one is being processed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
To ease packed ring layout integration, this patch makes
the dequeue path to re-use buffer vectors implemented for
enqueue path.
Doing this, copy_desc_to_mbuf() is now ring layout type
agnostic.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Relax used ring contention by reusing the shadow used
ring feature used by enqueue path.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
If devices always use descriptors in the same order in which they have
been made available. These devices can offer the VIRTIO_F_IN_ORDER
feature. If negotiated, this knowledge allows devices to notify the use
of a batch of buffers to virtio driver by only writing used ring index.
Vhost user device has supported this feature by default. If vhost
dequeue zero is enabled, should disable VIRTIO_F_IN_ORDER as vhost can’t
assure that descriptors returned from NIC are in order.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The log_cache_nb_elem was never incremented, resulting
in all dirty pages to be missed during live migration.
Fixes: c16915b871 ("vhost: improve dirty pages logging performance")
Cc: stable@dpdk.org
Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
vhost_vring_call() used rte_mb(), which translates into
mfence instruction on x86.
This patch changes to use rte_smp_mb(), which changed recently
to translate into a locked ADD instruction for performance
reason.
The measured gain is up to 3% with the testpmd benchmarks.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Introduce an new common helper to avoid redundancy.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When a vDPA device is attached, vhost user will try to
register host notifiers to QEMU to allow notifications
to be delivered between the driver in the guest and the
vDPA device in the host directly.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Make sure find avalid device id before allocating
virtio_net, if not, return directly. It may avoid
allocating and freeing virtio_net when there is
not valid device id.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Instead of copying batch_copy_nb_elems into the stack,
this patch uses it directly.
Small performance gain of 3% is seen when running PVP
benchmark.
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch reworks the vhost enqueue path so that a single
code path is used for both Rx mergeable or non-mergeable cases.
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch caches all dirty pages logging until the used ring index
is updated.
The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. While the fix was valid
as previous implementation wasn't safe against concurrent accesses,
contention was induced.
With this patch, during migration, we have:
1. Less atomic operations as only a single atomic OR operation
per 32 or 64 (depending on CPU) pages.
2. Less atomic operations as during a burst, the same page will
be marked dirty only once.
3. Less write memory barriers.
Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
This patch enables the handling of buffers non-contiguous in
virtual address space in the vhost_crypto. Instead of using
rte_vhost_va_from_guest_pa(), the host virtual address is
converted by vhost_iova_to_vva() for wider use cases.
For copy mode, the copy length is limited to the chunk size,
next chunks VAs being fetched afterward.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>