Commit Graph

654 Commits

Author SHA1 Message Date
Tiwei Bie
72d002b3eb vhost: fix vring address handling during live migration
When live migration starts, QEMU will set ring addrs again for
each virtqueue. In this case, we should try to translate ring
addrs after we invalidating the ring, otherwise virtqueues can
be enabled with the addrs untranslated. Besides, also leverage
the access_ok flag in non-IOMMU case to prevent the data path
accessing invalidated virtqueues.

Fixes: 5a4933e56b ("vhost: postpone ring address translations at kick time only")
Cc: stable@dpdk.org

Reported-by: Yilong Lv <lvyilong.lyl@alibaba-inc.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Tiwei Bie
37f7c1b609 vhost: forbid reallocation when running
When the device has been started, don't do the reallocation anymore.
Otherwise the pointers used in application threads can be invalidated
without proper protection. Instead of introducing a global lock to
protect the change of device pointers which will hurt the performance,
let's just do the reallocation during setup.

Fixes: af295ad469 ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org

Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Jim Harris
61af1713d3 vhost: add missing experimental flag
This function is listed under EXPERIMENTAL in the
rte_vhost_version.map, so it needs to be marked
with __rte_experimental in the header file as well.

Found by check-experimental-syms.sh when trying to compile
DPDK with -finstrument-functions.  This script didn't
catch this in the normal case, since the function is
declared __rte_always_inline.

This also requires updating the vhost_scsi example to allow
use of this newly marked experimental API.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:57 +02:00
Tiwei Bie
761d57651c vhost: fix slave request fd leak
We need to close the old slave request fd if any first
before taking the new one.

Fixes: 275c3f9447 ("vhost: support slave requests channel")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:53 +02:00
Eelco Chaudron
039253166a vhost: add device op when notification to guest is sent
This patch adds an operation callback which gets called every time
the library is waking up the guest trough an eventfd_write() call.

This can be used by 3rd party application, like OVS, to track the
number of times interrupts where generated. This might be of
interest to find out system-call were called in the fast path.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-07 15:00:53 +02:00
Maxime Coquelin
bff3b9a80e vhost: replace IOTLB license with SPDX tag
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2019-08-05 16:06:11 +02:00
Maxime Coquelin
906accb60b vhost: log virtio and vhost-user negotiated features
Having this info logged by default when analysing bug reports
has proved to be useful.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-07-08 21:26:52 +02:00
Bruce Richardson
759a5fb18e lib: add reasons for components being disabled
For each library where we optionally disable it, add in the reason why it's
being disabled, so the user knows how to fix it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-07-02 23:21:05 +02:00
David Marchand
18218713bf enforce experimental tag at beginning of declarations
Putting a '__attribute__((deprecated))' in the middle of a function
prototype does not result in the expected result with gcc (while clang
is fine with this syntax).

$ cat deprecated.c
void * __attribute__((deprecated)) incorrect() { return 0; }
__attribute__((deprecated)) void *correct(void) { return 0; }
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
$ gcc -o deprecated.o -c deprecated.c
deprecated.c: In function ‘main’:
deprecated.c:3:1: warning: ‘correct’ is deprecated (declared at
deprecated.c:2) [-Wdeprecated-declarations]
 int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
 ^

Move the tag on a separate line and make it the first thing of function
prototypes.
This is not perfect but we will trust reviewers to catch the other not
so easy to detect patterns.

sed -i \
     -e '/^\([^#].*\)\?__rte_experimental */{' \
     -e 's//\1/; s/ *$//; i\' \
     -e __rte_experimental \
     -e '/^$/d}' \
     $(git grep -l __rte_experimental -- '*.h')

Special mention for rte_mbuf_data_addr_default():

There is either a bug or a (not yet understood) issue with gcc.
gcc won't drop this inline when unused and rte_mbuf_data_addr_default()
calls rte_mbuf_buf_addr() which itself is experimental.
This results in a build warning when not accepting experimental apis
from sources just including rte_mbuf.h.

For this specific case, we hide the call to rte_mbuf_buf_addr() under
the ALLOW_EXPERIMENTAL_API flag.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-06-29 19:04:48 +02:00
David Marchand
cfe3aeb170 remove experimental tags from all symbol definitions
We had some inconsistencies between functions prototypes and actual
definitions.
Let's avoid this by only adding the experimental tag to the prototypes.
Tests with gcc and clang show it is enough.

git grep -l __rte_experimental |grep \.c$ |while read file; do
	sed -i -e '/^__rte_experimental$/d' $file;
	sed -i -e 's/  *__rte_experimental//' $file;
	sed -i -e 's/__rte_experimental  *//' $file;
done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-06-29 19:04:43 +02:00
Fan Zhang
4349d412af vhost/crypto: fix inferred misuse of enum
This patch fixes the inferred misuse of enum of crypto algorithms.

Coverity issue: 325879
Fixes: e80a987081 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-06-20 23:42:04 +02:00
Fan Zhang
1a92c9632b vhost/crypto: fix logically dead code
This patch fixes a few same class bugs that causes the
logically dead code in vhost_crypto.

Coverity issue: 277236, 277233, 277220, 277214
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-06-20 23:42:04 +02:00
Noa Ezra
346c2c954e vhost: fix missing include
Add a missing include with the defines for vhost-user driver features.

Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Cc: stable@dpdk.org

Signed-off-by: Noa Ezra <noae@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
2019-06-20 23:42:04 +02:00
Maxime Coquelin
d1134c09e3 vhost: simplify descriptor buffer prefetching
Now that we have a single function to map the descriptors
buffers, let's prefetch them there as it is the earliest
place we can do it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-06-13 23:54:29 +09:00
Maxime Coquelin
084fac96ca vhost: do not inline unlikely fragmented buffers code
Handling of fragmented virtio-net header and indirect descriptors
tables was implemented to fix CVE-2018-1059. It should never
happen with healthy guests and so is already considered as
unlikely code path.

This patch moves these bits into non-inline dedicated functions
to reduce the I-cache pressure.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-06-13 23:54:29 +09:00
Maxime Coquelin
5a5f6e78b2 vhost: do not inline packed and split functions
At runtime either packed Tx/Rx functions will always be called,
or split Tx/Rx functions will always be called.

This patch removes the forced inlining in order to reduce
the I-cache pressure.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-06-13 23:54:29 +09:00
Maxime Coquelin
094b643d9b vhost: un-inline dirty pages logging functions
In order to reduce the I-cache pressure, this patch removes
the inlining of the dirty pages logging functions, that we
can consider as cold path.

Indeed, these functions are only called while doing live
migration, so not called most of the time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-06-13 23:54:29 +09:00
Bruce Richardson
f5256f8dfd build: remove unnecessary large file support defines
Since we now always use _FILE_OFFSET_BITS=64 flag when building
DPDK, we can remove the Makefile and C-file #defines setting it
individually for parts of the build.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-06-03 23:55:50 +02:00
David Marchand
0c9da7555d net: replace IPv4/v6 constants with uppercase name
Since we change these macros, we might as well avoid triggering complaints
from checkpatch because of mixed case.

old=RTE_IPv4
new=RTE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"

old=RTE_ETHER_TYPE_IPv4
new=RTE_ETHER_TYPE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"

old=RTE_ETHER_TYPE_IPv6
new=RTE_ETHER_TYPE_IPV6
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2019-06-03 16:54:54 +02:00
Olivier Matz
e73e3547ce net: add rte prefix to UDP structure
Add 'rte_' prefix to structures:
- rename struct udp_hdr as struct rte_udp_hdr.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:46 +02:00
Olivier Matz
f41b5156fe net: add rte prefix to TCP structure
Add 'rte_' prefix to structures:
- rename struct tcp_hdr as struct rte_tcp_hdr.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:46 +02:00
Olivier Matz
09d9ae1ac9 net: add rte prefix to SCTP structure
Add 'rte_' prefix to structures:
- rename struct sctp_hdr as struct rte_sctp_hdr.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:46 +02:00
Olivier Matz
a7c528e5d7 net: add rte prefix to IP structure
Add 'rte_' prefix to structures:
- rename struct ipv4_hdr as struct rte_ipv4_hdr.
- rename struct ipv6_hdr as struct rte_ipv6_hdr.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:46 +02:00
Olivier Matz
35b2d13fd6 net: add rte prefix to ether defines
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.

Do not update the command line library to avoid adding a dependency to
librte_net.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:45 +02:00
Olivier Matz
6d13ea8e8e net: add rte prefix to ether structures
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.

Do not update the command line library to avoid adding a dependency to
librte_net.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:45 +02:00
John McNamara
8bd5f07c7a doc: fix spelling reported by aspell in comments
Fix spelling errors in the doxygen docs.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
2019-05-03 00:38:14 +02:00
Stephen Hemminger
2aa0db405f drivers: remove blank line at EOF
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-22 13:24:59 +02:00
Ilya Maximets
966027b4b3 vhost: fix silent queue enabling with legacy guests
vhost should notify the application in case of all vring state changes.

In general, application should not care about negotiation of
VHOST_USER_F_PROTOCOL_FEATURES. Protocol details like this should
be hidden by the vhost library.

With this patch applications like OVS will be able to assume that
all vrings disabled by default and only process 'vring_state_changed'
events.

Fixes: 321203a54b ("vhost: enable rings at the right time")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-04-19 14:51:54 +02:00
Ilya Maximets
2344395df2 vhost: fix passing destroyed device to destroy callback
Application should be able to obtain information like 'ifname' from
the 'vid' passed to 'destroy_connection' callback. Currently, all the
API calls with passed 'vid' fails with 'device not found'.

Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-04-19 14:51:54 +02:00
Mohammad Abdul Awal
b045785f92 vhost: fix null pointer checking
Null value for parameters will cause segfault.

Fixes: d7280c9fff ("vhost: support selective datapath")
Fixes: 72e8543093 ("vhost: add API to get MTU value")
Fixes: a277c71598 ("vhost: refactor code structure")
Fixes: ca33faf9ef ("vhost: introduce API to fetch negotiated features")
Fixes: eb32247457 ("vhost: export guest memory regions")
Fixes: 40ef286f23 ("vhost: export vhost vring info")
Fixes: bd2e0c3fe5 ("vhost: add APIs for live migration")
Fixes: 0b8572a0c1 ("vhost: add external message handling to the API")
Fixes: b4953225ce ("vhost: add APIs for datapath configuration")
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Fixes: 292959c719 ("vhost: cleanup unix socket")
Cc: stable@dpdk.org

Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-04-19 14:51:54 +02:00
Ilya Maximets
a21510e750 vhost: fix device leak on connection add failure
Need to destroy allocated device if application fails to
add new connection or we have fdset failure.

Fixes: acbff5c67e ("vhost: fix crash when exceeding file descriptors")
Fixes: efba12a78d ("vhost: add user callbacks for socket open/close")
Cc: stable@dpdk.org

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-04-19 14:51:54 +02:00
Bruce Richardson
adf93ca564 build: increase readability via shortcut variables
Define variables for "is_linux", "is_freebsd" and "is_windows"
to make the code shorter for comparisons and more readable.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-04-17 18:09:52 +02:00
Fan Zhang
bc5560c15e vhost/crypto: fix parens
Coverity issue: 277214, 277220, 277233, 277236
Fixes: cd1e8f03ab ("vhost/crypto: fix packet copy in chaining mode")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-03-29 17:25:32 +01:00
Maxime Coquelin
3e0396166b vhost: support requests only handled by external backend
External backends may have specific requests to handle, and so
we don't want the vhost-user lib to handle these requests as
errors.

This patch also changes the experimental API by introducing
RTE_VHOST_MSG_RESULT_NOT_HANDLED so that vhost-user lib
can report an error if a message is handled neither by
the vhost-user library nor by the external backend.

The logic changes a bit so that if the callback returns
with ERR, OK or REPLY, it is considered the message
is handled by the external backend so it won't be
handled by the vhost-user library.
It is still possible for an external backend to listen
to requests that have to be handled by the vhost-user
library like SET_MEM_TABLE, but the callback have to
return NOT_HANDLED in that case.

Vhost-crypto backend is also adapted to this API change.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00
Maxime Coquelin
9401b80327 vhost: add API to set protocol features flags
rte_vhost_driver_set_protocol_features API is to be used
by external backends to advertise vhost-user protocol
features it supports.

It has to be called after rte_vhost_driver_register() and
before rte_vhost_driver_start().

Example of usage to advertize VHOST_USER_PROTOCOL_F_FOOBAR
protocol feature:

const char *path = "/tmp/vhost-user";
uint64_t protocol_features;
rte_vhost_driver_register(path, 0);
rte_vhost_driver_get_protocol_features(path, &protocol_features);
protocol_features |= VHOST_USER_PROTOCOL_F_FOOBAR;
rte_vhost_driver_set_protocol_features(path, protocol_features);
rte_vhost_driver_start(path);

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00
Jiayu Hu
2f706027c8 vhost: fix interrupt suppression for the split ring
The VIRTIO_RING_F_EVENT_IDX feature of split ring might
be broken, as the value of signalled_used is invalid
after live migration, start up and virtio driver reload.
This patch fixes it by using signalled_used_valid.

In addition, this patch makes the VIRTIO_RING_F_EVENT_IDX
implementation of split ring match kernel backend to suppress
more interrupts.

Fixes: e37ff95440 ("vhost: support virtqueue interrupt/notification suppression")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00
Maxime Coquelin
11d5253a3e vhost: prevent disabled rings to be processed with zero-copy
The vhost-user spec says that once the vring is disabled, the
client has to stop processing it. But it can happen when
dequeue zero-copy is enabled if outstanding descriptors buffers
are still being processed by an external NIC or another guest.

The fix consists in draining the zmbufs list to ensure no more
descriptors buffers are in the wild.

Note that this fix is only working in the case REPLY_ACK
protocol feature is enabled, which is not the case by default
for now (it is only enabled when IOMMU feature is enabled in
the vhost library).

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-20 18:15:42 +01:00
Darek Stojaczyk
c19429844c vhost: remove vhost-net requirements from generic APIs
The rte_vhost API to put data into virtqueues operates
on mbufs and hence it is strictly vhost-net specific.
External backends need to implement virtqueue handling
from scratch and that's just not possible without APIs
to get/set vring base addresses.

Those relevant APIs are there, but they have a check that
prevents them from working with any non-vhost-net device.
This patch removes those checks.

rte_vhost_get_log_base() is not necessarily needed for
external backends, as other, higher level vhost APIs for
live migration are available and could be used instead.
We remove the extra check from it anyway for consistency.

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-03-01 18:17:36 +01:00
Tiwei Bie
2a2904fa9c vhost: fix potential use-after-free for memory region
Reclaim outstanding zmbufs first before freeing memory regions,
otherwise there could be use-after-free.

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-03-01 18:17:36 +01:00
Tiwei Bie
d767436ee5 vhost: fix potential use-after-free for zero copy mbuf
Don't free the zero copy mbufs before they have been consumed,
otherwise there could be use-after-free.

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-03-01 18:17:36 +01:00
Tiwei Bie
041d37b2ef vhost: restore mbuf first when freeing zmbuf
The mbufs should also be restored in free_zmbufs().

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Fixes: 3ebd930588 ("vhost: fix mbuf free")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-03-01 18:17:35 +01:00
Pallantla Poornima
7c7b756225 vhost: fix sprintf with snprintf
sprintf function is not secure as it doesn't check the length of string.
More secure function snprintf is used.

Fixes: d7280c9fff ("vhost: support selective datapath")
Cc: stable@dpdk.org

Signed-off-by: Pallantla Poornima <pallantlax.poornima@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2019-02-22 14:39:49 +01:00
Wenjie Sun
054617fd82 vhost: fix deadlock in driver unregister
In rte_vhost_driver_unregister(), the connection fd is
removed from the fdset using fdset_try_del(). Call to
this function may fail if the corresponding fd is in
busy state, indicating that event dispatcher is
executing the read or write callback on this fd.
When it happens, rte_vhost_driver_unregister() keeps
trying to remove the fd from the set until it is no
more busy.

This situation is causing a deadlock, because
rte_vhost_driver_unregister() keeps trying to remove
the fd from the set with vhost_user.mutex held, while
the callback executed by the dispatcher,
vhost_user_read_cb(), also takes this mutex at
numerous places.

The fix consists in releasing vhost_user.mutex between
each retry in vhost_driver_unregister().

Fixes: 8b4b949144 ("vhost: fix dead lock on closing in server mode")
Cc: stable@dpdk.org

Signed-off-by: Wenjie Sun <findtheonlyway@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-02-22 14:39:49 +01:00
Darek Stojaczyk
0b8572a0c1 vhost: add external message handling to the API
External message callbacks are used e.g. by vhost crypto
to parse crypto-specific vhost-user messages.

We are now publishing the API to register those callbacks,
so that other backends outside of DPDK can use them as well.

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-02-08 19:27:07 +01:00
Xiao Wang
b172129583 vhost: remove vDPA available ring relay helper
We don't need to relay available ring and check the desc, vdpa device
can access the available ring in the guest directly. With this patch,
we can achieve better throughput and lower CPU usage.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-02-08 19:27:07 +01:00
Tiwei Bie
4800639000 vhost: fix access for indirect descriptors
Fix a possible out of bound access which may happen when handling
indirect descs in split ring.

Fixes: 1be4ebb1c4 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org

Reported-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-24 10:08:31 +01:00
Tiwei Bie
e1c0834f95 vhost: fix memory leak on realloc failure
When realloc() fails, the original block isn't freed.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-18 09:47:26 +01:00
Xiaolong Ye
9303de0c94 vhost: remove unused function prototype
vhost_user_host_notifier_ctrl is not existed anymore, its statement in
header file should be removed accordingly.

Fixes: 43f34e3566 ("vhost: provide helper for host notifier ctrl")

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2019-01-18 09:47:26 +01:00
Xiaolong Ye
9f90145128 vhost: configure vDPA device after set vring call message
As qemu will only send VHOST_USER_SET_VRING_ENABLE message for guest
enabled vrings (only first queue pair will be enabled at initialized
stage), this will cause trouble for multiqueue case, vDPA's dev_conf
callback will get no chance be invoked. Decouple the dev_conf callback from
VHOST_USER_SET_VRING_ENABLE solves this issue.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-18 09:47:26 +01:00
Fan Zhang
16d2e718b8 vhost/crypto: fix possible out of bound access
This patch fixes a out of bound access possbility in vhost
crypto. Originally the incorrect next descriptor index may
cause the library read invalid memory content and crash
the application.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Fan Zhang
c7e7244b82 vhost/crypto: fix possible dead loop
This patch fixes a possible infinite loop caused by incorrect
descriptor chain created by the driver.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
61ec8f58b0 vhost: ensure event idx is mapped when negotiated
Fixes: 30920b1e2b ("vhost: ensure all range is mapped when translating QVAs")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
450539b47e vhost: fix possible dead loop in vector filling
Fix a possible dead loop which may happen, e.g. when driver
created a loop in the desc list and lens in descs are zero.

Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
06fc8545fd vhost: fix possible out of bound access in vector filling
Fixes: 7f74b95c44 ("vhost: pre update used ring for Tx and Rx")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
4e282bc6c5 vhost: fix possible dead loop in relay helpers
Fix a possible dead loop which may happen, e.g. when
driver created a loop in the desc list.

Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
85936f0546 vhost: fix possible out of bound access in relay helpers
Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Tiwei Bie
e218fa09f4 vhost: fix desc access in relay helpers
Descs in desc table should be indexed using the desc idx
instead of the idx of avail ring and used ring.

Fixes: b13ad2decc ("vhost: provide helpers for virtio ring relay")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-01-14 17:44:29 +01:00
Fan Zhang
ac5e42daca vhost/crypto: use separate session mempools
This patch uses the two session mempool approach to vhost crypto.
One mempool is for session header objects, and the other is for
session private data.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2019-01-10 16:57:22 +01:00
Maxime Coquelin
b473ec1131 vhost: batch used descs chains write-back with packed ring
Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.

Also, move the write barrier in logging cache sync, so that it
is done only when logging is enabled. It means there is now
one more barrier for split ring when logging is enabled.

With Kernel's pktgen benchmark, ~3% performance gain is measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
815814c4ff vhost: remove useless prefetch for packed ring descriptor
This prefetch does not show any performance improvement.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
aaf8979d6f vhost: prefetch descriptor after the read barrier
This patch moves the prefetch after the available index
is read to avoid prefetching a descriptor not available yet.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
33e12d63d1 vhost: enforce desc flags and content read ordering
A read barrier is required to ensure that the ordering between
descriptor's flags and content reads is enforced.

1. read flags = desc->flags
if (flags & AVAIL_BIT)
2.   read desc->id

There is a control dependency between steps 1 and step 2.
2 could be speculatively executed before 1, which could result
in 'id' to not be updated yet.

Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Maxime Coquelin
d4ff2135eb vhost: enforce avail index and desc read ordering
A read barrier is required to ensure the ordering between
available index and the descriptor reads is enforced.

1. read avail_head = avail->idx
2. read cur_idx = last_avail_idx
if (cur_idx != avail_head) {
    3. read idx = avail->ring[cur_idx]
    4. read desc[idx]
}

There is a control dependency between step 1 and steps 3 & 4,
3 could be speculatively executed before 1, which could result
in 'idx' to not being updated yet.

Fixes: 4796ad63ba ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-12-21 16:22:41 +01:00
Xiao Wang
b13ad2decc vhost: provide helpers for virtio ring relay
This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.

The available ring relay will synchronize the available entries, and
help to do desc validity checking.

The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.

The later patch will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
43f34e3566 vhost: provide helper for host notifier ctrl
VDPA driver can decide if it needs to enable/disable the host notifier
mapping, so exposing a API can allow flexibility. A later patch will
base on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Xiao Wang
02e3b285d4 vhost: remove unused function
vhost_detach_vdpa_device() is internally defined but not used, remove
it in this patch.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Matthias Gatto
276d63505b vhost: fix race condition when adding fd in the fdset
fdset_add can call fdset_shrink_nolock which call fdset_move
concurrently to poll that is call in fdset_event_dispatch.

This patch add a mutex to protect poll from been call at the same time
fdset_add call fdset_shrink_nolock.

Fixes: 1b815b8959 ("vhost: try to shrink pfdset when fdset_add fails")
Cc: stable@dpdk.org

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-21 16:22:40 +01:00
Ilya Maximets
48cae0bfa6 vhost: fix double read of descriptor flags
Flags could be updated in a separate process leading to the
inconsistent check.

Additionally, read marked as 'volatile' to highlight the shared
nature of the variable and avoid such issues in the future.

Fixes: d3211c98c4 ("vhost: add helpers for packed virtqueues")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-13 18:17:42 +00:00
Maxime Coquelin
cf14478d77 vhost: fix crash after mmap failure
If mmap() call fails in vhost_user_set_mem_table, dev->mem
is set to NULL. If later, qva_to_vva() is called, a segfault
occurs.

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-12-13 17:56:21 +00:00
Maxime Coquelin
5a12b67e74 vhost: fix packed ring constants declaration
The packed ring defines were declared only if kernel
header does not declare them.
The problem is that they are not applied in upstream kernel,
and some changes in the names have been required.

This patch declares the defines unconditionally, which
fixes potential build issues.

Fixes: 297b1e7350 ("vhost: add virtio packed virtqueue defines")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-22 23:06:26 +01:00
Tiwei Bie
0541588a44 vhost: remove unneeded null pointer check
The caller will guarantee that msg won't be null. Remove
the unneeded null pointer check which caused a Coverity
warning.

Coverity issue: 323484
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Fan Zhang
cd1e8f03ab vhost/crypto: fix packet copy in chaining mode
This patch fixes the incorrect packet content copy in the
chaining mode. Originally the content before cipher offset is
overwritten by all zeros. This patch fixes the problem by
making sure the correct write back source and destination
settings during set up.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Tiwei Bie
30affaeebc vhost: fix IOVA access for packed ring
We should apply for RO access when receiving packets from the
VM and apply for RW access when sending packets to the VM.

Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-14 00:35:53 +01:00
Ferruh Yigit
7b178300ac vhost: fix possible out of bound access
Fixes: d7280c9fff ("vhost: support selective datapath")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-06 01:14:23 +01:00
Fan Zhang
d09328567e vhost/crypto: fix inferred misuse of enum
Fix inffered misuse of enum rte_crypto_cipher_algorithm and
rte_crypto_auth_algorithm

Coverity issue: 277202
Fixes: e80a987081 ("vhost/crypto: add session message handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 15:01:25 +01:00
Maxime Coquelin
708e14d8b9 vhost: advertize packed ring layout support
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-11-05 15:01:25 +01:00
Maxime Coquelin
2ce8b8973d vhost: add packed ring support to vring base requests
For packed ring layout, we need save avail index and its wrap
counter value. At restore time, the used index and its wrap counter
are set to available's ones, as the ring procressing is stopped
at vring base get time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-11-05 15:01:25 +01:00
Tiwei Bie
0dcdf32e64 vhost: initialize postcopy ufd properly
Currently, postcopy_ufd is initialized to 0 implicitly, so fd 0
could be closed unexpectedly by vhost_backend_cleanup(). Fix this
issue by initializing postcopy_ufd to -1 explicitly.

Fixes: 9eefef3b59 ("vhost: introduce postcopy advise message")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-26 22:14:06 +02:00
Maxime Coquelin
e988a6d845 vhost: avoid memory barriers when no descriptors dequeued
In both split and packed dequeue paths, flush_shadow_used_ring
and vhost_ring_call variants gets called even if not packets
have been dequeued, and so no descriptors updates happened.

It has an impact on CPU pipeline, as memory barriers are used
in these functions.

This patch don't call these functions if no descriptors have
been dequeued. The performance gain with split ring when
dequeue zero-copy is disabled should be null, but should be
noticeable with packed ring or dequeue zero-copy enabled.

Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Fixes: 915cf94042 ("vhost: use shadow used ring in dequeue path")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-26 22:14:06 +02:00
Tiwei Bie
16b9e38e74 vhost: fix vector filling for packed ring
We should return the length of the buffers described by
the current descriptor chain after filling the buffer
vector. So we need to zero the *len first.

Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-26 22:14:05 +02:00
Timothy Redaelli
1726e9994c vhost/crypto: fix shared lib build without cryptodev
Currently it's not possible to build DPDK as shared library with
cryptodev disabled since vhost is trying to link with rte_crypto,
but rte_crypto and rte_hash are only needed when you build vhost_crypto
and so only when cryptodev is enabled.

This patch fix this by linking rte_vhost with rte_crypto and rte_hash
only when cryptodev is enabled.

Fixes: b4ca812986 ("vhost/crypto: fix build without cryptodev")
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-26 22:14:05 +02:00
Maxime Coquelin
aa18d98bca vhost: enable postcopy protocol feature
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
cd85039e7e vhost: restrict postcopy live-migration enablement
Postcopy live-migration feature requires the application to
not populate the guest memory. As the vhost library cannot
prevent the application to that (e.g. preventing the
application to call mlockall()), the feature is disabled by
default.

The application should only enable the feature if it does not
force the guest memory to be populated.

In case the user passes the RTE_VHOST_USER_POSTCOPY_SUPPORT
flag at registration but the feature was not compiled,
registration fails.

For the same reason, postcopy and dequeue zero copy features
are not compatible, so don't advertize postcopy support if
dequeue zero copy is requested.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9a0a3a25fa vhost: support postcopy end request
The master sends this message before stopping handling
userfaults, so that the backend closes the userfaultfd.

The master waits for the slave to acknowledge the request
with an empty 64bits payload for synchronization purpose.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
5c2ee9662d vhost: send userfault range addresses back to Qemu
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
f40d8ea63a vhost: avoid useless VhostUserMemory copy
The VHOST_USER_SET_MEM_TABLE payload is copied when handled,
whereas it could directly be referenced.

This is not very important, but next, we'll need to update the
payload and send it back to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9b62e2da18 vhost: register new regions with userfaultfd
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
25807d6893 vhost: support postcopy listen message
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9eefef3b59 vhost: introduce postcopy advise message
This patch opens a userfaultfd and sends it back to Qemu's
VHOST_USER_POSTCOPY_ADVISE request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
06e787bd94 vhost: add config flag for postcopy
Postcopy live-migration features relies on userfaultfd,
which was only introduced in kernel v4.3.

This patch introduces a new define to allow building vhost
library on kernels not supporting userfaultfd.

With legacy build system, user has to explicitly set
CONFIG_RTE_LIBRTE_VHOST_POSTCOPY to 'y'.

With Meson build system, RTE_LIBRTE_VHOST_POSTCOPY gets
automatically defined if userfaultfd kernel header is
present.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
9bd3979306 vhost: enable fds passing in vhost-user messages
Passing userfault fds to Qemu will be required for postcopy
live-migration feature.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
02ef07efc0 vhost: pass socket fd to message handling callbacks
This is not used for now, but will be needed for the
special handling of VHOST_USER_SET_MEM_TABLE message
once postcopy will be supported.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
c00bb88d35 vhost: add number of fds to vhost-user messages
As soon as some ancillary data (fds) are received, it is copied
without checking its length.

This patch adds the number of fds received to the message,
which is set in read_vhost_message().

This is preliminary work to support sending fds to Qemu.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
7f20d9a965 vhost: define postcopy protocol flag
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
74ee315e4f vhost: fix error handling when mem table gets updated
When the memory table gets updated, the rings addresses need
to be translated again. If it fails, we need to exit cleanly
by unmapping memory regions.

Fixes: d5022533c2 ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
57b4d90b58 vhost: fix payload size of reply
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
7987eb1bc7 vhost: clarify reply-ack in case a reply was already sent
For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.

The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
ef6fb7d3fd vhost: fix return code of messages requiring replies
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
a52ac8ec27 vhost: fix messages results handling
Return of message handling has now changed to an enum that can
take non-negative value that is not zero in case a reply is
needed. But the code checking the variable afterwards has not
been updated, leading to success messages handling being
treated as errors.

External post and pre callbacks return type needs also to be
changed to the new enum, so that its handling is consistent.
This is done in this patch alongside with the convertion of
its only user, vhost-crypto backend.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Xiaolong Ye
d0d4887d62 vhost: add doxygen comment to vDPA header
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Tiwei Bie
cd9012c3f8 vhost: fix notification for packed ring
The notification can't be disabled in packed ring when
application tries to disable notification, because the
device event flags field is overwritten by an unexpected
value. This patch fixes this issue.

Fixes: b1cce26af1 ("vhost: add notification for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2018-10-18 10:24:39 +02:00
Xiaolong Ye
0e0a7d3801 vhost: introduce API to get vDPA device number
It's used to get number of available registered vDPA devices.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-11 18:53:49 +02:00
Jiayu Hu
729199397f vhost: fix corner case for enqueue operation
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the returned length by fill_vec_buf_split/_packed()
overflows. This patch is to avoid this corner case.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a27 ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
2f270595c0 vhost: rework message handling as a callback array
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.

Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
0bff510b5e vhost: unify message handling function signature
Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
fd29c33b65 vhost: handle unsupported message types in functions
Add new functions to handle the unsupported vhost message types:
 - vhost_user_set_vring_err
 - vhost_user_set_log_fd

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
e951355ffc vhost: make message handling functions prepare the reply
As VhostUserMsg structure is reused to generate the reply, move the
relevant fields update into the respective message handling functions.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
44eb792f9f vhost: unify struct VhostUserMsg usage
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Tiwei Bie
58e90a9113 vhost: fix return value on enqueue path
Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
0d7853a4da vhost-user: drop connection on message handling failures
There are a lot of cases where vhost-user massage handling
could fail and end up in a fully not recoverable state. For
example, allocation failures of shadow used ring and batched
copy array are not recoverable and leads to the segmentation
faults like this on the receiving/transmission path:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7f913fecf0 (LWP 43625)]
  in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
  760       batch_copy[vq->batch_copy_nb_elems].dst =

This could be easily reproduced in case of low memory or big
number of vhost-user ports.

Fix that by propagating error to the upper layer which will
end up with disconnection in case we can not report to
the message sender when the error happens.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Tiwei Bie
77de7c781c vhost: fix vhost interrupt support
When VIRTIO_RING_F_EVENT_IDX is negotiated, we need to
update the avail event to enable the notification.

Fixes: 3f8ff12821 ("vhost: support interrupt mode")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
28925156d9 vhost: fix zmbufs array leak after NUMA realloc
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to memory leak, because array
is freed only for zero copy case.

Fixes: 2651726def ("vhost: do deep copy while reallocating queue")
CC: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-09-12 19:10:09 +02:00
Ilya Maximets
d02f2092a3 vhost: suppress error if NUMA is not available
It's a common case that 'get_mempolicy' fails on systems
without NUMA support. No need to flag an error in log for
this situation.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-08-28 15:27:39 +02:00
Maxime Coquelin
af53db4867 vhost: flush IOTLB cache on new mem table handling
IOTLB entries contain the host virtual address of the guest
pages. When receiving a new VHOST_USER_SET_MEM_TABLE request,
the previous regions get unmapped, so the IOTLB entries, if any,
will be invalid. It does cause the vhost-user process to
segfault.

This patch introduces a new function to flush the IOTLB cache,
and call it as soon as the backend handles a VHOST_USER_SET_MEM
request.

Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-08-05 01:47:47 +02:00
Tiwei Bie
adead74939 vhost: remove unused variable
The nr_updated is just increased and not really used.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
2018-08-02 04:41:49 +02:00
Tiwei Bie
0989161b26 vhost: release locks on RARP packet failure
Fixes: eefac9536a ("vhost: postpone device creation until rings are mapped")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-07-26 10:02:52 +02:00
Tiwei Bie
14962a9c4f vhost: fix overflow on shadow used ring
The shadow used ring's size is the same as the vq's size,
so we shouldn't try more than "vq size" times. Besides,
the element pointed by avail->idx isn't available to the
device, so we will return error when try "vq size" times.

Fixes: 24e4844048 ("vhost: unify Rx mergeable and non-mergeable paths")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-07-26 10:02:50 +02:00
Jiayu Hu
6e2fad861a vhost: fix return value on dequeue path
This patch fixes the incorrect return value for rte_vhost_dequeue_burst()
when virtqueue is not enabled or virtqueue address translation fails.

Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-26 10:02:43 +02:00
Tiwei Bie
04651f72d1 vhost: fix buffer length calculation
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")

Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
2018-07-23 23:55:26 +02:00
Dan Gora
9f976204f5 vhost/crypto: use function to access mbuf private area
Use rte_mbuf_to_priv() to access the private data area in the mbuf.

Signed-off-by: Dan Gora <dg@adax.com>
2018-07-13 23:14:41 +02:00
Maxime Coquelin
b1cce26af1 vhost: add notification for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
ae999ce49d vhost: add Tx support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
a922401f35 vhost: add Rx support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
2f3225a7d6 vhost: add vector filling support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
cf3af2be49 vhost: create descriptor mapping function
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
37f5e79a27 vhost: add shadow used ring support for packed rings
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
3e2f9700bb vhost: append shadow used ring function names with split
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
62250c1d09 vhost: extract split ring handling from Rx and Tx functions
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
2c22b14388 vhost: clear batch copy index at copy time
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
d6315ce796 vhost: make indirect desc table copy desc type agnostic
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
7e47bba30a vhost: clear shadow used table index at flush time
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Yuanhan Liu
2d1541e2b6 vhost: add vring address setup for packed queues
Add code to set up packed queues when enabled.

Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Jens Freimann
d3211c98c4 vhost: add helpers for packed virtqueues
Add some helper functions to check descriptor flags
and check if a vring is of type packed.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Jens Freimann
297b1e7350 vhost: add virtio packed virtqueue defines
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
611994fc1b vhost: improve prefetching in enqueue path
This is an optimization to prefetch next buffer while the
current one is being processed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
c1058a6b16 vhost: prefetch first descriptor in dequeue path
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
380a2adff3 vhost: improve prefetching in dequeue path
This is an optimization to prefetch next buffer while the
current one is being processed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
fd68b4739d vhost: use buffer vectors in dequeue path
To ease packed ring layout integration, this patch makes
the dequeue path to re-use buffer vectors implemented for
enqueue path.

Doing this, copy_desc_to_mbuf() is now ring layout type
agnostic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
915cf94042 vhost: use shadow used ring in dequeue path
Relax used ring contention by reusing the shadow used
ring feature used by enqueue path.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Marvin Liu
22d2e78840 vhost: advertise support in-order feature
If devices always use descriptors in the same order in which they have
been made available. These devices can offer the VIRTIO_F_IN_ORDER
feature. If negotiated, this knowledge allows devices to notify the use
of a batch of buffers to virtio driver by only writing used ring index.

Vhost user device has supported this feature by default. If vhost
dequeue zero is enabled, should disable VIRTIO_F_IN_ORDER as vhost can’t
assure that descriptors returned from NIC are in order.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-07-03 01:35:58 +02:00
Tiwei Bie
ad0fdb696a vhost: fix potential null pointer dereference
Coverity issue: 293097
Fixes: d90cf7d111 ("vhost: support host notifier")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-07-03 01:35:58 +02:00
Maxime Coquelin
e11411b52a vhost: fix missing increment of log cache count
The log_cache_nb_elem was never incremented, resulting
in all dirty pages to be missed during live migration.

Fixes: c16915b871 ("vhost: improve dirty pages logging performance")
Cc: stable@dpdk.org

Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-03 01:35:58 +02:00
Maxime Coquelin
63b113afa5 vhost: use SMP memory barrier before kicking guest
vhost_vring_call() used rte_mb(), which translates into
mfence instruction on x86.

This patch changes to use rte_smp_mb(), which changed recently
to translate into a locked ADD instruction for performance
reason.

The measured gain is up to 3% with the testpmd benchmarks.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-06-15 12:27:25 +02:00
Tonghao Zhang
2396806765 vhost: introduce new function helper
Introduce an new common helper to avoid redundancy.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 12:27:25 +02:00
Tiwei Bie
d90cf7d111 vhost: support host notifier
When a vDPA device is attached, vhost user will try to
register host notifiers to QEMU to allow notifications
to be delivered between the driver in the guest and the
vDPA device in the host directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:39 +02:00
Tonghao Zhang
76b1e4cec7 vhost: refine new device function
Make sure find avalid device id before allocating
virtio_net, if not, return directly. It may avoid
allocating and freeing virtio_net when there is
not valid device id.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:09 +02:00
Maxime Coquelin
c89e52d9c8 vhost: improve batched copies performance
Instead of copying batch_copy_nb_elems into the stack,
this patch uses it directly.

Small performance gain of 3% is seen when running PVP
benchmark.

Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-14 19:27:50 +02:00
Maxime Coquelin
24e4844048 vhost: unify Rx mergeable and non-mergeable paths
This patch reworks the vhost enqueue path so that a single
code path is used for both Rx mergeable or non-mergeable cases.

Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-14 19:27:50 +02:00
Maxime Coquelin
c16915b871 vhost: improve dirty pages logging performance
This patch caches all dirty pages logging until the used ring index
is updated.

The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. While the fix was valid
as previous implementation wasn't safe against concurrent accesses,
contention was induced.

With this patch, during migration, we have:
1. Less atomic operations as only a single atomic OR operation
per 32 or 64 (depending on CPU) pages.
2. Less atomic operations as during a burst, the same page will
be marked dirty only once.
3. Less write memory barriers.

Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-05-17 14:19:05 +02:00
Fan Zhang
3c79609fda vhost/crypto: handle virtually non-contiguous buffers
This patch enables the handling of buffers non-contiguous in
virtual address space in the vhost_crypto. Instead of using
rte_vhost_va_from_guest_pa(), the host virtual address is
converted by vhost_iova_to_vva() for wider use cases.

For copy mode, the copy length is limited to the chunk size,
next chunks VAs being fetched afterward.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-17 12:29:05 +02:00