Async enqueue offloads large copies to DMA devices, and small copies
are still performed by the CPU. However, it requires users to get
enqueue completed packets by rte_vhost_poll_enqueue_completed(), even
if they are completed by the CPU when rte_vhost_submit_enqueue_burst()
returns. This design incurs extra overheads of tracking completed
pktmbufs and function calls, thus degrading performance on small packets.
This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch removes unnecessary checks and function calls, uses
appropriate types for internal variables, and fixes typos.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves memory region mmapping and related
preparation into a dedicated function in order to simplify
the VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of postcopy to a
dedicated function, with the goal of simplifying
VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of memory regions to
userfaultfd to a dedicated function, with the goal of
simplifying VHOST_USER_SET_MEM_TABLE request handling
function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Simply replace the smp barriers with atomic thread fence for vhost control
path, if there are no synchronization points.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Simply replace smp barriers with atomic thread fence for
virtio packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Used idx can be synchronized by one-way barrier instead of full
write barrier for split vring.
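The idea can be sketched with C11 atomics. This is only an illustration, not the vhost code: the structure and function names (demo_used_ring, demo_publish_used) and the ring size of 8 are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified used ring; not the actual vhost layout. */
struct demo_used_ring {
    uint32_t ring[8];       /* used element IDs written by the backend */
    _Atomic uint16_t idx;   /* index published to the driver */
};

/*
 * Write one used element, then publish the new index with a
 * store-release. The release ordering keeps the ring write from being
 * reordered after the index update, without a full write barrier.
 */
static void demo_publish_used(struct demo_used_ring *u, uint32_t id)
{
    uint16_t cur = atomic_load_explicit(&u->idx, memory_order_relaxed);

    u->ring[cur % 8] = id;
    atomic_store_explicit(&u->idx, (uint16_t)(cur + 1),
                          memory_order_release);
}
```

A reader that loads idx with acquire semantics is then guaranteed to see the ring entry written before the index was published.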
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Relax the full read barrier to one-way barrier for desc flags in
packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The ordering between avail index and desc reads has been enforced
by load-acquire for split vring, so smp_rmb barrier is not needed
behind it.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As function desc_is_avail performs a load-acquire barrier to
enforce the ordering between desc flags and desc content, it is
unnecessary to add a rte_smp_rmb barrier around the trace which
follows desc_is_avail.
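A minimal sketch of such a check, written with C11 atomics instead of the DPDK primitives. The demo_ names are hypothetical; the AVAIL (bit 7) and USED (bit 15) flag positions follow the Virtio packed ring layout.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DESC_F_AVAIL (1u << 7)
#define DEMO_DESC_F_USED  (1u << 15)

/* Hypothetical packed descriptor; flags is the synchronization point. */
struct demo_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    _Atomic uint16_t flags;
};

/*
 * The load-acquire on flags orders every later read of addr/len/id
 * after it, so callers need no separate read barrier before touching
 * the descriptor content.
 */
static bool demo_desc_is_avail(struct demo_packed_desc *desc,
                               bool wrap_counter)
{
    uint16_t flags = atomic_load_explicit(&desc->flags,
                                          memory_order_acquire);

    return wrap_counter == !!(flags & DEMO_DESC_F_AVAIL) &&
           wrap_counter != !!(flags & DEMO_DESC_F_USED);
}
```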
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reasons for a build not being supported generally start with a lowercase
letter because they are printed as the second part of a line.
Other changes:
- "linux" should be "Linux" with a capital letter.
- ARCH_X86_64 may be simply x86_64.
- aarch64 is preferred over arm64.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
This patch fixes a file descriptor leak which happens
in the error path of vhost_user_set_vring_kick().
Fixes: 4796ad63ba ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
This patch fixes a file descriptor leak which happens
in the error path of vhost_user_set_log_base().
Fixes: 4796ad63ba ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org
Reported-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
If an error is encountered before the memory regions are
parsed, the file descriptors for these shared buffers are
leaked.
This patch fixes this by closing the message file descriptors
on error, taking care of avoiding double closing of the file
descriptors. guest_pages is also freed, even though it was not
leaked as its pointer was not overridden on subsequent function
calls.
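One common way to avoid double closing is to mark each descriptor as closed once it has been released. The sketch below assumes a simplified message structure; demo_msg and demo_close_msg_fds are illustrative names, not the vhost-user definitions.

```c
#include <unistd.h>

#define DEMO_MAX_FDS 8

/* Hypothetical message carrying file descriptors. */
struct demo_msg {
    int fds[DEMO_MAX_FDS];
    int fd_num;
};

/*
 * Close every descriptor still owned by the message and mark it as
 * closed (-1), so a second call on the same message is a no-op.
 */
static void demo_close_msg_fds(struct demo_msg *msg)
{
    for (int i = 0; i < msg->fd_num; i++) {
        if (msg->fds[i] < 0)
            continue;
        close(msg->fds[i]);
        msg->fds[i] = -1;
    }
}
```

An error path can then call the helper unconditionally, even if an earlier step already released some of the descriptors.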
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Reported-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
This patch fixes a virtqueue initialization issue causing
a segfault or a file descriptor being closed unexpectedly.
The wrong index was passed to init_vring_queue() by
alloc_vring_queue() when a hole in the virtqueue array was
met.
Fixes: 8acd7c2133 ("vhost: fix virtqueues metadata allocation")
Cc: stable@dpdk.org
Reported-by: Yu Jiang <yux.jiang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
Async inflight packet counter should take failed packets into account.
Failed packets will be deducted in the error handling logic.
Fixes: 6b3c81db8b ("vhost: simplify async copy completion")
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch initializes a local parameter in async data path to avoid
compiler warnings.
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
gpa_to_hpa() function almost always fails due to the wrong setup of
the binary tree search key. Since there has already been a similar
function gpa_to_first_hpa() available in the vhost, instead of fixing
the issue in its original logic, gpa_to_hpa() function is rewritten to
be a wrapper of the gpa_to_first_hpa() to avoid code redundancy.
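The wrapper approach can be sketched as follows. The page table layout, the demo_ names and the linear lookup are assumptions made to keep the example short; the library may search its mappings differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest page mapping; field names are assumptions. */
struct demo_guest_page {
    uint64_t guest_phys_addr;
    uint64_t host_phys_addr;
    uint64_t size;
};

/*
 * Return the host physical address for gpa and report in *hpa_size how
 * many of the requested bytes are covered by the page containing it.
 * Returns 0 and sets *hpa_size to 0 when gpa is unmapped.
 */
static uint64_t demo_gpa_to_first_hpa(const struct demo_guest_page *pages,
                                      size_t nr_pages, uint64_t gpa,
                                      uint64_t gpa_size, uint64_t *hpa_size)
{
    *hpa_size = 0;
    for (size_t i = 0; i < nr_pages; i++) {
        const struct demo_guest_page *p = &pages[i];

        if (gpa >= p->guest_phys_addr &&
            gpa - p->guest_phys_addr < p->size) {
            uint64_t off = gpa - p->guest_phys_addr;
            uint64_t left = p->size - off;

            *hpa_size = gpa_size < left ? gpa_size : left;
            return p->host_phys_addr + off;
        }
    }
    return 0;
}

/* Wrapper: succeed only if the whole range fits in one mapped page. */
static uint64_t demo_gpa_to_hpa(const struct demo_guest_page *pages,
                                size_t nr_pages, uint64_t gpa,
                                uint64_t size)
{
    uint64_t hpa_size;
    uint64_t hpa = demo_gpa_to_first_hpa(pages, nr_pages, gpa, size,
                                         &hpa_size);

    return hpa_size == size ? hpa : 0;
}
```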
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Fixes: faa9867c4d ("vhost: use binary search in address conversion")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
By design, async enqueue API should return directly if async device
is not registered. This patch removes the corrupted implementation of
the enqueue fallback from async mode to sync mode.
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch checks whether the virtqueue metadata pointer
is valid before dereferencing it. It is not considered
a fix as earlier patch ensures there are no holes in the
array of virtqueue metadata pointers.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure no out-of-bound accesses happen.
Fixes: 9eed6bfd2e ("vhost: allow to enable or disable features")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
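The kind of validation described here can be sketched as below. The device layout, DEMO_MAX_VRING and the function names are hypothetical and only illustrate the two checks (bounds, then NULL).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_VRING 16   /* hypothetical upper bound on queue count */

struct demo_vq {
    uint16_t size;
};

struct demo_dev {
    struct demo_vq *virtqueue[DEMO_MAX_VRING];
};

/*
 * Return the virtqueue for queue_id, or NULL when the index is out of
 * bounds or the slot has not been allocated.
 */
static struct demo_vq *demo_get_vq(struct demo_dev *dev, uint16_t queue_id)
{
    if (dev == NULL || queue_id >= DEMO_MAX_VRING)
        return NULL;
    return dev->virtqueue[queue_id];
}

/* Example entry point: fail with -1 instead of dereferencing blindly. */
static int demo_get_vring_size(struct demo_dev *dev, uint16_t queue_id)
{
    struct demo_vq *vq = demo_get_vq(dev, queue_id);

    if (vq == NULL)
        return -1;
    return vq->size;
}
```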
Fixes: 4d891f77dd ("vhost: add APIs to get inflight ring")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure no out-of-bound accesses happen.
Fixes: bd2e0c3fe5 ("vhost: add APIs for live migration")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
Fixes: 9eed6bfd2e ("vhost: allow to enable or disable features")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
Fixes: a67f286a65 ("vhost: export queue free entries")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The Vhost-user backend implementation assumes there will be
no holes in the device's array of virtqueues metadata
pointers.
It can happen though, and would cause segmentation faults,
memory leaks or undefined behaviour.
This patch keeps the assumption that there are no holes in this
array, and allocates all uninitialized virtqueues metadata up
to the requested index.
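A minimal sketch of this allocation strategy, assuming a simplified device structure. All demo_ names and the use of calloc() are illustrative and do not reflect the library's allocator.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_VRING 16   /* hypothetical upper bound on queue count */

struct demo_vq {
    uint32_t index;
};

struct demo_dev {
    struct demo_vq *virtqueue[DEMO_MAX_VRING];
    uint32_t nr_vring;
};

/*
 * Allocate every still-missing virtqueue from slot 0 up to vring_idx,
 * so the array never contains a hole below nr_vring.
 * Returns 0 on success, -1 on a bad index or allocation failure.
 */
static int demo_alloc_vring_queue(struct demo_dev *dev, uint32_t vring_idx)
{
    if (vring_idx >= DEMO_MAX_VRING)
        return -1;

    for (uint32_t i = 0; i <= vring_idx; i++) {
        if (dev->virtqueue[i] != NULL)
            continue;

        struct demo_vq *vq = calloc(1, sizeof(*vq));
        if (vq == NULL)
            return -1;

        vq->index = i;   /* initialize with i, not with vring_idx */
        dev->virtqueue[i] = vq;
    }

    if (dev->nr_vring < vring_idx + 1)
        dev->nr_vring = vring_idx + 1;
    return 0;
}
```

Each slot is initialized with its own loop index rather than the requested one, which matters when several slots are filled in a single call.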
Fixes: 160cbc815b ("vhost: remove a hack on queue allocation")
Cc: stable@dpdk.org
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
When async unregister function is invoked in certain vhost event
callbacks (e.g. vring state change), deadlock may occur due to
recursive spinlock acquire. This patch uses trylock() primitive in
the unregister API to avoid deadlock.
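The pattern can be illustrated with a small spinlock built on C11 atomic_flag. This is a sketch under assumptions: demo_spinlock and demo_async_unregister are hypothetical names, and the real API works on the library's own lock type.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on atomic_flag. */
struct demo_spinlock {
    atomic_flag locked;
};

static bool demo_trylock(struct demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means we own it. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void demo_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct demo_vq {
    struct demo_spinlock access_lock;
    bool async_registered;
};

/*
 * Unregister the async channel. If the lock is already held (possibly
 * by the caller itself, inside an event callback), fail with -1
 * instead of spinning forever.
 */
static int demo_async_unregister(struct demo_vq *vq)
{
    if (!demo_trylock(&vq->access_lock))
        return -1;

    vq->async_registered = false;
    demo_unlock(&vq->access_lock);
    return 0;
}
```

The caller has to handle the failure case, for example by retrying once the callback that holds the lock has returned.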
Fixes: 78639d5456 ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a check on the async vector buffer usage to prevent buffer overrun.
If the unused vector buffer is not sufficient to prepare for next
packet's iov creation, an async transfer will be triggered immediately
to free the vector buffer.
Fixes: 78639d5456 ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Allocate the async internal memory buffer with rte_malloc(), replacing the
array declaration inside the vq structure. Dynamic allocation reduces the
memory footprint when the async path is not registered.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The current async ops allow the check_completed_copies() callback to
return an arbitrary number of async iov segments finished by backend
async devices. This design makes it complex for vhost to handle a
broken transfer of a single packet (i.e. a transfer that completes in
the middle of an async descriptor) and prevents application callbacks
from leveraging hardware capability to offload the work. Thus, this
patch enforces that the check_completed_copies() callback returns the
number of async memory descriptors, which is aligned with the async
transfer data ops callbacks. The vhost async data path is revised to
work with the new ops definition, which provides clean and simplified
processing.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a function to calculate the length of an IPv4 header as suggested
on the mailing list [1]. Call where appropriate.
[1] https://mails.dpdk.org/archives/dev/2020-October/184471.html
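For reference, the calculation takes the low four bits of the version/IHL byte (the header length in 32-bit words) and multiplies by four. The sketch below is standalone and uses hypothetical demo_ names rather than the library's helper and constants.

```c
#include <stdint.h>

#define DEMO_IPV4_HDR_IHL_MASK   0x0f   /* low nibble holds the IHL */
#define DEMO_IPV4_IHL_MULTIPLIER 4      /* IHL counts 32-bit words */

/* Return the IPv4 header length in bytes from the version/IHL byte. */
static inline uint8_t demo_ipv4_hdr_len(uint8_t version_ihl)
{
    return (uint8_t)((version_ihl & DEMO_IPV4_HDR_IHL_MASK) *
                     DEMO_IPV4_IHL_MULTIPLIER);
}
```

A header without options has version_ihl 0x45 and yields 20 bytes; the maximum IHL of 15 yields 60 bytes.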
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This small optimization uses the static Virtio-net
header length in the packed datapath, since the Virtio-net
header cannot be the legacy one in case of packed ring.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In case packed ring layout has been negotiated, but neither
Version 1 nor mergeable buffers, the Virtio-net header len
is assigned to the legacy devices value, which is wrong.
This patch fixes this by using the proper len, as devices
using packed ring are not legacy devices.
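The selection logic can be sketched as below. The feature bit numbers and the 10 and 12 byte header sizes come from the Virtio specification; the demo_ names and the function itself are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers from the Virtio specification. */
#define DEMO_VIRTIO_NET_F_MRG_RXBUF 15
#define DEMO_VIRTIO_F_VERSION_1     32
#define DEMO_VIRTIO_F_RING_PACKED   34

#define DEMO_LEGACY_HDR_LEN 10   /* header without num_buffers */
#define DEMO_MRG_HDR_LEN    12   /* header with num_buffers */

/*
 * Pick the Virtio-net header length from the negotiated features.
 * Packed ring implies a non-legacy device, so it selects the longer
 * header even when neither of the other two bits is set.
 */
static size_t demo_vhost_hlen(uint64_t features)
{
    const uint64_t modern = (1ULL << DEMO_VIRTIO_NET_F_MRG_RXBUF) |
                            (1ULL << DEMO_VIRTIO_F_VERSION_1) |
                            (1ULL << DEMO_VIRTIO_F_RING_PACKED);

    return (features & modern) ? DEMO_MRG_HDR_LEN : DEMO_LEGACY_HDR_LEN;
}
```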
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Reported-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In virtio_dev_extbuf_alloc(), the shinfo structure used to store
the reference counter and the free callback of the external buffer
is by default stored inside the mbuf data.
This is wrong because the mbuf (and its data) can be freed before
the external buffer, for instance in the following situation:
pkt2 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_attach(pkt2, pkt);
rte_pktmbuf_free(pkt);
After this, pkt is freed, but it still contains shinfo, which is
referenced by pkt2.
Fix this by always storing the shinfo beside the external buffer.
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to inform the driver type with the fixed vhost features.
In addition the patch provides a new API specifically used by
the application to start a vhost-crypto driver.
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Dequeue zero-copy removal was announced in DPDK v20.08.
This feature brings constraints which make the maintenance
of the Vhost library difficult. Its limitations also make it
difficult for applications to use (Tx vring starvation).
Removing it makes it easier to add new features, and also removes
some code in the hot path, which should bring a performance
improvement for the standard path.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
As announced in v20.08, this patch makes the vDPA
and related Vhost API stable.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes a possible time-of-check to time-of-use (TOCTOU)
attack by copying the request data and descriptor index to local
variables prior to processing.
The original sequential read of descriptors may also lead to a TOCTOU
attack. This patch fixes the problem by loading all descriptors of a
request into a local buffer before processing.
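The general pattern is to snapshot guest-writable data once and then validate and use only the snapshot. The following sketch uses a hypothetical descriptor layout (demo_desc) and is not the vhost crypto code.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_LEN 64

/* Hypothetical descriptor in memory the guest can rewrite at any time. */
struct demo_desc {
    uint32_t len;
    uint8_t data[DEMO_MAX_LEN];
};

/*
 * Snapshot the descriptor into a local copy, then validate and use
 * only the copy, so a concurrent guest write cannot change len between
 * the check and the memcpy. Returns bytes copied, or -1 if len is
 * invalid.
 */
static int demo_process_desc(const struct demo_desc *shared,
                             uint8_t *out, uint32_t out_len)
{
    struct demo_desc local;

    memcpy(&local, shared, sizeof(local));   /* single snapshot */

    if (local.len > DEMO_MAX_LEN || local.len > out_len)
        return -1;

    memcpy(out, local.data, local.len);
    return (int)local.len;
}
```

Reading shared->len for the check and again for the copy would reopen the window that the snapshot closes.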
CVE-2020-14375
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes the incorrect data length check in vhost crypto.
Instead of blindly accepting the descriptor length as the data length,
the change first compares the data length provided by the request with
the descriptor length. The security issue CVE-2020-14374 is not fixed
by this patch alone; part of the fix is done through:
"vhost/crypto: fix missed request check for copy mode".
CVE-2020-14374
Fixes: 3c79609fda ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes vhost crypto library for the incorrect source and
destination buffer calculation in the copy mode.
Fixes: cd1e8f03ab ("vhost/crypto: fix packet copy in chaining mode")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes the incorrect descriptor deduction for vhost crypto.
CVE-2020-14378
Fixes: 16d2e718b8 ("vhost/crypto: fix possible out of bound access")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes the missing iv space allocation in crypto
operation mempool.
Fixes: 709521f4c2 ("examples/vhost_crypto: support multi-core")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Commit d0fcc38f5f ("vhost: improve device readiness notifications")
makes the assumption that all Virtio devices are considered
ready for processing as soon as the first queue pair is configured
and enabled.
While this is true for Virtio-net, it isn't for Virtio-scsi
and Virtio-blk.
This patch fixes this by only making this assumption for
the builtin Virtio-net backend, and restores back to previous
behaviour for other backends.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Reported-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Control thread (which handles iotlb msg) and forwarding thread
both use iotlb to translate address. The former may modify the
same entry of mempool and may cause a loop in iotlb_pending_entries
list.
Bugzilla ID: 523
Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Cc: stable@dpdk.org
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost library currently lacks a definition of the reset status. This
patch adds the reset status definition and updates the related log.
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
A decision was made [1] to no longer support Make in DPDK. This patch
removes all Makefiles that do not make use of pkg-config, along with
the mk directory previously used by make.
[1] https://mails.dpdk.org/archives/dev/2020-April/162839.html
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Start a new release cycle with empty release notes.
The ABI version becomes 21.0.
The ABI major is back to normal, having only one number (21 vs 20.0).
The map files are updated to the new ABI major number (21).
The ABI exceptions are dropped.
Travis ABI check is disabled because compatibility is not preserved.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>