As in the single dequeue path, accessing the descriptor length multiple
times can lead to a potential risk. Accessing the descriptor length only
once eliminates this risk.
Fixes: 75ed51697820 ("vhost: add packed ring batch dequeue")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As with the split ring, accessing the descriptor length multiple times
can lead to a potential risk. Accessing the descriptor length only once
eliminates this risk.
Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In the vhost datapath, the descriptor length is mostly used in two related
operations: first for address translation, then for the memory transfer
from guest to host. The interval between these two steps gives a malicious
guest a window in which it can change the descriptor length after vhost
has calculated the buffer size, which may lead to a buffer overflow on the
vhost side. This potential risk can be eliminated by accessing the
descriptor length only once.
Fixes: 1be4ebb1c464 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
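A minimal sketch of the single-read pattern described in the three messages above. The `struct desc` layout, `copy_desc_once()` and the flat `guest_mem` buffer are hypothetical simplifications, not the vhost library's code. The only point illustrated is that the length is loaded once into a local variable (through a volatile read, so the compiler cannot re-fetch it) and that this same local value feeds both the bounds check and the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor; not the real vring_desc. */
struct desc {
	uint64_t addr;   /* offset into guest_mem in this sketch */
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Copy the buffer described by a guest-writable descriptor into dst.
 * addr and len are each read exactly once; every later decision uses
 * the local copies, so a guest rewriting d->len after the check cannot
 * enlarge the memcpy. Returns the copied length, or -1 on bad bounds.
 */
static int
copy_desc_once(const struct desc *d, const uint8_t *guest_mem,
	       size_t guest_mem_size, uint8_t *dst, size_t dst_size)
{
	uint32_t len = *(const volatile uint32_t *)&d->len;
	uint64_t addr = *(const volatile uint64_t *)&d->addr;

	if (len > dst_size)
		return -1;
	if (addr > guest_mem_size || len > guest_mem_size - addr)
		return -1;

	memcpy(dst, guest_mem + addr, len);
	return (int)len;
}
```

The buggy shape would instead read `d->len` once for the size computation and again for the `memcpy()`, leaving room for the guest to change it in between.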
Add rte_vhost_get_negotiated_protocol_features(), which returns the set of
enabled protocol features.
Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves vhost_virtqueue struct fields in order
to both optimize packing and move hot fields into the first
cache lines.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch moves the per-virtqueue dirty logging cache
out of the virtqueue struct by allocating it dynamically,
only when live migration is enabled.
This saves 8 cache lines in the vhost_virtqueue struct.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch removes the "backend" field of the
vhost_virtqueue struct, which is not used by the
library.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
When vhost performs dequeue offloading, it parses the Ethernet and L3/L4
headers of the packet, then sets the corresponding values in the mbuf
attributes. This means the offloading action must happen after the packet
data copy.
Fixes: 75ed51697820 ("vhost: add packed ring batch dequeue")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_new_device() might be called from different threads at
the same time:
thread 1 (config thread)
    rte_vhost_driver_start
    -> vhost_user_start_client
    -> vhost_user_add_connection
    -> vhost_new_device

thread 2 (vhost-events)
    vhost_user_read_cb
    -> vhost_user_msg_handler (return value < 0)
    -> vhost_user_start_client
    -> vhost_new_device
So the same vid could be allocated twice, or a vid might be
lost in the DPDK library while still being held by the
upper-layer applications.
Another place where a race could happen is the function
*vhost_destroy_device*, but after a detailed investigation,
the race does not exist as long as no two devices have the
same vid: calling vhost_destroy_device() in different
threads with different vids is actually safe.
Fixes: a277c7159876 ("vhost: refactor code structure")
Cc: stable@dpdk.org
Reported-by: Peng He <hepeng.0320@bytedance.com>
Signed-off-by: Fei Chen <chenwei.0515@bytedance.com>
Reviewed-by: Zhihong Wang <wangzhihong.wzh@bytedance.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
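A minimal sketch of why the vid search must be serialized. The `devices` table, `MAX_DEVICES` and the small spinlock built on GCC/Clang `__atomic` builtins are hypothetical stand-ins for whatever lock and device table the library actually uses. Without the lock, two threads can both see the same slot as `NULL` and return the same vid; with the search and the claim inside one critical section, each call gets a distinct slot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 4 /* hypothetical limit for this sketch */

struct device {
	int vid;
};

static struct device *devices[MAX_DEVICES];
static struct device dev_storage[MAX_DEVICES];
static int dev_lock; /* 0 = unlocked, 1 = locked (hypothetical lock) */

static void dev_lock_acquire(void)
{
	while (__atomic_exchange_n(&dev_lock, 1, __ATOMIC_ACQUIRE))
		; /* spin until the holder releases the lock */
}

static void dev_lock_release(void)
{
	__atomic_store_n(&dev_lock, 0, __ATOMIC_RELEASE);
}

/*
 * Find a free slot and claim it. Searching and claiming happen under
 * one lock, so two threads cannot both pick the same free slot.
 * Returns the new vid, or -1 if every slot is taken.
 */
static int new_device(void)
{
	int vid = -1;

	dev_lock_acquire();
	for (int i = 0; i < MAX_DEVICES; i++) {
		if (devices[i] == NULL) {
			dev_storage[i].vid = i;
			devices[i] = &dev_storage[i];
			vid = i;
			break;
		}
	}
	dev_lock_release();

	return vid;
}
```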
The vhost header files were missing the definitions (from other headers)
needed to allow them to be compiled individually.
Fixes: d7280c9fffcb ("vhost: support selective datapath")
Fixes: a49f758d1170 ("vhost: split vDPA header file")
Fixes: 939066d96563 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Async enqueue offloads large copies to DMA devices, while small copies
are still performed by the CPU. However, it requires users to retrieve
enqueue-completed packets with rte_vhost_poll_enqueue_completed(), even
if they were already completed by the CPU when
rte_vhost_submit_enqueue_burst() returns. This design incurs the extra
overhead of tracking completed pktmbufs and of additional function calls,
thus degrading performance for small packets.
This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch removes unnecessary checks and function calls, changes
internal variables to more appropriate types, and fixes typos.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves memory region mmaping and related
preparation into a dedicated function in order to simplify
the VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of postcopy to a
dedicated function, with the goal of simplifying the
VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of memory regions with
userfaultfd to a dedicated function, with the goal of
simplifying the VHOST_USER_SET_MEM_TABLE request handling
function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Simply replace the smp barriers with atomic thread fences in the vhost
control path, where there are no synchronization points.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Simply replace smp barriers with atomic thread fences for
the virtio packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The used idx can be synchronized by a one-way barrier instead of a full
write barrier for the split vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Relax the full read barrier to a one-way barrier for desc flags in
the packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The ordering between the avail index and desc reads is already enforced
by a load-acquire for the split vring, so the smp_rmb barrier after it
is not needed.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
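A minimal sketch of the one-way barriers used for the split vring, written with the GCC/Clang `__atomic` builtins. The `used_ring`/`avail_ring` structs, the fixed ring size of 8 and both helper functions are hypothetical simplifications (real used entries carry an id and a length). The release store on `used->idx` orders the entry write before the index update without a full write barrier; the acquire load of `avail->idx` orders the ring reads after it, so no separate read barrier is needed.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size */

/* Hypothetical, trimmed-down split ring views. */
struct used_ring  { uint16_t idx; uint32_t ring_id[RING_SIZE]; };
struct avail_ring { uint16_t idx; uint16_t ring[RING_SIZE]; };

/* Write the used entry, then publish the new index with a release store. */
static void publish_used(struct used_ring *used, uint32_t id)
{
	uint16_t idx = used->idx; /* only the backend writes used->idx */

	used->ring_id[idx % RING_SIZE] = id;
	__atomic_store_n(&used->idx, (uint16_t)(idx + 1), __ATOMIC_RELEASE);
}

/*
 * Load avail->idx with acquire semantics: the ring read below cannot be
 * reordered before it. Returns the number of new entries and stores the
 * first new head in *head when there is at least one.
 */
static uint16_t fetch_avail(struct avail_ring *avail, uint16_t last,
			    uint16_t *head)
{
	uint16_t avail_idx = __atomic_load_n(&avail->idx, __ATOMIC_ACQUIRE);

	if (avail_idx == last)
		return 0;
	*head = avail->ring[last % RING_SIZE];
	return (uint16_t)(avail_idx - last);
}
```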
As the function desc_is_avail() performs a load-acquire to
enforce the ordering between desc flags and desc content, it is
unnecessary to add an rte_smp_rmb barrier around the trace that
follows desc_is_avail().
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
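A minimal sketch of the packed-ring availability check, assuming the virtio specification's flag bits (AVAIL is bit 7, USED is bit 15) and the GCC/Clang `__atomic` builtins. `read_avail_desc()` is a hypothetical caller; it shows that once the flags are read with acquire semantics, reading the rest of the descriptor needs no extra read barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_DESC_F_AVAIL (1 << 7)
#define VRING_DESC_F_USED  (1 << 15)

/* Packed descriptor field order as in the virtio specification. */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Available when AVAIL matches the wrap counter and USED does not. */
static bool desc_is_avail(struct packed_desc *desc, bool wrap_counter)
{
	uint16_t flags = __atomic_load_n(&desc->flags, __ATOMIC_ACQUIRE);

	return wrap_counter == !!(flags & VRING_DESC_F_AVAIL) &&
	       wrap_counter != !!(flags & VRING_DESC_F_USED);
}

/* Hypothetical consumer: the acquire above already orders this read. */
static int read_avail_desc(struct packed_desc *desc, bool wrap, uint32_t *len)
{
	if (!desc_is_avail(desc, wrap))
		return -1;
	*len = desc->len; /* no rte_smp_rmb() needed before this load */
	return 0;
}
```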
Reasons for not supporting the build generally start with lowercase,
because they are printed as the second part of a line.
Other changes:
- "linux" should be "Linux" with a capital letter.
- ARCH_X86_64 can simply be x86_64.
- aarch64 is preferred over arm64.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
This patch fixes a file descriptor leak which happens
in the error path of vhost_user_set_vring_kick().
Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
This patch fixes a file descriptor leak which happens
in the error path of vhost_user_set_log_base().
Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
Cc: stable@dpdk.org
Reported-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
If an error is encountered before the memory regions are
parsed, the file descriptors for these shared buffers are
leaked.
This patch fixes this by closing the message file descriptors
on error, taking care to avoid closing the file descriptors
twice. guest_pages is also freed, even though it was not
leaked, as its pointer was not overwritten on subsequent
function calls.
Fixes: 8f972312b8f4 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Reported-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
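A minimal sketch of closing a message's file descriptors on an error path without risking a double close. The `struct msg` layout, `MAX_FDS` and `close_msg_fds()` are hypothetical here; the pattern is to close each descriptor once and overwrite it with -1, so any later cleanup of the same message skips it.

```c
#include <assert.h>
#include <unistd.h>

#define MAX_FDS 8 /* hypothetical per-message fd limit */

/* Hypothetical message carrying ancillary file descriptors. */
struct msg {
	int fds[MAX_FDS];
	int fd_num;
};

/*
 * Close every fd still owned by the message and mark it -1, so calling
 * this again (or from another cleanup path) never closes the same fd
 * number twice, which could hit an unrelated, newly opened file.
 */
static void close_msg_fds(struct msg *msg)
{
	for (int i = 0; i < msg->fd_num; i++) {
		if (msg->fds[i] >= 0) {
			close(msg->fds[i]);
			msg->fds[i] = -1;
		}
	}
}
```

A handler that takes ownership of an fd (for example after mmap()) would likewise set the slot to -1, so the error path only closes the descriptors that are still unclaimed.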
This patch fixes a virtqueue initialization issue causing
segfaults or file descriptors being closed unexpectedly.
The wrong index was passed to init_vring_queue() by
alloc_vring_queue() when a hole in the virtqueue array was
met.
Fixes: 8acd7c213353 ("vhost: fix virtqueues metadata allocation")
Cc: stable@dpdk.org
Reported-by: Yu Jiang <yux.jiang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yu Jiang <yux.jiang@intel.com>
The async inflight packet counter should take failed packets into account.
Failed packets will be deducted in the error handling logic.
Fixes: 6b3c81db8bb7 ("vhost: simplify async copy completion")
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch initializes a local variable in the async data path to avoid
compiler warnings.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The gpa_to_hpa() function almost always fails due to the wrong setup of
the binary search key. Since a similar function, gpa_to_first_hpa(), is
already available in vhost, instead of fixing the issue in its original
logic, gpa_to_hpa() is rewritten as a wrapper of gpa_to_first_hpa() to
avoid code redundancy.
Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Fixes: faa9867c4da2 ("vhost: use binary search in address conversion")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
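A minimal sketch of the wrapper approach, assuming a guest page table sorted by guest physical address. The `struct guest_page` layout and both function signatures are simplified stand-ins, not the library's: this `gpa_to_first_hpa()` only looks inside the single page containing `gpa`, whereas a real one may also walk host-contiguous neighbours. The comparator passed to `bsearch()` treats the key as "inside this page", which is the kind of search key that must be set up correctly; `gpa_to_hpa()` then succeeds only when the whole requested range maps contiguously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical guest page entry; the table is sorted by guest_phys_addr. */
struct guest_page {
	uint64_t guest_phys_addr;
	uint64_t host_phys_addr;
	uint64_t size;
};

/* bsearch() comparator: the key is a gpa, matched if inside the page. */
static int cmp_gpa(const void *key, const void *elem)
{
	uint64_t gpa = *(const uint64_t *)key;
	const struct guest_page *p = elem;

	if (gpa < p->guest_phys_addr)
		return -1;
	if (gpa >= p->guest_phys_addr + p->size)
		return 1;
	return 0;
}

/* Host address of gpa; *hpa_size = bytes of the range contiguous from it. */
static uint64_t gpa_to_first_hpa(const struct guest_page *pages, size_t n,
				 uint64_t gpa, uint64_t gpa_size,
				 uint64_t *hpa_size)
{
	const struct guest_page *p =
		bsearch(&gpa, pages, n, sizeof(pages[0]), cmp_gpa);

	if (p == NULL) {
		*hpa_size = 0;
		return 0;
	}
	uint64_t offset = gpa - p->guest_phys_addr;
	uint64_t avail = p->size - offset;

	*hpa_size = gpa_size < avail ? gpa_size : avail;
	return p->host_phys_addr + offset;
}

/* Wrapper: return 0 unless the whole [gpa, gpa + size) maps contiguously. */
static uint64_t gpa_to_hpa(const struct guest_page *pages, size_t n,
			   uint64_t gpa, uint64_t size)
{
	uint64_t hpa_size;
	uint64_t hpa = gpa_to_first_hpa(pages, n, gpa, size, &hpa_size);

	return hpa_size == size ? hpa : 0;
}
```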
By design, the async enqueue API should return directly if no async device
is registered. This patch removes the broken implementation of the
enqueue fallback from async mode to sync mode.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch checks whether the virtqueue metadata pointer
is valid before dereferencing it. It is not considered
a fix, as an earlier patch ensures there are no holes in the
array of virtqueue metadata pointers.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure no out-of-bound accesses happen.
Fixes: 9eed6bfd2efb ("vhost: allow to enable or disable features")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
Fixes: 4d891f77ddfa ("vhost: add APIs to get inflight ring")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure no out-of-bound accesses happen.
Fixes: bd2e0c3fe5ac ("vhost: add APIs for live migration")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
Fixes: 9eed6bfd2efb ("vhost: allow to enable or disable features")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch validates the queue index parameter, in order
to ensure neither out-of-bound accesses nor NULL pointer
dereferencing happen.
Fixes: a67f286a6596 ("vhost: export queue free entries")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
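A minimal sketch of the queue index validation the five messages above add to public accessors. `struct dev`, `struct vq`, `MAX_QUEUES` and `get_vring_size()` are hypothetical; the pattern is to reject an index beyond `nr_vring` (out-of-bounds access) and then reject a `NULL` metadata pointer before dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4 /* hypothetical array size */

struct vq {
	uint16_t size;
};

/* Hypothetical device with an array of virtqueue metadata pointers. */
struct dev {
	uint32_t nr_vring;
	struct vq *virtqueue[MAX_QUEUES];
};

/* Returns 0 and fills *size on success, -1 on any invalid input. */
static int get_vring_size(const struct dev *dev, uint16_t qid, uint16_t *size)
{
	if (dev == NULL || size == NULL)
		return -1;
	if (qid >= dev->nr_vring || qid >= MAX_QUEUES)
		return -1; /* out-of-bounds index */

	const struct vq *vq = dev->virtqueue[qid];
	if (vq == NULL)
		return -1; /* metadata not allocated */

	*size = vq->size;
	return 0;
}
```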
The Vhost-user backend implementation assumes there will be
no holes in the device's array of virtqueue metadata
pointers.
Holes can happen, though, and would cause segmentation faults,
memory leaks or undefined behaviour.
This patch keeps the assumption that there are no holes in this
array, and allocates all uninitialized virtqueue metadata up
to the requested index.
Fixes: 160cbc815b41 ("vhost: remove a hack on queue allocation")
Cc: stable@dpdk.org
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
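A minimal sketch of allocating every missing virtqueue up to the requested index so the array never has holes. `struct dev`, `struct vq`, `MAX_VRINGS`, `init_vring_queue()` and `alloc_vring_queue()` here are simplified stand-ins, not the library's definitions. Note that each newly allocated entry is initialized with the loop index `i`, not the requested index; passing the wrong one is the kind of bug described in the virtqueue initialization fix earlier in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VRINGS 8 /* hypothetical array size */

struct vq {
	uint32_t index;
	int kickfd;
};

struct dev {
	uint32_t nr_vring;
	struct vq *virtqueue[MAX_VRINGS];
};

static void init_vring_queue(struct dev *dev, uint32_t idx)
{
	struct vq *vq = dev->virtqueue[idx];

	vq->index = idx;
	vq->kickfd = -1; /* no fd yet */
}

/* Allocate all missing entries in [0, vring_idx]; returns 0 or -1. */
static int alloc_vring_queue(struct dev *dev, uint32_t vring_idx)
{
	if (vring_idx >= MAX_VRINGS)
		return -1;

	for (uint32_t i = 0; i <= vring_idx; i++) {
		if (dev->virtqueue[i] != NULL)
			continue;
		dev->virtqueue[i] = calloc(1, sizeof(struct vq));
		if (dev->virtqueue[i] == NULL)
			return -1;
		init_vring_queue(dev, i); /* i, not vring_idx */
	}

	if (dev->nr_vring < vring_idx + 1)
		dev->nr_vring = vring_idx + 1;
	return 0;
}
```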
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
When the async unregister function is invoked in certain vhost event
callbacks (e.g. vring state change), a deadlock may occur due to
recursive spinlock acquisition. This patch uses the trylock() primitive
in the unregister API to avoid the deadlock.
Fixes: 78639d54563a ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
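A minimal sketch of the trylock approach. The tiny spinlock built on GCC/Clang `__atomic` builtins is a hypothetical stand-in for `rte_spinlock_t`, and `struct vq` and `async_channel_unregister()` are simplified. If the caller already holds the queue lock (for instance from inside a vring state callback), a blocking lock would spin forever; the trylock version returns an error instead, so the caller can retry later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for rte_spinlock_t. */
typedef struct {
	int locked;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
	while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE))
		;
}

static int spin_trylock(spinlock_t *l)
{
	return __atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

static void spin_unlock(spinlock_t *l)
{
	__atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}

struct vq {
	spinlock_t access_lock;
	bool async_registered;
};

/* Returns 0 on success, -1 if the queue lock is already held. */
static int async_channel_unregister(struct vq *vq)
{
	if (!spin_trylock(&vq->access_lock))
		return -1; /* busy: do not deadlock, let the caller retry */

	vq->async_registered = false;
	spin_unlock(&vq->access_lock);
	return 0;
}
```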
Add a check on the async vector buffer usage to prevent buffer overruns.
If the unused vector buffer space is not sufficient to prepare the next
packet's iov, an async transfer is triggered immediately to free the
vector buffer.
Fixes: 78639d54563a ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
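A minimal sketch of the capacity check, under hypothetical names: `VEC_MAX` stands for the iovec buffer capacity, `flush_async()` for triggering the async transfer, and `prepare_packet()` for building one packet's iov entries. Before reserving a packet's segments, the remaining space is compared against what the packet needs; if it does not fit, pending work is submitted first so the buffer can be reused instead of overrun.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_MAX 16 /* hypothetical capacity of the async iovec buffer */

struct async_state {
	size_t vec_used;  /* iovec slots filled but not yet submitted */
	size_t submitted; /* number of transfers triggered (for illustration) */
};

/* Hypothetical transfer trigger: submit pending iovecs, free the buffer. */
static void flush_async(struct async_state *s)
{
	if (s->vec_used == 0)
		return;
	s->submitted++;
	s->vec_used = 0;
}

/* Reserve room for one packet needing nr_segs iovec entries. */
static int prepare_packet(struct async_state *s, size_t nr_segs)
{
	if (nr_segs > VEC_MAX)
		return -1; /* can never fit, even in an empty buffer */
	if (VEC_MAX - s->vec_used < nr_segs)
		flush_async(s); /* not enough room: transfer now */
	s->vec_used += nr_segs;
	return 0;
}
```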
Allocate the async internal memory buffers with rte_malloc(), replacing
the array declarations inside the vq structure. Dynamic allocation helps
to reduce the memory footprint when the async path is not registered.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The current async ops allow the check_completed_copies() callback to
return an arbitrary number of async iov segments completed by the backend
async devices. This design makes it complex for vhost to handle a broken
transfer of a single packet (i.e. a transfer that completes in the middle
of an async descriptor) and prevents application callbacks from
leveraging hardware capabilities to offload the work. Thus, this patch
requires the check_completed_copies() callback to return the number
of async memory descriptors, which is aligned with the async transfer
data ops callbacks. The vhost async data path is revised to work with
the new ops definition, which provides clean and simplified processing.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add a function to calculate the length of an IPv4 header as suggested
on the mailing list [1]. Call where appropriate.
[1] https://mails.dpdk.org/archives/dev/2020-October/184471.html
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Michael Pfeiffer <michael.pfeiffer@tu-ilmenau.de>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
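A minimal sketch of the calculation such a helper performs: the IHL field (low 4 bits of the first header byte) counts 32-bit words, so the header length in bytes is IHL * 4. The struct and macro names below are stand-ins chosen to echo DPDK's `rte_ipv4_hdr` naming; this is not the library's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in holding only the first IPv4 header byte (version + IHL). */
struct ipv4_hdr_min {
	uint8_t version_ihl;
};

#define IPV4_HDR_IHL_MASK   0x0f /* low nibble: header length in words */
#define IPV4_IHL_MULTIPLIER 4    /* bytes per 32-bit word */

static inline uint8_t ipv4_hdr_len(const struct ipv4_hdr_min *h)
{
	return (uint8_t)((h->version_ihl & IPV4_HDR_IHL_MASK) *
			 IPV4_IHL_MULTIPLIER);
}
```

For a header without options, `version_ihl` is 0x45, giving 20 bytes; the maximum IHL of 15 gives 60 bytes.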
This small optimization uses the static Virtio-net
header length in the packed datapath, since the Virtio-net
header cannot be the legacy one in the case of the packed ring.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the packed ring layout has been negotiated, but neither
Version 1 nor mergeable buffers, the Virtio-net header length
is set to the legacy devices' value, which is wrong.
This patch fixes this by using the proper length, as devices
using the packed ring are not legacy devices.
Fixes: a922401f35cc ("vhost: add Rx support for packed ring")
Fixes: ae999ce49dcb ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Reported-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
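A minimal sketch of the header length selection, assuming the feature bit numbers from the virtio specification (MRG_RXBUF = 15, VERSION_1 = 32, RING_PACKED = 34) and the two header sizes (10 bytes for the legacy `struct virtio_net_hdr`, 12 bytes once the `num_buffers` field is present). `vhost_hlen()` is a hypothetical helper; the point is that negotiating the packed ring alone must already select the 12-byte header.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_F_VERSION_1     32
#define VIRTIO_F_RING_PACKED   34

#define VNET_HDR_LEN_LEGACY 10 /* struct virtio_net_hdr */
#define VNET_HDR_LEN_MRG    12 /* header including num_buffers */

/* Packed ring implies a non-legacy device, hence the 12-byte header. */
static uint32_t vhost_hlen(uint64_t features)
{
	const uint64_t modern = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
				(1ULL << VIRTIO_F_VERSION_1) |
				(1ULL << VIRTIO_F_RING_PACKED);

	return (features & modern) ? VNET_HDR_LEN_MRG : VNET_HDR_LEN_LEGACY;
}
```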
In virtio_dev_extbuf_alloc(), the shinfo structure used to store
the reference counter and the free callback of the external buffer
is by default stored inside the mbuf data.
This is wrong because the mbuf (and its data) can be freed before
the external buffer, for instance in the following situation:
    pkt2 = rte_pktmbuf_alloc(mp);
    rte_pktmbuf_attach(pkt2, pkt);
    rte_pktmbuf_free(pkt);
After this, pkt is freed, but it still contains shinfo, which is
referenced by pkt2.
Fix this by always storing the shinfo beside the external buffer.
Fixes: c3ff0ac70acb ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to pass the driver type along with the fixed vhost features.
In addition, the patch provides a new API specifically for
the application to start a vhost-crypto driver.
Fixes: 939066d96563 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The removal of dequeue zero-copy was announced in DPDK v20.08.
This feature brings constraints which make the maintenance
of the Vhost library difficult. Its limitations also make it
difficult for applications to use (Tx vring starvation).
Removing it makes it easier to add new features, and also removes
some code in the hot path, which should bring a performance
improvement for the standard path.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
As announced in v20.08, this patch makes the vDPA
and related Vhost API stable.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes a possible time-of-check to time-of-use (TOCTOU)
attack by copying the request data and descriptor index to local
variables prior to processing.
Also, the original sequential read of descriptors may lead to a TOCTOU
attack. This patch fixes the problem by loading all descriptors of a
request into a local buffer before processing.
CVE-2020-14375
Fixes: 3bb595ecd682 ("vhost/crypto: add request handler")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
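A minimal sketch of the snapshot approach: walk the descriptor chain once, copying each descriptor into a local array, and make every later decision from the local copy only. `struct desc`, `DESC_F_NEXT`, `MAX_CHAIN` and `snapshot_chain()` are hypothetical simplifications of the vhost-crypto code. Bounding the walk by `MAX_CHAIN` also rejects a chain that loops back on itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHAIN   8 /* hypothetical max descriptors per request */
#define DESC_F_NEXT 1 /* chain continues at .next */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Copy the chain starting at head into local[] (which must hold
 * MAX_CHAIN entries). Each shared descriptor is read once; flags and
 * next are then taken from the local copy, so later guest writes to the
 * ring cannot change what is processed. Returns the number of
 * descriptors copied, or -1 for an out-of-range index or too long a chain.
 */
static int snapshot_chain(const struct desc *ring, uint16_t ring_size,
			  uint16_t head, struct desc *local)
{
	uint16_t idx = head;
	int n = 0;

	while (n < MAX_CHAIN) {
		if (idx >= ring_size)
			return -1;
		memcpy(&local[n], &ring[idx], sizeof(local[n]));
		if (!(local[n].flags & DESC_F_NEXT))
			return n + 1;
		idx = local[n].next;
		n++;
	}
	return -1; /* chain too long, or it loops */
}
```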