Async enqueue offloads large copies to DMA devices, and small copies
are still performed by the CPU. However, it requires users to get
enqueue completed packets by rte_vhost_poll_enqueue_completed(), even
if they are completed by the CPU when rte_vhost_submit_enqueue_burst()
returns. This design incurs extra overheads of tracking completed
pktmbufs and function calls, thus degrading performance on small packets.
This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Simply replace the smp barriers with atomic thread fence for vhost control
path, if there are no synchronization points.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
gpa_to_hpa() function almost always fails due to the wrong setup of
the binary tree search key. Since there has already been a similar
function gpa_to_first_hpa() available in the vhost, instead of fixing
the issue in its original logic, gpa_to_hpa() function is rewritten to
be a wrapper of the gpa_to_first_hpa() to avoid code redundancy.
Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Fixes: faa9867c4da2 ("vhost: use binary search in address conversion")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add check on the async vector buffer usage to prevent the buf overrun.
If the unused vector buffer is not sufficient to prepare for next
packet's iov creation, an async transfer will be triggered immediately
to free the vector buffer.
Fixes: 78639d54563a ("vhost: introduce async enqueue registration API")
Cc: stable@dpdk.org
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Allocate async internal memory buffer by rte_malloc(), replacing array
declaration inside vq structure. Dynamic allocation can help to save
memory footprint when async path is not registered.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Current async ops allows check_completed_copies() callback to return
arbitrary number of async iov segments finished from backend async
devices. This design creates complexity for vhost to handle breaking
transfer of a single packet (i.e. transfer completes in the middle
of a async descriptor) and prevents application callbacks from
leveraging hardware capability to offload the work. Thus, this patch
enforces the check_completed_copies() callback to return the number
of async memory descriptors, which is aligned with async transfer
data ops callbacks. vhost async data path are revised to work with
new ops define, which provides a clean and simplified processing.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Dequeue zero-copy removal was announced in DPDK v20.08.
This feature brings constraints which makes the maintenance
of the Vhost library difficult. Its limitations makes it also
difficult to use by the applications (Tx vring starvation).
Removing it makes it easier to add new features, and also remove
some code in the hot path, which should bring a performance
improvement for the standard path.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
vhost lib now does not have definition of reset status. This patch
adds the reset status definition and changes related log.
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Async copy fails when single ring buffer vector is split on multiple
physical pages. This happens because current hpa address translation
function doesn't handle multi-page buffers. A new gpa to hpa address
conversion function, which returns the hpa on the first hitting host
pages, is implemented in this patch. Async data path recursively calls
this new function to construct a multi-segments async copy descriptor
for ring buffers crossing physical page boundaries.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If rte_vhost_enable_guest_notification is called before
the virtqueue is ready, the configuration is lost.
This patch fixes this by saving the guest notification
enablement value requested by the application, and apply
it before the virtqueue is made ready to the application.
Fixes: 604052ae5395 ("net/vhost: support queue update")
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In async enqueue copy, a packet could be split into multiple copy
segments. When polling the copy completion status, current async data
path assumes the async device callbacks are aware of the packet
boundary and return completed segments only if all segments belonging
to the same packet are done. Such assumption are not generic to common
async devices and may degrade the copy performance if async callbacks
have to implement it in software manner.
This patch adds tracking of the completed copy segments at vhost side.
If async copy device reports partial completion of a packets, only
vhost internal record is updated and vring status keeps unchanged
until remaining segments of the packet are also finished. The async
copy device is no longer necessary to care about the packet boundary.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds support to the new Virtio device get status
Vhost-user message.
The driver can send this new message to read the device status.
One of the uses of this message is to ensure the feature negotiation has
succeeded. According to the virtio spec, after completing the feature
negotiation, the driver sets the FEATURE_OK status bit and re-reads it
to ensure the device has accepted the features.
This patch also clears the FEATURE_OK status bit if the feature
negotiation has failed to let the driver know about his failure.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch adds support to the new Virtio device status
Vhost-user protocol feature.
Getting such information in the backend helps to know
when the driver is done with the device configuration
and so makes the initialization phase more robust.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Performing large memory copies usually takes up a major part of CPU
cycles and becomes the hot spot in vhost-user enqueue operation. To
offload the large copies from CPU to the DMA devices, asynchronous
APIs are introduced, with which the CPU just submits copy jobs to
the DMA but without waiting for its copy completion. Thus, there is
no CPU intervention during data transfer. We can save precious CPU
cycles and improve the overall throughput for vhost-user based
applications. This patch introduces registration/un-registration
APIs for vhost async data enqueue operation. Together with the
registration APIs implementations, data structures and the prototype
of the async callback functions required for async enqueue data path
are also defined.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The vhost library provide an infrastructure in order to help the DPDK
users to manage vhost devices.
One of the infrastructure parts is the features enablement APIs.
Some features bits may be defined only in the internal file vhost.h in
case the kernel version doesn't include them.
Hence, user running on old kernel may not be able to manage thus
features.
Move all the feature bits definitions to the API file rte_vhost.h.
Fixes: db69be54b6ff ("vhost: hide internal code")
Fixes: 8d286dbeb8d7 ("vhost: fix multiple queue not enabled for old kernels")
Fixes: 3d3c6590b58c ("vhost: enable virtio MTU feature")
Fixes: 704098fc478c ("vhost: fix build with old kernels")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Some guest drivers may not configure disabled virtio queues.
In this case, the vhost management never notifies the application and
the vDPA device readiness because it waits to the device to be ready.
The current ready state means that all the virtio queues should be
configured regardless the enablement status.
In order to support this case, this patch changes the ready state:
The device is ready when at least 1 queue pair is configured and
enabled.
So, now, the application and vDPA driver are notifies when the first
queue pair is configured and enabled.
Also the queue notifications will be triggered according to the new
ready definition.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch split the vDPA header file in two, making
rte_vdpa_device structure opaque to the application.
Applications should only include rte_vdpa.h, while drivers
should include both rte_vdpa.h and rte_vdpa_dev.h.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
This removes the notion of device ID in Vhost library
as a preliminary step to get rid of the vDPA device ID.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
If Tx zero copy enabled, gpa to hpa mapping table is updated one by
one. This will harm performance when guest memory backend using 2M
hugepages. Now utilize binary search to find the entry in mapping
table, meanwhile set the threshold to 256 entries for linear search.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The rarp packet broadcast flag is synchronized with rte_atomic_XX APIs
which is a full barrier, DMB, on aarch64. This patch optimized it with
c11 atomic one-way barrier.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
VHOST_FEATURES has been removed in previous refactoring.
Fixes: 0917f9d1f059 ("vhost: use new APIs to handle features")
Cc: stable@dpdk.org
Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently, the log address translation only happens in the vhost-user's
translate_ring_addresses(). However, the IOTLB update handler is not
checking if it was mapped to re-trigger that translation.
Since the log address mapping could fail, check it on iotlb updates.
Also, check it on vring_translate() so we do not dirty pages if the
logging address is not yet ready.
Additionally, properly protect the accesses to the iotlb structures.
Fixes: fbda9f145927 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently there are a couple of limitations on the logging system: Most
of the logs are compiled out and both datapath and controlpath logs
share the same loglevel.
This patch tries to help fix that situation by:
- Splitting control plane and data plane logs
- Making control plane logs dynamic while keeping data plane logs
compiled out by default for log levels lower than the INFO.
As a result, two macros are introduced:
- VHOST_LOG_CONFIG(LEVEL, ...): Config path logging. Level can be
dynamically controlled by "lib.vhost.config"
- VHOST_LOG_DATA(LEVEL, ...): Data path logging. Level can be
dynamically controlled by "lib.vhost.data". Every log macro with a
level lower than RTE_LOG_DP_LEVEL (which defaults to RTE_LOG_INFO)
will be compiled out.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer used ring updates as many as possible in vhost dequeue function
for coordinating with virtio driver. For supporting buffer, shadow used
ring element should contain descriptor's flags. First shadowed ring
index was recorded for calculating buffered number.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Flush used elements when batched enqueue function is finished.
Descriptor's flags are pre-calculated as they will be reset by vhost.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer vhost packed ring enqueue updates, flush ring descs if buffered
content filled up one cacheline. Thus virtio can receive packets at a
faster frequency.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add batch dequeue function like enqueue function for packed ring, batch
dequeue function will not support chained descriptors, single packet
dequeue function will handle it.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Create macro for adding unroll pragma before for each loop. Batch
functions will be contained of several small loops which can be
optimized by compilers' loop unrolling pragma.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When enqueuing or dequeuing, the virtqueue's local available and used
indexes are increased.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.
While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.
To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch shows how to checkout the inflight ring and construct
the resubmit information also include destroying resubmit info.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add IOVA versions of dirty page logging functions.
Note that the API facing rte_vhost_log_write is not modified.
So, make explicit that it expects the address in GPA space.
Fixes: 69c90e98f483 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When IOMMU is enabled the incoming log address is in IOVA space. In that
case, look in IOTLB table and translate the resulting HVA to GPA.
If IOMMU is not enabled, the incoming log address is already a GPA so no
transformation is needed.
Fixes: 69c90e98f483 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, then the frontend
and backend are assumed to be implemented in software, that is they can
run on identical CPUs in an SMP configuration.
Thus a weak form of memory barriers like rte_smp_r/wmb, other than
rte_cio_r/wmb, is sufficient for this case(vq->hw->weak_barriers == 1)
and yields better performance.
For the above case, this patch helps yielding even better performance
by replacing the two-way barriers with C11 one-way barriers for avail
flags in packed ring.
Meanwhile, a read barrier is required to ensure ordering between
descriptor's flags and content reads [1]. With C11, load-acquire can
enforce the ordering instead of rmb barrier.
[1] https://patchwork.dpdk.org/patch/49109/
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds an operation callback which gets called every time
the library is waking up the guest trough an eventfd_write() call.
This can be used by 3rd party application, like OVS, to track the
number of times interrupts where generated. This might be of
interest to find out system-call were called in the fast path.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Handling of fragmented virtio-net header and indirect descriptors
tables was implemented to fix CVE-2018-1059. It should never
happen with healthy guests and so is already considered as
unlikely code path.
This patch moves these bits into non-inline dedicated functions
to reduce the I-cache pressure.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
In order to reduce the I-cache pressure, this patch removes
the inlining of the dirty pages logging functions, that we
can consider as cold path.
Indeed, these functions are only called while doing live
migration, so not called most of the time.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The VIRTIO_RING_F_EVENT_IDX feature of split ring might
be broken, as the value of signalled_used is invalid
after live migration, start up and virtio driver reload.
This patch fixes it by using signalled_used_valid.
In addition, this patch makes the VIRTIO_RING_F_EVENT_IDX
implementation of split ring match kernel backend to suppress
more interrupts.
Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification suppression")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reclaim outstanding zmbufs first before freeing memory regions,
otherwise there could be use-after-free.
Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Don't free the zero copy mbufs before they have been consumed,
otherwise there could be use-after-free.
Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The mbufs should also be restored in free_zmbufs().
Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Fixes: 3ebd930588b7 ("vhost: fix mbuf free")
Cc: stable@dpdk.org
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
External message callbacks are used e.g. by vhost crypto
to parse crypto-specific vhost-user messages.
We are now publishing the API to register those callbacks,
so that other backends outside of DPDK can use them as well.
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.
Also, move the write barrier in logging cache sync, so that it
is done only when logging is enabled. It means there is now
one more barrier for split ring when logging is enabled.
With Kernel's pktgen benchmark, ~3% performance gain is measured.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.
The available ring relay will synchronize the available entries, and
help to do desc validity checking.
The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.
The later patch will leverage these two helpers.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_detach_vdpa_device() is internally defined but not used, remove
it in this patch.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Flags could be updated in a separate process leading to the
inconsistent check.
Additionally, read marked as 'volatile' to highlight the shared
nature of the variable and avoid such issues in the future.
Fixes: d3211c98c456 ("vhost: add helpers for packed virtqueues")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The packed ring defines were declared only if kernel
header does not declare them.
The problem is that they are not applied in upstream kernel,
and some changes in the names have been required.
This patch declares the defines unconditionally, which
fixes potential build issues.
Fixes: 297b1e7350f6 ("vhost: add virtio packed virtqueue defines")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>