To support multi-queue, configure device
after call fd of all queues are set.
Signed-off-by: Andy Pei <andy.pei@intel.com>
Signed-off-by: Huang Wei <wei.huang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When boot from virtio blk device, seabios in QEMU only enables one queue.
To work in this scenario, vDPA BLK device back-end configure device
when the first queue is ready.
Signed-off-by: Andy Pei <andy.pei@intel.com>
Signed-off-by: Huang Wei <wei.huang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently in function vhost_user_msg_handler, variable ret is used to
store both vhost msg result code and function call return value.
After this patch, variable ret is used only to store function call
return value, a new dedicated variable msg_result is used to
store vhost msg result. This can improve readability.
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Those macros have no real value and are easily replaced with a simple
if() block.
Existing users have been converted using a new cocci script.
Deprecate them.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Having a back reference to the index of the vq in the dev->virtqueue[]
array makes it possible to unify the internal API, with only passing dev
and vq.
It also allows displaying the vq index in log messages.
Remove virtqueue index checks where unneeded (like in static helpers
called from a loop on all available virtqueue).
Move virtqueue index validity checks the sooner possible.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
translate_ring_addresses and numa_realloc may change a virtio device and
virtio queue. Callers of those helpers must be extra careful and refresh
any reference to old data.
Change those functions prototype as a way to hint about this issue and
always ask for an indirect pointer.
Besides, when reallocating the device and queue, the code already made
sure it will return a pointer to a valid device. The checks on such
returned pointer can be removed.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
translate_ring_addresses (via numa_realloc) may change a virtio device and
virtio queue.
The virtqueue object must be refreshed before accessing the lock.
Fixes: 04c27cb673 ("vhost: fix unsafe vring addresses modifications")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent previous to this patch).
Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.
Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
device information in the log messages was dropped.
Fixes: 52ade97e36 ("vhost: fix physical address mapping")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In the virtio blk vDPA live migration use case, before the live
migration process, QEMU will set call fd to vDPA back-end. QEMU
and vDPA back-end stand by until live migration starts.
During live migration process, QEMU sets kick fd and a new call
fd. However, after the kick fd is set to the vDPA back-end, the
vDPA back-end configures device and data path starts. The new
call fd will cause some kind of "re-configuration", this kind
of "re-configuration" cause IO drop.
After this patch, vDPA back-end configures device after kick fd
and call fd are well set and make sure no IO drops.
This patch only impact virtio blk vDPA device and does not impact
net device.
Fixes: 7015b65771 ("vdpa/ifc: add block device SW live-migration")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add support for VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG.
VHOST_USER_GET_CONFIG and VHOST_USER_SET_CONFIG message is only
supported by virtio blk VDPA device.
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In vhost_user_msg_handler(), if vhost message handling
failed, we should check whether the queue is locked and
release the lock before returning. Or, it will cause a
deadlock later.
Fixes: 7f31d4ea05 ("vhost: fix lock on device readiness notification")
Cc: stable@dpdk.org
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Some message handlers do not expect any file descriptor attached as
ancillary data.
Provide a common way to enforce this by adding a accepts_fd boolean in
the message handler structure. When a message handler sets accepts_fd to
true, it is responsible for calling validate_msg_fds with a right
expected file descriptor count.
This will avoid leaking some file descriptor by mistake when adding
support for new vhost user message types.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Move message handler description and callbacks into a single array and
remove unneeded VHOST_USER_MAX and VHOST_SLAVE_MAX enums.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Even if unlikely, a buggy vhost-user master might attach fds to inflight
messages. Add checks like for other types of vhost-user messages.
Fixes: d87f1a1cb7 ("vhost: support inflight info sharing")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In function vhost_user_set_inflight_fd, queue number in inflight
message is used to access virtqueue. However, queue number could
be larger than VHOST_MAX_VRING and cause write OOB as this number
will be used to write inflight info in virtqueue structure. This
patch checks the queue number to avoid the issue and also make
sure virtqueues are allocated before setting inflight information.
Fixes: ad0a4ae491 ("vhost: checkout resubmit inflight information")
Cc: stable@dpdk.org
Reported-by: Wenxiang Qian <leonwxqian@gmail.com>
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Following a rework, external message handlers were receiving a pointer
to a vhost_user message (as stated in the API), but lost the ability to
interact with fds attached to the message.
Restore the original layout and put a build check and reminders.
Bugzilla ID: 953
Fixes: 5e0099dc70 ("vhost: remove payload size limitation")
Reported-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jakub Poczatek <jakub.poczatek@intel.com>
Acked-by: Jakub Poczatek <jakub.poczatek@intel.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
This patch adds missing protection around vring_invalidate
and translate_ring_addresses calls in vhost_user_iotlb_msg.
Fixes: eefac9536a ("vhost: postpone device creation until rings are mapped")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When choosing IOVA as PA mode, IOVA is likely to be discontinuous,
which requires page by page mapping for DMA devices. To be consistent,
this patch implements page by page mapping instead of mapping at the
region granularity for both IOVA as VA and PA mode.
Fixes: 7c61fa08b7 ("vhost: enable IOMMU for async vhost")
Cc: stable@dpdk.org
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch renames the host_phys_addr to host_iova in guest_page
struct. The host_phys_addr is iova, it depends on the DPDK
IOVA mode.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds vDPA device cleanup callback to release resources on
vhost user connection close.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.
Remove redundant NULL pointer checks before free functions
found by nullfree.cocci
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
FDs at the end of the VhostUserMessage structure limits the size
of the payload. Move them to an other englobing structure, before
the header & payload of a VhostUserMessage.
Also removes a reference to fds in the VHUMsg structure defined in
drivers/net/virtio/virtio_user/vhost_user.c
Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch replaces multi-lines logs in multiple single-
line logs in order to ease logs filtering based on their
socket path.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch adds the Vhost socket path whenever possible in
order to make debugging possible when multiple Vhost
devices are in use. Some vhost-user layer functions are
modified to pass the device path down to the socket layer.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch adds the Vhost-user socket path to Vhost-user
layer logs in order to ease logs filtering.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch adds IOTLB mempool name when logging debug
or error messages, and also prepends the socket path.
to all the logs.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch adds log for vring related info in handling of vhost message
VHOST_USER_SET_VRING_BASE, which will be useful in live migration case.
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves async-related metadata from vhost_virtqueue
to a dedicated struct. It makes it clear which fields are
async related, and also saves some memory when async feature
is not in use.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Async DMA map status flag was added to prevent the unnecessary unmap
when DMA devices bound to kernel driver. This brings maintenance cost
for a lot of code. This patch removes the DMA map status by using
rte_errno instead.
This patch relies on the following patch to fix a partial
unmap check in vfio unmapping API.
[1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The use of IOMMU has many advantages, such as isolation and address
translation. This patch extends the capability of DMA engine to use
IOMMU if the DMA engine is bound to vfio.
When set memory table, the guest memory will be mapped
into the default container of DPDK.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Old IOVA cache entries are left when there is a change on virtio driver
in VM. In case that all these old entries have iova addresses lesser
than new iova entries, vhost code will need to iterate all the cache to
find the new ones. In case of just a new iova entry needed for the new
translations, this condition will last forever.
This has been observed in virtio-net to testpmd's vfio-pci driver
transition, reducing the performance from more than 10Mpps to less than
0.07Mpps if the hugepage address was higher than the networking
buffers. Since all new buffers are contained in this new gigantic page,
vhost needs to scan IOTLB_CACHE_SIZE - 1 for each translation at worst.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reported-by: Pei Zhang <pezhang@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Added macro to print six bytes of MAC address.
The MAC addresses will be printed in upper case
hexadecimal format.
In case there is a specific check for lower case
MAC address, the user may need to make a change in
such test case after this patch.
Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When the vhost-user frontend like Virtio-user tries to
reconnect to the restarted Vhost backend, the Vhost backend
segfaults when multiqueue is enabled.
This is caused by VHOST_USER_GET_VRING_BASE being called for
a virtqueue that has not been created before, causing a NULL
pointer dereferencing.
This patch adds the VHOST_USER_GET_VRING_BASE requests to
the list of requests that trigger queue pair allocations.
Fixes: 160cbc815b ("vhost: remove a hack on queue allocation")
Cc: stable@dpdk.org
Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest memory is hotplugged, the vhost application which
enables DMA acceleration must stop DMA transfers before the vhost
re-maps the guest memory.
This patch is to notify the vhost application of stopping DMA
transfers.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost notifies the application of device readiness via
vhost_user_notify_queue_state(), but calling this function
is not protected by the lock. This patch is to make this
function call lock protected.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.
While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.
For the virtio device metadata, we decide to allocate them
on the same node as the VQ 0.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Since the Vhost-user device initialization has been reworked,
enabling the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no more
happened on queue pairs other than the first one since
numa_realloc() was returning early if the device was running.
This patch fixes this issue by reallocating the device metadata
only if the device is running. For the virtqueues, a vring state
change notification is sent to notify the application of its
disablement. Since the callback is supposed to be blocking, it
is safe to reallocate it afterwards.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the log cache on the new NUMA node was
not done. This patch fixes this by reallocating it if it has
been allocated already, which means a live-migration is
on-going.
Fixes: 1818a63147 ("vhost: move dirty logging cache out of virtqueue")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.
However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA accesses for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: 552e8fd3d2 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
IOTLB messages will be sent when some queues are not enabled. If we
initialize IOTLB in vhost_user_set_vring_num, it could happen that IOTLB
update comes when IOTLB pool of disabled queues are not initialized.
Fixes: 968bbc7e2e ("vhost: avoid IOTLB mempool allocation while IOMMU disabled")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
there is no need for vhost_user_set_vring_kick() to
notify the application of vring enabled, as
vhost_user_msg_handler() also notifies the application.
This patch is to remove unnecessary vring_state_changed() call.
Fixes: d0fcc38f5f ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch removes unnecessary rte_free() for async_pkts_info
and async_descs_split.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>