Currently, the log address translation only happens in the vhost-user's
translate_ring_addresses(). However, the IOTLB update handler is not
checking if it was mapped to re-trigger that translation.
Since the log address mapping could fail, check it on iotlb updates.
Also, check it on vring_translate() so we do not dirty pages if the
logging address is not yet ready.
Additionally, properly protect the accesses to the iotlb structures.
Fixes: fbda9f1459 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The frontend may not send the get_inflight_fd and
set_inflight_fd although we negotiate the protocol
feature. When we meet this situation just return OK.
Fixes: ad0a4ae491 ("vhost: checkout resubmit inflight information")
Cc: stable@dpdk.org
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds a check to ensure the read size of
the Vhost-user message header is not smaller than
the expected size.
In case of unexpected read size, report an error
and close file descriptors passed with the message,
if any.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
This patch catches an overflow that could happen if an
invalid region size or page alignment is provided by the
guest via the VHOST_USER_SET_MEM_TABLE request.
If the sum of the size to mmap and the alignment overflows
uint64_t, then RTE_ALIGN_CEIL(mmap_size, alignment) macro
will return 0. This value was passed as is as size argument
to mmap().
While kernel handling of mmap() syscall returns an error
if size is 0, it is better to catch it earlier and provide
a meaningful error log.
Fixes: ec09c280b8 ("vhost: fix mmap not aligned with hugepage size")
Cc: stable@dpdk.org
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Consider a virtqueue ready when, apart from the descriptor area,
both event suppression areas have been mapped.
Fixes: 2d1541e2b6 ("vhost: add vring address setup for packed queues")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The current implementation of vhost_net in packed vring tries to fill
the shadow vector before send any actual changes to the guest. While
this can be beneficial for the throughput, it conflicts with some
bufferfloats methods like the linux kernel napi, that stops
transmitting packets if there are too much bytes/buffers in the
driver.
To solve it, we flush the shadow packets at the end of
virtio_dev_tx_packed if we have starved the vring, i.e. the next
buffer is not available for the device.
Since this last check can be expensive because of the atomic, we only
check it if we have not obtained the expected "count" packets. If it
happens to obtain "count" packets and there is no more available
packets the caller needs to keep call virtio_dev_tx_packed again.
Fixes: 31d6c6a5b8 ("vhost: optimize packed ring dequeue")
Cc: stable@dpdk.org
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
According to recvmsg() specification, 0 is a valid
return code when client is disconnecting.
Therefore, it should not be reported as error, unless there
are other dependencies that require message to not be empty.
But there are none, since the next immediate caller of recvmsg()
reports "vhost peer closed" info (not error) when message is empty.
This patch changes return code check for recvmsg() so that
misleading error message is not printed when the code is 0.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
The vhost_user_read_cb() and rte_vhost_driver_unregister()
can be called at the same time by 2 threads. Eg thread1
calls vhost_user_read_cb() and removes the vsocket from
conn_list, then thread2 calls rte_vhost_driver_unregister()
and frees the vsocket since it is NOT in the conn_list.
So thread1 will access invalid memory when trying to
reconnect.
The fix is to move the "removing of vsocket from conn_list"
to end of the vhost_user_read_cb(), then avoid the race
condition.
The core trace is:
Program terminated with signal 11, Segmentation fault.
Fixes: af14759181 ("vhost: introduce API to start a specific driver")
Cc: stable@dpdk.org
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the vhost-user application (e.g. OVS) deletes the vhost-user
port while Qemu sends a vhost-user request, a deadlock can
happen if the request handler tries to acquire vhost-user's
global mutex, which is also locked by the vhost-user port
deletion API (rte_vhost_driver_unregister).
This patch prevents the deadlock by making
rte_vhost_driver_unregister() to release the mutex and try
again if a request is being handled to give a chance to
the request handler to complete.
Fixes: 8b4b949144 ("vhost: fix dead lock on closing in server mode")
Fixes: 5fbb3941da ("vhost: introduce driver features related APIs")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
This msg is used to notify qemu that should get the config of backend.
For example, vhost-user-blk uses this msg to notify guest OS the
capacity of backend has changed.
The need_reply flag is not mandatory because it will block the sender
thread and master process will send get_config message to fetch the
configuration, this need an extra thread to process the vhost message.
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
By default, a vhost socket is created without attaching VDPA device,
this patch fixes the initial value of vdpa_dev_id.
Fixes: b4953225ce ("vhost: add APIs for datapath configuration")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Currently there are a couple of limitations on the logging system: Most
of the logs are compiled out and both datapath and controlpath logs
share the same loglevel.
This patch tries to help fix that situation by:
- Splitting control plane and data plane logs
- Making control plane logs dynamic while keeping data plane logs
compiled out by default for log levels lower than the INFO.
As a result, two macros are introduced:
- VHOST_LOG_CONFIG(LEVEL, ...): Config path logging. Level can be
dynamically controlled by "lib.vhost.config"
- VHOST_LOG_DATA(LEVEL, ...): Data path logging. Level can be
dynamically controlled by "lib.vhost.data". Every log macro with a
level lower than RTE_LOG_DP_LEVEL (which defaults to RTE_LOG_INFO)
will be compiled out.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Merge all versions in linker version script files to DPDK_20.0.
This commit was generated by running the following command:
:~/DPDK$ buildtools/update-abi.sh 20.0
Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Since the library versioning for both stable and experimental ABI's is
now managed globally, the LIBABIVER and version variables no longer
serve any useful purpose, and can be removed.
The replacement in Makefiles was done using the following regex:
^(#.*\n)?LIBABIVER\s*:=\s*\d+\n(\s*\n)?
(LIBABIVER := numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
The replacement for meson files was done using the following regex:
^(#.*\n)?version\s*=\s*\d+\n(\s*\n)?
(version = numbers, optionally preceded by a comment and optionally
succeeded by an empty line)
[David]: those variables are manually removed for the files:
- drivers/common/qat/Makefile
- lib/librte_eal/meson.build
[David]: the LIBABIVER is restored for the external ethtool example
library.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Fix these as they are user visible. Found with codespell.
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 0cbce3a167 ("vfio: skip DMA map failure if already mapped")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When VHOST_USER_VRING_NOFD_MASK is set, the fd_num is 0,
so validate_msg_fds() will return error. In this case,
the negotiation of vring message between vhost user front end and
back end would fail, and as a result, vhost user link could NOT be up.
How to reproduce:
1.Run dpdk testpmd insides VM, which locates at host with ovs+dpdk.
2.Notice that inside ovs there are endless logs regarding failure to
handle VHOST_USER_SET_VRING_CALL, and link of vm could NOT be up.
Fixes: bf472259dd ("vhost: fix possible denial of service by leaking FDs")
Cc: stable@dpdk.org
Signed-off-by: Zhike Wang <wangzk320@163.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
A malicious Vhost-user master could send in loop hand-crafted
vhost-user messages containing more file descriptors the
vhost-user slave expects. Doing so causes the application using
the vhost-user library to run out of FDs.
This issue has been assigned CVE-2019-14818
Fixes: 8f972312b8 ("vhost: support vhost-user")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_set_vring_num() performs multiple allocations
without checking whether data were previously allocated.
It may cause a denial of service because of the memory leaks
that happen if a malicious vhost-user master keeps sending
VHOST_USER_SET_VRING_NUM request until the slave runs out
of memory.
This issue has been assigned CVE-2019-14818
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
After enqueue function finished, packet index has been increased. Batch
enqueue function should retrieve mbuf structure pointed by that index.
Fixes: 0294211bb6 ("vhost: optimize packed ring enqueue")
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Packets data are directly copied when doing batch enqueue, add missed
dirty page logging after memory copy.
Fixes: ef861692c3 ("vhost: add packed ring batch enqueue")
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Log feature is disabled in vhost user, so that log address was invalid
when checking. Check whether log address is valid can work around it.
Log address should also be translated in packed ring virtqueue.
Fixes: fbda9f1459 ("vhost: translate incoming log address to GPA")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Virtio spec only set rule that packed ring maximum size is up to 2^15
entries. Should not limit packed ring size to power of two.
Fixes: 708e14d8b9 ("vhost: advertize packed ring layout support")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use of %llx print formatting causes meson build error on Power systems with
RHEL 7.6 and gcc 4.8.5. Replace with PRIx64 macro.
Fixes: 9b62e2da18 ("vhost: register new regions with userfaultfd")
Cc: stable@dpdk.org
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Currently the IPv4 header checksum is calculated including its
current value, which can be a valid checksum or just garbage.
In any case, if the original value is not zero, then the result
is always wrong.
The IPv4 checksum is defined in RFC791, page 14 says:
Header Checksum: 16 bits
The checksum algorithm is:
The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.
Thus force the csum field to always be zero.
Fixes: b08b8cfeb2 ("vhost: fix IP checksum")
Cc: stable@dpdk.org
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If linear buffers requested and external buffers are not, vhost
will not be able to receive any buffer that doesn't fit in a
single mbuf. Moreover, if such a buffer will appear in a vring
it will never be dequeued and the whole vring will become dead
breaking the network connection.
Disable segmentation offloading from the host side to avoid
having such a big buffers.
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
mbuf allocation failure is a hard failure that highlights some
significant issues with memory pool size or a mbuf leak.
We still have the message for subsequent chained mbufs, but not
for the first one. It was removed while introducing extbuf
support for large buffers. But it was useful for catching
mempool issues and needs to be returned back.
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
When VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize dequeue
function by only update first used descriptor.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Optimize vhost device packed ring dequeue function by splitting batch
and single functions. No-chained and direct descriptors will be handled
by batch and other will be handled by single as before.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost packed ring zero copy batch and single dequeue functions like
normal dequeue path.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Optimize vhost device packed ring enqueue function by splitting batch
and single functions. Packets can be filled into one desc will be
handled by batch and others will be handled by single as before.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer used ring updates as many as possible in vhost dequeue function
for coordinating with virtio driver. For supporting buffer, shadow used
ring element should contain descriptor's flags. First shadowed ring
index was recorded for calculating buffered number.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Flush used elements when batched enqueue function is finished.
Descriptor's flags are pre-calculated as they will be reset by vhost.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Buffer vhost packed ring enqueue updates, flush ring descs if buffered
content filled up one cacheline. Thus virtio can receive packets at a
faster frequency.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add batch dequeue function like enqueue function for packed ring, batch
dequeue function will not support chained descriptors, single packet
dequeue function will handle it.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost single packet dequeue function for packed ring and meanwhile
left space for shadow used ring update function.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Batch enqueue function will first check whether descriptors are cache
aligned. It will also check prerequisites in the beginning. Batch
enqueue function do not support chained mbufs, single packet enqueue
function will handle it.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Create macro for adding unroll pragma before for each loop. Batch
functions will be contained of several small loops which can be
optimized by compilers' loop unrolling pragma.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add vhost enqueue function for single packet and meanwhile left space
for flush used ring function.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When enqueuing or dequeuing, the virtqueue's local available and used
indexes are increased.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The rte_vhost_dequeue_burst supports two ways of dequeuing data.
If the data fits into a buffer, then all data is copied and a
single linear buffer is returned. Otherwise it allocates
additional mbufs and chains them together to return a multiple
segments mbuf.
While that covers most use cases, it forces applications that
need to work with larger data sizes to support multiple segments
mbufs. The non-linear characteristic brings complexity and
performance implications to the application.
To resolve the issue, add support to attach external buffer
to a pktmbuf and let the host provide during registration if
attaching an external buffer to pktmbuf is supported and if
only linear buffer are supported.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add packed ring support in two APIs
so user can get the packed ring`.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two APIs. one is for getting inflgiht
ring and the other is for getting base.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces three APIs to operate the inflight
ring. Three APIs are set, set last and clear. It includes
split and packed ring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch shows how to checkout the inflight ring and construct
the resubmit information also include destroying resubmit info.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds the inflight queue region structure include
the split and packed.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the packed ring in the rte_vhost_vring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch add the inflight message description and
the inflight share fd protocol feature flag.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The simultaneous use of dequeue_zero_copy and IOMMU is problematic.
Not only because IOVA_VA mode is not supported but also because the
potential invalidation of guest pages while the buffers are in use,
is not handled.
Prevent these two features to be enabled simultaneously.
Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>