Add code to set up packed queues when enabled.
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Add some helper functions to check descriptor flags
and check if a vring is of type packed.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
To ease packed ring layout integration, this patch makes
the dequeue path to re-use buffer vectors implemented for
enqueue path.
Doing this, copy_desc_to_mbuf() is now ring layout type
agnostic.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
If devices always use descriptors in the same order in which they have
been made available. These devices can offer the VIRTIO_F_IN_ORDER
feature. If negotiated, this knowledge allows devices to notify the use
of a batch of buffers to virtio driver by only writing used ring index.
Vhost user device has supported this feature by default. If vhost
dequeue zero is enabled, should disable VIRTIO_F_IN_ORDER as vhost can’t
assure that descriptors returned from NIC are in order.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The log_cache_nb_elem was never incremented, resulting
in all dirty pages to be missed during live migration.
Fixes: c16915b87109 ("vhost: improve dirty pages logging performance")
Cc: stable@dpdk.org
Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
vhost_vring_call() used rte_mb(), which translates into
mfence instruction on x86.
This patch changes to use rte_smp_mb(), which changed recently
to translate into a locked ADD instruction for performance
reason.
The measured gain is up to 3% with the testpmd benchmarks.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Introduce an new common helper to avoid redundancy.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When a vDPA device is attached, vhost user will try to
register host notifiers to QEMU to allow notifications
to be delivered between the driver in the guest and the
vDPA device in the host directly.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch caches all dirty pages logging until the used ring index
is updated.
The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. While the fix was valid
as previous implementation wasn't safe against concurrent accesses,
contention was induced.
With this patch, during migration, we have:
1. Less atomic operations as only a single atomic OR operation
per 32 or 64 (depending on CPU) pages.
2. Less atomic operations as during a burst, the same page will
be marked dirty only once.
3. Less write memory barriers.
Fixes: 897f13a1f726 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
This reverts commit 394313fff39d0f994325c47f7eab39daf5dc9e11.
While the patch did solve concurrency issue, it induces more
pages copies as some clean pages are marked as dirty for
performance reasons. Moreover, as there is no more contention
doing the logging, the rate of packets than can be processed is
higher, leading to even more pages to be dirtied.
It has been reported that with more than one queue pair, and
with a relatively low packet rate (1Mpps), the live migration
never converges until the flow is stopped.
While a better solution is found, it is better to reset to the
old behaviour, i.e. using atomic operation for dirty pages
logging.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This new rte_vhost_va_from_guest_pa API takes an extra len
parameter, used to specify the size of the range to be mapped.
Effective mapped range is returned via len parameter.
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
There is currently no check done on the length when translating
guest addresses into host virtual addresses. Also, there is no
guanrantee that the guest addresses range is contiguous in
the host virtual address space.
This patch prepares vhost_iova_to_vva() and its callers to
return and check the mapped size. If the mapped size is smaller
than the requested size, the caller handle it as an error.
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, vhost library lacks the support to the vhost backend
other than net such as adding private data or registering vhost-user
message handlers. This patch fills the gap by adding data pointer and
vhost-user pre and post message handlers to vhost library.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds APIs for datapath configuration.
The did of the vhost-user socket can be set to identify the backend device,
in this case each vhost-user socket can have only 1 connection. The did is
set to -1 by default when the software datapath is used.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch exports vhost-user protocol features to support device driver
development.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost.h file uses bool type, but not include stdbool
header file. If other c files include vhost.h directly,
there will be a compile error.
This patch will be used in the next patch.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch aims at fixing a migration performance regression
faced since atomic operation is used to log pages as dirty when
doing live migration.
Instead of setting a single bit by doing an atomic read-modify-write
operation to log a page as dirty, this patch write 0xFF to the
corresponding byte, and so logs 8 page as dirty.
The advantage is that it avoids concurrent atomic operations by
multiple PMD threads, the drawback is that some clean pages are
marked as dirty and so are transferred twice.
Fixes: 897f13a1f726 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
LOG_DEBUG is a symbol defined by POSIX, so if sys/log.h is
included the symbols conflict.
This patch changes LOG_DEBUG to VHOST_LOG_DEBUG.
Fixes: 1c01d52392d5 ("vhost: add debug print")
Cc: stable@dpdk.org
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, get_device() is a function call. It's OK for slow path
configuration, but takes some cycles for data path.
To avoid that, we turn this function to inline type.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The librte_vhost API is used in two ways:
1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().
2. As a library for implementing vhost device backends.
There is no distinction between the two at the API level or in the
librte_vhost implementation. For example, device state is kept in
"struct virtio_net" regardless of whether this is actually a net device
backend or whether the built-in virtio_net.c driver is in use.
The virtio_net.c driver should be a librte_vhost API client just like
the vhost-scsi code and have no special access to vhost.h internals.
Unfortunately, fixing this requires significant librte_vhost API
changes.
This patch takes a different approach: keep the librte_vhost API
unchanged but track whether the built-in virtio_net.c driver is in use.
See the next patch for a bug fix that requires knowledge of whether
virtio_net.c is in use.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch fixes compile failure with old kernels which have no
VIRTIO_F_ANY_LAYOUT defined.
Fixes: 5a8bb6e9020f ("vhost: claim to support any layout feature")
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary
descriptor layouts. The vhost-user lib already supports it, but the
feature declaration is missing. This patch fixes the mismatch.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by message handler
done independently from vring usage by PMD threads.
This causes for example segfaults during live-migration
with MQ enable, but in general virtually any request
sent by qemu changing the state of device can cause
problems.
These patches fixes all above issues by adding a spinlock
to every vring and requiring message handler to start operation
only after ensuring that all PMD threads related to the device
are out of critical section accessing the vring data.
Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.
See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
Cc: stable@dpdk.org
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, Explicit Congestion Notification (ECN) includes two parts:
guest ECN and host ECN. Guest ECN means the frontend can handle TSO
packets which have ECN set, and host ECN means the backend can handle
TSO packets which have ECN set.
The ECN features are rarely used. However, virtio-net enables them by
default, and vhost-net support both. To make live migration from
vhost-net to vhost-user possible, this patch announces to support
guest and host ECN in vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is
negotiated. The driver set vring flags to 0, and MAY use used_event in
available ring to advise device interrupt util reach an index specified
by used_event. The device ignore the lower bit of vring flags, and send
an interrupt when index reach used_event.
The device can suppress notification in a manner analogous to the ways
driver suppress interrupt. The device manipulates flags or avail_event in
the used ring in the same way the driver manipulates flags or used_event in
available ring.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch extracts needed code for vhost_user.c to be able
to clean and free virtqueues unitary.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
and guest UFO. Guest UFO means the frontend can receive large UDP
packets, and host UFO means the backend can receive large UDP packets.
This patch supports host UFO and guest UFO for vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Extract the callfd eventfd signal operation so virtio_net.c does not
have to repeat it multiple times.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In virtio, Generic Segmentation Offload (GSO) is the feature for the
backend, which means the backend can receive packets with any GSO
type.
Virtio-net enables the GSO feature by default, and vhost-net supports it.
To make live migration from vhost-net to vhost-user possible, this patch
enables GSO for vhost-user.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Renamed data type from phys_addr_t to rte_iova_t.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Translating the start addresses of the rings is not enough, we need to
be sure all the ring is made available by the guest.
It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be be safe against vring pages
invalidates.
This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch postpones rings addresses translations and checks, as
addresses sent by the master shuld not be interpreted as long as
ring is not started and enabled[0].
When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().
[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.
When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.
When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that waiter
can return and restart later on with sending again a miss request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.
This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts for small packets PVP test.
This patch improves the performance for small packets, and has
distinguished the packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
do some special optimizations for the big packets too.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Each dirty page logging operation should be atomic. But it's not
atomic in current implementation. So it's possible that some dirty
pages can't be logged successfully when different threads try to
log different pages into the same byte of the log buffer concurrently.
This patch fixes this issue.
Fixes: b171fad1ffa5 ("vhost: log used vring changes")
Cc: stable@dpdk.org
Reported-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Different drivers use internal macros like force_inline for compiler
always inline feature.
Standardizing it through __rte_always_inline macro.
Verified the change by comparing the output binary file.
No difference found in the output binary file with this change.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>