610 Commits

Author SHA1 Message Date
Maxime Coquelin
74ee315e4f vhost: fix error handling when mem table gets updated
When the memory table gets updated, the ring addresses need
to be translated again. If the translation fails, we need to
exit cleanly by unmapping memory regions.

Fixes: d5022533c2 ("vhost: retranslate vring addr when memory table changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
57b4d90b58 vhost: fix payload size of reply
QEMU doesn't expect any payload for the reply of
VHOST_USER_SET_LOG_BASE request, so don't send any.
Note that the Vhost-user specification isn't clear about
it and would need to be fixed.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
7987eb1bc7 vhost: clarify reply-ack in case a reply was already sent
For messages that require a reply, a second ack should not be
sent when reply-ack protocol feature is negotiated, even if
the corresponding flag is set in the message.

The code is compliant with the spec but it isn't clear it is,
so this patch adds a comment to make it explicit.

Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
ef6fb7d3fd vhost: fix return code of messages requiring replies
VHOST_USER_GET_PROTOCOL_FEATURES, VHOST_USER_GET_VRING_BASE
and VHOST_USER_SET_LOG_BASE require replies, so their handlers
should return VH_RESULT_REPLY, not VH_RESULT_OK.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-10-18 10:24:39 +02:00
Maxime Coquelin
a52ac8ec27 vhost: fix messages results handling
The return value of message handlers has changed to an enum that
takes a positive value when a reply is needed. But the code
checking the variable afterwards has not been updated, so
successfully handled messages are treated as errors.

The return type of the external pre and post callbacks also needs
to be changed to the new enum, so that its handling is consistent.
This is done in this patch, along with the conversion of its only
user, the vhost-crypto backend.

Fixes: 0bff510b5e ("vhost: unify message handling function signature")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
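
The mismatch described in a52ac8ec27 can be sketched in C. This is only an illustration, not the patch itself: VH_RESULT_OK and VH_RESULT_REPLY are named in the log, while the VH_RESULT_ERR name, the numeric values, and both helper functions are assumptions made for the example.

```c
/* Sketch only: enumerator values are assumed for illustration. */
enum vh_result {
	VH_RESULT_ERR = -1,	/* assumed name and value for the error case */
	VH_RESULT_OK = 0,
	VH_RESULT_REPLY = 1,	/* positive: the handler prepared a reply */
};

/* Stale check (hypothetical): any non-zero result, including
 * VH_RESULT_REPLY, is reported as a failure. */
static int is_error_stale(enum vh_result ret)
{
	return ret != 0;
}

/* Updated check (hypothetical): only negative results are failures. */
static int is_error_fixed(enum vh_result ret)
{
	return ret < 0;
}
```

With the stale check, a handler returning VH_RESULT_REPLY is treated as an error; the updated check accepts it.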
Xiaolong Ye
d0d4887d62 vhost: add doxygen comment to vDPA header
As APIs in rte_vdpa.h are public, we need to add doxygen comments
to all APIs and structures.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-18 10:24:39 +02:00
Tiwei Bie
cd9012c3f8 vhost: fix notification for packed ring
The notification can't be disabled in packed ring when
application tries to disable notification, because the
device event flags field is overwritten by an unexpected
value. This patch fixes this issue.

Fixes: b1cce26af1 ("vhost: add notification for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2018-10-18 10:24:39 +02:00
Xiaolong Ye
0e0a7d3801 vhost: introduce API to get vDPA device number
It's used to get number of available registered vDPA devices.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-10-11 18:53:49 +02:00
Jiayu Hu
729199397f vhost: fix corner case for enqueue operation
When performing enqueue operations on the split and packed rings,
if the reserved buffer length from the descriptor table exceeds
65535, the returned length by fill_vec_buf_split/_packed()
overflows. This patch is to avoid this corner case.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")
Fixes: 2f3225a7d6 ("vhost: add vector filling support for packed ring")
Fixes: 37f5e79a27 ("vhost: add shadow used ring support for packed rings")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
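
The overflow described in 729199397f can be illustrated with a small C sketch. It is not the code from the patch: both helpers and their names are hypothetical, and the only assumption taken from the log is that a length above 65535 does not fit in a 16-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: accumulates buffer lengths in a 16-bit total,
 * which silently wraps once the sum exceeds 65535. */
static uint16_t sum_len_u16(const uint32_t *lens, int n)
{
	uint16_t total = 0;

	for (int i = 0; i < n; i++)
		total = (uint16_t)(total + lens[i]);
	return total;
}

/* Hypothetical checked variant: returns 0 and stores the total in *out,
 * or returns -1 if the total would not fit in 16 bits. */
static int sum_len_checked(const uint32_t *lens, int n, uint16_t *out)
{
	uint32_t total = 0;

	for (int i = 0; i < n; i++) {
		/* total never exceeds UINT16_MAX here, so no underflow. */
		if (lens[i] > UINT16_MAX - total)
			return -1;
		total += lens[i];
	}
	*out = (uint16_t)total;
	return 0;
}
```

For lengths 40000 and 30000 the unchecked sum wraps to 4464, while the checked variant reports the condition to the caller.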
Nikolay Nikolaev
2f270595c0 vhost: rework message handling as a callback array
Introduce vhost_message_handlers, which maps the message request
type to the message handler. Then replace the switch construct
with a map and call.

Failing vhost_user_set_features is fatal and all processing should
stop immediately and propagate the error to the upper layers. Change
the code accordingly to reflect that.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
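
The "map and call" pattern from 2f270595c0 can be sketched as follows. The log names the array vhost_message_handlers; everything else here (request codes, the message struct, handler bodies, return values) is hypothetical and kept minimal for illustration.

```c
#include <stddef.h>

/* Hypothetical request codes, for illustration only. */
enum { REQ_NONE = 0, REQ_GET_FEATURES = 1, REQ_SET_FEATURES = 2, REQ_MAX };

/* Hypothetical message type. */
struct msg {
	int request;
	int payload;
};

typedef int (*msg_handler_t)(struct msg *m);

static int handle_get_features(struct msg *m)
{
	m->payload = 0x1;	/* fill in the reply payload */
	return 1;		/* assumed "reply" result */
}

static int handle_set_features(struct msg *m)
{
	return m->payload ? 0 : -1;	/* assumed "OK" / "error" results */
}

/* Array indexed by request type; unlisted entries stay NULL. */
static const msg_handler_t handlers[REQ_MAX] = {
	[REQ_GET_FEATURES] = handle_get_features,
	[REQ_SET_FEATURES] = handle_set_features,
};

/* Replaces a switch statement: bounds-check, look up, call. */
static int dispatch(struct msg *m)
{
	if (m->request <= REQ_NONE || m->request >= REQ_MAX)
		return -1;
	if (handlers[m->request] == NULL)
		return -1;
	return handlers[m->request](m);
}
```

A fatal failure, such as the set-features case mentioned in the log, is then simply the negative return value of dispatch() propagated to the caller.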
Nikolay Nikolaev
0bff510b5e vhost: unify message handling function signature
Each vhost-user message handling function will return an int result
which is described in the new enum vh_result: error, OK and reply.
All functions will now have two arguments, virtio_net double pointer
and VhostUserMsg pointer.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
fd29c33b65 vhost: handle unsupported message types in functions
Add new functions to handle the unsupported vhost message types:
 - vhost_user_set_vring_err
 - vhost_user_set_log_fd

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
e951355ffc vhost: make message handling functions prepare the reply
As VhostUserMsg structure is reused to generate the reply, move the
relevant fields update into the respective message handling functions.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Nikolay Nikolaev
44eb792f9f vhost: unify struct VhostUserMsg usage
Do not use the typedef version of struct VhostUserMsg. Also unify the
related parameter name.

Signed-off-by: Nikolay Nikolaev <nicknickolaev@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-28 01:41:03 +02:00
Tiwei Bie
58e90a9113 vhost: fix return value on enqueue path
Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
0d7853a4da vhost-user: drop connection on message handling failures
There are a lot of cases where vhost-user message handling
could fail and end up in a completely unrecoverable state. For
example, allocation failures of the shadow used ring and the
batched copy array are not recoverable and lead to segmentation
faults like this on the receiving/transmission path:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7f913fecf0 (LWP 43625)]
  in copy_desc_to_mbuf () at /lib/librte_vhost/virtio_net.c:760
  760       batch_copy[vq->batch_copy_nb_elems].dst =

This could be easily reproduced in case of low memory or big
number of vhost-user ports.

Fix that by propagating error to the upper layer which will
end up with disconnection in case we can not report to
the message sender when the error happens.

Fixes: f689586bc0 ("vhost: shadow used ring update")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Tiwei Bie
77de7c781c vhost: fix vhost interrupt support
When VIRTIO_RING_F_EVENT_IDX is negotiated, we need to
update the avail event to enable the notification.

Fixes: 3f8ff12821 ("vhost: support interrupt mode")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-14 20:08:41 +02:00
Ilya Maximets
28925156d9 vhost: fix zmbufs array leak after NUMA realloc
'numa_realloc()' allocates 'zmbufs' even if zero copy mode
is not configured. This leads to memory leak, because array
is freed only for zero copy case.

Fixes: 2651726def ("vhost: do deep copy while reallocating queue")
CC: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-09-12 19:10:09 +02:00
Ilya Maximets
d02f2092a3 vhost: suppress error if NUMA is not available
It's a common case that 'get_mempolicy' fails on systems
without NUMA support. No need to flag an error in log for
this situation.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-08-28 15:27:39 +02:00
Maxime Coquelin
af53db4867 vhost: flush IOTLB cache on new mem table handling
IOTLB entries contain the host virtual address of the guest
pages. When receiving a new VHOST_USER_SET_MEM_TABLE request,
the previous regions get unmapped, so the IOTLB entries, if any,
will be invalid. It does cause the vhost-user process to
segfault.

This patch introduces a new function to flush the IOTLB cache,
and calls it as soon as the backend handles a VHOST_USER_SET_MEM
request.

Fixes: 69c90e98f4 ("vhost: enable IOMMU support")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-08-05 01:47:47 +02:00
Tiwei Bie
adead74939 vhost: remove unused variable
The nr_updated is just increased and not really used.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
2018-08-02 04:41:49 +02:00
Tiwei Bie
0989161b26 vhost: release locks on RARP packet failure
Fixes: eefac9536a ("vhost: postpone device creation until rings are mapped")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-07-26 10:02:52 +02:00
Tiwei Bie
14962a9c4f vhost: fix overflow on shadow used ring
The shadow used ring's size is the same as the vq's size,
so we shouldn't try more than "vq size" times. Besides,
the element pointed to by avail->idx isn't available to the
device, so we will return an error after trying "vq size" times.

Fixes: 24e4844048 ("vhost: unify Rx mergeable and non-mergeable paths")
Fixes: a922401f35 ("vhost: add Rx support for packed ring")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-07-26 10:02:50 +02:00
Jiayu Hu
6e2fad861a vhost: fix return value on dequeue path
This patch fixes the incorrect return value for rte_vhost_dequeue_burst()
when virtqueue is not enabled or virtqueue address translation fails.

Fixes: 62250c1d09 ("vhost: extract split ring handling from Rx and Tx functions")

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-26 10:02:43 +02:00
Tiwei Bie
04651f72d1 vhost: fix buffer length calculation
Fixes: fd68b4739d ("vhost: use buffer vectors in dequeue path")

Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
2018-07-23 23:55:26 +02:00
Dan Gora
9f976204f5 vhost/crypto: use function to access mbuf private area
Use rte_mbuf_to_priv() to access the private data area in the mbuf.

Signed-off-by: Dan Gora <dg@adax.com>
2018-07-13 23:14:41 +02:00
Maxime Coquelin
b1cce26af1 vhost: add notification for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
ae999ce49d vhost: add Tx support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
a922401f35 vhost: add Rx support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
2f3225a7d6 vhost: add vector filling support for packed ring
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
cf3af2be49 vhost: create descriptor mapping function
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
37f5e79a27 vhost: add shadow used ring support for packed rings
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
3e2f9700bb vhost: append shadow used ring function names with split
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
62250c1d09 vhost: extract split ring handling from Rx and Tx functions
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
2c22b14388 vhost: clear batch copy index at copy time
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
d6315ce796 vhost: make indirect desc table copy desc type agnostic
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Maxime Coquelin
7e47bba30a vhost: clear shadow used table index at flush time
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Yuanhan Liu
2d1541e2b6 vhost: add vring address setup for packed queues
Add code to set up packed queues when enabled.

Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:19:29 +02:00
Jens Freimann
d3211c98c4 vhost: add helpers for packed virtqueues
Add some helper functions to check descriptor flags
and check if a vring is of type packed.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Jens Freimann
297b1e7350 vhost: add virtio packed virtqueue defines
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
611994fc1b vhost: improve prefetching in enqueue path
This is an optimization to prefetch next buffer while the
current one is being processed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
c1058a6b16 vhost: prefetch first descriptor in dequeue path
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
380a2adff3 vhost: improve prefetching in dequeue path
This is an optimization to prefetch next buffer while the
current one is being processed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
fd68b4739d vhost: use buffer vectors in dequeue path
To ease packed ring layout integration, this patch makes
the dequeue path to re-use buffer vectors implemented for
enqueue path.

Doing this, copy_desc_to_mbuf() is now ring layout type
agnostic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Maxime Coquelin
915cf94042 vhost: use shadow used ring in dequeue path
Relax used ring contention by reusing the shadow used
ring feature used by enqueue path.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-10 23:13:36 +02:00
Marvin Liu
22d2e78840 vhost: advertise support in-order feature
If a device always uses descriptors in the same order in which they
have been made available, it can offer the VIRTIO_F_IN_ORDER feature.
If negotiated, this knowledge allows the device to notify the virtio
driver of the use of a batch of buffers by only writing the used ring
index.

The vhost-user device supports this feature by default. If vhost
dequeue zero copy is enabled, VIRTIO_F_IN_ORDER should be disabled, as
vhost can't assure that descriptors returned from the NIC are in order.

Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-07-03 01:35:58 +02:00
Tiwei Bie
ad0fdb696a vhost: fix potential null pointer dereference
Coverity issue: 293097
Fixes: d90cf7d111 ("vhost: support host notifier")

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-07-03 01:35:58 +02:00
Maxime Coquelin
e11411b52a vhost: fix missing increment of log cache count
The log_cache_nb_elem was never incremented, resulting
in all dirty pages to be missed during live migration.

Fixes: c16915b871 ("vhost: improve dirty pages logging performance")
Cc: stable@dpdk.org

Reported-by: Peng He <xnhp0320@icloud.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-07-03 01:35:58 +02:00
Maxime Coquelin
63b113afa5 vhost: use SMP memory barrier before kicking guest
vhost_vring_call() used rte_mb(), which translates into
mfence instruction on x86.

This patch changes to use rte_smp_mb(), which changed recently
to translate into a locked ADD instruction for performance
reason.

The measured gain is up to 3% with the testpmd benchmarks.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-06-15 12:27:25 +02:00
Tonghao Zhang
2396806765 vhost: introduce new function helper
Introduce a new common helper to avoid redundancy.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 12:27:25 +02:00
Tiwei Bie
d90cf7d111 vhost: support host notifier
When a vDPA device is attached, vhost user will try to
register host notifiers to QEMU to allow notifications
to be delivered between the driver in the guest and the
vDPA device in the host directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:39 +02:00
Tonghao Zhang
76b1e4cec7 vhost: refine new device function
Make sure a valid device id is found before allocating
virtio_net; if not, return directly. This avoids
allocating and freeing virtio_net when there is
no valid device id.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-15 09:49:09 +02:00
Maxime Coquelin
c89e52d9c8 vhost: improve batched copies performance
Instead of copying batch_copy_nb_elems into the stack,
this patch uses it directly.

Small performance gain of 3% is seen when running PVP
benchmark.

Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-14 19:27:50 +02:00
Maxime Coquelin
24e4844048 vhost: unify Rx mergeable and non-mergeable paths
This patch reworks the vhost enqueue path so that a single
code path is used for both Rx mergeable or non-mergeable cases.

Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-06-14 19:27:50 +02:00
Maxime Coquelin
c16915b871 vhost: improve dirty pages logging performance
This patch caches all dirty pages logging until the used ring index
is updated.

The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. While the fix was valid
as previous implementation wasn't safe against concurrent accesses,
contention was induced.

With this patch, during migration, we have:
1. Fewer atomic operations, as only a single atomic OR operation
   is needed per 32 or 64 (depending on CPU) pages.
2. Fewer atomic operations, as during a burst, the same page will
   be marked dirty only once.
3. Fewer write memory barriers.

Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
2018-05-17 14:19:05 +02:00
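
The caching idea in c16915b871 can be sketched in C11. This is a simplified model, not the library code: the struct layout, the cache size, the function names, and the fixed 64-pages-per-word assumption are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_SLOTS 4	/* hypothetical cache size */

struct log_cache_entry {
	uint32_t offset;	/* index of the 64-bit word in the dirty log */
	uint64_t val;		/* bits to set in that word */
};

struct log_cache {
	struct log_cache_entry e[CACHE_SLOTS];
	unsigned int nb;
};

/* Record a dirty page locally, without any atomic operation.
 * Returns 0 if cached, -1 if the cache is full (the caller would
 * then have to fall back to a direct atomic update). */
static int cache_page(struct log_cache *c, uint64_t page)
{
	uint32_t offset = (uint32_t)(page / 64);
	uint64_t bit = 1ULL << (page % 64);

	for (unsigned int i = 0; i < c->nb; i++) {
		if (c->e[i].offset == offset) {
			c->e[i].val |= bit;	/* same word: merge the bit */
			return 0;
		}
	}
	if (c->nb == CACHE_SLOTS)
		return -1;
	c->e[c->nb].offset = offset;
	c->e[c->nb].val = bit;
	c->nb++;
	return 0;
}

/* Flush the cache: one atomic OR per cached word.
 * Returns the number of atomic operations performed. */
static unsigned int cache_flush(struct log_cache *c,
				_Atomic uint64_t *log_base)
{
	unsigned int ops = 0;

	for (unsigned int i = 0; i < c->nb; i++) {
		atomic_fetch_or(&log_base[c->e[i].offset], c->e[i].val);
		ops++;
	}
	c->nb = 0;
	return ops;
}
```

In this model, marking pages 3, 3, 5 and 70 costs two atomic operations at flush time instead of four, which corresponds to points 1 and 2 of the list in the commit message.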
Fan Zhang
3c79609fda vhost/crypto: handle virtually non-contiguous buffers
This patch enables the handling of buffers non-contiguous in
virtual address space in the vhost_crypto. Instead of using
rte_vhost_va_from_guest_pa(), the host virtual address is
converted by vhost_iova_to_vva() for wider use cases.

For copy mode, the copy length is limited to the chunk size,
next chunks VAs being fetched afterward.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-17 12:29:05 +02:00
Fan Zhang
2017bc3356 vhost/crypto: fix descriptor move
This patch fixes the redundant descriptor move in the copy mode
of vhost crypto. Originally the redundant descriptor move will
cause the message parsing error.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-17 12:29:05 +02:00
Maxime Coquelin
d5022533c2 vhost: retranslate vring addr when memory table changes
When the vhost-user master sends memory updates using the
VHOST_USER_SET_MEM request, the user backend unmaps and then
mmaps the memory regions again in its address space.

If the ring addresses have already been translated, they need to
be translated again as they point to unmapped memory.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:31:50 +01:00
Fan Zhang
dc3530ec0d vhost/crypto: fix symmetric ciphering
A bracket was misplaced in a condition check, this patch
fixes it.

Coverity issue: 277232, 277237
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:31:03 +01:00
Maxime Coquelin
8eac207d49 vhost: fix header copy to discontiguous desc buffer
In the loop to copy virtio-net header to the descriptor buffer,
destination pointer was incremented instead of the source
pointer.

Fixes: fb3815cc61 ("vhost: handle virtually non-contiguous buffers in Rx-mrg")
Fixes: 6727f5a739 ("vhost: handle virtually non-contiguous buffers in Rx")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:31:03 +01:00
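
The loop fixed by 8eac207d49 follows a common pattern: one contiguous source is copied into a destination split into chunks. The sketch below shows that pattern under stated assumptions; the function name, parameters, and chunk representation are hypothetical and do not come from the library.

```c
#include <stdint.h>
#include <string.h>

/* Copy len bytes from a contiguous header into a destination made of
 * n separate chunks (hypothetical representation). */
static void copy_hdr_chunked(uint8_t *const *chunks, const size_t *chunk_len,
			     int n, const uint8_t *src, size_t len)
{
	for (int i = 0; i < n && len > 0; i++) {
		size_t l = len < chunk_len[i] ? len : chunk_len[i];

		memcpy(chunks[i], src, l);
		/* Advance the source: each chunk has its own destination
		 * base, so it is the source pointer that must move. */
		src += l;
		len -= l;
	}
}
```

Incrementing the destination instead of the source would write the first bytes of the header into every chunk, which is the kind of bug the commit message describes.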
Tonghao Zhang
bfbf0143d1 vhost: fix typo in comment
Fixes: 3670686ab9 ("vhost: fix race for connection fd")
Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:31:03 +01:00
Tonghao Zhang
52d874dc67 vhost: fix crash on closing in client mode
When rte_vhost_driver_unregister destroys the vsocket, we
should set it to NULL after freeing it, because in client mode,
the conn may be added to the reconnect thread while the vsocket
is destroyed. In one case, if qemu creates a vhostuser port as a
server with the same unix path, the reconnect thread will
reconnect to it while the vsocket is destroyed.

To fix this:
1. Set vsocket to NULL after freeing it.
2. Remove the reconnection from the reconnect thread at a suitable
   position.

Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:30:48 +01:00
Tonghao Zhang
8b4b949144 vhost: fix dead lock on closing in server mode
When qemu closes the unix socket fd of the vhostuser port as a
server, and the vhostuser port is then immediately deleted on
openvswitch, there will be a deadlock.

A thread (fdset event thread):       B thread:
1. fdset_event_dispatch              rte_vhost_driver_unregister
2. set the fd busy to 1.             lock vsocket->conn_mutex
3. vhost_user_read_cb                fdset_del waits busy changed to 0.
4. vhost peer closed, remove the
   conn from vsocket->conn_list:
   lock vsocket->conn_mutex

5. set the fd busy to 0

Fixes: 65388b43f5 ("vhost: fix fd leaks for vhost-user server mode")
Cc: stable@dpdk.org

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-05-14 22:29:59 +01:00
Ferruh Yigit
04db1d0da7 lib: clear experimental version tag in linker scripts
Remove version tag from experimental block in linker version scripts
(.map files).

That label is not used by the linker and is informational only. It is
useful for version blocks, but for the experimental block it is not
useful and is confusing, so remove those labels.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2018-05-14 03:37:28 +02:00
Fan Zhang
613e827fb2 vhost/crypto: fix checks while moving descriptors
This patch fixes the final condition check while moving virtqueue
descriptors.

Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 19:49:20 +02:00
Fan Zhang
d4cc4c65df vhost/crypto: fix missing head correction
This patch fixes the missing head descriptor correction for
indirect descriptors.

Fixes: 0aee242841 ("vhost/crypto: move to safe GPA translation API")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 19:49:07 +02:00
Xiao Wang
dfdf4b84b8 vhost: fix vDPA set features
We should call set_features callback after setting features in virtio_net
structure, otherwise vDPA driver cannot get the right features.

Fixes: 07718b4f87 ("vhost: adapt library for selective datapath")

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 18:01:00 +01:00
Maxime Coquelin
bb77d555d4 vhost: revert avoid concurrency when logging dirty pages
This reverts commit 394313fff3.

While the patch did solve concurrency issue, it induces more
pages copies as some clean pages are marked as dirty for
performance reasons. Moreover, as there is no more contention
doing the logging, the rate of packets that can be processed is
higher, leading to even more pages to be dirtied.

It has been reported that with more than one queue pair, and
with a relatively low packet rate (1Mpps), the live migration
never converges until the flow is stopped.

While a better solution is found, it is better to reset to the
old behaviour, i.e. using atomic operation for dirty pages
logging.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 18:01:00 +01:00
Maxime Coquelin
996096e629 vhost/crypto: fix build with gcc 4.7.2
Build error has been reported by Intel build system:
SUSE12SP3_64 / Linux 3.7.10-1 / GCC 4.7.2
lib/librte_vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_set_zero_copy’:
lib/librte_vhost/vhost_crypto.c:1192:2: error:
comparison of unsigned expression < 0 is always false

As enums can be either signed or unsigned, this patch removes
the negative check and casts the upper limit check to unsigned.

Fixes: 939066d965 ("vhost/crypto: add public function implementation")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 11:31:39 +02:00
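
The range check described in 996096e629 can be sketched as below. The enum and its enumerators are hypothetical stand-ins, not the library's definitions; the point is only the single unsigned comparison.

```c
/* Hypothetical enum standing in for the zero-copy option. */
enum zc_mode {
	ZC_DISABLE = 0,
	ZC_ENABLE = 1,
	ZC_MAXNUM
};

/* A single unsigned comparison covers both ends of the range: if the
 * enum happens to be signed, a negative value converts to a large
 * unsigned number and fails the upper limit check; if it is unsigned,
 * no "< 0" comparison is present for the compiler to warn about. */
static int zc_mode_valid(enum zc_mode m)
{
	return (unsigned int)m < (unsigned int)ZC_MAXNUM;
}
```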
Olivier Matz
6383d2642b eal: set name when creating a control thread
To avoid code duplication, add a parameter to rte_ctrl_thread_create()
to specify the name of the thread.

This requires adding a wrapper for the thread start routine in
rte_thread_init(), which first waits until the thread is configured.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Olivier Matz
9e5afc72c9 eal: add function to create control threads
Many parts of dpdk use their own management threads. Introduce a new
wrapper for thread creation that will be extended in next commits to set
the name and affinity.

To be consistent with other DPDK APIs, the return value is negative in
case of error, which was not the case for pthread_create().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Olivier Matz
dec7b1884a use sizeof to avoid double use of a length define
Only a cosmetic change: the *_LEN defines are already used
when defining the buffer. Using sizeof() ensures that the length
stays consistent, even if the definition is modified.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Maxime Coquelin
9553e6e408 vhost: deprecate unsafe GPA translation API
This patch marks rte_vhost_gpa_to_vva() as deprecated because
it is unsafe. Application relying on this API should move
to the new rte_vhost_va_from_guest_pa() API, and check
returned length to avoid out-of-bound accesses.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
0aee242841 vhost/crypto: move to safe GPA translation API
This patch uses the new rte_vhost_va_from_guest_pa() API
to ensure all the descriptor buffer is mapped contiguously
in the application virtual address space.

It does not handle buffers discontiguous in host virtual
address space, but only return an error.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
fb3815cc61 vhost: handle virtually non-contiguous buffers in Rx-mrg
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers are used.

When the virtio-net header doesn't fit in a single chunk, it is
computed in a local variable and copied to the buffer chunks
afterwards.

For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterward.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
6727f5a739 vhost: handle virtually non-contiguous buffers in Rx
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers aren't used.

When the virtio-net header doesn't fit in a single chunk, it is
computed in a local variable and copied to the buffer chunks
afterwards.

For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterward.

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
91b7b40806 vhost: handle virtually non-contiguous buffers in Tx
This patch enables the handling of buffers non-contiguous in
process virtual address space in the dequeue path.

When the virtio-net header doesn't fit in a single chunk, it is
copied into a local variable before being processed.

For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterward.
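
A minimal sketch of such a chunked copy, under stated assumptions: the
region table, sketch_translate() and sketch_copy_from_guest() are
hypothetical names for illustration and are not the librte_vhost code.
Each iteration translates the current guest address, copies at most the
contiguously mapped chunk, then fetches the next chunk's VA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one mapped guest memory region. */
struct sketch_region {
	uint64_t gpa;   /* first guest physical address of the region */
	uint8_t *hva;   /* where the region is mapped in this process */
	uint64_t size;  /* region size in bytes */
};

/*
 * Translate gpa. On entry *len is the wanted size, on return it is the
 * size that is contiguously mapped from gpa. Returns NULL if unmapped.
 */
static uint8_t *
sketch_translate(const struct sketch_region *regs, size_t nregs,
		 uint64_t gpa, uint64_t *len)
{
	for (size_t i = 0; i < nregs; i++) {
		const struct sketch_region *r = &regs[i];

		if (gpa >= r->gpa && gpa - r->gpa < r->size) {
			uint64_t off = gpa - r->gpa;

			if (*len > r->size - off)
				*len = r->size - off;
			return r->hva + off;
		}
	}
	*len = 0;
	return NULL;
}

/* Copy len bytes starting at gpa into dst, one mapped chunk at a time. */
static int
sketch_copy_from_guest(const struct sketch_region *regs, size_t nregs,
		       void *dst, uint64_t gpa, uint64_t len)
{
	uint8_t *out = dst;

	while (len > 0) {
		uint64_t chunk = len;
		const uint8_t *src = sketch_translate(regs, nregs, gpa, &chunk);

		if (src == NULL || chunk == 0)
			return -1; /* hole in the guest address range */
		memcpy(out, src, chunk);
		out += chunk;
		gpa += chunk;
		len -= chunk;
	}
	return 0;
}
```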

This issue has been assigned CVE-2018-1059.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 17:12:13 +02:00
Maxime Coquelin
d0c24508e1 vhost: add support for non-contiguous indirect descs tables
This patch adds support for non-contiguous indirect descriptor
tables in VA space.

When it happens, which is unlikely, a table is allocated and the
non-contiguous content is copied into it.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
30920b1e2b vhost: ensure all range is mapped when translating QVAs
This patch ensures that the whole address range is mapped when
translating addresses from master's addresses (e.g. QEMU host
addresses) to process VAs.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
41333fba5b vhost: introduce safe API for GPA translation
This new rte_vhost_va_from_guest_pa API takes an extra len
parameter, used to specify the size of the range to be mapped.
Effective mapped range is returned via len parameter.
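
The in/out len convention can be sketched as follows. This is an
illustration only: struct mem_region, sketch_va_from_guest_pa() and
sketch_range_is_mapped() are hypothetical stand-ins, not the actual
rte_vhost_va_from_guest_pa() implementation or its exact signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest memory region. */
struct mem_region {
	uint64_t guest_phys_addr; /* first guest physical address */
	uint64_t host_virt_addr;  /* process VA the region is mapped at */
	uint64_t size;            /* region size in bytes */
};

/*
 * Translate gpa to a host virtual address. On entry *len is the size the
 * caller wants to access; on return it is the size that is contiguously
 * mapped starting at gpa, which may be smaller. Returns 0 if unmapped.
 */
static uint64_t
sketch_va_from_guest_pa(const struct mem_region *regs, size_t nregs,
			uint64_t gpa, uint64_t *len)
{
	for (size_t i = 0; i < nregs; i++) {
		const struct mem_region *r = &regs[i];

		if (gpa >= r->guest_phys_addr &&
		    gpa - r->guest_phys_addr < r->size) {
			uint64_t off = gpa - r->guest_phys_addr;
			uint64_t avail = r->size - off;

			if (*len > avail)
				*len = avail;
			return r->host_virt_addr + off;
		}
	}
	*len = 0;
	return 0;
}

/* Caller-side pattern: a mapping shorter than requested is an error. */
static int
sketch_range_is_mapped(const struct mem_region *regs, size_t nregs,
		       uint64_t gpa, uint64_t want)
{
	uint64_t len = want;
	uint64_t vva = sketch_va_from_guest_pa(regs, nregs, gpa, &len);

	return vva != 0 && len == want;
}
```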

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
070aceda33 vhost: check all range is mapped when translating GPAs
There is currently no check done on the length when translating
guest addresses into host virtual addresses. Also, there is no
guarantee that the guest address range is contiguous in
the host virtual address space.

This patch prepares vhost_iova_to_vva() and its callers to
return and check the mapped size. If the mapped size is smaller
than the requested size, the caller handles it as an error.

This issue has been assigned CVE-2018-1059.

Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Maxime Coquelin
c6ae7de0de vhost: fix indirect descriptors table translation size
This patch fixes the size passed when translating the indirect
descriptor table: it must be the len field of the descriptor,
not the size of a single descriptor.

This issue has been assigned CVE-2018-1059.

Fixes: 62fdb8255a ("vhost: use the guest IOVA to host VA helper")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-23 16:04:30 +02:00
Fan Zhang
b4ca812986 vhost/crypto: fix build without cryptodev
Vhost-Crypto shall not be compiled if rte_cryptodev is disabled.
This patch fixes this by adding a check to the Makefile.

Fixes: d090c7f86a76 ("vhost/crypto: update makefile")

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
2018-04-17 12:36:40 +02:00
Junjie Chen
3f8ff12821 vhost: support interrupt mode
In some cases we want vhost dequeue to work in interrupt mode, to
release CPUs to other tasks when there is no data to transmit. So we
install an interrupt handler for the vhost device and interrupt vectors
for each rx queue when creating a new backend, according to the vhost
interrupt configuration. Thus, applications can register an epoll event
fd to associate rx queues with interrupt vectors.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
939066d965 vhost/crypto: add public function implementation
This patch adds public API implementation to vhost crypto.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
3bb595ecd6 vhost/crypto: add request handler
This patch adds the implementation that translates virtio crypto
requests into DPDK crypto operations.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
e80a987081 vhost/crypto: add session message handler
This patch adds session message handler to vhost crypto.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
136076ed72 vhost/crypto: add user message structure
This patch adds virtio-crypto spec user message structure to
vhost_user.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Fan Zhang
9b91fbd6ec vhost/crypto: add vhost-user message handlers
Previously, the vhost library lacked support for vhost backends
other than net, such as adding private data or registering vhost-user
message handlers. This patch fills the gap by adding a data pointer and
vhost-user pre and post message handlers to the vhost library.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Jay Zhou
5303a48b53 vhost: add virtio crypto header file
Since the Linux kernel header file virtio_crypto.h was merged
in 4.9, including this header file directly makes compilation
fail in environments with older kernels, e.g. for the vhost crypto
backend series.
Add virtio_crypto.h to librte_vhost to make old kernels happy.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Lei Gong <arei.gonglei@huawei.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:43:30 +02:00
Zhihong Wang
bd2e0c3fe5 vhost: add APIs for live migration
This patch adds APIs to enable live migration for non-builtin data paths.

At src side, last_avail/used_idx from the device need to be set into the
virtio_net structure, and the log_base and log_size from the virtio_net
structure need to be set into the device.

At dst side, last_avail/used_idx need to be read from the virtio_net
structure and set into the device.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
07718b4f87 vhost: adapt library for selective datapath
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
b4953225ce vhost: add APIs for datapath configuration
This patch adds APIs for datapath configuration.

The device ID (did) of the vhost-user socket can be set to identify the
backend device. In this case each vhost-user socket can have only 1
connection. The did is set to -1 by default when the software datapath is used.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
d7280c9fff vhost: support selective datapath
This patch set introduces support for selective datapath in DPDK vhost-user
lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
virtio ring compatible devices to serve virtio driver directly to enable
datapath acceleration.

A set of device ops is defined for device specific operations:

     a. get_queue_num: Called to get supported queue number of the device.

     b. get_features: Called to get supported features of the device.

     c. get_protocol_features: Called to get supported protocol features of
        the device.

     d. dev_conf: Called to configure the actual device when the virtio
        device becomes ready.

     e. dev_close: Called to close the actual device when the virtio device
        is stopped.

     f. set_vring_state: Called to change the state of the vring in the
        actual device when vring state changes.

     g. set_features: Called to set the negotiated features to device.

     h. migration_done: Called to allow the device to respond to RARP
        sending.

     i. get_vfio_group_fd: Called to get the VFIO group fd of the device.

     j. get_vfio_device_fd: Called to get the VFIO device fd of the device.

     k. get_notify_area: Called to get the notify area info of the queue.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Zhihong Wang
2e28f45b69 vhost: export vhost feature definitions
This patch exports vhost-user protocol features to support device driver
development.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-14 00:40:21 +02:00
Jianfeng Tan
768274ebbd vhost: avoid populate guest memory
It's not necessary to populate guest memory from vhost side unless
zerocopy is enabled or users want better performance.

Update the doc for guest memory requirement clarification.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 17:25:45 +02:00
Tonghao Zhang
d64c43773a vhost: add pipe event for optimizing negotiation
When vhost-user connects to QEMU successfully, DPDK calls
vhost_user_add_connection to add the unix socket fd to the poll set.
But fdset_add only stores the socket fd in an fdentry, while poll
may already be sleeping. In the general case, this is not a problem. But
if we use hot update for vhost-user, the downtime of the VM network
is mostly 750+ms. This patch adds a pipe event so that, once connections
are established, DPDK rebuilds the poll set immediately. With this patch,
the downtime is mostly 20~30ms.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 17:25:45 +02:00
Tonghao Zhang
9426ee2678 vhost: move stdbool include
The vhost.h file uses the bool type, but does not include the stdbool
header file. If other C files include vhost.h directly,
there will be a compile error.

This patch will be used in the next patch.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Tonghao Zhang
ce5bd5fcae vhost: add fdset-event thread name
This patch adds the name for vhost fdset thread.
It can help us to know whether the thread is running.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Tonghao Zhang
2db2d3220b vhost: raise error on fdset-thread creation
When 'rte_vhost_driver_start' is first called, the
fdset_event_dispatch thread must be created successfully,
because vhost uses it to poll socket events for vhost
servers or clients. Without it, for example, vhost will not
get the connection event.

This patch returns an error code directly when the creation fails.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Maxime Coquelin
394313fff3 vhost: avoid concurrency when logging dirty pages
This patch aims at fixing a migration performance regression
seen since atomic operations are used to log pages as dirty when
doing live migration.

Instead of setting a single bit by doing an atomic read-modify-write
operation to log a page as dirty, this patch writes 0xFF to the
corresponding byte, and so logs 8 pages as dirty.

The advantage is that it avoids concurrent atomic operations by
multiple PMD threads, the drawback is that some clean pages are
marked as dirty and so are transferred twice.
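
The trade-off can be sketched as below. This is an illustration under
assumptions, not the librte_vhost code: the function names are
hypothetical, the log is assumed to be a bitmap with one bit per page,
and __atomic_fetch_or is the GCC/Clang builtin.

```c
#include <stdint.h>

/* Precise variant: atomic read-modify-write sets the bit of one page. */
static void
sketch_log_page_atomic(uint8_t *log_base, uint64_t page)
{
	__atomic_fetch_or(&log_base[page / 8],
			  (uint8_t)(1u << (page % 8)), __ATOMIC_RELAXED);
}

/*
 * Coarse variant: a plain store of 0xFF marks the page and its 7
 * neighbours sharing the same byte as dirty. Every writer stores the
 * same value, so no read-modify-write is needed.
 */
static void
sketch_log_page_coarse(uint8_t *log_base, uint64_t page)
{
	*(volatile uint8_t *)&log_base[page / 8] = 0xFF;
}
```

With the coarse variant, up to 7 clean pages per logged page may be
transferred again, which is the drawback described above.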

Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-03-30 14:08:44 +02:00
Tiwei Bie
7a36967029 vhost: do not generate signal when sendmsg fails
More precisely, do not generate a SIGPIPE signal if the peer
has closed the connection. Otherwise, it will terminate the
process by default. As a library, we should avoid terminating
the application process when an error happens and just need to
return with an error.
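
One way to get this behaviour on Linux is the MSG_NOSIGNAL flag of
sendmsg(). The sketch below assumes that approach; the helper name
sketch_send_nosignal() is hypothetical and the code is not taken from
librte_vhost.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <errno.h>

/*
 * Send len bytes on sockfd without raising SIGPIPE. If the peer has
 * closed the connection, -1 is returned with errno set to EPIPE.
 */
static ssize_t
sketch_send_nosignal(int sockfd, const void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msgh;
	ssize_t ret;

	memset(&msgh, 0, sizeof(msgh));
	iov.iov_base = (void *)buf; /* sendmsg() does not modify the data */
	iov.iov_len = len;
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	do {
		ret = sendmsg(sockfd, &msgh, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);

	return ret;
}
```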

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Tiwei Bie
71d93e9dd6 vhost: support sending fds via slave channel
This function will be used to send fds to QEMU via slave channel.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:44 +02:00
Ilya Maximets
1cf62d9685 vhost: add note about sockets in server mode
From time to time, someone sends patches about unlinking existing
sockets when registering a vhost user in server mode.

A recent example:
	http://dpdk.org/ml/archives/dev/2018-February/090025.html

This problem has been discussed many times, and it was made clear that
the library should not unlink files given by the application in order
to avoid possible security problems, such as removing random files
used by other programs.

One of the first discussions:
	http://dpdk.org/ml/archives/dev/2015-December/030326.html

To avoid such patches in the future, it was decided to add a comment
that explains what is happening and tries to describe the reasoning.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:43 +02:00
Tomasz Kulasek
90bb22a197 vhost: fix ring index returned to master on stop
According to the "Vhost-user Protocol" document,
VHOST_USER_GET_VRING_BASE should get the available vring base offset.

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
06fc115977 vhost: fix log macro name conflict
LOG_DEBUG is a symbol defined by POSIX, so if sys/log.h is
included the symbols conflict.

This patch changes LOG_DEBUG to VHOST_LOG_DEBUG.

Fixes: 1c01d52392 ("vhost: add debug print")
Cc: stable@dpdk.org

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Jianfeng Tan
ae034edaa6 vhost: avoid function call in data path
Previously, get_device() was a function call. It's OK for slow path
configuration, but takes some cycles in the data path.

To avoid that, we make this function inline.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Jianfeng Tan
bdf78f9f24 vhost: remove unused log constant
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
7afa2e4538 vhost: fix realloc failure
When reallocation of guest pages fails, vhost_user_set_mem_table() also
should fail.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
ace7b6b785 vhost: fix device cleanup at stop
This prevents destroying and recreating the user device in an "incomplete"
vring state. virtio_is_ready() was returning true for devices with
vrings which did not have a valid callfd (their VHOST_USER_SET_VRING_CALL
hadn't arrived yet).

Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
aa001111b0 vhost: check cmsg not null
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Tomasz Kulasek
fbc4d248b1 vhost: fix offset while mmaping log base address
QEMU always sets the offset to 0, but for sanity we should take the
offset into account.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
0fe99cf73e vhost: check overflow before mmap
If memory_size + mmap_offset overflows then the memory region is bogus.
Do not use the overflowed mmap_size value for mmap().
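
A minimal sketch of such a check, assuming both fields are 64-bit
unsigned values; sketch_mmap_size() is a hypothetical helper, not the
code of this patch.

```c
#include <stdint.h>

/*
 * Compute memory_size + mmap_offset. Returns 0 and stores the sum in
 * *mmap_size on success, or -1 if the addition would wrap around.
 */
static int
sketch_mmap_size(uint64_t memory_size, uint64_t mmap_offset,
		 uint64_t *mmap_size)
{
	if (mmap_offset > UINT64_MAX - memory_size)
		return -1; /* bogus region: do not call mmap() */
	*mmap_size = memory_size + mmap_offset;
	return 0;
}
```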

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
eb7c574b21 vhost: validate virtqueue size
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code.  For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.
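
As an illustration, a check of this kind could look like the sketch
below. It assumes the split-ring constraints of the virtio
specification (a non-zero power of 2, at most 32768); the helper name
and the macro are hypothetical and the exact checks in the patch may
differ.

```c
#include <stdbool.h>

#define SKETCH_VQ_MAX_SIZE 32768u /* assumed limit from the virtio spec */

/* Accept only non-zero powers of 2 up to SKETCH_VQ_MAX_SIZE. */
static bool
sketch_vq_size_is_valid(unsigned int num)
{
	if (num == 0 || num > SKETCH_VQ_MAX_SIZE)
		return false;
	return (num & (num - 1)) == 0;
}
```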

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
55659ed3ed vhost: fix message payload union in setting ring address
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state.  Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.

Fixes: 5cd690e4fd ("vhost: fix vring addresses not translated")
Cc: stable@dpdk.org

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
f83e0199c8 vhost: reject invalid log base mmap offset
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region.  We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
2042cf7194 vhost: clear out unused SCM_RIGHTS file descriptors
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:

  for (i = 0; i < memory.nregions; i++)
      close(pmsg->fds[i]);

Another questionable code snippet is:

  case VHOST_USER_SET_LOG_FD:
      close(msg.fds[0]);

If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()).  This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.

Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex.  It is simpler
for read_fd_message() to set unused elements in fds[] to -1.  This way
close(-1) is called and no harm is done.
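
The idea can be sketched as follows. sketch_fill_fds() is a
hypothetical helper standing in for the relevant part of
read_fd_message(), whose real code parses the SCM_RIGHTS control
message.

```c
/*
 * Copy the descriptors that were actually received into fds[] and set
 * every remaining slot to -1, so that a later close(fds[i]) on an
 * unused slot fails harmlessly instead of closing an unrelated fd.
 */
static void
sketch_fill_fds(int *fds, int max_fds, const int *received, int nreceived)
{
	int i;

	if (nreceived < 0)
		nreceived = 0;
	if (nreceived > max_fds)
		nreceived = max_fds;

	for (i = 0; i < nreceived; i++)
		fds[i] = received[i];
	for (; i < max_fds; i++)
		fds[i] = -1;
}
```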

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
4d490c7ce3 vhost: validate untrusted memory regions number field
Check if memory.nregions is valid right away.  This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
cdc37ca3d0 vhost: avoid enum fields in VhostUserMsg
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.

The VhostUserMsg.request union contains enum fields.  Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:

  6.7.2.2 Enumeration specifiers
  ...
  Each enumerated type shall be compatible with char, a signed integer
  type, or an unsigned integer type. The choice of type is
  implementation-defined, but shall be capable of representing the
  values of all the members of the enumeration.

Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:

  if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {

If msg.request.master is signed then negative values pass this check!

Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type.  For
example, if we add a negative constant then its type changes to signed
int:

  typedef enum VhostUserRequest {
      ...
      VHOST_USER_INVALID = -1,
  };

This is very fragile and it's unlikely that anyone changing the code
would remember this.  A security hole can be introduced accidentally.

This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.
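
The difference between the two representations can be sketched as
below. SKETCH_REQ_MAX and both helpers are hypothetical names used only
to illustrate the bounds check.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REQ_MAX 32 /* hypothetical upper bound on request IDs */

/* Signed field: a negative request ID is "smaller than max" and passes. */
static bool
sketch_request_ok_signed(int32_t request)
{
	return request < SKETCH_REQ_MAX;
}

/* uint32_t field: the same bit pattern is a huge value and is rejected. */
static bool
sketch_request_ok_unsigned(uint32_t request)
{
	return request < SKETCH_REQ_MAX;
}
```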

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Stefan Hajnoczi
c45427a48e vhost: add security model documentation
Input validation is not applied consistently in vhost_user.c.  This
suggests that not everyone has the same security model in mind when
working on the code.

Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-03-30 14:08:42 +02:00
Maxime Coquelin
9fce5d0b40 vhost: do not take lock on owner reset
A deadlock happens when handling VHOST_USER_RESET_OWNER request
for the same reason the lock is not taken for
VHOST_USER_GET_VRING_BASE.

It is safe not to take the lock, as the queues are no longer used
by the application when the virtqueues and the device are reset.

Fixes: a368804699 ("vhost: protect active rings from async ring changes")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-02-13 18:58:02 +01:00
Maxime Coquelin
82b9c15403 vhost: remove pending IOTLB entry if miss request failed
In case vhost_user_iotlb_miss returns an error, the pending IOTLB
entry has to be removed from the list as no IOTLB update will be
received.

Fixes: fed67a20ac ("vhost: introduce guest IOVA to backend VA helper")
Cc: stable@dpdk.org

Suggested-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-02-05 19:56:04 +01:00
Maxime Coquelin
37771844a0 vhost: fix IOTLB pool out-of-memory handling
In the unlikely case the IOTLB memory pool runs out of memory,
an issue may happen if all entries are used by the IOTLB cache,
and an IOTLB miss happens. If the iotlb pending list is empty,
then no memory is freed and allocation fails a second time.

This patch fixes this by doing an IOTLB cache random evict if
the IOTLB pending list is empty, ensuring the second allocation
try will succeed.

In the same spirit, the opposite is done when inserting an
IOTLB entry in the IOTLB cache fails due to out of memory. In
this case, the IOTLB pending list is flushed if the IOTLB cache is
empty to ensure the new entry can be inserted.

Fixes: d012d1f293 ("vhost: add IOTLB helper functions")
Fixes: f72c2ad63a ("vhost: add pending IOTLB miss request list and helpers")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-02-05 19:56:04 +01:00
Stefan Hajnoczi
33adfbc805 vhost: drop virtqueues only with built-in virtio driver
Commit e291093235 ("vhost: destroy unused
virtqueues when multiqueue not negotiated") broke vhost-scsi by removing
virtqueues when the virtio-net-specific VIRTIO_NET_F_MQ feature bit is
missing.

The vhost_user.c code shouldn't assume all devices are vhost net device
backends.  Use the new VIRTIO_DEV_BUILTIN_VIRTIO_NET flag to check
whether virtio_net.c is being used.

This fixes examples/vhost_scsi.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-02-05 15:11:19 +01:00
Stefan Hajnoczi
1c717af4c6 vhost: add flag for built-in virtio driver
The librte_vhost API is used in two ways:
1. As a vhost net device backend via rte_vhost_enqueue/dequeue_burst().
2. As a library for implementing vhost device backends.

There is no distinction between the two at the API level or in the
librte_vhost implementation.  For example, device state is kept in
"struct virtio_net" regardless of whether this is actually a net device
backend or whether the built-in virtio_net.c driver is in use.

The virtio_net.c driver should be a librte_vhost API client just like
the vhost-scsi code and have no special access to vhost.h internals.
Unfortunately, fixing this requires significant librte_vhost API
changes.

This patch takes a different approach: keep the librte_vhost API
unchanged but track whether the built-in virtio_net.c driver is in use.
See the next patch for a bug fix that requires knowledge of whether
virtio_net.c is in use.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-02-05 15:11:07 +01:00
Zhihong Wang
704098fc47 vhost: fix build with old kernels
This patch fixes compile failure with old kernels which have no
VIRTIO_F_ANY_LAYOUT defined.

Fixes: 5a8bb6e902 ("vhost: claim to support any layout feature")

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-31 12:11:13 +01:00
Bruce Richardson
6c9457c279 build: replace license text with SPDX tag
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
f3eeed27ff build: detect and use libnuma
DPDK has an optional dependency on libnuma, so manage that through the
build system, by dynamically detecting the presence of the needed library
and header files. Since this library is used by both EAL and vhost, check
for the presence at the top level in the config directory.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
5b9656b157 lib: build with meson
Add non-EAL libraries to DPDK build. The compat lib is a special case,
along with the previously-added EAL, but all other libs can be built using
the same set of commands, where the individual meson.build files only need
to specify their dependencies, source files, header files and ABI versions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Zhihong Wang
5a8bb6e902 vhost: claim to support any layout feature
The VIRTIO_F_ANY_LAYOUT feature indicates the device accepts arbitrary
descriptor layouts. The vhost-user lib already supports it, but the
feature declaration is missing. This patch fixes the mismatch.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-29 10:04:28 +01:00
Neil Horman
a6ec31597a mk: add experimental tag check
Add checks during build to ensure that all symbols in the EXPERIMENTAL
version map section have __experimental tags on their definitions, and
enable the warnings needed to announce their use.  Also add an
ALLOW_EXPERIMENTAL_APIS define to allow individual libraries and files
to declare the acceptability of experimental API usage.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-01-29 23:35:29 +01:00
Victor Kaplansky
a368804699 vhost: protect active rings from async ring changes
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by the message handler
are done independently from vring usage by PMD threads.

This causes for example segfaults during live migration
with MQ enabled, but in general virtually any request
sent by QEMU changing the state of the device can cause
problems.

These patches fix all the above issues by adding a spinlock
to every vring and requiring the message handler to start operation
only after ensuring that all PMD threads related to the device
are out of the critical section accessing the vring data.

Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.

See https://bugzilla.redhat.com/show_bug.cgi?id=1450680

Cc: stable@dpdk.org
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-21 15:51:52 +01:00
Junjie Chen
3ebd930588 vhost: fix mbuf free
Dequeue zero copy changes buf_addr and buf_iova of an mbuf, and returns
it to the mbuf pool without restoring them. This breaks VM memory if others
allocate an mbuf from the same pool, since mbuf reset doesn't reset buf_addr
and buf_iova.

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-21 15:51:52 +01:00
Xiao Wang
c09141e56f net: fix RARP generation
Due to a mistake of mine, an older version (v10) was merged to the
master branch, whereas v11 should have been applied. However, the master
branch cannot be rebased. Thus, this patch is made from the diff between
v10 and v11.

The diffs are:

- Add check for parameter and tailroom in rte_net_make_rarp_packet
- Allocate mbuf in rte_net_make_rarp_packet

Besides that, a link error is fixed when shared lib is enabled.

Fixes: 45ae05df82 ("net: add a helper for making RARP packet")
Fixes: c3ffdba0e8 ("vhost: use API to make RARP packet")

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-21 15:51:52 +01:00
Junjie Chen
2651726def vhost: do deep copy while reallocating queue
When vhost reallocates dev and vq for the NUMA-enabled case, it doesn't perform
a deep copy, which leads to 1) an invalid zmbuf list and 2) remote memory access.
This patch re-initializes the zmbuf list and also does the deep copy.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-21 15:51:52 +01:00
Jiayu Hu
57b4eafa1d vhost: support Explicit Congestion Notification
In virtio, Explicit Congestion Notification (ECN) includes two parts:
guest ECN and host ECN. Guest ECN means the frontend can handle TSO
packets which have ECN set, and host ECN means the backend can handle
TSO packets which have ECN set.

The ECN features are rarely used. However, virtio-net enables them by
default, and vhost-net supports both. To make live migration from
vhost-net to vhost-user possible, this patch announces to support
guest and host ECN in vhost-user.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Xiao Wang
c3ffdba0e8 vhost: use API to make RARP packet
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-01-16 18:47:49 +01:00
Olivier Matz
da51d2f6b8 vhost: fix error code check when creating thread
On error, pthread_create() returns a positive number (errno).
Fix the test on the return value.

Fixes: af14759181 ("vhost: introduce API to start a specific driver")
Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2018-01-16 18:47:49 +01:00
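The return-value check fixed by da51d2f6b8 can be sketched as follows. This is a minimal illustration, not the DPDK code: `reconnect_loop()` and `start_reconnect_thread()` are hypothetical names. The only fact relied on is that `pthread_create()` returns 0 on success and a positive error number on failure, so a `ret < 0` test never fires.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread body, for illustration only. */
static void *reconnect_loop(void *arg)
{
    return arg;
}

/* Returns 0 on success, -1 if the thread could not be created. */
static int start_reconnect_thread(pthread_t *tid)
{
    int ret;

    assert(tid != NULL);
    ret = pthread_create(tid, NULL, reconnect_loop, NULL);

    /* pthread_create() reports errors as a positive errno value,
     * so the check must be "!= 0" rather than "< 0". */
    if (ret != 0) {
        fprintf(stderr, "failed to create thread: %s\n", strerror(ret));
        return -1;
    }
    return 0;
}
```

Note that the error text comes from `strerror(ret)`, not from `errno`, because the pthread functions do not set `errno`.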
Tonghao Zhang
ae0b1de941 vhost: add reconnect thread name for client mode
This patch adds the name for vhost-user reconnect thread.
It can help us to know whether the thread is running.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Junjie Chen
e37ff95440 vhost: support virtqueue interrupt/notification suppression
The driver can suppress interrupts when the VIRTIO_F_EVENT_IDX feature bit
is negotiated. The driver sets vring flags to 0, and MAY use used_event in
the available ring to advise the device to delay interrupts until the index
specified by used_event is reached. The device ignores the lower bit of
vring flags, and sends an interrupt when the index reaches used_event.

The device can suppress notifications in a manner analogous to the way the
driver suppresses interrupts. The device manipulates flags or avail_event in
the used ring in the same way the driver manipulates flags or used_event in
the available ring.

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
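The event-index test described in e37ff95440 is usually expressed with the wrap-safe comparison given in the Virtio specification. The sketch below shows that comparison in isolation; how the vhost library wires it into its enqueue and dequeue paths is not shown here and should not be inferred from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the index published in used_event (or avail_event)
 * was crossed while the ring index advanced from old_idx to new_idx.
 * All arithmetic is done modulo 2^16, so index wrap-around is handled.
 */
static bool vring_need_event(uint16_t event_idx, uint16_t new_idx,
                             uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```

For example, advancing from index 5 to 6 with `event_idx` set to 5 triggers an event, while the same advance with `event_idx` set to 10 does not.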
Maxime Coquelin
e291093235 vhost: destroy unused virtqueues when multiqueue not negotiated
QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
declared in QEMU command line before the guest is started.
This causes the DPDK vhost-user backend to allocate vrings
for all queues declared by QEMU.

If the first driver being used does not support multiqueue,
the device never changes to VIRTIO_DEV_RUNNING state as only
the first queue pair is initialized. One driver impacted by
this bug is virtio-net's iPXE driver which does not support
VIRTIO_NET_F_MQ feature.

It is safe to destroy unused virtqueues in SET_FEATURES request
handler, as it is ensured the device is not in running state
at this stage, so virtqueues aren't being processed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Maxime Coquelin
467fe22df9 vhost: extract virtqueue cleaning and freeing functions
This patch extracts the code needed for vhost_user.c to be able
to clean and free virtqueues individually.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Maxime Coquelin
59fe5e17d9 vhost: propagate set features handling error
Not propagating a VHOST_USER_SET_FEATURES request handling
error may result in unpredictable behavior, as host and
guest features may no longer be synchronized.

This patch fixes this by reporting the error to the upper
layer, which results in the device being destroyed
and the connection with the master being closed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Maxime Coquelin
07f8db29b8 vhost: prevent features to be changed while device is running
As section 2.2 of the Virtio spec states about features
negotiation:
"During device initialization, the driver reads this and tells
the device the subset that it accepts. The only way to
renegotiate is to reset the device."

This patch implements a check to prevent illegal features change
while the device is running.

One exception is the VHOST_F_LOG_ALL feature bit, which is enabled
when live-migration is initiated. However, this feature is not negotiated
with the Virtio driver, but directly with the Vhost master.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
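The check introduced by 07f8db29b8 can be sketched as a comparison of the current and requested feature sets that ignores the VHOST_F_LOG_ALL bit. The function name `check_set_features()` and its parameters are hypothetical; only the bit position of VHOST_F_LOG_ALL (26, as in the Linux vhost header) is taken as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_F_LOG_ALL 26 /* bit position, per the Linux vhost header */

/* Returns 0 if the requested features may be applied, -1 otherwise. */
static int check_set_features(bool dev_running, uint64_t current,
                              uint64_t requested)
{
    uint64_t changed = current ^ requested;

    /* While the device runs, only the dirty-logging bit may toggle,
     * because it is negotiated with the master, not with the driver. */
    if (dev_running && (changed & ~(1ULL << VHOST_F_LOG_ALL)) != 0)
        return -1;

    return 0;
}
```

Before the device is running any change is accepted, which matches the quoted rule that negotiation happens during device initialization.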
Jiayu Hu
6d18505efa vhost: support UDP Fragmentation Offload
In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
and guest UFO. Guest UFO means the frontend can receive large UDP
packets, and host UFO means the backend can receive large UDP packets.
This patch supports host UFO and guest UFO for vhost-user.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Stefan Hajnoczi
6c299bb732 vhost: introduce vring call API
Users of librte_vhost currently implement the vring call operation
themselves.  Each caller performs the operation slightly differently.

This patch introduces a new librte_vhost API called
rte_vhost_vring_call() that performs the operation so that vhost-user
applications don't have to duplicate it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Stefan Hajnoczi
413a8fee30 vhost: add vring call helper
Extract the callfd eventfd signal operation so virtio_net.c does not
have to repeat it multiple times.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
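The callfd signal operation that 413a8fee30 factors out amounts to writing to an eventfd. The sketch below is Linux-specific and uses a hypothetical helper name, `vring_call_sketch()`; it deliberately leaves out the interrupt-suppression checks a real backend would perform before signalling.

```c
#include <assert.h>
#include <sys/eventfd.h>

/*
 * Signal the guest through the vring's call eventfd.
 * Returns 0 on success, -1 if callfd is unset or the write fails.
 */
static int vring_call_sketch(int callfd)
{
    if (callfd < 0)
        return -1;

    /* Adds 1 to the eventfd counter, waking whoever polls it. */
    return eventfd_write(callfd, 1);
}
```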
Jiayu Hu
ee1bc7d0dc vhost: support Generic Segmentation Offload
In virtio, Generic Segmentation Offload (GSO) is the feature for the
backend, which means the backend can receive packets with any GSO
type.

Virtio-net enables the GSO feature by default, and vhost-net supports it.
To make live migration from vhost-net to vhost-user possible, this patch
enables GSO for vhost-user.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Junjie Chen
803aeecef1 vhost: fix dequeue zero copy with virtio1
This fixes dequeue zero copy, which does not work with Qemu
version >= 2.7. Since Qemu 2.7, virtio devices
use the virtio-1 protocol, and the zero copy code path
forgot to add the offset to the buffer address.

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
Jianfeng Tan
cab278dee9 vhost: fix crash
In a running VM, operations (like device attach/detach) will
trigger the QEMU to resend set_mem_table to vhost-user backend.

DPDK vhost-user handles this message bluntly by unmapping all existing
regions and mapping new ones. This might lead to a segfault if there
is a pmd thread just trying to touch those unmapped memory regions.

But for most cases, except VM memory hotplug, QEMU still sends the
set_mem_table message even if the memory regions are not changed, as
QEMU vhost-user filters out those not backed by file (fd > 0).

To fix this case, we add a check in the handler to see if the
memory regions are really changed; if not, we just keep old memory
regions.

Fixes: 8f972312b8 ("vhost: support vhost-user")
CC: stable@dpdk.org

Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2018-01-16 18:47:49 +01:00
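The "are the memory regions really changed" test from cab278dee9 can be sketched as a field-by-field comparison of the old and new tables. The `struct mem_region` layout and the name `mem_table_changed()` are simplified assumptions for illustration, not the library's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified description of one guest memory region. */
struct mem_region {
    uint64_t guest_phys_addr;
    uint64_t size;
    uint64_t userspace_addr;
};

/* Returns true if the new table differs from the one already mapped. */
static bool mem_table_changed(const struct mem_region *old_regs,
                              uint32_t nr_old,
                              const struct mem_region *new_regs,
                              uint32_t nr_new)
{
    if (nr_old != nr_new)
        return true;

    for (uint32_t i = 0; i < nr_new; i++) {
        if (old_regs[i].guest_phys_addr != new_regs[i].guest_phys_addr ||
            old_regs[i].size != new_regs[i].size ||
            old_regs[i].userspace_addr != new_regs[i].userspace_addr)
            return true;
    }
    return false;
}
```

When this returns false, the handler can keep the existing mappings instead of unmapping memory that a PMD thread may still be touching.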
Bruce Richardson
369991d997 lib: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-04 22:41:39 +01:00
Maxime Coquelin
002d6a7e55 vhost: add flag to enable IOMMU support
Qemu versions from v2.7.0 to v2.9.0 have their reply-ack protocol
feature implementation broken with multiqueue. The reply-ack
protocol feature is optional except for IOMMU feature.

This patch introduces a new RTE_VHOST_USER_IOMMU_SUPPORT flag to
enable VIRTIO_F_IOMMU_PLATFORM virtio feature.

By default, the IOMMU support is now disabled.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2017-11-07 14:19:11 +01:00
Maxime Coquelin
6ea069651e vhost: disable reply-ack feature if IOMMU disabled
If the application has disabled VIRTIO_F_IOMMU_PLATFORM, disable
VHOST_USER_PROTOCOL_F_REPLY_ACK protocol feature that is only
mandatory with IOMMU for now.

This is done to provide a way for the application to support
multiqueue with old Qemu versions (v2.7.0 to v2.9.0) that have
reply-ack feature broken.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2017-11-07 14:13:47 +01:00
Maxime Coquelin
5a4933e56b vhost: postpone ring address translations at kick time only
If multiple queue pairs are created but not all are used, the
device is never started, as unused queues aren't enabled and
their ring addresses aren't translated. The device is changed
to running state when all ring addresses are translated.

This patch fixes this by postponing ring address translation
to kick time unconditionally, whether VHOST_USER_F_PROTOCOL_FEATURES
is negotiated or not.

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-11-07 02:33:05 +01:00
Santosh Shukla
df6e0a06a3 drivers/net: rename physical address type to IOVA
Renamed data type from phys_addr_t to rte_iova_t.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2017-11-06 22:44:26 +01:00
Santosh Shukla
455da54539 mbuf: rename physical address to IOVA
Rename buf_physaddr to buf_iova.
Keep the deprecated name in an anonymous union to avoid breaking
the API.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2017-11-06 22:44:26 +01:00
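The anonymous-union technique mentioned in 455da54539 can be sketched as below. `struct mbuf_sketch` is a hypothetical two-field stand-in for the real mbuf structure; the point is only that both names refer to the same storage, so code using the deprecated name keeps compiling and behaving the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the mbuf structure. */
struct mbuf_sketch {
    void *buf_addr;
    union {
        uint64_t buf_iova;     /* new name */
        uint64_t buf_physaddr; /* deprecated alias, same storage */
    };
};

/* Both names must resolve to the same offset for the alias to be safe. */
_Static_assert(offsetof(struct mbuf_sketch, buf_iova) ==
               offsetof(struct mbuf_sketch, buf_physaddr),
               "deprecated alias must share storage with the new field");
```

Anonymous unions are standard since C11 and were available earlier as a compiler extension.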
Thomas Monjalon
62196f4e09 mem: rename address mapping function to IOVA
The function rte_mem_virt2phy() is kept and used in functions which
work only with physical addresses.
For all other calls this function is replaced by rte_mem_virt2iova()
which does a direct mapping (no conversion) in the VA case.

Note: the new rte_mem_virt2iova() function matches the
behaviour implemented in rte_mem_virt2phy() by the commit
680f6c1260 ("mem: honor IOVA mode in virt2phy")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-11-06 22:24:19 +01:00
Tiwei Bie
1d8161ba02 vhost: fix dequeue offload support
When offload is enabled, vhost needs to access the first mbuf
to get the packet info, e.g. TCP header. So we couldn't delay
the data copy in this case.

Fixes: e5c494a7a2 ("vhost: batch small guest memory copies")

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-24 21:31:23 +02:00
Maxime Coquelin
5cd690e4fd vhost: fix vring addresses not translated
Commit 3ea7052f4b ("vhost: postpone rings addresses translation")
moves rings addresses translation at either vring kick or enable
time, depending on whether protocol features are enabled or not.
This is done so that ring information is not interpreted as long as
the vring is not fully initialized.

The problem is that with old QEMU versions, like v2.5, the ring
is enabled before addresses are sent, so addresses are never
translated.

This patch fixes the issue by doing the translation in
VHOST_USER_SET_VRING_ADDR handling if ring is already enabled.

Fixes: 3ea7052f4b ("vhost: postpone rings addresses translation")

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-24 21:26:10 +02:00
Olivier Matz
cbc12b0a96 mk: do not generate LDLIBS from directory dependencies
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name matches the library
name, but it's not always the case, especially for PMDs.

This patch removes this feature and explicitly adds the proper
libraries in LDLIBS.

Some DEPDIRS-xyz variables become useless, remove them.

Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
2017-10-24 02:14:57 +02:00
Maxime Coquelin
86fe881c03 vhost: fetch ring address after NUMA reallocation
In case of NUMA reallocation, the virtqueue struct is reallocated
on another socket, meaning that its address changes.

In translate_ring_addresses(), addr pointer was not fetched again
after the reallocation, so it pointed to freed memory.

This patch just fetches the addr pointer again after the reallocation.

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-13 22:08:21 +02:00
Maxime Coquelin
b9c07b3141 vhost: fix IOTLB on NUMA realloc
In case of NUMA reallocation, the virtqueue's iotlb list is broken,
as its head changes but the first iotlb entry in the list still points
to the previous head pointer.

Also, in case of reallocation, we want the IOTLB cache mempool to be
on the new socket.

This patch performs a full re-init of the IOTLB cache when mempool
already exists, and calls the IOTLB cache init function in case
the virtqueue is being reallocated on a new socket.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-13 22:08:21 +02:00
Maxime Coquelin
1aadb2f6b1 vhost: fix deadlock on IOTLB miss
An optimization was done to only take the iotlb cache lock
once per packet burst instead of once per IOVA translation.

With this, IOTLB miss requests are sent to Qemu with the lock
held, which can cause a deadlock if the socket buffer is full,
and if Qemu is waiting for an IOTLB update to be done.

Holding the lock is not necessary when sending an IOTLB miss
request, as it is not manipulating the IOTLB cache list, which
the lock protects. Let's just release it while sending the
IOTLB miss.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-13 22:08:21 +02:00
Bruce Richardson
d65b3b1668 vhost: fix false-positive warning from clang 5
When compiling with clang extra warning flags, such as used by default with
meson, a warning is given in iotlb.c:

lib/librte_vhost/iotlb.c:318:6: warning:
	variable 'socket' is used uninitialized whenever
	'if' condition is false [-Wsometimes-uninitialized]

This is a false positive, as the socket value will be initialized by the
call to get_mempolicy in the case where the NUMA build-time flag is set,
and in cases where it is not set, "if (ret)" will always be true as ret is
initialized to -1 and never changed.

However, this is not immediately obvious, and is perhaps a little fragile,
as it will break if other code using ret is subsequently added above the
call to get_mempolicy by someone unaware of this subtle dependency.
Therefore, we can fix the warning and make the code more robust by
explicitly initializing socket to zero, and moving the extra condition
check on the return from get_mempolicy() into the #ifdef.

Fixes: d012d1f293 ("vhost: add IOTLB helper functions")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2017-10-11 13:56:34 +02:00
Maxime Coquelin
3494ed045e vhost: distinguish master and slave requests
This patch adds a union in VhostUserMsg to distinguish between
master and slave initiated requests, instead of casting slave
requests as master requests.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:54:31 +02:00
Dariusz Stojaczyk
efba12a78d vhost: add user callbacks for socket open/close
Added new callbacks to notify about socket connection status.
As destroy_device is used for virtqueue processing *pause* as well as
connection close, the user cannot distinguish between the two.

Consider the following scenario:
rte_vhost: received SET_VRING_BASE message,
           calling destroy_device() as usual

user:  end-user asks to remove the device (together with socket file),
       OK, device is not *in use* - that's NOT the behavior we want
       calling rte_vhost_driver_unregister() etc.

Instead of changing new_device/destroy_device callbacks and breaking
the ABI, a set of new functions new_connection/destroy_connection
has been added.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
2017-10-10 15:54:31 +02:00
Kuba Kozak
66a6210124 vhost: check poll error code
Add return value check for poll() call.

Coverity issue: 140740
Fixes: 59317cef24 ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org

Signed-off-by: Kuba Kozak <kubax.kozak@intel.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:54:31 +02:00
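The return-value check added by 66a6210124 can be sketched as a thin wrapper around `poll()`. The wrapper name `wait_for_events()` is hypothetical, and the error handling shown (log and return -1) is an assumption about what a caller might want, not a description of the library's behaviour.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>

/*
 * Returns the number of ready descriptors (0 on timeout),
 * or -1 if poll() itself failed.
 */
static int wait_for_events(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int ret = poll(fds, nfds, timeout_ms);

    /* Without this check, a failed call would be treated like a
     * timeout and stale revents fields could be inspected. */
    if (ret < 0) {
        perror("poll");
        return -1;
    }
    return ret;
}
```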
Maxime Coquelin
69c90e98f4 vhost: enable IOMMU support
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:53:27 +02:00
Maxime Coquelin
36031f80cc vhost: invalidate vring in case of matching IOTLB invalidate
As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
eefac9536a vhost: postpone device creation until rings are mapped
Translating the start addresses of the rings is not enough; we need to
be sure the whole ring is made available by the guest.

It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be safe against vring page
invalidations.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
09927b5249 vhost: translate ring addresses when IOMMU enabled
When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses,
whereas they are Qemu virtual addresses when IOMMU is disabled.

When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.

The function will be called again on the next IOTLB update message
reception if matching the vring addresses.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
3ea7052f4b vhost: postpone rings addresses translation
This patch postpones rings addresses translations and checks, as
addresses sent by the master should not be interpreted as long as
ring is not started and enabled[0].

When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().

[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
b0098b5e21 vhost: fix dereferencing invalid pointer after realloc
numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated on a different NUMA node.

Problem is that vhost_user_msg_handler() still dereferences old
pointer afterward.

This patch prevents this by fetching again the dev pointer in
vhost_devices[] after messages have been handled.

Fixes: af295ad469 ("vhost: realloc device and queues to same numa node as vring desc")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
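The stale-pointer hazard fixed by b0098b5e21 can be sketched with a small device table. Everything here is hypothetical: `move_device()` merely stands in for a reallocation that replaces the table entry, and `handle_message()` shows the pattern of fetching the pointer again before further use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVICES 4

/* Hypothetical, reduced device structure and lookup table. */
struct device {
    int vid;
    int numa_node;
};

static struct device *devices[MAX_DEVICES];

/* Stand-in for a NUMA reallocation: the table entry is replaced. */
static int move_device(int vid, int new_node)
{
    struct device *old = devices[vid];
    struct device *copy = malloc(sizeof(*copy));

    if (copy == NULL)
        return -1;
    memcpy(copy, old, sizeof(*copy));
    copy->numa_node = new_node;
    devices[vid] = copy;
    free(old);
    return 0;
}

/* Returns the device's NUMA node after handling, or -1 on error. */
static int handle_message(int vid, int wanted_node)
{
    struct device *dev;

    assert(vid >= 0 && vid < MAX_DEVICES);
    dev = devices[vid];
    if (dev == NULL)
        return -1;

    if (dev->numa_node != wanted_node && move_device(vid, wanted_node) < 0)
        return -1;

    /* dev may now point to freed memory: fetch it again from the table. */
    dev = devices[vid];
    return dev->numa_node;
}
```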
Maxime Coquelin
321203a54b vhost: enable rings at the right time
When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
62fdb8255a vhost: use the guest IOVA to host VA helper
Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
also requires passing the mapped len and the access permissions needed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
fed67a20ac vhost: introduce guest IOVA to backend VA helper
This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
e95f34d380 vhost: handle IOTLB update and invalidate requests
Vhost-user device IOTLB protocol extension introduces
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in the kernel, which in this case can
be either an IOTLB update or invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
76e99bfc4c vhost: initialize vrings IOTLB caches
The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
01a4bb55f9 vhost: support IOTLB miss slave requests
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
f72c2ad63a vhost: add pending IOTLB miss request list and helpers
In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that the waiter
can return and retry later by sending the miss request again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
d012d1f293 vhost: add IOTLB helper functions
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
06903abc0d vhost: add IOMMU-related macros for old kernels
These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
275c3f9447 vhost: support slave requests channel
Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
a0563bd2e3 vhost: prepare for slave requests
send_vhost_message() is currently only used to send
replies, so it modifies message flags to prepare the
reply.

With upcoming channel for backend initiated request,
this function can be used to send requests.

This patch introduces a new send_vhost_reply() that
does the message flags modifications, and makes
send_vhost_message() generic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
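The flag changes that a0563bd2e3 moves into send_vhost_reply() can be sketched as a pure function over the 32-bit flags field. The layout assumed here follows the vhost-user protocol (bits 0-1 version, bit 2 reply, bit 3 need-reply); `make_reply_flags()` is a hypothetical name, and the actual function also sends the message, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_VERSION      0x1u
#define VHOST_USER_VERSION_MASK 0x3u
#define VHOST_USER_REPLY_MASK   (0x1u << 2)
#define VHOST_USER_NEED_REPLY   (0x1u << 3)

/* Turn the flags of a received request into the flags of its reply. */
static uint32_t make_reply_flags(uint32_t flags)
{
    flags &= ~VHOST_USER_VERSION_MASK; /* drop the sender's version bits */
    flags &= ~VHOST_USER_NEED_REPLY;   /* a reply never asks for an ack */
    flags |= VHOST_USER_VERSION;       /* stamp our protocol version */
    flags |= VHOST_USER_REPLY_MASK;    /* mark the message as a reply */
    return flags;
}
```

Keeping this step separate is what lets the generic send function transmit backend-initiated requests with their flags untouched.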
Maxime Coquelin
25bf7a0b09 vhost: make error handling consistent in Rx path
In the non-mergeable receive case, when the copy_mbuf_to_desc()
call fails the packet is skipped, the corresponding used element
len field is set to vnet header size, and it continues with the next
packet/desc. This could be a problem because it does not know why
it failed, and assumes the desc buffer is large enough.

In the mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, the packet burst is simply stopped.

This patch makes the non-mergeable error path behave like the
mergeable one, as it seems the safest way. Also, doing it this way
will simplify pending IOTLB miss requests handling.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
94018cf3d5 vhost: revert workaround MQ fails to startup
This reverts commit 04d8122796 ("vhost: workaround MQ fails to
startup").

As agreed when this workaround was introduced, it can be reverted
as Qemu v2.10 that fixes the issue is now out.

The reply-ack feature is required for vhost-user IOMMU support.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Tiwei Bie
e5c494a7a2 vhost: batch small guest memory copies
This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts for small packets PVP test.

This patch improves the performance for small packets, and has
distinguished the packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
do some special optimizations for the big packets too.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:48:53 +02:00
Tiwei Bie
897f13a1f7 vhost: make page logging atomic
Each dirty page logging operation should be atomic. But it's not
atomic in current implementation. So it's possible that some dirty
pages can't be logged successfully when different threads try to
log different pages into the same byte of the log buffer concurrently.
This patch fixes this issue.

Fixes: b171fad1ff ("vhost: log used vring changes")
Cc: stable@dpdk.org

Reported-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-08-03 22:09:48 +02:00
Zhiyong Yang
78b2e3bae1 vhost: fix initialization
Exception handling is executed in the normal path and it will cause
vhost-user init failure.

Fixes: d6983a70e2 ("vhost: check return of pthread calls")

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-07-19 22:49:47 +03:00
Ilya Maximets
185a883597 vhost: print reason of NUMA node query failure
syscall always returns '-1' on failure and there is no point
in printing that value. 'errno' is much more informative.

Fixes: 586e390013 ("vhost: export numa node")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-07 02:17:56 +02:00
Jens Freimann
2dfeebe265 vhost: check return of mutex initialization
Check return value of pthread_mutex_init(). Also destroy
mutex in case of other errors before returning.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:30:54 +02:00
Jens Freimann
d6983a70e2 vhost: check return of pthread calls
Make sure we catch and log failed calls to pthread
functions.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:30:47 +02:00
Jens Freimann
6846128798 vhost: add missing check in driver registration
Add a check for strdup() return value and fail gracefully if we
get a bad return code.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:11:01 +02:00
Maxime Coquelin
02f62392ff vhost: fix MTU device feature check
The MTU feature support check has to be done against MTU
feature bit mask, and not bit position.

Fixes: 72e8543093 ("vhost: add API to get MTU value")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:39:29 +02:00
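The difference between a bit position and a bit mask, which both 02f62392ff and c665d9a231 fix, can be sketched as below. VIRTIO_NET_F_MTU is bit 3 in the Virtio specification; `has_feature()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_MTU 3 /* a bit position, not a mask */

/* Correct test: shift the position into a mask before AND-ing. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features & (1ULL << bit)) != 0;
}
```

The buggy form `features & VIRTIO_NET_F_MTU` tests bits 0 and 1 instead (3 is binary 11), so it reports the MTU feature whenever either of those unrelated bits is set and misses it when only bit 3 is set.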
Ivan Dyukov
c665d9a231 vhost: fix checking of device features
To compare enabled features in current device we must use bit
mask instead of bit position.

Fixes: c843af3aa1 ("vhost: access header only if offloading is supported")
Cc: stable@dpdk.org

Signed-off-by: Ivan Dyukov <i.dyukov@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:38:39 +02:00
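
Both 02f62392ff and c665d9a231 fix the same class of bug: a feature
constant that names a bit position was ANDed directly with the feature
word. A minimal sketch of the difference follows. DEMO_F_MTU and both
function names are hypothetical; the value 3 is chosen to mirror the bit
position that the virtio specification assigns to VIRTIO_NET_F_MTU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature constant: a bit position, not a mask. */
#define DEMO_F_MTU 3

/* Buggy form: treats the bit position as if it were a mask. */
static bool
has_feature_buggy(uint64_t features, unsigned int bit)
{
	return (features & bit) != 0;
}

/* Correct form: shift to build the mask, then test it. */
static bool
has_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}
```

With position 3, the buggy form actually tests bits 0 and 1 (binary 011),
so it can report the feature as present when it is not negotiated and as
absent when it is.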
Jianfeng Tan
b08b8cfeb2 vhost: fix IP checksum
There is no way to bypass IP checksum verification in the Linux
kernel, whether skb->ip_summed is set to CHECKSUM_UNNECESSARY
or CHECKSUM_PARTIAL.

So any packet with a bad IP checksum will be dropped at the VM IP layer.

To fix this, we calculate the IP csum when the PKT_TX_IP_CKSUM flag is set.

Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:28:34 +02:00
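
For reference, the checksum that the guest kernel verifies is the
standard Internet checksum over the IPv4 header. The sketch below is a
plain C illustration of that arithmetic, not the routine used by the
patch; ipv4_hdr_cksum() is a hypothetical name, and the caller is
assumed to have zeroed the header's checksum field first.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's-complement sum of the header taken as
 * big-endian 16-bit words, folded to 16 bits and inverted. The result
 * is returned in host byte order.
 */
static uint16_t
ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* An odd trailing byte is padded with zero on the right. */
	if (i < len)
		sum += (uint32_t)hdr[i] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Running the same function over a header that already carries a valid
checksum yields 0, which is the check a receiver performs.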
Jianfeng Tan
46b7a8372d vhost: fix TCP checksum
As PKT_TX_TCP_SEG flag in mbuf->ol_flags implies PKT_TX_TCP_CKSUM,
applications, e.g., testpmd, don't set PKT_TX_TCP_CKSUM when TSO
is set.

As a result, packets get dropped in the VM TCP stack because of a
bad TCP csum.

To fix this, we make sure the TCP NEEDS_CSUM info is set in the virtio
net header when PKT_TX_TCP_SEG is set, so that the VM TCP stack will
not check the TCP csum.

Fixes: 859b480d5a ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:28:22 +02:00
Daniel Verkamp
3cb502b310 vhost: clean up per-socket mutex
vsocket->conn_mutex was allocated with pthread_mutex_init() but never
freed with pthread_mutex_destroy().  This is a potential memory leak,
depending on how pthread_mutex_t is implemented.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:16:31 +02:00
Dariusz Stojaczyk
058e2d294b vhost: log error for badly negotiated features
Since vhost_user_set_features failure is not handled in any way, a
single error log has been added to at least let the user know that
something has gone wrong.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Yuanhan Liu
ebd792b386 vhost: fix crash on NUMA
The queue allocation was changed from allocating one queue-pair at a
time to one queue at a time. Most of the code was updated accordingly,
but one place was missed: the size used to copy the old queue in
numa_realloc() is still based on a queue-pair, which overwrites memory
past the new queue. As a result, a crash may happen.

Fix it by specifying the right copy size. Also, the net queue macros
are not used any more. Remove them.

Fixes: ab4d7b9f1a ("vhost: turn queue pair to vring")
Cc: stable@dpdk.org

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jens Freimann <jfreiman@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2017-06-16 14:04:25 +02:00