434 Commits

Author SHA1 Message Date
Maxime Coquelin
e95f34d380 vhost: handle IOTLB update and invalidate requests
The vhost-user device IOTLB protocol extension introduces the
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in the kernel, which in this case can
be either an IOTLB update or an IOTLB invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.
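A minimal sketch of what update and invalidate mean for a per-virtqueue cache, assuming a tiny fixed-size table. The message struct only mirrors the fields of the kernel's struct vhost_iotlb_msg, and the type values are those of VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE in <linux/vhost.h>. The cache layout and helper names are hypothetical; DPDK's real cache is mempool-backed and also notifies the virtqueues, which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fields of the kernel's struct vhost_iotlb_msg. */
struct iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
};

#define IOTLB_UPDATE     2 /* VHOST_IOTLB_UPDATE in <linux/vhost.h> */
#define IOTLB_INVALIDATE 3 /* VHOST_IOTLB_INVALIDATE in <linux/vhost.h> */
#define IOTLB_CACHE_SIZE 8 /* hypothetical, deliberately tiny */

struct iotlb_entry {
    bool valid;
    uint64_t iova, size, uaddr;
    uint8_t perm;
};

struct iotlb_cache {
    struct iotlb_entry e[IOTLB_CACHE_SIZE];
};

/* Apply an update or invalidate message; returns 0 on success, -1 otherwise. */
static int iotlb_handle_msg(struct iotlb_cache *c, const struct iotlb_msg *m)
{
    size_t i;

    switch (m->type) {
    case IOTLB_UPDATE:
        for (i = 0; i < IOTLB_CACHE_SIZE; i++) {
            if (!c->e[i].valid) {
                c->e[i] = (struct iotlb_entry){ true, m->iova, m->size, m->uaddr, m->perm };
                return 0;
            }
        }
        return -1; /* cache full; a real cache would evict an entry */
    case IOTLB_INVALIDATE:
        /* Drop every entry overlapping [iova, iova + size). */
        for (i = 0; i < IOTLB_CACHE_SIZE; i++) {
            struct iotlb_entry *x = &c->e[i];
            if (x->valid && x->iova < m->iova + m->size && m->iova < x->iova + x->size)
                x->valid = false;
        }
        return 0;
    default:
        return -1;
    }
}

/* Translate an IOVA to a host address; 0 stands for a miss in this sketch. */
static uint64_t iotlb_translate(const struct iotlb_cache *c, uint64_t iova)
{
    for (size_t i = 0; i < IOTLB_CACHE_SIZE; i++) {
        const struct iotlb_entry *x = &c->e[i];
        if (x->valid && iova >= x->iova && iova - x->iova < x->size)
            return x->uaddr + (iova - x->iova);
    }
    return 0;
}
```

For example, after an update mapping IOVA 0x1000 (size 0x1000) to 0x7f0000, translating 0x1010 yields 0x7f0010 until an overlapping invalidate drops the entry.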

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
76e99bfc4c vhost: initialize vrings IOTLB caches
The per-virtqueue IOTLB cache is initialized at virtqueue
init time. init_vring_queue() now takes the vring id as a parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
01a4bb55f9 vhost: support IOTLB miss slave requests
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
f72c2ad63a vhost: add pending IOTLB miss request list and helpers
In order to be able to handle other ports or queues while waiting
for an IOTLB miss reply, a pending list is created so that the waiter
can return and restart later by sending the miss request again.
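The pending-list idea can be sketched as a small table of outstanding (iova, perm) misses: a waiter that misses records the translation and only sends a miss request if one is not already outstanding, then returns so other queues can be served; an IOTLB update clears the matching entries. The bound, types and helper names are hypothetical, not DPDK's helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16 /* hypothetical bound */

struct pending_miss {
    bool used;
    uint64_t iova;
    uint8_t perm;
};

struct pending_list {
    struct pending_miss p[MAX_PENDING];
};

static bool pending_find(const struct pending_list *l, uint64_t iova, uint8_t perm)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (l->p[i].used && l->p[i].iova == iova && l->p[i].perm == perm)
            return true;
    return false;
}

/* Record a miss; returns true if the caller should send a new miss request. */
static bool pending_insert(struct pending_list *l, uint64_t iova, uint8_t perm)
{
    if (pending_find(l, iova, perm))
        return false; /* already waiting for this translation */
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!l->p[i].used) {
            l->p[i] = (struct pending_miss){ true, iova, perm };
            return true;
        }
    }
    return false; /* list full: do not send now, the caller retries on its next pass */
}

/* On an IOTLB update covering [iova, iova + size) with perm, drop satisfied misses. */
static void pending_remove(struct pending_list *l, uint64_t iova, uint64_t size, uint8_t perm)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_miss *m = &l->p[i];
        if (m->used && m->iova >= iova && m->iova - iova < size && (m->perm & perm) == m->perm)
            m->used = false;
    }
}
```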

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
d012d1f293 vhost: add IOTLB helper functions
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
06903abc0d vhost: add IOMMU-related macros for old kernels
These defines and enums were introduced in upstream kernel v4.8
and backported to RHEL 7.4.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
275c3f9447 vhost: support slave requests channel
Currently, only QEMU sends requests, and the backend only sends
replies. In some cases, the backend may need to send
requests to QEMU, such as IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends the file descriptor of a new socket using the
VHOST_USER_SET_SLAVE_REQ_FD message.
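Passing a file descriptor over a Unix domain socket relies on SCM_RIGHTS ancillary data, the standard POSIX mechanism a vhost-user master uses to hand over descriptors such as the slave request socket. A minimal sketch of both ends, assuming one descriptor per message and ignoring control-data truncation (MSG_CTRUNC); send_fd() and recv_fd() are hypothetical helpers, not DPDK or QEMU functions.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one byte of payload plus one fd as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, so writes through it reach the original socket or pipe.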

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
a0563bd2e3 vhost: prepare for slave requests
send_vhost_message() is currently only used to send
replies, so it modifies the message flags to prepare the
reply.

With the upcoming channel for backend-initiated requests,
this function will also be used to send requests.

This patch introduces a new send_vhost_reply() that
does the message flag modifications, and makes
send_vhost_message() generic.
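Per the vhost-user protocol spec, bits 0-1 of the header flags hold the version (currently 1), bit 2 marks a reply and bit 3 asks for a reply. A sketch of the split described above, with hypothetical names: the reply helper rewrites the flags of a received request, while a backend-initiated request builds its own.

```c
#include <stdint.h>

/* Flag layout from the vhost-user protocol spec. */
#define VU_VERSION_MASK 0x3u        /* bits 0-1: protocol version */
#define VU_VERSION      0x1u
#define VU_REPLY_MASK   (0x1u << 2) /* bit 2: this message is a reply */
#define VU_NEED_REPLY   (0x1u << 3) /* bit 3: sender wants an ack */

struct vu_msg_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
};

/* Turn a received request header into a reply header in place. */
static void prepare_reply(struct vu_msg_hdr *msg)
{
    msg->flags &= ~(VU_VERSION_MASK | VU_NEED_REPLY);
    msg->flags |= VU_VERSION | VU_REPLY_MASK;
}

/* Build the header of a backend-initiated request: no reply bit set. */
static void prepare_request(struct vu_msg_hdr *msg, uint32_t request, int need_reply)
{
    msg->request = request;
    msg->flags = VU_VERSION | (need_reply ? VU_NEED_REPLY : 0);
}
```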

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
25bf7a0b09 vhost: make error handling consistent in Rx path
In the non-mergeable receive case, when the copy_mbuf_to_desc()
call fails, the packet is skipped, the len field of the corresponding
used element is set to the vnet header size, and processing continues
with the next packet/descriptor. This could be a problem because the
code does not know why the copy failed, yet assumes the descriptor
buffer is large enough.

In the mergeable receive case, when copy_mbuf_to_desc_mergeable()
fails, the packet burst is simply stopped.

This patch makes the non-mergeable error path behave like the
mergeable one, as this seems the safest approach. Doing so will also
simplify the handling of pending IOTLB miss requests.
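The unified error path amounts to stopping the burst at the first failed copy and reporting how many packets made it. A minimal sketch, assuming a hypothetical copy callback that fails when a packet does not fit its descriptor:

```c
#include <stdint.h>

/* Hypothetical per-packet copy: returns 0 on success, -1 on failure. */
typedef int (*copy_fn)(uint32_t pkt_len, uint32_t desc_len);

/* Example copy callback: fails when the packet is larger than the descriptor. */
static int copy_fits(uint32_t pkt_len, uint32_t desc_len)
{
    return pkt_len <= desc_len ? 0 : -1;
}

/* Enqueue up to count packets; stop at the first failure (mergeable-style
 * error path) and return how many were actually enqueued. */
static uint16_t enqueue_burst(const uint32_t *pkt_lens, const uint32_t *desc_lens,
                              uint16_t count, copy_fn copy)
{
    uint16_t i;

    for (i = 0; i < count; i++) {
        if (copy(pkt_lens[i], desc_lens[i]) < 0)
            break; /* do not mark the descriptor used; leave it for a retry */
    }
    return i;
}
```

Packets from the failing one onward stay with the caller, which can retry them later, for instance once a pending IOTLB miss has been resolved.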

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Maxime Coquelin
94018cf3d5 vhost: revert workaround MQ fails to startup
This reverts commit 04d81227960b ("vhost: workaround MQ fails to
startup").

As agreed when this workaround was introduced, it can be reverted
now that QEMU v2.10, which fixes the issue, is out.

The reply-ack feature is required for vhost-user IOMMU support.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:52:27 +02:00
Tiwei Bie
e5c494a7a2 vhost: batch small guest memory copies
This patch adaptively batches small guest memory copies.
Batching the small copies can greatly improve the efficiency of
executing the memory LOAD instructions, because the memory LOAD
latency can be effectively hidden by the pipeline.
We saw large performance gains in the small-packet PVP test.

This patch improves the performance for small packets, and
distinguishes packets by size. So although the performance
for big packets doesn't change, it makes it relatively easy to
add special optimizations for big packets too.
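One way to batch small copies: defer any copy at or below a size threshold into an array and run the deferred memcpy() calls back to back, while large copies go through immediately. The threshold, batch capacity and names below are assumptions for illustration, and the source buffers must stay valid until the batch is flushed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_MAX_LEN   256 /* hypothetical "small copy" threshold */
#define BATCH_MAX_ELEMS 64  /* hypothetical batch capacity */

struct copy_elem {
    void *dst;
    const void *src;
    size_t len;
};

struct copy_batch {
    struct copy_elem elems[BATCH_MAX_ELEMS];
    size_t n;
};

/* Run all deferred copies back to back. */
static void batch_flush(struct copy_batch *b)
{
    for (size_t i = 0; i < b->n; i++)
        memcpy(b->elems[i].dst, b->elems[i].src, b->elems[i].len);
    b->n = 0;
}

/* Copy immediately if large, otherwise defer into the batch. */
static void batch_copy(struct copy_batch *b, void *dst, const void *src, size_t len)
{
    if (len > BATCH_MAX_LEN) {
        memcpy(dst, src, len);
        return;
    }
    if (b->n == BATCH_MAX_ELEMS)
        batch_flush(b);
    b->elems[b->n++] = (struct copy_elem){ dst, src, len };
}
```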

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
2017-10-10 15:48:53 +02:00
Tiwei Bie
897f13a1f7 vhost: make page logging atomic
Each dirty page logging operation should be atomic, but it is not
atomic in the current implementation. So some dirty pages may fail
to be logged when different threads concurrently log different pages
into the same byte of the log buffer.
This patch fixes this issue.
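The dirty log is a bitmap with one bit per page, so eight pages share a byte. A plain |= is a read-modify-write that can lose a concurrent update to a neighboring bit; an atomic OR cannot. A sketch using the GCC/Clang __atomic builtins (function names are hypothetical); relaxed ordering is used here for atomicity only, and whether stronger ordering is needed depends on the surrounding code.

```c
#include <stdint.h>

/* Non-atomic: two threads setting different bits of one byte can lose an update. */
static void log_page_racy(uint8_t *log_base, uint64_t page)
{
    log_base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Atomic: the OR on the shared byte is one indivisible operation. */
static void log_page_atomic(uint8_t *log_base, uint64_t page)
{
    __atomic_fetch_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)), __ATOMIC_RELAXED);
}
```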

Fixes: b171fad1ffa5 ("vhost: log used vring changes")
Cc: stable@dpdk.org

Reported-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-08-03 22:09:48 +02:00
Zhiyong Yang
78b2e3bae1 vhost: fix initialization
The error handling code is executed in the normal path, which
causes vhost-user initialization to fail.

Fixes: d6983a70e259 ("vhost: check return of pthread calls")

Reported-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-07-19 22:49:47 +03:00
Ilya Maximets
185a883597 vhost: print reason of NUMA node query failure
The syscall always returns -1 on failure, so there is no point
in printing that value. errno is much more informative.
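The pattern in a self-contained form, with open() standing in for the NUMA query syscall (an assumption for illustration): on failure the return value is just -1, so the message is built from errno, saved before any other call can overwrite it. The helper name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Report a failing call by errno rather than by its return value. */
static int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int err = errno; /* save before fprintf() can clobber it */
        fprintf(stderr, "failed to open %s: %s\n", path, strerror(err));
        errno = err;     /* keep errno meaningful for the caller */
    }
    return fd;
}
```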

Fixes: 586e39001317 ("vhost: export numa node")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-07 02:17:56 +02:00
Jens Freimann
2dfeebe265 vhost: check return of mutex initialization
Check the return value of pthread_mutex_init(). Also destroy
the mutex on other errors before returning.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:30:54 +02:00
Jens Freimann
d6983a70e2 vhost: check return of pthread calls
Make sure we catch and log failed calls to pthread
functions.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:30:47 +02:00
Jens Freimann
6846128798 vhost: add missing check in driver registration
Add a check for the strdup() return value and fail gracefully if it
returns NULL.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-04 11:11:01 +02:00
Maxime Coquelin
02f62392ff vhost: fix MTU device feature check
The MTU feature support check has to be done against the MTU
feature bit mask, not the bit position.
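The bug class in miniature: a feature constant such as VIRTIO_NET_F_MTU is a bit number (3 in the virtio spec), so it must be turned into a mask before testing the negotiated feature word. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_MTU 3 /* feature bit number from the virtio spec */

/* Wrong: uses the bit number as a mask, i.e. tests bits 0 and 1. */
static bool has_mtu_buggy(uint64_t features)
{
    return (features & VIRTIO_NET_F_MTU) != 0;
}

/* Right: build the mask from the bit number first. */
static bool has_mtu(uint64_t features)
{
    return (features & (1ULL << VIRTIO_NET_F_MTU)) != 0;
}
```

With only the MTU bit negotiated, the buggy check returns false; with only bit 0 (VIRTIO_NET_F_CSUM) negotiated, it wrongly returns true.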

Fixes: 72e8543093df ("vhost: add API to get MTU value")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:39:29 +02:00
Ivan Dyukov
c665d9a231 vhost: fix checking of device features
To check the features enabled on the current device, we must use
the bit mask instead of the bit position.

Fixes: c843af3aa13e ("vhost: access header only if offloading is supported")
Cc: stable@dpdk.org

Signed-off-by: Ivan Dyukov <i.dyukov@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:38:39 +02:00
Jianfeng Tan
b08b8cfeb2 vhost: fix IP checksum
There is no way to bypass IP checksum verification in the Linux
kernel, regardless of whether skb->ip_summed is set to
CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL.

So any packet with a bad IP checksum will be dropped at the VM's IP layer.

To fix this, we check the PKT_TX_IP_CKSUM flag and calculate the IP checksum.
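Computing the IPv4 header checksum in software means the RFC 1071 one's-complement sum over the header's 16-bit words with the checksum field zeroed. DPDK provides rte_ipv4_cksum() for this; the standalone version below works on raw bytes so it does not depend on host byte order, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian 16-bit words. */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    if (len & 1)
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/* Zero the checksum field (bytes 10-11), compute, store in network order. */
static void ipv4_set_cksum(uint8_t *hdr)
{
    size_t ihl = (size_t)(hdr[0] & 0x0f) * 4; /* header length in bytes */
    uint16_t c;

    hdr[10] = hdr[11] = 0;
    c = ipv4_hdr_cksum(hdr, ihl);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xff);
}
```

For the 20-byte header 45 00 00 73 00 00 40 00 40 11 xx xx c0 a8 00 01 c0 a8 00 c7 this yields 0xb861, and re-summing the filled-in header gives 0.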

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:28:34 +02:00
Jianfeng Tan
46b7a8372d vhost: fix TCP checksum
As the PKT_TX_TCP_SEG flag in mbuf->ol_flags implies PKT_TX_TCP_CKSUM,
applications such as testpmd don't set PKT_TX_TCP_CKSUM when TSO
is set.

As a result, packets get dropped by the VM's TCP stack because
of a bad TCP checksum.

To fix this, we make sure the TCP NEEDS_CSUM information is set in the
virtio net header when PKT_TX_TCP_SEG is set, so that the VM's TCP
stack does not check the TCP checksum.
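A sketch of filling the virtio net header for a TCP/IPv4 packet, where TSO alone is enough to request checksum offload. The struct mirrors struct virtio_net_hdr and the flag and GSO values follow the virtio spec, but the TX_* offload bits are hypothetical stand-ins, not DPDK's PKT_TX_* values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mbuf Tx offload flags (not DPDK's values). */
#define TX_TCP_CKSUM (1ULL << 0)
#define TX_TCP_SEG   (1ULL << 1)

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1  /* virtio spec */
#define VIRTIO_NET_HDR_GSO_TCPV4    1  /* virtio spec */
#define TCP_CKSUM_OFFSET            16 /* checksum offset inside a TCP header */

/* Local mirror of struct virtio_net_hdr. */
struct vnet_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

static void fill_vnet_hdr(struct vnet_hdr *h, uint64_t ol_flags,
                          uint16_t l2_len, uint16_t l3_len, uint16_t l4_len, uint16_t mss)
{
    *h = (struct vnet_hdr){ 0 };

    /* TSO implies a TCP checksum even if the app did not set TX_TCP_CKSUM. */
    if (ol_flags & (TX_TCP_CKSUM | TX_TCP_SEG)) {
        h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
        h->csum_start = l2_len + l3_len;
        h->csum_offset = TCP_CKSUM_OFFSET;
    }
    if (ol_flags & TX_TCP_SEG) {
        h->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
        h->gso_size = mss;
        h->hdr_len = l2_len + l3_len + l4_len;
    }
}
```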

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:28:22 +02:00
Daniel Verkamp
3cb502b310 vhost: clean up per-socket mutex
vsocket->conn_mutex was initialized with pthread_mutex_init() but never
destroyed with pthread_mutex_destroy(). This is a potential memory leak,
depending on how pthread_mutex_t is implemented.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-07-02 01:16:31 +02:00
Dariusz Stojaczyk
058e2d294b vhost: log error for badly negotiated features
Since a vhost_user_set_features() failure is not handled in any way,
a single error log has been added to at least let the user know that
something has gone wrong.

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Yuanhan Liu
ebd792b386 vhost: fix crash on NUMA
The queue allocation was changed from allocating one queue pair at a
time to one queue at a time. Most of the required changes were made,
but one was missed: in numa_realloc(), the size used to copy the old
queue is still based on a queue pair, which leads to memory being
overwritten. As a result, a crash may happen.

Fix it by specifying the right copy size. Also, the net queue macros
are no longer used, so remove them.

Fixes: ab4d7b9f1afc ("vhost: turn queue pair to vring")
Cc: stable@dpdk.org

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jens Freimann <jfreiman@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2017-06-16 14:04:25 +02:00
Daniel Verkamp
368c6625b6 vhost: access VhostUsrMsg via packed struct
Accessing fields of a packed struct through unaligned pointers is
undefined behavior. Instead of passing pointers to particular fields,
a pointer to the root struct should be used. This patch does exactly
that.
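Why the root pointer matters: in a packed struct a uint64_t member can sit at an unaligned offset. Accessing it as msg->payload.u64 lets the compiler emit unaligned-safe code, while taking &msg->payload.u64 produces an ordinary uint64_t * whose dereference is undefined behavior (and may fault on strict-alignment CPUs). A simplified, hypothetical stand-in for VhostUserMsg, using the GCC/Clang packed attribute:

```c
#include <stdint.h>

/* Packed message: payload.u64 ends up at offset 4, i.e. misaligned. */
struct __attribute__((packed)) vu_msg {
    uint32_t request;
    union {
        uint64_t u64;
        uint32_t u32;
    } payload;
};

/* Avoid: a helper taking uint64_t *p, called as set(&msg->payload.u64, v),
 * dereferences a misaligned plain pointer. */

/* Prefer: access through the root struct so the packed layout is known. */
static void set_u64(struct vu_msg *msg, uint64_t v)
{
    msg->payload.u64 = v;
}

static uint64_t get_u64(const struct vu_msg *msg)
{
    return msg->payload.u64;
}
```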

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Dariusz Stojaczyk
29c7c2fdaa vhost: fix guest pages memory leak
This patch fixes a memory leak.
virtio_net::guest_pages is allocated in vhost_setup_mem_table(),
reallocated in add_one_guest_page(), but never freed.
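The leak fix boils down to pairing a grow-by-realloc() array with a matching free(). A sketch with hypothetical, simplified types; it also checks the allocation result and keeps the old array intact if realloc() fails.

```c
#include <stdint.h>
#include <stdlib.h>

struct guest_page {          /* hypothetical, simplified */
    uint64_t guest_phys_addr;
    uint64_t host_phys_addr;
    uint64_t size;
};

struct guest_pages {
    struct guest_page *pages;
    uint32_t nr, max;
};

/* Append one page, doubling capacity as needed. Returns 0 or -1. */
static int add_guest_page(struct guest_pages *gp, uint64_t gpa, uint64_t hpa, uint64_t size)
{
    if (gp->nr == gp->max) {
        uint32_t new_max = gp->max ? gp->max * 2 : 8;
        struct guest_page *p = realloc(gp->pages, new_max * sizeof(*p));

        if (p == NULL)
            return -1; /* old array is still valid and still owned by gp */
        gp->pages = p;
        gp->max = new_max;
    }
    gp->pages[gp->nr++] = (struct guest_page){ gpa, hpa, size };
    return 0;
}

/* Release the array when the memory table is torn down or replaced. */
static void free_guest_pages(struct guest_pages *gp)
{
    free(gp->pages);
    gp->pages = NULL;
    gp->nr = gp->max = 0;
}
```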

Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreiman@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Dariusz Stojaczyk
d1b2842a9d vhost: fix malloc size too small
The amount of allocated memory was too small, causing a buffer overflow.

Fixes: eb32247457fe ("vhost: export guest memory regions")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jens Freimann <jfreiman@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Zhihong Wang
29150b70ab vhost: support Rx queue count request
This patch implements the rx_queue_count op for the vhost PMD by adding
a helper function, rte_vhost_rx_queue_count(), to the vhost library.

The rx_queue_count op gets the number of available entries in a vhost
Rx queue, which helps to understand the queue fill level.
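The fill level can be derived from the two free-running 16-bit ring indexes: the guest's avail->idx and the backend's last_avail_idx. A sketch of that computation (the function name is hypothetical); unsigned 16-bit arithmetic takes care of wrap-around.

```c
#include <stdint.h>

/* Entries made available by the guest but not yet consumed by vhost.
 * Both indexes are free-running 16-bit counters, so the truncated
 * difference is correct even after avail_idx wraps past 65535. */
static uint16_t avail_count(uint16_t avail_idx, uint16_t last_avail_idx)
{
    return (uint16_t)(avail_idx - last_avail_idx);
}
```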

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Jens Freimann
4cee38a6fc vhost: check allocation of guest pages
When we try to allocate guest pages we need to check the return value of
malloc(). Print an error message and return when it fails.

Signed-off-by: Jens Freimann <jfreiman@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-06-16 14:04:25 +02:00
Jerin Jacob
98a7ea332b fix typos using codespell utility
Fix typos across the DPDK source code using the codespell utility.
Fixes to the ethdev drivers' base code were skipped to keep the base
code intact.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2017-06-14 23:54:13 +02:00
Jerin Jacob
c0583d98a9 eal: introduce macro for always inline
Different drivers use internal macros such as force_inline for the
compiler's always-inline feature.
Standardize this through the __rte_always_inline macro.

The change was verified by comparing the output binary files.
No difference was found in the output binary files with this change.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2017-06-06 17:21:55 +02:00
Zhiyong Yang
04d8122796 vhost: workaround MQ fails to startup
Since DPDK 17.02, vhost combined with QEMU 2.7 or later fails to
establish new connections when negotiating MQ (one queue pair works
well).

This is caused by bugs in QEMU introduced along with
VHOST_USER_PROTOCOL_F_REPLY_ACK. When dealing with the vhost message
VHOST_USER_SET_MEM_TABLE for the second time, QEMU does not actually
send the message (it needs to be sent only once) but still waits for
DPDK's reply ack, so QEMU freezes. The DPDK code itself works correctly.

The feature VHOST_USER_PROTOCOL_F_REPLY_ACK has to be disabled
by default on the DPDK side so that the feature is not enabled on both
DPDK and QEMU at the same time. With that, MQ works well.

Cc: stable@dpdk.org

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-28 06:28:37 +02:00
Adrien Mazarguil
f8904d5636 vhost: fix header for strict compilation flags
Exported headers must allow compilation with the strictest flags. This
commit addresses the following errors:

 In file included from /tmp/check-includes.sh.20132.c:1:0:
 build/include/rte_vhost.h:73:30: error: ISO C forbids zero-size array
    'regions' [-Werror=pedantic]
 [...]

Also:

- Add C++ awareness to rte_vhost.h for consistency with rte_eth_vhost.h.
- Move Linux includes into C++ block to prevent linking issues with
  exported symbols.
- Update check-includes.sh following the removal of rte_virtio_net.h.
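A header pattern that addresses both points above, sketched with simplified, hypothetical types: a C99 flexible array member instead of a zero-size array, and an extern "C" block for C++ users. Flexible array members are a compiler extension in C++, so a pedantic C++ build may still warn.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

struct mem_region {              /* simplified, hypothetical fields */
    uint64_t guest_phys_addr;
    uint64_t size;
};

struct mem_table {
    uint32_t nregions;
    struct mem_region regions[]; /* flexible array member, not regions[0] */
};

/* Bytes to allocate for a table holding n regions. */
static inline size_t mem_table_size(uint32_t n)
{
    return sizeof(struct mem_table) + (size_t)n * sizeof(struct mem_region);
}

#ifdef __cplusplus
}
#endif
```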

Finally, update check-includes.sh to ignore rte_vhost.h and rte_eth_vhost.h
from now on since the Linux headers they depend on are not clean enough:

 In file included from /usr/include/linux/vhost.h:17:0,
                  from build/include/rte_vhost.h:43,
                  from build/include/rte_eth_vhost.h:44,
                  from /tmp/check-includes.sh.20132.c:1:
 /usr/include/linux/virtio_ring.h: In function 'vring_init':
 /usr/include/linux/virtio_ring.h:146:16: error: pointer of type 'void *'
    used in arithmetic [-Werror=pointer-arith]
 [...]
 In file included from build/include/rte_vhost.h:43:0,
                  from build/include/rte_eth_vhost.h:44,
                  from /tmp/check-includes.sh.20132.c:1:
 /usr/include/linux/vhost.h: At top level:
 /usr/include/linux/vhost.h:73:3: error: ISO C99 doesn't support unnamed
    structs/unions [-Werror=pedantic]
 [...]

Fixes: eb32247457fe ("vhost: export guest memory regions")
Fixes: a798beb47c8e ("vhost: rename header file")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-05-01 00:13:15 +02:00
Yuanhan Liu
84ad6e4491 vhost: fix dequeue zero copy
For zero copy mode, we need to pin the mbuf so that the underlying PMD
driver (or the app) does not free it. Currently, only the head mbuf is
pinned. However, the mbuf free function tries to free all mbufs
in the mbuf chain (decrementing each refcnt by 1). This may leave the
head mbuf still pinned while the subsequent mbufs are actually freed,
which is wrong.

It becomes more fatal after the mbuf refactor, more specifically, after
commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool"): the
refcnt resets to 1 after the last real reference. This also leads to a
situation where we never know whether an mbuf is actually freed or not.
As a result, the mbuf right after the head mbuf is freed twice:
it is first freed (and put back into the mempool) when the underlying PMD
finishes the DMA. Later, it is freed again when vhost unpins
it. That is, one mbuf may be returned to the mempool twice and, in
turn, be allocated twice later, leading to unpredictable behavior.
For example, the VM2VM case becomes broken.
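The refcount arithmetic behind the fix, sketched with a hypothetical segment type instead of struct rte_mbuf and without the atomic updates real mbufs use: pinning must add a reference to every segment of the chain, because a chain free drops one reference from each segment.

```c
#include <stddef.h>
#include <stdint.h>

struct seg {               /* hypothetical stand-in for a chained mbuf */
    uint16_t refcnt;
    struct seg *next;
};

/* Take one extra reference on every segment, not just the head. */
static void chain_pin(struct seg *m)
{
    for (; m != NULL; m = m->next)
        m->refcnt++;
}

/* What a chain free does: drop one reference per segment.
 * Returns how many segments actually went back to the pool. */
static unsigned chain_put(struct seg *m)
{
    unsigned freed = 0;

    for (; m != NULL; m = m->next)
        if (--m->refcnt == 0)
            freed++;
    return freed;
}
```

If only the head were pinned, the PMD's chain free would already return the non-head segments to the pool, and the later unpin would free them a second time.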

Fixes: b0a985d1f340 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-19 10:49:06 +02:00
Yuanhan Liu
cca5c0c008 vhost: avoid memory write on net header when necessary
Like what we did for the virtio PMD driver [0][1], we can apply the
same trick to vhost, to avoid memory writes to the net header when they
are not needed.

[0]: c9ea670c1dc7 ("net/virtio: fix performance regression due to TSO")
[1]: 16994abee215 ("net/virtio: optimize header reset on any layout")

With this, the cache issue of the mergeable path is again greatly reduced:
even the write of "num_buffers" could be avoided. A quick PVP test shows
the gap between the mergeable Rx and non-mergeable Rx is pretty small now:
they are basically the same in my test.
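The trick is a compare-before-store: a header field that already holds the right value is left untouched, so its cache line is not dirtied by a redundant write. A sketch; the macro and struct names are illustrative.

```c
#include <stdint.h>

/* Store only when the value differs from what is already there. */
#define ASSIGN_UNLESS_EQUAL(var, val) do {  \
        if ((var) != (val))                 \
            (var) = (val);                  \
    } while (0)

struct hdr_sketch {   /* hypothetical subset of a virtio-net header */
    uint8_t flags;
    uint16_t csum_start;
};

/* Reset offload fields for a packet that requests no offloads. */
static void reset_offload_fields(struct hdr_sketch *h)
{
    ASSIGN_UNLESS_EQUAL(h->flags, 0);
    ASSIGN_UNLESS_EQUAL(h->csum_start, 0);
}
```

The load still touches the cache line; the saving comes from not turning a clean line into a dirty one when the header is already zeroed.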

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-19 10:49:06 +02:00
Yuanhan Liu
7bd841b269 vhost: fix use after free
A "return" is missing on the error path, which could lead to a
use-after-free of the variable "conn".

Coverity issue: 143476
Fixes: 65388b43f592 ("vhost: fix fd leaks for vhost-user server mode")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-19 10:49:06 +02:00
Stephen Hemminger
c5ba278876 lib: remove unnecessary void cast
Remove unnecessary casts of void * pointers to a specific type.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2017-04-11 18:05:10 +02:00
Yuanhan Liu
27052cd63f vhost: do not destroy device on repeat mem table message
It doesn't make any sense to invoke the destroy_device() callback
while handling the SET_MEM_TABLE message.

According to the vhost-user spec, it is the GET_VRING_BASE message
that indicates the end of a vhost device: destroy_device() should be
invoked from there (luckily, we already do that).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ea67f6ccf7 vhost: workaround the build dependency on mbuf header
The rte_mbuf struct is most likely used only by the vhost-user net
driver. However, since we have made vhost-user generic enough to be
used for implementing other drivers (such as vhost-user SCSI), those
drivers also have to include <rte_mbuf.h>. Otherwise, the build breaks.

We can work around this by using a forward declaration, so that other
non-net drivers don't need to include <rte_mbuf.h>.
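The forward-declaration workaround in miniature: an incomplete struct type is enough for prototypes that only pass pointers, so a generic header need not include the full definition. The type and function names are hypothetical stand-ins for struct rte_mbuf and the enqueue API.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward declaration: no definition needed to use pointers to it. */
struct pkt_buf;   /* hypothetical stand-in for struct rte_mbuf */

/* A generic-library function that only passes the type around by pointer. */
static uint16_t enqueue_burst_sketch(int vid, struct pkt_buf **pkts, uint16_t count)
{
    (void)vid;
    /* *pkts[i] cannot be dereferenced here; only net-specific code,
     * which includes the full definition, may look inside a packet. */
    return pkts == NULL ? 0 : count;
}
```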

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
a798beb47c vhost: rename header file
Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
af14759181 vhost: introduce API to start a specific driver
We used to use rte_vhost_driver_session_start() to trigger the vhost-user
session. It takes no argument, so it is a global trigger, which can be
problematic.

The issue is, currently, rte_vhost_driver_register(path, flags) actually
tries to put it into the session loop (by fdset_add). However, it needs
a set of APIs to set a vhost-user driver properly:
  * rte_vhost_driver_register(path, flags);
  * rte_vhost_driver_set_features(path, features);
  * rte_vhost_driver_callback_register(path, vhost_device_ops);

If a new vhost-user driver is registered after the trigger (think OVS-DPDK,
which can add a port dynamically from the command line), the current code
effectively starts the session for the new driver right after the first
API, rte_vhost_driver_register(), is invoked, so the later calls have
no effect at all.

To handle this case properly, this patch introduces a new API,
rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
To do that, rte_vhost_driver_register(path, flags) is simplified
to only create the socket, and rte_vhost_driver_start(path)
actually puts it into the session loop.

Meanwhile, rte_vhost_driver_session_start() is removed: the session
thread can be hidden internally (created if it has not been created
yet). This also simplifies the application.

NOTE: the API order in the prog guide is slightly adjusted to show the
correct invocation order.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
52f8091f05 vhost: export APIs for live migration support
Export a few APIs for the vhost-user driver to log guest memory writes,
which is a must for live migration support.

This patch basically moves vhost_log_write() and vhost_log_used_vring()
into vhost.h and then adds a wrapper (the public API) for each of them.
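Logging a guest memory write means marking every page the written range touches. A sketch assuming 4 KiB logging pages and a one-bit-per-page bitmap updated atomically; the names are hypothetical and are not the exported API.

```c
#include <stdint.h>

#define VHOST_LOG_PAGE 4096u /* assumed logging granularity */

static void log_page(uint8_t *log_base, uint64_t page)
{
    __atomic_fetch_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)), __ATOMIC_RELAXED);
}

/* Mark every page touched by a write of len bytes at guest physical addr. */
static void log_write(uint8_t *log_base, uint64_t addr, uint64_t len)
{
    uint64_t page;

    if (len == 0)
        return;
    for (page = addr / VHOST_LOG_PAGE; page <= (addr + len - 1) / VHOST_LOG_PAGE; page++)
        log_page(log_base, page);
}
```

A 200-byte write at address 4000 crosses the first page boundary, so it marks pages 0 and 1.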

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
abd53c16b6 vhost: add features changed callback
Features can change after feature negotiation. For example,
VHOST_F_LOG_ALL is set/cleared at the start/end of live migration,
respectively. Thus, we need a new callback to inform the application
of such changes.
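A sketch of how an application might use such a callback to follow live migration: it compares the old and new feature words and reacts only when the VHOST_F_LOG_ALL bit (bit 26 in <linux/vhost.h>) flips. The device struct and handler are hypothetical; the real callback signature is defined by the vhost library.

```c
#include <stdint.h>

#define VHOST_F_LOG_ALL 26 /* bit number, as in <linux/vhost.h> */

struct app_dev {           /* hypothetical application state */
    uint64_t features;
    int logging;
};

static int on_features_changed(struct app_dev *dev, uint64_t features)
{
    uint64_t changed = dev->features ^ features;

    /* Start dirty logging when migration begins, stop when it ends. */
    if (changed & (1ULL << VHOST_F_LOG_ALL))
        dev->logging = (int)((features >> VHOST_F_LOG_ALL) & 1);
    dev->features = features;
    return 0;
}
```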

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
cb04355743 vhost: rename virtio-net to vhost
Rename "virtio-net" to "vhost" in the API comments and vhost prog guide.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
7c12903746 vhost: rename device ops struct
Rename "virtio_net_device_ops" to "vhost_device_ops", so that it is
not virtio-net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
aca49772f6 vhost: do not include net specific headers
Include them internally, in vhost.h.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
f53cf83980 vhost: drop the Rx and Tx queue macro
They are virtio-net specific and should be defined inside the virtio-net
driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
c0674b1bc8 vhost: move the device ready check at proper place
Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether
a virtio device is ready or not. However, we only do it when handling
the SET_VRING_KICK message, which could be wrong if a vhost-user frontend
sends SET_VRING_KICK first and SET_VRING_CALL later.

To work with all possible vhost-user frontend implementations, we
move the ready check to the end of the vhost-user message handler.

Meanwhile, since we do the check more often than before, the "virtio
not ready" message is dropped, to avoid flooding the screen.
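The order-independent check, sketched with a hypothetical per-vring struct in which a negative fd means "not received yet": evaluating it after every message makes the arrival order of SET_VRING_KICK and SET_VRING_CALL irrelevant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vring_sketch {       /* hypothetical subset of per-vring state */
    const void *desc;
    int kickfd;
    int callfd;
};

/* A vring is ready once its ring address and both eventfds are known. */
static bool vring_is_ready(const struct vring_sketch *vq)
{
    return vq->desc != NULL && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Called after every handled message, not only after SET_VRING_KICK. */
static bool device_is_ready(const struct vring_sketch *vqs, uint32_t nr_vring)
{
    if (nr_vring == 0)
        return false;
    for (uint32_t i = 0; i < nr_vring; i++)
        if (!vring_is_ready(&vqs[i]))
            return false;
    return true;
}
```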

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
b50a203986 vhost: export the number of vrings
We used to use rte_vhost_get_queue_num() to tell how many vrings there
are. However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
For example, the virtio-net driver can convert it to the number of queue
pairs by dividing by 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ab4d7b9f1a vhost: turn queue pair to vring
The queue pair is very virtio-net specific; other devices don't have
such a concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.

This patch just does a simple conversion; a later patch will export the
number of vrings to applications.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00