Commit Graph

200 Commits

Author SHA1 Message Date
Yuanhan Liu
fcdbe1fe1a vhost: use last available index for ring reservation
shadow_used_ring will be introduced later. Since then last avail
idx will not be updated together with last used idx.

So, here we use last_avail_idx for avail ring reservation.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
3f9e48f7da vhost: simplify mergeable Rx vring reservation
Let it return "num_buffers" we reserved, so that we could re-use it
with copy_mbuf_to_desc_mergeable() directly, instead of calculating
it again there.

Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed
to "int". -1 will be return on error.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Zhihong Wang
e7ef688562 vhost: optimize cache access
This patch reorders the code to delay virtio header write to improve
cache access efficiency for cases where the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.

Virtio header write and mbuf data copy are all remote store operations
which takes a long time to finish. It's a good idea to put them together
to remove bubbles in between, to let as many remote store instructions
as possible go into store buffer at the same time to hide latency, and
to let the H/W prefetcher goes to work as early as possible.

On a Haswell machine, about 100 cycles can be saved per packet by this
patch alone. Taking 64B packets traffic for example, this means about 60%
efficiency improvement for the enqueue operation.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Zhihong Wang
b24fb48886 vhost: remove useless volatile
last_used_idx is a local var, there is no need to decorate it
by "volatile".

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Maxime Coquelin
c843af3aa1 vhost: access header only if offloading is supported
If offloading features are not negotiated, parsing the virtio header
is not needed.

Micro-benchmark with testpmd shows that the gain is +4% with indirect
descriptors, +1% when using direct descriptors.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-10-26 13:39:09 +02:00
Zhihong Wang
f46f655143 vhost: fix Windows VM hang
This patch fixes a Windows VM compatibility issue in DPDK 16.07 vhost code
which causes the guest to hang once any packets are enqueued when mrg_rxbuf
is turned on by setting the right id and len in the used ring.

As defined in virtio spec 0.95 and 1.0, in each used ring element, id means
index of start of used descriptor chain, and len means total length of the
descriptor chain which was written to. While in 16.07 code, index of the
last descriptor is assigned to id, and the length of the last descriptor is
assigned to len.

How to test?

 1. Start testpmd in the host with a vhost port.

 2. Start a Windows VM image with qemu and connect to the vhost port.

 3. Start io forwarding with tx_first in host testpmd.

For 16.07 code, the Windows VM will hang once any packets are enqueued.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-10-13 10:29:31 +02:00
Yuanhan Liu
9ba1e744ab vhost: add a flag to enable dequeue zero copy
Dequeue zero copy is disabled by default. Here add a new flag
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` to explictily enable it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-13 10:29:20 +02:00
Yuanhan Liu
b0a985d1f3 vhost: add dequeue zero copy
The basic idea of dequeue zero copy is, instead of copying data from
the desc buf, here we let the mbuf reference the desc buf addr directly.

Doing so, however, has one major issue: we can't update the used ring
at the end of rte_vhost_dequeue_burst. Because we don't do the copy
here, an update of the used ring would let the driver to reclaim the
desc buf. As a result, DPDK might reference a stale memory region.

To update the used ring properly, this patch does several tricks:

- when mbuf references a desc buf, refcnt is added by 1.

  This is to pin lock the mbuf, so that a mbuf free from the DPDK
  won't actually free it, instead, refcnt is subtracted by 1.

- We chain all those mbuf together (by tailq)

  And we check it every time on the rte_vhost_dequeue_burst entrance,
  to see if the mbuf is freed (when refcnt equals to 1). If that
  happens, it means we are the last user of this mbuf and we are
  safe to update the used ring.

- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
  right desc idx.

Dequeue zero copy is introduced for performance reason, and some rough
tests show about 50% perfomance boost for packet size 1500B. For small
packets, (e.g. 64B), it actually slows a bit down (well, it could up to
15%). That is expected because this patch introduces some extra works,
and it outweighs the benefit from saving few bytes copy.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-12 09:45:14 +02:00
Yuanhan Liu
f6be82d725 vhost: introduce last available index for dequeue
So far, we retrieve both the used ring and avail ring idx by the var
last_used_idx; it won't be a problem because the used ring is updated
immediately after those avail entries are consumed.

But that's not true when dequeue zero copy is enabled, that used ring is
updated only when the mbuf is consumed. Thus, we need use another var to
note the last avail ring idx we have consumed.

Therefore, last_avail_idx is introduced.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-12 09:45:12 +02:00
Yuanhan Liu
e246896178 vhost: get guest/host physical address mappings
So that we can convert a guest physical address to host physical
address, which will be used in later Tx zero copy implementation.

MAP_POPULATE is set while mmaping guest memory regions, to make
sure the page tables are setup and then rte_mem_virt2phy() could
yield proper physical address.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-12 09:45:09 +02:00
Yuanhan Liu
552e8fd3d2 vhost: simplify memory regions handling
Due to history reason (that vhost-cuse comes before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as mmapped
address and size, with those we then can unmap on destroy) are kept in
"orig_region_map" struct, a structure that is defined only in vhost-user
source file.

The right way to go is to remove the structure and move all those fields
into virtio_memory_region struct. But we simply can't do that before,
because it breaks the ABI.

Now, thanks to the ABI refactoring, it's never been a blocking issue
any more. And here it goes: this patch removes orig_region_map and
redefines virtio_memory_region, to include all necessary info.

With that, we can simplify the guest/host address convert a bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
2016-10-12 09:44:56 +02:00
Maxime Coquelin
2304dd73d2 vhost: support indirect Tx descriptors
Indirect descriptors are usually supported by virtio-net devices,
allowing to dispatch a larger number of requests.

When the virtio device sends a packet using indirect descriptors,
only one slot is used in the ring, even for large packets.

The main effect is to improve the 0% packet loss benchmark.
A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
(fwd io for host, macswap for VM) on DUT shows a +50% gain for
zero loss.

On the downside, micro-benchmark using testpmd txonly in VM and
rxonly on host shows a loss between 1 and 4%. But depending on
the needs, feature can be disabled at VM boot time by passing
indirect_desc=off argument to vhost-user device in Qemu.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-09-28 02:18:33 +02:00
Matthias Gatto
255c4829ad vhost: remove obsolete comment
As new_device and destroy_device use an int instead of a
"struct virtio_net *", The comment about setting VIRTIO_DEV_RUNNING
doesn't make sense anymore, plus If I've correctly understand the
code, the drivers take care of setting the flag before calling the
callbacks, so I guess that this comment is obsolet and I've remove it.

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-09-13 05:25:09 +02:00
Yuanhan Liu
484e42d46f vhost: simplify features set/get
No need to use a pointer to store/retrieve features.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:09 +02:00
Yuanhan Liu
bbd7e83520 vhost: get device once
Invoke get_device() at the beginning of vhost_user_msg_handler, so that
we could check the return value once. Which could save tons of duplicate
get-and-check device.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Yuanhan Liu
fc2a9b5b64 vhost: unify function names
Some functions are with prefix "user_", while others with "vhost_".
Making them all starting with "vhost_user_" to unify the function names.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Yuanhan Liu
de854fd95d vhost: fold common message handlers
Due to history reason (that we have 2 vhost implementations), some
messages are handled in two calls: vhost specific implementation
handles it first and then invoke the common one to do another handling.

We have one implementation only now, we could write one method for
each message. Here fold those common handles to corresponding vhost
user handler.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Yuanhan Liu
a277c71598 vhost: refactor code structure
The code structure is a bit messy now. For example, vhost-user message
handling is spread to three different files:

    vhost-net-user.c  virtio-net.c  virtio-net-user.c

Where, vhost-net-user.c is the entrance to handle all those messages
and then invoke the right method for a specific message. Some of them
are stored at virtio-net.c, while others are stored at virtio-net-user.c.

The truth is all of them should be in one file, vhost_user.c.

So this patch refactors the source code structure: mainly on renaming
files and moving code from one file to another file that is more suitable
for storing it. Thus, no functional changes are made.

After the refactor, the code structure becomes to:

- socket.c      handles all vhost-user socket file related stuff, such
                as, socket file creation for server mode, reconnection
                for client mode.

- vhost.c       mainly on stuff like vhost device creation/destroy/reset.
                Most of the vhost API implementation are there, too.

- vhost_user.c  all stuff about vhost-user messages handling goes there.

- virtio_net.c  all stuff about virtio-net should go there. It has virtio
                net Rx/Tx implementation only so far: it's just a rename
                from vhost_rxtx.c

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Yuanhan Liu
8394025f54 vhost: remove sub-directory
We now have one vhost implementation; no sub source dir is needed.
Remove it by move them to upper dir.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Yuanhan Liu
466d914b01 vhost: remove vhost-cuse
remove vhost-cuse code, including the eventfd_link kernel module that
is for vhost-cuse only.

The lib/virt/qemu-wrap.py is also removed, as it's mainly for vhost-cuse
usage.

As we have one vhost implementation now, one vhost config option is
needed only. Thus, CONFIG_RTE_LIBRTE_VHOST_USER is removed.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-09-13 05:25:08 +02:00
Maxime Coquelin
b7aabd7a87 vhost: fix off-by-one error on descriptor number check
nr_desc is not an index but the number of descriptors,
so can be equal to the virtqueue size.

Fixes: a436f53ebf ("vhost: avoid dead loop chain")

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-25 17:55:12 +02:00
Ilya Maximets
164fd39678 vhost: fix unregistering in client mode
Currently while calling of 'rte_vhost_driver_unregister()' connection
to QEMU will not be closed. This leads to inability to register driver
again and reconnect to same virtual machine.

This scenario is reproducible with OVS. While executing of the following
command vhost port will be re-created (will be executed
'rte_vhost_driver_register()' followed by 'rte_vhost_driver_unregister()')
network will be broken and QEMU possibly will crash:

	ovs-vsctl set Interface vhost1 ofport_request=15

Fix this by closing all established connections on driver unregister and
removing of pending connections from reconnection list.

Fixes: 64ab701c3d ("vhost: add vhost-user client mode")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-22 00:23:58 +02:00
Ilya Maximets
81b5d22f1d vhost: fix connect hang in client mode
If something abnormal happened to QEMU, 'connect()' can block calling
thread (e.g. main thread of OVS) forever or for a really long time.
This can break whole application or block the reconnection thread.

Example with OVS:

	ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
	(gdb) bt
	#0  connect () from /lib64/libpthread.so.0
	#1  vhost_user_create_client (vsocket=0xa816e0)
	#2  rte_vhost_driver_register
	#3  netdev_dpdk_vhost_user_construct
	#4  netdev_open (name=0xa664b0 "vhost1")
	[...]
	#11 main

Fix that by setting non-blocking mode for client sockets for connection.

Fixes: 64ab701c3d ("vhost: add vhost-user client mode")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-22 00:21:51 +02:00
Patrik Andersson
acbff5c67e vhost: fix crash when exceeding file descriptors
Protect against DPDK crash when allocation of listen fd >= 1023.
For events on fd:s >1023, the current implementation will trigger
an abort due to access outside of allocated bit mask.

Corrections would include:

  * Match fdset_add() signature in fd_man.c to fd_man.h
  * Handling of return codes from fdset_add()
  * Addition of check of fd number in fdset_add_fd()

The rationale behind the suggested code change is that,
fdset_event_dispatch() could attempt access outside of the FD_SET
bitmask if there is an event on a file descriptor that in turn
looks up a virtio file descriptor with a value > 1023.
Such an attempt will lead to an abort() and a restart of any
vswitch using DPDK.

A discussion topic exist in the ovs-discuss mailing list that can
provide a little more background:
http://openvswitch.org/pipermail/discuss/2016-February/020243.html

Fixes: 8f972312 ("vhost: support vhost-user")

Signed-off-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-15 22:12:38 +02:00
Ilya Maximets
ebd02c060a vhost: check ring descriptor address
In current implementation vhost will crash with segmentation fault
if malicious or buggy virtio application breaks addresses of descriptors.

Before commit 0823c1cb0a ("vhost: workaround stale vring base")
this crash was reproducible even with normal DPDK application that tries
to change number of virtqueues dynamically inside VM.

Fix that by checking addresses of descriptors before using.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-15 22:12:37 +02:00
Ilya Maximets
8729de6092 vhost: fix used descriptors number of mergeable enqueue
Return value on error changed from '-1' to '0' because it returns
unsigned value and it means number of used descriptors.

Also fixed updating of 'last_used_idx' by using actual number of
used descriptors.

Fixes: 623bc47054 ("vhost: do sanity check for ring descriptor length")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-15 22:12:37 +02:00
Yuanhan Liu
9d8365874e vhost: fix potential null pointer dereference
Fix the potential NULL pointer dereference issue raised by Coverity.

    578             reconn = malloc(sizeof(*reconn));
    >>>     CID 127481:  Null pointer dereferences  (NULL_RETURNS)
    >>>     Dereferencing a null pointer "reconn".
    579             reconn->un = un;

Coverity issue: 127481
Fixes: e623e0c6d8 ("vhost: add reconnect ability")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-04 04:08:41 +02:00
Yuanhan Liu
f80c3fd3b4 vhost: fix not null terminated string
Fix an issue raised by Coverity.

    >>>     CID 127475:  Memory - illegal accesses  (BUFFER_SIZE_WARNING)
    >>>     Calling strncpy with a maximum size argument of 108 bytes on
    >>>     destination array "un->sun_path" of size 108 bytes might leave
    >>>     the destination string unterminated.
    441             strncpy(un->sun_path, path, sizeof(un->sun_path));
    442
    443             return fd;
    444     }

Coverity issue: 127475
Fixes: 64ab701c3d ("vhost: add vhost-user client mode")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-04 04:08:41 +02:00
Yuanhan Liu
532f2ea5f8 vhost: fix memory leak
Fix potential memory leak raised by Coverity.

    >>>     Variable "vsocket" going out of scope leaks the storage it
    >>>     points to.

Coverity issue: 127483
Fixes: e623e0c6d8 ("vhost: add reconnect ability")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-04 04:08:41 +02:00
Yuanhan Liu
4feff06e50 vhost: fix missing flag reset on stop
Commit 550c9d27d1 ("vhost: set/reset device flags internally") moves
the VIRTIO_DEV_RUNNING set/reset to vhost lib. But I missed one reset
on stop; here fixes it.

Fixes: 550c9d27d1 ("vhost: set/reset device flags internally")

Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2016-06-30 07:46:29 +02:00
Thomas Monjalon
f8e9cbe2aa mk: fix internal dependencies
Some libraries were missing their dependency on eal, mbuf, mempool,
ring and kvargs.
It is revealed by the linker option "-z defs".

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-06-29 13:33:01 +02:00
Huawei Xie
2cdc118eef vhost: check hugepage fstat error
Value returned from fstat is not checked for errors before being used.
This patch fixes following coverity issue.

    static uint64_t
    get_blk_size(int fd)
    {
    	struct stat stat;

    	fstat(fd, &stat);
    	return (uint64_t)stat.st_blksize;
    >>>  CID 107103 (#1 of 1): Unchecked return value from library
         (CHECKED_RETURN)
    >>>  check_return: Calling fstat(fd, &stat) without checking
         return value.
    >>>  This library function may fail and return an error code.

Fixes: 8f972312b8 ("vhost: support vhost-user")

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Ilya Maximets
428261b461 vhost: unmap log memory on cleanup
Fixes memory leak on QEMU migration.

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Ilya Maximets
53af5b1e0a vhost: fix leak of file descriptors
While migration of vhost-user device QEMU allocates memfd
to store information about dirty pages and sends fd to
vhost-user process.

File descriptor for this memory should be closed to prevent
"Too many open files" error for vhost-user process after
some amount of migrations.

Ex.:
 # ls /proc/<ovs-vswitchd pid>/fd/ -alh
 total 0
 root qemu  .
 root qemu  ..
 root qemu  0 -> /dev/pts/0
 root qemu  1 -> pipe:[1804353]
 root qemu  10 -> socket:[1782240]
 root qemu  100 -> /memfd:vhost-log (deleted)
 root qemu  1000 -> /memfd:vhost-log (deleted)
 root qemu  1001 -> /memfd:vhost-log (deleted)
 root qemu  1004 -> /memfd:vhost-log (deleted)
 [...]
 root qemu  996 -> /memfd:vhost-log (deleted)
 root qemu  997 -> /memfd:vhost-log (deleted)

 ovs-vswitchd.log:
 |WARN|punix:ovs-vswitchd.ctl: accept failed: Too many open files

Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Marcin Kerlin
b497724652 vhost: fix null pointer dereference
Return value of function get_device() is not checking before
dereference. Fix this problem by adding checking condition.

Coverity issue: 119262

Fixes: 77d20126b4 ("vhost-user: handle message to enable vring")

Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Huawei Xie
39449e7429 vhost: remove concurrent enqueue
All other DPDK PMDs doesn't support concurrent receiving or sending
packets to the same queue. The upper application should deal with
this, normally through queue and core bindings.

Due to historical reason, vhost internally supports concurrent lockless
enqueuing packets to the same virtio queue through costly cmpset operation.
This patch removes this internal lockless implementation and should improve
performance a bit.

Luckily DPDK OVS doesn't rely on this behavior.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
a66bcad322 vhost: arrange struct fields for better cache sharing
The ifname[] field takes so much space, that it seperates some frequently
used fields into different caches, say, features and broadcast_rarp.

This patch moves all those fields that will be accessed frequently in Rx/Tx
together (before the ifname[] field) to let them share one cache line.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
1d41d77cf8 vhost: optimize dequeue for small packets
A virtio driver normally uses at least 2 desc buffers for Tx: the
first for storing the header, and the others for storing the data.

Therefore, we could fetch the first data desc buf before the main
loop, and do the copy first before the check of "are we done yet?".
This could save one check for small packets that just have one data
desc buffer and need one mbuf to store it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
7f74b95c44 vhost: pre update used ring for Tx and Rx
Pre update and update used ring in batch for Tx and Rx at the stage
while fetching all avail desc idx. This would reduce some cache misses
and hence, increase the performance a bit.

Pre update would be feasible as guest driver will not start processing
those entries as far as we don't update "used->idx". (I'm not 100%
certain I don't miss anything, though).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
0823c1cb0a vhost: workaround stale vring base
When DPDK app crashes (or quits, or gets killed), a restart of DPDK
app would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a much saner (and may not the most accurate) vring base
from used->idx. That would work because:

- there is a memory barrier between updating used ring entries and
  used->idx. So, even though we crashed at updating the used ring
  entries, it will not cause any issue, as the guest driver will not
  process those stale used entries, for used-idx is not updated yet.

- DPDK process vring in order, that means a crash may just lead some
  packet retransmission for Tx and drop for Rx.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
e623e0c6d8 vhost: add reconnect ability
Allow reconnecting on failure by default when:

- DPDK app starts first and QEMU (as the server) is not started yet.
  Without reconnecting, DPDK app would simply fail on vhost-user
  registration.

- QEMU restarts, say due to OS reboot.
  Without reconnecting, you can't re-establish the connection without
  restarting DPDK app.

This patch make it work well for both above cases. It simply creates
a new thread, and keep trying calling "connect()", until it succeeds.

The reconnect could be disabled when RTE_VHOST_USER_NO_RECONNECT flag
is set.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:12 +02:00
Yuanhan Liu
64ab701c3d vhost: add vhost-user client mode
Add a new paramter (flags) to rte_vhost_driver_register(). DPDK
vhost-user acts as client mode when RTE_VHOST_USER_CLIENT flag
is set.  The flags would also allow future extensions without
breaking the API (again).

The rest is straingfoward then: allocate a unix socket, and
bind/listen for server, connect for client.

This extension is for vhost-user only, therefore we simply quit
and report error when any flags are given for vhost-cuse.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:47:07 +02:00
Yuanhan Liu
9ebcd4f9c7 vhost: rename structs for enabling client mode
DPDK vhost-user just acts as server so far, so, using a struct named
as "vhost_server" is okay. However, if we add client mode, it doesn't
make sense any more. Here renames it to "vhost_user_socket".

There was no obvious wrong about "connfd_ctx", but I think it's obviously
better to rename it to "vhost_user_connection", as it does represent
a connection, a connection between the backend (DPDK) and the frontend
(QEMU).

Similarly, few more renames are taken, such as "vserver_new_vq_conn"
to "vhost_user_new_connection".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 09:44:21 +02:00
Ilya Maximets
7fd5dde987 vhost: make buffer vector for scatter Rx local
Array of buf_vector's is just an array for temporary storing information
about available descriptors. It used only locally in virtio_dev_merge_rx()
and there is no reason for that array to be shared.

Fix that by allocating local buf_vec inside virtio_dev_merge_rx().

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:44:21 +02:00
Yuanhan Liu
e197bd630a vhost: make virtio header length per device
Virtio net header length is set per device, but not per queue. So, there
is no reason to store it in vhost_virtqueue struct, instead, we should
store it in virtio_net struct, to make one copy only.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:44:20 +02:00
Yuanhan Liu
004b8ca8b5 vhost: reserve few more space for future extension
"virtio_net_device_ops" is the only left open struct that an application
can access, therefore, it's the only place that might introduce potential
ABI break in future for extension.

So, do some reservation for it. 5 should be pretty enough, considering
that we have barely touched it for a long while. Another reason to
choose 5 is for cache alignment: 5 makes the struct 64 bytes for 64 bit
machine.

With this, it's confidence to say that we might be able to be free from
the ABI violation forever.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:43:01 +02:00
Yuanhan Liu
f0fa04e35e vhost: remove virtio-net.h
It barely has anything useful there, just 2 functions prototype. Here
move them to vhost-net.h, and delete it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:43:01 +02:00
Yuanhan Liu
758e3471b4 vhost: remove unnecessary fields
The "reserved" field in virtio_net and vhost_virtqueue struct is not
necessary any more. We now expose virtio_net device with a number "vid".

This patch also removes the "priv" field: all fields are priviate now:
application can't access it now. The only way that we could still access
it is to expose it by a function, but I doubt that's needed or worthwhile.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:43:01 +02:00
Yuanhan Liu
db69be54b6 vhost: hide internal code
We are now safe to move all those internal structs/macros/functions to
vhost-net.h, to hide them from external access.

This patch also breaks long lines and removes some redundant comments.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:43:01 +02:00
Yuanhan Liu
4ecf22e356 vhost: export device id as the interface to applications
With all the previous prepare works, we are just one step away from
the final ABI refactoring. That is, to change current API to let them
stick to vid instead of the old virtio_net dev.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 09:42:57 +02:00