Commit Graph

247 Commits

Author SHA1 Message Date
Yuanhan Liu
27052cd63f vhost: do not destroy device on repeat mem table message
It doesn't make any sense to invoke destroy_device() callback at
while handling SET_MEM_TABLE message.

From the vhost-user spec, it's the GET_VRING_BASE message indicates
the end of a vhost device: the destroy_device() should be invoked
from there (luckily, we already did that).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ea67f6ccf7 vhost: workaround the build dependency on mbuf header
rte_mbuf struct is something more likely will be used only in vhost-user
net driver, while we have made vhost-user generic enough that it can
be used for implementing other drivers (such as vhost-user SCSI), they
have also include <rte_mbuf.h>. Otherwise, the build will be broken.

We could workaround it by using forward declaration, so that other
non-net drivers won't need include <rte_mbuf.h>.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
a798beb47c vhost: rename header file
Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
af14759181 vhost: introduce API to start a specific driver
We used to use rte_vhost_driver_session_start() to trigger the vhost-user
session. It takes no argument, thus it's a global trigger. And it could
be problematic.

The issue is, currently, rte_vhost_driver_register(path, flags) actually
tries to put it into the session loop (by fdset_add). However, it needs
a set of APIs to set a vhost-user driver properly:
  * rte_vhost_driver_register(path, flags);
  * rte_vhost_driver_set_features(path, features);
  * rte_vhost_driver_callback_register(path, vhost_device_ops);

If a new vhost-user driver is registered after the trigger (think OVS-DPDK
that could add a port dynamically from cmdline), the current code will
effectively starts the session for the new driver just after the first
API rte_vhost_driver_register() is invoked, leaving later calls taking
no effect at all.

To handle the case properly, this patch introduce a new API,
rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
To do that, the rte_vhost_driver_register(path, flags) is simplified
to create the socket only and let rte_vhost_driver_start(path) to
actually put it into the session loop.

Meanwhile, the rte_vhost_driver_session_start is removed: we could hide
the session thread internally (create the thread if it has not been
created). This would also simplify the application.

NOTE: the API order in prog guide is slightly adjusted for showing the
correct invoke order.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
52f8091f05 vhost: export APIs for live migration support
Export few APIs for the vhost-user driver to log the guest memory writes,
which is a must for live migration support.

This patch basically moves vhost_log_write() and vhost_log_used_vring()
into vhost.h and then add an wrapper (the public API) to them.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
abd53c16b6 vhost: add features changed callback
Features could be changed after the feature negotiation. For example,
VHOST_F_LOG_ALL will be set/cleared at the start/end of live migration,
respecitively. Thus, we need a new callback to inform the application
on such change.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
cb04355743 vhost: rename virtio-net to vhost
Rename "virtio-net" to "vhost" in the API comments and vhost prog guide.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
7c12903746 vhost: rename device ops struct
rename "virtio_net_device_ops" to "vhost_device_ops", to not let it
be virtio-net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
aca49772f6 vhost: do not include net specific headers
Include it internally, at vhost.h.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
f53cf83980 vhost: drop the Rx and Tx queue macro
They are virtio-net specific and should be defined inside the virtio-net
driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
c0674b1bc8 vhost: move the device ready check at proper place
Currently, we check vq->desc, vq->kickfd and vq->callfd to know whether
a virtio device is ready or not. However, we only do it when handling
SET_VRING_KICK message, which could be wrong if a vhost-user frontend
send SET_VRING_KICK first and SET_VRING_CALL later.

To work for all possible vhost-user frontend implementations, we could
move the ready check at the end of vhost-user message handler.

Meanwhile, since we do the check more often than before, the "virtio
not ready" message is dropped, to not flood the screen.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
b50a203986 vhost: export the number of vrings
We used to use rte_vhost_get_queue_num() for telling how many vrings.
However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, virtio-net driver could turn it to the number of queue pairs by
dividing 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ab4d7b9f1a vhost: turn queue pair to vring
The queue pair is very virtio-net specific, other devices don't have
such concept. To make it generic, we should log the number of vrings
instead of the number of queue pairs.

This patch just does a simple convert, a later patch would export the
number of vrings to applications.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
dba9bf127b vhost: export API to translate gpa to vva
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
40ef286f23 vhost: export vhost vring info
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:42:44 +02:00
Yuanhan Liu
ca33faf9ef vhost: introduce API to fetch negotiated features
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:41:54 +02:00
Yuanhan Liu
eb32247457 vhost: export guest memory regions
Some vhost-user driver may need this info to setup its own page tables
for GPA (guest physical addr) to HPA (host physical addr) translation.
SPDK (Storage Performance Development Kit) is one example.

Besides, by exporting this memory info, we could also export the
gpa_to_vva() as an inline function, which helps for performance.
Otherwise, it has to be referenced indirectly by a "vid".

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
93433b639d vhost: make notify ops per vhost driver
Assume there is an application both support vhost-user net and
vhost-user scsi, the callback should be different. Making notify
ops per vhost driver allow application define different set of
callbacks for different driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
0917f9d1f0 vhost: use new APIs to handle features
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
5fbb3941da vhost: introduce driver features related APIs
Introduce few APIs to set/get/enable/disable driver features.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 10:40:13 +02:00
Yuanhan Liu
65388b43f5 vhost: fix fd leaks for vhost-user server mode
A vhost-user server socket could have many connections, thus many connfd.
However, we currently just use one single int var to store it. Meaning,
it will get overwritten every time a new connection is created.

While this will not create fatal issue as it sounds (since the correct
connfd is closured to the event loop thread by fdset_add), it may cause
fd leaks if a user invokes rte_vhost_driver_unregister before shutting
down all connections: it just closes the recent connfd.

A simple example that should be able to reproduce this leaks issues is,
del the ovs vhost-user port while the connected VMs are still alive. (Note
that it's suggested to use one socket for one VM, which makes the issue
not that fatal as it sounds again).

Since we already use a struct "vhost_user_connection" to track all info
about one connection, it's obvious that we should put the connfd there.
Then we could build a connection list inside the vhost_user_socket struct,
to represent all connections belong that socket file.

Fixes: 164fd39678 ("vhost: fix unregistering in client mode")
Cc: stable@dpdk.org
Cc: Ilya Maximets <i.maximets@samsung.com>

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Kevin Traynor
4e9474141e vhost: fix false sharing
The broadcast_rarp field in the virtio_net struct is checked in the
dequeue datapath regardless of whether descriptors are available or not.

As it is checked with cmpset leading to a write, false sharing on the
virtio_net struct can happen between enqueue and dequeue datapaths
regardless of whether a RARP is requested. In OVS, the issue can cause
a uni-directional performance drop of up to 15%.

Fix that by only performing the cmpset if a read of broadcast_rarp
indicates that the cmpset is likely to succeed.

Fixes: a66bcad322 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Maxime Coquelin
72e8543093 vhost: add API to get MTU value
This patch implements the function for the application to
get the MTU value.

rte_vhost_get_mtu() fills the mtu parameter with the MTU value
set in QEMU if VIRTIO_NET_F_MTU has been negotiated and returns 0,
-ENOTSUP otherwise.

The function returns -EAGAIN if Virtio feature negotiation
didn't happened yet.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Maxime Coquelin
4c5d8459d2 vhost: add new ready status flag
This patch adds a new status flag indicating the Virtio device
is ready to operate.

This is required to be able to call rte_vhost_mtu_get() in the
.new_device() callback, as rte_vhost_mtu_get needs that the
negotiation is done, but it is too early to rely on running status
flag, which is set just after .new_device() returns.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Maxime Coquelin
23f1e756ca vhost: support MTU protocol feature
This patch implements the vhost-user MTU protocol feature support.
When VIRTIO_NET_F_MTU is negotiated, QEMU notifies the vhost-user
backend with the configured MTU if dedicated protocol feature is
supported.

The value can be used by the application to ensure consistency with
value set by the user.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Maxime Coquelin
3d3c6590b5 vhost: enable virtio MTU feature
This patch enables the new VIRTIO_NET_F_MTU feature,
which makes possible for the host to advise the guest
with its maximum supported MTU.

MTU value is set via QEMU parameters, either via Libvirt XML, or
directly in virtio-net device command line arguments.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:17 +02:00
Yuanhan Liu
160cbc815b vhost: remove a hack on queue allocation
We used to allocate queues based on the index from SET_VRING_CALL
request: if corresponding queue hasn't been allocated, allocate it.

Though it's pratically right (it's the first per-vring request we
will get from QEMU for vhost-user negotiation), but it's not technically
right: it's not documented in the vhost-user spec that it will always
be the first per-vring request. For example, SET_VRING_ADDR could also
be the first per-vring request.

Thus, we should not depend the SET_VRING_CALL on queue allocation.
Instead, we could catch all the per-vring messages at the entrance of
request handler, and allocate one if it hasn't been allocated before.

By that, we could remove a hack.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 10:36:06 +02:00
Yuanhan Liu
de8e1fdcec vhost: fix max queues
0x8000 is the max virito-net queue pairs the virtio 1.0 spec claims to
support. While for vhost-user, it's a different story: the max vring
index could be passed by the vhost-user spec is 0xff, masked by the
VHOST_USER_VRING_IDX_MASK.

That said, the max queue pairs could vhost-user could supported is 0x80.
If user are asking more, I think the vhost-user need be extended.

Fixes: b09b198bfb ("vhost-user: announce queue number in message")
Cc: stable@dpdk.org

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 08:58:54 +02:00
Yuanhan Liu
8d286dbeb8 vhost: fix multiple queue not enabled for old kernels
Some macros (say VIRTIO_NET_F_MQ) are needed for enabling multiple queue,
however they are introduced since kernel v3.8, meaning build error happens
if we build DPDK vhost on those platforms.

71dfdbe66a ("vhost: fix build with kernel < 3.8") meant to fix it, but
in a wrong way: it completely disables the MQ features for those kernels.
However, the MQ feature doesn't depend on the kernel at all (except the
macros dependency stated above), that we could still enable the MQ feature
even the host kernel has no such support.

The right fix is to define the macro if it's not defined.

Fixes: 71dfdbe66a ("vhost: fix build with kernel < 3.8")
Cc: stable@dpdk.org

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-04-01 08:58:54 +02:00
Ilya Maximets
29b851e8de vhost: change log levels in client mode
Inability to connect to socket is a normal situation
in client mode because, in common case, server isn't
started yet. RTE_LOG_WARNING should be suitable for
the case of some unusual errors.
Message about reconnection is not an error at all.

Fixes: e623e0c6d8 ("vhost: add reconnect ability")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-04-01 08:58:54 +02:00
Matthias Gatto
1b815b8959 vhost: try to shrink pfdset when fdset_add fails
fdset_add increments pfdset->num, but fdset_del doesn't decrement
pfdset->num, so if we call fdset_add then fdset_del in a loop without
calling fdset_shrink, we can easily exceed MAX_FDS with only a few
number of fds used.

So my solution is simply to call fdset_shrink in fdset_add when it
exceeds MAX_FDS.

Because fdset_shrink and fdset_add locks pfdset->fd_mutex we can't
call fdset_shrink inside fdset_add because that would cause a dead
lock, so this patch split fdset_shrink in two, fdset_shrink and
fdset_shrink_nolock.

Fixes: 59317cef24 ("vhost: allow many vhost-user ports")
Cc: stable@dpdk.org

Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com>
2017-04-01 08:58:54 +02:00
Olivier Matz
feb9f680cd mk: optimize directory dependencies
Before this patch, the management of dependencies between directories
had several issues:

- the generation of .depdirs, done at configuration is slow: it can take
  more than one minute on some slow targets (usually ~10s on a standard
  PC without -j).

- for instance, it is possible to express a dependency like:
  - app/foo depends on lib/librte_foo
  - and lib/librte_foo depends on app/bar
  But this won't work because the directories are traversed with a
  depth-first algorithm, so we have to choose between doing 'app' before
  or after 'lib'.

- the script depdirs-rule.sh is too complex.

- we cannot use "make -d" for debug, because the output of make is used for
  the generation of .depdirs.

This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.

After this commit, "make config" is almost immediate.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-03-27 23:28:43 +02:00
Emmanuel Roullit
68759bbe73 vhost: remove unneeded variable assignment
Found with clang static analysis:
lib/librte_vhost/vhost_user.c:996:3: warning:
Value stored to 'ret' is never read
        ret = vhost_user_get_vring_base(dev, &msg.payload.state);
        ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Emmanuel Roullit <emmanuel.roullit@gmail.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-30 13:47:20 +01:00
Emmanuel Roullit
5c1f70daaf vhost: do not GSO when no header is present
Found with clang static analysis:
lib/librte_vhost/virtio_net.c:723:17: warning:
Access to field 'data_off' results in a dereference of a null pointer
(loaded from variable 'tcp_hdr')
        m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
                     ^~~~~~~~~~~~~~~~~

Fixes: d0cf91303d ("vhost: add Tx offload capabilities")
Cc: stable@dpdk.org

Signed-off-by: Emmanuel Roullit <emmanuel.roullit@gmail.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-30 13:46:57 +01:00
Yuanhan Liu
b8b992e93f vhost: fix long stall of negotiation
Setting up the mapping from GPA (guest physical address) to HPA (guest
physical address) could be very time consuming when the guest memory is
backened with small pages (4K). The bigger the guest memory, the longer
it takes. This could lead a very long vhost-user negotiation.

Since the mapping is only needed in zero copy mode so far, we could
avoid such time consuming settup when zero copy is turned off (which is
the default case).

It's actually a workaround, a right fix might be to start a new thread,
and hide the big latency there.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-01-28 14:25:40 +01:00
Yuanhan Liu
cc7301908c vhost: fix dead loop in enqueue path
If a malicious guest forges a dead loop desc chain (let desc->next point
to itself) and desc->len is zero, this could lead to a dead loop in
copy_mbuf_to_desc(following is a simplified code to show this issue
clearly):

    while (mbuf_is_not_totally_consumed) {
        if (desc_avail == 0) {
            desc = &descs[desc->next];
            desc_avail = desc->len;
        }

        COPY(desc, mbuf, desc_avail);
    }

I have actually fixed a same issue before: commit a436f53ebf ("vhost:
avoid dead loop chain"); it fixes the dequeue path though, leaving the
enqueue path still vulnerable.

The fix is the same. Add a var nr_desc to avoid the dead loop.

Fixes: f1a519ad98 ("vhost: fix enqueue/dequeue to handle chained vring descriptors")
Cc: stable@dpdk.org

Reported-by: Xieming Katty <katty.xieming@huawei.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-01-28 14:25:23 +01:00
Jan Wickbom
59317cef24 vhost: allow many vhost-user ports
Currently select() is used to monitor file descriptors for vhostuser
ports. This limits the number of ports possible to create since the
fd number is used as index in the fd_set and we have seen fds > 1023.
This patch changes select() to poll(). This way we can keep an
packed (pollfd) array for the fds, e.g. as many fds as the size of
the array.

Also see:
http://dpdk.org/ml/archives/dev/2016-April/037024.html

Reported-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Jan Wickbom <jan.wickbom@ericsson.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-17 09:20:18 +01:00
Maxime Coquelin
73c8f9f69c vhost: introduce reply ack feature
REPLY_ACK features provide a generic way for QEMU to ensure both
completion and success of a request.

As described in vhost-user spec in QEMU repository, QEMU sets
VHOST_USER_NEED_REPLY flag (bit 3) when expecting a reply_ack from
the backend. Backend must reply with 0 for success or non-zero
otherwise when flag is set.

Currently, only VHOST_USER_SET_MEM_TABLE request implements reply_ack,
in order to synchronize mapping updates.

This patch enables REPLY_ACK feature generally, but only checks error
code for VHOST_USER_SET_MEM_TABLE.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-17 09:20:18 +01:00
Yong Wang
5731d6acc7 vhost: fix memory leak
In function vhost_new_device(), current code dose not free 'dev'
in "i == MAX_VHOST_DEVICE" condition statements. It will lead to a
memory leak.

Fixes: 45ca9c6f7b ("vhost: get rid of linked list for devices")
Cc: stable@dpdk.org

Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-17 09:20:17 +01:00
Haifeng Lin
8c33fc10f6 vhost: fix guest/host physical address mapping
When reg_size < page_size the function read in
rte_mem_virt2phy would not return, because
host_user_addr is invalid.

Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org

Signed-off-by: Haifeng Lin <haifeng.lin@huawei.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2017-01-17 09:20:17 +01:00
Yuanhan Liu
164a601b52 vhost: remove references to vhost-cuse
vhost-cuse is removed, update corresponding comments that are still
referencing it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2016-11-07 15:59:21 +01:00
Maxime Coquelin
2a51b1091c vhost: support indirect descriptor in non-mergeable Rx
Linux virtio-net kernel driver uses indirect descriptors when
mergeable buffers are not used.

This patch adds its support, fixing the use of indirect
descriptors with these guests.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-10-26 13:39:09 +02:00
Maxime Coquelin
1be4ebb1c4 vhost: support indirect descriptor in mergeable Rx
Windows virtio-net driver uses indirect descriptors with
mergeable buffers.

This patch adds its support, fixing the use of indirect
descriptors with these guests.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
54524e0391 vhost: fix use after free
Fix the coverity USE_AFTER_FREE issue.

Coverity issue: 137884
Fixes: a277c71598 ("vhost: refactor code structure")

Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
f2f5bc8f30 vhost: retrieve available head once
There is no need to retrieve the latest avail head every time we enqueue
a packet in the mereable Rx path by

    avail_idx = *((volatile uint16_t *)&vq->avail->idx);

Instead, we could just retrieve it once at the beginning of the enqueue
path. This could diminish the cache penalty slightly, because the virtio
driver could be updating it while vhost is reading it (for each packet).

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
45847f015d vhost: prefetch available ring
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Zhihong Wang
f689586bc0 vhost: shadow used ring update
The basic idea is to shadow the used ring update: update them into a
local buffer first, and then flush them all to the virtio used vring
at once in the end.

And since we do avail ring reservation before enqueuing data, we would
know which and how many descs will be used. Which means we could update
the shadow used ring at the reservation time. It also introduce another
slight advantage: we don't need access the desc->flag any more inside
copy_mbuf_to_desc_mergeable().

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
fcdbe1fe1a vhost: use last available index for ring reservation
shadow_used_ring will be introduced later. Since then last avail
idx will not be updated together with last used idx.

So, here we use last_avail_idx for avail ring reservation.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Yuanhan Liu
3f9e48f7da vhost: simplify mergeable Rx vring reservation
Let it return "num_buffers" we reserved, so that we could re-use it
with copy_mbuf_to_desc_mergeable() directly, instead of calculating
it again there.

Meanwhile, the return type of copy_mbuf_to_desc_mergeable is changed
to "int". -1 will be return on error.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00
Zhihong Wang
e7ef688562 vhost: optimize cache access
This patch reorders the code to delay virtio header write to improve
cache access efficiency for cases where the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.

Virtio header write and mbuf data copy are all remote store operations
which takes a long time to finish. It's a good idea to put them together
to remove bubbles in between, to let as many remote store instructions
as possible go into store buffer at the same time to hide latency, and
to let the H/W prefetcher goes to work as early as possible.

On a Haswell machine, about 100 cycles can be saved per packet by this
patch alone. Taking 64B packets traffic for example, this means about 60%
efficiency improvement for the enqueue operation.

Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jianbo Liu <jianbo.liu@linaro.org>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2016-10-26 13:39:09 +02:00