Return value of function get_device() is not checking before
dereference. Fix this problem by adding checking condition.
Coverity issue: 119262
Fixes: 77d20126b4c2 ("vhost-user: handle message to enable vring")
Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
All other DPDK PMDs doesn't support concurrent receiving or sending
packets to the same queue. The upper application should deal with
this, normally through queue and core bindings.
Due to historical reason, vhost internally supports concurrent lockless
enqueuing packets to the same virtio queue through costly cmpset operation.
This patch removes this internal lockless implementation and should improve
performance a bit.
Luckily DPDK OVS doesn't rely on this behavior.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We skip kernel managed virtio devices, if it isn't whitelisted.
Before checking if the virtio device is whitelisted, check if devargs
is specified.
Fixes: ac5e1d838dc1 ("virtio: skip error when probing kernel managed device")
Reported-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The ifname[] field takes so much space, that it seperates some frequently
used fields into different caches, say, features and broadcast_rarp.
This patch moves all those fields that will be accessed frequently in Rx/Tx
together (before the ifname[] field) to let them share one cache line.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
A virtio driver normally uses at least 2 desc buffers for Tx: the
first for storing the header, and the others for storing the data.
Therefore, we could fetch the first data desc buf before the main
loop, and do the copy first before the check of "are we done yet?".
This could save one check for small packets that just have one data
desc buffer and need one mbuf to store it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Pre update and update used ring in batch for Tx and Rx at the stage
while fetching all avail desc idx. This would reduce some cache misses
and hence, increase the performance a bit.
Pre update would be feasible as guest driver will not start processing
those entries as far as we don't update "used->idx". (I'm not 100%
certain I don't miss anything, though).
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
When DPDK app crashes (or quits, or gets killed), a restart of DPDK
app would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.
So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a much saner (and may not the most accurate) vring base
from used->idx. That would work because:
- there is a memory barrier between updating used ring entries and
used->idx. So, even though we crashed at updating the used ring
entries, it will not cause any issue, as the guest driver will not
process those stale used entries, for used-idx is not updated yet.
- DPDK process vring in order, that means a crash may just lead some
packet retransmission for Tx and drop for Rx.
Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Allow reconnecting on failure by default when:
- DPDK app starts first and QEMU (as the server) is not started yet.
Without reconnecting, DPDK app would simply fail on vhost-user
registration.
- QEMU restarts, say due to OS reboot.
Without reconnecting, you can't re-establish the connection without
restarting DPDK app.
This patch make it work well for both above cases. It simply creates
a new thread, and keep trying calling "connect()", until it succeeds.
The reconnect could be disabled when RTE_VHOST_USER_NO_RECONNECT flag
is set.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add a new paramter (flags) to rte_vhost_driver_register(). DPDK
vhost-user acts as client mode when RTE_VHOST_USER_CLIENT flag
is set. The flags would also allow future extensions without
breaking the API (again).
The rest is straingfoward then: allocate a unix socket, and
bind/listen for server, connect for client.
This extension is for vhost-user only, therefore we simply quit
and report error when any flags are given for vhost-cuse.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
DPDK vhost-user just acts as server so far, so, using a struct named
as "vhost_server" is okay. However, if we add client mode, it doesn't
make sense any more. Here renames it to "vhost_user_socket".
There was no obvious wrong about "connfd_ctx", but I think it's obviously
better to rename it to "vhost_user_connection", as it does represent
a connection, a connection between the backend (DPDK) and the frontend
(QEMU).
Similarly, few more renames are taken, such as "vserver_new_vq_conn"
to "vhost_user_new_connection".
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Array of buf_vector's is just an array for temporary storing information
about available descriptors. It used only locally in virtio_dev_merge_rx()
and there is no reason for that array to be shared.
Fix that by allocating local buf_vec inside virtio_dev_merge_rx().
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Virtio net header length is set per device, but not per queue. So, there
is no reason to store it in vhost_virtqueue struct, instead, we should
store it in virtio_net struct, to make one copy only.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
"virtio_net_device_ops" is the only left open struct that an application
can access, therefore, it's the only place that might introduce potential
ABI break in future for extension.
So, do some reservation for it. 5 should be pretty enough, considering
that we have barely touched it for a long while. Another reason to
choose 5 is for cache alignment: 5 makes the struct 64 bytes for 64 bit
machine.
With this, it's confidence to say that we might be able to be free from
the ABI violation forever.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
It barely has anything useful there, just 2 functions prototype. Here
move them to vhost-net.h, and delete it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
The "reserved" field in virtio_net and vhost_virtqueue struct is not
necessary any more. We now expose virtio_net device with a number "vid".
This patch also removes the "priv" field: all fields are priviate now:
application can't access it now. The only way that we could still access
it is to expose it by a function, but I doubt that's needed or worthwhile.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
We are now safe to move all those internal structs/macros/functions to
vhost-net.h, to hide them from external access.
This patch also breaks long lines and removes some redundant comments.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
With all the previous prepare works, we are just one step away from
the final ABI refactoring. That is, to change current API to let them
stick to vid instead of the old virtio_net dev.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
The new API rte_vhost_avail_entries() is actually a rename of
rte_vring_available_entries(), with the "vring" to "vhost" name
change to keep the consistency of other vhost exported APIs.
This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Introduce a new API rte_vhost_get_ifname() to export the ifname to
application.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Introduce a new API rte_vhost_get_queue_num() to export the number of
queues.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Introduce a new API rte_vhost_get_numa_node() to get the numa node
from which the virtio_net struct is allocated.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
vhost cuse is now the last reference of the vhost_device_ctx struct;
move it there, and do a rename to "vhost_cuse_device_ctx", to make
it clear that it's "cuse only".
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
get_device() just needs vid, so pass vid as the parameter only.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
I failed to figure out what does "fh" mean here for a long while.
The only guess I could have had is "file handle". So, you get the
point that it's not well named.
I then figured it out that "fh" is derived from the fuse lib, and
my above guess is right. However, device_fh represents a virtio
net device ID. Therefore, here I rename it to vid (Virtio-net device
ID, or Vhost device ID; choose one you prefer) to make it easier for
understanding.
This name (vid) then will be considered to the only interface to
applications. That's another reason to do the rename: it's our
interface, make it more understandable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Make a copy of virtio device id (device_fh) from the virtio_net struct,
so that we could have less dependency on the virtio_net struct.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
device_fh repsents the device id for a specific virtio net device.
Firstly, "int" would be big enough: we don't need 64 bit. Secondly,
this could let us avoid the ugly "%" PRIu64 ".." stuff.
And since ctx.fh is derived from device_fh, declare it as int, too.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
It does not make sense to ask the application to set/unset the flag
VIRTIO_DEV_RUNNING (that used internal only) at new_device()/
destroy_device() callback.
Instead, it should be set after new_device() succeeds and reset before
destroy_device() is invoked inside vhost lib. This patch fixes it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
It's an fd; so define it as "int", which could also save the unncessary
(int) case.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
There are two tailq lists, one for logging all vhost devices, another
one for logging vhost devices distributed on a specific core. However,
there is just one tailq entry, named "next", to chain the two list,
which is wrong and could result to a corrupted tailq list, that the
tailq list might always be non-empty: the entry is still there even
after you have invoked TAILQ_REMOVE several times.
Fix it by introducing two tailq entries, one for each list.
Fixes: 45657a5c6861 ("examples/vhost: use tailq to link vhost devices")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
We keep a common vq structure, containing only vq related fields,
and then split others into RX, TX and control queue respectively.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
[Jianfeng Tan: found and fixed 2 bugs]
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The commit dd856dfcb9e74 introduced an optimization that prepends virtio
header to mbuf data. It can be used when the tx mbuf is writeable, so we
need to check that the mbuf is direct (i.e. it embeds its own data).
Fixes: dd856dfcb9e7 ("virtio: use any layout on Tx")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Change type of nht field from uint32_t to uint8_t and increase max of
next hops.
nht_entry and nht should be declared as uint8_t because
entry_size is in bytes and is given as a parameter to compute
the position in nht array.
Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")
Signed-off-by: Michal Kobylinski <michalx.kobylinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
1. add KNI support to the IP Pipeline sample Application
2. some bug fix
3. update doc
4. add config file with two KNI interfaces connected using
a Linux kernel bridge
Signed-off-by: WeiJie Zhuang <zhuangwj@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
add KNI port type to the packet framework
Signed-off-by: WeiJie Zhuang <zhuangwj@gmail.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Following compile error observed with CentOS 6.8, which uses kernel
kernel-devel-2.6.32-642.el6.x86_64:
In function 'igbuio_msix_mask_irq':
error: 'PCI_MSIX_ENTRY_CTRL_MASKBIT' undeclared
Reported-by: Thiago Martins <thiagocmartinsc@gmail.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Fixes memory leaks detected by Coverity. These are due to ephemeral
memory allocations not being freed when errors occur.
Coverity issue: 127349
Fixes: e2aae1c1ced9 ("ethdev: remove name from extended statistic fetch")
Signed-off-by: Remy Horton <remy.horton@intel.com>
Fixes memory leaks detected by Coverity. These are due to ephemeral
memory allocations not being freed when errors occur.
Coverity issue: 127348
Fixes: e2aae1c1ced9 ("ethdev: remove name from extended statistic fetch")
Signed-off-by: Remy Horton <remy.horton@intel.com>
The class id is not filled and makes probing to fail.
Updated the code to use RTE_PCI_DEVICE which fills
the class id with a wildcard value.
Fixes: 701c8d80c820 ("pci: support class id probing")
Signed-off-by: Deepak Kumar Jain <deepak.k.jain@intel.com>
IPSec transport mode support.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Support IPSec IPv6 allowing IPv4/IPv6 traffic in IPv4 or IPv6 tunnel.
We need separate Routing (LPM) and SP (ACL) tables for IPv4 and IPv6,
but a common SA table.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Modify the default SP config variables names to be consistent with SA.
The resulting naming convention is that variables with suffixes _out/_in
are the default for ep0 and the reverse for ep1.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The application only ASSERTS that an SA is not NULL (only when debugging
is enabled) without properly dealing with the case of not having an SA
for the processed packet.
Behavior should be such as if no SA is found, drop the packet.
Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Rework implementation moving from function pointers approach, where each
function implements very specific functionality, to a generic function
approach.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Add support for building the application with DEBUG=1.
This option adds the compiler stack protection flag and enables extra
output in the application.
Also remove unnecessary VPATH setup.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Building the application with -O3 and -fstack-protection (default in
Ubuntu) results in the following error:
*** stack smashing detected ***: ./build/ipsec-secgw terminated
The error is caused by storing an 8B value in a 4B variable.
Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample application")
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>