When the specified cores and memory lie on different NUMA socket with
physical NIC, vhost fails to set up Rx queue, and exits without any
hints. This could leads to confusion of users.
This patch fixes it by adding some error messages when calling ether
APIs returns errors.
Suggested-by: Yulong Pei <yulong.pei@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Add a new paramter (flags) to rte_vhost_driver_register(). DPDK
vhost-user acts as client mode when RTE_VHOST_USER_CLIENT flag
is set. The flags would also allow future extensions without
breaking the API (again).
The rest is straingfoward then: allocate a unix socket, and
bind/listen for server, connect for client.
This extension is for vhost-user only, therefore we simply quit
and report error when any flags are given for vhost-cuse.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
With all the previous prepare works, we are just one step away from
the final ABI refactoring. That is, to change current API to let them
stick to vid instead of the old virtio_net dev.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
The new API rte_vhost_avail_entries() is actually a rename of
rte_vring_available_entries(), with the "vring" to "vhost" name
change to keep the consistency of other vhost exported APIs.
This change could let us avoid the dependency of "virtio_net"
struct, to prepare for the ABI refactoring.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
I failed to figure out what does "fh" mean here for a long while.
The only guess I could have had is "file handle". So, you get the
point that it's not well named.
I then figured it out that "fh" is derived from the fuse lib, and
my above guess is right. However, device_fh represents a virtio
net device ID. Therefore, here I rename it to vid (Virtio-net device
ID, or Vhost device ID; choose one you prefer) to make it easier for
understanding.
This name (vid) then will be considered to the only interface to
applications. That's another reason to do the rename: it's our
interface, make it more understandable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
Make a copy of virtio device id (device_fh) from the virtio_net struct,
so that we could have less dependency on the virtio_net struct.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
device_fh repsents the device id for a specific virtio net device.
Firstly, "int" would be big enough: we don't need 64 bit. Secondly,
this could let us avoid the ugly "%" PRIu64 ".." stuff.
And since ctx.fh is derived from device_fh, declare it as int, too.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
It does not make sense to ask the application to set/unset the flag
VIRTIO_DEV_RUNNING (that used internal only) at new_device()/
destroy_device() callback.
Instead, it should be set after new_device() succeeds and reset before
destroy_device() is invoked inside vhost lib. This patch fixes it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Rich Lane <rich.lane@bigswitch.com>
Acked-by: Rich Lane <rich.lane@bigswitch.com>
There are two tailq lists, one for logging all vhost devices, another
one for logging vhost devices distributed on a specific core. However,
there is just one tailq entry, named "next", to chain the two list,
which is wrong and could result to a corrupted tailq list, that the
tailq list might always be non-empty: the entry is still there even
after you have invoked TAILQ_REMOVE several times.
Fix it by introducing two tailq entries, one for each list.
Fixes: 45657a5c6861 ("examples/vhost: use tailq to link vhost devices")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
rte_thread_setname was a macro defined only for Linux.
The function rte_thread_setname() can now be used on FreeBSD
as well on Linux.
It is required to build librte_pdump.
The macro was 0 for old glibc. The function is now returning -1.
The related logs are decreased from error to debug level because
it is not an important failure, just a debug inconvenience.
Fixes: 278f945402c5 ("pdump: add new library for packet capture")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
The rte_eth_dev_count() function will never return a value greater
than RTE_MAX_ETHPORTS, so that checking is useless.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
switch_worker() is the last piece of code that is messy yet it touches
virtio/vhost device.
Here do a cleanup, so that we will be less painful for later vhost ABI
refactoring.
The cleanup is straight forward: break long lines, move some code into
functions. The last, comment a bit on switch_worker().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
It has always been a mystery (at least to me before) that how many
mbuf is enough while creating an mbuf pool. While current macro
NUM_MBUFS_PER_PORT gives your some insights, it's not that accurate:
it doesn't consider the case we may receive a big packet, say 64K
when TSO is enabled.
We actually have tried to fix it once before, with commit 5499c1fc9baa
("examples/vhost: fix mbuf allocation"), but it just workarounded it
by enlarging it a bit so that the case described in the commit log
by passes. So, while trying to fix it ultimately, I'm thinking how
big is big enough, and what are the factors need consider to figure
out a proper value.
Therefore, here you are. I introduced a helper function to create
the mbuf pool, and do the "how many mbufs are needed" calculation
there. Also, I put detailed comments how that comes, to serve as
the guidelines.
Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
Fixes: 5499c1fc9baa ("examples/vhost: fix mbuf allocation")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Every time I do a VM2VM iperf test with vhost example, I have to set
the arp table manually, as vhost-switch just ignores the broadcast
packet, leaving the ARP request not served.
Here we do a transmit a broadcast packet (such as ARP request) to
every vhost device, as well as the physical port, to fix above
arp table issue.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
rte_ether.h already provides a helper function to do mac address
compare. No need to define our own, use it directly.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Interestingly, DESC_PER_CACHELINE has never been used since the
introduction of vhost example. Remove it.
vlan_ethhdr struct and VLAN_ETH_HLEN macro reference had been removed
by commit 4d50b6acbd95 ("examples/vhost: adapt Tx routing to lib"), but
had forgot to remove the definition.
Fixes: 4d50b6acbd95 ("examples/vhost: adapt Tx routing to lib")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
It's reported that it's has not been working for a long while. And due
to it's complex, it's better to redesign it than to fix it to make it
work again.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
The debug logs must be enabled at compile-time and run-time.
There are also some internal flags in some examples to enable the debug
logs of the applications. They are now enabled in debug configs and
can be disabled thanks to the more generic logtype mechanism:
rte_set_log_type(RTE_LOGTYPE_USER1, 0);
Removing these #ifdef allows to test these code branches more easily
and avoid dead code pitfalls.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
While the last arg of virtio_tx_route() asks a vlan tag, we currently
feed it with device_fh, which is wrong. Fix it.
Fixes: 4796ad63ba1f ("examples/vhost: import userspace vhost application")
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Comments for PKT_TX_TCP_SEG at rte_mbuf says that we should only set
PKT_TX_IP_CKSUM and reset ip hdr checksum for IPv4:
- if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
to 0 in the packet
Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
For checksum offloading only case, the TCP/IP stack would
have calculated the pseudo checksum. Therefore, we don't
need to re-calculate it again here; remove it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Issue description: when packets go through vhost example to virtio
device and come back to another virtio device or physical NIC, the
sequence of packets will be changed.
Reported-by: Thomas Long <thomas.long@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
How to reproduce:
1. Start vhost-switch
./examples/vhost/build/vhost-switch -c 0x3 -n 4 -- -p 1 --stat 0
2. Start VM with a virtio port
$ $QEMU -smp cores=2,sockets=1 -m 4G -cpu host -enable-kvm \
-chardev socket,id=char1,path=<path to vhost-user socket> \
-device virtio-net-pci,netdev=vhostuser1 \
-netdev vhost-user,id=vhostuser1,chardev=char1
-object memory-backend-file,id=mem,size=4G,mem-path=<hugetlbfs path>,share=on \
-numa node,memdev=mem -mem-prealloc \
-hda <path to VM img>
3. Start l2fwd in VM
$ ./examples/l2fwd/build/l2fwd -c 0x1 -n 4 -m 1024 -- -p 0x1
4. Use ixia to inject packets in a small data bit rate.
Error:
vhost-switch keeps printing error message:
failed to allocate memory for mbuf.
Root cause:
How many mbufs allocated for a port is calculated by below formula.
NUM_MBUFS_PER_PORT = ((MAX_QUEUES*RTE_TEST_RX_DESC_DEFAULT) + \
(num_switching_cores*MAX_PKT_BURST) + \
(num_switching_cores*RTE_TEST_TX_DESC_DEFAULT) +\
(num_switching_cores*MBUF_CACHE_SIZE))
We suppose num_switching_cores is 1 and MBUF_CACHE_SIZE is 128.
And when initializing port, master core fills mbuf mempool cache,
so there would be some left in that cache, for example 121.
So total mbufs which can be used is:
(MAX_PKT_BURST + MBUF_CACHE_SIZE - 121) = (32 + 128 - 121) = 39.
What makes it worse is that there is a buffer to store mbufs
(which will be tx_burst to physical port), if it occupies some mbufs,
there will be possible < 32 mbufs left, so vhost dequeue prints out
this msg.
In all, it fails to include master core's mbuf mempool cache.
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Change the codes in vhost sample to test virtio offload feature.
These changes include,
1. add two test options: tx-csum and tso.
2. add virtio_tx_offload() function to test vhost TX offload feature
for VM to NIC case;
however, for VM to VM case, it doesn't need to call this function,
the reason is explained in patch 2.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Remove the ipv4_hdr structure defination in vhost sample.
The same structure has already defined in the rte_ip.h file, so we
remove the defination from the sample, and include that header file.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8e0 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Add #ifndef MAX_QUEUES to change MAX_QUEUES at compile time if needed.
Change MAX_QUEUES from 512 to 128 to reduce the number of hugepages
required by the vhost-switch program.
To change MAX_QUEUES add '-D MAX_QUEUES=512' to the EXTRA_CFLAGS variable,
before building the application.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Fixes following error on gcc 4.4.7:
examples/vhost/main.c: In function ‘new_device’:
rte_ring.h:740: error:
dereferencing pointer ‘mbuf.486’ does break strict-aliasing rules
examples/vhost/main.c:1503: note: initialized from here
...
rte_ring.h:740: error:
dereferencing pointer ‘({anonymous})’ does break strict-aliasing rules
examples/vhost/main.c:1804: note: initialized from here
Fixes: d19533e8 ("examples/vhost: copy old vhost example")
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
This issue was discovered under the case of software vm2vm
fowarding. When pkts are received from virtio device 0 and
tx_route to virtio device 1, tx of device 0 is not updated.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
pthread_setname_np() function added in glibc 2.12, using this function
in older glibc versions cause compile error:
error: implicit declaration of function "pthread_setname_np"
This patch adds "rte_thread_setname" macro and set it according
glibc >= 2.12 check, thread naming disabled for older glibc versions,
glibc versions that support "pthread_setname_np" will keep using this
function.
Fixes: 67b6d3039e9e ("eal: set name to threads")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds support for pthread_setname_np on Linux and
pthread_set_name_np on FreeBSD.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: add name in tep_termination example]
According to eventfd man page:
typedef uint64_t eventfd_t;
int eventfd_read(int fd, eventfd_t *value);
int eventfd_write(int fd, eventfd_t value);
eventfd_t is defined for the second arg(value), but not for fd.
Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The following commit broke vhost sample when it runs in second time:
292959c71961acde0cda6e77e737bb0a4df1559c
It should call api to unregister vhost driver when sample exit/quit, then
the socket file will be removed(by calling unlink), and thus make vhost sample
work correctly in the second time startup.
Test report: http://dpdk.org/ml/archives/dev/2015-July/020896.html
Fixes: 292959c71961 ("vhost: cleanup unix socket")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
It fixes the wrong log info when failing to unregister vhost driver.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Minor fix for the referring of a pointer when debug and dump is enabled.
Fixes: 72ec8d77ac68 ("examples/vhost: rework duplicated code")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Latest mbuf changes (priv_size addition and related fixes)
exposed small problem with testpmd and few other sample apps:
when mbuf size is exaclty 2KB or less, that causes
ixgbe PMD to select scattered RX even for configs with 'normal'
max packet length (max_rx_pkt_len == ETHER_MAX_LEN).
To overcome that problem and unify the code, new macro was created
to represent recommended minimal buffer length for mbuf.
When appropriate, samples are updated to use that macro.
Fixes: dfb03bbe2b ("app/testpmd: use standard functions to initialize
mbufs and mbuf pool")
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add a new priv_size field in mbuf structure that should
be initialized at mbuf pool creation. This field contains the
size of the application private data in mbufs.
Introduce new static inline functions rte_mbuf_from_indirect()
and rte_mbuf_to_baddr() to replace the existing macros, which
take the private size in account when attaching and detaching
mbufs.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When it's possible, use the new helper to create the mbuf pools.
Most of the patch is trivial, except for the following files that
have some specifics (indirect mbufs):
- ip_fragmentation
- ip_pipeline
- ipv4_multicast
- vhost
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Deduct the mbuf data room size from mempool->elt_size and priv_size,
instead of using an hardcoded value that is not related to the real
buffer size.
To use rte_pktmbuf_pool_init(), the user can either:
- give a NULL parameter to rte_pktmbuf_pool_init(): in this case, the
private size is assumed to be 0, and the room size is
mp->elt_size - sizeof(struct rte_mbuf).
- give the rte_pktmbuf_pool_private filled with appropriate
data_room_size and priv_size values.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
rte_free handles getting passed a NULL pointer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
*alloc() routines return void * and therefore cast is not needed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
[Thomas: reverse num and size parameters in vhost calloc]
This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.
We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and clear on detach.
So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.
The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Support turn on/off RX VLAN strip on host, this let guest get the chance of
using its software VLAN strip functionality.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>