Issue description: when packets go through vhost example to virtio
device and come back to another virtio device or physical NIC, the
sequence of packets will be changed.
Reported-by: Thomas Long <thomas.long@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
How to reproduce:
1. Start vhost-switch
./examples/vhost/build/vhost-switch -c 0x3 -n 4 -- -p 1 --stat 0
2. Start VM with a virtio port
$ $QEMU -smp cores=2,sockets=1 -m 4G -cpu host -enable-kvm \
-chardev socket,id=char1,path=<path to vhost-user socket> \
-device virtio-net-pci,netdev=vhostuser1 \
-netdev vhost-user,id=vhostuser1,chardev=char1
-object memory-backend-file,id=mem,size=4G,mem-path=<hugetlbfs path>,share=on \
-numa node,memdev=mem -mem-prealloc \
-hda <path to VM img>
3. Start l2fwd in VM
$ ./examples/l2fwd/build/l2fwd -c 0x1 -n 4 -m 1024 -- -p 0x1
4. Use ixia to inject packets in a small data bit rate.
Error:
vhost-switch keeps printing error message:
failed to allocate memory for mbuf.
Root cause:
How many mbufs allocated for a port is calculated by below formula.
NUM_MBUFS_PER_PORT = ((MAX_QUEUES*RTE_TEST_RX_DESC_DEFAULT) + \
(num_switching_cores*MAX_PKT_BURST) + \
(num_switching_cores*RTE_TEST_TX_DESC_DEFAULT) +\
(num_switching_cores*MBUF_CACHE_SIZE))
We suppose num_switching_cores is 1 and MBUF_CACHE_SIZE is 128.
And when initializing port, master core fills mbuf mempool cache,
so there would be some left in that cache, for example 121.
So total mbufs which can be used is:
(MAX_PKT_BURST + MBUF_CACHE_SIZE - 121) = (32 + 128 - 121) = 39.
What makes it worse is that there is a buffer to store mbufs
(which will be tx_burst to physical port), if it occupies some mbufs,
there will be possible < 32 mbufs left, so vhost dequeue prints out
this msg.
In all, it fails to include master core's mbuf mempool cache.
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Change the codes in vhost sample to test virtio offload feature.
These changes include,
1. add two test options: tx-csum and tso.
2. add virtio_tx_offload() function to test vhost TX offload feature
for VM to NIC case;
however, for VM to VM case, it doesn't need to call this function,
the reason is explained in patch 2.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Remove the ipv4_hdr structure defination in vhost sample.
The same structure has already defined in the rte_ip.h file, so we
remove the defination from the sample, and include that header file.
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8e0 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Add #ifndef MAX_QUEUES to change MAX_QUEUES at compile time if needed.
Change MAX_QUEUES from 512 to 128 to reduce the number of hugepages
required by the vhost-switch program.
To change MAX_QUEUES add '-D MAX_QUEUES=512' to the EXTRA_CFLAGS variable,
before building the application.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Fixes following error on gcc 4.4.7:
examples/vhost/main.c: In function ‘new_device’:
rte_ring.h:740: error:
dereferencing pointer ‘mbuf.486’ does break strict-aliasing rules
examples/vhost/main.c:1503: note: initialized from here
...
rte_ring.h:740: error:
dereferencing pointer ‘({anonymous})’ does break strict-aliasing rules
examples/vhost/main.c:1804: note: initialized from here
Fixes: d19533e8 ("examples/vhost: copy old vhost example")
Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
This issue was discovered under the case of software vm2vm
fowarding. When pkts are received from virtio device 0 and
tx_route to virtio device 1, tx of device 0 is not updated.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Qian Xu <qian.q.xu@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
pthread_setname_np() function added in glibc 2.12, using this function
in older glibc versions cause compile error:
error: implicit declaration of function "pthread_setname_np"
This patch adds "rte_thread_setname" macro and set it according
glibc >= 2.12 check, thread naming disabled for older glibc versions,
glibc versions that support "pthread_setname_np" will keep using this
function.
Fixes: 67b6d3039e9e ("eal: set name to threads")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds support for pthread_setname_np on Linux and
pthread_set_name_np on FreeBSD.
Signed-off-by: Ravi Kerur <rkerur@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: add name in tep_termination example]
According to eventfd man page:
typedef uint64_t eventfd_t;
int eventfd_read(int fd, eventfd_t *value);
int eventfd_write(int fd, eventfd_t value);
eventfd_t is defined for the second arg(value), but not for fd.
Here I redefine those fd fields to `int' type, which also removes
the redundant (int) cast. And as the man page stated, we should
cast 1 to eventfd_t type for eventfd_write().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The following commit broke vhost sample when it runs in second time:
292959c71961acde0cda6e77e737bb0a4df1559c
It should call api to unregister vhost driver when sample exit/quit, then
the socket file will be removed(by calling unlink), and thus make vhost sample
work correctly in the second time startup.
Test report: http://dpdk.org/ml/archives/dev/2015-July/020896.html
Fixes: 292959c71961 ("vhost: cleanup unix socket")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
It fixes the wrong log info when failing to unregister vhost driver.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Minor fix for the referring of a pointer when debug and dump is enabled.
Fixes: 72ec8d77ac68 ("examples/vhost: rework duplicated code")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Latest mbuf changes (priv_size addition and related fixes)
exposed small problem with testpmd and few other sample apps:
when mbuf size is exaclty 2KB or less, that causes
ixgbe PMD to select scattered RX even for configs with 'normal'
max packet length (max_rx_pkt_len == ETHER_MAX_LEN).
To overcome that problem and unify the code, new macro was created
to represent recommended minimal buffer length for mbuf.
When appropriate, samples are updated to use that macro.
Fixes: dfb03bbe2b ("app/testpmd: use standard functions to initialize
mbufs and mbuf pool")
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add a new priv_size field in mbuf structure that should
be initialized at mbuf pool creation. This field contains the
size of the application private data in mbufs.
Introduce new static inline functions rte_mbuf_from_indirect()
and rte_mbuf_to_baddr() to replace the existing macros, which
take the private size in account when attaching and detaching
mbufs.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When it's possible, use the new helper to create the mbuf pools.
Most of the patch is trivial, except for the following files that
have some specifics (indirect mbufs):
- ip_fragmentation
- ip_pipeline
- ipv4_multicast
- vhost
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Deduct the mbuf data room size from mempool->elt_size and priv_size,
instead of using an hardcoded value that is not related to the real
buffer size.
To use rte_pktmbuf_pool_init(), the user can either:
- give a NULL parameter to rte_pktmbuf_pool_init(): in this case, the
private size is assumed to be 0, and the room size is
mp->elt_size - sizeof(struct rte_mbuf).
- give the rte_pktmbuf_pool_private filled with appropriate
data_room_size and priv_size values.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
rte_free handles getting passed a NULL pointer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Previous vhost implementation wrongly name kickfd as callfd and callfd as kickfd.
It is functional correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
*alloc() routines return void * and therefore cast is not needed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
[Thomas: reverse num and size parameters in vhost calloc]
This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently for mbufs with refcnt, we cannot free mbufs with external memory
buffers (ie. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf it points to,
resulting in an error in the case of external memory buffers.
We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with external buffer is the same as freeing a non attached mbuf.
The flag is set during attach and clear on detach.
So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.
The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Support turn on/off RX VLAN strip on host, this let guest get the chance of
using its software VLAN strip functionality.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Check if it has already been vlan-tagged packet, if true, avoid inserting a
duplicated vlan tag into it.
This is a possible case when guest has the capability of inserting vlan tag.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The following commit break vm2vm hard mode test cases:
commit db4014f2b65cb31bf ("use factorized default Rx/Tx configuration")
Investigation show that it needs enabling vlan offload since it is turn off
by default in some drivers, and Tx need it, especially when vm2vm is in hard mode.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Search the right segment to increase its data length, rather than
wrongly early return and exit the tx function, which leads to drop all jumbo frame packets
when vm2vm is in hard forward mode.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Increase MAX_QUEUES from 256 to 512.
In vhost example, MAX_QUEUES macro should be the maximum possible queue number of the port.
Theoretically we should only set up the queues that are used, i.e., first rx queue of each pool, or
at most queues from 0 to MAX_QUEUES. Before we revise the implementation and are certain all NICs support
this well, add a remind message to user.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Refer to Pablo's commit (81f7ecd934372fc):
"use factorized default Rx/Tx configuration
For apps that were using default rte_eth_rxconf and rte_eth_txconf
structures, these have been removed and now they are obtained by
calling rte_eth_dev_info_get, just before setting up RX/TX queues."
Move zero copy's deferred start set up ahead.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
In Niantic, if VMDQ mode is set, all queues are allocated to VMDQ in DPDK.
In I40E, only configured part of continous queues are allocated to VMDQ.
The rte_eth_dev_info structure is extended to provide VMDQ queue base,
queue number, and VMDQ pool base information.
This patch support the new VMDQ API in vhost example.
FIXME in PMD:
* added mac address will be flushed at rte_eth_dev_start.
* we don't support selectively setting up queues well.
Test report: http://dpdk.org/ml/archives/dev/2014-December/009427.html
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
CACHE_LINE_SIZE is a macro defined in machine/param.h in FreeBSD and
conflicts with DPDK macro version.
Adding RTE_ prefix to avoid conflicts.
CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[Thomas: updated on HEAD, including PPC]
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
This is to enable user space vhost receiving and forwarding broadcast
and multicast packets:
Use new option in command line to enable promisc mode;
Enable 2 bits in VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
This patch checks the packet length offset value, and checks if the
extra bytes inside buffer cross page boundary.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Extract a function to replace duplicated codes in one copy and zero copy TX function.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
As HW vlan strip will reduce the packet length by minus length of vlan tag,
so it need restore the packet length by plus it.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Since the commit 33e79bed3edc2bcf59 has fixed the issue in vector PMD,
and then it can receive jumbo frame by scatter-gather mode, so remove this check.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The packet passed to virtio_tx_route has been allocated
mbuf, so there is no need to allocate mbuf for it.
Use vlan offload to transmit vlan tagged packet.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove useless mbuf pool]
In switch_worker and virtio_tx_local, rte_vhost_enqueue_burst is called to
push host packets to guest VM.
Before enqueue packets to guest VM, vhost example uses configure-able retry logic
to wait for enough vring entries.
In switch_worker, rte_vhost_dequeue_burst is called to get packets from guest VM,
then virtio device will be bound to a queue in VMDQ for the first transmitted
packet.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
check_hpa_regions, fill_hpa_memory_regions and hpa memory region
data structure are added back from old virtio-net.c.
Add hpa (host physical address) region generation/destroy logic.
gpa<->hpa memory translation regions are generated at new_device,
when a virtio device is ready for packet processing.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Define vhost_dev data structure.
Change reference to virtio_dev to vhost_dev.
The vhost example use vdev data structure for switching related logic
and container for virtio_dev.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Those functions are integrated into the user space vhost library:
virtio_dev_rx, virtio_dev_merge_rx, virtio_dev_tx, virtio_dev_merge_tx,
copy_from_mbuf_to_ring, gpa_to_vva.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>