Add a command-line parameter to l3fwd, to allow the user to specify the
destination mac address for each ethernet port used.
Signed-off-by: Andrey Chilikin <andrey.chilikin@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Do not compile these examples if the related dpdk option is not
enabled, as is done for other examples. This allows building
the examples directory with a reduced dpdk configuration.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Fix a cast issue:
examples/netmap_compat/lib/compat_netmap.c:827:10: error: cast to
pointer from integer of different size [-Werror=int-to-pointer-cast]
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Fix the following compilation error:
examples/bond/main.c:717:1: error: control reaches end of
non-void function [-Werror,-Wreturn-type]
The prompt() function does not return anything, so fix its prototype
to be void.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This error is detected:
examples/ip_pipeline/cmdline.c:272:15: error: address of array
'params->file_path' will always evaluate to 'true'
if (!params->file_path) {
~~~~~~~~~^~~~~~~~~
file_path is an array embedded in a structure, so there is no need to check it for NULL.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Minor fix for a pointer dereference when debug and dump are enabled.
Fixes: 72ec8d77ac ("examples/vhost: rework duplicated code")
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Latest mbuf changes (priv_size addition and related fixes)
exposed a small problem with testpmd and a few other sample apps:
when the mbuf size is exactly 2KB or less, the ixgbe PMD selects
scattered RX even for configs with a 'normal'
max packet length (max_rx_pkt_len == ETHER_MAX_LEN).
To overcome that problem and unify the code, a new macro was created
to represent the recommended minimal buffer length for an mbuf.
Where appropriate, samples are updated to use that macro.
Fixes: dfb03bbe2b ("app/testpmd: use standard functions to initialize
mbufs and mbuf pool")
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add a new priv_size field in mbuf structure that should
be initialized at mbuf pool creation. This field contains the
size of the application private data in mbufs.
Introduce new static inline functions rte_mbuf_from_indirect()
and rte_mbuf_to_baddr() to replace the existing macros, which
take the private size in account when attaching and detaching
mbufs.
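A minimal sketch of how the new helpers could be used in place of the old
macros when detaching an indirect mbuf (illustrative only, not the exact
library implementation):

    #include <rte_mbuf.h>

    /* Sketch: 'mi' is an indirect mbuf attached to another mbuf. */
    static void example_detach(struct rte_mbuf *mi)
    {
            /* direct mbuf that mi is attached to, priv_size-aware */
            struct rte_mbuf *md = rte_mbuf_from_indirect(mi);

            /* restore mi's own embedded buffer, located after its private area */
            mi->buf_addr = rte_mbuf_to_baddr(mi);

            /* ... then decrement md's refcount, reset mi's lengths, etc. ... */
            (void)md;
    }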
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When it's possible, use the new helper to create the mbuf pools.
Most of the patch is trivial, except for the following files that
have some specifics (indirect mbufs):
- ip_fragmentation
- ip_pipeline
- ipv4_multicast
- vhost
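For reference, a minimal sketch of the simple (non-indirect) case using the
new helper; the pool name, counts and cache size below are illustrative:

    #include <stdlib.h>
    #include <rte_debug.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    #define NB_MBUF         8192
    #define MBUF_CACHE_SIZE 250

    static struct rte_mempool *
    create_pool(void)
    {
            /* priv_size = 0, default data room size */
            struct rte_mempool *mp = rte_pktmbuf_pool_create("mbuf_pool",
                            NB_MBUF, MBUF_CACHE_SIZE, 0,
                            RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

            if (mp == NULL)
                    rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
            return mp;
    }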
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The mbuf pool private area must always be populated in an mbuf pool.
Applications or drivers may expect that, for an mbuf pool, the mbuf
pool private area fields (mbuf_data_room_size and mbuf_priv_size) are
properly filled.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Derive the mbuf data room size from mempool->elt_size and priv_size,
instead of using a hardcoded value that is not related to the real
buffer size.
To use rte_pktmbuf_pool_init(), the user can either:
- give a NULL parameter to rte_pktmbuf_pool_init(): in this case, the
private size is assumed to be 0, and the room size is
mp->elt_size - sizeof(struct rte_mbuf).
- give a rte_pktmbuf_pool_private structure filled with appropriate
data_room_size and priv_size values.
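A hedged sketch of the second option, passing a filled
rte_pktmbuf_pool_private to rte_mempool_create() (the element count, private
size and data room size are illustrative):

    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    #define NB_MBUF   8192
    #define PRIV_SIZE 16    /* illustrative application private area */
    #define DATA_ROOM 2048

    static struct rte_mempool *
    create_pool_with_priv(void)
    {
            struct rte_pktmbuf_pool_private priv;

            priv.mbuf_data_room_size = DATA_ROOM;
            priv.mbuf_priv_size = PRIV_SIZE;

            return rte_mempool_create("mbuf_pool", NB_MBUF,
                            sizeof(struct rte_mbuf) + PRIV_SIZE + DATA_ROOM,
                            32, sizeof(struct rte_pktmbuf_pool_private),
                            rte_pktmbuf_pool_init, &priv,
                            rte_pktmbuf_init, NULL,
                            rte_socket_id(), 0);
    }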
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
rte_free handles getting passed a NULL pointer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Added a parameter to the RX callback to pass in the number of
available RX packets in addition to the number of dequeued packets.
This provides the RX callback functions with additional information
that can be used to decide how packets from a burst are handled.
The TX callback doesn't require this additional parameter so the RX
and TX callbacks no longer have the same function parameters. As such
the single RX/TX callback has been refactored into two separate callbacks.
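A hedged sketch of an RX callback using the extra parameter (the callback
name and the idle counter are illustrative):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* nb_pkts is the number actually dequeued, max_pkts the number the
     * application asked for; the gap hints at how loaded the RX queue is. */
    static uint16_t
    count_rx_cb(uint8_t port, uint16_t queue, struct rte_mbuf *pkts[],
                uint16_t nb_pkts, uint16_t max_pkts, void *user_param)
    {
            uint64_t *partial_bursts = user_param;

            if (nb_pkts < max_pkts)
                    (*partial_bursts)++;
            (void)port;
            (void)queue;
            (void)pkts;
            return nb_pkts;
    }

    /* Registered during setup, e.g.:
     *   rte_eth_add_rx_callback(port_id, 0, count_rx_cb, &partial_bursts);
     */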
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Minor refactoring and comments to make the sample app and
code examples clearer for the sample app guide.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Siobhan Butler <siobhan.a.butler@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The previous vhost implementation wrongly named kickfd as callfd and callfd as kickfd.
It is functionally correct, but causes confusion.
Exchange kickfd and callfd to avoid confusion.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>
*alloc() routines return void * and therefore cast is not needed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
[Thomas: reverse num and size parameters in vhost calloc]
The function print_client_stats was used in the example without being
exported in the map file, so it breaks linking with the shared library
when debug is enabled.
It is better to remove this function, as it could probably be implemented
with the statistics API.
Fixes: cc7e8ae84f ("add example application for link bonding mode 6")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
This app demonstrates usage of the new rte_jobstats library.
It is basically the original l2fwd with the following modifications to meet
library requirements:
- main_loop() was split into two jobs: a forward job and a flush job. The logic
for those jobs is almost the same as in the original application.
- stats printing is moved to an rte_alarm callback so it does not introduce
overhead.
- stats are expanded to show rte_jobstats statistics.
- added a new parameter '-l' for an automatic thousands separator.
Comparing the original l2fwd and l2fwd-jobstats apps shows what is needed
to properly write an application with rte_jobstats
measurements.
New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.
Fixes: 2caeb8c014 ("examples/l2fwd-jobstats: new example")
[Thomas: files were missing in the previous commit]
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This app demonstrates usage of the new rte_jobstats library.
It is basically the original l2fwd with the following modifications to meet
library requirements:
- main_loop() was split into two jobs: a forward job and a flush job. The logic
for those jobs is almost the same as in the original application.
- stats printing is moved to an rte_alarm callback so it does not introduce
overhead.
- stats are expanded to show rte_jobstats statistics.
- added a new parameter '-l' for an automatic thousands separator.
Comparing the original l2fwd and l2fwd-jobstats apps shows what is needed
to properly write an application with rte_jobstats
measurements.
New available statistics:
- Total and % of fwd and flush execution time
- management time - overhead of rte_timer + overhead of rte_jobstats
library
- Idle time and % of time spent waiting for fwd or flush to be ready to
execute.
- per job execution time and period.
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
There was no error checking after calling rte_reorder_create.
Move the creation of the reorder buffer before launching threads
in case of memory error.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Example showing how callbacks can be used to insert a timestamp
into each packet on RX. On TX the timestamp is used to calculate
the packet latency through the app, in cycles.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
This patch removes all references to RTE_MBUF_REFCNT, setting the refcnt
field in the mbuf struct permanently.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, for mbufs with refcnt, we cannot free mbufs with external memory
buffers (i.e. vhost zero copy), as they are recognized as indirect
attached mbufs and therefore we free the direct mbuf they point to,
resulting in an error in the case of external memory buffers.
We solve the issue by introducing the IND_ATTACHED_MBUF flag, which indicates
that the mbuf is an indirect attached mbuf pointing to another mbuf.
When we free an mbuf, we only free the direct mbuf if the flag is set.
Freeing an mbuf with an external buffer is the same as freeing a non-attached mbuf.
The flag is set on attach and cleared on detach.
So in the case of vhost zero copy where we have mbufs with external
buffers, by default we just free the mbuf and it is up to the user to deal with
the external buffer.
This patch would allow the removal of the RTE_MBUF_REFCNT config option,
setting refcnt for all mbufs permanently.
The patch also modifies the vhost example as it was using the
RTE_MBUF_INDIRECT macro to detect if it was an mbuf with external buffer.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
RSS offload types were defined separately for 1/10G and 40G NICs,
and had no relationship with flow types. These modifications
unify all RSS offload types for all PMDs. The unified RSS offload types
have new, common names which can be used by any PMD or
application, and are decoupled from specific hardware.
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
[Thomas: merge with fm10k]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch contains an example for link bonding mode 6.
It interacts with the user via a command prompt. Available commands are:
Start - starts the ARP_thread which responds to ARP requests and sends
ARP updates (this is enabled by default after startup),
Stop - stops the ARP_thread,
Send count ip - sends count ARP requests for IP,
Show - prints basic bond information, like IPv4 statistics from clients,
Help,
Quit.
The best way to test mode 6 is to use this example together with the
previous patch:
[PATCH 3/4] bond: add debug info for mode 6 link bonding.
Connect clients through a switch to the bonding machine and send:
arping -c 1 bond_ip or
generate IPv4 traffic to bond_ip (IPv4 traffic from different clients
should then be balanced across slaves in a round-robin manner).
Signed-off-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
Signed-off-by: Maciej Gajdzica <maciejx.t.gajdzica@intel.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
Support turning RX VLAN stripping on/off on the host; this gives the guest
the chance to use its own software VLAN stripping functionality.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Check whether the packet is already VLAN-tagged; if so, avoid inserting a
duplicate VLAN tag into it.
This can happen when the guest is capable of inserting the VLAN tag itself.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
This new app makes use of the librte_reorder library.
It requires at least 3 lcores for RX, Workers (1 or more) and TX threads.
Communication between RX-Workers and Workers-TX is done by using rings.
The flow of mbufs is the following:
* RX thread gets mbufs from the driver, sets the sequence number and enqueues
them in a ring.
* Workers dequeue mbufs from the ring, do some 'work' and enqueue mbufs in a
ring.
* TX thread dequeues mbufs from the ring, inserts them in the reorder buffer,
drains mbufs from the reorder buffer and sends them to the driver.
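A hedged sketch of the reorder calls made on the TX side (buffer size,
burst size and queue ids are illustrative):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_reorder.h>

    #define BURST_SIZE 32

    /* 'buffer' was created earlier with rte_reorder_create(). The RX thread
     * is assumed to have set pkts[i]->seqn before enqueuing to the ring. */
    static void
    tx_reorder_and_send(struct rte_reorder_buffer *buffer,
                        struct rte_mbuf **pkts, uint16_t nb_pkts, uint8_t port)
    {
            struct rte_mbuf *ordered[BURST_SIZE];
            uint16_t nb_ordered, i;

            for (i = 0; i < nb_pkts; i++)
                    rte_reorder_insert(buffer, pkts[i]);

            /* drain whatever is now in sequence and send it to the driver */
            nb_ordered = rte_reorder_drain(buffer, ordered, BURST_SIZE);
            if (nb_ordered > 0)
                    rte_eth_tx_burst(port, 0, ordered, nb_ordered);
    }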
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Declan Doherty <declan.doherty@intel.com>
[Thomas: add in examples/Makefile]
If at the build phase we don't do any trie splitting,
then the temporary build structures and the resulting RT structure might be
much bigger than they currently are.
On the other hand, having just one trie instead of multiple can speed up
search quite significantly.
From my measurements on rule-sets with ~10K rules:
the RT table is up to 8 times bigger, and classify() is up to 80% faster
than the current implementation.
To let the user decide about the performance/space trade-off,
a new parameter for the build config structure (max_size) is introduced.
Setting it to a value greater than zero instructs rte_acl_build() to:
- make sure that the size of the RT table does not exceed the given value;
- attempt to minimise the number of tries in the table.
Setting it to zero maintains the current behaviour.
This introduces a minor change in the public API, but I think the possible
performance gain is too big to ignore.
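A hedged sketch of how an application might opt into the trade-off (the
8 MB cap is illustrative):

    #include <rte_acl.h>

    /* Cap the size of the runtime structures; max_size == 0 keeps the
     * previous build behaviour (trie splitting as before). */
    static int
    build_acl_capped(struct rte_acl_ctx *ctx, struct rte_acl_config *cfg)
    {
            cfg->max_size = 0x800000;
            return rte_acl_build(ctx, cfg);
    }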
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
New data type to manipulate 256 bit AVX values.
Rename field in the rte_xmm to keep common naming across SSE/AVX fields.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The following commit broke vm2vm hard mode test cases:
commit db4014f2b6 ("use factorized default Rx/Tx configuration")
Investigation shows that VLAN offload needs to be enabled, since it is turned off
by default in some drivers, and TX needs it, especially when vm2vm is in hard mode.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
Fix a typo: cmdline_parse_token_string_t was used in place of
cmdline_parse_token_num_t.
Seen with clang-3.5.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Compiling the netmap example with clang-3.5 triggered the following
warning:
compat_netmap.c:783:11: error: overflow converting case value to
switch condition type (3225184658 to 18446744072639768978)
[-Werror,-Wswitch]
case NIOCREGIF:
^
Indeed, an ioctl value should be an unsigned 32-bit value, not an int.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Fix the following error:
error: unused function 'l3fwd_simple_forward'
l3fwd_simple_forward() may be unused, depending on compilation options
(APP_LOOKUP_METHOD, ENABLE_MULTI_BUFFER_OPTIMIZE). As the number of combinations
is quite large, it is simpler to add __attribute__((unused)) to
this function, so that the compiler does not complain.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The check for NULL is in the wrong position in the "if" error leg. The
pointer should be checked for NULL before checking the value it
points to.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The length of the path to a unix socket is not PATH_MAX but instead is
UNIX_PATH_MAX, which is generally just over 100 bytes in size. It is not
actually defined in sys/un.h on Linux, despite the man page referencing
it, so calculate the size in the case where it is not defined.
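A sketch of the fallback described above, deriving the limit from the
sun_path member when the macro is missing:

    #include <sys/un.h>

    #ifndef UNIX_PATH_MAX
    #define UNIX_PATH_MAX sizeof(((struct sockaddr_un *)0)->sun_path)
    #endif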
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Static analysis shows that one instance of rte_zmalloc is missing
a return value check in the code. This is fixed by adding a return
value check. The malloc call itself is moved to earlier in the function
so that no work is done unless all memory allocation requests have
succeeded - thereby removing the need for rollback on error.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
On error, app_acl_init() can return without freeing dynamically allocated memory.
Not really a big problem, as if app_acl_init() fails,
the application would terminate immediately anyway.
Still, it is good coding practice for a function to clean up after itself.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When routing is through the same queue, the app crashes.
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Search for the right segment to increase its data length, rather than
wrongly returning early and exiting the TX function, which leads to dropping all
jumbo frame packets when vm2vm is in hard forward mode.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Increase MAX_QUEUES from 256 to 512.
In the vhost example, the MAX_QUEUES macro should be the maximum possible number of queues on the port.
Theoretically we should only set up the queues that are used, i.e. the first RX queue of each pool, or
at most the queues from 0 to MAX_QUEUES. Until we revise the implementation and are certain all NICs support
this well, add a reminder message for the user.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Refer to Pablo's commit (81f7ecd934):
"use factorized default Rx/Tx configuration
For apps that were using default rte_eth_rxconf and rte_eth_txconf
structures, these have been removed and now they are obtained by
calling rte_eth_dev_info_get, just before setting up RX/TX queues."
Move zero copy's deferred start setup earlier.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
In Niantic, if VMDQ mode is set, all queues are allocated to VMDQ in DPDK.
In i40e, only a configured portion of contiguous queues is allocated to VMDQ.
The rte_eth_dev_info structure is extended to provide the VMDQ queue base,
queue number, and VMDQ pool base information.
This patch supports the new VMDQ API in the vhost example.
FIXME in PMD:
* added MAC addresses will be flushed at rte_eth_dev_start.
* we don't support selectively setting up queues well.
Test report: http://dpdk.org/ml/archives/dev/2014-December/009427.html
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
Tested-by: Jingguo Fu <jingguox.fu@intel.com>
The symmetric_mp example app is set up to allow two processes to
share a NIC port, with each pulling packets from one queue. In order
to have the app continue working when one of the process dies, the
drop_en bit should be set in the NIC configuration. Without this bit
set, the NIC will stall once any queue fills. With the bit set, once
a queue fills, all subsequent packets for that queue are discarded
allowing other queues to continue operating as normal.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When using test-pmd with flow director in FreeBSD, the application will
segfault/bus error while parsing the command line. This is due to how
each command's result structure is represented during parsing, where the offsets
for each token's value are stored in a character array (char result_buf[BUFSIZ])
in cmdline_parse() (./lib/librte_cmdline/cmdline_parse.c).
The overflow occurs when BUFSIZ is less than the size of a command's result
structure; in this case "struct cmd_pkt_filter_result"
(app/test-pmd/cmdline.c) is 1088 bytes and BUFSIZ on FreeBSD is 1024 bytes, as
opposed to 8192 bytes on Linux.
The problem can be reproduced by running test-pmd on FreeBSD:
./testpmd -c 0x3 -n 4 -- -i --portmask=0x3 --pkt-filter-mode=perfect
And adding a filter:
add_perfect_filter 0 udp src 192.168.0.0 1024 dst 192.168.0.0 1024 flexbytes
0x800 vlan 0 queue 0 soft 0x17
This patch removes the OS dependency on BUFSIZ, defining and using a
library #define CMDLINE_PARSE_RESULT_BUFSIZE 8192 instead.
Boundary checking is added to ensure this buffer cannot overflow, with
an error message being produced.
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
http://git.droids-corp.org/?p=libcmdline.git;a=commitdiff;h=b1d5b169352e57df3fc14c51ffad4b83f3e5613f
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
CACHE_LINE_SIZE is a macro defined in machine/param.h in FreeBSD and
conflicts with DPDK macro version.
Adding RTE_ prefix to avoid conflicts.
CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[Thomas: updated on HEAD, including PPC]
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Some of the NICs supported by DPDK can accelerate TCP
traffic by using segmentation offload. The application prepares a packet
with a valid TCP header of size up to 64K and delegates the
segmentation to the NIC.
Implement the generic part of TCP segmentation offload in rte_mbuf. It
introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
and tso_segsz (MSS of packets).
To delegate the TCP segmentation to the hardware, the user has to:
- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
PKT_TX_TCP_CKSUM)
- set the flag PKT_TX_IPV4 or PKT_TX_IPV6
- set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
- calculate the pseudo header checksum without taking ip_len in account,
and set it in the TCP header, for instance by using
rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
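A hedged sketch of one mbuf prepared along those lines (header pointers are
assumed to be already set up; the MSS value is illustrative):

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_tcp.h>

    static void
    prepare_tso(struct rte_mbuf *m, struct ipv4_hdr *ipv4_hdr,
                struct tcp_hdr *tcp_hdr)
    {
            m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_IP_CKSUM;
            m->l2_len = sizeof(struct ether_hdr);
            m->l3_len = sizeof(struct ipv4_hdr);
            m->l4_len = sizeof(struct tcp_hdr);
            m->tso_segsz = 1460;

            /* IP checksum is offloaded, so zero it in the packet */
            ipv4_hdr->hdr_checksum = 0;
            /* pseudo-header checksum computed without the IP length */
            tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
    }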
The API is inspired by ixgbe hardware (the next commit adds the
support for ixgbe), but it seems generic enough to be used for other
hw/drivers in the future.
This commit also reworks the way l2_len and l3_len are used in igb
and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.
Signed-off-by: Mirek Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Provides a small sample application (guest_vm_power_mgr) to run on a VM.
The application is run by providing a core mask (-c) and number of memory
channels (-n). The core mask corresponds to the number of lcore channels to
attempt to open. A maximum of 64 channels per VM is allowed. The channels must
be monitored by the host.
After successful initialisation a CPU frequency command can be sent to the host
using:
set_cpu_freq <lcore_num> <up|down|min|max>.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Launches the CLI thread and the Monitor thread, and initialises
resources.
Requires a minimum of two lcores to run, additional cores specified by eal core
mask are not used.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
A wrapper around librte_power(using ACPI cpufreq), providing locking around the
non-threadsafe library, allowing for frequency changes based on core masks and
core numbers from both the CLI thread and epoll monitor thread.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The CLI is used for administrating the channel monitor and manager and
manually setting the CPU frequency on the host.
Supports the following commands:
add_vm [Mul-choice STRING]: add_vm|rm_vm <name>, add a VM for subsequent
operations with the CLI or remove a previously added VM from the VM Power
Manager
rm_vm [Mul-choice STRING]: add_vm|rm_vm <name>, add a VM for subsequent
operations with the CLI or remove a previously added VM from the VM Power
Manager
add_channels [Fixed STRING]: add_channels <vm_name> <list>|all, add
communication channels for the specified VM, the virtio channels must be
enabled in the VM configuration(qemu/libvirt) and the associated VM must be
active. <list> is a comma-separated list of channel numbers to add, using the
keyword 'all' will attempt to add all channels for the VM
set_channel_status [Fixed STRING]:
set_channel_status <vm_name> <list>|all enabled|disabled, enable or disable
the communication channels in list(comma-separated) for the specified VM,
alternatively list can be replaced with keyword 'all'. Disabled channels will
still receive packets on the host, however the commands they specify will be
ignored. Set status to 'enabled' to begin processing requests again.
show_vm [Fixed STRING]: show_vm <vm_name>, prints the information on the
specified VM(s), the information lists the number of vCPUS, the pinning to
pCPU(s) as a bit mask, along with any communication channels associated with
each VM
show_cpu_freq_mask [Fixed STRING]: show_cpu_freq_mask <mask>, Get the current
frequency for each core specified in the mask
set_cpu_freq_mask [Fixed STRING]: set_cpu_freq <core_mask> <up|down|min|max>,
Set the current frequency for the cores specified in <core_mask> by scaling
each up/down/min/max.
show_cpu_freq [Fixed STRING]: Get the current frequency for the specified core
set_cpu_freq [Fixed STRING]: set_cpu_freq <core_num> <up|down|min|max>,
Set the current frequency for the specified core by scaling up/down/min/max
quit [Fixed STRING]: close the application
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The manager is responsible for adding communications channels to the Monitor
thread, tracking and reporting VM state and employs the libvirt API for
synchronization with the KVM Hypervisor. The manager interacts with the
Hypervisor to discover the mapping of virtual CPUS(vCPUs) to the host
physical CPUS(pCPUs) and to inspect the VM running state.
The manager provides the following functionality to the CLI:
1) Connect to a libvirtd instance, default: qemu:///system
2) Add a VM to an internal list, each VM is identified by a "name" which must
correspond to a valid libvirt Domain Name.
3) Add communication channels associated with a VM to the epoll based Monitor
thread.
The channels must exist and be in the form of:
/tmp/powermonitor/<vm_name>.<channel_number>. Each channel is a
Virtio-Serial endpoint configured as an AF_UNIX file socket and opened in
non-blocking mode.
Each VM can have a maximum of 64 channels associated with it.
4) Disable or re-enable VM communication channels; channels, once added to the
Monitor thread, remain under that thread's control, however acting on channel
requests can be disabled and re-enabled via the CLI.
The monitor is an epoll based infinite loop running in a separate thread that
waits on channel events from VMs and calls the corresponding functions. Channel
definitions from the manager are registered via the epoll event opaque pointer
when calling epoll_ctl(EPOLL_CTL_ADD); this allows obtaining the channel's
file descriptor for reading EPOLLIN events and mapping the vCPU to the pCPU(s)
associated with a request from a particular VM.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This is a very simple example app for doing packet forwarding with the
Intel DPDK. It is designed to serve as a starting point for people who are new to
the Intel DPDK and want to develop a new app.
Therefore it's meant to:
* have as good a performance out-of-the-box as possible, using the
best-known settings for configuring the PMDs, so that any new apps can
be based off it.
* be kept as short as possible to make it easy to understand it and get
started with it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Since commit 08b563ffb1 ("mbuf: replace data pointer by an offset"),
data is not an mbuf field anymore.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
A new sample app that shows the usage of the distributor library. This
app works as follows:
* An RX thread runs which pulls packets from each ethernet port in turn
and passes those packets to workers using a distributor component.
* The workers take the packets in turn, and determine the output port
for those packets using basic l2 forwarding, doing an xor on the source
port id.
* The RX thread takes the returned packets from the workers and enqueues
those packets into an rte_ring structure.
* A TX thread pulls the packets off the rte_ring structure and then
sends each packet out of the output port specified previously by the worker.
* Command-line option support is provided only for the portmask.
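A hedged sketch of the distributor calls on the RX and worker sides (the
names and burst size are illustrative):

    #include <rte_distributor.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* RX side: hand a burst to the distributor, then collect mbufs the
     * workers have finished with. */
    static int
    rx_distribute(struct rte_distributor *d, struct rte_mbuf **bufs,
                  unsigned nb_rx, struct rte_mbuf **done)
    {
            rte_distributor_process(d, bufs, nb_rx);
            return rte_distributor_returned_pkts(d, done, BURST_SIZE * 2);
    }

    /* Worker side: fetch packets one at a time, handing back the previous
     * one; the xor on the port id mimics the basic l2fwd decision. */
    static void
    worker_loop(struct rte_distributor *d, unsigned worker_id)
    {
            struct rte_mbuf *buf = NULL;

            for (;;) {
                    buf = rte_distributor_get_pkt(d, worker_id, buf);
                    buf->port ^= 1;
            }
    }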
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This enables user space vhost to receive and forward broadcast
and multicast packets:
Use a new command-line option to enable promiscuous mode;
Enable 2 bits in the VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
This patch supports the new VMDQ API in the vmdq example.
Besides, it allows users to specify a num_pools different from
max_nb_pools, so the polling thread need not poll the queues
of all pools.
Due to an i40e implementation issue, there is no default MAC for a
VMDQ pool, so the app needs to specify a MAC address for each pool
explicitly.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Jing Chen <jing.d.chen@intel.com>
These references to drivers break the layering isolation between
application and drivers.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
This patch checks the packet length offset value, and checks whether the
extra bytes inside the buffer cross a page boundary.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Extract a function to replace duplicated code in the one-copy and zero-copy TX functions.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
HW VLAN stripping reduces the packet length by the length of the VLAN tag,
so the packet length needs to be restored by adding it back.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Since commit 33e79bed3e fixed the issue in the vector PMD,
it can now receive jumbo frames in scatter-gather mode, so remove this check.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
The packet passed to virtio_tx_route already has an mbuf allocated,
so there is no need to allocate a new mbuf for it.
Use VLAN offload to transmit VLAN-tagged packets.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove useless mbuf pool]
In switch_worker and virtio_tx_local, rte_vhost_enqueue_burst is called to
push host packets to the guest VM.
Before enqueuing packets to the guest VM, the vhost example uses configurable
retry logic to wait for enough vring entries.
In switch_worker, rte_vhost_dequeue_burst is called to get packets from the guest VM;
the virtio device is then bound to a queue in VMDQ for the first transmitted
packet.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
check_hpa_regions, fill_hpa_memory_regions and the hpa memory region
data structure are added back from the old virtio-net.c.
Add hpa (host physical address) region generation/destruction logic.
gpa<->hpa memory translation regions are generated at new_device,
when a virtio device is ready for packet processing.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Define the vhost_dev data structure.
Change references to virtio_dev to vhost_dev.
The vhost example uses the vdev data structure for switching-related logic
and as a container for virtio_dev.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
Those functions are integrated into the user space vhost library:
virtio_dev_rx, virtio_dev_merge_rx, virtio_dev_tx, virtio_dev_merge_tx,
copy_from_mbuf_to_ring, gpa_to_vva.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
This patch copies two files main.c/main.h from most recent vhost example
(before transforming into a library) as the base for new vhost example.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
The previous implementation of rte_kni_alloc() was allocating memzones with a
name composed of a fixed string and the interface name. When an application was
allocating and deallocating multiple interfaces with different names, memzones
were quickly exhausted, even though memzones from deallocated interfaces were
never used anymore (unless an interface with the same name was re-allocated).
As a result, the application was unable to allocate more KNI interfaces with
different names.
This patch implements the KNI memzone pool in order to prevent memzone
exhaustion when allocating/deallocating KNI interfaces. It adds a new API call,
rte_kni_init(max_kni_ifaces) that shall be called before any call to
rte_kni_alloc() if KNI is used. The memzones are pre-allocated with interface-
independent names so that they can be reused.
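A hedged sketch of the new initialization sequence (the interface count is
illustrative):

    #include <rte_kni.h>

    #define MAX_KNI_IFACES 4

    /* Must run once, before any rte_kni_alloc(), so that the
     * interface-independent memzones are pre-allocated. */
    static void
    kni_setup(void)
    {
            rte_kni_init(MAX_KNI_IFACES);
    }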
Signed-off-by: Marc Sune <marc.sune@bisdn.de>
Acked-by: Helin Zhang <helin.zhang@intel.com>
Those files will be refactored in subsequent patches to form user space
vhost library.
Makefile and main.h are removed.
main.c is renamed to vhost_rxtx.c and will provide vring enqueue/dequeue API.
virtio-net.h is renamed to rte_virtio_net.h which is the API header file.
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Changchun Ouyang <changchun.ouyang@intel.com>
[Thomas: remove from examples Makefile and merge file renaming]
For apps that were using default rte_eth_rxconf and rte_eth_txconf
structures, these have been removed and now they are obtained by
calling rte_eth_dev_info_get, just before setting up RX/TX queues.
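A hedged sketch of the resulting queue setup pattern (descriptor counts are
illustrative):

    #include <rte_ethdev.h>

    static int
    setup_queues(uint8_t port, struct rte_mempool *mp)
    {
            struct rte_eth_dev_info dev_info;

            rte_eth_dev_info_get(port, &dev_info);

            /* use the PMD-provided defaults instead of local static configs */
            if (rte_eth_rx_queue_setup(port, 0, 128,
                            rte_eth_dev_socket_id(port),
                            &dev_info.default_rxconf, mp) < 0)
                    return -1;
            return rte_eth_tx_queue_setup(port, 0, 512,
                            rte_eth_dev_socket_id(port),
                            &dev_info.default_txconf);
    }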
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Rename start_?x_per_q to ?x_deferred_start
and add comments.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Since commit a155d43011 ("support link bonding device initialization"),
rte_eal_pci_probe() is called in rte_eal_init().
So it doesn't have to be called by applications anymore.
It has been fixed for testpmd in commit 2950a76931,
and this patch removes it from other applications.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The mbuf structure already contains a pointer to the beginning of the
buffer (m->buf_addr). There is no need to use another 8 bytes to store
a second pointer to the beginning of the data.
Using a 16-bit unsigned integer is enough, as we know that an mbuf is
never longer than 64KB. We gain 6 bytes in the structure thanks to
this modification.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
* Updated to apply to latest on mainline.
* Disabled vector PMD in config as it relies heavily on the mbuf layout.
This will be re-enabled in a subsequent commit once the vPMD has been
reworked to take account of the mbuf changes.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The vlan_macip structure combined a vlan tag id with l2 and l3 headers
lengths for tracking offloads. However, this structure was only used as
a unit by the e1000 and ixgbe drivers, not generally.
This patch removes the structure from the mbuf header and places the
fields into the mbuf structure directly at the required point, without
any net effect on the structure layout. This allows us to treat the vlan
tags and header length fields as separate for future mbuf changes. The
drivers which were written to use the combined structure still do so,
using a driver-local definition of it.
Reduce perf regression caused by splitting vlan_macip field. This is
done by providing a single uint16_t value to allow writing/clearing
the l2 and l3 lengths together. There is still a small perf hit to the
slow path TX due to the reads from vlan_tci and l2/l3 lengths being
separated. (<5% in my tests with testpmd with no extra params).
Unfortunately, this cannot be eliminated, without restoring the vlan
tags and l2/l3 lengths as a combined 32-bit field. This would prevent
us from ever looking to move those fields about and is an artificial tie
that applies only for performance in igb and ixgbe drivers. Therefore,
this patch keeps the vlan_tci field separate from the lengths as the
best solution going forward.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
In some cases we may want to tag a packet for a particular destination
or output port, so rename the "in_port" field in the mbuf to just "port"
so that it can be re-used for this purpose if an application needs it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The rte_pktmbuf structure was initially included in the rte_mbuf
structure. This was needed when there were 2 types of mbuf (ctrl and
packet). As the control mbuf has been removed, we can merge the
rte_pktmbuf into the rte_mbuf structure.
Advantages of doing this:
- the access to mbuf fields is easier (ex: m->data instead of m->pkt.data)
- make the structure more consistent: for instance, there was no reason
to have the ol_flags field in rte_mbuf
- it will allow a deeper reorganization of the rte_mbuf structure in the
next commits, allowing to gain several bytes in it
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
[Bruce: updated for latest code and new example apps]
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The initial role of rte_ctrlmbuf is to carry generic messages (data
pointer + data length), but it's not used by the DPDK or its applications.
Keeping it implies:
- losing 1 byte in the rte_mbuf structure
- having some dead code in rte_mbuf.[ch]
This patch removes this feature. Thanks to it, it is now possible to
simplify the rte_mbuf structure by merging the rte_pktmbuf structure
into it. This is done in the next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
* Updated patch to HEAD.
* Modified patch to retain the old function names for ctrl mbufs as
macros. This helps with app compatibility, and allows the concept
of a control mbuf to be reintroduced via a single-bit flag in
a future change.
* Updated the packet framework ip_pipeline example application to
work following this change.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
It seems that RTE_MBUF_SCATTER_GATHER is not the proper name for the
feature it provides. "Scatter gather" means that data is stored using
several buffers. RTE_MBUF_REFCNT seems to be a better name for that
feature as it provides a reference counter for mbufs.
The macro RTE_MBUF_SCATTER_GATHER is poisoned to ensure this
modification is seen by drivers or applications using it.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Make the ACL library build/work on the 'default' architecture:
- make rte_acl_classify_scalar really scalar
(make sure it doesn't use SSE4 intrinsics through resolve_priority()).
- Provide two versions of the rte_acl_classify code path:
rte_acl_classify_sse() - can be built and used only on systems with SSE4.2
and above, and returns -ENOTSUP on lower archs.
rte_acl_classify_scalar() - a slower version, but one that can be built and used
on all systems.
- Addition of a new function rte_acl_classify_alg. This function lets you
specify an enum value to override the acl contexts default algorithm when doing
a classification. This allows an application to specify a classification
algorithm without needing to publicize each method. I know there was concern
over keeping those methods public, but we don't have a static ABI at the moment,
so this seems to me a reasonable thing to do, as it gives us less of an ABI
surface to worry about.
- keep common code shared between these two codepaths.
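A hedged sketch of forcing the scalar path through the new entry point
(the category count is illustrative):

    #include <rte_acl.h>

    /* rte_acl_classify() keeps using the context's default algorithm;
     * rte_acl_classify_alg() overrides it for this call only. */
    static int
    classify_scalar(const struct rte_acl_ctx *ctx, const uint8_t **data,
                    uint32_t *results, uint32_t num)
    {
            return rte_acl_classify_alg(ctx, data, results, num, 1,
                            RTE_ACL_CLASSIFY_SCALAR);
    }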
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
No need for that 'x bit' on source files.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
This patch supports the mergeable RX feature and thus supports jumbo frame RX and TX
in user space vhost (as the virtio backend).
On RX, it secures enough room from the vring to accommodate one complete scattered
packet received by the PMD from the physical port, and then copies data from the
mbuf to the vring buffer, possibly across a few vring entries and descriptors.
On TX, it gets a jumbo frame, possibly described by a few vring descriptors which
are chained together with the 'NEXT' flag, and then copies them into one scattered
packet and transmits it to the physical port through the PMD.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
Latest changes introduced a small degradation for the corner case
when each input packet is destined to a different port.
For the test case when 1 core manages 4 ports and the packet stream looks like:
IPV4_DSTPORT0, IPV4_DSTPORT1, IPV4_DSTPORT3, IPV4_DSTPORT4, IPV4_DSTPORT0, ...
the non-optimised code path outperforms the optimised one by 2-3%.
These changes are supposed to close that gap.
From my testing: for the case described above the optimised code path now
produces the same numbers as the non-optimised one.
For other test cases the numbers remain about the same.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When this packet framework sample was added (commit 77a3346),
it was not added to the global makefile for
"make examples".
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
L3fwd-acl and ip pipeline apps were using old
x86_64-default-linuxapp-gcc as their default target,
instead of x86_64-native-linuxapp-gcc
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
After enabling the vector PMD, qos_sched only sends 32 packets per burst.
That causes some packets not to be transmitted, and therefore the mempool
is drained after a while.
The qos_sched app now re-sends the packets which failed to be sent out in the
previous TX function call.
Signed-off-by: Yong Liu <yong.liu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Do not try to build Linux examples in a BSD environment.
Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The function rte_snprintf serves no useful purpose. It is the
same as snprintf() for all valid inputs. Deprecate it and
replace all uses in current code.
Leave the tests for the deprecated function in place.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Mark the rte_log, cmdline_printf and rte_snprintf functions as
being printf-style functions. This causes compilation errors
due to mis-matched parameter types, so the parameter types are
fixed where appropriate.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Make everything NUMA-related depend on lcore sockets, not device
sockets. This is because the init_mem() function allocates all data
structures based on NUMA nodes of the lcores in the coremask. Therefore,
when no cores are on socket 0, but there are devices on socket 0, it may
lead to segmentation faults.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
- i40e RSS flags have been added (and enlarged to 64-bit)
- A new field, 'uint8_t rss_key_len', has been added in
'struct rte_eth_rss_conf' to support different lengths of RSS keys.
- In each PMD, only the supported flags are masked.
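A hedged sketch of a port configuration filling the extended structure
(the key contents and hash flags are illustrative):

    #include <rte_ethdev.h>

    static uint8_t hash_key[40];    /* illustrative 40-byte RSS key */

    static struct rte_eth_conf port_conf = {
            .rxmode = {
                    .mq_mode = ETH_MQ_RX_RSS,
            },
            .rx_adv_conf = {
                    .rss_conf = {
                            .rss_key = hash_key,
                            .rss_key_len = sizeof(hash_key),
                            .rss_hf = ETH_RSS_IP, /* unified, PMD-independent */
                    },
            },
    };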
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Jing Chen <jing.d.chen@intel.com>
Acked-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Jijiang Liu <jijiang.liu@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Heqing Zhu <heqing.zhu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
This patch fixes a core id issue in the vmdq sample: in case the core mask
doesn't start with lcore_id 0 but, for instance, 20, the
queue id should use core_id instead of lcore_id.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Split input error stats to have a better understanding of why packets
have been dropped.
Keep ierrors field untouched for backward compatibility.
Signed-off-by: Ivan Boule <ivan.boule@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This Packet Framework sample application illustrates the capabilities
of the Intel DPDK Packet Framework toolbox.
It creates different functional blocks used by a typical IPv4 framework like:
flow classification, firewall, routing, etc.
CPU cores are connected together through standard interfaces built on SW rings,
with each CPU core running a separate pipeline instance.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
Acked-by: Ivan Boule <ivan.boule@6wind.com>
RTE_LOGTYPE_CONFIG, RTE_LOGTYPE_DATA and RTE_LOGTYPE_PORT are renamed
by adding VHOST prefix.
It prevents from conflict with new RTE_LOGTYPE_PORT of packet framework.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
New stuff:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
New stuff:
* Support for regular traffic as well as IPv4 and IPv6
* Simplified config
* Routing table printed out on start
* Uses LPM/LPM6 for lookup
* Unmatched traffic is sent to the originating port
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Technically, fragmentation table can work for both IPv4 and IPv6
packets, so we're renaming everything to be generic enough to make sense
in IPv6 context.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Demonstrates the use of the ACL library in the DPDK application to
implement packet classification and L3 forwarding.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
With the latest HW and optimised RX/TX path there is a huge gap between
testpmd iofwd and l3fwd performance results.
So this is an attempt to optimise the l3fwd LPM code path and reduce the gap:
- Instead of processing each input packet up to completion -
divide packet processing into several stages and perform them
stage by stage for the whole burst.
- Unroll things by a factor of 4 whenever possible.
- Use SSE intrinsics for some operations (bswap, replacing MAC addresses, etc).
- Avoid TX packet buffering whenever possible.
- Move some checks from RX/TX into the setup phase.
Note that the new (optimised) code path can be switched on/off by setting the
ENABLE_MULTI_BUFFER_OPTIMIZE macro to 1/0.
Some performance data:
SUT: dual-socket board IVB 2.8GHz, 2x1GB pages.
4 ports on 4 NICs (all at socket 0) connected to the traffic generator.
kernel: 3.11.3-201.fc19.x86_64, gcc: 4.8.2.
64B packets, using the packet flooding method.
All 4 ports are managed by one logical core:
Optimised scalar PMD RX/TX was used.
DIFF % (NEW-OLD)
IPV4-CONT-BURST: +23%
IPV6-CONT-BURST : +13%
IPV4/IPV6-CONT-BURST: +8%
IPV4-4STREAMSX8: +7%
IPV4-4STREAMSX1: -2%
Test cases description:
IPV4-CONT-BURST - IPV4 packets all packets from the one input port
are destined for the same output port.
IPV6-CONT-BURST - IPV6 packets all packets from the one input port
are destined for the same output port.
IPV4/IPV6-CONT-BURST - mix of the first 2 with interleave=1
(e.g: IPV4,IPV6,IPV4,IPV6, ...)
IPV4-4STREAMSX1 - 4 streams of IPV4 packets, where all packets
from same stream are destined for the same output port
(e.g: IPV4_DST_P0, IPV4_DST_P1, IPV4_DST_P2, IPV4_DST_P3, IPV4_DST_P0, ...)
IPV4-4STREAMSX8 - same as above but packets for each stream
are coming in groups of 8
(e.g: IPV4_DST_P0 X 8, IPV4_DST_P1 X 8, IPV4_DST_P2 X 8, IPV4_DST_P3 X 8,
IPV4_DST_P0 X 8, ...)
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This patch supports user space vhost zero copy. It removes packet copying between host and guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At the initialization stage all mbufs are put into
this ring; when a guest starts, vhost gets the available buffer addresses allocated by the guest for RX and
translates them into host space addresses, then attaches them to mbufs and puts the attached mbufs into the
mempool.
Queue starting and DMA refilling get mbufs from the mempool and use them to set the DMA addresses.
For TX, it gets the buffer addresses of available packets to be transmitted from the guest and translates
them to host space addresses, then attaches them to mbufs and puts them into the TX queues.
After TX finishes, it pulls mbufs out of the mempool, detaches them and puts them back into the extra ring.
Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The "default" part in configuration filenames is misleading.
Rename this as "native", as this is the RTE_MACHINE that is set in these files.
This should make it clearer for people who build DPDK on a system then run it on
another one.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Now that we've converted all the PMDs in DPDK to use the driver registration
macro, rte_pmd_init_all has become empty. As there's no reason to keep it around
anymore, just remove it and fix up all the example callers.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the ixgbe pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the ixgbe library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initialize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Convert the igb pmd driver to use the PMD_REGISTER_DRIVER macro.
This means that the test applications now have no reference to the igb library
when building DSO's and must specify its use on the command line with the -d
option. Static linking will still initialize the driver automatically.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The DPDK dump functions are useful for remote debugging of an
application. But when an application runs as a daemon, stdout
is typically routed to /dev/null.
Change all these functions to take a stdio FILE * handle
instead. An application can then use open_memstream() to capture
the output.
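A hedged sketch of capturing a dump in memory with open_memstream() (the
helper name is illustrative):

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Capture a mempool dump into a heap buffer so a daemon can forward
     * it (e.g. over a control socket) instead of writing to stdout. */
    static char *
    dump_mempool_to_string(const struct rte_mempool *mp)
    {
            char *buf = NULL;
            size_t len = 0;
            FILE *f = open_memstream(&buf, &len);

            if (f == NULL)
                    return NULL;
            rte_mempool_dump(f, mp);
            fclose(f);
            return buf;     /* caller frees */
    }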
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: fix quota_watermark example]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
It is not allowed to reference an absolute file name in SRCS-y.
A VPATH has to be used, otherwise the dependencies won't be checked
properly.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The example does not compile as the linker complains about duplicated
symbols.
Remove -lsched from LDLIBS, it is already present in rte.app.mk and
added by the DPDK framework automatically.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
It is now possible to build all examples by doing the following:
user@droids:~/dpdk.org$ cd examples
user@droids:~/dpdk.org/examples$ make RTE_SDK=${PWD}/.. \
RTE_TARGET=x86_64-default-linuxapp-gcc
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This provides a sample application and library showing how to use the
Intel(R) DPDK with basic netmap applications.
The Netmap compatibility library provides a minimal set of APIs to give the ability to
programs written against the Netmap APIs to be run with minimal changes to their
source code, using the Intel® DPDK to perform the actual packet I/O.
Since Netmap applications use regular system calls, like open(), ioctl() and
mmap() to communicate with the Netmap kernel module performing the packet I/O,
the compat_netmap library provides a set of similar APIs to use in place of those
system calls, effectively turning a Netmap application into an Intel(R) DPDK one.
The provided library is currently minimal and doesn’t support all the features that
Netmap supports, but is enough to run simple applications, such as the
bridge example included.
The application requires a single command line option:
-i INTERFACE is the number of a valid Intel(R) DPDK port to use.
If a single -i parameter is given, the interface will send back all the traffic it
receives. If two -i parameters are given, the two interfaces form a bridge, where
traffic received on one interface is replicated and sent by the other interface.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This provides a new sample application which demonstrates how
the ivshmem library and EAL capabilities can be used to create
a zero-copy fast-path for packet communication between host
machine and guest vm.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The vhost sample application demonstrates integration of the Intel(R) Data Plane
Development Kit (Intel(R) DPDK) with the Linux KVM hypervisor by implementing the
vhost-net offload API. The sample application performs simple packet switching
between virtual machines based on Media Access Control (MAC) address or Virtual
Local Area Network (VLAN) tag. The splitting of ethernet traffic from an external switch
is performed in hardware by the Virtual Machine Device Queues (VMDQ) and Data
Center Bridging (DCB) features of the Intel(R) 82599 10 Gigabit Ethernet Controller.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Updates including support for Intel® Communications Chipset
8925 to 8955 Series.
Add support for the wireless KASUMI algorithm.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
A series of minor changes to example applications included in the
Intel DPDK 1.6 release.
* changes to NIC configuration flags, e.g. specifying RSS
* replacing local "DIM" macro with common "RTE_DIM" macro
* minor whitespace changes for alignment.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This provides a para-virtualization packet switching solution, based on the
Xen hypervisor’s Grant Table, which provides simple and fast packet
switching capability between guest domains and host domain based on
MAC address or VLAN tag.
This solution is comprised of two components; a Poll Mode Driver (PMD)
as the front end in the guest domain and a switching back end in the
host domain. XenStore is used to exchange configure information
between the PMD front end and switching back end,
including grant reference IDs for shared Virtio RX/TX rings, MAC
address, device state, and so on.
The front end PMD can be found in the Intel DPDK directory lib/
librte_pmd_xenvirt and back end example in examples/vhost_xen.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Core support for using the Intel DPDK with Xen Dom0 - including EAL
changes and mempool changes. These changes encompass how memory mapping
is done, including support for initializing a memory pool inside an
already-allocated block of memory.
KNI sample app updated to use KNI close function when used with Xen.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Changes to allow compilation and use on FreeBSD. Includes:
* contigmem and nic_uio driver for FreeBSD
* new EAL instance
* new "bsdapp" compilation target
* various compilation fixes due to differences between linux and freebsd
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>