Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.
To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. To segment a packet requires two steps. The first
is to set proper flags to mbuf->ol_flags, where the flags are the same
as that of TSO. The second is to call the segmentation API,
rte_gso_segment(). This patch introduces the GSO API framework to DPDK.
rte_gso_segment() splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment has multiple MBUFs (i.e. 2 MBUFs),
the driver of the interface which the GSO segments are sent to should
support to transmit multi-segment packets.
The GSO framework clears the PKT_TX_TCP_SEG flag for both the input
packet, and all produced GSO segments in the event of success, since
segmentation in hardware is no longer required at that point.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, enabling assertion have to set CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of control
path, RTE_LOG_DP_LEVEL is the log level of data path. It's a little bit
hard to understand literally that assertion is decided by control path
LOG_LEVEL, especially assertion used on data path.
On the other hand, DPDK need an assertion enabling switch w/o impacting
log output level, assuming "--log-level" not specified.
Assertion is an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to unhide
assertion out of COFNIG_RTE_LOG_LEVEL.
In one word, log is log, assertion is assertion, debug is hot pot :)
Rationale of this patch is to introduce an dedicate switch of
assertion: RTE_ENABLE_ASSERT
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
We remove xen-specific code in EAL, including the option --xen-dom0,
memory initialization code, compiling dependency, etc.
Related documents are removed or updated, and bump the eal library
version.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Membership library is an extension and generalization of a traditional
filter (for example Bloom Filter and cuckoo filter) structure.
In general, the Membership library is a data structure that provides a
"set-summary" and responds to set-membership queries of whether a
certain element belongs to a set(s). A membership test for an element
will return the set this element belongs to or not-found if the
element is never inserted into the set-summary.
The results of the membership test are not 100% accurate. Certain
false positive or false negative probability could exist. However,
comparing to a "full-blown" complete list of elements, a "set-summary"
is memory efficient and fast on lookup.
This patch adds the main API definition.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The kernel patch was merged to support pci resource mapping.
https://patchwork.kernel.org/patch/9677441/
So enable igu_uio in the default arm64 configuration.
Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch also adds configuration necessary for compilation of DPAA
Mempool driver into the DPAA specific config file.
CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS=dpaa is also configured to allow
applications to use DPAA mempool as default.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This option both sets the maximum number of segments for Rx/Tx packets and
whether scattered mode is supported at all. This commit removes the latter
as well as configuration file exposure since the most appropriate value
should be decided at run-time.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The patch simplifies DPDK applications analysis for developers which use
Intel® VTune Amplifier.
The empty cycles are such iterations that yielded no RX packets. As far as
DPDK is running in poll mode, wasting cycles is equal to wasting CPU time.
Tracing such iterations can identify that device is underutilized. Tracing
empty cycles becomes even more critical if a system uses a lot of Ethernet
ports.
The patch gives possibility to analyze empty cycles without changing
application code. All needs to be done is just to reconfigure and rebuild
the DPDK itself with CONFIG_RTE_ETHDEV_PROFILE_ITT_WASTED_RX_ITERATIONS
enbled. The important thing here is that this does not affect DPDK code.
The profiling code is not being compiled if user does not specify config
flag.
The patch provides common way to inject RX queues profiling and VTune
specific implementation.
Signed-off-by: Ilia Kurakin <ilia.kurakin@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Current mlx4 OFED version has bug which returns error to
ibv destroy functions when the device was plugged out, in
spite of the resources were destroyed correctly.
Hence, failsafe PMD was aborted, only in debug mode, when
it tries to remove the device in plug-out process.
The workaround added option to replace all claim_zero
assertions with debugging messages, by the way, this option
affects non ibv destroy assertions.
DPDK 18.02 release should work with Mellanox OFED-4.2 which will
include the verbs fix to this bug, then, this patch can
be removed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Earlier bonding pmd was disabled in default config for ppc64le.
Hence, removing it as it has been verified.
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.
This PMD allows for applications to benefit from true hot-plugging
support without having to implement it.
It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency to the removal of a device, by
designating a fail-over device that will take on transmitting operations
if the preferred device is removed.
Applications only see a fail-safe instance, without caring for
underlying activity ensuring their continued operations.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
NXP Copyright has been wrongly worded with '(c)' at various places.
This patch removes these extra characters. It also removes
"All rights reserved".
Only NXP copyright syntax is changed. Freescale copyright is not
modified.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.
To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.
rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.
rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N inputted packets with the packets
in GRO reassembly tables. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO context object, which keeps reassembly tables of
desired GRO types, by rte_gro_ctx_create. Then applications can use
rte_gro_reassemble to merge packets. The GROed packets are in the
reassembly tables of the GRO context object. If applications want to get
them, applications need to manually flush them by flush API.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Replace the incorrect reference to "Cavium Networks", "Cavium Ltd"
company name with correct the "Cavium, Inc" company name in
copyright headers.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The dpdk-test-eventdev tool is a Data Plane Development Kit (DPDK)
application that allows exercising various eventdev use cases. This
application has a generic framework to add new eventdev based test cases
to verify functionality and measure the performance parameters of DPDK
eventdev devices.
This patch adds the skeleton of the dpdk-test-eventdev application.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Moved all common defines from defconfig_arm64-armv8a-linuxapp-gcc
to common_armv8a_linuxapp.
Created new config arm64-armv8a-linuxapp-clang which adds the
clang support to armv8a.
Now defconfigs arm64-armv8a-linuxapp-gcc/clang contain only the
CONFIG_RTE_TOOLCHAIN* defines and all other common defines are
inherited from common_armv8a_linuxapp.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
* Removed setting CONFIG_RTE_SCHED_VECTOR=n from armv8a config
so that the setting from common_base is taken as the default
setting for armv8a
* Verified the changes with sched_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
It is safe to enable LIBRTE_VHOST_NUMA by default for all
configurations where libnuma is already a default dependency.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Currently EAL allocates hugepages one by one not paying attention
from which NUMA node allocation was done.
Such behaviour leads to allocation failure if number of available
hugepages for application limited by cgroups or hugetlbfs and
memory requested not only from the first socket.
Example:
# 90 x 1GB hugepages availavle in a system
cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory
This happens beacause all allocated pages are
on socket 0.
Fix this issue by setting mempolicy MPOL_PREFERRED for each hugepage
to one of requested nodes using following schema:
1) Allocate essential hugepages:
1.1) Allocate as many hugepages from numa N to
only fit requested memory for this numa.
1.2) repeat 1.1 for all numa nodes.
2) Try to map all remaining free hugepages in a round-robin
fashion.
3) Sort pages and choose the most suitable.
In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.
New config option RTE_EAL_NUMA_AWARE_HUGEPAGES introduced and
enabled by default for linuxapp except armv7 and dpaa2.
Enabling of this option adds libnuma as a dependency for EAL.
Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Move all bypass functions to ixgbe pmd and remove function
pointers from the eth_dev_ops struct.
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
TX coalescing waits for ETH_COALESCE_PKT_NUM packets to be coalesced
across bursts before transmitting them. For slow traffic, such as
100 PPS, this approach increases latency since packets are received
one at a time and tx coalescing has to wait for ETH_COALESCE_PKT
number of packets to arrive before transmitting.
To fix this:
- Update rx path to use status page instead and only receive packets
when either the ingress interrupt timer threshold (5 us) or
the ingress interrupt packet count threshold (32 packets) fires.
(i.e. whichever happens first).
- If number of packets coalesced is <= number of packets sent
by tx burst function, stop coalescing and transmit these packets
immediately.
Also added compile time option to favor throughput over latency by
default.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
DPAA2 devices now support cortex-a72. They no longer support a57.
Also fp and simd is no more required to be stated explicitly for
standard a72 core.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Stub callbacks for the generic flow API and a new FLOW debug define.
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Nelson Escobar <neescoba@cisco.com>
When building DPDK with musl, there is need not to disable
backtrace to remove some references to execinfo.h which is
not supported by musl now.
This also applies to some other libc implementation which
doesn't support backtrace() and backtrace_symbols().
musl is an implementation of the userspace portion
of the standard library functionality described in
the ISO C and POSIX standards, plus common extensions.
Got more details about musl from http://www.musl-libc.org .
Signed-off-by: Wei Dai <wei.dai@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Making AVX and AVX512 configurable is useful for performance and power
testing.
The similar kernel patch at https://patchwork.kernel.org/patch/9618883/.
AVX512 support like in rte_memcpy has been in DPDK since 16.04, but it's
still unproven in rich use cases in hardware. Therefore it's marked as
experimental for now, will enable it after enough field test and possible
optimization.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
armv8 implementations may have 64B or 128B cache line.
Setting to the maximum available cache line size in generic config to
address minimum DMA alignment across all arm64 implementations.
Increasing the cacheline size has no negative impact to cache invalidation
on systems with a smaller cache line.
The need for the minimum DMA alignment has impact on functional aspects
of the platform so default config should cater the functional aspects.
There is an impact on memory usage with this scheme, but that's not too
important for the single image arm64 distribution use case.
The arm64 linux kernel followed the similar approach for single
arm64 image use case.
http://lxr.free-electrons.com/source/arch/arm64/include/asm/cache.h
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
DPAA2 Hardware Mempool handlers allow enqueue/dequeue from NXP's
QBMAN hardware block.
CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS is set to 'dpaa2', if the pool
is enabled.
This memory pool currently supports packet mbuf type blocks only.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Having packets received without any offload flags given in the mbuf is not
very useful, and performance tests with testpmd indicates little
benefit is got with the current code by turning off the flags. This makes
the build-time option pointless, so we can remove it.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Having packets received without any offload flags given in the mbuf is not
very useful, and performance tests with testpmd indicates little to no
benefit is got with the current code by turning off the flags. This makes
the build-time option pointless, so we can remove it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
The AVP devices are only supported on Intel 64-bit architectures so
adjusting the defconfig attributes accordingly.
Fixes: 908072e9d0e6 ("net/avp: support driver registration")
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>