An errata exists where users may see reduced power savings when using
PMD Power Management. This issue occurs when compiling DPDK applications
with GCC-9 on platforms with TSX enabled. In rte_power_monitor_multi(),
the function may return without successfully starting the RTM
transaction (the _xbegin() fails).
Signed-off-by: David Hunt <david.hunt@intel.com>
Add support queue/RSS action for external Rx queue.
In indirection table creation, the queue index will be taken from
mapping array.
This feature supports neither LRO nor Hairpin.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds matching on the optional fields (checksum/key/sequence)
of GRE header. The matching on checksum and sequence fields requests
support from rdma-core with the capability of misc5 and tunnel_header 0-3.
For patterns without checksum and sequence specified, keep using misc for
matching as before, but for patterns with checksum or sequence, validate
capability first and then use misc5 for the matching.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add CNF95xx B0 variant to the list of supported models.
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
cnxk platform supports red/yellow packet marking based on TM
configuration. This patch set hooks to enable/disable packet
marking for VLAN DEI, IP DSCP and IP ECN. Marking enabled only
in scalar mode.
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Added capability and support for inline inbound IP reassembly
in cnxk driver. The IP reassembly offload is supported only
when the inline IPSec security offload is enabled.
In case of IP reassembly incomplete, the mbufs are attached
in the mbuf dynamic field and a dynamic flag is set accordingly.
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
The HW steering uses async queue-based flow rules management
mechanism. The matcher and part of the actions have been
prepared during flow table creation. Some remaining actions
will be constructed during flow creation if needed.
A flow postpone attribute bit describes if flow management
should be applied to the HW directly. An extra push function
is provided to force push all the cached flows to the HW.
Once the flow has been applied to the HW, the pull function
will be called to get the queued creation/destruction flows.
The DR rule flow memory is represented in PMD layer instead
of allocating from HW steering layer. While destroying the
flow, the flow rule memory can only be freed after the CQE
received.
The HW queue job descriptor is currently introduced to convey
the flow information and operation type between the flow
insertion/destruction in the pull function.
This commit adds the basic flow queue operation for:
rte_flow_async_create();
rte_flow_async_destroy();
rte_flow_push();
rte_flow_pull();
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The hardware since ConnectX-7 supports waiting on
specified moment of time with new introduced wait
descriptor. A timestamp can be directly placed
into descriptor and pushed to sending queue.
Once hardware encounter the wait descriptor the
queue operation is suspended till specified moment
of time. This patch update the Tx datapath to handle
this new hardware wait capability.
PMD documentation and release notes updated accordingly.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Queue-based flow rules management mechanism is suitable
not only for flow rules creation/destruction, but also
for speeding up other types of Flow API management.
Indirect action object operations may be executed
asynchronously as well. Provide async versions for all
indirect action operations, namely:
rte_flow_async_action_handle_create,
rte_flow_async_action_handle_destroy and
rte_flow_async_action_handle_update.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
A new, faster, queue-based flow rules management mechanism is needed for
applications offloading rules inside the datapath. This asynchronous
and lockless mechanism frees the CPU for further packet processing and
reduces the performance impact of the flow rules creation/destruction
on the datapath. Note that queues are not thread-safe and the queue
should be accessed from the same thread for all queue operations.
It is the responsibility of the app to sync the queue functions in case
of multi-threaded access to the same queue.
The rte_flow_async_create() function enqueues a flow creation to the
requested queue. It benefits from already configured resources and sets
unique values on top of item and action templates. A flow rule is enqueued
on the specified flow queue and offloaded asynchronously to the hardware.
The function returns immediately to spare CPU for further packet
processing. The application must invoke the rte_flow_pull() function
to complete the flow rule operation offloading, to clear the queue, and to
receive the operation status. The rte_flow_async_destroy() function
enqueues a flow destruction to the requested queue.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Treating every single flow rule as a completely independent and separate
entity negatively impacts the flow rules insertion rate. Oftentimes in an
application, many flow rules share a common structure (the same item mask
and/or action list) so they can be grouped and classified together.
This knowledge may be used as a source of optimization by a PMD/HW.
The pattern template defines common matching fields (the item mask) without
values. The actions template holds a list of action types that will be used
together in the same rule. The specific values for items and actions will
be given only during the rule creation.
A table combines pattern and actions templates along with shared flow rule
attributes (group ID, priority and traffic direction). This way a PMD/HW
can prepare all the resources needed for efficient flow rules creation in
the datapath. To avoid any hiccups due to memory reallocation, the maximum
number of flow rules is defined at the table creation time.
The flow rule creation is done by selecting a table, a pattern template
and an actions template (which are bound to the table), and setting unique
values for the items and actions.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
The flow rules creation/destruction at a large scale incurs a performance
penalty and may negatively impact the packet processing when used
as part of the datapath logic. This is mainly because software/hardware
resources are allocated and prepared during the flow rule creation.
In order to optimize the insertion rate, PMD may use some hints provided
by the application at the initialization phase. The rte_flow_configure()
function allows to pre-allocate all the needed resources beforehand.
These resources can be used at a later stage without costly allocations.
Every PMD may use only the subset of hints and ignore unused ones or
fail in case the requested configuration is not supported.
The rte_flow_info_get() is available to retrieve the information about
supported pre-configurable resources. Both these functions must be called
before any other usage of the flow API engine.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Adds support for priority flow control support for CNXK
platforms.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
The default missing Tx completion timeout was set to 5 seconds.
In order to provide users with the interface to control this timeout
to adjust it with the application's watchdog, the device argument for
controlling this value was added.
The parameter is called 'miss_txc_to' and can be modified using the
devargs interface:
./app -a <bdf>,miss_txc_to=UINT_NUMBER
This parameter accepts values from 0 to 60 and indicates number of
seconds after which the Tx packet will be considered as missing.
HW hints for the Tx completions timeout were removed to do not overwrite
parameter from the user. Also specifying default Tx completion timeout
value was moved from the configuration to init phase in order to
simplify default value assignment.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
ENA was only supporting retrieval of all the xstats name and wasn't
implementing the eth_xstats_get_names_by_id API.
As this API may be more efficient than retrieving all the names, it
tries to avoid excessive string copying.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
ENA driver did not allow applications to call tx_cleanup. Freeing Tx
mbufs was always done by the driver and it was not possible to manually
request the driver to free mbufs.
Modify ena_tx_cleanup function to accept maximum number of packets to
free and return number of packets that was freed.
Signed-off-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Due to how the ena_com compatibility layer is written, all AQ commands
triggering functions use stack to save results of AQ and then copy them
to user given function.
Therefore to keep the compatibility layer common, introduce ENA_PROXY
macro. It either calls the wrapped function directly (in primary
process) or proxies it to the primary via DPDK IPC mechanism. Since all
proxied calls are taken under a lock share the result data through
shared memory (in struct ena_adapter) to work around 256B IPC parameter
size limit.
New proxy calls can be added by
1. Adding a new message type at the end of enum ena_mp_req
2. Adding new message arguments to the struct ena_mp_body if needed
3. Defining proxy request descriptor with ENA_PROXY_DESC. Its arguments
include handlers for request preparation and response processing.
Any of those may be empty (aside of marking arguments as used).
4. Adding request handling logic to ena_mp_primary_handle()
5. Replacing proxied function calls with ENA_PROXY(adapter, <func>, ...)
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
As the default behavior for arm64 is to alias rte_memcpy as memcpy, ENA
cannot redefine memcpy as rte_memcpy as it would cause nested
declaration.
To make it possible to use optimized memcpy in the ena_com layer on Arm,
the driver now redefines memcpy when it is beneficial:
* For arm64 only when the flag RTE_ARCH_ARM64_MEMCPY was defined
* For arm only when the flag RTE_ARCH_ARM_NEON_MEMCPY was defined
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
ENA uses AENQ for notification about various events, like LSC, keep
alive etc. By default it was enabling all AENQ that were supported by
both the driver and the device. As a result the LSC was always processed
even if the application turned it off explicitly.
As the DPDK provides application with the possibility to configure the
LSC, ENA should respect that. AENQ groups are now being updated upon
configure step, thus LSC can be activated or disabled between ENA PMD
reconfigurations. Moreover, the LSC capability for the device is being
determined dynamically.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
* Split 'bad_csum' Rx statistic into 'l3_csum_bad' and 'l4_csum_bad' to
be able to check which checksum was not calculated properly.
* Add l4_csum_good statistic, which shows how many times L4 Rx checksum
was properly offloaded.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Dawid Gorecki <dgr@semihalf.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
Add support for L2TPv2(include PPP over L2TPv2) protocols FDIR
based on outer MAC src/dst address and L2TPv2 session ID.
Add support for PPPoL2TPv2oUDP protocols FDIR based on inner IP
src/dst address and UDP/TCP src/dst port.
Patterns are listed below:
eth/ipv4(6)/udp/l2tpv2
eth/ipv4(6)/udp/l2tpv2/ppp
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/udp
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/tcp
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Add support for L2TPv2(include PPP over L2TPv2) protocols RSS based
on outer MAC src/dst address and L2TPv2 session ID.
Patterns are listed below:
eth/ipv4/udp/l2tpv2
eth/ipv4/udp/l2tpv2/ppp
eth/ipv6/udp/l2tpv2
eth/ipv6/udp/l2tpv2/ppp
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
In crypto producer mode, producer core enqueues cryptodev with software
generated crypto ops and worker core dequeues crypto completion events
from the eventdev. Event crypto metadata used for above processing is
pre-populated in each crypto session.
Parameter --prod_type_cryptodev can be used to enable crypto producer
mode. Parameter --crypto_adptr_mode can be set to select the crypto
adapter mode, 0 for OP_NEW and 1 for OP_FORWARD.
This mode can be used to measure the performance of crypto adapter.
Example:
./dpdk-test-eventdev -l 0-2 -w <EVENTDEV> -w <CRYPTODEV> -- \
--prod_type_cryptodev --crypto_adptr_mode 1 --test=perf_atq \
--stlist=a --wlcores 1 --plcores 2
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
The Kunpeng930 DMA devices have the same PCI device id with Kunpeng920,
but with different PCI revision and register layout. This patch
introduces the basic initialization for Kunpeng930 DMA devices.
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Add initial support for PMD that allows to control particular pins form
userspace. Moreover PMD allows to attach custom interrupt handlers to
controllable GPIOs.
Main users of this PMD are dataplain applications requiring fast and low
latency access to pin state.
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Rather than the asym session create function returning a session on
success, and a NULL value on error, it is modified to now return int
values - 0 on success or -EINVAL/-ENOTSUP/-ENOMEM on failure.
The session to be used is passed as input.
This adds clarity on the failure of the create function, which enables
treating the -ENOTSUP return as TEST_SKIPPED in test apps.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
A user data field is added to the asymmetric session structure.
Relevant API added to get/set the field.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The rte_cryptodev_asym_session structure is now moved to an internal
header. This will no longer be used directly by apps,
private session data can be accessed via get API.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Rather than using a session buffer that contains pointers to private
session data elsewhere, have a single session buffer.
This session is created for a driver ID, and the mempool element
contains space for the max session private data needed for any driver.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add flow pattern items and header format for matching optional fields
(checksum/key/sequence) in GRE header. And the flags in gre item should
be correspondingly set with the new added items.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Secondary process support had been disabled for the AF_XDP PMD because
there was no logic in place to share the AF_XDP socket file descriptors
between the processes. This commit introduces this logic using the IPC
APIs.
Rx and Tx are disabled in the secondary process due to memory mapping of
the AF_XDP rings being assigned by the kernel in the primary process only.
However other operations including retrieval of stats are permitted.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Support to configure LED in firmware. Driver commands firmware to turn
the LED on and off. And OEM customize their LED solutions in firmware.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Support to get OEM customized LED configuration information from firmware.
And driver needs to adjust the process of PHY setup link, based on this
LED configuration.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Added the ethdev dump API which provides querying private info from device.
There exists many private properties in different PMD drivers, such as
adapter state, Rx/Tx func algorithm in hns3 PMD. The information of these
properties is important for debug. As the information is private, the new
API is introduced.
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
IP Reassembly is a costly operation if it is done in software.
The operation becomes even more costlier if IP fragments are encrypted.
However, if it is offloaded to HW, it can considerably save application
cycles.
Hence, a new offload feature is exposed in eth_dev ops for devices which
can attempt IP reassembly of packets in hardware.
- rte_eth_ip_reassembly_capability_get() - to get the maximum values
of reassembly configuration which can be set.
- rte_eth_ip_reassembly_conf_set() - to set IP reassembly configuration
and to enable the feature in the PMD (to be called before
rte_eth_dev_start()).
- rte_eth_ip_reassembly_conf_get() - to get the current configuration
set in PMD.
Now when the offload is enabled using rte_eth_ip_reassembly_conf_set(),
the resulting reassembled IP packet would be a typical segmented mbuf in
case of success.
And if reassembly of IP fragments is failed or is incomplete (if
fragments do not come before the reass_timeout, overlap, etc), the mbuf
dynamic flags can be updated by the PMD. This is updated in a subsequent
patch.
Signed-off-by: Akhil Goyal <gakhil@marvell.com>
This patch defines new RSS offload type for L2TPv2, which
is required when users want to distribute packets based on
the L2TPv2 session ID field.
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Based on device support and use-case need, there are two different ways
to enable PFC. The first case is the port level PFC configuration, in
this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
configure the PFC, and PFC frames will be generated using based on VLAN
TC value.
The second case is the queue level PFC configuration, in this
case, Any packet field content can be used to steer the packet to the
specific queue using rte_flow or RSS and then use
rte_eth_dev_priority_flow_ctrl_queue_configure() to configure the
TC mapping on each queue.
Based on congestion selected on the specific queue, configured TC
shall be used to generate PFC frames.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Csum forwarding mode only supports software UDP/TCP csum calculation
for single segment packets when hardware offload is not enabled.
This patch enables software UDP/TCP csum calculation over multiple
segments.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add functions to call rte_raw_cksum_mbuf() to calculate IPv4/6
UDP/TCP checksum in mbuf which can be over multi-segments.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Aman Singh <aman.deep.singh@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
AF_XDP support is deprecated in libbpf since v0.7.0 [1]. The libxdp library
now provides the functionality which once was in libbpf and which the
AF_XDP PMD relies on. This commit updates the AF_XDP meson build to use the
libxdp library if a version >= v1.2.2 is available. If it is not available,
only versions of libbpf prior to v0.7.0 are allowed, as they still contain
the required AF_XDP functionality.
libbpf still remains a dependency even if libxdp is present, as we use
libbpf APIs for program loading.
The minimum required kernel version for libxdp for use with AF_XDP is v5.3.
For the library to be fully-featured, a kernel v5.10 or newer is
recommended. The full compatibility information can be found in the libxdp
README.
v1.2.2 of libxdp includes an important fix required for linking with DPDK
which is why this version or greater is required. Meson uses pkg-config to
verify the version of libxdp on the system, so it is necessary that the
library is discoverable using pkg-config in order for the PMD to use it. To
verify this, you can run: pkg-config --modversion libxdp
[1] https://github.com/libbpf/libbpf/commit/277846bc6c15
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
eCPRI message can be over Ethernet layer (.1Q supported also) or over
UDP layer. Message header formats are the same in these two variants.
Only up though the first packet header in the PDU can be matched.
RSS on the eCPRI payload is not supported.
Signed-off-by: John Daley <johndale@cisco.com>
Reviewed-by: Hyong Youb Kim <hyonkim@cisco.com>
Expose Linux EAL ability to reuse existing hugepage files
via --huge-unlink=never switch.
Default behavior is unchanged, it can also be specified
using --huge-unlink=existing for consistency.
Old --huge-unlink switch is kept,
it is an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>