The zero value in flow MARK action is reported in Rx datapath
as tagged with zero FDIR ID. Once packet is marked in flow engine
it will be always reported as tagged. For metadata only the zero
value means there is "no metadata" in the packet and the metadata
flag is not set for the case.
Fixes: 3ceeed9f78 ("doc: update flow mark action in mlx5 guide")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This sets the correct minimal requirements for these features:
- Buffer Split offload is supported/verified on ConnectX-5
- Tx scheduling requires ConnectX-6DX and depends on firmware version
Fixes: cb7b0c24c8 ("doc: update hardware offloads support in mlx5 guide")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Verbs cannot be used to configure newly introduced miniCQE formats for
Flow Tag and L3/L4 Header compression. Support for these formats has
been added to the DevX configuration only. And the RX queue descriptor
has been updated with the CQE compression format information only as
well. But the datapath relies on this info no matter which method is
used for Rx queues configuration. Set proper CQE compression format
information in the Verbs configuration to fix the miniCQE parsing logic.
Fixes: 54c2d46b16 ("net/mlx5: support flow tag and packet header miniCQEs")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add support for new MODIFY_FIELD action to the Mellanox PMD.
This is the generic API that allows to manipulate any packet
header field by copying data from another packet field or
mark, metadata, tag, or immediate value (or pointer to it).
Since the API is generic and covers a lot of action under its
umbrella it makes sense to implement all the mechanics gradually
in order to move to this API for any packet field manipulations
in the future. This is the first step of RTE flows consolidation.
The modify field RTE flow action supports three operations: set,
add and sub. This patch brings to live only the "set" operation.
Support is provided for any packet header field as well as
meta/tag/mark and immediate value can be used as a source.
There are few limitations for this first version of API support:
- encapsulation levels are not supported, just outermost header
can be manipulated for now.
- offsets can only be 4-bytes aligned: 32, 64 and 96 for IPv6.
- the special ITEM_START ID is not supported as we do not allow
to cross packet header field boundaries yet.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds support of the mbuf fast free offload to the
transmit datapath. This offload allows freeing the mbufs on
transmit completion in the most efficient way. It requires
the all mbufs were allocated from the same pool, have
the reference counter value as 1, and have no any externally
attached buffers.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
There some limitations added for the MARK action value range.
Fixes: 2d241515eb ("net/mlx5: add devarg for extensive metadata support")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, the maximal flow priority in non-root table to user
is 4, it's not enough for user to do some flow match by priority,
such as LPM, for one IPV4 address, we need 32 priorities for each
bit of 32 mask length.
PMD will manage 3 sub-priorities per user priority according to L2,
L3 and L4. The internal priority is 16 bits, user can use priorities
from 0 - 21843.
Those enlarged flow priorities are only used for ingress or egress
flow groups greater than 0 and for any transfer flow group.
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
While there's the modify action and sample action with ratio=1
in the E-Switch flow, and modify action is after the sample
action, means that the modify should only impact on after sample.
MLX5 PMD will monitor the above case and split the E-Switch flow
into two sub flows, similar as sample flow did before:
- the prefix sub flow with all actions preceding the sample and the
sample action itself, also append the new jump action after sample
in the prefix sub flow;
- the suffix sub flow with the modify action and other actions
following the sample action.
The flow split as below:
Original flow: items / actions pre / sample / modify / actions sfx
prefix sub flow -
items / actions pre / set_tag action / sample / jump
suffix sub flow -
tag_item / modify / actions sfx
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
mlx5 E-Switch mirroring is implemented as multiple destination array in
one steering table. The array currently supports only port ID as
destination actions.
This patch adds the jump action support to the array as one of
destination.
The packets can be mirrored to the port and jump to the next table in
the same destination array allowing to continue handling in the new
table.
For example:
set sample_actions 0 port_id id 1 / end
flow create 0 ingress transfer pattern eth / end actions
sample ratio 1 index 0 / jump group 1 / end
flow create 1 ingress transfer group 1 pattern eth / end actions
set_mac_dst mac_addr 00:aa:bb:cc:dd:ee / port_id id 2 / end
The flow results all the matched ingress packets are mirrored
to port id 1 and go to group 1. In the group 1, packets are modified
with the destination mac and sent to port id 2.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
PMD validates the rss action in the sample sub-actions list,
then translates into rdma-core action and it will be used for sample
path destination.
If the RSS action is in both sample sub-actions list and original flow,
the rss level and rss type in the sample sub-actions list should be
consistent with the original flow list, since the expanding items
for RSS should be the same for both actions.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The GENEVE TLV option matching flows must be created
using a translation function.
This function checks whether we already created a Devx
object for the matching and either creates the objects
or updates the reference counter.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds the translation function which
sets the qfi, PDU type.
The next extension header which indicates the following
extension header type is set to 0x85 - a PDU session
container.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The data-path code doesn't take care on 'rxq_cqe_pad_en' and use padded
CQE for any case when the system cache-line size is 128B.
This makes the argument redundant.
Remove it.
Fixes: bc91e8db12 ("net/mlx5: add 128B padding of Rx completion entry")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In DPDK 20.11 the following offload features are added:
* Buffer Split
* Sampling
* Tunnel offload
* 2-port hairpin
* RSS shared action
* Age shared action
Update the relevant tables with OFED/rdma-core/NIC versions.
Signed-off-by: Asaf Penso <asafp@nvidia.com>
The mlx5 PMD supports various Rx burst functions.
Each function is enabled differently and supports different features.
Signed-off-by: Asaf Penso <asafp@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Replace -w / --pci-whitelist with -a / --allow options
and --pci-blacklist with --block.
The -b short option remains unchanged.
Allow the old options for now, but print a nag
warning since old options are deprecated.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniCQE size is only 8 bytes and this limits the ability
to successfully keep the compression session in case of various
traffic patterns.
The current miniCQE format only keeps the compression session alive
in case of uniform traffic with the Hash RSS as the only difference.
There are requests to keep the compression session in case of tagged
traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic.
Add 2 new miniCQE formats in order to achieve the best performance
for these traffic patterns: Flow Tag and Packet Header miniCQEs.
The existing rxq_cqe_comp_en devarg is modified to specify the
desired miniCQE format. Specifying 2 selects Flow Tag format
for better compression rate in case of RTE Flow Mark traffic.
Specifying 3 selects Checksum format (existing format for MPRQ).
Specifying 4 selects L3/L4 Header format for better compression
rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
To support multi-thread flow insertion, this patch removes shared data
lock since all resources should support concurrent protection.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The fields ``has_vlan`` and ``has_more_vlan`` were added in rte_flow by
patch [1].
Using these fields, the application can match all the VLAN options by
single flow: any, VLAN only and non-VLAN only.
Add the support for the fields.
By the way, add the support for QinQ packets matching.
VLAN\QinQ limitations are listed in the driver document.
[1] https://patches.dpdk.org/patch/80965/
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Hairpin between two ports will be supported by mlx5 PMD.
The supported scenarios and limitations are listed in "mlx5.rst".
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The buffer split feature is mentioned in the mlx5 PMD
documentation, the limitation is description is added
as well.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
- apply matches to both outer and inner packet headers
during entire offload procedure;
- restore outer header of partially offloaded packet;
- model is implemented as a set of helper functions.
Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel.
* offload JUMP action is restricted to steering tunnel rule only.
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Due to PRM requirement, the IPv6 header item 'proto' field, indicating
the next header protocol, should not be set as extension header.
This patch adds the relevant validation, and documents the limitation.
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Since the built driver filenames have changed in DPDK 20.11, we need to
update the driver doc to match.
Most drivers start their section with the driver filename highlighted in
bold, while a number were missing the highlight. When updating the names,
add the markers for bold text to any missing it, so as to have things more
consistent.
Fixes: a20b2c01a7 ("build: standardize component names and defines")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Previously, the Tx timestamp field and flag were registered in testpmd,
as described in mlx5 guide.
For consistency between Rx and Tx timestamps,
managing mbuf registrations inside the driver, as properly documented,
is a simpler expectation.
The only driver to support this feature (mlx5) is updated
as well as the testpmd application.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Make is no longer supported for compiling DPDK, references are now
removed in the documentation.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Add description about the sample flow limitation.
Sample Flow supports in NIC-Rx and E-Switch domains.
Due to Metadata register c0 is deleted while doing the loopback,
so that only support forward the sampling packet into
E-Switch manager port, no additional action support in sample flow.
Add the offloads minimum versions for new sampling feature.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
PRM expose fields "Icmp_header_data" in IPv4 ICMP.
Update ICMP mask parameter with ICMP identifier and sequence number
fields.
ICMP sequence number spec with mask, Icmp_header_data low 16 bits are
set.
ICMP identifier spec with mask, Icmp_header_data high 16 bits are set.
Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
The page "Development Kit Build System" was about make,
so it has been removed. A better help is in the Linux guide
(note: mlx4/mlx5 are supported on Linux only for now).
Fixes: 3cc6ecfdfe ("build: remove makefiles")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Add description about Tx scheduling timestamp upper limit.
If timestamp exceeds the value, it is marked by PMD as being
into "too-distant-future" and not scheduled at all
(is being sent without any wait).
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are some limitations on some NICs (at least on ConnectX-6 Dx
and BlueField 2) with supporting FCS (frame checksum) scattering for
the tunnel decapsulated packets.
For the case only one of the features can be supported in the same time,
and the new devarg "decap_en" is introduced to provide the choice to the
users.
If FCS scattering feature is not supposed to be engaged by application,
this new devarg should be specified as "decap_en=0", forcing the FCS
feature enable and rejecting tunnel decap actions in the rte_flow engine.
If FCS scatter is not needed and application supposes to use tunnel
decapsulation in rte_flow, the devarg can be omitted or set to non-zero
value (this is default settings).
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
As scatter FCS might be not supported for decapsulated tunnel
packets in some NIC HW, a new capability bit which indicates
if scatter FCS works with decap is added.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Currently, for MLX5 PMD, once millions of flows created, the memory
consumption of the flows are also very huge. For the system with limited
memory, it means the system need to reserve most of the memory as huge
page memory to serve the flows in advance. And other normal applications
will have no chance to use this reserved memory any more. While most of
the time, the system will not have lots of flows, the reserved huge
page memory becomes a bit waste of memory at most of the time.
By the new sys_mem_en devarg, once set it to be true, it allows the PMD
allocate the memory from system by default with the new add mlx5 memory
management functions. Only once the MLX5_MEM_RTE flag is set, the memory
will be allocate from rte, otherwise, it allocates memory from system.
So in this case, the system with limited memory no need to reserve most
of the memory for hugepage. Only some needed memory for datapath objects
will be enough to allocated with explicitly flag. Other memory will be
allocated from system. For system with enough memory, no need to care
about the devarg, the memory will always be from rte hugepage.
One restriction is that for DPDK application with multiple PCI devices,
if the sys_mem_en devargs are different between the devices, the
sys_mem_en only gets the value from the first device devargs, and print
out a message to warn that.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Update the release notes of mlx5 PMD part by adding the
support of eCPRI.
Update the firmware configuration in the mlx5 NIC guide to support
the usage of eCPRI.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch introduces the new devargs:
tx_pp - enables accurate packet send scheduling on mbuf timestamps
in the PMD. On the device start if "rte_dynflag_timestamp"
dynamic flag is registered and this devarg non-zero value is
specified, the driver initializes all necessary internal
infrastructure to provide packet scheduling. The parameter
value specifies scheduling granularity in nanoseconds.
tx_skew - the parameter adjusts the send packet scheduling on
timestamps and represents the average delay between beginning
of the transmitting descriptor processing by the hardware and
appearance of actual packet data on the wire. The value should
be provided in nanoseconds and is valid only if tx_pp parameter
is specified. The default value is zero.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The new devarg will control the steering of the lacp traffic.
When setting dv_lacp_by_user = 0 the lacp traffic will be
steered to kernel and managed there.
When setting dv_lacp_by_user = 1 the lacp traffic will
not be steered and the user will need to manage it.
Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Currently, when flow destroyed, some memory resources may still be kept
as cached to help next time create flow more efficiently.
Some system may need the resources to be more flexible with flow create
and destroy. After peak time, with millions of flows destroyed, the
system would prefer the resources to be reclaimed completely, no cache
is needed. Then the resources can be allocated and used by other
components. The system is not so sensitive about the flow insertion
rate, but more care about the resources.
Both DPDK mlx5 PMD driver and the low level component rdma-core have
provided the flow resources to be configured cached or not, but there is
no APIs or parameters exposed to user to configure the flow resources
cache mode. In this case, introduce a new PMD devarg to let user
configure the flow resources cache mode will be helpful.
This commit is to add a new "reclaim_mem_mode" to help user configure if
the destroyed flows' cache resources should be kept or not.
Their will be three mode can be chosen:
1. 0(none). It means the flow resources will be cached as usual. The
resources will be cached, helpful with flow insertion rate.
2. 1(light). It will only enable the DPDK PMD level resources reclaim.
3. 2(aggressive). Both DPDK PMD level and rdma-core low level will be
configured as reclaimed mode.
With these three mode, user can configure the resources cache mode with
different levels.
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Removing the current limitation for TSO over VM
due to the fact that mlx5 currently support it.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Asaf Penso <asafp@mellanox.com>
If running DPDK as non-root, some extra capabilities may be required.
The Mellanox devices, using a bifurcated model with Linux drivers,
have some specific requirements summarized in mlx5 PMD guide.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Raslan Darawsheh <rasland@mellanox.com>
This patch adds to MLX5 PMD the support of matching on
GTP header item v_pt_rsv_flags.
This item is contained in 1 byte of the format:
-------------------------------------------
| bit | 0 - 2 | 3 | 4 | 5 | 6 | 7 |
|-----------------------------------------|
| value | Version | PT | Res | E | S | PN |
-------------------------------------------
Matching is supported only for GTP flags E, S, PN.
Therefore values 0 to 7 are supported.
Mask must be set accordingly:
... gtp v_pt_rsv_flags is 1 v_pt_rsv_flags mask 0x07 ...
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In existing implementation, using wild card VLAN item is not allowed.
A VLAN item in flow pattern must include VLAN ID (vid) value.
This obligation contradict the flow API specification [1].
This patch updates the VLAN item validation and translation, to allow
wild card VLAN item, without VLAN ID value.
User guide and release notes are updated accordingly.
[1]
commit 40513808b165 ("doc: refine ethernet and VLAN flow rule items")
Fixes: 00f75a4057 ("net/mlx5: fix VLAN match for DV mode")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch updates the MLX5 PMD and release notes documentations.
Adding the notes of the behavior change that rte flows organization
is switched into non-cached mode for applications.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
This patch updates the MLX5 PMD and release notes documentations.
Adding the guideline for hairpin data buffer size configuration.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Define a device parameter to configure log 2 of a stride size for MPRQ
- mprq_log_stride_size. User is able to specify a stride size in a range
allowed by an underlying hardware. The default stride size is defined as
2048 bytes to encompass most commonly used packet sizes in the Internet
(MTU 1518 and less) and will be used in case a maximum configured packet
size cannot fit into the largest possible stride size. Otherwise a
stride size is set to a large enough value to encompass a whole packet.
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>