Commit Graph

2531 Commits

Author SHA1 Message Date
Viacheslav Ovsiienko
d15bfd2930 net/mlx5: fix inline length exceeding descriptor limit
The hardware descriptor (WQE) length field is 6 bits wide
and we have the native limitation for the overall descriptor
length. To improve the PCIe bandwidth the packet data can be
inline into descriptor. If PMD was configured to inline large
amount of data it happened there was no enough space remaining
in the descriptor to specify all the packet data segments and
PMD rejected problematic packets.

The patch tries to adjust the inline data length conservatively
and allows to avoid error occurring.

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Fixes: e2259f93ef ("net/mlx5: fix Tx when inlining is impossible")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
2022-10-02 09:13:52 +02:00
Viacheslav Ovsiienko
166f185fef net/mlx5: fix single not inline packet storing
The mlx5 PMD can inline packet data into transmitting descriptor (WQE)
and free mbuf immediately as data no longer needed, for non-inline
packets the mbuf pointer should be stored in elts array for coming
freeing on send completion. There was an optimization on storing
pointers in batch and there was missed storing mbuf for single
packet if non-inline was explicitly requested by flag.

Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:52 +02:00
Viacheslav Ovsiienko
37d6fc30c1 net/mlx5: fix check for orphan wait descriptor
The mlx5 PMD supports send scheduling feature, it allows
to send packets at specified moment of time, to do that
PMD pushes special wait descriptor (WQE) to the hardware
queue and then pushes descriptor for packet data as usual.
If queue is close to be full or there is no enough elts
buffers to store mbufs being sent the data descriptors might
be not pushed and the orphan wait WQE (not followed by the
data) might reside in queue on tx_burst routine exit.

To avoid orphan wait WQEs there was the check for enough
free space in the queue WQE buffer and enough amount of the
free elts in queue mbuf storage. This check was incomplete
and did not cover all the cases for Enhanced Multi-Packet
Write descriptors.

Fixes: 2f827f5ea6 ("net/mlx5: support scheduling on send routine template")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:52 +02:00
Bassam Zaid AlKilani
6fa815722e net/mlx5: fix flow matching priority for ESP item
ESP is one of IPSec protocols over both IPv4 and IPv6 and is considered
a tunnel layer that cannot be followed by any other layer. Taking that
into consideration, ESP is considered as a 4 layer.

Not defining ESP's priority will make it match with the same priority as
its prior IP layer, which has a layer 3 priority. This will lead to
issues in matching and will match the packet with the first matching
rule even if it doesn't have an esp layer in its pattern, disregarding
any following rules that could have an esp item and can be actually
a more accurate match since it will have a longer matching criterion.

This is fixed by defining the priority for the ESP item to have a
layer 4 priority, making the match be for the rule with the more
accurate and longer matching criteria.

Fixes: 18ca4a4ec7 ("net/mlx5: support ESP SPI match and RSS hash")
Cc: stable@dpdk.org

Signed-off-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
2022-10-02 09:13:51 +02:00
Michael Baum
593f913a8e net/mlx5: fix LRO requirements check
One of the conditions to allow LRO offload is the DV configuration.

The function incorrectly checks the DV configuration before initializing
it by the user devarg; hence, LRO cannot be allowed.

This patch moves this check to mlx5_shared_dev_ctx_args_config, where DV
configuration is initialized.

Fixes: c4b8620135 ("net/mlx5: refactor to detect operation by DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reported-by: Gal Shalom <galshalom@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:51 +02:00
Long Li
bc5d8fdb70 net/mlx5: fix Verbs FD leak in secondary process
FDs passed from rte_mp_msg are duplicated to the secondary process and
need to be closed.

Fixes: 9a8ab29b84 ("net/mlx5: replace IPC socket with EAL API")
Cc: stable@dpdk.org

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:50 +02:00
Ivan Malov
5e3779b7ab ethdev: remove deprecated flow item physical port
Such deprecation was commenced in DPDK 21.11.
Since then, no parties have objected. Remove.

The patch breaks ABI.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
2022-09-27 10:26:51 +02:00
David Marchand
a04322f616 bus: hide bus object
Make rte_bus opaque for non internal users.
This will make extending this object possible without breaking the ABI.

Introduce a new driver header and move rte_bus definition and helpers.
Update drivers and library to use the internal header.

Some applications may have been dereferencing rte_bus objects, mark
this object's accessors as stable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-09-23 16:14:34 +02:00
David Marchand
1f37cb2bb4 bus/pci: make driver-only headers private
The pci bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.

While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macro manipulating the bus singleton object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
  inclusion,

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2022-09-23 16:14:34 +02:00
David Marchand
b3f89090d6 bus/auxiliary: make driver-only headers private
The auxiliary bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.

While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macro manipulating the bus singleton object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
  inclusion,

Signed-off-by: David Marchand <david.marchand@redhat.com>
2022-09-23 16:14:34 +02:00
Matan Azrad
60b254e392 net/mlx5: fix Rx queue recovery mechanism
The local variables are getting inconsistent in data receiving routines
after queue error recovery.
Receive queue consumer index is getting wrong, need to reset one to the
size of the queue (as RQ was fully replenished in recovery procedure).

In MPRQ case, also the local consumed strd variable should be reset.

CVE-2022-28199
Fixes: 88c0733535 ("net/mlx5: extend Rx completion with error handling")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
2022-08-29 12:53:49 +02:00
David Marchand
72206323a5 version: 22.11-rc0
Start a new release cycle with empty release notes.

The ABI version becomes 23.0.
The map files are updated to the new ABI major number (23).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.
Special handling of removed drivers is also dropped in check-abi.sh and
a note has been added in libabigail.abignore as a reminder.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2022-07-21 12:13:48 +02:00
Raja Zidane
5ddb903824 net/mlx5: reject negative integrity item configuration
Negative integrity item refers to condition when the item value mask
is set, but value spec is cleared:
    ... integrity value mask l4_ok value spec 0 ...

ethdev library defines integrity bits `l3_ok` and `l4_ok` as accumulators
for all hardware L3 and L4 integrity verifications respectfully.
Hardware `l3_ok` and `l4_ok` integrity bits refer to L3 and L4
network headers only.
Integrity bits `l3_ok` and `l4_ok` are not compatible between
ethdev library and hardware.

PMD translations for ethdev `l3_ok` are:
 IPv4: `l3_ok` and `l3_csum_ok`
 IPv6: `l3_ok`
ethdev `l4_ok` is translated into PMD `l4_ok` and `l4_csum_ok` bits.

Positive IPv4 `l3_ok` flow item configuration is translated into
a single matcher that AND corresponding hardware bits.
Negative IPv4 `l3_ok` is translated into 2 hardware conditions where
each condition probes a single integrity bit:
  ethdev::l3_ok is 0 => MLX5::l3_ok is 0 OR MLX5:l3_csum_ok is 0
MLX5 hardware does not do OR condition in flow rule item.
Negative IPv4 `l3_ok` must be translated into 2 flow rules.
Similarly negative ethdev `l4_ok` condition is also translated into 2
hardware rules.

Current PMD roadmap does not allow implicit flow rule split.

Bugzilla ID: 948
Cc: stable@dpdk.org

Suggested-by: Raja Zidane <rzidane@nvidia.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-07-05 20:04:02 +02:00
Michael Baum
740a28366c net/mlx5: add test for external Rx queue
Add mlx5 internal test for map and unmap external RxQs.
This patch adds to testpmd app a runtime function to test the mapping
API.

  testpmd> mlx5 port (port_id) ext_rxq map (sw_queue_id) (hw_queue_id)
  testpmd> mlx5 port (port_id) ext_rxq unmap (sw_queue_id)

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-07-05 20:02:57 +02:00
Michael Baum
85d9252e55 net/mlx5: add test for remote PD and CTX
Add mlx5 internal option in testpmd similar to run-time function
"port attach" which adds another parameter named "socket" for attaching
port and add 2 devargs before.

The arguments are "cmd_fd" and "pd_handle" using to import device
created out of PMD. Testpmd application import it using IPC, and updates
the devargs list before attaching.

These arguments were added in
the commit 9d936f4f1a ("common/mlx5: support remote PD and CTX")

The syntax is:

  testpmd> mlx5 port attach (identifier) socket=(path)

Where "path" is the IPC socket path agreed on the remote process.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-07-05 20:01:33 +02:00
Yunjian Wang
a73b78554a net/mlx5: fix stack buffer overflow in drop action
The mlx5_drop_action_create function use mlx5_malloc for allocating
'hrxq', but don't allocate for 'rss_key'. This is wrong and it can
cause buffer overflow.

Detected with address sanitizer:
0 (/usr/lib64/libasan.so.4+0x7b8e2)
1 in mlx5_devx_tir_attr_set ../drivers/net/mlx5/mlx5_devx.c:765
2 in mlx5_devx_hrxq_new ../drivers/net/mlx5/mlx5_devx.c:800
3 in mlx5_devx_drop_action_create ../drivers/net/mlx5/mlx5_devx.c:1051
4 in mlx5_drop_action_create ../drivers/net/mlx5/mlx5_rxq.c:2846
5 in mlx5_dev_spawn ../drivers/net/mlx5/linux/mlx5_os.c:1743
6 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2501
7 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2647
8 in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2722
9 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:657
10 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:711
11 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:150
12 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:269
13 in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:353
14 in pci_probe ../drivers/bus/pci/pci_common.c:380
15 in rte_bus_probe ../lib/eal/common/eal_common_bus.c:72
16 in rte_eal_init ../lib/eal/linux/eal.c:1286
17 in main ../app/test-pmd/testpmd.c:4112

Fixes: 0c762e81da ("net/mlx5: share Rx queue drop action code")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:25:07 +02:00
Shun Hao
92b3c68e9f net/mlx5: fix metering on E-Switch Manager
When meter is used by E-Switch Manager port, there's an error that
cannot get correct port ID.

This patch fixes this by using specific parsing process to get port
ID for E-Switch Manager.

Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:06 +02:00
Shun Hao
68e9925c30 net/mlx5: add limitation for E-Switch Manager match
For BF with old FW which doesn't expose the E-Switch Manager vport ID,
E-Switch Manager port matching works correctly only when BF is in
embedded CPU mode.

This patch adds the limitation description.

Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:06 +02:00
Gregory Etelson
57d6a458a8 net/mlx5: fix RSS expansion for patterns with ICMP item
MLX5 PMD RSS expansion implementation added L4 UDP or TCP
headers after ICMP.
For example:
ETH / IPv4 / ICMP expanded into  ETH / IPv4 / ICMP / {UDP | TCP}
ETH / IPv6 / ICMPv6 expanded into  ETH / IPv6 / ICMPv6 / {UDP | TCP}

The patch updates PMD expansion scheme to handle ICMP and ICMPv6 types
as non-expandable for RSS.

Fixes: c7870bfe09 ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:05 +02:00
Spike Du
f41a5092e6 app/testpmd: add host shaper command
Add command line options to support host shaper configure.
- Command syntax:
  mlx5 set port <port_id> host_shaper avail_thresh_triggered <0|1> rate
<rate_num>

- Example commands:
To enable avail_thresh_triggered on port 1 and disable current host
shaper:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 1 rate 0

To disable avail_thresh_triggered and current host shaper on port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 0

The rate unit is 100Mbps.
To disable avail_thresh_triggered and configure a shaper of 5Gbps on
port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 50

Add sample code to handle rxq available descriptor threshold event, it
delays a while so that rxq empties, then disables host shaper and
rearms available descriptor threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:04 +02:00
Spike Du
2235fcda12 net/mlx5: add API to configure host port shaper
Host port shaper can be configured with QSHR (QoS Shaper Host Register).
Add check in build files to enable this function or not.

The host shaper configuration affects all the ethdev ports belonging to the
same host port.

Host shaper can configure shaper rate and lwm-triggered for a host port.
The shaper limits the rate of traffic from host port to wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
when one of the host port's Rx queues receives available descriptor
threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:04 +02:00
Spike Du
5c9f3294e6 net/mlx5: support Rx descriptor threshold event
Add mlx5 specific available descriptor threshold configuration
and query handler.
In mlx5 PMD, available descriptor threshold is also called
LWM (limit watermark).
While the Rx queue fullness reaches the LWM limit, the driver catches
an HW event and invokes the user callback.
The query handler finds the next Rx queue with pending LWM event
if any, starting from the given Rx queue index.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:02 +02:00
Spike Du
25025da3c5 net/mlx5: handle Rx descriptor LWM event
When LWM meets RQ WQE, the kernel driver raises an event to SW.
Use devx event_channel to catch this and to notify the user.
Allocate this channel per shared device.
The channel has a cookie that informs the specific event port and queue.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:00 +02:00
Spike Du
72d7efe464 common/mlx5: share interrupt management
There are many duplicate code of creating and initializing rte_intr_handle.
Add a new mlx5_os API to do this, replace all PMD related code with this
API.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:24:59 +02:00
Spike Du
7158e46cb9 net/mlx5: support descriptor LWM for Rx queue
Add LWM (Limit WaterMark) field to Rxq object which indicates the percentage
of Rx queue size used by HW to raise descriptor event to the user.
Allow LWM setting in modify_rq command.
Allow the LWM configuration dynamically by adding RDY2RDY state change.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:23:29 +02:00
Ali Alnubani
bae645a23a net/mlx5: fix build with clang 14
Use fgets instead of fscanf to resolve the following warning
reported by clang 14.0.0 in Fedora 37 (Rawhide):

drivers/net/mlx5/linux/mlx5_ethdev_os.c:1137:52: error:
  'fscanf' may overflow; destination buffer in argument 3 has size 16,
  but the corresponding specifier may require size 17
  [-Werror,-Wfortify-source]
  ret = fscanf(file, "%" RTE_STR(IF_NAMESIZE) "s", port_name);

Fixes: 63d1db710f ("net/mlx5: fix unlimited parsing of switch info")
Cc: stable@dpdk.org

Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:28 +02:00
Sean Zhang
6431068d0f net/mlx5: support field modification in meter rules
This patch introduces MODIFY_FIELD action support in meter. User can
create meter policy with MODIFY_FIELD action in green/yellow action.

For example:

testpmd> add port meter policy 0 21 g_actions modify_field op set
	dst_type ipv4_ecn src_type value src_value 3 width 2 / ...

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:26 +02:00
Sean Zhang
76d5756122 net/mlx5: support modifying ECN field
This patch is to support modify ECN field in IPv4/IPv6 header.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:25 +02:00
Sean Zhang
e8146c63c3 net/mlx5: support represented port item in flow rules
Add support for represented_port item in pattern. And if the spec and mask
both are NULL, translate function will not add source vport to matcher.

For example, testpmd starts with PF, VF-rep0 and VF-rep1, below command
will redirect packets from VF0 and VF1 to wire:
testpmd> flow create 0 ingress transfer group 0 pattern eth /
represented_port / end actions represented_port ethdev_id is 0 / end

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:23 +02:00
Raja Zidane
fb96caa56a net/mlx5: support ESP item on Windows
ESP item is not supported on Windows, yet it is expanded from the
expansion graph when trying to create default flow to RSS all packets.

Support ESP item match (without ability to match on SPI field on Windows).
Split ESP validation per OS.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-05 17:04:48 +02:00
Michael Baum
2192599c75 net/mlx5: fix entry size in construct data ipool
The mlx5_action_construct_data structure memory is managed by ipool
named acts_ipool.

The size of one entry in this ipool is mistakenly defined as size of
rte_flow_hw structure.
This size is used to reset in the allocated part. When the size is
incorrect it resets memory that does not belong to it.

This patch defines the correct size.

Fixes: f13fab2392 ("net/mlx5: add flow jump action")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-05 17:04:46 +02:00
Geoffrey Le Gourriérec
eadc35df59 net/mlx5: fix statistics read on Linux
This patch encompasses a few fixes carried by a previous patch
that aimed to support bonding device stats counting.

- If mlx5_os_read_dev_stat fails, it returns 1 instead of a
  negative value, causing mlx5_xstats_get to return an invalid
  number of counters. Since this error is not blocking, do not
  mess ret value with mlx5_os_read_dev_stat returned value.

  This allows avoiding the very annoying log:
  "n_xstats != n_xstats_names => skipping"

- Invert the check for mlx5_os_read_dev_stat(), currently leading
  us to store the result if the function failed, and use a
  backup value if it succeeded, which is the opposite of what we
  actually want. Revert to the original (correct) test.

- Add missing test on _mlx5_os_read_dev_counters() to prevent
  using trash stats values.

Fixes: 7ed15acdcd ("net/mlx5: improve xstats of bonding port")
Cc: stable@dpdk.org

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Geoffrey Le Gourriérec <geoffrey.le_gourrierec@6wind.com>
Tested-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-02 17:01:11 +02:00
Rongwei Liu
2bd03a4361 net/mlx5: add Rx drop counters to xstats
Add two kinds of Rx drop counters to DPDK xstats which are
physical port scope.

1. rx_prio[0-7]_buf_discard
   The number of unicast packets dropped due to lack of shared
   buffer resources.
2. rx_prio[0-7]_cong_discard
   The number of packets that is dropped by the Weighted Random
   Early Detection (WRED) function.

Prio[0-7] is determined by VLAN PCP value which is 0 by default.
Both counters are retrieved from kernel ethtool API which calls
PRM command finally.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-01 09:49:44 +02:00
Raja Zidane
1485d961e2 net/mlx5: fix Tx recovery
When an error occurs in Tx, and it is moved to ERROR state, it
is not recoverable, during recovery it's state cannot be modified
to INIT. to modify state from RESET to INIT, the port must be
passed in modify attributes, and in case of ERROR to READY
modification path, it was not provided.

Provide port number when changing state from RESET to INIT.

Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
2022-06-01 09:49:42 +02:00
Shun Hao
96ca87da4f net/mlx5: validate yellow meter action
Yellow meter action support is added in meter hierarchy validation.
If one color uses meter action, the other can only use NULL action
or the same meter action. And only shared meter is supported.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:41 +02:00
Shun Hao
3dc7afa2fa net/mlx5: support yellow meter action for hierarchy tag rule
When a hierarchy meter is shared by other ports, it's needed to iterate
all meter policies in hierarchy to create tag rules, to set packet with
next meter ID, which will be used by related meter drop count.
This patch adds the tag rule for yellow support in hierarchy, so both
green/yellow policy flows can set the correct meter ID.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:38 +02:00
Shun Hao
bf62fb7693 net/mlx5: support yellow meter action in hierarchy
This patch adds the support of meter action for yellow meter policy
flow, so can use meter action for both green and yellow policy flows
in meter hierarchy.
Currently must use the same meter within one meter policy. Packets
passing green/yellow policy flow will have previous meter color of
green/yellow in subsequent meter.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:36 +02:00
Shun Hao
6b838de3d5 net/mlx5: support previous meter color aware
This patch adds the support for previous color aware for meter.
Start_color setting is set to UNDEFINED when creating meter object that
is color aware.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:30 +02:00
Thomas Monjalon
64fcadeac0 avoid AltiVec keyword vector
The AltiVec header file is defining "vector", except in C++ build.
The keyword "vector" may conflict easily.
As a rule, it is better to use the alternative keyword "__vector",
so we will be able to #undef vector after including AltiVec header.

Later it may become possible to #undef vector in rte_altivec.h
with a compatibility breakage.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
2022-05-25 11:49:39 +02:00
Raja Zidane
18ca4a4ec7 net/mlx5: support ESP SPI match and RSS hash
In packets with ESP header, the inner IP will be encrypted, and
its fields cannot be used for RSS hashing. So, ESP packets
can be hashed only by the outer IP layer.
So, when using RSS on ESP packets, hashing may not be efficient,
because the fields used by the hash functions are only the outer IPs,
causing all traffic belonging to all tunnels between a given
pair of GWs to land on one core.
Adding the SPI hash field can extend the spreading of IPsec packets.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-05-15 09:38:59 +02:00
Shun Hao
fc7211097d net/mlx5: fix no-green metering with RSS
When a meter with RSS action being used, there might be several
sub-flows using different sub-policies in the flow splitting stage.
If there's no green action, there's an error that will always use the
same sub-policy for all sub-flows, some resources will be
overwritten and cannot be released, leading assert during port close.

This patch fixes this issue by checking both green and yellow queue
index during getting a blank sub-policy, to avoid the incorrect
resource overwrite.

Fixes: b38a12272b ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-05-11 09:45:23 +02:00
Michael Baum
7a99336865 net/mlx5: fix LRO configuration in drop Rx queue
The driver wrongly set the LRO configurations to the TIR of the DevX
drop queue even when LRO is not supported.
Actually, the LRO configuration is not relevant to the drop queue at
all.

This causes failure in the initialization of the device, which doesn't
support LRO where the drop queue is created.

Probably, the drop queue creation by DevX missed the fact that LRO is
set by default in the TIR creation function and didn't unset it in the
drop queue case like other cases that unset LRO.

Move the default LRO configuration to unset it and set it only in the
case of all the TIR queues configured with LRO.

Fixes: bc5bee028e ("net/mlx5: create drop queue using DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-26 11:52:18 +02:00
Michael Baum
a213b86821 net/mlx5: fix LRO validation in Rx setup
The mlx5_rx_queue_setup() get LRO offload from user.

When LRO is configured, the LRO flag in rxq_data is set to 1.

This patch adds validation to make sure the LRO is supported.

Fixes: 17ed314 ("net/mlx5: allow LRO per Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-26 11:52:18 +02:00
Dariusz Sosnowski
d2fa2632a4 net/mlx5: fix RSS hash types adjustment
When an indirect action was created with an RSS action configured to
hash on both source and destination L3 addresses (or L4 ports), it caused
shared hrxq to be configured to hash only on destination address
(or port).

This patch fixes this behavior by refining RSS types specified in
configuration before calculating hash types used for hrxq. Refining RSS
types removes *_SRC_ONLY and *_DST_ONLY flags if they are both set.

Fixes: 212d17b6a6 ("net/mlx5: fix missing shared RSS hash types")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-26 11:51:56 +02:00
Raja Zidane
773a7de21a net/mlx5: fix Rx/Tx stats concurrency
Queue statistics are being continuously updated in Rx/Tx burst
routines while handling traffic. In addition to that, statistics
can be reset (written with zeroes) on statistics reset in other
threads, causing a race condition, which in turn could result in
wrong stats.

The patch provides an approach with reference values, allowing
the actual counters to be writable within Rx/Tx burst threads
only, and updating reference values on stats reset.

Fixes: 87011737b7 ("mlx5: add software counters")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:50:26 +02:00
Dariusz Sosnowski
26f22fa64e net/mlx5: fix GTP handling in header modify action
GTP items were ignored during conversion of modify header actions. This
caused modify TTL action to generate a wrong modify header command when
tunnel and inner headers used different IP versions.

This patch adds GTP item handling to modify header action conversion.

Fixes: 04233f36c7 ("net/mlx5: fix layer type in header modify action")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:47:44 +02:00
Adham Masarwah
cb91f12f4a net/mlx5: support MTU settings on Windows
Mlx5Devx library has new API's for setting and getting MTU.
Added new glue functions that wrap the new mlx5devx lib API's.
Implemented the os_ethdev callbacks to use the new glue
functions in Windows.

Signed-off-by: Adham Masarwah <adham@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:43 +02:00
Adham Masarwah
3014718fd2 net/mlx5: support promiscuous modes on Windows
Support of the set promiscuous modes by calling the new API
In Mlx5DevX Lib.
Added new glue API for Windows which will be used to communicate
with Windows driver to enable/disable PROMISC or ALLMC.

Signed-off-by: Adham Masarwah <adham@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:42 +02:00
Michael Baum
d37b0b4d77 net/mlx5: remove redundant check for hairpin queue
The mlx5_rxq_is_hairpin() function checks whether RxQ type is Hairpin.
It is done by reading a flag in Rx control structure coming from
mlx5_rxq_ctrl_get() function.

The function verifies that the queue index is valid even though it has
been checked within the mlx5_rxq_ctrl_get() function.

This patch removes the redundant check.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:41 +02:00
Michael Baum
1573b07284 net/mlx5: restrict Rx queue array access to boundary
The mlx5_rxq_get() function gets RxQ index and return RxQ priv
accordingly.

When it gets an invalid index, it accesses out of array bounds which
might cause undefined behavior.

This patch adds a check for invalid indexes before accessing to array.

Fixes: 0cedf34da7 ("net/mlx5: move Rx queue reference count")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:41 +02:00
Michael Baum
1b0037b1cb net/mlx5: fix setting flags to external Rx queue
The flow_drv_rxq_flags_set sets the Rx queue flags (Mark/Flag and Tunnel
Ptypes) according to the device flow.

It tries to get the RxQ control structure to update its ptype. However,
external RxQs don't have control structure to update and it may cause a
crash.

This patch add check whether this Queue is external.

Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:40 +02:00
Shun Hao
4fa1452bca net/mlx5: fix counter in non-termination meter
In rte_flow, if a counter action is before a meter which has
non-termination policy, the counter value only includes packets not
being dropped.

This patch fixes this issue by differentiating the order of counter and
non-termination meter:
1. counter + meter, counts all packets hitting this flow.
2. meter + counter, only counts packets not being dropped.

Fixes: 51ec04dc7b ("net/mlx5: connect meter policy to created flows")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:39 +02:00
Rongwei Liu
f956d3d4c3 net/mlx5: fix probing with secondary bonding member
Users can probe primary or secondary PCIe id when bonding is
configured.
1. -a 0a:00.0,representor=pf[0-1]vf[0-1], PMD probes 5 ports
totally: bonding device plus 4 representor ports.
2. -a 0a:00.1,representor=pf[0-1]vf[0-1], PMD only probes 2
representor ports.

Under the 2nd condition, bonding IB device doesn't have the same
PCIe id and PMD needs to check bonding relationship otherwise
probe failure.

Fixes: 6856efa54e ("net/mlx5: fix PF leak on PCI probing failure")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:47:39 +02:00
Dmitry Kozlyuk
e2259f93ef net/mlx5: fix Tx when inlining is impossible
When txq_inline_max is too large and an mbuf is multi-segment
it may be impossible to inline data and build a valid WQE,
because WQE length would be larger then HW can represent.
It is impossible to detect misconfiguration at startup,
because the condition depends on the mbuf composition.
The check on the data path to prevent the error
treated the length limit as expressed in 64B units,
while the calculated length and limit are in 16B units.
Fix the condition to avoid subsequent TxQ failure and recovery.

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:47:38 +02:00
Alexander Kozyrev
3a29cb3a73 net/mlx5: handle MPRQ incompatibility with external buffers
Multi-Packet Rx queue uses PMD-managed buffers to store packets.
These buffers are externally attached to user mbufs.
This conflicts with the feature that allows using user-managed
externally attached buffers in an application.
Fall back to SPRQ in case external buffers mempool is configured.
The limitation is already documented in mlx5 guide.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:47:18 +02:00
Thinh Tran
9011af71bb net/mlx5: fix CPU socket ID for Rx queue creation
The default CPU socket ID was used while creating the Rx queue and this caused
creation failure in case if hardware was not resided on the default socket.

The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling the mlx5_rxq_create_devx_rq_resources() which eventually calls
mlx5_devx_rq_create() with correct CPU socket ID.

Fixes: bc5bee028e ("net/mlx5: create drop queue using DevX")
Cc: stable@dpdk.org

Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-14 15:30:01 +01:00
Jiawei Wang
04c0d3f20f net/mlx5: fix port matching in sample flow rule
If there are an explicit port match and sample action in the same flow,
mlx5 PMD pushes the explicit port match in the prefix subflow, and
uses the tag item match in the suffix subflow.

The explicit port match was translated into source vport match so
the sample suffix subflow lost this match after flow split.

This patch copies the explicit port match to the sample suffix subflow,
and the latter gets the correct source vport value in the flow matcher.

Fixes: b4c0ddbfcc ("net/mlx5: split sample flow into two sub-flows")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-14 15:29:57 +01:00
Jiawei Wang
0f845cc726 net/mlx5: fix implicit tag insertion with sample action
A flow rule with sample action was split into two sub-flows,
and the implicit tag action with unique id was added in the prefix
sub-flow, the suffix sub-flow used the tag item to match with that
unique id, and the implicit set tag action was inserted next to
the sample action.

While there's either PUSH VLAN action or ENCAP action preceding the
sample action, implicit set tag action was added after PUSH VLAN or
ENCAP actions, causing flow creation failure due to rdma-core
does not support this action order.

This patch ensures the implicit set tag action is inserted before
either PUSH VLAN or encap action (if any) in the prefix sub-flow.

Fixes: 6a951567c1 ("net/mlx5: support E-Switch mirroring and jump in one flow")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-10 09:29:05 +01:00
Rongwei Liu
4a9e5c999c net/mlx5: forbid multiple ASO actions in a single rule
For now, only one ASO action is supported in a single flow rule.
Flow rule with more than one ASO action should be rejected in the
validation stage.

Flow rule with action non-shared AGE and COUNT together should be
treated as non-ASO because AGE will fall back to use HW counter,
not ASO hit object.

Group 0 will use HW counter for AGE action even if no COUNT action.

This commit will reject patterns (no matter which group if transfer)
like:
1. group 1 pattern... / end actions age / meter / end
2. group 1 pattern... / end actions conntrack / meter / end
3. group 1 pattern... / end actions age / conntrack... / end

If AGE comes together with COUNT in the above patterns, it's allowed.

Fixes: daed4b6e ("net/mlx5: use aging by counter when counter exists")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-09 13:32:04 +01:00
Jiawei Wang
9a726360dd net/mlx5: fix sample flow action on trusted device
A flow rule with sample action will be split into two sub flows,
and a tag action was added implicitly in the sample prefix sub flow,
the reserved metadata regC index was used for this tag action.

The reserved metadata regC was shared with metering action,
for ConnectX-5 trusted device (VF/SF), the reserved metadata regC was
invalid since PF only supported the legacy metering.

This patch adds the checking for the tag index and back to use the
application tag if a failure happened.

Fixes: a9b6ea45be ("net/mlx5: fix tag ID conflict with sample action")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-09 13:31:06 +01:00
Dariusz Sosnowski
7c0c63c9a5 net/mlx5: fix VLAN push action validation
Flow domain and direction was validated when OF_PUSH_VLAN action
appears in flow actions. Flow was rejected whenever this action:

- was used in NIC domain, in ingress direction;
- was used in FDB domain, in ingress direction, on ConnectX-5.

This validation logic rejected a valid case when the OF_PUSH_VLAN
action was used when directing traffic to the hairpin queue,
configured in TX implicit mode.

This patch moves code responsible for OF_PUSH_VLAN validation of
domain and direction from flow_dv_validate_push_vlan() to
flow_dv_validate(). Domain and direction are now validated when either
non-hairpin queue is used or hairpin queue is configured in Tx explicit
mode.

Fixes: 96f85ec489 ("net/mlx5: check VLAN push/pop support")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-09 13:30:51 +01:00
Rongwei Liu
c3130c78e8 net/mlx5: fix meter creation default state
Disable means there is no packet drop in the meter. Meter is
active always but programmed with another CIR/CBS value.

If the user wants to disable the meter in creation, PMD calls
the disable() API manually after meter initialized.

Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:49:51 +01:00
Bing Zhao
3ef18940ef net/mlx5: fix configuration without Rx queue
None Rx queue configured in a DPDK application should be supported.
In this mode, the NIC can be used to generate packets without
receiving any ingress traffic.

In the current implementation, once there is no Rx queue specified,
the array to store the queues' pointers is NULL after allocation.
Then the checking of the array allocation prevents the application
from starting up.

By adding another condition checking of the Rx queue number, the
application with none Rx queue can start up successfully.

Fixes: 4cda06c3c3 ("net/mlx5: split Rx queue into shareable and private")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-07 11:49:43 +01:00
Michael Baum
72d836b300 net/mlx5: fix E-Switch DV flow disabling
E-Switch DV flow is supported only when DV flow is supported and
enabled.

The mlx5_shared_dev_ctx_args_config() function ensures that when the
environment does not support DV, the "dv_esw_en" flag is turned off.
However, when the environment is supportive but the user has requested
to disable it, the "dv_esw_en" flag remains on and causes the PMD to try
to create an E-Switch through the Verbs engine.

This patch adds check to ensure that "dv_esw_en" flag will be turned off
when DV flow is disabled.

Fixes: a13ec19c19 ("net/mlx5: add shared device context config structure")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:49:29 +01:00
Dariusz Sosnowski
98008ce6ec net/mlx5: fix MPLS/GRE Verbs spec ordering
When using Verbs flow engine to create flows, GRE Verbs spec was put at
the end of specs list. This created problems for flows matching MPLSoGRE
packets. In generated specs list MPLS spec was put before GRE spec, but
Verbs API requires that MPLS spec must be put in its exact location in
protocol stack.

This patch fixes this behavior. Space for GRE Verbs spec is reserved at
its exact location. MPLS Verbs is inserted at its exact location as
well. GRE spec is filled after all flow items are parsed.

Fixes: 985b479267 ("net/mlx5: fix GRE protocol type translation for Verbs")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-07 11:49:10 +01:00
Gregory Etelson
71adf25dbf net/mlx5: fix flex item availability
Flex item availability is restricted to BlueField-2 and BlueField-3
PF ports.

The patch validates port type compliance before proceeding to
flex item creation.

Fixes: db25cadc08 ("net/mlx5: add flex item operations")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-07 11:43:12 +01:00
Shun Hao
9267617bb0 net/mlx5: fix meter policy creation assert
The meter policy creation doesn't belong to flow rule creation
process, so thread workspace was not initialized and there will be
assert error when using it.

This patch removes the incorrect using of thread workspace in meter
policy creation, and adds a flag in policy instead. When creating
flow rule, can use the flag to set the mark flag in thread workspace.

Fixes: 082becbf1f ("net/mlx5: fix mark enabling for Rx")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:11 +01:00
Bing Zhao
cff6aad7af net/mlx5: remove unused reference counter
In the previous implementation, a count was used to record the number
of the references to a table resource, including the creation of the
table, the jumping to the table and the matchers created on the
table. Before releasing the table resource via the driver, it needed
to ensure that there is no reference to this table.

After the optimization of the resources management, the reference
count now is in the hash list entry as a unified solution for all the
resources management.

There is no need to keep the "refcnt" in the table resource
structure. It is removed in case that there is some unnecessary
memory overhead.

Fixes: afd7a62514 ("net/mlx5: make flow table cache thread safe")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:10 +01:00
Dmitry Kozlyuk
ea7cc15ab9 net/mlx5: fix modify port action validation
Certain flow rules containing a modify header action for an L4 port
could be erroneously rejected as invalid, because this action
was counted as consuming two HW actions, while it only requires one.

Fixes: 72a944dba1 ("net/mlx5: fix header modify action validation")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:10 +01:00
Michael Baum
4dcf29a8bc net/mlx5: fix external Rx queue referencing
When an indirection table object is modified, it updates the reference
counter for each RX queue related to it.

The reference counter for regular queues are indeed updated. However,
the reference counter for external RxQs are not.

This patch adds updating for external RxQs too.

Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:09 +01:00
Michael Baum
f6f1195cd6 net/mlx5: fix external Rx queue dereferencing
When an indirection table is destroyed, each Rx queue related to it,
should be dereferenced.

The regular queues are indeed dereferenced.
However, the external RxQs are not.

This patch adds dereferencing for external RxQs too.

Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:08 +01:00
Jiawei Wang
6d4f1066be net/mlx5: fix NIC egress flow mismatch in switchdev mode
When E-Switch mode was enabled, the NIC egress flows was implicitly
appended with source vport to match on. If the metadata register C0
was used to maintain the source vport, it was initialized to zero
on packet steering engine entry, the flow could be hit only
if source vport was zero, the register C0 of the packet was not correct
to match in the TX side, this caused egress flow misses.

This patch:
 - removes the implicit source vport match for NIC egress flow.
 - rejects the NIC egress flows on the representor ports at validation.
 - allows the internal NIC egress flows containing the TX_QUEUE items in
   order to not impact hairpins.

Fixes: ce777b147b ("net/mlx5: fix E-Switch flow without port item")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2022-03-07 11:43:08 +01:00
Rongwei Liu
e1786fd53d net/mlx5: fix shared RSS destroy
When both shared and non-shared RSS actions are present in single
flow rule shared RSS index is unset by mistake.

For example:
1. flow indirect_action 0 create action_id 3 ingress action RSS ...
2. set sample_actions 0 mark id 43690 / queue index 0 / end
3. flow create 0 ingress group 107 pattern eth / sample ratio 2
   index 0  / indirect 3 / end

PMD translates the indirect action to a shared RSS description at first.
In the split prefix flow, RSS->shared_RSS is unset when translating
sample queue action, the subfix flow will treat the RSS as non-shared.

Fixes: 8e61555657 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-07 11:43:07 +01:00
Gregory Etelson
6ae5c23837 net/mlx5: fix next protocol RSS expansion
RSS expansion scheme has 2 operational modes: default and specific.
The default mode expands into all valid options for a given network
layer. For example, Ethernet expands by default into VLAN, IPv4 and
IPv6, L3 expands into TCP and UDP, etc.
The specific mode expands according to flow item next protocol
configuration provided by the item spec and mask parameters.
There are 3 outcomes for the specific expansion:
1. Back to default – that is the case when result of (spec & mask)
   allows all possibilities.
   For example: eth type mask 0 type spec 0
2. No results – in that case item configuration has no valid expansion.
   For example: eth type mask 0xffff type spec 101
3. Direct - In that case flow item mask and spec configuration return
   valid expansion  option.
   Example: eth type mask 0x0fff type spec 0x0800.

Current PMD expands flow items with explicit spec and mask
configuration into the Direct(3) or No results (2). Default expansions
were handled as No results.

Fixes: f3f1f576f4 ("net/mlx5: fix RSS expansion with explicit next protocol")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:06 +01:00
Gregory Etelson
d16ec6841b net/mlx5: fix inet IPIP protocol type
Fix typo in INET IPIP protocol macro.

Fixes: f3f1f576f4 ("net/mlx5: fix RSS expansion with explicit next protocol")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-07 11:43:03 +01:00
Gregory Etelson
7bda5beead net/mlx5: fix flex item header length translation
Flex item API provides support for network header with a fixed and
variable lengths.
When PMD compiles a new flex item object configuration it converts
RTE parameters into matching PMD PARSE_GRAPH parameters and checks
the parameter values against port capabilities.

Current implementation mismatched PARSE_GRAPH configuration fields
for the fixed size header.

Fixes: b293e8e49d ("net/mlx5: translate flex item configuration")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-02 17:36:47 +01:00
Bing Zhao
dfb8c448da net/mlx5: fix matcher priority with ICMP or ICMPv6
On TCP/IP-based layered network, ICMP is considered and implemented
as part of layer 3 IP protocol. Actually, it is a user of the IP
protocol and must be encapsulated within IP packets. There is no
layer 4 protocol over ICMP.

The rule with layer 4 should be matched prior to the rule only with
layer 3 pattern when:
  1. Both rules are created in the same table
  2. Both rules could be hit
  3. The rules has the same priority

The steering result of the packet is indeterministic if there are
rules with patterns IP and IP+ICMP in the same table with the same
priority. Like TCP / UDP, a packet should hit the rule with a longer
matching criterion.

By treating the priority of ICMP/ICMPv6 as a layer 4 priority in the
PMD internally, the IP+ICMP will be hit in prior to IP only.

Fixes: d53aa89aea ("net/mlx5: support matching on ICMP/ICMP6")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-02 17:36:38 +01:00
Gregory Etelson
cfe337e715 net/mlx5: reduce flex item flow handle size
Reduce flex item flow handle size from 32 bits to 8 bits for each
flow.
The patch will save memory in setups with millions of flows.

Fixes: a23e9b6e3e ("net/mlx5: handle flex item in flows")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-02 11:24:36 +01:00
Gregory Etelson
df4986559f net/mlx5: fix GRE item translation in Verbs
GRE item translation must set inner protocol value.
For that reason the item is not translated inplace when PMD
translation iterates over flow items, but moved after the loop, when
all inner types are discovered.

If PMD does not translate GRE flow item inside the translation loop
it must save the GRE item for access outside the loop.

Fixes: 985b479267 ("net/mlx5: fix GRE protocol type translation for Verbs")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-02 11:24:20 +01:00
Michael Baum
57461cab16 net/mlx5: fix check in count action validation
The AGE action can be implemented by either counters or ASO mechanism.
ASO is more efficient than generating counters just for the purpose of
aging, so when ASO is supported its use is preferable. On the other
hand, when there is count in the list of actions, the counter is already
generated, and it is best to use it for aging even if ASO is supported.
On the other hand, when the count action is "indirect", it cannot be
used for aging since it may be updated from other flow rules in which it
participates.

Checking whether ASO is supported depends on both the capability of the
device and the flow rule group number, ASO is not supported for group 0.
However, the flow_dv_validate() function only checks the capability and
ignores the group, allowing inadmissible flow rules.
For example, when the device supports ASO and a flow rule is set that
combines an indirect counter with aging for group 0, the rule should be
rejected, but it is created and does not function properly.

This patch updates the counter validation which will also consider the
group number when deciding if there is ASO support.

Fixes: daed4b6e3d ("net/mlx5: use aging by counter when counter exists")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-01 22:24:53 +01:00
Michael Baum
8f53c7cf26 net/mlx5: fix shared counter flag in flow validation
The AGE action can be implemented by either counters or ASO mechanism.
When user ask count action in the flow rule, AGE action is implemented
by the same counter. However, if user ask indirect count action, it
cannot be used for AGE.

The flow_dv_validate() function has a flag named "shared_count" which
indicates whether AGE action validate depends on ASO support or not.
This flag is initialized to false and is updated if there is indirect
count action in the action list.
This flag is mistakenly set within the loop that reads the action list
and in each iteration it is reinitialized to false, regardless of the
existence of an indirect count action in the list.

This patch moves the flag initialization out of the loop.

Fixes: f3191849f2 ("net/mlx5: support flow count action handle")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-03-01 22:24:52 +01:00
Adham Masarwah
dc065d6efd net/mlx5: fix destroying empty matchers list
The table remove callback function is trying to destroy the
matchers list associated with table entries without checking
if the list is valid, which causes null pointer dereference.
Fixed by validating the matchers list before destroying it.

Issue can be reproduced with testpmd on Windows, when you run:
port close all

Fixes: 1872635570 ("net/mlx5: make matcher list thread safe")
Cc: stable@dpdk.org

Signed-off-by: Adham Masarwah <adham@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2022-03-01 22:24:36 +01:00
Suanming Mou
7bc528bac9 net/mlx5: fix indexed pool fetch overlap
For indexed pool with local cache, when a new trunk is allocated,
half of the trunk's index was fetched to the local cache. In case
of local cache size was less then half of the trunk size, memory
overlap happened.

This commit adds the check of the fetch size, if local cache size
is less than fetch size, adjust the fetch size to be local cache
size.

Fixes: d15c0946be ("net/mlx5: add indexed pool local cache")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-01 22:24:22 +01:00
Dmitry Kozlyuk
655c3c26c1 net/mlx5: fix initial link status detection
Link status change takes time that depends on the HW and the kernel.
It was checked immediately after the change was issued at probing.
If the port had been down before probing, a "down" state may be read,
while the port would be "up" imminently.
After that, DPDK reported the port as "down" mistakenly
and "ifconfig $DEV up" did not trigger an LSC event,
because from the system's perspective the port was "up" already.

Install Netlink event handler at port probe before requesting the port
to come up in order to receive LSC event even if it comes up
between probe and start.

Fixes: a85a606ca5 ("net/mlx5: fix link status initialization")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-01 16:54:07 +01:00
Dmitry Kozlyuk
17f95513ad net/mlx5: fix link status change detection
Sometimes net/mlx5 devices did not detect link status change to "up".

Each shared device was monitoring IBV_EVENT_PORT_{ACTIVE,ERR}
and queried the link status upon receiving the event.
IBV_EVENT_PORT_ACTIVE is delivered when the logical link status
(UP flag) is set, but the physical link status (RUNNING flag)
may be down at that time, in which case the new link status
would be erroneously considered down.

IBV interface is insufficient for the task.
Monitor interface events using Netlink.

Fixes: 198a3c339a ("mlx5: handle link status interrupts")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-01 16:54:07 +01:00
Dmitry Kozlyuk
be66461cba common/mlx5: add Netlink event helpers
Introduce mlx5_nl_read_events() to read Netlink events
(technically, messages) from a socket that was configured
to listen for them via a new mlx5_nl_init() parameter.
Add mlx5_nl_parse_link_status_update() helper
to extract information from link-related events.
This patch is a shared base for later fixes.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-03-01 16:54:03 +01:00
Stephen Hemminger
68eb9a1945 remove extra blank line at EOF
These source files all had unnecessary blank line at end of file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-27 21:26:06 +01:00
Michael Baum
311b17e669 net/mlx5: support queue/RSS actions for external Rx queue
Add support queue/RSS action for external Rx queue.
In indirection table creation, the queue index will be taken from
mapping array.

This feature supports neither LRO nor Hairpin.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-25 17:33:31 +01:00
Michael Baum
80f872ee02 net/mlx5: add external Rx queue mapping API
External queue is a queue that has been created and managed outside the
PMD. The queues owner might use PMD to generate flow rules using these
external queues.

When the queue is created in hardware it is given an ID represented by
32 bits. In contrast, the index of the queues in PMD is represented by
16 bits. To enable the use of PMD to generate flow rules, the queue
owner must provide a mapping between the HW index and a 16-bit index
corresponding to the ethdev API.

This patch adds an API enabling to insert/cancel a mapping between HW
queue id and ethdev queue id.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-25 17:33:31 +01:00
Michael Baum
c06f77aece net/mlx5: optimize queue type checks
The RxQ/TxQ control structure has a field named type. This type is enum
with values for standard and hairpin.
The use of this field is to check whether the queue is of the hairpin
type or standard.

This patch replaces it with a boolean variable that saves whether it is
a hairpin.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-25 17:33:31 +01:00
Sean Zhang
5c4d491791 net/mlx5: support matching GRE optional fields
This patch adds matching on the optional fields (checksum/key/sequence)
of GRE header. The matching on checksum and sequence fields requests
support from rdma-core with the capability of misc5 and tunnel_header 0-3.

For patterns without checksum and sequence specified, keep using misc for
matching as before, but for patterns with checksum or sequence, validate
capability first and then use misc5 for the matching.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-25 16:34:08 +01:00
Suanming Mou
fe3620aaba net/mlx5: add header reformat HW steering action
HW steering header reformat action can work under bulk mode. In
this case, when create the table, bulk size of header reformat
actions will be allocated in low level. Afterwards, when create
flow, just simply specify the action index in the bulk and the
encapsulation data to the action will be enough.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:23 +01:00
Suanming Mou
7ab3962d2d net/mlx5: add indirect HW steering action
HW steering can support indirect action as well. With indirect action,
the flow can be created with more flexible shared RSS action selection.
This will can save the action template with different RSS actions.

This commit adds the flow queue operation callback for:
rte_flow_async_action_handle_create();
rte_flow_async_action_handle_destroy();
rte_flow_async_action_handle_update();

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:23 +01:00
Suanming Mou
1deadfd709 net/mlx5: add HW mark action
The mark action is covered by tag action internally. While it is added
the HW will add a tag to the packet. The mark value can be set as fixed
or dynamic as the action mask indicates.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:22 +01:00
Suanming Mou
3a2f674b6a net/mlx5: add queue and RSS HW steering action
This commit adds the queue and RSS action. Similar to the jump action,
dynamic ones will be added to the action construct list.

Due to the queue and RSS action in template should not be destroyed
during port restart, the actions are created with standalone indirect
table as indirect action does. When port stops, detaches the indirect
table from action, when port starts, attaches the indirect table back
to the action.

One more change is made to accelerate the action creation. Currently
the mlx5_hrxq_get() function returns the object index instead of object
pointer. This introduced an extra converting the index to the object by
calling mlx5_ipool_get() in most of the case. And that extra converting
hurts multi-thread performance since mlx5_ipool_get() uses the global
lock inside. As the hash Rx queue object itself also contains the index,
returns the object directly will achieve better performance without the
global lock.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:22 +01:00
Suanming Mou
f13fab2392 net/mlx5: add flow jump action
Jump action connects different level of flow tables and allows packet
handling in the chain of flows.

A new action construct data struct is also added in this commit to help
to handle not only the dynamic jump action but also for the other
generic dynamic actions. The actions with empty mask configuration means
dynamic action, and the dedicated action will be created with the flow
action configuration during flow creation. In that dynamic action case,
the action will be appended to the table template's action list during
table creation.
When creating the flows, traverse the action list and pick the dynamic
action configuration details from flow actions as the action construct
data struct describes, then create the dedicated dynamic actions.

This commit adds the jump action and the generic dynamic action
construct mechanism.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:21 +01:00
Suanming Mou
08dfff78f2 net/mlx5: add flow flush
In case port is being stopped, all created flows should be flushed.
This commit adds the flow flush helper function.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:20 +01:00
Suanming Mou
c40c061a02 net/mlx5: add basic flow queue operation
The HW steering uses async queue-based flow rules management
mechanism. The matcher and part of the actions have been
prepared during flow table creation. Some remaining actions
will be constructed during flow creation if needed.

A flow postpone attribute bit describes if flow management
should be applied to the HW directly. An extra push function
is provided to force push all the cached flows to the HW.

Once the flow has been applied to the HW, the pull function
will be called to get the queued creation/destruction flows.

The DR rule flow memory is represented in PMD layer instead
of allocating from HW steering layer. While destroying the
flow, the flow rule memory can only be freed after the CQE
received.

The HW queue job descriptor is currently introduced to convey
the flow information and operation type between the flow
insertion/destruction in the pull function.

This commit adds the basic flow queue operation for:
rte_flow_async_create();
rte_flow_async_destroy();
rte_flow_push();
rte_flow_pull();

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:20 +01:00
Suanming Mou
d1559d66ed net/mlx5: add table management
Flow table is a group of flows with the same matching criteria
and the same actions defined for them. The table defines rules
that have the same matching fields but with different matching
values. For example, matching on 5 tuple, the table will be
(IPv4 source + IPv4 dest + s_port + d_port + next_proto)
while the values for each rule will be different.

The templates' relevant matching criteria and action instances
will be created in the table creation and saved in the table.
As table attributes indicate the supported flow number, the flow
memory will also be allocated at the same time.

This commit adds the table management functions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:19 +01:00
Suanming Mou
836b5c9b5e net/mlx5: add action template management
The action template holds a list of action types that will be
used together on the same rule. The template's actions instances
will be created only when the template bind to the dedicated
group. And the created actions will be saved to each individual
group in order for best performance. The actions in a group will
not be shared with each other unless shared actions are specified.

This commit adds the action template management which stores the
flow action template.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:18 +01:00
Suanming Mou
42431df924 net/mlx5: add pattern template management
The pattern template defines flows that have the same matching
fields but with different matching values.
For example, matching on 5 tuple TCP flow, the template will be
(eth(null) + IPv4(source + dest) + TCP(s_port + d_port) while
the values for each rule will be different.

Due to the pattern template can be used in different domains, the
items will only be cached in pattern template create stage, while
the template is bound to a dedicated table, the HW criteria will
be created and saved to the table. The pattern templates can be
used by multiple tables. But different tables create the same
criteria and will not share the matcher between each other in order
to have better performance.

This commit adds pattern template management.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:18 +01:00
Suanming Mou
b401400db2 net/mlx5: add port flow configuration
The hardware steering is backend to support rte_flow_async API in
mlx5 PMD. The port configuration function creates the queues and
needed flow management resources.

The PMD layer configuration function allocates the queues' context
and per-queue job descriptor pool. The job descriptor pool size
is equal to the queue size, and the job descriptors will be popped
from pool with LIFO strategy to convey the flow information during
flow insertion/destruction. Then, while polling the queued operation
result, the flow information will be extracted from the job descriptor
and the descriptor will be pushed back to the LIFO pool.

The commit creates the flow port queues and the job descriptor pools.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:17 +01:00
Suanming Mou
d84c3cf766 net/mlx5: introduce hardware steering enable routine
The new hardware steering engine relies on using dedicated steering WQEs
instead of writing to the low-level steering table entries directly.
In the first implementation the hardware steering engine supports the
new queue based Flow API, the existing synchronous non-queue based Flow
API is not supported.

A new dv_flow_en value 2 is added to manage mlx5 PMD steering engine:

dv_flow_en	rte_flow API	rte_flow_async API
------------------------------------------------
 0		support		not support
 1		support		not support
 2		not support	support

This commit introduces the extra dv_flow_en = 2 to specify the new
flow initialize and manage operation routine.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:17 +01:00
Suanming Mou
5cadd74fbc net/mlx5: add HW steering low-level abstract stub
The HW steering low-level implementation will be added later in another
patch series. To avoid the linkage issues the abstract stub replacement
is provided currently.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:16 +01:00
Suanming Mou
2b6791506e net/mlx5: introduce hardware steering operation
The Connect-X steering is a lookup hardware mechanism that accesses flow
tables, matches packets to the rules, and performs specified actions.
Historically, mlx5 PMD implements several software engines to manage
steering hardware facility:

   - FW Steering - Verbs/Direct Verbs, uses FW calls to manage flows
   - SW Steering - DevX/mlx5dv, uses WQEs to access table memory directly

However, there are still some disadvantages:

   - performance is limited, we should invoke firmware either to
     manage the entire flow, or to handle some internal steering objects

   - organizing and preparing flow infrastructure (actions, matchers,
     groups, etc.) on the flow inserting is sure to cause slow flow
     insertion

   - security, exposing the low-level steering entries directly to the
     userspace may cause security risks

A new hardware WQE based steering operation with codename "HW Steering"
is going to be introduced to get rid of the security risks. And it will
take advantage of the recently new introduced async queue-based rte_flow
APIs to prepare everything in advance to achieve high insertion rate.

In this new HW steering engine, the original SW steering rte_flow API
will not be supported in the first implementation, only the new async
queue-based flow operations is going to be supported. A new steering
mode parameter for dv_flow_en will be introduced and user will be
able to engage the new steering engine.

This commit adds the basic driver operation.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:15 +01:00
Viacheslav Ovsiienko
49e8797619 net/mlx5: support wait on time in Tx
The hardware since ConnectX-7 supports waiting on
specified moment of time with new introduced wait
descriptor. A timestamp can be directly placed
into descriptor and pushed to sending queue.
Once hardware encounter the wait descriptor the
queue operation is suspended till specified moment
of time. This patch update the Tx datapath to handle
this new hardware wait capability.

PMD documentation and release notes updated accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 13:46:57 +01:00
Viacheslav Ovsiienko
2f5122dfc4 net/mlx5: configure Tx queue with send on time offload
The wait on time configuration flag is copied to the Tx queue
structure due to performance considerations. Timestamp
mask is prepared and stored in queue structure as well.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 13:46:56 +01:00
Michael Baum
a6b9d5a538 common/mlx5: update doorbell mapping parameter name
The "tx_db_nc" devarg forces doorbell register mapping to non-cached
region eliminating the extra write memory barrier. This argument was
used in creating the UAR for Tx and thus affected its performance.

Recently [1] its use has been extended to all UAR creation in all mlx5
drivers, and now its name is no longer so accurate.

This patch changes its name to "sq_db_nc" to suit any send queue that
uses it. The old name will still work for backward compatibility.

[1] commit 5dfa003db5 ("common/mlx5: fix post doorbell barrier")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:43 +01:00
Shun Hao
38eb5c9f35 net/mlx5: fix E-Switch manager vport ID
One of the E-Switch vports plays the special role - it is assigned as
"E-Switch manager" and has some special exclusive rights and duties - it
maintains all the representors, manages FDB domain flows, etc. By
default, the E-Switch vport index was supposed to be zero on standalone
NICs (regular ConnectX) and 0xFFFE SmartNIC (BlueField), but that was
not always correct - this index can be assigned with any value by
kernel/hypervisor.

Currently the E-Switch manager vport id is supposed to be default - 0
for standalone NICs, and 0xFFFE for the SmartNICs, and is deduced from
the device PCI id.

To handle this and do not suggest any default values, can use DevX API
to query E-Switch manager vport ID directly from the firmware during
initialization, and use that value by default. If the new method is not
provided (legacy firmware), fallback to use the PCI id approach.

Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:39 +01:00
Michael Baum
f685878a56 net/mlx5: optimize Rx queue creation
Recently shared RxQ has been introduced. All shared Rx queues with same
group and queue ID share the same rxq_ctrl, but each one has
mlx5_rxq_priv structure.
The mlx5_rx_queue_setup generates a new rxq_priv structure, and looks
for a rxq_ctrl structure to refer to. If there is already a compatible
rxq_ctrl structure it refers it, otherwise it calls the mlx5_rxq_new
function that generates a new one.

This patch makes mlx5_rxq_new function "standalone", it generates a
rxq_ctrl structure regardless to specific rxq_priv structure. All
operations on the rxq_ctrl structure that depend on the new rxq_priv
structure are performed in the mlx5_rx_queue_setup function, at the same
place for either a new rxq_ctrl structure or an existing rxq_ctrl
structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:36 +01:00
Michael Baum
0ad12a8090 net/mlx5: fix entry in shared Rx queues list
The mlx5_rxq_new function creates control structure and if it from
shared group, it is inserted into the shared RXQs list.

After that, there are some validations which in case they fail, RxQ
control object is released.
In these cases, invalid pointer to the object still in the list, and
access it may cause a crash.

Move the list insertion to the end of the function where the RxQ control
object is surely valid.

Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:35 +01:00
Haifei Luo
9b57df5575 net/mlx5: refactor getting counter action pointer
Previously API flow_dv_query_count_ptr is defined to get counter's
action pointer. This DV function is directly called and the better
way is by the callback.

Add one arg in API mlx5_counter_query and the related callback
counter_query. The added arg is for counter's action pointer.

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:34 +01:00
Shun Hao
94112badf6 net/mlx5: fix meter sub-policy creation
If meter policy action was RSS, the correct items were not provided
for sub-policy creation.

This fixes the issue by providing original items in meter split, so
the sub-policy creation gets the correct items.

Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:34 +01:00
Suanming Mou
ad98ff6c50 net/mlx5: remove unused function
The mlx5_l3t_prepare_entry() function is not used anymore.
This commit removes the unused mlx5_l3t_prepare_entry() function.

Fixes: 92ef4b8f16 ("ethdev: remove deprecated shared counter attribute")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:33 +01:00
Suanming Mou
0c5d3e4cdf net/mlx5: set flow error for hash list create
While mlx5_hlist_create() failed, the rte_flow_error was not filled
with the corresponding error information.

This commit adds the missing rte_flow_error_set() for the failure case.

Fixes: f3020a331d ("net/mlx5: optimize hash list table allocate on demand")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:32 +01:00
Michael Baum
a729d2f093 common/mlx5: refactor devargs management
Improve the devargs handling in two aspects:
 - Parse the devargs string only once.
 - Return error and report for unknown keys.

The common driver parses once the devargs string into a dictionary, then
provides it to all the drivers' probe. Each driver updates within it
which keys it has used, then common driver receives the updated
dictionary and reports about unknown devargs.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:56 +01:00
Michael Baum
45a6df804a net/mlx5: separate per port configuration
Add configuration structure for port (ethdev). This structure contains
all configurations coming from devargs which oriented to port. It is a
field of mlx5_priv structure, and is updated in spawn function for each
port.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:54 +01:00
Michael Baum
c4b8620135 net/mlx5: refactor to detect operation by DevX
Add inline function indicating whether HW objects operations can be
created by DevX. It makes the code more readable.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:53 +01:00
Michael Baum
a13ec19c19 net/mlx5: add shared device context config structure
Add configuration structure for shared device context. This structure
contains all configurations coming from devargs which oriented to
device. It is a field of shared device context (SH) structure, and is
updated once in mlx5_alloc_shared_dev_ctx() function.
This structure cannot be changed when probing again, so add function to
prevent it. The mlx5_probe_again_args_validate() function creates a
temporary IB context configure structure according to new devargs
attached in probing again, then checks the match between the temporary
structure and the existing IB context configure structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:52 +01:00
Michael Baum
87af0d1e1b net/mlx5: concentrate all device configurations
Move all device configure to be performed by mlx5_os_cap_config()
function instead of the spawn function.
In addition move all relevant fields from mlx5_dev_config structure to
mlx5_dev_cap.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:51 +01:00
Michael Baum
91d1cfafc9 net/mlx5: rearrange device attribute structure
Rearrange the mlx5_os_get_dev_attr() function in such a way that it
first executes the queries and only then updates the fields.
In addition, it changed its name in preparation for expanding its
operations to configure the capabilities inside it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:50 +01:00
Michael Baum
cf004fd33a net/mlx5: add E-Switch mode flag
This patch adds in SH structure a flag which indicates whether is
E-Switch mode.
When configure "dv_esw_en" from devargs, it is enabled only when is
E-switch mode. So, since dv_esw_en has been configure, it is enough to
check if "dv_esw_en" is valid.
This patch also removes E-Switch mode check when "dv_esw_en" is checked
too.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:49 +01:00
Michael Baum
cf8971db65 net/mlx5: share counter config function
The mlx5_flow_counter_mode_config function exists for both Linux and
Windows with the same name and content.
This patch moves its implementation to the folder shared between the
operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:48 +01:00
Michael Baum
e3032e9c73 net/mlx5: share realtime timestamp configure
The realtime timestamp configure work for Linux as same as Windows.
This patch removes it to the function implemented in the folder shared
between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:47 +01:00
Michael Baum
c4c3e8afef common/mlx5: share VF check function
The check if device is VF work for Linux as same as Windows.
This patch removes it to the function implemented in the folder shared
between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:46 +01:00
Michael Baum
8f46481001 net/mlx5: remove Verbs query device duplication
The sharing device context structure has a field named "device_attr"
which is filled by mlx5_os_get_dev_attr() function.
The spawn function calls mlx5_os_get_dev_attr() again and save it to
local variable identical to "device_attr" field.

There is no need for this duplication, because there is a reference to
the sharing device context structure from spawn function.

This patch removes the local "device_attr" from spawn function, and uses
the context's field instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:45 +01:00
Michael Baum
6dc0cbc6c6 net/mlx5: remove DevX flag duplication
The sharing device context structure has a field named "devx" which
indicates if DevX is supported.
The common configure structure has also field named "devx" with the same
meaning.

There is no need for this duplication, because there is a reference to
the common structure from within the sharing device context structure.

This patch removes it from sharing device context structure and uses the
common config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:44 +01:00
Michael Baum
538205614f net/mlx5: remove HCA attribute structure duplication
The HCA attribute structure is field of net configure structure.
It is also field of common configure structure.

There is no need for this duplication, because there is a reference to
the common structure from within the net structures.

This patch removes it from net configure structure and uses the common
config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:43 +01:00
Michael Baum
cfe0639b30 net/mlx5: remove redundant check of devargs
The device arguments are parsed and updated twice during spawning. First
time before creating the share device context, and again later after
updating a default value to one of the arguments.

This patch consolidates them into one parsing and updates the default
values before it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:42 +01:00
Michael Baum
dec50e58f7 net/mlx5: remove declaration duplications
In mlx5_ethdev.c file are implemented those 4 functions:
 - mlx5_dev_infos_get
 - mlx5_fw_version_get
 - mlx5_dev_set_mtu
 - mlx5_hairpin_cap_get

In mlx5.h file they are declared twice. First time under mlx5.c file and
second time under mlx5_ethdev.c file.

This patch removes the redundant declaration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:41 +01:00
Michael Baum
6be4c57add net/mlx5: fix errno update in shared context creation
The mlx5_alloc_shared_dev_ctx() function has a local variable named
"err" which contains the errno value in case of failure.

When functions called by this function are failed, this variable is
updated with their return value (that should be a positive errno value).
However, some functions doesn't update errno value by themselves or
return negative errno value. If one of them fails, the "err" variable
contains negative value what cause to assertion failure.

This patch updates all functions uses by mlx5_alloc_shared_dev_ctx()
function to update rte_errno and take this value instead of "err" value.

Fixes: 5dfa003db5 ("common/mlx5: fix post doorbell barrier")
Fixes: 5d55a494f4 ("net/mlx5: split multi-thread flow handling per OS")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:40 +01:00
Michael Baum
ce12974cce net/mlx5: fix ASO CT object release
The ASO connection tracking structure is initialized once for sharing
device context.

Its release takes place in the close function which is called for each
ethdev individually. i.e. when there is more than one ethdev under the
same sharing device context, it will be destroyed when one of them is
closed. If the other wants to use it later, it may cause it to crash.

In addition, the creation of this structure is performed in the spawn
function. If one of the creations of the objects following it fails, it
is supposed to be destroyed but this does not happen.

This patch moves its release to the sharing device context free function
and thus solves both problems.

Fixes: 0af8a2298a ("net/mlx5: release connection tracking management")
Fixes: ee9e5fad03 ("net/mlx5: initialize connection tracking management")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:39 +01:00
Michael Baum
ad9d0c6395 net/mlx5: fix ineffective metadata argument adjustment
In "dv_xmeta_en" devarg there is an option of dv_xmeta_en=3 which
engages tunnel offload mode. In E-Switch configuration, that mode
implicitly activates dv_xmeta_en=1.

The update according to E-switch support is done immediately after the
first parsing of the devargs, but there is another adjustment later.

This patch moves the adjustment after the second parsing.

Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:38 +01:00
Michael Baum
dcbaafdc8f net/mlx5: fix sibling device config check
The MLX5 net driver supports "probe again". In probing again, it
creates a new ethdev under an existing infiniband device context.

Sibling devices sharing infiniband device context should have compatible
configurations, so some of the devargs given in the probe again, the
ones that are mainly relevant to the sharing device context are sent to
the mlx5_dev_check_sibling_config function which makes sure that they
compatible its siblings.
However, the arguments are adjusted according to the capability of the
device, and the function compares the arguments of the probe again
before the adjustment with the arguments of the siblings after the
adjustment. A user who sends the same values to all siblings may fail in
this comparison if he requested something that the device does not
support and adjusted.

This patch moves the call to the mlx5_dev_check_sibling_config function
after the relevant adjustments.

Fixes: 92d5dd4834 ("net/mlx5: check sibling device configurations mismatch")
Fixes: 2d241515eb ("net/mlx5: add devarg for extensive metadata support")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:32 +01:00
Ferruh Yigit
a41f593f1b ethdev: introduce generic dummy packet burst function
Multiple PMDs have dummy/noop Rx/Tx packet burst functions.

These dummy functions are very simple, introduce a common function in
the ethdev and update drivers to use it instead of each driver having
its own functions.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2022-02-11 21:17:34 +01:00
Dariusz Sosnowski
864678e420 net/mlx5: fix inline length for multi-segment TSO
This patch removes a redundant assert in mlx5_tx_packet_multi_tso().
That assert assured that the amount of bytes requested to be inlined
is greater than or equal to the minimum amount of bytes required
to be inlined. This requirement is either derived from the NIC
inlining mode or configured through devargs. When using TSO this
requirement can be disregarded, because on all NICs it is satisfied by
TSO inlining requirements, since TSO requires L2, L3, and L4 headers to
be inlined.

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:34 +01:00
Alexander Kozyrev
eb11edd9db net/mlx5: fix meter capabilities reporting
Meter capabilities reporting is not up to date.
Mellanox NICs support RFC2698 and RFC4115 as well as RFC2697.
Add these marker operations to the capabilities list.

Fixes: 6bc327b94f ("net/mlx5: fill meter capabilities using DevX")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:33 +01:00
Alexander Kozyrev
21fdeab422 net/mlx5: fix committed bucket size
Committed Bucket Size calculation tries to fit into 8-bit wide
mantissa field by setting 256 as a maximum value for it.
To compensate for this increase in the mantissa value the exponent
value has to be reduced by 8. But it gives a negative exponent
value for CBS less than 128. And negative exponent value is not
supported by the NIC. Adjust CSB calculation only for values
bigger than 128 to allow both small and big bucket sizes.

Fixes: 3bd26b23ce ("net/mlx5: support meter profile operations")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:32 +01:00
Viacheslav Ovsiienko
7cad3dc312 net/mlx5: remove unused metadata shift parameter
Due to updated modify field action immediate value buffer
pattern [1] the implicit shift for the metadata is not
needed anymore and should be removed.

[1] commit 40c8fb1fd3 ("net/mlx5: update modify field action")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:31 +01:00
Viacheslav Ovsiienko
ced4900cde net/mlx5: fix metadata endianness in modify field action
As modify field action immediate source parameter the metadata
should follow the CPU endianness (according to SET_META action
structure format), and mlx5 PMD wrongly handled the immediate
parameter metadata buffer as big-endian, resulting in wrong
metadata set action with incorrect endianness.

Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:28 +01:00
Stephen Hemminger
06c047b680 remove unnecessary null checks
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.

Remove redundant NULL pointer checks before free functions
found by nullfree.cocci

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-12 12:07:48 +01:00
Elena Agostini
8e83ba285a net/mlx5: add C++ include guard to public header
The support for linking rte_pmd_mlx5.h functions with
C++ applications was missing.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-03 09:19:26 +01:00
Xiaoyu Min
87b26522f7 net/mlx5: reject jump to root table
Currently root table as destination is not supported.
The jump action which finally be translated to underlying root table in
rdma-core should be rejected.

Fixes: f78f747f41 ("net/mlx5: allow jump to group lower than current")
Cc: stable@dpdk.org

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-26 17:41:12 +01:00
Raja Zidane
082becbf1f net/mlx5: fix mark enabling for Rx
To optimize datapath, the mlx5 pmd checked for mark action on flow
creation, and flagged possible destination rxqs (through queue/RSS
actions), then it enabled the mark action logic only for flagged rxqs.

Mark action didn't work if no queue/rss action was in the same flow,
even when the user use multi-group logic to manage the flows.
So, if mark action is performed in group X and the packet is moved to
group Y > X when the packet is forwarded to Rx queues, SW did not get
the mark ID to the mbuf.

Flag Rx datapath to report mark action for any queue when the driver
detects the first mark action after dev_start operation.

Fixes: 8e61555657 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-26 17:41:11 +01:00
Alexander Kozyrev
728b6447e7 net/mlx5: fix MPRQ WQE size assertion
Preparation of the stride size and the number of strides for
Multi-Packet RQ was updated recently to accommodate the hardware
limitation about minimum WQE size. The wrong assertion was
introduced to ensure this limitation is met. Assert that the
configured WQE size is not less than the minimum supported size.

Fixes: 34776af600 ("net/mlx5: fix MPRQ stride devargs adjustment")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-18 09:30:25 +01:00
Alexander Kozyrev
9701034faa net/mlx5: fix maximum packet headers size for TSO
The maximum packet headers size for TSO is calculated as a sum of
Ethernet, VLAN, IPv6 and TCP headers (plus inner headers).
The rationale  behind choosing IPv6 and TCP is their headers
are bigger than IPv4 and UDP respectively, giving us the maximum
possible headers size. But it is not true for L3 headers.
IPv4 header size (20 bytes) is smaller than IPv6 header size
(40 bytes) only in the default case. There are up to 10
optional header fields called Options in case IHL > 5.
This means that the maximum size of the IPv4 header is 60 bytes.

Choosing the wrong maximum packets headers size causes inability
to transmit multi-segment TSO packets with IPv4 Options present.
PMD check that it is possible to inline all the packet headers
and the packet headers size exceeds the expected maximum size.
The maximum packet headers size was set to 192 bytes before,
but its value has been reduced during Tx path refactor activity.
Restore the proper maximum packet headers size for TSO.

Fixes: 50724e1bba ("net/mlx5: update Tx definitions")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-16 13:06:39 +01:00
Dmitry Kozlyuk
637582afcc net/mlx5: relax headroom assertion
A debug assertion in Single-Packet Receive Queue (SPRQ) mode
required all Rx mbufs to have a 128 byte headroom,
based on the assumption that rte_pktmbuf_init() sets it.
However, rte_pktmbuf_init() may set a smaller headroom
if the dataroom is insufficient, e.g. this is a natural case
for split buffer segments. The headroom can also be larger.
Only check the headroom size when vectored Rx routines
are used because they rely on it. Relax the assertion
to require sufficient headroom size, not an exact one.

Fixes: a0a45e8af7 ("net/mlx5: configure Rx queue for buffer split")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-09 13:06:38 +01:00
Dmitry Kozlyuk
be8cda4932 net/mlx5: fix GCC uninitialized variable warning
When building with -Db_sanitize=thread, GCC gives a warning:

drivers/net/mlx5/mlx5_flow_meter.c: In function ‘mlx5_flow_meter_create’:
drivers/net/mlx5/mlx5_flow_meter.c:1170:33: warning: ‘legacy_fm’ may be
    used uninitialized in this function [-Wmaybe-uninitialized]

This is a false-positive: legacy_fm is initialized and used
if and only if priv->sh->meter_aso_en is false.
Work around this by initializing legacy_fm to NULL.
Add an assertion before legacy_fm use in case the logic changes.

Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:09:00 +01:00
Tal Shnaiderman
e50fe91ae3 net/mlx5: support imissed counter on Windows
Add support for the imissed counter using the DevX API on Windows.

imissed is queried by creating a queue counter for the port, attaching
it to all created RQs and querying the "out_of_buffer" field.

If the counter cannot be created, imissed will always report 0.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:07:59 +01:00
Gregory Etelson
985b479267 net/mlx5: fix GRE protocol type translation for Verbs
When application creates several flows to match on GRE tunnel without
explicitly specifying GRE protocol type value in flow rules, PMD will
translate that to zero mask.
RDMA-CORE cannot distinguish between different inner flow types and
produces identical matchers for each zero mask.

The patch extracts inner header type from flow rule and forces it in
GRE protocol type, if application did not specify any.

Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:07:49 +01:00