Commit Graph

2531 Commits

Author SHA1 Message Date
Suanming Mou
42431df924 net/mlx5: add pattern template management
The pattern template defines flows that have the same matching
fields but with different matching values.
For example, matching on 5 tuple TCP flow, the template will be
(eth(null) + IPv4(source + dest) + TCP(s_port + d_port) while
the values for each rule will be different.

Due to the pattern template can be used in different domains, the
items will only be cached in pattern template create stage, while
the template is bound to a dedicated table, the HW criteria will
be created and saved to the table. The pattern templates can be
used by multiple tables. But different tables create the same
criteria and will not share the matcher between each other in order
to have better performance.

This commit adds pattern template management.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:18 +01:00
Suanming Mou
b401400db2 net/mlx5: add port flow configuration
The hardware steering is backend to support rte_flow_async API in
mlx5 PMD. The port configuration function creates the queues and
needed flow management resources.

The PMD layer configuration function allocates the queues' context
and per-queue job descriptor pool. The job descriptor pool size
is equal to the queue size, and the job descriptors will be popped
from pool with LIFO strategy to convey the flow information during
flow insertion/destruction. Then, while polling the queued operation
result, the flow information will be extracted from the job descriptor
and the descriptor will be pushed back to the LIFO pool.

The commit creates the flow port queues and the job descriptor pools.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:17 +01:00
Suanming Mou
d84c3cf766 net/mlx5: introduce hardware steering enable routine
The new hardware steering engine relies on using dedicated steering WQEs
instead of writing to the low-level steering table entries directly.
In the first implementation the hardware steering engine supports the
new queue based Flow API, the existing synchronous non-queue based Flow
API is not supported.

A new dv_flow_en value 2 is added to manage mlx5 PMD steering engine:

dv_flow_en	rte_flow API	rte_flow_async API
------------------------------------------------
 0		support		not support
 1		support		not support
 2		not support	support

This commit introduces the extra dv_flow_en = 2 to specify the new
flow initialize and manage operation routine.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:17 +01:00
Suanming Mou
5cadd74fbc net/mlx5: add HW steering low-level abstract stub
The HW steering low-level implementation will be added later in another
patch series. To avoid the linkage issues the abstract stub replacement
is provided currently.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:16 +01:00
Suanming Mou
2b6791506e net/mlx5: introduce hardware steering operation
The Connect-X steering is a lookup hardware mechanism that accesses flow
tables, matches packets to the rules, and performs specified actions.
Historically, mlx5 PMD implements several software engines to manage
steering hardware facility:

   - FW Steering - Verbs/Direct Verbs, uses FW calls to manage flows
   - SW Steering - DevX/mlx5dv, uses WQEs to access table memory directly

However, there are still some disadvantages:

   - performance is limited, we should invoke firmware either to
     manage the entire flow, or to handle some internal steering objects

   - organizing and preparing flow infrastructure (actions, matchers,
     groups, etc.) on the flow inserting is sure to cause slow flow
     insertion

   - security, exposing the low-level steering entries directly to the
     userspace may cause security risks

A new hardware WQE based steering operation with codename "HW Steering"
is going to be introduced to get rid of the security risks. And it will
take advantage of the recently new introduced async queue-based rte_flow
APIs to prepare everything in advance to achieve high insertion rate.

In this new HW steering engine, the original SW steering rte_flow API
will not be supported in the first implementation, only the new async
queue-based flow operations is going to be supported. A new steering
mode parameter for dv_flow_en will be introduced and user will be
able to engage the new steering engine.

This commit adds the basic driver operation.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 22:10:15 +01:00
Viacheslav Ovsiienko
49e8797619 net/mlx5: support wait on time in Tx
The hardware since ConnectX-7 supports waiting on
specified moment of time with new introduced wait
descriptor. A timestamp can be directly placed
into descriptor and pushed to sending queue.
Once hardware encounter the wait descriptor the
queue operation is suspended till specified moment
of time. This patch update the Tx datapath to handle
this new hardware wait capability.

PMD documentation and release notes updated accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 13:46:57 +01:00
Viacheslav Ovsiienko
2f5122dfc4 net/mlx5: configure Tx queue with send on time offload
The wait on time configuration flag is copied to the Tx queue
structure due to performance considerations. Timestamp
mask is prepared and stored in queue structure as well.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-24 13:46:56 +01:00
Michael Baum
a6b9d5a538 common/mlx5: update doorbell mapping parameter name
The "tx_db_nc" devarg forces doorbell register mapping to non-cached
region eliminating the extra write memory barrier. This argument was
used in creating the UAR for Tx and thus affected its performance.

Recently [1] its use has been extended to all UAR creation in all mlx5
drivers, and now its name is no longer so accurate.

This patch changes its name to "sq_db_nc" to suit any send queue that
uses it. The old name will still work for backward compatibility.

[1] commit 5dfa003db5 ("common/mlx5: fix post doorbell barrier")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:43 +01:00
Shun Hao
38eb5c9f35 net/mlx5: fix E-Switch manager vport ID
One of the E-Switch vports plays the special role - it is assigned as
"E-Switch manager" and has some special exclusive rights and duties - it
maintains all the representors, manages FDB domain flows, etc. By
default, the E-Switch vport index was supposed to be zero on standalone
NICs (regular ConnectX) and 0xFFFE SmartNIC (BlueField), but that was
not always correct - this index can be assigned with any value by
kernel/hypervisor.

Currently the E-Switch manager vport id is supposed to be default - 0
for standalone NICs, and 0xFFFE for the SmartNICs, and is deduced from
the device PCI id.

To handle this and do not suggest any default values, can use DevX API
to query E-Switch manager vport ID directly from the firmware during
initialization, and use that value by default. If the new method is not
provided (legacy firmware), fallback to use the PCI id approach.

Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:39 +01:00
Michael Baum
f685878a56 net/mlx5: optimize Rx queue creation
Recently shared RxQ has been introduced. All shared Rx queues with same
group and queue ID share the same rxq_ctrl, but each one has
mlx5_rxq_priv structure.
The mlx5_rx_queue_setup generates a new rxq_priv structure, and looks
for a rxq_ctrl structure to refer to. If there is already a compatible
rxq_ctrl structure it refers it, otherwise it calls the mlx5_rxq_new
function that generates a new one.

This patch makes mlx5_rxq_new function "standalone", it generates a
rxq_ctrl structure regardless to specific rxq_priv structure. All
operations on the rxq_ctrl structure that depend on the new rxq_priv
structure are performed in the mlx5_rx_queue_setup function, at the same
place for either a new rxq_ctrl structure or an existing rxq_ctrl
structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:36 +01:00
Michael Baum
0ad12a8090 net/mlx5: fix entry in shared Rx queues list
The mlx5_rxq_new function creates control structure and if it from
shared group, it is inserted into the shared RXQs list.

After that, there are some validations which in case they fail, RxQ
control object is released.
In these cases, invalid pointer to the object still in the list, and
access it may cause a crash.

Move the list insertion to the end of the function where the RxQ control
object is surely valid.

Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:35 +01:00
Haifei Luo
9b57df5575 net/mlx5: refactor getting counter action pointer
Previously API flow_dv_query_count_ptr is defined to get counter's
action pointer. This DV function is directly called and the better
way is by the callback.

Add one arg in API mlx5_counter_query and the related callback
counter_query. The added arg is for counter's action pointer.

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:34 +01:00
Shun Hao
94112badf6 net/mlx5: fix meter sub-policy creation
If meter policy action was RSS, the correct items were not provided
for sub-policy creation.

This fixes the issue by providing original items in meter split, so
the sub-policy creation gets the correct items.

Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:34 +01:00
Suanming Mou
ad98ff6c50 net/mlx5: remove unused function
The mlx5_l3t_prepare_entry() function is not used anymore.
This commit removes the unused mlx5_l3t_prepare_entry() function.

Fixes: 92ef4b8f16 ("ethdev: remove deprecated shared counter attribute")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:33 +01:00
Suanming Mou
0c5d3e4cdf net/mlx5: set flow error for hash list create
While mlx5_hlist_create() failed, the rte_flow_error was not filled
with the corresponding error information.

This commit adds the missing rte_flow_error_set() for the failure case.

Fixes: f3020a331d ("net/mlx5: optimize hash list table allocate on demand")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-23 15:57:32 +01:00
Michael Baum
a729d2f093 common/mlx5: refactor devargs management
Improve the devargs handling in two aspects:
 - Parse the devargs string only once.
 - Return error and report for unknown keys.

The common driver parses once the devargs string into a dictionary, then
provides it to all the drivers' probe. Each driver updates within it
which keys it has used, then common driver receives the updated
dictionary and reports about unknown devargs.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:56 +01:00
Michael Baum
45a6df804a net/mlx5: separate per port configuration
Add configuration structure for port (ethdev). This structure contains
all configurations coming from devargs which oriented to port. It is a
field of mlx5_priv structure, and is updated in spawn function for each
port.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:54 +01:00
Michael Baum
c4b8620135 net/mlx5: refactor to detect operation by DevX
Add inline function indicating whether HW objects operations can be
created by DevX. It makes the code more readable.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:53 +01:00
Michael Baum
a13ec19c19 net/mlx5: add shared device context config structure
Add configuration structure for shared device context. This structure
contains all configurations coming from devargs which oriented to
device. It is a field of shared device context (SH) structure, and is
updated once in mlx5_alloc_shared_dev_ctx() function.
This structure cannot be changed when probing again, so add function to
prevent it. The mlx5_probe_again_args_validate() function creates a
temporary IB context configure structure according to new devargs
attached in probing again, then checks the match between the temporary
structure and the existing IB context configure structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:52 +01:00
Michael Baum
87af0d1e1b net/mlx5: concentrate all device configurations
Move all device configure to be performed by mlx5_os_cap_config()
function instead of the spawn function.
In addition move all relevant fields from mlx5_dev_config structure to
mlx5_dev_cap.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:51 +01:00
Michael Baum
91d1cfafc9 net/mlx5: rearrange device attribute structure
Rearrange the mlx5_os_get_dev_attr() function in such a way that it
first executes the queries and only then updates the fields.
In addition, it changed its name in preparation for expanding its
operations to configure the capabilities inside it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:50 +01:00
Michael Baum
cf004fd33a net/mlx5: add E-Switch mode flag
This patch adds in SH structure a flag which indicates whether is
E-Switch mode.
When configure "dv_esw_en" from devargs, it is enabled only when is
E-switch mode. So, since dv_esw_en has been configure, it is enough to
check if "dv_esw_en" is valid.
This patch also removes E-Switch mode check when "dv_esw_en" is checked
too.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:49 +01:00
Michael Baum
cf8971db65 net/mlx5: share counter config function
The mlx5_flow_counter_mode_config function exists for both Linux and
Windows with the same name and content.
This patch moves its implementation to the folder shared between the
operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:48 +01:00
Michael Baum
e3032e9c73 net/mlx5: share realtime timestamp configure
The realtime timestamp configure work for Linux as same as Windows.
This patch removes it to the function implemented in the folder shared
between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:47 +01:00
Michael Baum
c4c3e8afef common/mlx5: share VF check function
The check if device is VF work for Linux as same as Windows.
This patch removes it to the function implemented in the folder shared
between the operating systems, removing the duplication.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:46 +01:00
Michael Baum
8f46481001 net/mlx5: remove Verbs query device duplication
The sharing device context structure has a field named "device_attr"
which is filled by mlx5_os_get_dev_attr() function.
The spawn function calls mlx5_os_get_dev_attr() again and save it to
local variable identical to "device_attr" field.

There is no need for this duplication, because there is a reference to
the sharing device context structure from spawn function.

This patch removes the local "device_attr" from spawn function, and uses
the context's field instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:45 +01:00
Michael Baum
6dc0cbc6c6 net/mlx5: remove DevX flag duplication
The sharing device context structure has a field named "devx" which
indicates if DevX is supported.
The common configure structure has also field named "devx" with the same
meaning.

There is no need for this duplication, because there is a reference to
the common structure from within the sharing device context structure.

This patch removes it from sharing device context structure and uses the
common config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:44 +01:00
Michael Baum
538205614f net/mlx5: remove HCA attribute structure duplication
The HCA attribute structure is field of net configure structure.
It is also field of common configure structure.

There is no need for this duplication, because there is a reference to
the common structure from within the net structures.

This patch removes it from net configure structure and uses the common
config structure instead.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:43 +01:00
Michael Baum
cfe0639b30 net/mlx5: remove redundant check of devargs
The device arguments are parsed and updated twice during spawning. First
time before creating the share device context, and again later after
updating a default value to one of the arguments.

This patch consolidates them into one parsing and updates the default
values before it.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:42 +01:00
Michael Baum
dec50e58f7 net/mlx5: remove declaration duplications
In mlx5_ethdev.c file are implemented those 4 functions:
 - mlx5_dev_infos_get
 - mlx5_fw_version_get
 - mlx5_dev_set_mtu
 - mlx5_hairpin_cap_get

In mlx5.h file they are declared twice. First time under mlx5.c file and
second time under mlx5_ethdev.c file.

This patch removes the redundant declaration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:41 +01:00
Michael Baum
6be4c57add net/mlx5: fix errno update in shared context creation
The mlx5_alloc_shared_dev_ctx() function has a local variable named
"err" which contains the errno value in case of failure.

When functions called by this function are failed, this variable is
updated with their return value (that should be a positive errno value).
However, some functions doesn't update errno value by themselves or
return negative errno value. If one of them fails, the "err" variable
contains negative value what cause to assertion failure.

This patch updates all functions uses by mlx5_alloc_shared_dev_ctx()
function to update rte_errno and take this value instead of "err" value.

Fixes: 5dfa003db5 ("common/mlx5: fix post doorbell barrier")
Fixes: 5d55a494f4 ("net/mlx5: split multi-thread flow handling per OS")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:40 +01:00
Michael Baum
ce12974cce net/mlx5: fix ASO CT object release
The ASO connection tracking structure is initialized once for sharing
device context.

Its release takes place in the close function which is called for each
ethdev individually. i.e. when there is more than one ethdev under the
same sharing device context, it will be destroyed when one of them is
closed. If the other wants to use it later, it may cause it to crash.

In addition, the creation of this structure is performed in the spawn
function. If one of the creations of the objects following it fails, it
is supposed to be destroyed but this does not happen.

This patch moves its release to the sharing device context free function
and thus solves both problems.

Fixes: 0af8a2298a ("net/mlx5: release connection tracking management")
Fixes: ee9e5fad03 ("net/mlx5: initialize connection tracking management")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:39 +01:00
Michael Baum
ad9d0c6395 net/mlx5: fix ineffective metadata argument adjustment
In "dv_xmeta_en" devarg there is an option of dv_xmeta_en=3 which
engages tunnel offload mode. In E-Switch configuration, that mode
implicitly activates dv_xmeta_en=1.

The update according to E-switch support is done immediately after the
first parsing of the devargs, but there is another adjustment later.

This patch moves the adjustment after the second parsing.

Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:38 +01:00
Michael Baum
dcbaafdc8f net/mlx5: fix sibling device config check
The MLX5 net driver supports "probe again". In probing again, it
creates a new ethdev under an existing infiniband device context.

Sibling devices sharing infiniband device context should have compatible
configurations, so some of the devargs given in the probe again, the
ones that are mainly relevant to the sharing device context are sent to
the mlx5_dev_check_sibling_config function which makes sure that they
compatible its siblings.
However, the arguments are adjusted according to the capability of the
device, and the function compares the arguments of the probe again
before the adjustment with the arguments of the siblings after the
adjustment. A user who sends the same values to all siblings may fail in
this comparison if he requested something that the device does not
support and adjusted.

This patch moves the call to the mlx5_dev_check_sibling_config function
after the relevant adjustments.

Fixes: 92d5dd4834 ("net/mlx5: check sibling device configurations mismatch")
Fixes: 2d241515eb ("net/mlx5: add devarg for extensive metadata support")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-02-21 11:36:32 +01:00
Ferruh Yigit
a41f593f1b ethdev: introduce generic dummy packet burst function
Multiple PMDs have dummy/noop Rx/Tx packet burst functions.

These dummy functions are very simple, introduce a common function in
the ethdev and update drivers to use it instead of each driver having
its own functions.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2022-02-11 21:17:34 +01:00
Dariusz Sosnowski
864678e420 net/mlx5: fix inline length for multi-segment TSO
This patch removes a redundant assert in mlx5_tx_packet_multi_tso().
That assert assured that the amount of bytes requested to be inlined
is greater than or equal to the minimum amount of bytes required
to be inlined. This requirement is either derived from the NIC
inlining mode or configured through devargs. When using TSO this
requirement can be disregarded, because on all NICs it is satisfied by
TSO inlining requirements, since TSO requires L2, L3, and L4 headers to
be inlined.

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:34 +01:00
Alexander Kozyrev
eb11edd9db net/mlx5: fix meter capabilities reporting
Meter capabilities reporting is not up to date.
Mellanox NICs support RFC2698 and RFC4115 as well as RFC2697.
Add these marker operations to the capabilities list.

Fixes: 6bc327b94f ("net/mlx5: fill meter capabilities using DevX")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:33 +01:00
Alexander Kozyrev
21fdeab422 net/mlx5: fix committed bucket size
Committed Bucket Size calculation tries to fit into 8-bit wide
mantissa field by setting 256 as a maximum value for it.
To compensate for this increase in the mantissa value the exponent
value has to be reduced by 8. But it gives a negative exponent
value for CBS less than 128. And negative exponent value is not
supported by the NIC. Adjust CSB calculation only for values
bigger than 128 to allow both small and big bucket sizes.

Fixes: 3bd26b23ce ("net/mlx5: support meter profile operations")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:32 +01:00
Viacheslav Ovsiienko
7cad3dc312 net/mlx5: remove unused metadata shift parameter
Due to updated modify field action immediate value buffer
pattern [1] the implicit shift for the metadata is not
needed anymore and should be removed.

[1] commit 40c8fb1fd3 ("net/mlx5: update modify field action")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:31 +01:00
Viacheslav Ovsiienko
ced4900cde net/mlx5: fix metadata endianness in modify field action
As modify field action immediate source parameter the metadata
should follow the CPU endianness (according to SET_META action
structure format), and mlx5 PMD wrongly handled the immediate
parameter metadata buffer as big-endian, resulting in wrong
metadata set action with incorrect endianness.

Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-10 09:44:28 +01:00
Stephen Hemminger
06c047b680 remove unnecessary null checks
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.

Remove redundant NULL pointer checks before free functions
found by nullfree.cocci

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-12 12:07:48 +01:00
Elena Agostini
8e83ba285a net/mlx5: add C++ include guard to public header
The support for linking rte_pmd_mlx5.h functions with
C++ applications was missing.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-02-03 09:19:26 +01:00
Xiaoyu Min
87b26522f7 net/mlx5: reject jump to root table
Currently root table as destination is not supported.
The jump action which finally be translated to underlying root table in
rdma-core should be rejected.

Fixes: f78f747f41 ("net/mlx5: allow jump to group lower than current")
Cc: stable@dpdk.org

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-26 17:41:12 +01:00
Raja Zidane
082becbf1f net/mlx5: fix mark enabling for Rx
To optimize datapath, the mlx5 pmd checked for mark action on flow
creation, and flagged possible destination rxqs (through queue/RSS
actions), then it enabled the mark action logic only for flagged rxqs.

Mark action didn't work if no queue/rss action was in the same flow,
even when the user use multi-group logic to manage the flows.
So, if mark action is performed in group X and the packet is moved to
group Y > X when the packet is forwarded to Rx queues, SW did not get
the mark ID to the mbuf.

Flag Rx datapath to report mark action for any queue when the driver
detects the first mark action after dev_start operation.

Fixes: 8e61555657 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-26 17:41:11 +01:00
Alexander Kozyrev
728b6447e7 net/mlx5: fix MPRQ WQE size assertion
Preparation of the stride size and the number of strides for
Multi-Packet RQ was updated recently to accommodate the hardware
limitation about minimum WQE size. The wrong assertion was
introduced to ensure this limitation is met. Assert that the
configured WQE size is not less than the minimum supported size.

Fixes: 34776af600 ("net/mlx5: fix MPRQ stride devargs adjustment")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-18 09:30:25 +01:00
Alexander Kozyrev
9701034faa net/mlx5: fix maximum packet headers size for TSO
The maximum packet headers size for TSO is calculated as a sum of
Ethernet, VLAN, IPv6 and TCP headers (plus inner headers).
The rationale  behind choosing IPv6 and TCP is their headers
are bigger than IPv4 and UDP respectively, giving us the maximum
possible headers size. But it is not true for L3 headers.
IPv4 header size (20 bytes) is smaller than IPv6 header size
(40 bytes) only in the default case. There are up to 10
optional header fields called Options in case IHL > 5.
This means that the maximum size of the IPv4 header is 60 bytes.

Choosing the wrong maximum packets headers size causes inability
to transmit multi-segment TSO packets with IPv4 Options present.
PMD check that it is possible to inline all the packet headers
and the packet headers size exceeds the expected maximum size.
The maximum packet headers size was set to 192 bytes before,
but its value has been reduced during Tx path refactor activity.
Restore the proper maximum packet headers size for TSO.

Fixes: 50724e1bba ("net/mlx5: update Tx definitions")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-16 13:06:39 +01:00
Dmitry Kozlyuk
637582afcc net/mlx5: relax headroom assertion
A debug assertion in Single-Packet Receive Queue (SPRQ) mode
required all Rx mbufs to have a 128 byte headroom,
based on the assumption that rte_pktmbuf_init() sets it.
However, rte_pktmbuf_init() may set a smaller headroom
if the dataroom is insufficient, e.g. this is a natural case
for split buffer segments. The headroom can also be larger.
Only check the headroom size when vectored Rx routines
are used because they rely on it. Relax the assertion
to require sufficient headroom size, not an exact one.

Fixes: a0a45e8af7 ("net/mlx5: configure Rx queue for buffer split")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-09 13:06:38 +01:00
Dmitry Kozlyuk
be8cda4932 net/mlx5: fix GCC uninitialized variable warning
When building with -Db_sanitize=thread, GCC gives a warning:

drivers/net/mlx5/mlx5_flow_meter.c: In function ‘mlx5_flow_meter_create’:
drivers/net/mlx5/mlx5_flow_meter.c:1170:33: warning: ‘legacy_fm’ may be
    used uninitialized in this function [-Wmaybe-uninitialized]

This is a false-positive: legacy_fm is initialized and used
if and only if priv->sh->meter_aso_en is false.
Work around this by initializing legacy_fm to NULL.
Add an assertion before legacy_fm use in case the logic changes.

Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:09:00 +01:00
Tal Shnaiderman
e50fe91ae3 net/mlx5: support imissed counter on Windows
Add support for the imissed counter using the DevX API on Windows.

imissed is queried by creating a queue counter for the port, attaching
it to all created RQs and querying the "out_of_buffer" field.

If the counter cannot be created, imissed will always report 0.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:07:59 +01:00
Gregory Etelson
985b479267 net/mlx5: fix GRE protocol type translation for Verbs
When application creates several flows to match on GRE tunnel without
explicitly specifying GRE protocol type value in flow rules, PMD will
translate that to zero mask.
RDMA-CORE cannot distinguish between different inner flow types and
produces identical matchers for each zero mask.

The patch extracts inner header type from flow rule and forces it in
GRE protocol type, if application did not specify any.

Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:07:49 +01:00
Gregory Etelson
f3f1f576f4 net/mlx5: fix RSS expansion with explicit next protocol
The PMD RSS expansion scheme by default compiles flow rules for all
flow item types that may branch out from a stub supplied
by application.
For example,
ETH can lead to VLAN, IPv4 or IPv6.
IPv4 can lead to UDP, TCP, IPv4 or IPv6.

If application explicitly specified next protocol type, expansion must
use that option only and not create flows with other protocol types.

The PMD ignored explicit next protocol values in GRE and VXLAN-GPE.

The patch updates RSS expansion for GRE and VXLAN-GPE with explicit
next protocol settings.

Fixes: c7870bfe09 ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-01-06 10:07:41 +01:00
Lior Margalit
0c7606b75b net/mlx5: fix assertion on flags set in packet mbuf
Fixed the assertion on the flags set in pkt->ol_flags for vectorized
MPRQ.  With vectorized MPRQ the CQs are processed before copying the
MPRQ bufs so the valid assertion is that the expected flag is set and
not that the pkt->ol_flags equlas this flag alone.

Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-01-06 10:07:30 +01:00
Michael Baum
147f6fb42b net/mlx5: fix memory socket selection in ASO management
In ASO objects creation (WQE, CQE and MR), socket number is given as
a parameter.

The selection was wrongly socket 0 hardcoded even if the user didn't
configure memory for this socket.

This patch replaces the selection to default socket (SOCKET_ID_ANY).

Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-12-22 09:56:01 +01:00
Michael Baum
34776af600 net/mlx5: fix MPRQ stride devargs adjustment
In Multi-Packet RQ creation, the user can choose the number of strides
and their size in bytes. The user updates it using specific devargs for
both of these parameters.
The above two parameters determine the size of the WQE which is actually
their product of multiplication.

If the user selects values that are not in the supported range, the PMD
changes them to default values. However, apart from the range
limitations for each parameter individually there is also a minimum
value on their multiplication. When the user selects values that their
multiplication are lower than minimum value, no adjustment is made and
the creation of the WQE fails.

This patch adds an adjustment in these cases as well. When the user
selects values whose multiplication is lower than the minimum, they are
replaced with the default values.

Fixes: ecb160456a ("net/mlx5: add device parameter for MPRQ stride size")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-12-05 12:22:09 +01:00
Michael Baum
0947ed380f net/mlx5: improve stride parameter names
In the striding RQ management there are two important parameters, the
size of the single stride in bytes and the number of strides.

Both the data-path structure and config structure keep the log of the
above parameters. However, in their names there is no mention that the
value is a log which may be misleading as if the fields represent the
values themselves.

This patch updates their names describing the values more accurately.

Fixes: ecb160456a ("net/mlx5: add device parameter for MPRQ stride size")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-12-05 12:22:09 +01:00
Viacheslav Ovsiienko
252b5ae036 net/mlx5: fix modify field MAC address offset
The MAC addresses fields are 48 bit wide and are processed
by mlx5 PMD as two words. There the bug was introduced for
the offset, causing malfunction of MODIFY_FIELD action
with MAC address fields as source or destination and
with non zero field offset.

Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-12-05 12:24:13 +01:00
Josh Soref
7be78d0279 fix spelling in comments and strings
The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-01-11 12:16:53 +01:00
Michael Baum
8648fa2f46 net/mlx5: fix devargs validation for multi-class probing
The mlx5_args function reads the devargs and checks if they are valid
for this driver and if not it returns an error.

This was normal behavior as long as all the devargs come to this driver,
but since it is possible to run several drivers together, the function
may return an error for another driver's devarg even though it is
completely valid.
In addition the function does not allow the user to know which of the
devargs is incorrect, but returns an error without printing the
unknown devarg.

This patch eliminates the error return in the case of an unknown devarg,
and prints a warning for each such devarg specifically.

Fixes: 7b4f1e6bd3 ("common/mlx5: introduce common library")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-26 13:36:16 +01:00
Sean Morrissey
b53d106d34 remove repeated 'the' in the code
Remove the use of double "the" as it does not make sense.

Cc: stable@dpdk.org

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-26 11:28:34 +01:00
Viacheslav Ovsiienko
572c9d4bda net/mlx5: fix shared Rx queue segment configuration match
While joining the shared Rx queue to the existing queue group,
the queue configurations is checked to be the same as it was
specified in the first group queue creation - all shared
queues should be created with identical configurations.

During the Rx queue creation the buffer split segment
configuration can be altered - the zero segment sizes are
substituted with the actual ones, inherited from the pools,
number of segments can be extended to cover the maximal
packet length, etc. It means the actual queue segment
configuration can not be used directly to match the
configuration provided in the queue setup call.

To resolve an issue we should store original parameters
in the shared queue structure and perform the check against
one of these stored ones.

Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-24 17:25:37 +01:00
Alexander Kozyrev
94421842de net/mlx5: fix GENEVE and VXLAN-GPE flow item matching
GENEVE and VXLAN-GPE item matching is done similarly to GRE matching.
Users can skip the specification of the protocol type and expect that
this type is deducted from the inner header type automatically.
But the inner header type may not be specified in order to match all the
protocol types. In this case, PMD should not specify the protocol type.
Check if we have the inner header type before setting the protocol type.

Fixes: 690391dd0e ("net/mlx5: fix GENEVE protocol type translation")
Fixes: 861fa3796f ("net/mlx5: fix VXLAN-GPE next protocol translation")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-24 17:25:36 +01:00
Alexander Kozyrev
9e61533df2 net/mlx5: fix GRE flow item matching
GRE protocol type is implicitly set in the matching translation in case
an application doesn't specify any type explicitly in a flow rule.
It is extracted from the inner header type, but this type may be absent.
In this case, GRE item matching is broken. Check if we have the inner
header type before setting it to allow matching on all GRE packets.

Fixes: be26e81bfc ("net/mlx5: fix GRE protocol type translation")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-24 17:25:36 +01:00
Dmitry Kozlyuk
ec9b812b6c net/mlx5: fix Rx queue reference count for indirect RSS
mlx5_ind_table_obj_modify() was not changing the reference counters
of neither the new set of RxQs, nor the old set of RxQs.
On the other hand, creation of the RSS incremented the RxQ refcnt.
If an RxQ was present in both the initial and the modified set,
its reference counter was incremented one extra time
compared to the queues that were only present in the new set.
This prevented releasing said RxQ resources on port stop:

    flow indirect_action 0 create action_id 1 \
        action rss queues 0 1 end / end
    flow indirect_action 0 update 1 \
        action rss queues 2 3 end / end
    quit
    ...
    mlx5_net: mlx5.c:1622: mlx5_dev_close():
        port 0 some Rx queue objects still remain
    mlx5_net: mlx5.c:1626: mlx5_dev_close():
        port 0 some Rx queues still remain

Increment reference counters for the new set of RxQs
and decrement them for the old set of RxQs when needed.
Remove explicit referencing of RxQ from mlx5_ind_table_obj_attach()
because it reuses mlx5_ind_table_obj_modify() code doing this.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
2021-11-24 17:25:35 +01:00
Dmitry Kozlyuk
c65d684497 net/mlx5: fix indirect RSS creation when port is stopped
mlx5_ind_table_obj_setup() was incrementing RxQ reference counters
even when the port was stopped, which prevented RxQ release
and triggered an internal assertion.
Only increment reference counter when the port is started.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
2021-11-24 17:25:33 +01:00
Dmitry Kozlyuk
0ece5de3c4 net/mlx5: fix crash on close after failed start
If mlx5_rxq_start() failed and rxq_ctrl was not initialized,
mlx5_rxq_obj_verify() would segfault in an attempt to dereference it.
Add a check that rxq_ctrl is not NULL before accessing its members.

Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-23 21:39:05 +01:00
Dariusz Sosnowski
8fbce96fbe net/mlx5: fix reference count on detached indirect action
This patch fixes segfault which was triggered when port, with indirect
actions created, was closed. Segfault was occurring only when
RTE_LIBRTE_MLX5_DEBUG was defined. It was caused by redundant decrement
of RX queues refcount:

- refcount was decremented when port was stopped and indirect actions
were detached from RX queues (port stop),
- refcount was decremented when indirect actions objects were destroyed
(port close or destroying of indirect action).

This patch fixes behavior. Dereferencing Rx queues is done if and only
if indirect action is explicitly destroyed by the user or detached on
port stop. Dereferencing Rx queues on action destroy operation depends
on an argument to the wrapper of indirect action destroy operation,
introduced in this patch.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-23 17:57:19 +01:00
Dariusz Sosnowski
fa4883456d net/mlx5: fix multi-segment packet wraparound
This patch fixes the assertion failure triggered when the user
configured minimum inline length requirements and the application
transmitted multi segment packets. Failure was triggered when space left
in TX queue was not enough to cover this requirement.

This patch limits the length of data to be copied to the available space
in TX queue.

Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-23 17:57:13 +01:00
Bing Zhao
db3ec06e7a net/mlx5: fix RSS validation for meter hierarchy
In a meter hierarchy, all the meters are marked with having RSS if
the final meter's termination action is RSS.

When validating a flow rule with meter hierarchy, the RSS action
should not be fetched from the current meter if it is not the final
one.

The fate action union is next meter ID instead of the pointer to the
RSS action. By using the final meter in the hierarchy, the flow rule
validation will succeed without any crash caused by the invalid RSS
action pointer access.

Fixes: 1ce19ab1f4 ("net/mlx5: fix RSS validation with meter policy")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Li Zhang <lizh@nvidia.com>
Reviewed-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-23 17:57:13 +01:00
Jiawei Wang
693c7d4b1e net/mlx5: fix flow mark with sampling and metering
If there are sample action and the meter action in the same flow,
mlx5 PMD performs several levels of splitting. For example, sampling
feature splits the original flow into prefix subflow with sample action,
and suffix subflow with the rest of actions. Then, metering feature
splits the sampling suffix subflow into its own meter subflows.
If mark action was added before the sample and meter action, the
flow mark flag was kept in the sample subflows but reset on
handling the metering split, causing the flow mark value missed.

This patch keeps the flow mark flag of previous subflow, and then
the following meter subflows handle the flow mark correctly.

Fixes: 9ade91dfe8 ("net/mlx5: fix group value of sample suffix flow")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-23 14:24:17 +01:00
Dmitry Kozlyuk
08ac03580e common/mlx5: fix mempool registration
Mempool registration was not correctly processing
mempools with RTE_PKTMBUF_F_PINEND_EXT_BUF flag set
("pinned mempools" for short), because it is not known
at registration time whether the mempool is a pktmbuf one,
and its elements may not yet be initialized to analyze them.
Attempts had been made to recognize such pools,
but there was no robust solution, only the owner of a mempool
(the application or a device) knows its type.
This patch extends common/mlx5 registration code
to accept a hint that the mempool is a pinned one
and uses this capability from net/mlx5 driver.

1. Remove all code assuming pktmbuf pool type
   or trying to recognize the type of a pool.
2. Register pinned mempools used for Rx
   and their external memory on port start.
   Populate the MR cache with all their MRs.
3. Change Tx slow path logic as follows:
   3.1. Search the mempool database for a memory region (MR)
        by the mbuf pool and its buffer address.
   3.2. If not MR for the address is found for the mempool,
	and the mempool contains only pinned external buffers,
	perform the mempool registration of the mempool
	and its external pinned memory.
   3.3. Fall back to using page-based MRs in other cases
	(for example, a buffer with externally attached memory,
	but not from a pinned mempool).

Fixes: 690b2a88c2 ("common/mlx5: add mempool registration facilities")
Fixes: fec28ca0e3 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-21 15:38:07 +01:00
Jiawei Wang
144d222305 net/mlx5: fix mismatch metadata flow with meter action
The mlx5 PMD introduced the table id attribute to allow multiple
flow tables on the same table level for flow metering, there can be
multiple flow table objects with the same table level but different
table ids.

If the extended metadata mode is enabled, all flows containing
destination Queue/RSS actions are split into two subflows - prefix one
jumps to the MLX5_FLOW_MREG_CP_TABLE_GROUP flow table to copy
MARK action data, and suffix one to perform the destination Queue/RSS
action. The table_id for the jump in the metadata split prefix flow
is always 0.

If flow itself was the metering split suffix subflow the table id was
set to 1 in the flow split structure and the metadata split suffix
subflow was created in the table with wrong table id, causing the
metadata suffix flow mismatch.

This patch resets the table id to 0 while creating the metadata
suffix flows.

Fixes: 51ec04dc7b ("net/mlx5: connect meter policy to created flows")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-21 15:38:06 +01:00
Jiawei Wang
16f4aa57ca net/mlx5: fix metadata and meter split shared tag
In the metadata flow split, PMD created the prefix subflow
with removed Queue or RSS action and appended the set tag and
copy table jump actions. If the flow being split for metadata
was the meter prefix subflow, the driver supposed to share the same
meter split tag action for the metadata split flow. There was the wrong
check for preceding meter split tag action, causing append with metadata
split set tag action and resulting the meter suffix subflow was missed
due to tag value mismatch.

This patch adds the checking before copying into extend action list,
to make sure the correct shared tag is used.

Fixes: 8d72fa6689 ("net/mlx5: share tag between meter and metadata")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-21 15:38:02 +01:00
Viacheslav Ovsiienko
a750169c4f net/mlx5: fix modify field destination bit offset
If the modify field action requests the field copy/set transaction
from other field, the destination field hardware bit offset was
assigned incorrectly with non-zero byte offset, causing wrong
translations for the fields with sizes larger than 32 bits.

Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-17 15:51:40 +01:00
Dariusz Sosnowski
421177ccd7 net/mlx5: fix MPLS tunnel outer layer overwrite
mlx5 PMD incorrectly overwrote outer layer fields in MPLS tunnel
rte_flow patterns using defaults for MPLS tunnels. This included
overwriting UDP destination port in MPLSoUDP and GRE protocol field in
MPLSoGRE.

This patch fixes this behavior. If application provides the values in
flow pattern items preceding the MPLS flow item the provided values will
be used, otherwise the defaults will be applied.

Fixes: d1abe664dd ("net/mlx5: add MPLS to Direct Verbs flow engine")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-17 11:48:18 +01:00
Dariusz Sosnowski
7775172c04 net/mlx5: fix partial inline of fine grain packets
Assuming a user tried to send multi-segment packets, with
RTE_PMD_MLX5_FINE_GRANULARITY_INLINE flag set, using a device with
minimum inlining requirements (such as ConnectX-4 Lx or when user
specified them explicitly), sending such packets caused segfault.
Segfault was caused by failed invariants in
mlx5_tx_packet_multi_inline function.

This patch introduces a logic for multi-segment packets, with
RTE_PMD_MLX5_FINE_GRANULARITY_INLINE flag set, to omit mbuf scanning for
filling inline buffer and inline only minimal amount of data required.

Fixes: ec837ad0fc ("net/mlx5: fix multi-segment inline for the first segments")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-17 11:48:18 +01:00
Dmitry Kozlyuk
e4c402afc1 common/mlx5: fix MPRQ mempool registration
Mempool registration code had a wrong assumption that it is always
dealing with packet mempools and always called rte_pktmbuf_priv_flags(),
which returned a random value for different types of mempools.
In particular, it could consider MPRQ mempools as having externally
pinned buffers, which is wrong.
Packet mempools cannot be reliably recognized, but it is sufficient to
check that the mempool is not a packet one, so it cannot have externally
pinned buffers.
Compare mempool private data size to that of packet mempools to check.

Fixes: 690b2a88c2 ("common/mlx5: add mempool registration facilities")
Fixes: fec28ca0e3 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 16:45:21 +01:00
Chengfeng Ye
1e580ed4b0 net/mlx5: fix mutex unlock in Tx packet pacing cleanup
The lock sh->txpp.mutex was not correctly released on one path
of cleanup function return, potentially causing the deadlock.

Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
Cc: stable@dpdk.org

Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 17:55:17 +01:00
Dmitry Kozlyuk
5c078fce58 net/mlx5: fix keeping indirect RSS non-isolated mode
When a port starts in non-isolated mode,
an internal indirect RSS is created that includes all configured queues
and a flow rule is created that references this indirect RSS.
If before switching to non-isolated mode an indirect RSS was created
that includes the same set of queues, it would be reused at this point.
However, because the port had been stopped (or not yet started),
the TIR for this indirect RSS had been destroyed (or not yet created).
The flow rule could not be created and the port start failed.

Creation of TIRs is moved before configuring non-isolated mode flows,
but it is not enough because of the following issue.

Commit 0cedf34da7 ("net/mlx5: move Rx queue reference count")
changed mlx5_rxq_get() not to increment RxQ control structure
reference count, mlx5_rxq_ref() was introduced for this purpose.
mlx5_ind_table_obj_attach() was not updated to use the new function,
so when the port was stopped, the control structure reference count
of an RxQ used in RSS reached zero and the structure was destroyed.

Use mlx5_rxq_ref() to keep RxQ control structure
needed for indirect RSS persistence across port restart.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Fixes: 0cedf34da7 ("net/mlx5: move Rx queue reference count")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 14:05:35 +01:00
Bing Zhao
1ce19ab1f4 net/mlx5: fix RSS validation with meter policy
The RSS can be one of the fate actions when creating a meter with
policy. In the previous implementation, the RSS validation was missed
when creating a flow rule with such meter due to the fact that a
policy meter was created firstly and then used in the rule. In the
stage of meter creation, no rte_flow_item* information was provided.

A unnecessary RSS expansion might be called since the validation was
missed and would cause an unexpected error of the rule creation. Even
though the rule should be rejected from the very beginning, it would
cause confusion. There might be some other errors when the validation
was missed.

Adding the RSS validation inside the meter action validation will
prevent the code from continuing when there is a conflict between the
items, other actions and the policy meter RSS action.

Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-16 10:23:31 +01:00
Gregory Etelson
be26e81bfc net/mlx5: fix GRE protocol type translation
When application creates several flows to match on GRE tunnel
without explicitly specifying GRE protocol type value in
flow rules, PMD will translate that to zero mask.
RDMA-CORE cannot distinguish between different inner flow types and
produces identical matchers for each zero mask.

The patch extracts inner header type from flow rule and forces it
in GRE protocol type, if application did not specify
any without explicitly specifying GRE protocol type value in
flow rules, protocol type value.

Fixes: fc2c498ccb ("net/mlx5: add Direct Verbs translate items")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 10:22:56 +01:00
Gregory Etelson
690391dd0e net/mlx5: fix GENEVE protocol type translation
When application creates several flows to match on GENEVE tunnel
without explicitly specifying GENEVE protocol type value in
flow rules, PMD will translate that to zero mask.
RDMA-CORE cannot distinguish between different inner flow types and
produces identical matchers for each zero mask.

The patch extracts inner header type from flow rule and forces it
in GENEVE protocol type, if application did not specify
any without explicitly specifying GENEVE protocol type value in
flow rules, protocol type value.

Fixes: e59a5dbcfd ("net/mlx5: add flow match on GENEVE item")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 10:22:55 +01:00
Gregory Etelson
a21d616b99 net/mlx5: fix RSS expansion scheme for GRE header
RFC-2784 allows any valid Ethernet type in GRE protocol type field.

Add Ethernet to GRE RSS expansion.

Fixes: f4b901a46a ("net/mlx5: add flow GRE item")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 10:22:54 +01:00
Gregory Etelson
9f151fd8df net/mlx5: add Ethernet header to GENEVE RSS expansion
RFC-8926 allows inner Ethernet header after GENEVE tunnel.

Current GENEVE RSS expansion created IPv4 and IPv6 paths only.

The patch adds Ethernet to RSS expansion scheme.

Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 10:22:53 +01:00
Gregory Etelson
861fa3796f net/mlx5: fix VXLAN-GPE next protocol translation
VXLAN-GPE extends VXLAN protocol and provides the next protocol
field specifying the first inner header type.

The application can assign some explicit value to
VXLAN-GPE::next_protocol field or set it to the default one. In the
latter case, the rdma-core library cannot recognize the matcher
built by PMD correctly, and it results in hardware configuration
missing inner headers match.

The patch forces VXLAN-GPE::next_protocol assignment if the
application did not explicitly assign it to the non-default value

Fixes: 90456726eb ("net/mlx5: fix VXLAN-GPE item translation")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-16 10:22:51 +01:00
Michael Baum
71304b5c7b common/mlx5: fix redundant field in MR control structure
Inside the MR control structure there is a pointer to the common device.
This pointer enables access to the global cache as well as hardware
objects that may be required in case a new MR needs to be created.

The purpose of adding this pointer into the MR control structure was to
avoid its transfer as a parameter to all the functions of searching MR
in the caches.
However, adding it to this structure increased the Rx and Tx data-path
structures, all the fields that followed it were slightly moved away
which caused to a reduction in performance.

This patch removes the pointer from the structure. It can be accessed
through the "dev_gen_ptr" existing field using the "container_of"
operator.

Fixes: 334ed198ab ("common/mlx5: remove redundant parameter in MR search")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-17 10:42:20 +01:00
Bing Zhao
ce78c51833 net/mlx5: fix delay drop bit set overflow
The attribute to record the global control of hairpin queues' delay
drop was defined as a bit-field with one bit, and the intention was
to reduce the memory overhead. In the meanwhile, the macro was
defined as an enumerated value 0x2.

No matter what value inputted via devarg, the lowest bit was always
zero and the higher bits would be ignored. For hairpin queues, the
delay drop attribute couldn't be enabled.

With the commit, the double logical negation is used to fix this.

Fixes: febcac7b46 ("net/mlx5: support Rx queue delay drop")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-14 16:26:49 +01:00
Gregory Etelson
60bc280518 net/mlx5: fix flex item macro collision
Flex item macro definition values duplicated existing connection
tracking values.

The patch provides new values for flex item macros.

Fixes: a23e9b6e3e ("net/mlx5: handle flex item in flows")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-14 09:25:13 +01:00
Gregory Etelson
c06b773809 net/mlx5: fix integrity conversion scheme
RTE flow integrity API provides top-level packet validations.
RTE integrity bits are not always translated one-to-one to
hardware integrity bits.
For example RTE l3_ok and l4_ok integrity bits require 2 hardware
integrity bits each.

The patch fixes RTE l3_ok and l4_ok bits translation to match
ConnectX-6 hardware.

Fixes: 79f8952783 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-14 09:24:23 +01:00
Viacheslav Ovsiienko
11cfe349b3 net/mlx5: fix Tx scheduling check
There was a redundant check for the enabled E-Switch, this
resulted in device probing failure if the Tx scheduling was
requested and E-Switch was enabled.

Fixes: f17e4b4ffe ("net/mlx5: add Tx scheduling check on queue creation")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-14 09:18:28 +01:00
Viacheslav Ovsiienko
e53d873dc0 net/mlx5: fix modify field action conversion
The routine converting RTE flow modify field action into
field driver's presentation did not specify the field mask
correctly and this resulted into wrong conversion for
the actions with shifted fields.

Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-10 15:44:45 +01:00
Bing Zhao
6b5b3005cb net/mlx5: fix RETA update without stopping device
The global redirection table is used to create the default flow
rules for the ingress traffic with the lowest priority. It is also
used to create the default RSS rule in the destination table when
there is a tunnel offload.

To update the RETA in-flight, there is no restriction in the ethdev
API. In the previous implementation of mlx5, a port restart was
needed to make the new configuration take effect.

The restart is heavy, e.g., all the queues will be released and
reallocated, users' rules will be flushed. Since the restart is
internal, there is a risk to crash the application when some change
in the ethdev is introduced but no workaround is done in mlx5 PMD.

The users' rules, including the default miss rule for tunnel
offload, should not be impacted by the RETA update. It is improper
to flush all rules when updating RETA.

With this patch, only the default rules will be flushed and
re-created with the new table configuration.

Fixes: 3f2fe392bd ("net/mlx5: fix crash during RETA update")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-10 15:44:44 +01:00
Jiawei Wang
a9b6ea45be net/mlx5: fix tag ID conflict with sample action
For the flows containing sample action, the tag action was added
implicitly to store the unique flow index into metadata register in the
split prefix subflow, and then match on this index in the split suffix
subflow. The metadata register for flow index of sample split subflows
was also used to store application metadata TAG 0 item, this might cause
TAG 0 corruption in the flows with sample actions.

This patch uses the same metadata register C index as used for
ASO action since it's reserved and not used directly by the application,
and adds the checking in validation to make sure not to conflict
with ASO CT in the same flow.

Fixes: b4c0ddbfcc ("net/mlx5: split sample flow into two sub-flows")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-10 15:44:43 +01:00
Gregory Etelson
aaa6a7ec0f net/mlx5: fix tunnel offload validation
Tunnel offload API allows the application to restore packet to
its original form if the chain of flows is missed after DECAP action.

MLX5 PMD provides tunnel offload support only if DV API was enabled.

The patch verifies DV availability before processing with
tunnel offload tasks.

Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-10 15:44:43 +01:00
Dmitry Kozlyuk
7297d2cdec common/mlx5: fix external memory pool registration
Registration of packet mempools with RTE_PKTMBUF_POOL_PINNED_EXT_MEM
was performed incorrectly: after population of such mempool chunks
only contain memory for rte_mbuf structures, while pointers to actual
external memory are not yet filled. MR LKeys could not be obtained
for external memory addresses of such mempools. Rx datapath assumes
all used mempools are registered and does not fallback to dynamic
MR creation in such case, so no packets could be received.

Skip registration of extmem pools on population because it is useless.
If used for Rx, they are registered at port start.
During registration, recognize such pools, inspect their mbufs
and recover the pages they reside in.

While MRs for these pages may already be created by rte_dev_dma_map(),
they are not reused to avoid synchronization on Rx datapath
in case these MRs are changed in the database.

Fixes: 690b2a88c2 ("common/mlx5: add mempool registration facilities")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-10 15:44:42 +01:00
Rongwei Liu
0888c011d5 net/mlx5: fix meter policy validation
When a user specifies meter policy like "g_actions queue / end
y_actions queue / r_action drop / end", validation logic missed
to set meter policy mode and it took a random value from the stack.

Define ALL policy modes for the mentioned cases.

Fixes: 4b7bf3ffb4 ("net/mlx5: support yellow in meter policy validation")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Bing Zhao <bingz@nvidia.com>
2021-11-10 15:44:42 +01:00
Bing Zhao
0ad28e873c net/mlx5: fix RSS consistency check of meter policy
After yellow color actions in the metering policy were supported,
the RSS could be used for both green and yellow colors and only the
queues attribute could be different.

When specifying the attributes of a RSS, some fields can be ignored
and some default values will be used in PMD. For example, there is a
default RSS key in the PMD and it will be used to create the TIR if
nothing is provided by the application.

The default value cases were missed in the current implementation
and it would cause some false positives or crashes.

The comparison function should be adjusted to take all cases into
consideration when RSS is used for both green and yellow colors.

Fixes: 4b7bf3ffb4 ("net/mlx5: support yellow in meter policy validation")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-10 15:44:39 +01:00
Michael Baum
8451e165b8 net/mlx5: workaround MR creation for flow counter
Due to kernel driver / FW issues in direct MKEY creation using the DevX
API, this patch replaces the counter MR creation to use wrapped mkey
API.

Fixes: 5382d28c21 ("net/mlx5: accelerate DV flow counter transactions")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
2021-11-10 15:50:44 +01:00
Dmitry Kozlyuk
077be91dd7 net/mlx5: fix split buffer Rx
Routine to lookup LKey on Rx was assuming that the mbuf address
always belongs to a single mempool: the one associated with an RxQ
or the MPRQ mempool. This assumption is false for split buffers case.
A wrong LKey was looked up, resulting in completion errors.
Modify lookup routines to lookup LKey in the mbuf->pool
for non-MPRQ cases both on Rx datapath and on queue initialization.

Fixes: fec28ca0e3 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-08 13:56:29 +01:00
Michael Baum
5dfa003db5 common/mlx5: fix post doorbell barrier
The rdma-core library can map doorbell register in two ways, depending
on the environment variable "MLX5_SHUT_UP_BF":

  - as regular cached memory, the variable is either missing or set to
    zero. This type of mapping may cause the significant doorbell
    register writing latency and requires an explicit memory write
    barrier to mitigate this issue and prevent write combining.

  - as non-cached memory, the variable is present and set to not "0"
    value. This type of mapping may cause performance impact under
    heavy loading conditions but the explicit write memory barrier is
    not required and it may improve core performance.

The UAR creation function maps a doorbell in one of the above ways
according to the system. In run time, it always adds an explicit memory
barrier after writing to.
In cases where the doorbell was mapped as non-cached memory, the
explicit memory barrier is unnecessary and may impair performance.

The commit [1] solved this problem for a Tx queue. In run time, it
checks the mapping type and provides the memory barrier after writing to
a Tx doorbell register if it is needed. The mapping type is extracted
directly from the uar_mmap_offset field in the queue properties.

This patch shares this code between the drivers and extends the above
solution for each of them.

[1] commit 8409a28573
    ("net/mlx5: control transmit doorbell register mapping")

Fixes: f8c97babc9 ("compress/mlx5: add data-path functions")
Fixes: 8e196c08ab ("crypto/mlx5: support enqueue/dequeue operations")
Fixes: 4d4e245ad6 ("regex/mlx5: support enqueue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 16:21:03 +01:00
Michael Baum
b6e9c33c82 net/mlx5: remove duplicated reference of Tx doorbell
The Tx doorbell has different virtual addresses per process.
The secondary process takes the UAR physical page ID of the primary and
mmap it to its own virtual address.
The primary doorbell references were saved in two shared memory
locations: the TxQ structure and a dedicated doorbell array.

Remove the doorbell reference from the TxQ structure and move the
primary processes to take the UAR information from the primary doorbell
array.

Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 16:21:03 +01:00
Michael Baum
204891763c common/mlx5: make multi-process MR management port-agnostic
In the multi-process mechanism, there are things that the secondary
process does not perform itself but asks the primary process to perform
for it.
There is a special API for communication between the processes that
receives parameters necessary for the specific action required as well
as a special structure called mp_id that contains the port number of the
processes through which the initial process finds the relevant ETH
device for the processes.

One of the operations performed through this mechanism is the creation
of a memory region, where the secondary process sends the virtual
address as a parameter and the mp_id structure with the port number
inside it.
However, once the memory area management is shared between the drivers
and either port number or ETH device is no longer relevant to them, it
seems unnecessary to continue communicating between the processes
through the mp_id variable.

In this patch we will remove the use of the above structure for all MR
management, and add to the specific parameter of operations a pointer to
the common device that contains everything needed to create/register MR.

Fixes: 9f1d636f3e ("common/mlx5: share MR management")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 14:12:08 +01:00
Michael Baum
334ed198ab common/mlx5: remove redundant parameter in MR search
Memory region management has recently been shared between drivers,
including the search for caches in the data plane.
The initial search in the local linear cache of the queue, usually
yields a result and one should not continue searching in the next level
caches.

The function that searches in the local cache gets the pointer to a
device as a parameter, that is not necessary for its operation
but for subsequent searches (which, as mentioned, usually do not
happen).
Transferring the device to a function and maintaining it, takes some
time and causes some impact on performance.

Add the pointer to the device as a field of the mr_ctrl structure. The
field will be updated during control path and will be used only when
needed in the search.

Fixes: fc59a1ec55 ("common/mlx5: share MR mempool registration")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 14:11:16 +01:00
Bing Zhao
e848218741 net/mlx5: check delay drop settings in kernel driver
The delay drop is the common feature managed on per device basis
and the kernel driver is responsible one for the initialization and
rearming.

By default, the timeout value is set to activate the delay drop when
the driver is loaded.

A private flag "dropless_rq" is used to control the rearming. Only
when it is on, the rearming will be handled once received a timeout
event. Or else, the delay drop will be deactivated after the first
timeout occurs and all the Rx queues won't have this feature.

The PMD is trying to query this flag and warn the application when
some queues are created with delay drop but the flag is off.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-05 17:04:53 +01:00
Bing Zhao
febcac7b46 net/mlx5: support Rx queue delay drop
For the Ethernet RQs, if there all receiving descriptors are
exhausted, the packets being received will be dropped. This behavior
prevents slow or malicious software entities at the host from
affecting the network. While for hairpin cases, even if there is no
software involved during the packet forwarding from Rx to Tx side,
some hiccup in the hardware or back pressure from Tx side may still
cause the descriptors to be exhausted. In certain scenarios it may be
preferred to configure the device to avoid such packet drops,
assuming the posting of descriptors will resume shortly.

To support this, a new devarg "delay_drop" is introduced. By default,
the delay drop is enabled for hairpin Rx queues and disabled for
standard Rx queues. This value is used as a bit mask:
  - bit 0: enablement of standard Rx queue
  - bit 1: enablement of hairpin Rx queue
And this attribute will be applied to all Rx queues of a device.

The "rq_delay_drop" capability in the HCA_CAP is checked before
creating any queue. If the hardware capabilities do not support
this delay drop, all the Rx queues will still be created without
this attribute, and the devarg setting will be ignored even if it
is specified explicitly. A warning log is used to notify the
application when this occurs.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-05 17:04:53 +01:00
Viacheslav Ovsiienko
25ed2ebff1 net/mlx5: support shared Rx queue port data path
When receive packet, mlx5 PMD saves mbuf port number from
RxQ data.

To support shared RxQ, save port number into RQ context as user index.
Received packet resolve port number from CQE user index which derived
from RQ context.

Legacy Verbs API doesn't support RQ user index setting, still read from
RxQ port number.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:51 +01:00
Xueming Li
09c2555303 net/mlx5: support shared Rx queue
This patch introduces shared RxQ. All shared Rx queues with same group
and queue ID share the same rxq_ctrl. Rxq_ctrl and rxq_data are shared,
all queues from different member port share same WQ and CQ, essentially
one Rx WQ, mbufs are filled into this singleton WQ.

Shared rxq_data is set into device Rx queues of all member ports as
RxQ object, used for receiving packets. Polling queue of any member
ports returns packets of any member, mbuf->port is used to identify
source port.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:50 +01:00
Xueming Li
5cf0707fc7 net/mlx5: remove Rx queue data list from device
Rx queue data list(priv->rxqs) can be replaced by Rx queue
list(priv->rxq_privs), removes it and replaces with universal wrapper
API.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:49 +01:00
Xueming Li
5ceb3a02b0 net/mlx5: move Rx queue DevX resource
To support shared RX queue, moves DevX RQ which is per queue resource to
Rx queue private data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:48 +01:00
Xueming Li
5db77fef78 net/mlx5: remove port info from shareable Rx queue
To prepare for shared Rx queue, removes port info from shareable Rx
queue control.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:47 +01:00
Xueming Li
44126bd9d0 net/mlx5: move Rx queue hairpin info to private data
Hairpin info of Rx queue can't be shared, moves to private queue data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:47 +01:00
Xueming Li
0cedf34da7 net/mlx5: move Rx queue reference count
Rx queue reference count is counter of RQ, used to count reference to RQ
object. To prepare for shared Rx queue, this patch moves it from
rxq_ctrl to Rx queue private data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:46 +01:00
Xueming Li
4cda06c3c3 net/mlx5: split Rx queue into shareable and private
To prepare shared Rx queue, splits RxQ data into shareable and private.
Struct mlx5_rxq_priv is per queue data.
Struct mlx5_rxq_ctrl is shared queue resources and data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:45 +01:00
Xueming Li
53232e3b05 net/mlx5: clean Rx queue code
This patch removes unused Rx queue code.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:45 +01:00
Xueming Li
fdb67b84a5 net/mlx5: fix Rx queue memory allocation return value
If error happened during Rx queue mbuf allocation, boolean value
returned. From description, return value should be error number.

This patch returns negative error number.

Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:44 +01:00
Xueming Li
056c87d07d common/mlx5: support receive memory pool
The hardware Receive Memory Pool (RMP) object holds the destination for
incoming packets/messages that are routed to the RMP through RQs. RMP
enables sharing of memory across multiple Receive Queues. Multiple
Receive Queues can be attached to the same RMP and consume memory
from that shared poll. When using RMPs, completions are reported to the
CQ pointed to by the RQ, user index that set in RQ creation time is
carried to completion entry.

This patch enables RMP based RQ, RMP is created when mlx5_devx_rq.rmp is
set.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:43 +01:00
Xueming Li
68fa62924d net/mlx5: fix Altivec Rx
This patch fixes stale field reference.

Fixes: a18ac61133 ("net/mlx5: add metadata support to Rx datapath")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:41 +01:00
Gregory Etelson
a23e9b6e3e net/mlx5: handle flex item in flows
Provide flex item recognition, validation and translation
in flow patterns. Track the flex item referencing.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:41 +01:00
Viacheslav Ovsiienko
6dac7d7ff2 net/mlx5: translate flex item pattern into matcher
The matcher is an steering engine entity that represents
the flow pattern to hardware to match. It order to
provide match on the flex item pattern the appropriate
matcher fields should be configured with values and masks
accordingly.

The flex item related matcher fields is an array of eight
32-bit fields to match with data captured by sample registers
of configured flex parser. One packet field, presented in
item pattern can be split between several sample registers,
and multiple fields can be combined together into single
sample register to optimize hardware resources usage
(number os sample registers is limited), depending on field
modes, widths and offsets. Actual mapping is complicated
and controlled by special translation data, built by PMD
on flex item creation.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:40 +01:00
Viacheslav Ovsiienko
b293e8e49d net/mlx5: translate flex item configuration
RTE Flow flex item configuration should be translated
into actual hardware settings:

  - translate header length and next protocol field samplings
  - translate data field sampling, the similar fields with the
    same mode and matching related parameters are relocated
    and grouped to be covered with minimal amount of hardware
    sampling registers (each register can cover arbitrary
    neighbour 32 bits (aligned to byte boundary) in the packet
    and we can combine the fields with smaller lengths or
    segments of bigger fields)
  - input and output links translation
  - preparing data for parsing flex item pattern on flow creation

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:39 +01:00
Gregory Etelson
9086ac093a net/mlx5: add flex parser DevX object management
The DevX flex parsers can be shared between representors
within the same IB context. We should put the flex parser
objects into the shared list and engage the standard
mlx5_list_xxx API to manage ones.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:38 +01:00
Viacheslav Ovsiienko
db25cadc08 net/mlx5: add flex item operations
This patch is a preparation step of implementing
flex item feature in driver and it provides:

  - external entry point routines for flex item
    creation/deletion

  - flex item objects management over the ports.

The flex item object keeps information about
the item created over the port - reference counter
to track whether item is in use by some active
flows and the pointer to underlying shared DevX
object, providing all the data needed to translate
the flow flex pattern into matcher fields according
hardware configuration.

There is not too many flex items supposed to be
created on the port, the design is optimized
rather for flow insertion rate than memory savings.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:38 +01:00
Viacheslav Ovsiienko
575740d10e net/mlx5: update eCPRI flex parser structures
To handle eCPRI protocol in the flows the mlx5 PMD engages
flex parser hardware feature. While we were implementing
eCPRI support we anticipated the flex parser usage extension,
and all related variables were named accordingly, containing
flex syllabus. Now we are preparing to introduce more common
approach of flex item, in order to avoid naming conflicts
and improve the code readability the eCPRI infrastructure
related variables are renamed as preparation step.

Later, once we have the new flex item implemented, we could
consider to refactor the eCPRI protocol support  to move on
common flex item basis.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:37 +01:00
Dmitry Kozlyuk
ec4e11d41d net/mlx5: preserve indirect actions on restart
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop shared RSS actions kept references to RX queues,
preventing resource release. As a result, internal PMD mempool
for such queues had been exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():

    Rx queue allocation failed: Cannot allocate memory

Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.

When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.

Fixes: 4b61b8774b ("ethdev: introduce indirect flow action")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-02 18:59:17 +01:00
Dmitry Kozlyuk
bc5bee028e net/mlx5: create drop queue using DevX
Drop queue creation and destruction were not implemented for DevX
flow engine and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-02 18:59:17 +01:00
Dmitry Kozlyuk
c5042f93a4 net/mlx5: discover max flow priority using DevX
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-02 18:59:17 +01:00
Lior Margalit
a451287102 net/mlx5: fix RSS expansion with EtherType
The RSS expansion algorithm is using a graph to find the possible
expansion paths. A graph node with the 'explicit' flag will be skipped,
if it is not found in the flow pattern.
The current implementation misses a check for the explicit flag when
expanding the pattern according to ETH item with EtherType.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / eth
type is 2048 / end actions rss level 2 types udp end / end
The "eth type is 2048" item in the pattern may be expanded to "ETH IPv4".
The ETH node in the expansion graph is followed by VLAN node marked as
explicit. The fix is to skip the VLAN node and continue the expansion
with its next nodes, IPv4 and IPv6.
The expansion paths for the above example will be:
ETH IPV6 UDP VXLAN ETH END
ETH IPV6 UDP VXLAN ETH IPV4 UDP END

Fixes: 69d268b4ff ("net/mlx5: fix RSS expansion for explicit graph node")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-01 14:53:37 +01:00
Jiawei Wang
7797b0fe64 net/mlx5: fix meter action pool protection
The ASO meter action with flows creation could be supported on
multiple threads. The meter pools were created to manage the meter
object resources, if there is no room in the current meter pool then
resize the meter pool to the new pool size and free the old one.

There's a race condition while one thread resizes the meter pool and
the old pool resource be freed, and another thread query the meter
object by index on the old pool, the return value is invalid.

This patch adds a read-write lock to protect the pool resource while
resizing and query.

Fixes: a5835d530f ("net/mlx5: optimize Rx queue match")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-01 14:53:36 +01:00
Jiawei Wang
7cf2d15a39 net/mlx5: fix age action pool protection
The age action with flows creation could be supported on the multiple
threads. The age pools were created to manage the age resources, if
there is no room in the current pool then resize the age pool to the new
pool size and free the old one.

There's a race condition while one thread resizes the age pool and the
old pool resource be freed, and another thread query the age action
value of the old pool so the queried value is invalid.

This patch uses the read-write lock to protect the pool resource while
resizing and query.

Fixes: a5835d530f ("net/mlx5: optimize Rx queue match")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-01 14:53:35 +01:00
David Marchand
7f49dafe05 net/mlx5: do not close stdin on error
If for any reason, a socket could not be opened, mlx5_pmd_socket_init()
could close the 0 fd (which is valid, and has a fair chance to be stdin),
since server_socket == 0 from the variable being in .bss.

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
2021-11-01 08:51:48 +01:00
Alexander Kozyrev
a5a0a43bc6 net/mlx5: allow meta modifications in legacy mode
The MODIFY_FIELD RTE action rejects copy to/from metadata
in case of the legacy mode extensive flow metadata support.
It is not consistent with SET_META action that has no such
restriction imposed. Registers A or B are used for META in
legacy mode. Allow meta modifications in legacy mode as well.

On other hand, SET_META rejects actions in case register C
is not available even though it is not needed in legacy mode.
Skip this check for legacy mode and allow setting META.

Fixes: edf325d421 ("net/mlx5: check extended metadata for meta modification")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-31 13:31:13 +01:00
Alexander Kozyrev
f4f8f5aee3 net/mlx5: fix Tx meta width for modify field flow rule
Register C is used for the metadata within NIC Rx domain.
And its width can vary from 0 to 32 bits depending on
its kernel usage. But it is not the case within NIC Tx domain,
register A is always 32 bits there. Fix metadata width detection
for the modify_field flow API within NIC Tx domain.

Fixes: 6d5735c1cb ("net/mlx5: fix meta register conversion for extensive mode")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-31 13:31:12 +01:00
Maxime Coquelin
5aeb7fab59 net/mlx5: fix RSS RETA update
This patch fixes RETA updating for entries above 64.
Without that, these entries are never updated as
calculated mask value will always be 0.

Fixes: 634efbc2c8 ("mlx5: support RETA query and update")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-29 11:23:10 +02:00
Gregory Etelson
23b0a8b298 net/mlx5: fix integrity item validation and translation
Integrity item validation and translation must verify that integrity
item bits match L3 and L4 items in flow rule pattern.
For cases when integrity item was positioned before L3 header, such
verification must be split into two stages.
The first stage detects integrity flow item and makes initializations
for the second stage.
The second stage is activated after PMD completes processing of all
flow items in rule pattern. PMD accumulates information about flow
items in flow pattern. When all pattern flow items were processed,
PMD can apply that data to complete integrity item validation
and translation.

Fixes: 79f8952783 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-28 10:14:39 +02:00
Gregory Etelson
06741117ec net/mlx5: fix integrity match on inner and outer headers
MLX5 PMD can match on integrity bits for inner and outer headers in
a single flow.
That means a single flow rule can reference both inner and outer
integrity bits. That is implemented by adding 2 flow integrity items
to a rule - one item for outer integrity bits and other for
inner integrity bits.
Integrity item `level` parameter specifies what part is being
targeted.

Current PMD treated integrity items for outer and inner headers as
the same.
The patch separates PMD verifications for inner and outer integrity
items.

Fixes: 79f8952783 ("net/mlx5: support integrity flow item")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-28 10:14:38 +02:00
Haifei Luo
a7ac7fae49 net/mlx5: enhance flow dump
Multiple rules could use the same encap_decap/modify_hdr/counter action.
The flow dump data could be duplicated.

To avoid redundancy, flow dump value is based on the actions' pointer
instead of previous rules' pointer.

For counter, the data is stored in cmng of priv->sh.
For encap_decap/modify_hdr, the data stored in encaps_decaps/modify_cmds.
Traverse the fields and get action's pointer and information.

Formats are same for information in the dump except "id" stands for
actions' pointer:
    Counter:     rec_type,id,hits,bytes
    Modify_hdr:  rec_type,id,actions_number,actions
    Encap_decap: rec_type,id,buf

Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-28 10:14:21 +02:00
Jiawei Wang
3c4338a421 net/mlx5: optimize device spawn time with representors
During the device spawn process, mlx5 PMD queried the available flow
priorities by calling mlx5_flow_discover_priorities, queried
if the DR drop action was supported on the root table by calling
the mlx5_flow_discover_dr_action_support routine, and queried the
availability of metadata register C by calling mlx5_flow_discover_mreg_c

These functions created the test flows to get the supported fields, and
at the end destroyed the test flows. The test flows in the first two
functions was created on the root table.
If the device was spawned with multiple representors, these test flows
were created and destroyed on each representor as well. The above
operations took a significant amount of init time during the device
spawn.

This patch optimizes the device discover functions, if there is
the device with multiple representors (VF/SF) being spawned,
the priority and drop action and metadata register support check can be
done only ones and check results can be shared for all representors.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-27 14:04:39 +02:00
Rongwei Liu
7299ab6822 net/mlx5: support socket direct mode bonding
In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-26 13:24:20 +02:00
Harman Kalra
d61138d4f0 drivers: remove direct access to interrupt handle
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the drivers access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Olivier Matz
daa02b5cdd mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-24 13:37:43 +02:00
Olivier Matz
5b63493241 mbuf: mark old VLAN offload flags as deprecated
The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT are
marked as deprecated since commit 380a7aab1a ("mbuf: rename deprecated
VLAN flags") (2017). But they were not using the RTE_DEPRECATED
macro, because it did not exist at this time. Add it, and replace
usage of these flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-24 13:30:40 +02:00
Ferruh Yigit
295968d174 ethdev: add namespace
Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-22 18:15:38 +02:00
Rongwei Liu
a89f6433aa net/mlx5: set Tx queue affinity in round-robin
Previously, we set txq affinity to 0 and let firmware
to perform round-robin when bonding. Firmware uses a
global counter to assign txq affinity to different
physical ports accord to remainder after division.

There are three dis-advantages:
1. The global counter is shared between kernel and dpdk.
2. After restarting pmd or port, the previous counter value
is reused, so the new affinity is unpredictable.
3. There is no way to get what affinity is set by firmware.

In this update, we will create several TISs up to the
number of bonding ports and bind each TIS to one PF port.

For each port, it will start to pick up TIS using its port
index. Upper layer application can quickly calculate each txq's
affinity without querying.

At DPDK layer, when creating txq with 2 bonding ports, the
affinity is set like:
port 0: 1-->2-->1-->2
port 1: 2-->1-->2-->1
port 2: 1-->2-->1-->2

Note: Only applicable to DevX api.
This affinity subjects to HW hash.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 12:37:00 +02:00
Dmitry Kozlyuk
ea823b2c51 net/mlx5: close tools socket with last device
MLX5 PMD exposes a socket for external tools to dump port state.
Socket events are listened using an interrupt source of EXT type.
The socket was closed and the interrupt callback was unregistered
at program exit, which is incorrect because DPDK could be already
shut down at this point. Move actions performed at program exit
to the moment the last MLX5 port is closed. The socket will be opened
again if later a new MLX5 device is plugged in and probed.
Also fix comments that were decisively talking
about secondary processes instead of external tools.

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Reported-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-21 10:31:53 +02:00
Dmitry Kozlyuk
9ec1ceab76 net/mlx5: fix Rx queue resource cleanup
mlx5_rxq_start() allocates rxq_ctrl->obj and frees it on failure,
but did not set it to NULL. Later mlx5_rxq_release() could not recognize
this object is already freed and attempted to release its resources,
resulting in a crash:

    Configuring Port 0 (socket 0)
    mlx5_common: Failed to create RQ using DevX
    mlx5_common: Can't create DevX RQ object.
    mlx5_net: Port 0 Rx queue 0 RQ creation failure.
    Segmentation fault

Set rxq_ctrl->obj to NULL after it is freed to skip resource release.

Fixes: 1260a87b28 ("net/mlx5: share Rx control code")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 09:31:17 +02:00
Bing Zhao
273b09376c net/mlx5: fix meter yellow policy with RSS action
The RSS configuration in a policy action container was a pointer
inside a union, and the pointer area could be used as other fate
action. In the current implementation, the RSS of the green color
was prior to that of the yellow color. There was a high possibility
the pointer was considered as the RSS and result in a error flow
expansion when only the yellow color had the RSS action.

The check of the fate action type should also be done to get rid of
the misjudgment.

Fixes: b38a12272b ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 09:31:15 +02:00
Xueming Li
614966c2fa net/mlx5: check DevX to support more Verbs ports
Verbs API doesn't support device port number larger than 255 by design.

To support more VF or SubFunction port representors, forces DevX API
check when max Verbs device link ports larger than 255.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:14 +02:00
Xueming Li
686d05b60d net/mlx5: enable DevX Tx queue creation
Verbs API does not support Infiniband device port number larger 255 by
design. To support more representors on a single Infiniband device DevX
API should be engaged.

While creating Send Queue (SQ) object with Verbs API, the PMD assigned
IB device port attribute and kernel created the default miss flows in
FDB domain, to redirect egress traffic from the queue being created to
representor appropriate peer (wire, HPF, VF or SF).

With DevX API there is no IB-device port attribute (it is merely kernel
one, DevX operates in PRM terms) and PMD must create default miss flows
in FDB explicitly. PMD did not provide this and using DevX API for
E-Switch configurations was disabled.

The default miss FDB flow matches E-Switch manager vport (to make sure
the source is some representor) and SQn (Send Queue number - device
internal queue index). The root flow table managed by kernel/firmware
and it does not support vport redirect action, we have to split the
default miss flow into two ones:

- flow with lowest priority in the root table that matches E-Switch
manager vport ID and jump to group 1.
- flow in group 1 that matches E-Switch manager vport ID and SQn and
forwards packet to peer vport

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:13 +02:00
Xueming Li
ebe9afedc7 net/mlx5: fix internal root table flow priority
When creating internal transfer flow on root table with lowest
priority, the flow was created with max UINT32_MAX priority. It is wrong
since the flow is created in kernel and  max priority supported is 16.

This patch fixes this by adding internal flow check.

Fixes: 5f8ae44dd4 ("net/mlx5: enlarge maximal flow priority")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:12 +02:00
Xueming Li
d9020f2577 net/mlx5: support flow item of normal Tx queue
Extends txq flow pattern to support both hairpin and regular txq.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:11 +02:00
Xueming Li
a564038699 net/mlx5: support E-Switch manager egress traffic match
For egress packet on representor, the vport ID in transport domain
is E-Switch manager vport ID since representor shares resources of
E-Switch manager. E-Switch manager vport ID and Tx queue internal device
index are used to match representor egress packet.

This patch adds flow item port ID match on E-Switch manager.

E-Switch manager vport ID is 0xfffe on BlueField, 0 otherwise.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:10 +02:00
Xueming Li
1d47e9335e net/mlx5: improve Verbs flow priority discovery
To detect number flow Verbs flow priorities, PMD try to create Verbs
flows in different priority. While Verbs is not designed to support
ports larger than 255.

When DevX supported by kernel driver, 16 Verbs priorities must be
supported, no need to create Verbs flows.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:09 +02:00
Xueming Li
3fd2961efa net/mlx5: use Netlink when IB port greater than 255
IB spec doesn't allow 255 ports on a single HCA, port number of 256 was
cast to u8 value 0 which invalid to ibv_query_port()

This patch invokes Netlink API to query port state when port number
greater than 255.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:08 +02:00
Michael Baum
fc59a1ec55 common/mlx5: share MR mempool registration
Expand the use of mempool registration to MR management for other
drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:58:00 +02:00
Michael Baum
a5d06c9006 common/mlx5: support device DMA map and unmap
Since MR management has moved to the common area, there is no longer a
need for the DMA map and unmap function for each driver.
This patch share those functions. For most drivers it supports these
operations for the first time.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:58:00 +02:00
Michael Baum
9f1d636f3e common/mlx5: share MR management
Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:57:58 +02:00
Michael Baum
5fbc75ace1 common/mlx5: add global MR cache create function
Add function for global shared MR cache structure initialization.
This function include:
 - btree initialization.
 - set callbacks for reg and dereg MR.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:57:24 +02:00
Michael Baum
85c7005e84 common/mlx5: add MR control initialization
Add function for MR control structure initialization.
This function include:
 - btree initialization.
 - dev_gen_ptr initialization.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:54:32 +02:00
Michael Baum
05fa53d6a0 net/mlx5: remove redundancy in MR file
This patch remove two redundant things from MR file:

1. mr_find_contig_memsegs_data structure which is moved to common file
   before.
2. External memory mechanism - mlx5_tx_update_ext_mp function.
   Since commit [1] which added support for DMA map and unmap, external
   mem must be configured by the user using rte_mem_map function and no
   need to handle this in pmd.

[1]
commit 989e999d93
("net/mlx5: support PCI device DMA map and unmap")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:46 +02:00
Michael Baum
fe46b20c96 common/mlx5: share HCA capabilities handle
Add HCA attributes structure as a field of device config structure.
It query in common probing, and updates the timestamp format fields.

Each driver use HCA attributes from common device config structure,
instead of query it for itself.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:46 +02:00
Michael Baum
e35ccf243b common/mlx5: share protection domain object
Create shared Protection Domain in common area and add it and its PDN as
fields of common device structure.

Use this Protection Domain in all drivers and remove the PD and PDN
fields from their private structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:46 +02:00
Michael Baum
ca1418ce39 common/mlx5: share device context object
Create shared context device in common area and add it as a field of
common device.
Use this context device in all drivers and remove the ctx field from
their private structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:44 +02:00
Michael Baum
5bc38358b5 net/mlx5: remove redundant flag in device config
Device configure structure has flag named devx as same as SH structure
with the same meaning.

Remove the flag from the configuration structure and move all the
usages to the SH flag.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:36 +02:00
Michael Baum
887183effa common/mlx5: move basic probing functions to common
Move open IBV/DevX device function to common.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:32 +02:00
Michael Baum
5021ce2085 net/mlx5: rearrange probing functions for Windows
Rearrange device detection code.
Rearrange configuration structures filling.
Remove unneeded variables.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:39:24 +02:00
Michael Baum
8520992403 common/mlx5: share memory related devargs
Add device configure structure and function to parse user device
arguments into it.
Move parsing and management of relevant device arguments to common.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:39:04 +02:00
Michael Baum
a77bedf255 common/mlx5: share common definitions
Create MACRO definitions file in the common driver as preparation for MR
and basic probe sharing.
Move relevant definitions from the net driver to the above file.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:39:01 +02:00
Michael Baum
7af08c8f1a common/mlx5: share basic probing with internal drivers
Create common probing structure that includes, for now, basic probing
information detected by the common driver and share it with all the
internal drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:38:46 +02:00
Michael Baum
620be7f27b net/mlx5: register memory event callback in Windows
In device initialization, the driver registers to free hugepages events.
When hugepage is released, this callback frees all its related MRs.

In Windows initialization, this callback is not registered what may
cause to use invalid memory.

This patch adds memory event callback registration in Windows
initialization.

Fixes: 980826dc6f ("net/mlx5: probe on Windows")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:37:23 +02:00
Lior Margalit
0c3fa68396 net/mlx5: fix RSS expansion for L2/L3 VXLAN
The RSS expansion algorithm is using a graph to find the possible
expansion paths. The current implementation does not differentiate
between standard (L2) VXLAN and L3 VXLAN. As result the flow is expanded
with all possible paths.
For example:
testpmd> flow create... / vxlan / end actions rss level 2 / end
It is currently expanded to the following paths:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN ETH IPV4 END
ETH IPV4 UDP VXLAN ETH IPV6 END
ETH IPV4 UDP VXLAN IPV4 END
ETH IPV4 UDP VXLAN IPV6 END

The fix is to adjust the expansion according to the outer UDP destination
port. In case flow pattern defines a match on the standard udp port, 4789,
or does not define a match on the destination port, which also implies
setting the standard one, the expansion for the above example will be:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN ETH IPV4 END
ETH IPV4 UDP VXLAN ETH IPV6 END
Otherwise, the expansion will be:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN IPV4 END
ETH IPV4 UDP VXLAN IPV6 END

Fixes: f4f06e3615 ("net/mlx5: add flow VXLAN item")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-18 09:12:42 +02:00
Eli Britstein
292be511d2 net/mlx5: support more tunnel types
Accept RTE_FLOW_ITEM_TYPE_GRE, RTE_FLOW_ITEM_TYPE_NVGRE and
RTE_FLOW_ITEM_TYPE_GENEVE as valid tunnel types.

Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-19 23:51:10 +02:00
Ferruh Yigit
b563c14212 ethdev: remove jumbo offload flag
Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag.

Instead of drivers announce this capability, application can deduct the
capability by checking reported 'dev_info.max_mtu' or
'dev_info.max_rx_pktlen'.

And instead of application setting this flag explicitly to enable jumbo
frames, this can be deduced by driver by comparing requested 'mtu' to
'RTE_ETHER_MTU'.

Removing this additional configuration for simplification.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Michal Krawczyk <mk@semihalf.com>
2021-10-18 19:20:21 +02:00
Ferruh Yigit
1bb4a528c4 ethdev: fix max Rx packet length
There is a confusion on setting max Rx packet length, this patch aims to
clarify it.

'rte_eth_dev_configure()' API accepts max Rx packet size via
'uint32_t max_rx_pkt_len' field of the config struct 'struct
rte_eth_conf'.

Also 'rte_eth_dev_set_mtu()' API can be used to set the MTU, and result
stored into '(struct rte_eth_dev)->data->mtu'.

These two APIs are related but they work in a disconnected way, they
store the set values in different variables which makes hard to figure
out which one to use, also having two different method for a related
functionality is confusing for the users.

Other issues causing confusion is:
* maximum transmission unit (MTU) is payload of the Ethernet frame. And
  'max_rx_pkt_len' is the size of the Ethernet frame. Difference is
  Ethernet frame overhead, and this overhead may be different from
  device to device based on what device supports, like VLAN and QinQ.
* 'max_rx_pkt_len' is only valid when application requested jumbo frame,
  which adds additional confusion and some APIs and PMDs already
  discards this documented behavior.
* For the jumbo frame enabled case, 'max_rx_pkt_len' is an mandatory
  field, this adds configuration complexity for application.

As solution, both APIs gets MTU as parameter, and both saves the result
in same variable '(struct rte_eth_dev)->data->mtu'. For this
'max_rx_pkt_len' updated as 'mtu', and it is always valid independent
from jumbo frame.

For 'rte_eth_dev_configure()', 'dev->data->dev_conf.rxmode.mtu' is user
request and it should be used only within configure function and result
should be stored to '(struct rte_eth_dev)->data->mtu'. After that point
both application and PMD uses MTU from this variable.

When application doesn't provide an MTU during 'rte_eth_dev_configure()'
default 'RTE_ETHER_MTU' value is used.

Additional clarification done on scattered Rx configuration, in
relation to MTU and Rx buffer size.
MTU is used to configure the device for physical Rx/Tx size limitation,
Rx buffer is where to store Rx packets, many PMDs use mbuf data buffer
size as Rx buffer size.
PMDs compare MTU against Rx buffer size to decide enabling scattered Rx
or not. If scattered Rx is not supported by device, MTU bigger than Rx
buffer size should fail.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
2021-10-18 19:20:20 +02:00
Li Zhang
771253ea8f net/mlx5: fix domains selection for meter policy
Fate actions are different per domain.
When all the domains, ingress, egress and FDB (transfer),
can support all the policy actions, i.e. [SET_TAG],
the policy prepares resources for all the domains and
failure happens if one of the domains misses its fate action
in the policy action list.

Remove the domains missing their fate action
from the meter policy preparation.

Now, the policy will prepare a domain only when the domain supports
all the actions and when one of the domain fate actions is on the list.

Fixes: afb4aa4f12 ("net/mlx5: support meter policy operations")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-14 10:48:33 +02:00
Viacheslav Ovsiienko
40c8fb1fd3 net/mlx5: update modify field action
Update immediate value/pointer source operand support
for modify field RTE Flow action:

  - source operand data can be presented by byte buffer
    (instead of former uint64_t) or by pointer
  - no host byte ordering is assumed anymore for immediate
    data buffer (not uint64_t anymore)
  - no immediate value offset is expected (the source
    subfield is located at the same offset as in destination)

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-14 14:34:31 +02:00
Andrew Rybchenko
d35dd287a2 net/mlx5: support represented port flow action
Semantics of the existing support for action PORT_ID suggests
that support for equal action REPRESENTED_PORT be implemented.

Helper functions keep port_id suffix since action
MLX5_FLOW_ACTION_PORT_ID is still used internally.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-13 22:59:26 +02:00
Konstantin Ananyev
8d7d4fcdca ethdev: change input parameters for Rx queue count
Currently majority of fast-path ethdev ops take pointers to internal
queue data structures as an input parameter.
While eth_rx_queue_count() takes a pointer to rte_eth_dev and queue
index.
For future work to hide rte_eth_devices[] and friends it would be
plausible to unify parameters list of all fast-path ethdev ops.
This patch changes eth_rx_queue_count() to accept pointer to internal
queue data as input parameter.
While this change is transparent to user, it still counts as an ABI change,
as eth_rx_queue_count_t is used by ethdev public inline function
rte_eth_rx_queue_count().

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-13 22:14:58 +02:00
Tal Shnaiderman
c8834a3663 net/mlx5: support keeping CRC on Windows
Support of the keep-CRC offloading by checking
the relevant FW capability (scatter_fcs) for NIC support.

Supported offload:

DEV_RX_OFFLOAD_KEEP_CRC

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:39 +02:00
Tal Shnaiderman
6061cc4148 net/mlx5: support VLAN stripping offload on Windows
Support of the VLAN stripping offloading by checking
the relevant FW capability (vlan_cap) for NIC support.

Supported offload:

DEV_RX_OFFLOAD_VLAN_STRIP

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:38 +02:00
Tal Shnaiderman
738da9a867 net/mlx5: support TSO offload on Windows
Support of the TSO offloading by checking
the relevant FW capability for NIC support.

Supported offloads:

DEV_TX_OFFLOAD_TCP_TSO
DEV_TX_OFFLOAD_VXLAN_TNL_TSO
DEV_TX_OFFLOAD_GRE_TNL_TSO
DEV_TX_OFFLOAD_GENEVE_TNL_TSO

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:37 +02:00
Tal Shnaiderman
6a86ee2e6d net/mlx5: query tunneling support on Windows
Query tunneling supported on the NIC.

Save the offloads values in a config parameter.
This is needed for the following TSO support:

DEV_TX_OFFLOAD_VXLAN_TNL_TSO
DEV_TX_OFFLOAD_GRE_TNL_TSO
DEV_TX_OFFLOAD_GENEVE_TNL_TSO

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:36 +02:00
Tal Shnaiderman
c1a320bf89 net/mlx5: fix tunneling support query
Currently, the PMD decides if the tunneling offload
can enable VXLAN/GRE/GENEVE tunneled TSO support by checking
config->tunnel_en (single bit) and config->tso.

This is incorrect, the right way is to check the following
flags returned by the mlx5dv_query_device function:

MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_VXLAN - if supported the offload
DEV_TX_OFFLOAD_VXLAN_TNL_TSO can be enabled.
MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_GRE - if supported the offload
DEV_TX_OFFLOAD_GRE_TNL_TSO can be enabled.
MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_GENEVE - if supported the offload
DEV_TX_OFFLOAD_GENEVE_TNL_TSO can be enabled.

The fix enables the offloads according to the correct
flags returned by the kernel.

Fixes: dbccb4cddc ("net/mlx5: convert to new Tx offloads API")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:34 +02:00
Tal Shnaiderman
d47fe9dabc net/mlx5: query software parsing support on Windows
Query software parsing supported on the NIC.

Save the offloads values in a config parameter.
This is needed for the outer IPv4 checksum and
IP and UDP tunneled packet TSO support.

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:34 +02:00
Tal Shnaiderman
accf3cfce4 net/mlx5: fix software parsing support query
Currently, the PMD decides if the software parsing
offload can enable outer IPv4 checksum and tunneled
TSO support by checking config->hw_csum and config->tso
respectively.

This is incorrect, the right way is to check the following
flags returned by the mlx5dv_query_device function:

MLX5DV_SW_PARSING - check general swp support.
MLX5DV_SW_PARSING_CSUM - check swp checksum support.
MLX5DV_SW_PARSING_LSO - check swp LSO/TSO support.

The fix enables the offloads according to the correct
flags returned by the kernel.

Fixes: e46821e9fc ("net/mlx5: separate generic tunnel TSO from the standard one")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:25 +02:00
Andrew Rybchenko
92ef4b8f16 ethdev: remove deprecated shared counter attribute
Indirect actions should be used to do shared counters.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-12 19:20:57 +02:00
Viacheslav Galaktionov
ff4e52efb3 ethdev: fix representor port ID search by name
The patch is required for all PMDs which do not provide representors
info on the representor itself.

The function, rte_eth_representor_id_get(), is used in
eth_representor_cmp() which is required in ethdev class iterator to
search ethdev port ID by name (representor case). Before the patch
the function is called on the representor itself and tries to get
representors info to match.

Search of port ID by name is used after hotplug to find out port ID
of the just plugged device.

Getting a list of representors from a representor does not make sense.
Instead, a backer device should be used.

To this end, extend the rte_eth_dev_data structure to include the port ID
of the backing device for representors.

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-12 16:54:20 +02:00
Andrew Rybchenko
c47d7b90a1 mempool: add namespace to flags
Fix the mempool flags namespace by adding an RTE_ prefix to the name.
The old flags remain usable, to be deprecated in the future.

Flag MEMPOOL_F_NON_IO added in the release is just renamed to have RTE_
prefix.

Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-20 10:00:16 +02:00
Dmitry Kozlyuk
fec28ca0e3 net/mlx5: support mempool registration
When the first port in a given protection domain (PD) starts,
install a mempool event callback for this PD and register all existing
memory regions (MR) for it. When the last port in a PD closes,
remove the callback and unregister all mempools for this PD.
This behavior can be switched off with a new devarg: mr_mempool_reg_en.

On TX slow path, i.e. when an MR key for the address of the buffer
to send is not in the local cache, first try to retrieve it from
the database of registered mempools. Supported are direct and indirect
mbufs, as well as externally-attached ones from MLX5 MPRQ feature.
Lookup in the database of non-mempool memory is used as the last resort.

RX mempools are registered regardless of the devarg value.
On RX data path only the local cache and the mempool database is used.
If implicit mempool registration is disabled, these mempools
are unregistered at port stop, releasing the MRs.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-19 16:35:16 +02:00
Xueming Li
7483341ae5 ethdev: change queue release callback
Currently, most ethdev callback API use queue ID as parameter, but Rx
and Tx queue release callback use queue object which is used by Rx and
Tx burst data plane callback.

To align with other eth device queue configuration callbacks:
- queue release callbacks are changed to use queue ID
- all drivers are adapted

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-06 19:16:03 +02:00
Dmitry Kozlyuk
04d43857ea net: rename Ethernet header fields
Definition of `rte_ether_addr` structure used a workaround allowing DPDK
and Windows SDK headers to be used in the same file, because Windows SDK
defines `s_addr` as a macro. Rename `s_addr` to `src_addr` and `d_addr`
to `dst_addr` to avoid the conflict and remove the workaround.
Deprecation notice:
https://mails.dpdk.org/archives/dev/2021-July/215270.html

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-10-08 14:58:11 +02:00
Bing Zhao
6728fe9305 net/mlx5: fix Tx metadata endianness in data path
The metadata can be set in the mbuf dynamic field and then used in
flow rules steering for egress direction. The hardware requires
network order for both the insertion of a rule and sending a packet.
Indeed, there is no strict restriction for the endianness. The order
for sending a packet and its steering rule should be consistent.

In the past, there was no endianness conversion due to the
performance reason. The flow rule converted the metadata into little
endian for hardware (if needed) and the packet hit the flow rule also
with little endian.

After the metadata was converted to big endian, the missing adaption
in the data path resulted in a flow miss of the egress packets.

Converting the metadata to big endian before posting a WQE to the
hardware solves this issue.

Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-09-29 23:29:38 +02:00
Bing Zhao
a6b57ff487 net/mlx5: fix flow tables double release
In the function mlx5_alloc_shared_dr(), there are various reasons
to result in a failure and error clean up process. While in the
caller of mlx5_dev_spawn(), once there is a error occurring after
the mlx5_alloc_shared_dr(), the mlx5_os_free_shared_dr() is called
to release all the resources.

To prevent a double release, the pointers of the resources should
be checked before the releasing and set to NULL after done.

In the mlx5_free_table_hash_list(), after the releasing, the pointer
was missed to set to NULL and a double release may cause a crash.

By setting the tables pointer to NULL as done for other resources,
the double release and crash could be solved.

Fixes: 54534725d2 ("net/mlx5: fix flow table hash list conversion")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-29 21:56:43 +02:00
Xueming Li
6428e0327c net/mlx5: support new global device syntax
This patch support new global device syntax like:
	bus=pci,addr=BB:DD.F/class=eth/driver=mlx5,devargs,..

In driver parameters check, ignores "driver" key which is part of new
global device syntax instead of reporting error.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-09-29 18:55:54 +02:00
William Tu
f1f6ebc0ea eal: remove sys/queue.h from public headers
Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine.  But when OVS or other
applications use DPDK as a library, because some DPDK public headers
include 'sys/queue.h', on Windows, it triggers an error due to no such
file.

One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
Windows environment, such as [1]. However, this means DPDK exports the
functionalities of 'sys/queue.h' into the environment, which might cause
symbols, macros, headers clashing with other applications.

The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. When these public headers use
macros such as TAILQ_xxx, we replace it by the ones with RTE_ prefix.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).

Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-01 13:09:43 +02:00
Raslan Darawsheh
16b8e92d49 ethdev: use extension header for GTP PSC item
This updates the gtp_psc flow item to use the net header
definition of the gtp_psc to be based on RFC 38415-g30

Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-09-28 12:34:58 +02:00
Dmitry Kozlyuk
f2f5879efb net/mlx5: fix shared RSS destruction
Shared RSS resources were released before checking that the shared RSS
has no more references. If it had, the destruction was aborted, leaving
the shared RSS in an invalid state where it could no longer be used.
Move reference counter check before resource release.

Fixes: d2046c09aa ("net/mlx5: support shared action for RSS")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-09-21 10:06:11 +02:00
Dmitry Kozlyuk
b09c65fa4f net/mlx5: fix flow indirect action reference counting
When an indirect action is used in a flow rule with a pattern that
causes RSS expansion, each device flow generated by the expansion
incremented the reference counter of the action. When such a flow was
destroyed, its action reference counter had been decremented only once.
The action remained marked as being used and could not be destroyed.
COUNT, AGE, and CONNTRACK indirect actions have been affected
(for AGE the error was not immediately observable).
Increment action counter only once for the original flow rule.

Fixes: 81073e1f8c ("net/mlx5: support shared age action")
Fixes: 2d084f69aa ("net/mlx5: add translation of connection tracking action")
Fixes: f3191849f2 ("net/mlx5: support flow count action handle")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-09-21 09:57:02 +02:00
Dmitry Kozlyuk
4ec1f971cd net/mlx5: report error on indirect CT action destroy
When an indirect CT action of mlx5 PMD could not be destroyed,
rte_action_handle_destroy() was returning (-1), but the error
structure was not filled. This lead to a segfault in testpmd
on an attempt to print it. Fill the details for each possible
cause of this error.

Fixes: c5a49265fc ("net/mlx5: add ASO connection tracking destroy")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-09-21 09:56:12 +02:00
Michael Baum
97c9b0aa25 net/mlx5: fix duplicate pattern option default
In order to allow/disallow configuring rules with identical patterns,
the new device argument 'allow_duplicate_pattern' was introduced.

The default is to allow, and it is initialized to 1 in PCI probe
function.
However, on auxiliary bus probing (for Sub-Function) it is not
initialized at all, so it's actually initialized to 0.

Move the initialization to default config function which is called from
both.

Fixes: 919488fbfa ("net/mlx5: support Sub-Function")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-20 23:13:40 +02:00
Michael Baum
6856efa54e net/mlx5: fix PF leak on PCI probing failure
During PCI probe, the internal probe function is called per PF.

If one of them fails, it was missing a proper destroy for the previously
probed PFs.

This fixes the behavior by destroying all previously probed PFs.

Fixes: 08c2772fc7 ("net/mlx5: support list of representor PF")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-20 23:12:10 +02:00
Michael Baum
c76db6a496 net/mlx5: fix memory leak on context allocation failure
In shared device context creation, there is a missing validation when
one of the btree memory allocation fails that will cause a memory leak.

This adds a proper check to clean resources in case of failure.

Fixes: 632f0f1905 ("net/mlx5: manage shared counters in three-level table")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-20 16:33:40 +02:00
Lior Margalit
aa52e5f0f9 net/mlx5: fix RSS expansion traversal over next nodes
The RSS expansion is based on DFS algorithm to traverse over the possible
expansion paths.

The current implementation breaks out, if it reaches the terminator of
the "next nodes" array, instead of going backwards to try the next path.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / end
actions rss level 2 types tcp end / end
The paths found are:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END
The traversal stopped after getting to the terminator of the next nodes
of the ETH node. It missed the rest of the nodes in the next nodes array
of the VXLAN node.

The fix is to go backwards when reaching the terminator of the current
level and find if there is a "next node" to start traversing a new path.
Using the above example, the flows will be:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END
ETH IPV6 UDP VXLAN IPV4 TCP END
ETH IPV6 UDP VXLAN IPV6 TCP END
The traversal will find additional paths, because it traverses through
all the next nodes array of the VXLAN node.

Fixes: 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-13 21:56:10 +02:00
Lior Margalit
69d268b4ff net/mlx5: fix RSS expansion for explicit graph node
The RSS expansion algorithm is using a graph to find the possible
expansion paths. A graph node with the 'explicit' flag will be skipped,
if it is not found in the flow pattern.

The current implementation misses the case where the node with the
explicit flag is in the middle of the expanded path.
For example:
testpmd> flow create 0 ingress pattern eth / ipv6 / udp / vxlan / end
actions rss level 2 types tcp end / end
The VLAN node has the explicit flag, so it is currently included in the
expanded flow:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH VLAN IPV4 TCP END
ETH IPV6 UDP VXLAN ETH VLAN IPV6 TCP END

The fix is to skip the nodes with the explicit flag while iterating over
the possible expansion paths. Using the above example, the flows will be:
ETH IPV6 UDP VXLAN END
ETH IPV6 UDP VXLAN ETH IPV4 TCP END
ETH IPV6 UDP VXLAN ETH IPV6 TCP END

Fixes: 3f02c7ff68 ("net/mlx5: fix RSS expansion for inner tunnel VLAN")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-13 21:56:09 +02:00
Aman Deep Singh
a7db3afce7 net: add macro to extract MAC address bytes
Added macros to simplify print of MAC address.
The six bytes of a MAC address are extracted in
a macro here, to improve code readablity.

Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-09-07 19:08:05 +02:00
Aman Deep Singh
c2c4f87b12 net: add macro for MAC address print
Added macro to print six bytes of MAC address.
The MAC addresses will be printed in upper case
hexadecimal format.
In case there is a specific check for lower case
MAC address, the user may need to make a change in
such test case after this patch.

Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-09-07 19:07:46 +02:00
Shiri Kuzin
3b48087a8a net/mlx5: update GENEVE TLV option matching
The GENEVE TLV option matching is done using a flex parser.

Recent update in firmware, requires that in order to match on the
GENEVE TLV option the "geneve_tlv_option_0_exist" bit should be set.

Add the new "geneve_tlv_option_0_exist" setting when translating the
GENEVE TLV option item.

Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-22 10:09:11 +02:00
Dmitry Kozlyuk
12d42b248c net/mlx5: fix eCPRI matching
When an ETH or VLAN flow item directly preceding ECPRI (i. e. a pattern
for eCPRI over Ethernet) did not specify the eCPRI protocol, matches
were not restricted to eCPRI traffic. For example, "eth / ecpri / end"
pattern behaved as "eth / end". Implicitly add Ethernet type condition,
so that "eth / ecpri / end" behaves as "eth type is 0xAEFE / end".

Fixes: daa38a8924 ("net/mlx5: add flow translation of eCPRI header")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-19 10:13:41 +02:00
Alexander Kozyrev
828274b70a net/mlx5: fix mbuf replenishment check for zipped CQE
A core dump is being generated with the following call stack:
0 _mm256_storeu_si256 (__A=..., __P=0x80)
1 rte_mov32 (src=0x2299c9140 "", dst=0x80)
2 rte_memcpy_aligned (n=60, src=0x2299c9140, dst=0x80)
3 rte_memcpy (n=60, src=0x2299c9140, dst=0x80)
4 mprq_buf_to_pkt (strd_cnt=1, strd_idx=0, buf=0x2299c8a00, len=60,
pkt=0x18345f0c0, rxq=0x18345ef40)
5 rxq_copy_mprq_mbuf_v (rxq=0x18345ef40, pkts=0x7f76e0ff6d18, pkts_n=5)
6 rxq_burst_mprq_v (rxq=0x18345ef40, pkts=0x7f76e0ff6d18, pkts_n=46,
err=0x7f76e0ff6a28, no_cq=0x7f76e0ff6a27)
7 mlx5_rx_burst_mprq_vec (dpdk_rxq=0x18345ef40, pkts=0x7f76e0ff6a88,
pkts_n=128)
8 rte_eth_rx_burst (nb_pkts=128, rx_pkts=0x7f76e0ff6a88,
queue_id=<optimized out>, port_id=<optimized out>)

This crash is caused by an attempt to copy previously uncompressed CQEs
into non-allocated mbufs. There is a check to make sure we only use
allocated mbufs in the rxq_burst_mprq_v() function, but it is done only
before the main processing loop. Leftovers of compressed CQEs session are
handled before that loop and may lead to the mbufs overflow as seen.

Move the check for replenished mbufs up to protect uncompressed CQEs
session leftovers from accessing non-allocated mbufs after the
mlx5_rx_mprq_replenish_bulk_mbuf() function is invoked.

Bugzilla ID: 746
Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-19 10:13:40 +02:00
Lior Margalit
3f02c7ff68 net/mlx5: fix RSS expansion for inner tunnel VLAN
The RSS expansion algorithm is using a graph to find the possible
expansion paths. The VLAN item in the flow pattern requires special
treatment, because it should not be added implicitly by the expansion
algorithm.  If the flow pattern ends with ETH item, the pattern will be
expanded with IPv4 and IPv6.
For example:
testpmd> flow create ... eth / end actions rss / end
ETH END
ETH IPV4 END
ETH IPV6 END
If a VLAN item follows the ETH item in the flow pattern, the pattern
will be expanded with IPv4 and IPv6 following the VLAN item.
For example:
testpmd> flow create ... eth / vlan / end actions rss level 1 / end
ETH VLAN END
ETH VLAN IPV4 END
ETH VLAN IPV6 END

The case of inner tunnel VLAN item was not taken care of so the flow
pattern did not expand with IPv6 and IPv4 as expected.
Example with inner VLAN:
testpmd> flow create ... / vxlan / eth / vlan / end actions rss level 2
/ end
The current result of the expansion alg:
ETH IPV6 UDP VXLAN ETH VLAN END
The expected result of the expansion alg:
ETH IPV6 UDP VXLAN ETH VLAN END
ETH IPV6 UDP VXLAN ETH VLAN IPV4 END
ETH IPV6 UDP VXLAN ETH VLAN IPV6 END

The fix is to introduce a new flag to set on a graph expansion node
to apply the 'explicit' behavior, meaning the node is not added to
the expanded pattern, if it is not found in the flow pattern, but the
expansion alg can go deeper to its next nodes.

Fixes: c7870bfe09 ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-19 10:13:35 +02:00
Thomas Monjalon
fdab8f2e17 version: 21.11-rc0
Start a new release cycle with empty release notes.

The ABI version becomes 22.0.
The map files are updated to the new ABI major number (22).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-08-17 08:37:52 +02:00
Gregory Etelson
8f6c921b6a net/mlx5: fix build on Windows
mlx5_dev_check_sibling_config() API was updated to allow newly
spawned port locate existing sibling devices.
PMD port initialization for Windows OS was not updated
for the new API prototype:

drivers/net/mlx5/windows/mlx5_os.c:457:50: error:
too few arguments to function call, expected 3, have 2
	err = mlx5_dev_check_sibling_config(priv, config);

The patch fixes mlx5_dev_check_sibling_config call for Windows OS.

Fixes: e9d420dfc2 ("net/mlx5: fix find sibling devices")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-05 12:48:16 +02:00
Gregory Etelson
e9d420dfc2 net/mlx5: fix find sibling devices
The routine mlx5_eth_find_next() and related iterating macro
MLX5_ETH_FOREACH_DEV is used to iterate through sibling devices (all
representors share the same configuration and switching domain) on top
of specified root device.

The root device parameter was specified as NULL, and it caused
missing siblings in iteration during representor device probing,
causing:

1. allocating new domain_id for the device being probed.
2. discrepancy in representor configurations and potential overall
   driver malfunctions.

Fixes: 56bb3c84e9 ("net/mlx5: reduce PCI dependency")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-04 11:27:49 +02:00
Shun Hao
1af874087c net/mlx5: fix domains detection in meter hierarchy
Meters in one hierarchy might support different domains. For
example, one meter may support ingress only, but the root meter
can support all the domains.

If the later meter in the meter hierarchy wrongly doesn't inherit
the first meter's domains, it will lead to invalid domain table
access.

Fix is when creating meter hierarchy, try to inherit the first meter
domains in the meter hierarchy.

Fixes: a3b7af90ba ("net/mlx5: validate meter action in policy")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-04 11:25:29 +02:00
Shun Hao
6bbced4ae3 net/mlx5: fix meter flow counter translation
When a flow rule uses a meter without any modify packet action,
there will be an internal drop flow with meter counter created,
matching the same 5-tuple as the original flow.

In this case, the meter flow count action is wrongly reused as the
original flow counter, leading to wrong flow statistics.

Add a check in the count action translation to detect the meter case
and use the meter drop dedicated counter in the meter 5-tuple flow
only.

Fixes: f3191849f2 ("net/mlx5: support flow count action handle")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-04 11:24:21 +02:00
Suanming Mou
45633c460c net/mlx5: workaround drop action with old kernel
Currently, there are two types of drop action implementation
in the PMD. One is the DR (Direct Rules) dummy placeholder drop
action and another is the dedicated dummy queue drop action.
When creates flow on the root table with DR drop action, the
action will be converted to MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP
Verbs attribute in rdma-core.

In some inbox systems, MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs
attribute may not be supported in the kernel driver. Create flow
with drop action on the root table will be failed as it is not
supported. In this case, the dummy queue drop action should be
used instead of DR dummy placeholder drop action.

This commit adds the DR drop action support detect on the root
table. If MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs is not
supported in the system, a dummy queue will be used as drop
action.

Fixes: da845ae9d7 ("net/mlx5: fix drop action for Direct Rules/Verbs")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-03 15:08:02 +02:00
Rongwei Liu
a1fd0c827b net/mlx5: fix VXLAN VNI matching on ConnectX-5
In the recent update, the misc5 matcher was introduced to
match VxLAN header extra fields. However, ConnectX-5
doesn't support misc5 for the UDP ports different from
VXLAN's standard one (4789).

Need to fall back to the previous approach and use legacy
misc matcher if non-standard UDP port is recognized
in VxLAN flow.

Fixes: 630a587bfb ("net/mlx5: support matching on VXLAN reserved field")
Cc: stable@dpdk.org

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-03 14:43:28 +02:00
Gregory Etelson
ce4062cb10 net/mlx5: fix port initialization of switch domain
All active ports that belong to the same E-switch share domain_id
value.
Port initialization procedure searches through a database for existing
port with matching properties. New domain_id allocated if match was
not located. Otherwise, new port inherits existing domain_id.

Port initialization did not pass enough info to search procedure to
find existing matches. Therefore, each port was created with a private
domain_id value. As the result, port_id flow action failed because it
could not match ports in a rule to E-switch.

The patch adds dpdk_dev with port properties to device search.

Fixes: 56bb3c84e9 ("net/mlx5: reduce PCI dependency")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-03 14:19:33 +02:00
Bing Zhao
f8c42c53ce net/mlx5: fix meter hierarchy validation with yellow
In mlx5 PMD, the meter hierarchy only supports the green color. It
means that a meter action can only be in the green action list. In
the meanwhile, the yellow action list should be empty now. Any
action for the yellow color policy will be considered invalid if
the green color policy is a hierarchy.

Also, the error message printing of meter hierarchy validation is
fixed by removing an incorrect checking.

Fixes: 4b7bf3ffb4 ("net/mlx5: support yellow in meter policy validation")
Fixes: a3b7af90ba ("net/mlx5: validate meter action in policy")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 22:06:43 +02:00
Bing Zhao
d5bb76d5b0 net/mlx5: fix green meter policy RSS queues
Both green policy and yellow policy could support RSS actions
simultaneous, the Rx queues configuration may be different between
them while the other fields should be the same.

When the only green color policy was supported in the past, the
queues copied and saved in the temporary workspace were used. Since
the yellow support was added, the queues stored in the thread
workspace would be overwritten by the yellow color policy. The flow
rule created using a meter with such a policy would have the same
RSS distribution for both green and yellow packets.

By using the meter action containers RSS information instead of the
workspace RSS, this overwritten can be prevented.

Fixes: b38a12272b ("net/mlx5: split meter color policy handling")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 22:06:42 +02:00
Bing Zhao
a9e3a4a9e2 net/mlx5: fix meter EIR calculation
Before the yellow color policy was supported, the only supported
profile of metering is RFC2697 and EIR is not part of the profile.
When creating a meter with this profile, the EIR part was always
zero.

After the yellow color policy supported and RFC2698 & 4115 support
was introduced, EIR is relevant and should be calculated. Usually
the EIR could not be zero and the formula for calculating CIR
mantissa & exponent could be reused.

The EIR could be 0 and then only green and red colors will be
supported from the specification. Both the mantissa and exponent
parts should be set to 0. Currently, the formula wrongly sets
non-zero values for the EIR=0 case.

Setting the mantissa and the exponent parts to zeros when EIR is 0
will solve the issue.

Fixes: 33a7493c8d ("net/mlx5: support meter for trTCM profiles")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 22:05:36 +02:00
Bing Zhao
8a0fca1101 net/mlx5: fix meter profile validation
After the support for yellow color and RFC2698 & RFC4115 were added,
the profile validation adjustment was missed. With this fix, the
validation is like below:
  1. Legacy metering only supports RFC2697 without EBS.
  2. ASO metering can support all three profiles.
  3. For backward compatibility, none EBS with RFC2697 profile is
     still supported and the checking is done in the meter
     creation stage.

In the meanwhile, some checking which was done in the parameters
calculation stage is moved in the validation in order to skip the
useless checking.

Fixes: 33a7493c8d ("net/mlx5: support meter for trTCM profiles")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 22:03:14 +02:00
Viacheslav Ovsiienko
f17e4b4ffe net/mlx5: add Tx scheduling check on queue creation
The send scheduling on timestamp offload requires the Send
Queue (SQ) shares its User Access Region (UAR) with the
pacing Clock Queue. The SQ can be created by mlx5 PMD either
with DevX or with Verbs. If the SQ is being created with
DevX, the dedicated UAR can be specified and all the SQs
share the single UAR. Once SQ is being created with Verbs
the SQ's UAR is allocated by the rdma-core library internally
on its own and there is no UAR sharing. This caused hardware
errors on WAIT WQEs and overall send scheduling malfunction.

If SQs are going to be created with Verbs and the send
scheduling offload is explicitly requested via tx_pp devarg
the device probing is rejected as device configuration
can't satisfy the requirements.

Fixes: 3ec73abeed ("net/mlx5/linux: fix Tx queue operations decision")
Fixes: 8f848f32fc ("net/mlx5: introduce send scheduling devargs")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:23 +02:00
Viacheslav Ovsiienko
dab07e489c net/mlx5: fix timestamp initialization on empty clock queue
The committing completions by clock queue might be delayed
after queue initialization is done and the only Clock Queue
completion entry (CQE) might keep the invalid status till
the CQE first update happens.

The mlx5_txpp_update_timestamp() wrongly recognized invalid
status as error and reported about lost synchronization.

The patch recognizes the invalid status as "not updated yet"
and accurate scheduling initialization routine waits till
CQE first update happens.

Some collateral typos in comment are fixed as well.

Fixes: 77522be0a5 ("net/mlx5: introduce clock queue service routine")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:23 +02:00
Asaf Penso
e3db225065 net/mlx5: fix flow engine type in function name
The concrete function names have a prefix for flow_dv.
This emphasizes the flow engine is Direct Verbs.

The function flow_get_aged_flows doesn’t have this prefix.
It creates an inconsistency with the other functions.

Update the function name to include dv.

Fixes: fa2d01c87d ("net/mlx5: support flow aging")
Cc: stable@dpdk.org

Signed-off-by: Asaf Penso <asafp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 18:01:22 +02:00
Suanming Mou
6821a57a9b net/mlx5: limit implicit MPLS RSS expansion over GRE
As [1] optimized the MPLS RSS expansion before, this commit limits
the implicitly MPLS RSS expansion for MPLSoGRE as well. For the
RSS flow matcher to GRE level only, it will not expand the MPLS
match item for the sub flows due to performance consideration.

The original RSS flow match item:
ETH VLAN IPV6 GRE GRE_KEY END

The previous RSS expansion:
ETH VLAN IPV6 GRE GRE_KEY END
ETH VLAN IPV6 GRE GRE_KEY IPV4 END
ETH VLAN IPV6 GRE GRE_KEY MPLS IPV4 END
ETH VLAN IPV6 GRE GRE_KEY MPLS ETH IPV4 END

New RSS expansion:
ETH VLAN IPV6 GRE GRE_KEY END
ETH VLAN IPV6 GRE GRE_KEY IPV4 END

[1]
commit a26cc30fa0 ("net/mlx5: limit inner RSS expansion for MPLS")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 18:01:22 +02:00
Lior Margalit
4a5a1e6b62 net/mlx5: fix default queue number in RSS flow rule
The selection flags for the RX hash define how the received packets will
be distributed between multiple queues.
When creating a new TIR, the queue_num is set to 1 if none of the selection
flags is set.

Applied the same to the RSS desc before checking if it matches a cached
TIR object to save creating a new object every time.

Fixes: fabf8a3724 ("net/mlx5: fix shared RSS action release")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 18:01:21 +02:00
Lior Margalit
5e1db76dd8 net/mlx5: fix RSS flow rule with L4 mismatch
The RSS hash types defined in the API do not support setting the L4 proto
type (TCP or UDP) without setting the L3 proto. For example, ETH_RSS_TCP
is defined as
(ETH_RSS_NONFRAG_IPV4_TCP | \
 ETH_RSS_NONFRAG_IPV6_TCP | \
 ETH_RSS_IPV6_TCP_EX).

The L3 proto of the RSS hash type may be different than the one defined
in the pattern, for example:
testpmd> flow create .../ ipv4 / tcp / end actions rss types ipv6-tcp-ex
end / end

If the RSS hash type also includes L4 proto type as in the above example,
the selection flags for the RX hash are currently set with SPORT/DPORT
without setting SRC/DST IP. As this combination is not supported, it does
not match any of the pre-created TIRs of the indirect RSS action
and the flow creation fails.

The fix is to prevent setting the selection flags for the RX hash with
SPORT/DPORT without setting SRC/DST IP. It applies non-RSS processing of
the received packets. In case of indirect RSS action, it will match the
MLX5_RSS_HASH_NONE pre-created TIR.

Fixes: b1d63d8293 ("net/mlx5: support RSS on src or dst fields only")
Fixes: 4a78c88e3b ("net/mlx5: fix Verbs flow tunnel")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-29 18:01:20 +02:00
Jiawei Wang
7356aec64c net/mlx5: fix mirror flow split with L3 encapsulation
Due to hardware limitations, the decap action (such as
VXLAN/NVGRE/RAW decap) can't follow the sample action in the
same flow, to keep the original action order of sample and decap
actions the flow was internally split into two subflows by PMD,
the sample action was moved into prefix subflow in the original table,
and decap action was moved into suffix subflow in the new table.

There is a specific combination of raw decap and raw encap actions
to specify "L3 encapsulation" packet transformation - raw decap action
to remove L2 header and raw encap to add the tunnel header.
This specific L3 encapsulation is encoded as a single packet reformat
hardware transaction and is supported by hardware after sample
action (no hardware limitations for packet reformat).

The "L3 encapsulation" with mirror actions in the same flow was not handled
correctly in the previous commit.
The patch checks whether the decap action is part of "L3 encapsulation"
and does not move the decap action into suffix subflow for the case.

Fixes: cafd87f62a ("net/mlx5: fix VLAN push/pop and decap actions with mirror")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:20 +02:00
Bing Zhao
75f166c20f net/mlx5: fix queue leaking in hairpin auto bind check
During the start up stage, the hairpin auto bind was executed for
each port. All the Tx and Rx queues configured for this port should
be checked to confirm if the auto bind of hairpin is needed.
1. The queue is hairpin queue.
2. The peer port is the same one and the peer queue should also be
   with hairpin type.
3. The manual bind attribute is not set for this queue.

If the queue is not a hairpin queue or it doesn't need to be bound
automatically, the reference count should be decreased by 1 since
the count was increased when calling the mlx5_*xq_get().
When the peer port is not the same, it means that no auto bind is
supported and the mlx5_*xq_release() was missed in the current
implementation.

By calling the release function before continue, the count is
correct when calling the device close.

Fixes: aa8bea0e34 ("net/mlx5: add conditional hairpin auto bind")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:19 +02:00
Gregory Etelson
494d6863c2 net/mlx5: fix representor interrupt handler
In mlx5 PMD the PCI device interrupt vector was used by Uplink
representor exclusively and other VF representors did not support
interrupt mode.
All the VFs and Uplink representors are separate ethernet devices
and must have dedicated interrupt vectors.
The fix provides each representor with a dedicated interrupt
vector.

Fixes: 5882bde88d ("net/mlx5: fix representor interrupts handler")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:15 +02:00
Liang Ma
da7a5c1406 net/mlx5: export PMD-specific API file
The file rte_pmd_mlx5.h should be exported by Meson.

Fixes: efa79e68c8 ("net/mlx5: support fine grain dynamic flag")
Fixes: 23f627e0ed ("net/mlx5: add flow sync API")
Cc: stable@dpdk.org

Signed-off-by: Liang Ma <liangma@bytedance.com>
2021-07-22 17:23:26 +02:00
Lior Margalit
4e5ba38d56 net/mlx5: reject inner ethernet matching in GTP
The user is able to create a flow rule pattern with ETH after GTP
although it is not supported by the flex-parser configuration.

Failed the rule validation in such case with proper error message.

Fixes: 23c1d42c71 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 17:13:09 +02:00
Lior Margalit
3e455a97dc net/mlx5: fix RSS expansion for GTP
The flow did not expand correctly when it included a GTP item.

Added GTP node to the expansion graph as possible next node
after IPv4/IPv6 UDP node.

Fixes: 592f05b29a ("net/mlx5: add RSS flow action")
Cc: stable@dpdk.org

Signed-off-by: Lior Margalit <lmargalit@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 16:55:19 +02:00
Xueming Li
92d16c83a7 net/mlx5: fix SF representor probing in isolate mode
Representor failed to probe in isolated mode due to callback of
retrieving representor info missing. This patch adds it back.

Fixes: cb95feefdd ("net/mlx5: support sub-function representor")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 16:53:26 +02:00
Viacheslav Ovsiienko
9f430dd751 net/mlx5: fix RoCE LAG bond device probing
The RoCE LAG bond device requires neither E-Switch nor SR-IOV
configurations. It means the RoCE LAG bond device might be
presented as a single port Infiniband device.

The mlx5 PMD wrongly recognized standalone RoCE LAG bond device
as E-Switch configuration, this triggered the calls of E-Switch
ports related API and the latter failed (over the new OFED kernel
driver, starting since 5.4.1), causing the overall device probe
failure.

If there is a single port Infiniband bond device found the
E-Switch related flags must be cleared indicating standalone
configuration.

Also, it is not true anymore the bond device can exist
over E-Switch configurations only (as it was claimed for VF LAG
bond devices). The related checks are not relevant anymore
and removed.

Fixes: 790164ce1d ("net/mlx5: check kernel support for VF LAG bonding")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 16:43:49 +02:00
Alexander Kozyrev
39f0df9b6d net/mlx5: reject copy to mark via modify action
The Mark action is a two-stage process in the Mellanox driver.
First, a hardware register is filled with the required value,
then this value is registered in the software resource table.

The MODIFY_FIELD action can instruct a Mellanox NIC to copy
some value from an arbitrary packet header field into the
hardware register, associated with the Mark item. But there
is no way NIC can modify the software resource table as well.

Due to these driver limitations the copying of arbitrary value
to the MARK can not be supported and should be rejected in the
MODIFY_FIELD action.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 16:26:41 +02:00
Alexander Kozyrev
6d5735c1cb net/mlx5: fix meta register conversion for extensive mode
Register C is used in the extensive metadata mode number 1 and its
width can vary from 0 to 32 bits depending on the kernel usage of it.

There are several issues associated with this mode (dv_xmeta_en=1):
1. The metadata setting assumes that the width is always 16 bits,
which is the most common case in this mode. Use the proper mask.
2. The same is true for the modify_field Flow API. 16-bits width
is hardcoded for dv_xmeta_en=1. Switch to the register C mask width.
3. Metadata is stored in the most significant bits in CQE in this
mode because the registers copy code was not updated during the
metadata conversion to the big-endian format. Update this code to
avoid shifting the metadata in the datapath.

Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 16:24:56 +02:00
Suanming Mou
89a4bcb1fc net/mlx5: fix indexed pools allocation on Windows
Currently, the flow indexed pools are allocated per port,
the allocation was missing in Windows code.

Allocate indexed pool for the Windows case too.

Fixes: b4edeaf3ef ("net/mlx5: replace flow list with indexed pool")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Odi Assli <odia@nvidia.com>
2021-07-22 16:16:29 +02:00
Dmitry Kozlyuk
b7c8ea62d0 net/mlx5: fix indirect action modify rollback
mlx5_ind_table_obj_modify() first references queues from the new list,
then applies the new list to HW. In case of apply failure the function
dereferenced queues from the old list, while it should be the new list.

Fixes: fa7ad49e96 ("net/mlx5: fix shared RSS action update")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 15:55:48 +02:00
Dmitry Kozlyuk
94e257ec8c net/mlx5: fix Rx/Tx queue checks
When device configuration was interrupted by a signal,
mlx5_rxq/txq_release() could access yet unitinialized array
and crash the application. Add checks whether queue array
is initialized.

Fixes: a1366b1a2b ("net/mlx5: add reference counter on DPDK Rx queues")
Fixes: 6e78005a9b ("net/mlx5: add reference counter on DPDK Tx queues")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 15:49:33 +02:00
Dong Zhou
96f85ec489 net/mlx5: check VLAN push/pop support
For ConnectX-6 in FDB domain, pop and push VLAN
on both ingress and egress directions are supported.

For ConnectX-6 in NIC domain, and ConnectX-5 in both FWD and NIC domain,
pop VLAN is only supported on ingress direction,
push VLAN is only supported on egress direction.

Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 15:40:01 +02:00
Michael Baum
2fec07edd4 net/mlx5: fix overflow in mempool argument
The mlx5_mprq_alloc_mp function makes shifting to the numeric constant
1, for sending it as a parameter to rte_mempool_create function.

The rte_mempool_create function expects to get void pointer (uintptr_t,
might be 64-bit) and instead gets a 32-bit variable, because the
numeric constant size is a 32-bit.
In case the shift is greater than 32 the variable might lose its value
even though the function might get 64-bit argument.

Change the size of the numeric constant 1 to uintptr_t.

Fixes: 3a22f3877c ("net/mlx5: replace external mbuf shared memory")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 14:48:47 +02:00
Bing Zhao
33a7493c8d net/mlx5: support meter for trTCM profiles
The support of RFC2698 and RFC4115 are added in mlx5 PMD. Only the
ASO metering supports these two profiles.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:29:01 +02:00
Bing Zhao
4d648fad90 net/mlx5: check consistency of meter policy and profile
In the previous implementation, only green color policy was
supported in mlx5 PMD. Since yellow color policy is supported now,
the consistency of meter policy and profile should be checked.
  1. If the profile supports yellow but the policy doesn't, an error
     should be returned when creating the meter. Or else, there is
     no explicit steering action for the packets marked with yellow.
  2. If the policy supports yellow but the profile doesn't, it will
     be considered as a valid case. Even if no packet will be
     handled with the yellow steering action, it is just like that
     only the green policy presents.

Usually the green color is supported by default, but when it is
disabled intentionally with setting the CBS to a small value like
zero in the profile, the similar checking on green policy and
profile should also be done.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:57 +02:00
Bing Zhao
4b7bf3ffb4 net/mlx5: support yellow in meter policy validation
In the previous implementation, the policy for yellow color was not
supported. The action validation for yellow was skipped.

Since the yellow color policy needs to be supported, the validation
should also be done for the yellow color. In the meanwhile, due to
the fact that color policies of one meter should be used for the
same flow(s), the domains supported of both colors should be the
same. If both of the colors have RSS as the termination actions,
except the queues, all other parameters of RSS should be the same.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:54 +02:00
Bing Zhao
b38a12272b net/mlx5: split meter color policy handling
If the fate action is either RSS or Queue of a meter policy, the
action will only be created in the flow splitting stage. With queue
as the fate action, only one sub-policy is needed. And RSS will
have more than one sub-policies if there is an expansion.

Since the RSS parameters are the same for both green and yellow
colors except the queues, the expansion result will be unique.
Even if only one color has the RSS action, the checking and possible
expansion will be done then. For each sub-policy, the action rules
need to be created separately on its own policy table.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:50 +02:00
Bing Zhao
fa31a5ed01 net/mlx5: support yellow meter policy rules
When creating a meter policy, both / either of the action rules for
green and yellow colors may be provided. After validation, usually
the actions are created before the meter is using by a flow rule.

If there is action specified for the yellow color, the action rules
should be created together with green color in the same time. The
action of green / yellow color can be empty, then the default
behavior is the jump action of the rule, just the same as that of
the default policy.

If the fate action of either one color is queue / RSS, all the
actions rules will be created on the flow splitting stage instead of
the policy adding stage.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:47 +02:00
Bing Zhao
9b5463df5d net/mlx5: enable meter bucket overflow for yellow color
To support the meter policy for yellow action, the prerequisite is
that the hardware needs to support the EBS, as defined in the
RFC2697.
  https://datatracker.ietf.org/doc/html/rfc2697
Then some of the packets can be marked as yellow if the tokens of C
bucket is not enough but enough in E bucket. The color could be used
for the further steering of the packets.

In the current implementation EBS and overflow were ignored when
creating a meter profile. With this commit, if EBS is set by the
application, the generation of yellow color will be enabled in the
hardware for flow rules steering of packets.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:43 +02:00
Bing Zhao
363db9b00f net/mlx5: handle yellow case in default meter policy
In order to support the yellow color for the default meter policy,
the default policy action for yellow should be created together
with the green policy.

The default policy action for yellow action is the same as that for
green. In the same table, the same matcher will be reused for yellow
and the destination group will be the same.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 13:28:40 +02:00
Xueming Li
cdfdb82d0b net/mlx5: check maximum Verbs port number
Verbs API doesn't support device port number larger than 255 by design.
Add check and fail probing with proper error log.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Xueming Li
919488fbfa net/mlx5: support Sub-Function
Introduce SF support.
Similar to VF, SF on auxiliary bus is a portion of hardware PF,
no representor or bonding parameters for SF.

Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00