Commit Graph

180 Commits

Author SHA1 Message Date
Michael Baum
34776af600 net/mlx5: fix MPRQ stride devargs adjustment
In Multi-Packet RQ creation, the user can choose the number of strides
and their size in bytes. The user updates it using specific devargs for
both of these parameters.
The above two parameters determine the size of the WQE which is actually
their product of multiplication.

If the user selects values that are not in the supported range, the PMD
changes them to default values. However, apart from the range
limitations for each parameter individually there is also a minimum
value on their multiplication. When the user selects values that their
multiplication are lower than minimum value, no adjustment is made and
the creation of the WQE fails.

This patch adds an adjustment in these cases as well. When the user
selects values whose multiplication is lower than the minimum, they are
replaced with the default values.

Fixes: ecb160456a ("net/mlx5: add device parameter for MPRQ stride size")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-12-05 12:22:09 +01:00
Michael Baum
0947ed380f net/mlx5: improve stride parameter names
In the striding RQ management there are two important parameters, the
size of the single stride in bytes and the number of strides.

Both the data-path structure and config structure keep the log of the
above parameters. However, in their names there is no mention that the
value is a log which may be misleading as if the fields represent the
values themselves.

This patch updates their names describing the values more accurately.

Fixes: ecb160456a ("net/mlx5: add device parameter for MPRQ stride size")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-12-05 12:22:09 +01:00
Josh Soref
7be78d0279 fix spelling in comments and strings
The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-01-11 12:16:53 +01:00
Viacheslav Ovsiienko
11cfe349b3 net/mlx5: fix Tx scheduling check
There was a redundant check for the enabled E-Switch, this
resulted in device probing failure if the Tx scheduling was
requested and E-Switch was enabled.

Fixes: f17e4b4ffe ("net/mlx5: add Tx scheduling check on queue creation")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-14 09:18:28 +01:00
Bing Zhao
febcac7b46 net/mlx5: support Rx queue delay drop
For the Ethernet RQs, if there all receiving descriptors are
exhausted, the packets being received will be dropped. This behavior
prevents slow or malicious software entities at the host from
affecting the network. While for hairpin cases, even if there is no
software involved during the packet forwarding from Rx to Tx side,
some hiccup in the hardware or back pressure from Tx side may still
cause the descriptors to be exhausted. In certain scenarios it may be
preferred to configure the device to avoid such packet drops,
assuming the posting of descriptors will resume shortly.

To support this, a new devarg "delay_drop" is introduced. By default,
the delay drop is enabled for hairpin Rx queues and disabled for
standard Rx queues. This value is used as a bit mask:
  - bit 0: enablement of standard Rx queue
  - bit 1: enablement of hairpin Rx queue
And this attribute will be applied to all Rx queues of a device.

The "rq_delay_drop" capability in the HCA_CAP is checked before
creating any queue. If the hardware capabilities do not support
this delay drop, all the Rx queues will still be created without
this attribute, and the devarg setting will be ignored even if it
is specified explicitly. A warning log is used to notify the
application when this occurs.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-05 17:04:53 +01:00
Xueming Li
09c2555303 net/mlx5: support shared Rx queue
This patch introduces shared RxQ. All shared Rx queues with same group
and queue ID share the same rxq_ctrl. Rxq_ctrl and rxq_data are shared,
all queues from different member port share same WQ and CQ, essentially
one Rx WQ, mbufs are filled into this singleton WQ.

Shared rxq_data is set into device Rx queues of all member ports as
RxQ object, used for receiving packets. Polling queue of any member
ports returns packets of any member, mbuf->port is used to identify
source port.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:50 +01:00
Gregory Etelson
9086ac093a net/mlx5: add flex parser DevX object management
The DevX flex parsers can be shared between representors
within the same IB context. We should put the flex parser
objects into the shared list and engage the standard
mlx5_list_xxx API to manage ones.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:38 +01:00
Viacheslav Ovsiienko
db25cadc08 net/mlx5: add flex item operations
This patch is a preparation step of implementing
flex item feature in driver and it provides:

  - external entry point routines for flex item
    creation/deletion

  - flex item objects management over the ports.

The flex item object keeps information about
the item created over the port - reference counter
to track whether item is in use by some active
flows and the pointer to underlying shared DevX
object, providing all the data needed to translate
the flow flex pattern into matcher fields according
hardware configuration.

There is not too many flex items supposed to be
created on the port, the design is optimized
rather for flow insertion rate than memory savings.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:38 +01:00
Dmitry Kozlyuk
bc5bee028e net/mlx5: create drop queue using DevX
Drop queue creation and destruction were not implemented for DevX
flow engine and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-02 18:59:17 +01:00
Jiawei Wang
3c4338a421 net/mlx5: optimize device spawn time with representors
During the device spawn process, mlx5 PMD queried the available flow
priorities by calling mlx5_flow_discover_priorities, queried
if the DR drop action was supported on the root table by calling
the mlx5_flow_discover_dr_action_support routine, and queried the
availability of metadata register C by calling mlx5_flow_discover_mreg_c

These functions created the test flows to get the supported fields, and
at the end destroyed the test flows. The test flows in the first two
functions was created on the root table.
If the device was spawned with multiple representors, these test flows
were created and destroyed on each representor as well. The above
operations took a significant amount of init time during the device
spawn.

This patch optimizes the device discover functions, if there is
the device with multiple representors (VF/SF) being spawned,
the priority and drop action and metadata register support check can be
done only ones and check results can be shared for all representors.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-27 14:04:39 +02:00
Rongwei Liu
7299ab6822 net/mlx5: support socket direct mode bonding
In socket direct mode, it's possible to bind any two (maybe four
in future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces are unnecessary to have
the same PCIe domain/bus/device ID anymore,

Kernel driver uses "system_image_guid" to identify if devices can
be bound together or not. Sysfs "phys_switch_id" is used to get
"system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-26 13:24:20 +02:00
Harman Kalra
d61138d4f0 drivers: remove direct access to interrupt handle
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the drivers access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Ferruh Yigit
295968d174 ethdev: add namespace
Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-22 18:15:38 +02:00
Rongwei Liu
a89f6433aa net/mlx5: set Tx queue affinity in round-robin
Previously, we set txq affinity to 0 and let firmware
to perform round-robin when bonding. Firmware uses a
global counter to assign txq affinity to different
physical ports accord to remainder after division.

There are three dis-advantages:
1. The global counter is shared between kernel and dpdk.
2. After restarting pmd or port, the previous counter value
is reused, so the new affinity is unpredictable.
3. There is no way to get what affinity is set by firmware.

In this update, we will create several TISs up to the
number of bonding ports and bind each TIS to one PF port.

For each port, it will start to pick up TIS using its port
index. Upper layer application can quickly calculate each txq's
affinity without querying.

At DPDK layer, when creating txq with 2 bonding ports, the
affinity is set like:
port 0: 1-->2-->1-->2
port 1: 2-->1-->2-->1
port 2: 1-->2-->1-->2

Note: Only applicable to DevX api.
This affinity subjects to HW hash.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 12:37:00 +02:00
Dmitry Kozlyuk
ea823b2c51 net/mlx5: close tools socket with last device
MLX5 PMD exposes a socket for external tools to dump port state.
Socket events are listened using an interrupt source of EXT type.
The socket was closed and the interrupt callback was unregistered
at program exit, which is incorrect because DPDK could be already
shut down at this point. Move actions performed at program exit
to the moment the last MLX5 port is closed. The socket will be opened
again if later a new MLX5 device is plugged in and probed.
Also fix comments that were decisively talking
about secondary processes instead of external tools.

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Reported-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-21 10:31:53 +02:00
Xueming Li
614966c2fa net/mlx5: check DevX to support more Verbs ports
Verbs API doesn't support device port number larger than 255 by design.

To support more VF or SubFunction port representors, forces DevX API
check when max Verbs device link ports larger than 255.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:14 +02:00
Xueming Li
686d05b60d net/mlx5: enable DevX Tx queue creation
Verbs API does not support Infiniband device port number larger 255 by
design. To support more representors on a single Infiniband device DevX
API should be engaged.

While creating Send Queue (SQ) object with Verbs API, the PMD assigned
IB device port attribute and kernel created the default miss flows in
FDB domain, to redirect egress traffic from the queue being created to
representor appropriate peer (wire, HPF, VF or SF).

With DevX API there is no IB-device port attribute (it is merely kernel
one, DevX operates in PRM terms) and PMD must create default miss flows
in FDB explicitly. PMD did not provide this and using DevX API for
E-Switch configurations was disabled.

The default miss FDB flow matches E-Switch manager vport (to make sure
the source is some representor) and SQn (Send Queue number - device
internal queue index). The root flow table managed by kernel/firmware
and it does not support vport redirect action, we have to split the
default miss flow into two ones:

- flow with lowest priority in the root table that matches E-Switch
manager vport ID and jump to group 1.
- flow in group 1 that matches E-Switch manager vport ID and SQn and
forwards packet to peer vport

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:13 +02:00
Xueming Li
3fd2961efa net/mlx5: use Netlink when IB port greater than 255
IB spec doesn't allow 255 ports on a single HCA, port number of 256 was
cast to u8 value 0 which invalid to ibv_query_port()

This patch invokes Netlink API to query port state when port number
greater than 255.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-21 09:31:08 +02:00
Michael Baum
9f1d636f3e common/mlx5: share MR management
Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:57:58 +02:00
Michael Baum
5fbc75ace1 common/mlx5: add global MR cache create function
Add function for global shared MR cache structure initialization.
This function include:
 - btree initialization.
 - set callbacks for reg and dereg MR.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:57:24 +02:00
Michael Baum
fe46b20c96 common/mlx5: share HCA capabilities handle
Add HCA attributes structure as a field of device config structure.
It query in common probing, and updates the timestamp format fields.

Each driver use HCA attributes from common device config structure,
instead of query it for itself.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:46 +02:00
Michael Baum
e35ccf243b common/mlx5: share protection domain object
Create shared Protection Domain in common area and add it and its PDN as
fields of common device structure.

Use this Protection Domain in all drivers and remove the PD and PDN
fields from their private structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:46 +02:00
Michael Baum
ca1418ce39 common/mlx5: share device context object
Create shared context device in common area and add it as a field of
common device.
Use this context device in all drivers and remove the ctx field from
their private structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:44 +02:00
Michael Baum
5bc38358b5 net/mlx5: remove redundant flag in device config
Device configure structure has flag named devx as same as SH structure
with the same meaning.

Remove the flag from the configuration structure and move all the
usages to the SH flag.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:36 +02:00
Michael Baum
887183effa common/mlx5: move basic probing functions to common
Move open IBV/DevX device function to common.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:53:32 +02:00
Michael Baum
8520992403 common/mlx5: share memory related devargs
Add device configure structure and function to parse user device
arguments into it.
Move parsing and management of relevant device arguments to common.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:39:04 +02:00
Michael Baum
7af08c8f1a common/mlx5: share basic probing with internal drivers
Create common probing structure that includes, for now, basic probing
information detected by the common driver and share it with all the
internal drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:38:46 +02:00
Tal Shnaiderman
c1a320bf89 net/mlx5: fix tunneling support query
Currently, the PMD decides if the tunneling offload
can enable VXLAN/GRE/GENEVE tunneled TSO support by checking
config->tunnel_en (single bit) and config->tso.

This is incorrect, the right way is to check the following
flags returned by the mlx5dv_query_device function:

MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_VXLAN - if supported the offload
DEV_TX_OFFLOAD_VXLAN_TNL_TSO can be enabled.
MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_GRE - if supported the offload
DEV_TX_OFFLOAD_GRE_TNL_TSO can be enabled.
MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_GENEVE - if supported the offload
DEV_TX_OFFLOAD_GENEVE_TNL_TSO can be enabled.

The fix enables the offloads according to the correct
flags returned by the kernel.

Fixes: dbccb4cddc ("net/mlx5: convert to new Tx offloads API")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:34 +02:00
Tal Shnaiderman
accf3cfce4 net/mlx5: fix software parsing support query
Currently, the PMD decides if the software parsing
offload can enable outer IPv4 checksum and tunneled
TSO support by checking config->hw_csum and config->tso
respectively.

This is incorrect, the right way is to check the following
flags returned by the mlx5dv_query_device function:

MLX5DV_SW_PARSING - check general swp support.
MLX5DV_SW_PARSING_CSUM - check swp checksum support.
MLX5DV_SW_PARSING_LSO - check swp LSO/TSO support.

The fix enables the offloads according to the correct
flags returned by the kernel.

Fixes: e46821e9fc ("net/mlx5: separate generic tunnel TSO from the standard one")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2021-10-12 15:29:25 +02:00
Viacheslav Galaktionov
ff4e52efb3 ethdev: fix representor port ID search by name
The patch is required for all PMDs which do not provide representors
info on the representor itself.

The function, rte_eth_representor_id_get(), is used in
eth_representor_cmp() which is required in ethdev class iterator to
search ethdev port ID by name (representor case). Before the patch
the function is called on the representor itself and tries to get
representors info to match.

Search of port ID by name is used after hotplug to find out port ID
of the just plugged device.

Getting a list of representors from a representor does not make sense.
Instead, a backer device should be used.

To this end, extend the rte_eth_dev_data structure to include the port ID
of the backing device for representors.

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Reviewed-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-12 16:54:20 +02:00
Dmitry Kozlyuk
fec28ca0e3 net/mlx5: support mempool registration
When the first port in a given protection domain (PD) starts,
install a mempool event callback for this PD and register all existing
memory regions (MR) for it. When the last port in a PD closes,
remove the callback and unregister all mempools for this PD.
This behavior can be switched off with a new devarg: mr_mempool_reg_en.

On TX slow path, i.e. when an MR key for the address of the buffer
to send is not in the local cache, first try to retrieve it from
the database of registered mempools. Supported are direct and indirect
mbufs, as well as externally-attached ones from MLX5 MPRQ feature.
Lookup in the database of non-mempool memory is used as the last resort.

RX mempools are registered regardless of the devarg value.
On RX data path only the local cache and the mempool database is used.
If implicit mempool registration is disabled, these mempools
are unregistered at port stop, releasing the MRs.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-19 16:35:16 +02:00
Michael Baum
97c9b0aa25 net/mlx5: fix duplicate pattern option default
In order to allow/disallow configuring rules with identical patterns,
the new device argument 'allow_duplicate_pattern' was introduced.

The default is to allow, and it is initialized to 1 in PCI probe
function.
However, on auxiliary bus probing (for Sub-Function) it is not
initialized at all, so it's actually initialized to 0.

Move the initialization to default config function which is called from
both.

Fixes: 919488fbfa ("net/mlx5: support Sub-Function")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-20 23:13:40 +02:00
Michael Baum
6856efa54e net/mlx5: fix PF leak on PCI probing failure
During PCI probe, the internal probe function is called per PF.

If one of them fails, it was missing a proper destroy for the previously
probed PFs.

This fixes the behavior by destroying all previously probed PFs.

Fixes: 08c2772fc7 ("net/mlx5: support list of representor PF")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-09-20 23:12:10 +02:00
Aman Deep Singh
a7db3afce7 net: add macro to extract MAC address bytes
Added macros to simplify print of MAC address.
The six bytes of a MAC address are extracted in
a macro here, to improve code readablity.

Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-09-07 19:08:05 +02:00
Aman Deep Singh
c2c4f87b12 net: add macro for MAC address print
Added macro to print six bytes of MAC address.
The MAC addresses will be printed in upper case
hexadecimal format.
In case there is a specific check for lower case
MAC address, the user may need to make a change in
such test case after this patch.

Signed-off-by: Aman Deep Singh <aman.deep.singh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-09-07 19:07:46 +02:00
Gregory Etelson
e9d420dfc2 net/mlx5: fix find sibling devices
The routine mlx5_eth_find_next() and related iterating macro
MLX5_ETH_FOREACH_DEV is used to iterate through sibling devices (all
representors share the same configuration and switching domain) on top
of specified root device.

The root device parameter was specified as NULL, and it caused
missing siblings in iteration during representor device probing,
causing:

1. allocating new domain_id for the device being probed.
2. discrepancy in representor configurations and potential overall
   driver malfunctions.

Fixes: 56bb3c84e9 ("net/mlx5: reduce PCI dependency")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-04 11:27:49 +02:00
Suanming Mou
45633c460c net/mlx5: workaround drop action with old kernel
Currently, there are two types of drop action implementation
in the PMD. One is the DR (Direct Rules) dummy placeholder drop
action and another is the dedicated dummy queue drop action.
When creates flow on the root table with DR drop action, the
action will be converted to MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP
Verbs attribute in rdma-core.

In some inbox systems, MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs
attribute may not be supported in the kernel driver. Create flow
with drop action on the root table will be failed as it is not
supported. In this case, the dummy queue drop action should be
used instead of DR dummy placeholder drop action.

This commit adds the DR drop action support detect on the root
table. If MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP Verbs is not
supported in the system, a dummy queue will be used as drop
action.

Fixes: da845ae9d7 ("net/mlx5: fix drop action for Direct Rules/Verbs")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-08-03 15:08:02 +02:00
Gregory Etelson
ce4062cb10 net/mlx5: fix port initialization of switch domain
All active ports that belong to the same E-switch share domain_id
value.
Port initialization procedure searches through a database for existing
port with matching properties. New domain_id allocated if match was
not located. Otherwise, new port inherits existing domain_id.

Port initialization did not pass enough info to search procedure to
find existing matches. Therefore, each port was created with a private
domain_id value. As the result, port_id flow action failed because it
could not match ports in a rule to E-switch.

The patch adds dpdk_dev with port properties to device search.

Fixes: 56bb3c84e9 ("net/mlx5: reduce PCI dependency")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-08-03 14:19:33 +02:00
Viacheslav Ovsiienko
f17e4b4ffe net/mlx5: add Tx scheduling check on queue creation
The send scheduling on timestamp offload requires the Send
Queue (SQ) shares its User Access Region (UAR) with the
pacing Clock Queue. The SQ can be created by mlx5 PMD either
with DevX or with Verbs. If the SQ is being created with
DevX, the dedicated UAR can be specified and all the SQs
share the single UAR. Once SQ is being created with Verbs
the SQ's UAR is allocated by the rdma-core library internally
on its own and there is no UAR sharing. This caused hardware
errors on WAIT WQEs and overall send scheduling malfunction.

If SQs are going to be created with Verbs and the send
scheduling offload is explicitly requested via tx_pp devarg
the device probing is rejected as device configuration
can't satisfy the requirements.

Fixes: 3ec73abeed ("net/mlx5/linux: fix Tx queue operations decision")
Fixes: 8f848f32fc ("net/mlx5: introduce send scheduling devargs")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:23 +02:00
Gregory Etelson
494d6863c2 net/mlx5: fix representor interrupt handler
In mlx5 PMD the PCI device interrupt vector was used by Uplink
representor exclusively and other VF representors did not support
interrupt mode.
All the VFs and Uplink representors are separate ethernet devices
and must have dedicated interrupt vectors.
The fix provides each representor with a dedicated interrupt
vector.

Fixes: 5882bde88d ("net/mlx5: fix representor interrupts handler")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-29 18:01:15 +02:00
Viacheslav Ovsiienko
9f430dd751 net/mlx5: fix RoCE LAG bond device probing
The RoCE LAG bond device requires neither E-Switch nor SR-IOV
configurations. It means the RoCE LAG bond device might be
presented as a single port Infiniband device.

The mlx5 PMD wrongly recognized standalone RoCE LAG bond device
as E-Switch configuration, this triggered the calls of E-Switch
ports related API and the latter failed (over the new OFED kernel
driver, starting since 5.4.1), causing the overall device probe
failure.

If there is a single port Infiniband bond device found the
E-Switch related flags must be cleared indicating standalone
configuration.

Also, it is not true anymore the bond device can exist
over E-Switch configurations only (as it was claimed for VF LAG
bond devices). The related checks are not relevant anymore
and removed.

Fixes: 790164ce1d ("net/mlx5: check kernel support for VF LAG bonding")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 16:43:49 +02:00
Dong Zhou
96f85ec489 net/mlx5: check VLAN push/pop support
For ConnectX-6 in FDB domain, pop and push VLAN
on both ingress and egress directions are supported.

For ConnectX-6 in NIC domain, and ConnectX-5 in both FWD and NIC domain,
pop VLAN is only supported on ingress direction,
push VLAN is only supported on egress direction.

Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-22 15:40:01 +02:00
Xueming Li
cdfdb82d0b net/mlx5: check maximum Verbs port number
Verbs API doesn't support device port number larger than 255 by design.
Add check and fail probing with proper error log.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Xueming Li
919488fbfa net/mlx5: support Sub-Function
Introduce SF support.
Similar to VF, SF on auxiliary bus is a portion of hardware PF,
no representor or bonding parameters for SF.

Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Xueming Li
a7f34989e9 net/mlx5: migrate to bus-agnostic common interface
To support SubFunction based on auxiliary bus, common driver supports
new bus-agnostic driver.

This patch migrates net driver to new common driver.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Xueming Li
56bb3c84e9 net/mlx5: reduce PCI dependency
To support more bus types, remove PCI dependency where possible.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Thomas Monjalon
4d567938be common/mlx5: get PCI device address from any bus
A function is exported to allow retrieving the PCI address
of the parent PCI device of a Sub-Function in auxiliary bus sysfs.
The function mlx5_dev_to_pci_str() is accepting both PCI and auxiliary
devices. In case of a PCI device, it is simply using the device name.

The function mlx5_dev_to_pci_addr(), which is based on sysfs path
and do not use any device object, is renamed to mlx5_get_pci_addr()
for clarity purpose.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-07-22 00:11:14 +02:00
Suanming Mou
f3020a331d net/mlx5: optimize hash list table allocate on demand
Currently, all the hash list tables are allocated during start up.
Since different applications may only use dedicated limited actions,
optimized the hash list table allocate on demand will save initial
memory.

This commit optimizes hash list table allocate on demand.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-15 16:09:22 +02:00
Suanming Mou
f7c3f3c290 net/mlx5: adjust hash bucket size
With the new per core optimization to the list, the hash bucket size
can be tuned to a more accurate number.

This commit adjusts the hash bucket size.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-15 16:09:21 +02:00
Matan Azrad
961b6774c4 common/mlx5: add per-lcore cache to hash list utility
Using the mlx5 list utility object in the hlist buckets.

This patch moves the list utility object to the common utility, creates
all the clone operations for all the hlist instances in the driver.

Also adjust all the utility callbacks to be generic for both list and
hlist.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
2021-07-15 16:09:18 +02:00