Commit Graph

2531 Commits

Author SHA1 Message Date
Jiawei Wang
bfa87e21bd net/mlx5: fix tunnel header with IPIP offload
For the flows with multiple tunnel layers and containing
tunnel decap and modify actions, for example:

... / vxlan / eth / ipv4 proto is 4 / end
actions raw_decap / modify_field / ...
(note: proto 4 means we have the IP-over-IP tunnel in VXLAN payload)

We have added the multiple tunnel layers validation rejecting
the flows like above mentioned one.

The hardware supports the above match combination till the inner
IP-over-IP header (not including the last one), both for IP-over-IPv4
and IP-over-IPv6, so we should not blindly reject. Also, for the modify
actions following the decap we should set the layer attributes correctly.

This patch reverts the below code changes to support the match, and
adjusts the layers update in case of decap with outer tunnel header.

Fixes: fa06906a48 ("net/mlx5: fix IPIP multi-tunnel validation")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:45 +02:00
Sean Zhang
707d5e7d79 net/mlx5: support flow matching on representor ID
Add support for port_representor item, it will match on traffic
originated from representor port specified in the pattern. This item
is supported in FDB steering domain only (in the flow with transfer
attribute).

For example, below flow will redirect the destination of traffic from
ethdev 1 to ethdev 2.

testpmd> ... pattern eth / port_representor port_id is 1 / end actions
represented_port ethdev_port_id 2 / ...

To handle abovementioned item, Tx queue matching is added in the driver,
and the flow will be expanded to number of the Tx queues. If the spec of
port_representor is NULL, the flow will not be expanded and match on
traffic from any representor port.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:44 +02:00
Gregory Etelson
31b29e0c7f net/mlx5: fix RSS expansion buffer size
Increase expansion buffer size to accumulate more RSS types.

Fixes: 3f02c7ff68 ("net/mlx5: fix RSS expansion for inner tunnel VLAN")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-26 13:33:44 +02:00
Dariusz Sosnowski
9fa7c1cddb net/mlx5: create control flow rules with HWS
This patch adds the creation of control flow rules required to receive
default traffic (based on port configuration) with HWS.

Control flow rules are created on port start and destroyed on port stop.
Handling of destroying these rules was already implemented before that
patch.

Control flow rules are created if and only if flow isolation mode is
disabled and the creation process goes as follows:

- Port configuration is collected into a set of flags. Each flag
  corresponds to a certain Ethernet pattern type, defined by
  mlx5_flow_ctrl_rx_eth_pattern_type enumeration. There is a separate
  flag for VLAN filtering.

- For each possible Ethernet pattern type and:
  - For each possible RSS action configuration:
    - If configuration flags do not match this combination, it is
      omitted.
    - A template table is created using this combination of pattern
      and actions template (templates are fetched from hw_ctrl_rx
      struct stored in the port's private data).
    - Flow rules are created in this table.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:43 +02:00
Dariusz Sosnowski
483181f7b6 net/mlx5: support device control of representor matching
In some E-Switch use cases, applications want to receive all traffic
on a single port. Since currently, flow API does not provide a way to
match traffic forwarded to any port representor, this patch adds
support for controlling representor matching on ingress flow rules.

Representor matching is controlled through a new device argument
repr_matching_en.

- If representor matching is enabled (default setting),
  then each ingress pattern template has an implicit REPRESENTED_PORT
  item added. Flow rules based on this pattern template will match
  the vport associated with the port on which the rule is created.
- If representor matching is disabled, then there will be no implicit
  item added. As a result ingress flow rules will match traffic
  coming to any port, not only the port on which the flow rule is
  created.

Representor matching is enabled by default, to provide an expected
default behavior.

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented:

1. Creating an egress template table in group 0 for each port. These
   tables will hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
   matches available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
   matches vport_meta_tag placed in available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META are operating on REG_A when
   dv_xmeta_en == 0 or REG_C_1 when dv_xmeta_en == 4.

When representor matching is disabled and extended metadata is disabled,
no changes to the current logic are required.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:43 +02:00
Dariusz Sosnowski
26e1eaf2da net/mlx5: support device control for E-Switch default rule
This patch adds support for fdb_def_rule_en device argument to HW
Steering, which controls:

- the creation of the default FDB jump flow rule.
- the ability of the user to create transfer flow rules in the root
table.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:43 +02:00
Gregory Etelson
a3778a4784 net/mlx5: support flow integrity in HWS group 0
- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:42 +02:00
Suanming Mou
478ba4bbe6 net/mlx5: support async flow action push and pull
The queue based rte_flow_async_action_* functions work the same as
queue based async flow functions. The operations can be pushed
asynchronously, and so is the pull.

This commit adds the async action missing push and pull support.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:42 +02:00
Michael Baum
04a4de756e net/mlx5: support flow age action with HWS
Add support for AGE action for HW steering.
This patch includes:

 1. Add new structures to manage aging.
 2. Initialize all of them in configure function.
 3. Implement per second aging check using CNT background thread.
 4. Enable AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:41 +02:00
Alexander Kozyrev
48fbb0e93d net/mlx5: support flow meter mark indirect action with HWS
Add the ability to create an indirect action handle for METER_MARK.
It allows sharing one Meter between several different actions.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:41 +02:00
Gregory Etelson
773ca0e91b net/mlx5: support VLAN push/pop/modify with HWS
Add PMD implementation for HW steering VLAN push, pop, and modify flow
actions.

HWS VLAN push flow action is triggered by a sequence of mandatory
OF_PUSH_VLAN, OF_SET_VLAN_VID, and optional OF_SET_VLAN_PCP
flow action commands.

The commands must be arranged in the exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].

In masked HWS VLAN push flow action template *ALL* the above flow
actions must be masked.

In non-masked HWS VLAN push flow action template *ALL* the above flow
actions must not be masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

HWS VLAN pop flow action is triggered by OF_POP_VLAN
flow action command.
HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
HWS VLAN VID modify action template can be ether masked or non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:41 +02:00
Suanming Mou
463170a7c9 net/mlx5: support connection tracking with HWS
This commit adds the support of connection tracking to HW steering as
SW steering did before.

The difference from SW steering implementation is that it takes
advantage of HW steering bulk action allocation support, in HW
steering only one single CT pool is needed.

An indexed pool is introduced to record allocated actions from bulk and
CT action state etc. Once one CT action is allocated from bulk, one
indexed object will also be allocated from the indexed pool, similar to
deallocating. That makes mlx5_aso_ct_action can also be managed by that
indexed pool, no need to be reserved from mlx5_aso_ct_pool. The single
CT pool is also saved to mlx5_aso_ct_action struct directly.

The ASO operation functions are shared with SW steering implementation.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:40 +02:00
Dariusz Sosnowski
f1fecffa88 net/mlx5: support Direct Rules action template API
This patch adapts mlx5 PMD to changes in mlx5dr API regarding the action
templates. It changes the following:

1. Actions template creation:

    - Flow actions types are translated to mlx5dr action types in order
      to create mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed
      on the rule creation.

2. Template table creation:

    - Fixed actions are created and put in the rule_acts cache using
      predetermined offsets
    - mlx5dr matcher is parametrized by action templates bound to
      template table.
    - mlx5dr matcher is configured to optimize rule creation based on
      passed rule indices.

3. Flow rule creation:

    - mlx5dr rule is parametrized by the action template on which these
      rule's actions are based.
    - Rule index hint is provided to mlx5dr.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:40 +02:00
Xiaoyu Min
4d368e1da3 net/mlx5: support flow counter action for HWS
This commit adds HW steering counter action support.
The pool mechanism is the basic data structure for the HW steering
counter.

The HW steering's counter pool is based on the rte_ring of zero-copy
variation.

There are two global rte_rings:
1. free_list:
     Store the counters indexes, which are ready for use.
2. wait_reset_list:
     Store the counters indexes, which are just freed from the user and
     need to query the hardware counter to get the reset value before
     this counter can be reused again.

The counter pool also supports cache per HW steering's queues, which are
also based on the rte_ring of zero-copy variation.

The cache can be configured in size, preload, threshold, and fetch size,
they are all exposed via device args.

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user call _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2. A: if the dequeued one from the local cache is still in reset
        status (counter's query_gen_when_free is equal to pool's query
        gen):
        I. Flush all counters in the local cache back to global
           wait_reset_list.
        II. Fetch _fetch_sz_ counters into the cache from the global
            free list.
        III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      A: Write back all counters above _threshold_ into the global
         wait_reset_list.
      B: Also, write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ cache directly from/into
global list.

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:39 +02:00
Alexander Kozyrev
24865366e4 net/mlx5: support flow meter action for HWS
This commit adds meter action for HWS steering.

HW steering meter is based on ASO. The number of meters will
be used by flows should be specified in advance in the flow
configure API.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:39 +02:00
Bing Zhao
ddb68e4733 net/mlx5: add extended metadata mode for HWS
The new mode 4 of devarg "dv_xmeta_en" is added for HWS only. In this
mode, the Rx / Tx metadata with 32b width copy between FDB and NIC is
supported.

The mark is only supported in NIC and there is no copy supported.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:38 +02:00
Dariusz Sosnowski
1939eb6f66 net/mlx5: support flow port action with HWS
This patch implements creating and caching of port action for use with
HW Steering FDB flows.

Actions are created on flow template API configuration and created
only on the port designated as the master. Attaching and detaching ports
in the same switching domain causes an update to the port actions cache
by, respectively, creating and destroying actions.

A new devarg fdb_def_rule_en is being added and it's used to control
the default dedicated E-Switch rules that are created by the PMD
implicitly or not, and PMD sets this value to 1 by default.

If set to 0, the default E-Switch rule will not be created and the user
can create the specific E-Switch rules on the root table if needed.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:38 +02:00
Suanming Mou
0f4aa72b99 net/mlx5: support flow modify field with HWS
This patch introduces support for modify_field rte_flow actions in HWS
mode that includes:
	- Ingress and egress domains,
	- SET and ADD operations,
	- usage of arbitrary bit offsets and widths for packet and metadata
	  fields.

This is implemented in two phases:
1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside action template.

2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:38 +02:00
Suanming Mou
7f6daa490d net/mlx5: add shared header reformat
As the rte_flow_async API defines the action mask with a field value
not being 0 means the action will be used as shared in all the flows
in the table.

The header reformat action with the action mask field not being 0 will
be created as constant shared action. For encapsulation header reformat
action, there are two kinds of encapsulation data, raw_encap_data
and rte_flow_item encap_data. Both of these two kinds of data can be
identified from the action mask conf as constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:37 +02:00
Suanming Mou
b206c558f7 net/mlx5: fix IPv6 and TCP RSS hash fields
In the flow_dv_hashfields_set() function, while item_flags was 0,
the code went directly to the first if and the else case would
never have a chance to be checked. This caused the IPv6 and TCP hash
fields in the else case would never be set.

This commit adds the dedicated HW steering hash field set function
to generate the RSS hash fields.

Fixes: 3a2f674b6a ("net/mlx5: add queue and RSS HW steering action")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:37 +02:00
Suanming Mou
4fb0ef2976 net/mlx5: fix steering engine type check
In the function flow_get_drv_type(), attr will be read in non-HWS mode.
In case the user calls the HWS API in SWS mode, the attr should be
placed in HWS functions or it will cause a crash.

Fixes: c40c061a02 ("net/mlx5: add basic flow queue operation")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:36 +02:00
Alex Vesker
22681deead net/mlx5/hws: enable hardware steering
Replace stub implementation of HWS with mlx5dr code.

Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:36 +02:00
Hamdan Igbaria
78580cf4e7 net/mlx5/hws: add debug layer
The debug layer is used to generate a debug CSV file
containing details of the context, table, matcher, rules
and other useful debug information.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:36 +02:00
Erez Shitrit
f8c8a6d844 net/mlx5/hws: add action object
Action objects are used for executing different HW actions
over packets. Each action contains the HW resources and parameters
needed for action use over the HW when creating a rule.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:35 +02:00
Alex Vesker
405242c52d net/mlx5/hws: add rule object
HWS rule objects reside under the matcher, each rule holds
the configuration for the packet fields to match and the
set of actions to execute over the packet that has the requested
fields.

Rules can be created asynchronously in parallel over multiple
queues to different matchers with each rule configured to the HW.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:35 +02:00
Alex Vesker
c467608215 net/mlx5/hws: add matcher object
HWS matcher resides under the table object, each table can
have multiple chained matches with different attributes.

Each matcher represents a combination of match and action templates,
and can contain multiple configurations based on the templates.

Packets are steered from the table to the matcher and from there to
other objects.

The matcher allows efficient HW packet field matching and action
execution based on the configuration done to it.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
2022-10-26 13:33:34 +02:00
Alex Vesker
394cc7ba40 net/mlx5/hws: add table object
HWS table resides under the context object, each context can
have multiple tables with different steering types RX/TX/FDB.

The table is not only a logical object but it is also represented
in the HW, packets can be steered to the table, and from there
to other tables.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:34 +02:00
Alex Vesker
b0290e56dd net/mlx5/hws: add context object
Context is the first mlx5dr object created, all sub objects
table, matcher, rule, and action are created using the context.
The context holds the capabilities and the send queues used for
configuring the offloads to the HW.

Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:34 +02:00
Alex Vesker
c55c2bf353 net/mlx5/hws: add definer layer
Definers are HW objects that are used for matching, rte items
are translated to definers, each definer holds the fields and
bit-masks used for HW flow matching. The definer layer is used
for finding the most efficient definer for each set of items.
In addition to definer creation we also calculate the field
copy (fc) array used for efficient items to WQE conversion.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:33 +02:00
Alex Vesker
3eb748869d net/mlx5/hws: add send layer
HWS configures flows to the HW using a QP, each WQE has
the details of the flow we want to offload. The send layer
allocates the resources needed to send the request to the HW
as well as managing the queues, getting completions and
handling failures.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:33 +02:00
Erez Shitrit
b4dd7bcb0d net/mlx5/hws: add pool and buddy
HWS needs to manage different types of device memory in
an efficient and quick way. For this, memory pools are
being used.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:33 +02:00
Erez Shitrit
365cdf5f8c net/mlx5/hws: add command layer
This adds the command layer which is used to communicate with
the FW, to query capabilities and allocate FW resources needed
for HWS.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
2022-10-26 13:33:32 +02:00
Bing Zhao
8a89038f40 net/mlx5: provide available tag registers
This stores the available tags that can be used by the
application in a global array that will be used to
transfer the TAG item directly from the ID to the REG_C_x
since these can't be changed after startup.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
2022-10-26 13:33:31 +02:00
Dariusz Sosnowski
5bd0e3e671 net/mlx5: add port to metadata conversion
This adds conversion functions between both ethdev port_id and IB
context to internal corresponding tag/mask values.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
2022-10-26 13:33:31 +02:00
Suanming Mou
75a00812b1 net/mlx5: add hardware steering item translation
This provides shared item tranlsation code for hardware
steering root table flows as they still work under FW steering mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
2022-10-26 13:33:30 +02:00
Suanming Mou
cd4ab74206 net/mlx5: split flow item matcher and value translation
This split the item matcher and value translation to
make the code reusable for the new steering mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
2022-10-26 13:33:30 +02:00
Suanming Mou
e64fd460b7 net/mlx5: split flow item translation
This splits flow item translation code to a dedicated function
to share the item translation code with hardware steering mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
2022-10-26 13:33:29 +02:00
Michael Savisko
7f6e276b02 net/mlx5: support flow action send to kernel
Introduce mlx5_get_send_to_kernel_priority() function which returns
value of priority which must be used to jump back to table 0 in order
to send traffic to kernel. This function returns lowest priority.

Add flow_dv_translate_action_send_to_kernel() function which
will allocate rdma-core send_to_kernel action object.
Called from flow_dv_translate().

Fail translation of RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL action in
HW steering.

Signed-off-by: Michael Savisko <michaelsav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-26 13:33:29 +02:00
Michael Savisko
f31a141e64 net/mlx5: add send to kernel action resource holder
Add new structure mlx5_send_to_kernel_action which will hold
together allocated action resource and a reference to used table.
A new structure member of this type added to struct mlx5_dev_ctx_shared.
The member will be initialized upon first created send_to_kernel
action and will be reused for all future actions of this type.
Release of these resources will be done when all shared DR
resources are being released in mlx5_os_free_shared_dr().

Change function flow_dv_tbl_resource_release() from
static to external.

Signed-off-by: Michael Savisko <michaelsav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-26 13:33:29 +02:00
Michael Savisko
25c4d6dfae net/mlx5: add flow action stub for send to kernel
Add new mlx5 action flag MLX5_FLOW_ACTION_SEND_TO_KERNEL.

Add element MLX5_FLOW_FATE_SEND_TO_KERNEL in enum mlx5_flow_fate_type.
For that purpose field 'fate_action' in structure mlx5_flow_handle must be
expanded from 3 bits to 4 bits.

Signed-off-by: Michael Savisko <michaelsav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-26 13:33:28 +02:00
Michael Savisko
80f998da1d common/mlx5: add send to kernel flow action
Add new glue callback dr_create_flow_action_send_to_kernel.
Default callback invokes mlx5dv_dr_action_create_dest_root_table().

Add static inline mlx5_flow_os_create_flow_action_send_to_kernel(),
which calls dr_create_flow_action_send_to_kernel glue callback.

Define HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE macro if function
mlx5dv_dr_action_create_dest_root_table exists in infiniband/mlx5dv.h

Signed-off-by: Michael Savisko <michaelsav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-26 13:33:28 +02:00
Dong Zhou
4df7f801ff net/mlx5: fix thread workspace memory leak
The thread workspace push/pop should be paired. In the "flow_list_create"
routine, if error happened the workspace pop was missed. This patch shares
the workspace pop for all return paths.

Fixes: 0064bf4318 ("net/mlx5: fix nested flow creation")
Cc: stable@dpdk.org

Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-26 13:33:27 +02:00
Dariusz Sosnowski
f2d43ff54d net/mlx5: allow hairpin Rx queue in locked memory
This patch adds a capability to place hairpin Rx queue in locked device
memory. This capability is equivalent to storing hairpin RQ's data
buffers in locked internal device memory.

Hairpin Rx queue creation is extended with requesting that RQ is
allocated in locked internal device memory. If allocation fails and
force_memory hairpin configuration is set, then hairpin queue creation
(and, as a result, device start) fails. If force_memory is unset, then
PMD will fallback to allocating memory for hairpin RQ in unlocked
internal device memory.

To allow such allocation, the user must set HAIRPIN_DATA_BUFFER_LOCK
flag in FW using mlxconfig tool.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-08 18:30:50 +02:00
Dariusz Sosnowski
7274b41756 net/mlx5: allow hairpin Tx queue in host memory
This patch adds a capability to place hairpin Tx queue in host memory
managed by DPDK. This capability is equivalent to storing hairpin SQ's
WQ buffer in host memory.

Hairpin Tx queue creation is extended with allocating a memory buffer of
proper size (calculated from required number of packets and WQE BB size
advertised in HCA capabilities).

force_memory flag of hairpin queue configuration is also supported.
If it is set and:

- allocation of memory buffer fails,
- or hairpin SQ creation fails,

then device start will fail. If it is unset, PMD will fallback to
creating the hairpin SQ with WQ buffer located in unlocked device
memory.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-08 18:30:50 +02:00
Shun Hao
9b7fcf395c net/mlx5: fix meter profile delete after disable
If a meter's profile is changed after meter disabled, there's an issue
that will fail when deleting the old profile.

This patch fixes this by adding the correct process to decrease the old
profile's reference count when changing profile.

Fixes: 63ffeb2ff2 ("net/mlx5: support meter profile update")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:55 +02:00
Shun Hao
530dc58073 net/mlx5: fix meter ID tag for meter hierarchy
Currently, when flow usese meter hierarchy, a tag action is always applied
to set the first meter's meter id, so as to update the first meter's drop
count. But it's not considered if first meter doesn't have drop count.

This patch fixes it, that in hierarchy, if the first meter doesn't have
drop count, no need to add the meter id tag action. No change for
non-hierarchy meter.

Fixes: e8146c63 ("net/mlx5: support represented port item in flow rules")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:54 +02:00
Shun Hao
ca7e6051e7 net/mlx5: limit meter flow when matching all ports
If there's no param in represented_port item, it will be treated as
matching all ports by default. But there's some limitation when using it
with meter hierarchy.

This patch adds the limitation that when matching all ports, the meter
hierarchy should not contain any meter having drop count.

Fixes: e8146c63 ("net/mlx5: support represented port item in flow rules")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:54 +02:00
Shun Hao
33d506b9e5 net/mlx5: fix meter hierarchy with represented port item
There is a new item type represented_port, and currently it will fail
when using meter hierarchy in flow using represented_port match.

This patch fixes this fail by adding support for represented_port item
in meter hierarchy flow split.

Fixes: e8146c63 ("net/mlx5: support represented port item in flow rules")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:54 +02:00
Jiawei Wang
9f71a297da net/mlx5: fix modify action with tunnel decapsulation
The driver splits the flow with sample action into two sub-flows,
sub prefix flow and sub suffix flow.

In the case of tunnel flow including a decap action, the driver should
translate the inner as outer for actions coming after the decap action.
In the case of flow splitting, the packet layers, used to detect the
attributes, are inherited from the prefix flow to the suffix flow but
the driver wrongly didn't handle the decap adjustment and the inner
layers didn't shift to the outer.

This patch adjusts the inherited layers in case of decap.

Fixes: 6e77151286 ("net/mlx5: fix match information in meter")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:53 +02:00
Raja Zidane
130bb7da53 net/mlx5: fix Tx check for hardware descriptor length
If hardware descriptor (WQE) length exceeds one the HW can handle,
the Tx queue failure occurs. PMD does the length check but there was
a bug - the length limit was expressed in 16B units (WQEBB segments),
while the calculated WQE length and limit were in 64B units (WQEBBs).
Fix the condition to avoid subsequent Tx queue failure.

Fixes: 18a1c20 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:53 +02:00
Viacheslav Ovsiienko
d15bfd2930 net/mlx5: fix inline length exceeding descriptor limit
The hardware descriptor (WQE) length field is 6 bits wide
and we have the native limitation for the overall descriptor
length. To improve the PCIe bandwidth the packet data can be
inline into descriptor. If PMD was configured to inline large
amount of data it happened there was no enough space remaining
in the descriptor to specify all the packet data segments and
PMD rejected problematic packets.

The patch tries to adjust the inline data length conservatively
and allows to avoid error occurring.

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Fixes: e2259f93ef ("net/mlx5: fix Tx when inlining is impossible")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
2022-10-02 09:13:52 +02:00
Viacheslav Ovsiienko
166f185fef net/mlx5: fix single not inline packet storing
The mlx5 PMD can inline packet data into transmitting descriptor (WQE)
and free mbuf immediately as data no longer needed, for non-inline
packets the mbuf pointer should be stored in elts array for coming
freeing on send completion. There was an optimization on storing
pointers in batch and there was missed storing mbuf for single
packet if non-inline was explicitly requested by flag.

Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:52 +02:00
Viacheslav Ovsiienko
37d6fc30c1 net/mlx5: fix check for orphan wait descriptor
The mlx5 PMD supports send scheduling feature, it allows
to send packets at specified moment of time, to do that
PMD pushes special wait descriptor (WQE) to the hardware
queue and then pushes descriptor for packet data as usual.
If queue is close to be full or there is no enough elts
buffers to store mbufs being sent the data descriptors might
be not pushed and the orphan wait WQE (not followed by the
data) might reside in queue on tx_burst routine exit.

To avoid orphan wait WQEs there was the check for enough
free space in the queue WQE buffer and enough amount of the
free elts in queue mbuf storage. This check was incomplete
and did not cover all the cases for Enhanced Multi-Packet
Write descriptors.

Fixes: 2f827f5ea6 ("net/mlx5: support scheduling on send routine template")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:52 +02:00
Bassam Zaid AlKilani
6fa815722e net/mlx5: fix flow matching priority for ESP item
ESP is one of IPSec protocols over both IPv4 and IPv6 and is considered
a tunnel layer that cannot be followed by any other layer. Taking that
into consideration, ESP is considered as a 4 layer.

Not defining ESP's priority will make it match with the same priority as
its prior IP layer, which has a layer 3 priority. This will lead to
issues in matching and will match the packet with the first matching
rule even if it doesn't have an esp layer in its pattern, disregarding
any following rules that could have an esp item and can be actually
a more accurate match since it will have a longer matching criterion.

This is fixed by defining the priority for the ESP item to have a
layer 4 priority, making the match be for the rule with the more
accurate and longer matching criteria.

Fixes: 18ca4a4ec7 ("net/mlx5: support ESP SPI match and RSS hash")
Cc: stable@dpdk.org

Signed-off-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
2022-10-02 09:13:51 +02:00
Michael Baum
593f913a8e net/mlx5: fix LRO requirements check
One of the conditions to allow LRO offload is the DV configuration.

The function incorrectly checks the DV configuration before initializing
it by the user devarg; hence, LRO cannot be allowed.

This patch moves this check to mlx5_shared_dev_ctx_args_config, where DV
configuration is initialized.

Fixes: c4b8620135 ("net/mlx5: refactor to detect operation by DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reported-by: Gal Shalom <galshalom@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-10-02 09:13:51 +02:00
Long Li
bc5d8fdb70 net/mlx5: fix Verbs FD leak in secondary process
FDs passed from rte_mp_msg are duplicated to the secondary process and
need to be closed.

Fixes: 9a8ab29b84 ("net/mlx5: replace IPC socket with EAL API")
Cc: stable@dpdk.org

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-10-02 09:13:50 +02:00
Ivan Malov
5e3779b7ab ethdev: remove deprecated flow item physical port
Such deprecation was commenced in DPDK 21.11.
Since then, no parties have objected. Remove.

The patch breaks ABI.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
2022-09-27 10:26:51 +02:00
David Marchand
a04322f616 bus: hide bus object
Make rte_bus opaque for non internal users.
This will make extending this object possible without breaking the ABI.

Introduce a new driver header and move rte_bus definition and helpers.
Update drivers and library to use the internal header.

Some applications may have been dereferencing rte_bus objects, mark
this object's accessors as stable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-09-23 16:14:34 +02:00
David Marchand
1f37cb2bb4 bus/pci: make driver-only headers private
The pci bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.

While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macro manipulating the bus singleton object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
  inclusion,

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2022-09-23 16:14:34 +02:00
David Marchand
b3f89090d6 bus/auxiliary: make driver-only headers private
The auxiliary bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.

While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macro manipulating the bus singleton object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
  inclusion,

Signed-off-by: David Marchand <david.marchand@redhat.com>
2022-09-23 16:14:34 +02:00
Matan Azrad
60b254e392 net/mlx5: fix Rx queue recovery mechanism
The local variables are getting inconsistent in data receiving routines
after queue error recovery.
Receive queue consumer index is getting wrong, need to reset one to the
size of the queue (as RQ was fully replenished in recovery procedure).

In MPRQ case, also the local consumed strd variable should be reset.

CVE-2022-28199
Fixes: 88c0733535 ("net/mlx5: extend Rx completion with error handling")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
2022-08-29 12:53:49 +02:00
David Marchand
72206323a5 version: 22.11-rc0
Start a new release cycle with empty release notes.

The ABI version becomes 23.0.
The map files are updated to the new ABI major number (23).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.
Special handling of removed drivers is also dropped in check-abi.sh and
a note has been added in libabigail.abignore as a reminder.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2022-07-21 12:13:48 +02:00
Raja Zidane
5ddb903824 net/mlx5: reject negative integrity item configuration
Negative integrity item refers to condition when the item value mask
is set, but value spec is cleared:
    ... integrity value mask l4_ok value spec 0 ...

ethdev library defines integrity bits `l3_ok` and `l4_ok` as accumulators
for all hardware L3 and L4 integrity verifications respectfully.
Hardware `l3_ok` and `l4_ok` integrity bits refer to L3 and L4
network headers only.
Integrity bits `l3_ok` and `l4_ok` are not compatible between
ethdev library and hardware.

PMD translations for ethdev `l3_ok` are:
 IPv4: `l3_ok` and `l3_csum_ok`
 IPv6: `l3_ok`
ethdev `l4_ok` is translated into PMD `l4_ok` and `l4_csum_ok` bits.

Positive IPv4 `l3_ok` flow item configuration is translated into
a single matcher that AND corresponding hardware bits.
Negative IPv4 `l3_ok` is translated into 2 hardware conditions where
each condition probes a single integrity bit:
  ethdev::l3_ok is 0 => MLX5::l3_ok is 0 OR MLX5:l3_csum_ok is 0
MLX5 hardware does not do OR condition in flow rule item.
Negative IPv4 `l3_ok` must be translated into 2 flow rules.
Similarly negative ethdev `l4_ok` condition is also translated into 2
hardware rules.

Current PMD roadmap does not allow implicit flow rule split.

Bugzilla ID: 948
Cc: stable@dpdk.org

Suggested-by: Raja Zidane <rzidane@nvidia.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-07-05 20:04:02 +02:00
Michael Baum
740a28366c net/mlx5: add test for external Rx queue
Add mlx5 internal test for map and unmap external RxQs.
This patch adds to testpmd app a runtime function to test the mapping
API.

  testpmd> mlx5 port (port_id) ext_rxq map (sw_queue_id) (hw_queue_id)
  testpmd> mlx5 port (port_id) ext_rxq unmap (sw_queue_id)

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-07-05 20:02:57 +02:00
Michael Baum
85d9252e55 net/mlx5: add test for remote PD and CTX
Add mlx5 internal option in testpmd similar to run-time function
"port attach" which adds another parameter named "socket" for attaching
port and add 2 devargs before.

The arguments are "cmd_fd" and "pd_handle" using to import device
created out of PMD. Testpmd application import it using IPC, and updates
the devargs list before attaching.

These arguments were added in
the commit 9d936f4f1a ("common/mlx5: support remote PD and CTX")

The syntax is:

  testpmd> mlx5 port attach (identifier) socket=(path)

Where "path" is the IPC socket path agreed on the remote process.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-07-05 20:01:33 +02:00
Yunjian Wang
a73b78554a net/mlx5: fix stack buffer overflow in drop action
The mlx5_drop_action_create function use mlx5_malloc for allocating
'hrxq', but don't allocate for 'rss_key'. This is wrong and it can
cause buffer overflow.

Detected with address sanitizer:
0 (/usr/lib64/libasan.so.4+0x7b8e2)
1 in mlx5_devx_tir_attr_set ../drivers/net/mlx5/mlx5_devx.c:765
2 in mlx5_devx_hrxq_new ../drivers/net/mlx5/mlx5_devx.c:800
3 in mlx5_devx_drop_action_create ../drivers/net/mlx5/mlx5_devx.c:1051
4 in mlx5_drop_action_create ../drivers/net/mlx5/mlx5_rxq.c:2846
5 in mlx5_dev_spawn ../drivers/net/mlx5/linux/mlx5_os.c:1743
6 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2501
7 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2647
8 in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2722
9 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:657
10 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:711
11 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:150
12 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:269
13 in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:353
14 in pci_probe ../drivers/bus/pci/pci_common.c:380
15 in rte_bus_probe ../lib/eal/common/eal_common_bus.c:72
16 in rte_eal_init ../lib/eal/linux/eal.c:1286
17 in main ../app/test-pmd/testpmd.c:4112

Fixes: 0c762e81da ("net/mlx5: share Rx queue drop action code")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:25:07 +02:00
Shun Hao
92b3c68e9f net/mlx5: fix metering on E-Switch Manager
When meter is used by E-Switch Manager port, there's an error that
cannot get correct port ID.

This patch fixes this by using specific parsing process to get port
ID for E-Switch Manager.

Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:06 +02:00
Shun Hao
68e9925c30 net/mlx5: add limitation for E-Switch Manager match
For BF with old FW which doesn't expose the E-Switch Manager vport ID,
E-Switch Manager port matching works correctly only when BF is in
embedded CPU mode.

This patch adds the limitation description.

Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:06 +02:00
Gregory Etelson
57d6a458a8 net/mlx5: fix RSS expansion for patterns with ICMP item
MLX5 PMD RSS expansion implementation added L4 UDP or TCP
headers after ICMP.
For example:
ETH / IPv4 / ICMP expanded into  ETH / IPv4 / ICMP / {UDP | TCP}
ETH / IPv6 / ICMPv6 expanded into  ETH / IPv6 / ICMPv6 / {UDP | TCP}

The patch updates PMD expansion scheme to handle ICMP and ICMPv6 types
as non-expandable for RSS.

Fixes: c7870bfe09 ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:05 +02:00
Spike Du
f41a5092e6 app/testpmd: add host shaper command
Add command line options to support host shaper configure.
- Command syntax:
  mlx5 set port <port_id> host_shaper avail_thresh_triggered <0|1> rate
<rate_num>

- Example commands:
To enable avail_thresh_triggered on port 1 and disable current host
shaper:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 1 rate 0

To disable avail_thresh_triggered and current host shaper on port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 0

The rate unit is 100Mbps.
To disable avail_thresh_triggered and configure a shaper of 5Gbps on
port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 50

Add sample code to handle rxq available descriptor threshold event, it
delays a while so that rxq empties, then disables host shaper and
rearms available descriptor threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:04 +02:00
Spike Du
2235fcda12 net/mlx5: add API to configure host port shaper
Host port shaper can be configured with QSHR (QoS Shaper Host Register).
Add check in build files to enable this function or not.

The host shaper configuration affects all the ethdev ports belonging to the
same host port.

Host shaper can configure shaper rate and lwm-triggered for a host port.
The shaper limits the rate of traffic from host port to wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
when one of the host port's Rx queues receives available descriptor
threshold event.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:04 +02:00
Spike Du
5c9f3294e6 net/mlx5: support Rx descriptor threshold event
Add mlx5 specific available descriptor threshold configuration
and query handler.
In mlx5 PMD, available descriptor threshold is also called
LWM (limit watermark).
While the Rx queue fullness reaches the LWM limit, the driver catches
an HW event and invokes the user callback.
The query handler finds the next Rx queue with pending LWM event
if any, starting from the given Rx queue index.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:02 +02:00
Spike Du
25025da3c5 net/mlx5: handle Rx descriptor LWM event
When LWM meets RQ WQE, the kernel driver raises an event to SW.
Use devx event_channel to catch this and to notify the user.
Allocate this channel per shared device.
The channel has a cookie that informs the specific event port and queue.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:25:00 +02:00
Spike Du
72d7efe464 common/mlx5: share interrupt management
There are many duplicate code of creating and initializing rte_intr_handle.
Add a new mlx5_os API to do this, replace all PMD related code with this
API.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:24:59 +02:00
Spike Du
7158e46cb9 net/mlx5: support descriptor LWM for Rx queue
Add LWM (Limit WaterMark) field to Rxq object which indicates the percentage
of Rx queue size used by HW to raise descriptor event to the user.
Allow LWM setting in modify_rq command.
Allow the LWM configuration dynamically by adding RDY2RDY state change.

Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-23 17:23:29 +02:00
Ali Alnubani
bae645a23a net/mlx5: fix build with clang 14
Use fgets instead of fscanf to resolve the following warning
reported by clang 14.0.0 in Fedora 37 (Rawhide):

drivers/net/mlx5/linux/mlx5_ethdev_os.c:1137:52: error:
  'fscanf' may overflow; destination buffer in argument 3 has size 16,
  but the corresponding specifier may require size 17
  [-Werror,-Wfortify-source]
  ret = fscanf(file, "%" RTE_STR(IF_NAMESIZE) "s", port_name);

Fixes: 63d1db710f ("net/mlx5: fix unlimited parsing of switch info")
Cc: stable@dpdk.org

Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:28 +02:00
Sean Zhang
6431068d0f net/mlx5: support field modification in meter rules
This patch introduces MODIFY_FIELD action support in meter. User can
create meter policy with MODIFY_FIELD action in green/yellow action.

For example:

testpmd> add port meter policy 0 21 g_actions modify_field op set
	dst_type ipv4_ecn src_type value src_value 3 width 2 / ...

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:26 +02:00
Sean Zhang
76d5756122 net/mlx5: support modifying ECN field
This patch is to support modify ECN field in IPv4/IPv6 header.

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:25 +02:00
Sean Zhang
e8146c63c3 net/mlx5: support represented port item in flow rules
Add support for represented_port item in pattern. And if the spec and mask
both are NULL, translate function will not add source vport to matcher.

For example, testpmd starts with PF, VF-rep0 and VF-rep1, below command
will redirect packets from VF0 and VF1 to wire:
testpmd> flow create 0 ingress transfer group 0 pattern eth /
represented_port / end actions represented_port ethdev_id is 0 / end

Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-23 17:23:23 +02:00
Raja Zidane
fb96caa56a net/mlx5: support ESP item on Windows
ESP item is not supported on Windows, yet it is expanded from the
expansion graph when trying to create default flow to RSS all packets.

Support ESP item match (without ability to match on SPI field on Windows).
Split ESP validation per OS.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-05 17:04:48 +02:00
Michael Baum
2192599c75 net/mlx5: fix entry size in construct data ipool
The mlx5_action_construct_data structure memory is managed by ipool
named acts_ipool.

The size of one entry in this ipool is mistakenly defined as size of
rte_flow_hw structure.
This size is used to reset in the allocated part. When the size is
incorrect it resets memory that does not belong to it.

This patch defines the correct size.

Fixes: f13fab2392 ("net/mlx5: add flow jump action")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-05 17:04:46 +02:00
Geoffrey Le Gourriérec
eadc35df59 net/mlx5: fix statistics read on Linux
This patch encompasses a few fixes carried by a previous patch
that aimed to support bonding device stats counting.

- If mlx5_os_read_dev_stat fails, it returns 1 instead of a
  negative value, causing mlx5_xstats_get to return an invalid
  number of counters. Since this error is not blocking, do not
  mess ret value with mlx5_os_read_dev_stat returned value.

  This allows avoiding the very annoying log:
  "n_xstats != n_xstats_names => skipping"

- Invert the check for mlx5_os_read_dev_stat(), currently leading
  us to store the result if the function failed, and use a
  backup value if it succeeded, which is the opposite of what we
  actually want. Revert to the original (correct) test.

- Add missing test on _mlx5_os_read_dev_counters() to prevent
  using trash stats values.

Fixes: 7ed15acdcd ("net/mlx5: improve xstats of bonding port")
Cc: stable@dpdk.org

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Geoffrey Le Gourriérec <geoffrey.le_gourrierec@6wind.com>
Tested-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-02 17:01:11 +02:00
Rongwei Liu
2bd03a4361 net/mlx5: add Rx drop counters to xstats
Add two kinds of Rx drop counters to DPDK xstats which are
physical port scope.

1. rx_prio[0-7]_buf_discard
   The number of unicast packets dropped due to lack of shared
   buffer resources.
2. rx_prio[0-7]_cong_discard
   The number of packets that is dropped by the Weighted Random
   Early Detection (WRED) function.

Prio[0-7] is determined by VLAN PCP value which is 0 by default.
Both counters are retrieved from kernel ethtool API which calls
PRM command finally.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-06-01 09:49:44 +02:00
Raja Zidane
1485d961e2 net/mlx5: fix Tx recovery
When an error occurs in Tx, and it is moved to ERROR state, it
is not recoverable, during recovery it's state cannot be modified
to INIT. to modify state from RESET to INIT, the port must be
passed in modify attributes, and in case of ERROR to READY
modification path, it was not provided.

Provide port number when changing state from RESET to INIT.

Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
2022-06-01 09:49:42 +02:00
Shun Hao
96ca87da4f net/mlx5: validate yellow meter action
Yellow meter action support is added in meter hierarchy validation.
If one color uses meter action, the other can only use NULL action
or the same meter action. And only shared meter is supported.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:41 +02:00
Shun Hao
3dc7afa2fa net/mlx5: support yellow meter action for hierarchy tag rule
When a hierarchy meter is shared by other ports, it's needed to iterate
all meter policies in hierarchy to create tag rules, to set packet with
next meter ID, which will be used by related meter drop count.
This patch adds the tag rule for yellow support in hierarchy, so both
green/yellow policy flows can set the correct meter ID.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:38 +02:00
Shun Hao
bf62fb7693 net/mlx5: support yellow meter action in hierarchy
This patch adds the support of meter action for yellow meter policy
flow, so can use meter action for both green and yellow policy flows
in meter hierarchy.
Currently must use the same meter within one meter policy. Packets
passing green/yellow policy flow will have previous meter color of
green/yellow in subsequent meter.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:36 +02:00
Shun Hao
6b838de3d5 net/mlx5: support previous meter color aware
This patch adds the support for previous color aware for meter.
Start_color setting is set to UNDEFINED when creating meter object that
is color aware.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-06-01 09:49:30 +02:00
Thomas Monjalon
64fcadeac0 avoid AltiVec keyword vector
The AltiVec header file is defining "vector", except in C++ build.
The keyword "vector" may conflict easily.
As a rule, it is better to use the alternative keyword "__vector",
so we will be able to #undef vector after including AltiVec header.

Later it may become possible to #undef vector in rte_altivec.h
with a compatibility breakage.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
2022-05-25 11:49:39 +02:00
Raja Zidane
18ca4a4ec7 net/mlx5: support ESP SPI match and RSS hash
In packets with ESP header, the inner IP will be encrypted, and
its fields cannot be used for RSS hashing. So, ESP packets
can be hashed only by the outer IP layer.
So, when using RSS on ESP packets, hashing may not be efficient,
because the fields used by the hash functions are only the outer IPs,
causing all traffic belonging to all tunnels between a given
pair of GWs to land on one core.
Adding the SPI hash field can extend the spreading of IPsec packets.

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-05-15 09:38:59 +02:00
Shun Hao
fc7211097d net/mlx5: fix no-green metering with RSS
When a meter with RSS action being used, there might be several
sub-flows using different sub-policies in the flow splitting stage.
If there's no green action, there's an error that will always use the
same sub-policy for all sub-flows, some resources will be
overwritten and cannot be released, leading assert during port close.

This patch fixes this issue by checking both green and yellow queue
index during getting a blank sub-policy, to avoid the incorrect
resource overwrite.

Fixes: b38a12272b ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org

Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-05-11 09:45:23 +02:00
Michael Baum
7a99336865 net/mlx5: fix LRO configuration in drop Rx queue
The driver wrongly set the LRO configurations to the TIR of the DevX
drop queue even when LRO is not supported.
Actually, the LRO configuration is not relevant to the drop queue at
all.

This causes failure in the initialization of the device, which doesn't
support LRO where the drop queue is created.

Probably, the drop queue creation by DevX missed the fact that LRO is
set by default in the TIR creation function and didn't unset it in the
drop queue case like other cases that unset LRO.

Move the default LRO configuration to unset it and set it only in the
case of all the TIR queues configured with LRO.

Fixes: bc5bee028e ("net/mlx5: create drop queue using DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-26 11:52:18 +02:00
Michael Baum
a213b86821 net/mlx5: fix LRO validation in Rx setup
The mlx5_rx_queue_setup() get LRO offload from user.

When LRO is configured, the LRO flag in rxq_data is set to 1.

This patch adds validation to make sure the LRO is supported.

Fixes: 17ed314 ("net/mlx5: allow LRO per Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-26 11:52:18 +02:00
Dariusz Sosnowski
d2fa2632a4 net/mlx5: fix RSS hash types adjustment
When an indirect action was created with an RSS action configured to
hash on both source and destination L3 addresses (or L4 ports), it caused
shared hrxq to be configured to hash only on destination address
(or port).

This patch fixes this behavior by refining RSS types specified in
configuration before calculating hash types used for hrxq. Refining RSS
types removes *_SRC_ONLY and *_DST_ONLY flags if they are both set.

Fixes: 212d17b6a6 ("net/mlx5: fix missing shared RSS hash types")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-26 11:51:56 +02:00
Raja Zidane
773a7de21a net/mlx5: fix Rx/Tx stats concurrency
Queue statistics are being continuously updated in Rx/Tx burst
routines while handling traffic. In addition to that, statistics
can be reset (written with zeroes) on statistics reset in other
threads, causing a race condition, which in turn could result in
wrong stats.

The patch provides an approach with reference values, allowing
the actual counters to be writable within Rx/Tx burst threads
only, and updating reference values on stats reset.

Fixes: 87011737b7 ("mlx5: add software counters")
Cc: stable@dpdk.org

Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:50:26 +02:00
Dariusz Sosnowski
26f22fa64e net/mlx5: fix GTP handling in header modify action
GTP items were ignored during conversion of modify header actions. This
caused modify TTL action to generate a wrong modify header command when
tunnel and inner headers used different IP versions.

This patch adds GTP item handling to modify header action conversion.

Fixes: 04233f36c7 ("net/mlx5: fix layer type in header modify action")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2022-04-21 12:47:44 +02:00
Adham Masarwah
cb91f12f4a net/mlx5: support MTU settings on Windows
Mlx5Devx library has new API's for setting and getting MTU.
Added new glue functions that wrap the new mlx5devx lib API's.
Implemented the os_ethdev callbacks to use the new glue
functions in Windows.

Signed-off-by: Adham Masarwah <adham@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:43 +02:00
Adham Masarwah
3014718fd2 net/mlx5: support promiscuous modes on Windows
Support of the set promiscuous modes by calling the new API
In Mlx5DevX Lib.
Added new glue API for Windows which will be used to communicate
with Windows driver to enable/disable PROMISC or ALLMC.

Signed-off-by: Adham Masarwah <adham@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:42 +02:00
Michael Baum
d37b0b4d77 net/mlx5: remove redundant check for hairpin queue
The mlx5_rxq_is_hairpin() function checks whether RxQ type is Hairpin.
It is done by reading a flag in Rx control structure coming from
mlx5_rxq_ctrl_get() function.

The function verifies that the queue index is valid even though it has
been checked within the mlx5_rxq_ctrl_get() function.

This patch removes the redundant check.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:41 +02:00
Michael Baum
1573b07284 net/mlx5: restrict Rx queue array access to boundary
The mlx5_rxq_get() function gets RxQ index and return RxQ priv
accordingly.

When it gets an invalid index, it accesses out of array bounds which
might cause undefined behavior.

This patch adds a check for invalid indexes before accessing to array.

Fixes: 0cedf34da7 ("net/mlx5: move Rx queue reference count")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2022-04-21 12:47:41 +02:00