Commit Graph

29 Commits

Author SHA1 Message Date
Viacheslav Ovsiienko
572c9d4bda net/mlx5: fix shared Rx queue segment configuration match
While joining the shared Rx queue to the existing queue group,
the queue configurations is checked to be the same as it was
specified in the first group queue creation - all shared
queues should be created with identical configurations.

During the Rx queue creation the buffer split segment
configuration can be altered - the zero segment sizes are
substituted with the actual ones, inherited from the pools,
number of segments can be extended to cover the maximal
packet length, etc. It means the actual queue segment
configuration can not be used directly to match the
configuration provided in the queue setup call.

To resolve an issue we should store original parameters
in the shared queue structure and perform the check against
one of these stored ones.

Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-24 17:25:37 +01:00
Dmitry Kozlyuk
ec9b812b6c net/mlx5: fix Rx queue reference count for indirect RSS
mlx5_ind_table_obj_modify() was not changing the reference counters
of neither the new set of RxQs, nor the old set of RxQs.
On the other hand, creation of the RSS incremented the RxQ refcnt.
If an RxQ was present in both the initial and the modified set,
its reference counter was incremented one extra time
compared to the queues that were only present in the new set.
This prevented releasing said RxQ resources on port stop:

    flow indirect_action 0 create action_id 1 \
        action rss queues 0 1 end / end
    flow indirect_action 0 update 1 \
        action rss queues 2 3 end / end
    quit
    ...
    mlx5_net: mlx5.c:1622: mlx5_dev_close():
        port 0 some Rx queue objects still remain
    mlx5_net: mlx5.c:1626: mlx5_dev_close():
        port 0 some Rx queues still remain

Increment reference counters for the new set of RxQs
and decrement them for the old set of RxQs when needed.
Remove explicit referencing of RxQ from mlx5_ind_table_obj_attach()
because it reuses mlx5_ind_table_obj_modify() code doing this.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
2021-11-24 17:25:35 +01:00
Dmitry Kozlyuk
c65d684497 net/mlx5: fix indirect RSS creation when port is stopped
mlx5_ind_table_obj_setup() was incrementing RxQ reference counters
even when the port was stopped, which prevented RxQ release
and triggered an internal assertion.
Only increment reference counter when the port is started.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
2021-11-24 17:25:33 +01:00
Dariusz Sosnowski
8fbce96fbe net/mlx5: fix reference count on detached indirect action
This patch fixes segfault which was triggered when port, with indirect
actions created, was closed. Segfault was occurring only when
RTE_LIBRTE_MLX5_DEBUG was defined. It was caused by redundant decrement
of RX queues refcount:

- refcount was decremented when port was stopped and indirect actions
were detached from RX queues (port stop),
- refcount was decremented when indirect actions objects were destroyed
(port close or destroying of indirect action).

This patch fixes behavior. Dereferencing Rx queues is done if and only
if indirect action is explicitly destroyed by the user or detached on
port stop. Dereferencing Rx queues on action destroy operation depends
on an argument to the wrapper of indirect action destroy operation,
introduced in this patch.

Fixes: ec4e11d41d ("net/mlx5: preserve indirect actions on restart")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-23 17:57:19 +01:00
Michael Baum
71304b5c7b common/mlx5: fix redundant field in MR control structure
Inside the MR control structure there is a pointer to the common device.
This pointer enables access to the global cache as well as hardware
objects that may be required in case a new MR needs to be created.

The purpose of adding this pointer into the MR control structure was to
avoid its transfer as a parameter to all the functions of searching MR
in the caches.
However, adding it to this structure increased the Rx and Tx data-path
structures, all the fields that followed it were slightly moved away
which caused to a reduction in performance.

This patch removes the pointer from the structure. It can be accessed
through the "dev_gen_ptr" existing field using the "container_of"
operator.

Fixes: 334ed198ab ("common/mlx5: remove redundant parameter in MR search")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-17 10:42:20 +01:00
Dmitry Kozlyuk
077be91dd7 net/mlx5: fix split buffer Rx
Routine to lookup LKey on Rx was assuming that the mbuf address
always belongs to a single mempool: the one associated with an RxQ
or the MPRQ mempool. This assumption is false for split buffers case.
A wrong LKey was looked up, resulting in completion errors.
Modify lookup routines to lookup LKey in the mbuf->pool
for non-MPRQ cases both on Rx datapath and on queue initialization.

Fixes: fec28ca0e3 ("net/mlx5: support mempool registration")

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-08 13:56:29 +01:00
Michael Baum
5dfa003db5 common/mlx5: fix post doorbell barrier
The rdma-core library can map doorbell register in two ways, depending
on the environment variable "MLX5_SHUT_UP_BF":

  - as regular cached memory, the variable is either missing or set to
    zero. This type of mapping may cause the significant doorbell
    register writing latency and requires an explicit memory write
    barrier to mitigate this issue and prevent write combining.

  - as non-cached memory, the variable is present and set to not "0"
    value. This type of mapping may cause performance impact under
    heavy loading conditions but the explicit write memory barrier is
    not required and it may improve core performance.

The UAR creation function maps a doorbell in one of the above ways
according to the system. In run time, it always adds an explicit memory
barrier after writing to.
In cases where the doorbell was mapped as non-cached memory, the
explicit memory barrier is unnecessary and may impair performance.

The commit [1] solved this problem for a Tx queue. In run time, it
checks the mapping type and provides the memory barrier after writing to
a Tx doorbell register if it is needed. The mapping type is extracted
directly from the uar_mmap_offset field in the queue properties.

This patch shares this code between the drivers and extends the above
solution for each of them.

[1] commit 8409a28573
    ("net/mlx5: control transmit doorbell register mapping")

Fixes: f8c97babc9 ("compress/mlx5: add data-path functions")
Fixes: 8e196c08ab ("crypto/mlx5: support enqueue/dequeue operations")
Fixes: 4d4e245ad6 ("regex/mlx5: support enqueue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 16:21:03 +01:00
Michael Baum
334ed198ab common/mlx5: remove redundant parameter in MR search
Memory region management has recently been shared between drivers,
including the search for caches in the data plane.
The initial search in the local linear cache of the queue, usually
yields a result and one should not continue searching in the next level
caches.

The function that searches in the local cache gets the pointer to a
device as a parameter, that is not necessary for its operation
but for subsequent searches (which, as mentioned, usually do not
happen).
Transferring the device to a function and maintaining it, takes some
time and causes some impact on performance.

Add the pointer to the device as a field of the mr_ctrl structure. The
field will be updated during control path and will be used only when
needed in the search.

Fixes: fc59a1ec55 ("common/mlx5: share MR mempool registration")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-07 14:11:16 +01:00
Bing Zhao
febcac7b46 net/mlx5: support Rx queue delay drop
For the Ethernet RQs, if there all receiving descriptors are
exhausted, the packets being received will be dropped. This behavior
prevents slow or malicious software entities at the host from
affecting the network. While for hairpin cases, even if there is no
software involved during the packet forwarding from Rx to Tx side,
some hiccup in the hardware or back pressure from Tx side may still
cause the descriptors to be exhausted. In certain scenarios it may be
preferred to configure the device to avoid such packet drops,
assuming the posting of descriptors will resume shortly.

To support this, a new devarg "delay_drop" is introduced. By default,
the delay drop is enabled for hairpin Rx queues and disabled for
standard Rx queues. This value is used as a bit mask:
  - bit 0: enablement of standard Rx queue
  - bit 1: enablement of hairpin Rx queue
And this attribute will be applied to all Rx queues of a device.

The "rq_delay_drop" capability in the HCA_CAP is checked before
creating any queue. If the hardware capabilities do not support
this delay drop, all the Rx queues will still be created without
this attribute, and the devarg setting will be ignored even if it
is specified explicitly. A warning log is used to notify the
application when this occurs.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-05 17:04:53 +01:00
Xueming Li
09c2555303 net/mlx5: support shared Rx queue
This patch introduces shared RxQ. All shared Rx queues with same group
and queue ID share the same rxq_ctrl. Rxq_ctrl and rxq_data are shared,
all queues from different member port share same WQ and CQ, essentially
one Rx WQ, mbufs are filled into this singleton WQ.

Shared rxq_data is set into device Rx queues of all member ports as
RxQ object, used for receiving packets. Polling queue of any member
ports returns packets of any member, mbuf->port is used to identify
source port.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:50 +01:00
Xueming Li
5cf0707fc7 net/mlx5: remove Rx queue data list from device
Rx queue data list(priv->rxqs) can be replaced by Rx queue
list(priv->rxq_privs), removes it and replaces with universal wrapper
API.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:49 +01:00
Xueming Li
5ceb3a02b0 net/mlx5: move Rx queue DevX resource
To support shared RX queue, moves DevX RQ which is per queue resource to
Rx queue private data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:48 +01:00
Xueming Li
5db77fef78 net/mlx5: remove port info from shareable Rx queue
To prepare for shared Rx queue, removes port info from shareable Rx
queue control.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:47 +01:00
Xueming Li
44126bd9d0 net/mlx5: move Rx queue hairpin info to private data
Hairpin info of Rx queue can't be shared, moves to private queue data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:47 +01:00
Xueming Li
0cedf34da7 net/mlx5: move Rx queue reference count
Rx queue reference count is counter of RQ, used to count reference to RQ
object. To prepare for shared Rx queue, this patch moves it from
rxq_ctrl to Rx queue private data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:46 +01:00
Xueming Li
4cda06c3c3 net/mlx5: split Rx queue into shareable and private
To prepare shared Rx queue, splits RxQ data into shareable and private.
Struct mlx5_rxq_priv is per queue data.
Struct mlx5_rxq_ctrl is shared queue resources and data.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-11-04 22:55:45 +01:00
Dmitry Kozlyuk
ec4e11d41d net/mlx5: preserve indirect actions on restart
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop shared RSS actions kept references to RX queues,
preventing resource release. As a result, internal PMD mempool
for such queues had been exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():

    Rx queue allocation failed: Cannot allocate memory

Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.

When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.

Fixes: 4b61b8774b ("ethdev: introduce indirect flow action")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-11-02 18:59:17 +01:00
Olivier Matz
daa02b5cdd mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-24 13:37:43 +02:00
Michael Baum
fc59a1ec55 common/mlx5: share MR mempool registration
Expand the use of mempool registration to MR management for other
drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:58:00 +02:00
Michael Baum
9f1d636f3e common/mlx5: share MR management
Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-21 15:57:58 +02:00
Konstantin Ananyev
8d7d4fcdca ethdev: change input parameters for Rx queue count
Currently majority of fast-path ethdev ops take pointers to internal
queue data structures as an input parameter.
While eth_rx_queue_count() takes a pointer to rte_eth_dev and queue
index.
For future work to hide rte_eth_devices[] and friends it would be
plausible to unify parameters list of all fast-path ethdev ops.
This patch changes eth_rx_queue_count() to accept pointer to internal
queue data as input parameter.
While this change is transparent to user, it still counts as an ABI change,
as eth_rx_queue_count_t is used by ethdev public inline function
rte_eth_rx_queue_count().

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-13 22:14:58 +02:00
Dmitry Kozlyuk
fec28ca0e3 net/mlx5: support mempool registration
When the first port in a given protection domain (PD) starts,
install a mempool event callback for this PD and register all existing
memory regions (MR) for it. When the last port in a PD closes,
remove the callback and unregister all mempools for this PD.
This behavior can be switched off with a new devarg: mr_mempool_reg_en.

On TX slow path, i.e. when an MR key for the address of the buffer
to send is not in the local cache, first try to retrieve it from
the database of registered mempools. Supported are direct and indirect
mbufs, as well as externally-attached ones from MLX5 MPRQ feature.
Lookup in the database of non-mempool memory is used as the last resort.

RX mempools are registered regardless of the devarg value.
On RX data path only the local cache and the mempool database is used.
If implicit mempool registration is disabled, these mempools
are unregistered at port stop, releasing the MRs.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-10-19 16:35:16 +02:00
Xueming Li
7483341ae5 ethdev: change queue release callback
Currently, most ethdev callback API use queue ID as parameter, but Rx
and Tx queue release callback use queue object which is used by Rx and
Tx burst data plane callback.

To align with other eth device queue configuration callbacks:
- queue release callbacks are changed to use queue ID
- all drivers are adapted

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-06 19:16:03 +02:00
Suanming Mou
6507c9f51d common/mlx5: call list callbacks with context
This commit optimizes to call the list callback functions with global
context directly.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-07-15 16:09:17 +02:00
Matan Azrad
491b7137ff net/mlx5: add per-lcore cache to the list utility
When mlx5 list object is accessed by multiple cores, the list lock
counter is all the time written by all the cores what increases cache
misses in the memory caches.

In addition, when one thread accesses the list for add\remove\lookup
operation, all the other threads coming to do an operation in the list
are stuck in the lock.

Add per lcore cache to allow thread manipulations to be lockless when
the list objects are mostly reused.

Synchronization with atomic operations should be done in order to
allow threads to unregister an entry from other thread cache.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
2021-07-15 15:19:10 +02:00
Matan Azrad
e78e5408da net/mlx5: remove cache term from the list utility
The internal mlx5 list tool is used mainly when the list objects need to
be synchronized between multiple threads.

The "cache" term is used in the internal mlx5 list API.

Next enhancements on this tool will use the "cache" term for per thread
cache management.

To prevent confusing, remove the current "cache" term from the API's
names.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
2021-07-15 15:19:10 +02:00
Alexander Kozyrev
a8f0df6bf9 net/mlx5: support power monitoring
Support the PMD power management API in MLX5 driver.
The monitor policy of this API puts a CPU core to sleep until
a data in some monitored memory address is changed by the NIC.
Implement the get_monitor_addr function to return an address
of a CQE owner bit to monitor the arrival of a new packet.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-05-03 12:12:42 +02:00
Michael Baum
a96102c869 net/mlx5: separate Rx function implementations to new file
This patch separates Rx function implementations to different source
file as an optional preparation step for further consolidation of Rx
burst functions.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-04-15 08:24:51 +02:00
Michael Baum
151cbe3aab net/mlx5: separate Rx function declarations to another file
The mlx5_rxtx.c file contains a lot of Tx burst functions, each of those
is performance-optimized for the specific set of requested offloads.
These ones are generated on the basis of the template function and it
takes significant time to compile, just due to a large number of giant
functions generated in the same file and this compilation is not being
done in parallel with using multithreading.

Therefore we can split the mlx5_rxtx.c file into several separate files
to allow different functions to be compiled simultaneously.
In this patch, we separate Rx function declarations to different header
file in preparation for removing them from the source file and as an
optional preparation step for further consolidation of Rx burst
functions.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-04-15 08:24:49 +02:00