The Tx queue completion production index was not reset
on Tx queue stop and there were completions remaining
from the previous queue run. This caused the wrong
completion queue operating and overall Tx queue malfunction
on queue restart.
Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The Tx queue stop API doesn't call the PMD callback when the state of
the queue is stopped.
The drivers should update the state to be stopped when the queue stop
callback is done successfully or when the port is stopped.
The drivers should update the state to be started when the queue start
callback is done successfully or when the port is started.
The driver wrongly didn't update the state as started when the port
start callback was done which kept the state as stopped.
Following call to a queue stop API was not completed by ethdev layer
because the state is already stopped.
Move the state update from the Tx queue setup to the port start
callback.
Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The Txq refcnt 1 value means that there is no real reference to the
queue and only the control configurations are saved in the struct.
The patch below wrongly didn't consider it and caused a leak in the Txq
object resource.
Revert the specific update in the refcnt.
Fixes: b5c8b3e70c ("net/mlx5: use C11 atomics for RxQ/TxQ refcounts")
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RxQ/TxQ refcounts.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The Tx queue stop\start operations update the HW state of the Tx queue
object. The stop API should update the state from ready to reset in
order to stop any queue traffic and the start API should update the
state from reset to ready in order to open the traffic path.
The start API wrongly tried to change the state from ready to ready what
caused a failure in FW on the current state validation.
Replace ready to ready command by reset to ready command in the Tx start
API.
Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Asaf Penso <asafp@nvidia.com>
In the current implementation of single port mode hairpin, the peer
queue should belong to the same port of the current queue. When the
two ports hairpin mode is introduced, such checking should be removed
to make the hairpin queue setup execute successfully since it is not
an invalid condition, if the Tx port and Rx port are not the same.
In the meanwhile, different devices could have different queue
configurations. The queues number of peer port is unknown to the
current device. The checking should be removed also.
If the Tx and Rx port IDs of a hairpin peer are different, only the
manual binding and explicit Tx flows are supported. Or else, the four
combinations of modes could be supported. The mode attributes
consistency checking will be done when connecting the queue with its
peer queue.
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The HW objects of the Tx queue is created/destroyed in the device
start\stop stage while the ethdev configurations for the Tx queue
starts from the tx_queue_setup stage.
The PMD should save all the last configurations it got from the ethdev
and to apply them to the device in the dev_start operation.
Wrongly, last code added to mitigate the reference counters didn't take
into account the above rule and combined the configurations and HW
objects to be created\destroyed together.
This causes to memory leak and other memory issues.
Make sure the HW object is released in stop operation when there is no
any reference to it while the configurations stay saved.
Fixes: 17a57183c0 ("net/mlx5: mitigate Tx queue reference counters")
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The functions rte_mbuf_dynfield_lookup() and rte_mbuf_dynflag_lookup()
can return an offset starting with 0 or a negative error code.
In reality the first offsets are probably reserved forever,
but for the sake of strict API compliance,
the checks which considered 0 as an error are fixed.
Fixes: efa79e68c8 ("net/mlx5: support fine grain dynamic flag")
Fixes: 3172c471b8 ("net/mlx5: prepare Tx queue structures to support timestamp")
Fixes: 0febfcce36 ("net/mlx5: prepare Tx to support scheduling")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Use new modify_qp functions for Tx object creation in DevX and Verbs
modules.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Separate Tx object modification to the Verbs and DevX modules.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Move Tx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
As an arrangement to Windows OS support, the Verbs operations should be
separated to another file.
By this way, the build can easily cut the unsupported Verbs APIs from
the compilation process.
Define operation structure and DevX module in addition to the existing
Linux Verbs module.
Separate Tx object creation into the Verbs/DevX modules and update the
operation structure according to the OS support and the user
configuration.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The eqn field has become a field of sh directly since it is also
relevant for Tx and Rx.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Move the creation of the completion queue from the mlx5_txq_obj_new
function into an auxiliary function.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Move the creation of the send queue and the completion queue resources
from the mlx5_txq_obj_devx_new function into auxiliary functions.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The Tx queue structures manage 2 different reference counter per queue:
txq_ctrl reference counter and txq_obj reference counter.
There is no real need to use two different counters, it just complicates
the release functions.
Remove the txq_obj counter and use only the txq_ctrl counter.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When a CQ is not created by DevX, it be allocated by either DV function
or by regular Verbs function.
The CQ DV attributes variable was wrongly defined and initialized in Tx
queue creation while the CQ is created by the regular Verbs function
what remained the attributes variable unused.
Remove the unused variable.
Fixes: faf2667fe8 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
As part of SQ creation for Tx queue objects, a HW doorbell memory should
be allocated and mapped to the HW.
The SQ doorbell handler was wrongly saved on the CQ fields what caused
wrong doorbell release in the Tx queue object destroy flow.
Save the SQ doorbell handler in the SQ fields.
Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Since the 20.08 release deprecated rte_cio_*mb APIs because these APIs
provide the same functionality as rte_io_*mb APIs on all platforms, so
remove them and use rte_io_*mb instead.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Hairpin helper functions were not used by drivers, but it was used only
local to ethdev. They are:
'rte_eth_dev_is_rx_hairpin_queue()'
'rte_eth_dev_is_tx_hairpin_queue()'
Exposing them as internal APIs and update mlx5 driver (only user of
hairpin) to use them.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced
with 'void *' to enable compilation under non-Linux operating systems.
New getter functions were added to retrieve the specific fields that
were previously accessed directly.
Replaced structs:
'struct mlx5dv_pp *'
'struct mlx5dv_devx_event_channel *'
'struct mlx5dv_devx_umem *'
'struct mlx5dv_devx_uar *'
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The mlx5 PMD did not support queue_start and queue_stop eth_dev API
routines, queue could not be suspended and resumed during device
operation.
There is the use case when this feature is crucial for applications:
- there is the secondary process handling the queue
- secondary process crashed/aborted
- some mbufs were allocated or used by secondary application
- some mbufs were allocated by Rx queues to receive packets
- some mbufs were placed to send queue
- queue goes to undefined state
In this case there was no reliable way to recovery queue handling
by restarted secondary process but reset queue to initial state
freeing all involved resources, including buffers involved in queue
operations, reset the mbuf pools, and then reinitialize queue
to working state:
- reset mbuf pool, allocate all mbuf to initialize pool into
safe state after the crush and allow safe mbuf free calls
- stop queue, free all potentially involved mbufs
- reset mbuf pool again
- start queue, reallocate mbufs needed
This patch introduces the queue start/stop feature with some
limitations:
- hairpin queues are not supported
- it is application responsibility to synchronize start/stop
with datapath routines, rx/tx_burst must be suspended during
the queue_start/queue_stop calls
- it is application responsibility to track queue usage and
provide coordinated queue_start/queue_stop calls from
secondary and primary processes.
- Rx queues with vectorized Rx routine and engaged CQE
compression are not supported by this patch currently
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Several source files include Verbs header files as in (1). These source
files will not compile under non-Linux operating systems. This commit
removes this inclusion in two cases:
Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source
file so the inclusion in (1) can be safely removed.
Case 2: Verbs symbols are used. Please note the inclusion in (1) already
appears in file linux/mlx5_glue.h (which represents the interface
to the rdma-core library). Therefore, replace (1) in the source file
with (2). Under non-Linux operating systems - file mlx5_glue.h will not
include (1).
(1)
#include <infiniband/verbs.h>
#include <infiniband/mlx5dv.h>
(2)
#include <mlx5_glue.h>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The following Linux calls are replaced by their matching rte APIs.
mmap ==> rte_mem_map()
munmap == >rte_mem_unmap()
sysconf(_SC_PAGESIZE) ==> rte_mem_page_size()
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
To provide the packet send schedule on mbuf timestamp the Tx
queue must be attached to the same UAR as Clock Queue is.
UAR is special hardware related resource mapped to the host
memory and provides doorbell registers, the assigning UAR
to the queue being created is provided via DevX API only.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The master and representors might be created over the multiport
Infiniband devices and the UAR resource allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires the write access to the UAR must be performed
as atomic 64-bit write, on 32-bit systems this is two sequential
writes, protected by lock. Due to possibility to share the same
UAR between sibling devices the locks must be moved to shared
context.
Fixes: f048f3d479 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The number of descriptors to configure in a Rx/Tx queue is passed to
the mlx5_tx/rx_queue_pre_setup() function by value. That means any
adjustments of this variable are local and cannot affect the actual
value that is used to allocate mbufs in the mlx5_txq/rxq_new()
functions. Pass the number as a reference to actually update it.
Fixes: 6218063b39 ("net/mlx5: refactor Rx data path")
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attribute that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API
mlx5_os_get_dev_attr() which fills in the new defined struct.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
When secondary process starts, it will allocate its own process private
data, and also does remap to UAR register of the Tx queue. Once the
secondary process exits, these resources should be released accordingly.
And the shared resources owned by primary should not be touched.
Currently, once one port in the secondary process spawn failed, all the
other spawned ports will also be released during process exits. However,
the mlx5_dev_close() function does not add the cases for secondary
process, it means call the mlx5_dev_close() function directly in
secondary process releases the resources it should not touch.
Add the case for secondary process release to its own resources in
mlx5_dev_close() function to help it quits gracefully.
Fixes: 942d13e6e7 ("net/mlx5: fix sharing context destroy order")
Fixes: 3a8207423a ("net/mlx5: close all ports on remove")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The mlx5_txq_obj_new function defines a pointer named txq_data and
assign value into it. After assigning, the code writer is sure that the
variable does not point to NULL and even express it using assertion.
During the function, the function does dereferencing to the pointer
several times and at no point change its value. However, at the end of
the function at the error label when it wants to free one of the fields
of the structure that txq_data points to, it checks again whether
txq_data is invalid.
This check is unnecessary since it knows for sure that txq_data is
valid.
Remove the aforementioned needless check.
Fixes: 6449068818 ("net/mlx5: add free on completion queue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The mlx5_txq_obj_hairpin_new function defines a pointer named tmpl and
allocates memory for it using the rte_zmalloc_socket function.
Later, this function allocates memory to a variable inside tmpl using
the mlx5_devx_cmd_create_sq function.
In both cases, if the allocation fails, the code jumps to the error
label and frees allocated resources. However, in the first jump there
are still no resources to free and the jump only for the line return
NULL is unnecessary. Even worse, when it jumps to error label with
invalid tmpl it actually does dereference to a null pointer.
In contrast, the second jump needs to free the tmpl variable but the
function instead of freeing, tries to free the variable that it just
failed to allocate, and another variable that has never been allocated.
In addition, for another error, the function returns NULL without
freeing the tmpl variable before, causing a memory leak.
Delete the error label and replace each jump with local return NULL and
free tmpl variable if needed.
Fixes: ae18a1ae96 ("net/mlx5: support Tx hairpin queues")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Program received signal SIGSEGV, Segmentation fault.
0x00000000008ef7c4 in mlx5_tx_queue_release (dpdk_txq=0x17ce01680) at
drivers/net/mlx5/mlx5_txq.c:302
301 mlx5_txq_release(ETH_DEV(priv), i);
302 DRV_LOG(DEBUG, "port %u removing Tx queue %u from list",
303 PORT_ID(priv), txq->idx);
The problem is txq is freed inside the mlx5_txq_release() function
and no longer valid in the debug log right after this invocation.
Move the debug log before the mlx5_txq_release() function to fix this.
Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Refactor common memory btree and cache management to common driver.
Replace some input parameters of MR APIs to more common data structure
like PD, port_id, share_cache,... so that multiple PMD drivers can
use those MR APIs.
Modify mlx5 net pmd driver to use MR management APIs from common driver.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When creating a hairpin queue, the total data size and the maximal
number of packets are interrelated. The differ is the stride size.
Larger buffer size means big packet like jumbo could be supported,
but in the meanwhile, it will introduce more cache misses and have a
side effect on the performance.
Now a new device parameter "hp_buf_log_sz" is introduced for
applications to set the total data buffer size (the logarithm value).
Then the maximal number of packets will also be calculated
automatically by this value.
Applications could also change this value to a larger one in order
to support larger packets in hairpin case. A smaller value will be
beneficial for memory consumption.
If it is not set, the default value will be used.
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The devices of the family ConnectX may have two letters as suffix.
Such suffix is preceded with a space and the second x is lowercase:
- ConnectX-4 Lx
- ConnectX-5 Ex
- ConnectX-6 Dx
Uppercase of the device family name BlueField is also fixed.
The lists of supported devices are fixed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The hairpin TX/RX queue depth and packet size is fixed in the past.
When the firmware has some fix or improvement, the PMD will not
make full use of it. And also, 32 packets for a single queue will not
guarantee a good performance for hairpin flows. It will make the
stride size larger and for small packets, it is a waste of memory.
The recommended stride size is 64B now.
The parameter of hairpin queue setup needs to be adjusted.
1. A proper buffer size should support the standard jumbo frame with
9KB, and also more than 1 jumbo frame packet for performance.
2. Number of packets of a single queue should be the maximum
supported value (total buffer size / stride size).
There is no need to support the max capacity of total buffer size
because the memory consumption should also be taken into
consideration.
Fixes: e79c9be915 ("net/mlx5: support Rx hairpin queues")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Use the MLX5_ASSERT macros instead of the standard assert clause.
Depends on the RTE_LIBRTE_MLX5_DEBUG configuration option to define it.
If RTE_LIBRTE_MLX5_DEBUG is enabled MLX5_ASSERT is equal to RTE_VERIFY
to bypass the global CONFIG_RTE_ENABLE_ASSERT option.
If RTE_LIBRTE_MLX5_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT
can still make this assert active by calling RTE_VERIFY inside RTE_ASSERT.
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Use the RTE_LIBRTE_MLX5_DEBUG configuration flag to get rid of dependency
on the NDEBUG definition. This is a preparation step to switch
from standard assert clauses to DPDK RTE_ASSERT ones in MLX5 driver.
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move the vendor information, vendor ID and device IDs from net/mlx5 PMD
to the common mlx5 file.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
A new Mellanox vdpa PMD will be added to support vdpa operations by
Mellanox adapters.
This vdpa PMD design includes mlx5_glue and mlx5_devx operations and
large parts of them are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Separate also the log mechanism to allow different log level control to
the common library.
Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The DevX commands interface is included in the mlx5.h file with a lot
of other PMD interfaces.
As an arrangement to make the DevX commands shared with different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
Also remove shared device structure dependency on DevX commands.
Replace the DevX commands log mechanism from the mlx5 driver log
mechanism to the EAL log mechanism.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The free on completion queue keeps the indices of elts array,
all mbuf stored below this index should be freed on arrival
of normal send completion. In debug version it also contains
an index of completed transmitting descriptor (WQE) to check
queues synchronization.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The new software manged entity is introduced in Tx datapath
- free on completion queue. This queue keeps the information
how many buffers stored in elts array must freed on send
completion. Each element of the queue contains transmitting
descriptor index to be in synch with completion entries (in
debug build only) and the index in elts array to free buffers.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This is preparation step, we are going to store the index
of elts to free on completion in the dedicated free on
completion queue, this patch updates the elts freeing routine
and updates Tx error handling routine to be synced with
coming new queue.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The doorbell register is mapped using mmap() and offset
must have off_t instead of unsigned int. Bug is not critical
due to only least significant bits of offset are currently
tested to determine mapping mode.
Fixes: 8409a28573 ("net/mlx5: control transmit doorbell register mapping")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>