There is a lot of duplicated code for creating and initializing
rte_intr_handle. Add a new mlx5_os API to do this and replace all
related PMD code with calls to this API.
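A minimal sketch of what such a helper can look like, built on the
public rte_intr_* instance API; the helper name and signature here are
illustrative, not the actual mlx5_os API:

    #include <rte_interrupts.h>

    /* Illustrative helper: allocate an interrupt handle instance and
     * initialize its fd and type in one place. */
    static struct rte_intr_handle *
    pmd_intr_handle_create(int fd, enum rte_intr_handle_type type)
    {
        struct rte_intr_handle *h;

        h = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
        if (h == NULL)
            return NULL;
        if (rte_intr_fd_set(h, fd) != 0 ||
            rte_intr_type_set(h, type) != 0) {
            rte_intr_instance_free(h);
            return NULL;
        }
        return h;
    }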
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add a configuration structure for the shared device context. This
structure contains all device-oriented configuration coming from
devargs. It is a field of the shared device context (SH) structure and
is updated once, in the mlx5_alloc_shared_dev_ctx() function.
This structure must not change when probing again, so add a function to
enforce that. The mlx5_probe_again_args_validate() function creates a
temporary IB context configuration structure according to the new
devargs attached on the repeated probe, then verifies that it matches
the existing IB context configuration structure.
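A rough sketch of the validation idea; the helper name and the
whole-struct comparison below are hypothetical (the real function
compares the configurations field by field):

    struct mlx5_sh_config tmp;                /* hypothetical type */

    /* Rebuild the configuration from the new devargs and refuse the
     * repeated probe if it differs from the stored one. */
    parse_devargs_into_config(devargs, &tmp); /* hypothetical helper */
    if (memcmp(&tmp, &sh->config, sizeof(tmp)) != 0) {
        DRV_LOG(ERR, "Device configuration cannot change on probe again.");
        return -EINVAL;
    }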
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The HCA attribute structure is a field of the net configuration
structure. It is also a field of the common configuration structure.
There is no need for this duplication, because the net structures hold
a reference to the common structure.
This patch removes it from the net configuration structure and uses the
common configuration structure instead.
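In driver code this is essentially an access-path change, sketched here
with assumed field names:

    /* Before: a duplicated copy kept in the net configuration. */
    freq = priv->config.hca_attr.dev_freq_khz;
    /* After: the single copy reachable through the common device. */
    freq = priv->sh->cdev->config.hca_attr.dev_freq_khz;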
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The lock sh->txpp.mutex was not released on one return path of the
cleanup function, potentially causing a deadlock.
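The shape of the fix, sketched with assumed names; the point is that
the early-return path must also release the mutex:

    claim_zero(pthread_mutex_lock(&sh->txpp.mutex));
    if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
        /* Not the last reference: this unlock was missing before. */
        claim_zero(pthread_mutex_unlock(&sh->txpp.mutex));
        return;
    }
    mlx5_txpp_destroy(sh);                  /* assumed cleanup call */
    claim_zero(pthread_mutex_unlock(&sh->txpp.mutex));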
Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
Cc: stable@dpdk.org
Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The rdma-core library can map the doorbell register in two ways,
depending on the environment variable "MLX5_SHUT_UP_BF":
- as regular cached memory, when the variable is either missing or set
  to zero. This type of mapping may cause significant doorbell register
  write latency and requires an explicit memory write barrier to
  mitigate this issue and prevent write combining.
- as non-cached memory, when the variable is present and set to a
  non-zero value. This type of mapping may impact performance under
  heavy load, but the explicit write memory barrier is not required,
  which may improve core performance.
The UAR creation function maps a doorbell in one of the above ways
according to the system. At run time, the drivers always add an
explicit memory barrier after writing to it.
In cases where the doorbell was mapped as non-cached memory, the
explicit memory barrier is unnecessary and may impair performance.
The commit [1] solved this problem for the Tx queue. At run time, it
checks the mapping type and issues the memory barrier after writing to
a Tx doorbell register only when it is needed. The mapping type is
extracted directly from the uar_mmap_offset field in the queue
properties.
This patch shares this code between the drivers and extends the above
solution to each of them.
[1] commit 8409a28573
("net/mlx5: control transmit doorbell register mapping")
Fixes: f8c97babc9 ("compress/mlx5: add data-path functions")
Fixes: 8e196c08ab ("crypto/mlx5: support enqueue/dequeue operations")
Fixes: 4d4e245ad6 ("regex/mlx5: support enqueue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Remove direct access to interrupt handle structure fields; use the
respective get/set APIs instead.
Update all drivers that access the interrupt handle fields.
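A typical conversion looks like:

    /* Before: direct field access.
     *   intr_handle->fd = fd;
     *   intr_handle->type = RTE_INTR_HANDLE_EXT;
     */
    rte_intr_fd_set(intr_handle, fd);
    rte_intr_type_set(intr_handle, RTE_INTR_HANDLE_EXT);
    if (rte_intr_fd_get(intr_handle) >= 0)
        rte_intr_callback_register(intr_handle, cb_fn, cb_arg);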
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Previously, we set Tx queue affinity to 0 and let the firmware perform
round-robin when bonding. The firmware uses a global counter to assign
Tx queue affinity to different physical ports according to the
remainder after division.
There are three disadvantages:
1. The global counter is shared between the kernel and DPDK.
2. After restarting the PMD or a port, the previous counter value is
reused, so the new affinity is unpredictable.
3. There is no way to know which affinity the firmware has set.
With this update, we create several TISs, up to the number of bonding
ports, and bind each TIS to one PF port. Each port starts picking up a
TIS using its own port index, so an upper-layer application can quickly
calculate each Tx queue's affinity without querying.
At the DPDK layer, when creating Tx queues with 2 bonding ports, the
affinity is set like:
port 0: 1-->2-->1-->2
port 1: 2-->1-->2-->1
port 2: 1-->2-->1-->2
Note: this is only applicable to the DevX API, and the affinity is
subject to the HW hash.
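A hedged sketch of the selection rule (names assumed): each Tx queue
picks a TIS starting from its port index, modulo the number of bonding
ports, which yields the pattern above:

    /* Deterministic affinity: queue i of port p uses TIS
     * (p + i) % n_bond_ports, so affinity needs no querying. */
    tis_idx = (port_index + txq_index) % n_bond_ports;
    sq_attr.tis_num = sh->tis[tis_idx]->id;     /* assumed fields */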
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add the HCA attributes structure as a field of the device config
structure. It is queried once during common probing, which also updates
the timestamp format fields.
Each driver then uses the HCA attributes from the common device config
structure instead of querying them itself.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Create a shared Protection Domain in the common area and add it and its
PDN as fields of the common device structure.
Use this Protection Domain in all drivers and remove the PD and PDN
fields from their private structures.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Create a shared context device in the common area and add it as a field
of the common device.
Use this context device in all drivers and remove the ctx field from
their private structures.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The definition of the `rte_ether_hdr` structure used a workaround
allowing DPDK and Windows SDK headers to be used in the same file,
because the Windows SDK defines `s_addr` as a macro. Rename `s_addr` to
`src_addr` and `d_addr` to `dst_addr` to avoid the conflict and remove
the workaround.
Deprecation notice:
https://mails.dpdk.org/archives/dev/2021-July/215270.html
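Code touching the Ethernet header changes accordingly, e.g.:

    struct rte_ether_hdr *eth =
        rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);

    /* Before: eth->d_addr and eth->s_addr. */
    rte_ether_addr_copy(&peer_mac, &eth->dst_addr);
    rte_ether_addr_copy(&own_mac, &eth->src_addr);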
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Committing completions by the Clock Queue might be delayed after queue
initialization is done, and the only Clock Queue completion entry (CQE)
might keep an invalid status until the first CQE update happens.
The mlx5_txpp_update_timestamp() routine wrongly recognized the invalid
status as an error and reported lost synchronization.
The patch recognizes the invalid status as "not updated yet", and the
accurate scheduling initialization routine now waits until the first
CQE update happens.
Some collateral typos in comments are fixed as well.
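The initialization wait, sketched with assumed names: an invalid CQE
status now means "not updated yet" rather than "synchronization lost":

    do {
        ret = check_cqe(cqe, 1, ci);          /* assumed helper */
        if (ret == MLX5_CQE_STATUS_SW_OWN)
            break;                            /* first update done */
        if (ret == MLX5_CQE_STATUS_ERR)
            return -EIO;                      /* a real error */
        rte_delay_us_sleep(100);              /* not updated yet */
    } while (--retries);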
Fixes: 77522be0a5 ("net/mlx5: introduce clock queue service routine")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch separates the Tx function declarations into a different
header file in preparation for removing their implementations from the
source file, and as an optional preparation step for Tx cleanup.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5_rxtx.c file contains a lot of Tx burst functions, each of
which is performance-optimized for a specific set of requested
offloads. They are generated from a template function and take
significant time to compile, due to the large number of giant functions
generated in the same file, and this compilation is not done in
parallel using multithreading.
Therefore we can split the mlx5_rxtx.c file into several separate files
to allow different functions to be compiled simultaneously.
In this patch, we separate the Rx function declarations into a
different header file in preparation for removing them from the source
file, and as an optional preparation step for further consolidation of
Rx burst functions.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds support for the timestamp format settings for the
receive and send queues. If firmware version x.30.1000 or above is
installed and the NIC timestamps are configured with the real-time
format, the default zero values for the newly added fields cause the
queue creation to fail.
The patch queries the timestamp formats supported by the hardware and
sets the configuration values in the queue context accordingly.
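The new fields are then set from the queried capabilities, roughly
(names assumed):

    /* Convert the HCA timestamp capability to the queue context
     * value so RT-configured NICs accept the queue creation. */
    sq_attr.ts_format = mlx5_ts_format_conv(hca_attr->sq_ts_format);
    rq_attr.ts_format = mlx5_ts_format_conv(hca_attr->rq_ts_format);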
Fixes: 86fc67fc93 ("net/mlx5: create advanced RxQ object via DevX")
Fixes: ae18a1ae96 ("net/mlx5: support Tx hairpin queues")
Fixes: 15c3807e86 ("common/mlx5: support DevX QP operations")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
The rte_ethdev_driver.h, rte_ethdev_vdev.h and rte_ethdev_pci.h files
are for drivers only and should be private to DPDK, not installed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Steven Webster <steven.webster@windriver.com>
Use the common function for DevX SQ creation for the rearm and clock
queues.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Use the common function for CQ creation for the rearm and clock queues.
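Both queues then create their CQ through the common helper, roughly as
below; the signature and field names are assumed from the common DevX
code:

    struct mlx5_devx_cq_attr cq_attr = {
        .uar_page_id = mlx5_os_get_devx_uar_page_id(sh->tx_uar),
    };

    ret = mlx5_devx_cq_create(sh->ctx, &wq->cq_obj,
                              log2above(MLX5_TXPP_REARM_CQ_SIZE),
                              &cq_attr, sh->numa_node);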
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
According to the current data-path implementation in the PMD, the CQE
size must follow the cache-line size. So the configuration of the CQE
size should depend on RTE_CACHE_LINE_SIZE.
Wrongly, some of the CQE creations did not follow this exactly, which
caused an incompatibility between HW and SW in the data path on systems
with a 128B cache-line size.
Adjust the rule for every CQE creation.
Remove the cqe_size attribute from the DevX CQ creation command and set
it inside the command translation according to the cache-line size.
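The rule applied inside the command translation mirrors the fix:

    /* The CQE size strictly follows the cache-line size. */
    MLX5_SET(cqc, cqctx, cqe_sz, (RTE_CACHE_LINE_SIZE == 128) ?
             MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B);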
Fixes: 79a7e409a2 ("common/mlx5: prepare support of packet pacing")
Fixes: 5cd0a83f41 ("common/mlx5: support more fields in DevX CQ create")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap the API to create/destroy an event channel and to subscribe to an
event with OS calls. On Linux those calls are implemented by glue
functions, while on Windows they are not supported.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap the glue calls for UMEM registration and deregistration with
generic OS calls, since each OS (Linux or Windows) has different glue
API parameters.
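Callers then use the generic wrappers instead of the glue directly; a
hedged usage sketch:

    /* Portable UMEM registration through the OS wrapper; the access
     * flag value is illustrative. */
    void *umem = mlx5_os_umem_reg(ctx, buf, size,
                                  IBV_ACCESS_LOCAL_WRITE);

    if (umem == NULL)
        return -rte_errno;
    /* ... use the UMEM with DevX objects ... */
    mlx5_os_umem_dereg(umem);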
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Some Windows compilers treat static_assert() as a call to another
function rather than a compiler directive which allows checking type
information at compile time. This only occurs if the static_assert call
appears inside a function scope. To solve this, move the static_assert
calls to global scope in the files where they are used.
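For illustration:

    #include <assert.h>

    /* Accepted by all compilers: assertion at global (file) scope. */
    static_assert(sizeof(struct rte_ether_hdr) == 14, "bad size");

    static void
    check_nothing(void)
    {
        /* A static_assert placed here, inside a function, was parsed
         * as a function call by some Windows compilers. */
    }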
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Replace the Linux APIs usleep() and nanosleep() with
rte_delay_us_sleep(). The replacement occurs in shared files compiled
under different operating systems.
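The replacement is mechanical, e.g.:

    /* Before (Linux-only): usleep(1000); */
    rte_delay_us_sleep(1000);    /* portable 1 ms sleep */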
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The packet pacing context is allocated under the
#ifdef HAVE_MLX5DV_PP_ALLOC condition. In a similar way, free the
packet pacing index under the same condition.
This update is required to compile successfully under operating systems
which do not support packet pacing.
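The free path becomes conditional in the same way as the allocation
(a sketch; the glue member and field names are assumed):

    #ifdef HAVE_MLX5DV_PP_ALLOC
        if (sh->txpp.pp != NULL) {
            mlx5_glue->dv_free_pp(sh->txpp.pp);
            sh->txpp.pp = NULL;
        }
    #endif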
Fixes: aef1e20ebe ("net/mlx5: allocate packet pacing context")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The rte_atomic API is deprecated and needs to be replaced with C11
atomic builtins. Use relaxed ordering and explicit memory barriers for
Clock Queue and timestamp synchronization.
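A typical conversion in this area (field names illustrative):

    /* Before: rte_atomic32_inc(&sh->txpp.err_clock_queue); */
    __atomic_fetch_add(&sh->txpp.err_clock_queue, 1, __ATOMIC_RELAXED);

    /* Reading a cached 64-bit timestamp with relaxed ordering. */
    ts = __atomic_load_n(&sh->txpp.ts.ts, __ATOMIC_RELAXED);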
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The eqn field has become a direct field of sh, since it is also
relevant for Tx and Rx.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced
with 'void *' to enable compilation under non-Linux operating systems.
New getter functions were added to retrieve the specific fields that
were previously accessed directly.
Replaced structs:
'struct mlx5dv_pp *'
'struct mlx5dv_devx_event_channel *'
'struct mlx5dv_devx_umem *'
'struct mlx5dv_devx_uar *'
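For example, the UMEM id is now obtained through a getter instead of
dereferencing the DV type; the Linux implementation is essentially:

    static inline uint32_t
    mlx5_os_get_umem_id(void *umem)
    {
        if (umem == NULL)
            return 0;
        /* Only the Linux build knows the underlying DV layout. */
        return ((struct mlx5dv_devx_umem *)umem)->umem_id;
    }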
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The following Linux calls are replaced by their matching rte APIs.
mmap ==> rte_mem_map()
munmap ==> rte_mem_unmap()
sysconf(_SC_PAGESIZE) ==> rte_mem_page_size()
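For example:

    /* Portable mapping replacing the direct mmap() call. */
    void *addr = rte_mem_map(NULL, size,
                             RTE_PROT_READ | RTE_PROT_WRITE,
                             RTE_MAP_SHARED, fd, 0);
    size_t pgsz = rte_mem_page_size(); /* was sysconf(_SC_PAGESIZE) */

    rte_mem_unmap(addr, size);         /* was munmap() */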
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The mlx5 PMD exposes the following newly introduced extended statistics
counters to report the errors of packet send scheduling on timestamps:
- txpp_err_miss_int - the Rearm Queue interrupt was not handled in time
  and the service routine might have missed completions
- txpp_err_rearm_queue - reports errors in the Rearm Queue
- txpp_err_clock_queue - reports errors in the Clock Queue
- txpp_err_ts_past - timestamps in the packets being sent were found in
  the past and were ignored
- txpp_err_ts_future - timestamps in the packets being sent were found
  in the too distant future (beyond the HW/Clock Queue capability to
  schedule, typically about 16M tx_pp devarg periods)
- txpp_jitter - estimated jitter in device clocks between 8K
  completions of the Clock Queue
- txpp_wander - estimated wander in device clocks between 16M
  completions of the Clock Queue
- txpp_sync_lost - error flag: the Clock Queue completion
  synchronization is lost, accurate packet scheduling cannot be
  handled, timestamps are being ignored, and all ports using scheduling
  must be restarted.
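An application can read these counters through the standard xstats API,
e.g.:

    int i, n = rte_eth_xstats_get(port_id, NULL, 0);
    struct rte_eth_xstat *xs = calloc(n, sizeof(*xs));
    struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));

    rte_eth_xstats_get_names(port_id, names, n);
    rte_eth_xstats_get(port_id, xs, n);
    for (i = 0; i < n; i++)
        if (strncmp(names[i].name, "txpp_", 5) == 0)
            printf("%s: %" PRIu64 "\n", names[i].name, xs[i].value);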
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
If the send scheduling feature is engaged, a Clock Queue is created
which reliably reports the current device clock counter value. The
device clock counter can be read directly from the Clock Queue CQE.
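Applications retrieve it via the standard ethdev call:

    uint64_t dev_clock;

    /* Reads the current device clock (from the Clock Queue CQE). */
    if (rte_eth_read_clock(port_id, &dev_clock) == 0)
        printf("device clock: %" PRIu64 "\n", dev_clock);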
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The application provides timestamps in the Tx mbuf as clock values, and
the hardware performs scheduling on a Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index inline
routine.
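Conceptually the conversion is linear; a sketch with assumed names (the
real routine also handles wrap-around and jitter):

    /* Completions arrive every 'tick' device clocks starting from
     * 'ts_base', so the index for a timestamp is the elapsed ticks. */
    static inline uint32_t
    txpp_ts_to_ci(uint64_t ts, uint64_t ts_base, uint64_t tick)
    {
        return (uint32_t)((ts - ts_base) / tick);
    }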
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The service routine is invoked periodically on Rearm Queue completion
interrupts, typically once per a few milliseconds (1-16), to track
clock jitter and wander in a robust fashion.
It performs the following:
- fetches the completed CQEs of the Rearm Queue
- restarts the Rearm Queue on errors
- pushes new requests to the Rearm Queue to keep it continuously
  running and pushing cross-channel requests to the Clock Queue
- reads and caches the Clock Queue CQE to be used in the datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue error statistics
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch allocates the Packet Pacing context from the kernel,
configures it according to the requested packet send scheduling
granularity, and assigns it to the Clock Queue.
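The allocation goes through the rdma-core DevX API, roughly as below;
the context layout names are assumed:

    uint32_t pp_ctx[MLX5_ST_SZ_DW(set_pp_rate_limit_context)] = {0};
    struct mlx5dv_pp *pp;

    /* 'rate' encodes the requested scheduling granularity. */
    MLX5_SET(set_pp_rate_limit_context, pp_ctx, rate_limit, rate);
    pp = mlx5dv_pp_alloc(ctx, sizeof(pp_ctx), pp_ctx,
                         MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);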
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The dedicated Rearm Queue is needed to fire the work requests to the
Clock Queue in real time. The Clock Queue should never stop, otherwise
the clock synchronization might be broken and packet send scheduling
would fail. The Rearm Queue uses cross-channel SEND_EN/WAIT operations
to provide the requests to the Clock Queue in a robust way.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This patch creates the special completion queue providing reference
completions to schedule packet sends from other transmitting queues.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>