This commit adds support for generic tunnel TSO and checksum offload.
PMD will compute the inner/outer headers offset according to the
mbuf fields. Hardware will do calculation based on offsets and types.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Aligning Mellanox SPDX copyrights to a single format.
In addition replace to SPDX licence files which were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
TSO should be set if either of the TSO offload flags is requested.
Fixes: dbccb4cddc ("net/mlx5: convert to new Tx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
These functions return int although they are not supposed to fail,
resulting in unnecessary checks in their callers.
Some are returning error where is should be a boolean.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This change removes the need to distinguish unlocked priv_*() functions
which are therefore renamed using a mlx5_*() prefix for consistency.
At the same time, all functions from mlx5 uses a pointer to the ETH device
instead of the one to the PMD private data.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In priv struct only the memory region needs to be protected against
concurrent access between the control plane and the data plane.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replaces all (void)foo; by __rte_unused macro except when variables are
under #if statements.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
priv_tx_uar_remap() is wrongly considering the queue is already configured
and thus present in the queue array of the device.
Fixes: f8b9a3bad4 ("net/mlx5: install a socket to exchange a file descriptor")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This lays the groundwork for externalizing rdma-core as an optional
run-time dependency instead of a mandatory one.
No functional change.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reserving the memory space for the UAR near huge pages helps to
**reduce** the cases where the secondary process cannot start. Those
pages being physical pages they must be mapped at the same virtual
address as in the primary process to have a
working secondary process.
As this remap is almost the latest being done by the processes
(libraries, heaps, stacks are already loaded), similar to huge pages,
there is **no guarantee** this mechanism will always work.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When no memory is available on the same numa node than the device, the
initialization of the device fails. However, the use case where the
cores and memory are on a different socket than the device is valid,
even if not optimal.
To fix this issue, this commit introduces an infrastructure to select
the socket on which to allocate the verbs objects based on the ethdev
configuration and the object type, rather than the PCI numa node.
Fixes: 1e3a39f72d ("net/mlx5: allocate verbs object into shared memory")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Move device configuration and features capabilities to its own structure.
This structure is filled by mlx5_pci_probe(), outside of this function
it should be treated as *read only*.
This configuration struct will be used for the Tx/Rx queue setup to
select the Tx/Rx queue parameters based on the user configuration and
device capabilities.
In addition it will be used by the burst selection function to decide
on the best pkt burst to be used.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
A non max_inline 0 means an inline is requested, there is no need to
duplicate this information.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Since the secondary process has its own devops, function which cannot be
called by the secondary don't need anymore to verify which process is
calling it.
Fixes: 87ec44ce16 ("net/mlx5: add operations for secondary process")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Use the same design for DPDK queue as for Verbs queue for symmetry, this
also helps in fixing some issues like the DPDK release queue API which
is not expected to fail. With such design, the queue is released when
the reference counters reaches 0.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Move verbs object to their own functions to allocate/release them
independently from the DPDK queue. At the same time a reference counter
is added to help in issues detections when the queue is being release
but still in use somewhere else (flows for instance).
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch introduce the Memory region as a shared object where users
should get a reference to it by calling the priv_mr_get() or
priv_mr_new() to create the memory region. This last one will
register the memory pool in the kernel driver and retrieve the
associated memory region.
This should help to reduce the memory consumption cause by registering
multiple times the same memory pool.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This flag is already present in the Ethernet device.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Use a unix socket to get back the communication channel with the Kernel
driver from the primary process, this is necessary to remap those pages
in the secondary process memory space and thus use the same Tx queues.
This is only supported from rdma-core (v15).
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
This removes the dependency on specific Mellanox OFED libraries by
using the upstream rdma-core and linux upstream community code.
Both rdma-core upstream and Mellanox OFED are Linux user-space packages:
1. Rdma-core is Linux upstream user-space package.(Generic)
2. Mellanox OFED is Mellanox's Linux user-space package.(Proprietary)
The difference between the two are the APIs towards the kernel.
Support for x86-32 is removed due to issues in rdma-core library.
ICC compilation will be supported as soon as the following patch is
integrated in rdma-core:
https://marc.info/?l=linux-rdma&m=150643474705690&w=2
Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Mellanox NICs has a limitation on the number of mbuf segments a multi
segment mbuf can have. The max number depends on the Tx offloads
requested.
The current code not enforce such limitation, which might cause
malformed work requests to be written to the device.
This commit adds verification for the number of mbuf segments posted
to the device. In case of overflow the packet will not be sent.
In addition update the nic documentation with the limitation.
Considering device limitation is 63 data segments in a work request, the
maximum number of segment in mbuf was calculated taking TSO as the worst
case:
max_nb_segs = 63 - (control_segment + ethernet segment +
TSO headers inline + inline segment +
extra inline to align to cacheline)
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Secondary process is a copy/paste of the mlx4 drivers, it was never
tested and it even segfault at the secondary process start in the
mlx5_pci_probe().
This makes more sense to wipe this non working feature to re-write a
working and functional version.
Fixes: a48deada65 ("mlx5: allow operation in secondary processes")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Those are useless since DPDK headers have been cleaned up.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
To make vectorized burst routines enabled, it is required to run on x86_64
architecture. If all the conditions are met, the vectorized burst functions
are enabled automatically. The decision is made individually on RX and TX.
There's no PMD option to make a selection.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The callbacks are global to a device but the selection is made every queue
configuration, which is redundant.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When searching LKEY, if search key is mempool pointer, the 2nd cacheline
has to be accessed and it even requires to check whether a buffer is
indirect per every search. Instead, using address for search key can reduce
cycles taken. And caching the last hit entry is beneficial as well.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
For Tx SW ring (txq->elts[]), indexes are kept and used in
txq->elts_head/tail. Because of this, one entry must always be left unused
and it also makes code complex. Changed to store counters instead of
indexes in order to make the code simpler and to reduce a few calculations.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
SW completion ring of Tx (txq->elts) stores individual mbufs even if a
multi-segmented packet is sent. rte_pktmbuf_free_seg() must be used when
cleaning up the completion ring. Otherwise, chained mbufs are redundantly
freed and finally it would cause a crash.
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
CC: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Completion buffer size was computed wrongly, causing
completion polling to wraparound too early and miss entries.
Fixing it by using Direct Verbs to query the CQ info.
Fixes: 6218063b39 ("net/mlx5: refactor Rx data path")
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When TSO is enabled, Verbs layer aggregates the TSO
inline size with the txq inline size for the Tx creation,
while the PMD takes the maximum among them.
Fixing it by adjusting the max inline parameter before
passing to to Verbs.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Remove unnecessary interface queries and the Resource Domain when
creating the Queue Pair. Since mlx5 PMD data path is on top of native
APIs, such Verbs library calls are no longer needed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
When configuring Rx/Tx queue, if queue already exists, it is reused. But if
the queue size is changed, it must be resized to not access/overwrite
invalid memory.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Prior to this commit Tx checksum offload was supported only for the
inner headers.
This commit adds support for the hardware to compute the checksum for the
outer headers as well.
The support is for tunneling protocols GRE and VXLAN.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>