Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Inlining a packet to WQE that cross the WQ wraparound, i.e. the WQE
starts on the end of the ring and ends on the beginning, is not
supported and blocked by the data path logic.
However, in case of TSO, an extra inline header is required before
inlining. This inline header is not taken into account when checking if
there is enough room left for the required inline size.
On some corner cases were
(ring_tailroom - inline header) < inline size < ring_tailroom ,
this can lead to WQE being written outsize of the ring buffer.
Fixing it by always assuming the worse case that inline of packet will
require the inline header.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch adds support for the rx_queue_count API in mlx5 driver
Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
As described in series starting at [1], it adds option to set
metadata value as match pattern when creating a new flow rule.
This patch adds metadata support in mlx5 driver, in two parts:
- Add the validation and setting of metadata value in matcher,
when creating a new flow rule.
- Add the passing of metadata value from mbuf to wqe when
indicated by ol_flag, in different burst functions.
[1] "ethdev: support metadata as flow rule criteria"
http://mails.dpdk.org/archives/dev/2018-September/113269.html
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
eal: add shorthand __rte_weak macro
qat: update code to use __rte_weak macro
avf: update code to use __rte_weak macro
fm10k: update code to use __rte_weak macro
i40e: update code to use __rte_weak macro
ixgbe: update code to use __rte_weak macro
mlx5: update code to use __rte_weak macro
virtio: update code to use __rte_weak macro
acl: update code to use __rte_weak macro
bpf: update code to use __rte_weak macro
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Rxq cq_ci was 16 bits while hardware is expecting to wrap
around 24 bits, this caused interrupt failure after burst of packets.
Fixes: 43e9d9794cde ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
There should be at least one Tx CQE remained if Tx WQ and txq->elts[] have
available slots to send a packet because the size of Tx CQ is exactly
calculated from the size of other resources. As it is guaranteed, it is
checked by an assertion.
max_elts is checked after the assertion for Tx CQ. If no slot is available
in txq->elts[], the assertion would be wrong.
Fixes: 2eefbec531c7 ("net/mlx5: add missing sanity checks for Tx completion queue")
Fixes: 6ce84bd88919 ("net/mlx5: add enhanced multi-packet send for ConnectX-5")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.
The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
of an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Multi-Packet Receive Queue is to receive multiple packets on a single large
buffer. The number of consumed strides in CQE is accumulated to keep track
of the current stride index. However, it is safer to directly use stride
index in CQE to avoid out-of-order situation which can possibly be caused
by introducing LRO in the future.
If Rx CQE compression is enabled, HW can be configured to store the stride
index in a mini-CQE but this will need newer version of library/driver.
Therefore, since this change, MPRQ is only supported with the newer
library/driver and Rx hash result is not supported if MPRQ is enabled along
with Rx CQE compression.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
mlx5_rx_poll_len() returns Rx hash result extracted from either mini CQE or
regular CQE. As mini CQE may not have the hash result if configured
otherwise, it shouldn't assume the first DWORD of mini CQE is always hash
result. mlx5_rx_poll_len() is changed to return pointer to the mini CQE if
compressed.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When a multi-segmented packet is inlined, data can be further inlined even
after the first segment. In case of TSO packet, extra inline data after TSO
header should be carried by an inline DSEG which has 4B inline header
recording the length of the inline data. If more than one segment is
inlined, the length doesn't count from the second segment. This will cause
a fault in HW and CQE will have an error, which is ignored by PMD.
Fixes: f895536be4fa ("net/mlx5: enable inlining data from multiple segments")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per a packet, one large buffer is posted in order to
receive multiple packets on the buffer. A MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.
Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Filling in fields of mbuf becomes a separate inline function so that this
can be reused.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.
There are multiple layers for MR search.
L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().
If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.
If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.
If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.
In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.
To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.
'--legacy-mem' is also supported.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Once tunnel packet type(RTE_PTYPE_TUNNEL_xxx) identified,
PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent checksum result of
inner headers, outer L3 and L4 header checksum are always valid as soon
as tunnel identified. If no tunnel identified, PKT_RX_IP_CKSUM_XXX and
PKT_RX_L4_CKSUM_XXX represent checksum result of outer L3 and L4
headers.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch introduced tunnel type identification based on flow rules.
If flows of multiple tunnel types built on same queue, no tunnel type
will be returned. User application could use bits in flow mark as tunnel
type identifier.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit adds support for generic tunnel TSO and checksum offload.
PMD will compute the inner/outer headers offset according to the
mbuf fields. Hardware will do calculation based on offsets and types.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Separate TSO function to make logic of mlx5_tx_burst clear.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
In Enhanced Multi-Packet Send (eMPW), entire packet data is prefetched to
LLC if it isn't inlined. Even though this helps reducing jitter when HW
fetches data by DMA, this can thresh the LLC with evicting precious data.
And if the size of queue is large and there are many queues, this might not
be effective. Also, if application runs on a remote node from the PCIe
link, it may not be helpful and can even cause bad results.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
According to CQE format:
- l4_hdr_type:
0 - None
1 - TCP header was present in the packet
2 - UDP header was present in the packet
3 - TCP header was present in the packet with Empty
TCP ACK indication. (TCP packet <ACK> flag is set,
and packet carries no data)
4 - TCP header was present in the packet with TCP ACK indication.
(TCP packet <ACK> flag is set, and packet carries data).
A packet should be identified as TCP packet if l4_hdr_type is 1, 3 or 4.
Add corresponding idx of TCP ACK to ptype table.
previous discussion:
https://www.mail-archive.com/users@dpdk.org/msg02980.html
Signed-off-by: Bin Huang <bin.huang@hxt-semitech.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Aligning Mellanox SPDX copyrights to a single format.
In addition replace to SPDX licence files which were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This change removes the need to distinguish unlocked priv_*() functions
which are therefore renamed using a mlx5_*() prefix for consistency.
At the same time, all functions from mlx5 uses a pointer to the ETH device
instead of the one to the PMD private data.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replaces all (void)foo; by __rte_unused macro except when variables are
under #if statements.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Polling a new packet is basically sensing the generation bit in a
completion entry. For some processors not having strongly-ordered memory
model, there has to be a memory barrier between reading the generation bit
and other fields of the entry in order to guarantee data is not stale.
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Remove multi-segment packet handling from mlx5_tx_burst_empw() as there's
fallback to regular Tx for such packets.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
mlx5_tx_burst_empw() falls back to legacy Tx descriptor for multi-segmented
packets without taking advantage of inlining. In many cases, the 1st
segment can be inlined and this could make device fetch only one segment
instead of two. This helps saving PCIe bandwidth when transmitting out
multi-segmented packets with still using the Enhanced Multi-Packet Send for
other packets.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
mlx5_tx_burst() doesn't inline data from the 2nd segment. If there's still
enough room in the descriptor after inlining the 1st segment, further
inlining from the 2nd segment would be beneficial to save PCIe bandwidth.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Tx checksum offloads are correctly handled in a single Tx burst function
whereas the capability is always set.
This causes VXLAN packet with checksum offloads request to be ignored when
the (E)MPS Tx functions are selected.
Fixes: f5fde5205101 ("net/mlx5: add hardware checksum offload for tunnel packets")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
A non max_inline 0 means an inline is requested, there is no need to
duplicate this information.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Most of the variable in mlx5_tx_burst() are defined too soon.
This commit moves them their uses C block of code.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
naddr variable was introduced in
commit 9a7fa9f76d9e ("net/mlx5: use vector types to speed up processing")
to avoid compilation errors on 32bits compilation, as x86_32 is no more
supported by rdma-core nor by MLNX_OFED, this variable becomes useless and
can be safely removed.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
total_length is only visible when SOFT_COUNTERS are enabled
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
If tunneled bit is set in the HW descriptor, the l4_hdr_type bits
describe the inner packet.
Fixes: ea16068c0064 ("net/mlx5: fix L4 packet type support")
Cc: stable@dpdk.org
Reported-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated for a while.
As explained in [1], these flags were kept to let the applications and
PMDs move to the new flag. There is also a need to support Rx vlan
offload without vlan strip (at least for the ixgbe driver).
This patch renames the old flags for this feature, knowing that some
PMDs were using PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT to indicate that
the vlan tci has been saved in the mbuf structure.
It is likely that some PMDs do not set the proper flags when doing vlan
offload, and it would be worth making a pass on all of them.
Link: [1] http://dpdk.org/ml/archives/dev/2017-June/067712.html
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Considering the PMD supports only Ethernet transport, packet which
arrives without any packet type flags in the completion should be
marked with L2_ETHER flag.
Fixes: ea16068c0064 ("net/mlx5: fix L4 packet type support")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Updating a consumer index to HW doesn't require a memory barrier in case
that there's no updated data to be posted to HW, but a compiler barrier
is sufficient. rte_wmb() is replaced with rte_io_wmb() when it makes
changes visible to HW, not other core.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This removes the dependency on specific Mellanox OFED libraries by
using the upstream rdma-core and linux upstream community code.
Both rdma-core upstream and Mellanox OFED are Linux user-space packages:
1. Rdma-core is Linux upstream user-space package.(Generic)
2. Mellanox OFED is Mellanox's Linux user-space package.(Proprietary)
The difference between the two are the APIs towards the kernel.
Support for x86-32 is removed due to issues in rdma-core library.
ICC compilation will be supported as soon as the following patch is
integrated in rdma-core:
https://marc.info/?l=linux-rdma&m=150643474705690&w=2
Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>