Updating a consumer index to HW doesn't require a memory barrier in case
that there's no updated data to be posted to HW, but a compiler barrier
is sufficient. rte_wmb() is replaced with rte_io_wmb() when it makes
changes visible to HW, not other core.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This removes the dependency on specific Mellanox OFED libraries by
using the upstream rdma-core and linux upstream community code.
Both rdma-core upstream and Mellanox OFED are Linux user-space packages:
1. Rdma-core is Linux upstream user-space package.(Generic)
2. Mellanox OFED is Mellanox's Linux user-space package.(Proprietary)
The difference between the two are the APIs towards the kernel.
Support for x86-32 is removed due to issues in rdma-core library.
ICC compilation will be supported as soon as the following patch is
integrated in rdma-core:
https://marc.info/?l=linux-rdma&m=150643474705690&w=2
Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Mellanox NICs has a limitation on the number of mbuf segments a multi
segment mbuf can have. The max number depends on the Tx offloads
requested.
The current code not enforce such limitation, which might cause
malformed work requests to be written to the device.
This commit adds verification for the number of mbuf segments posted
to the device. In case of overflow the packet will not be sent.
In addition update the nic documentation with the limitation.
Considering device limitation is 63 data segments in a work request, the
maximum number of segment in mbuf was calculated taking TSO as the worst
case:
max_nb_segs = 63 - (control_segment + ethernet segment +
TSO headers inline + inline segment +
extra inline to align to cacheline)
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Tx descriptor for TSO embeds packet header to be replicated. If Tx
inline is enabled, there could be additional packet data inlined with
4B inline header ahead. And between the header and additional inlined
packet data, there may be padding to make the inline part aligned to
MLX5_WQE_DWORD_SIZE. In calculating the total size of inlined data,
the size of inline header and padding is missing.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Those are useless since DPDK headers have been cleaned up.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
To use the vector, it needs to add to the PMD Rx mbuf ring four extra mbuf
to avoid memory corruption. This additional mbuf are added on dev_start()
whereas all other mbuf are allocated on queue setup.
This patch brings this allocation back to the same place as other mbuf
allocation.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
If there's a Rx completion with error (e.g, MTU mismatch), it is handled
later out of main burst loop as a slow path for performance reason.
Statistics should be corrected by subtracting counters of errored packets.
Also, the last entry of mlx5_ptype_table[] must be RTE_PTYPE_ALL_MASK to
mark error in completion.
Fixes: ea16068c0064 ("net/mlx5: fix L4 packet type support")
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Calculation of packet type is currently enabled only when HW checksum is
enabled.
This isn't related to HW checksum offload. Enable it regardless.
Fixes: 081f7eae242e ("mlx5: process offload flags only when requested")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
On a host having 128B cacheline size, some devices insert 64B padding in
each completion entry to avoid partial cacheline write by HW. But, as the
padding is ahead of completion data, casting a completion entry to
compressed mini-completions must start from the middle of the completion.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
To make vectorized burst routines enabled, it is required to run on x86_64
architecture. If all the conditions are met, the vectorized burst functions
are enabled automatically. The decision is made individually on RX and TX.
There's no PMD option to make a selection.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When searching LKEY, if search key is mempool pointer, the 2nd cacheline
has to be accessed and it even requires to check whether a buffer is
indirect per every search. Instead, using address for search key can reduce
cycles taken. And caching the last hit entry is beneficial as well.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When processing Tx completion, it is more efficient to free buffers in bulk
using rte_mempool_put_bulk() if buffers are from a same mempool.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
For Tx SW ring (txq->elts[]), indexes are kept and used in
txq->elts_head/tail. Because of this, one entry must always be left unused
and it also makes code complex. Changed to store counters instead of
indexes in order to make the code simpler and to reduce a few calculations.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In case on multi segment packet, the TSO segment size
was taken from the last segment. This may lead to incorrect
values in case not all segments are initialized with the field.
Fixing it by taking the value from the first segment.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
These functions do not belong to the data path. Their prototypes are also
misplaced.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool"), some
fields are already initialised and do not need to be modified by the PMD
anymore.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Different drivers use internal macros like force_inline for compiler
always inline feature.
Standardizing it through __rte_always_inline macro.
Verified the change by comparing the output binary file.
No difference found in the output binary file with this change.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
In the main loop of mlx5_tx_burst(), pointers/indexes are advanced at the
beginning. Therefore, those should be rolled back if checking resource
availability fails and breaks the loop. And some of them are even
redundant.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In case of resource deficiency on Tx, mlx5_tx_burst() breaks the loop
without rolling back consumed resources (txq->wqes[] and txq->elts[]). This
can make application crash because unposted mbufs can be freed while
processing completions. Other Tx functions don't have this issue.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Fixes: f04f1d51564b ("net/mlx5: fix Tx WQE corruption caused by starvation")
Cc: stable@dpdk.org
Reported-by: Hanoch Haim <hhaim@cisco.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Current implementation is error-prone if the max inline size
(txq->max_inilne) is decoupled from txq->inline_en and becomes zero. If it
becomes zero, HW can crash due to WQ overflow.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Removing this check improves performance as VLAN and CRC stripping are
enabled most of the time.
Convert MLX5_CQE_VLAN_STRIPPED to network order to speed up the check
instead of doing it on the completion queue entry field.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since patch "mbuf: structure reorganization" the compiler complains
sometimes (in some conditions):
.../drivers/net/mlx5/mlx5_rxtx.c: In function ‘mlx5_rx_burst’:
.../drivers/net/mlx5/mlx5_rxtx.c:2082:17: error: ‘len’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
len is not initialised as it will be at the first segment of a received
packet, but it remains hard for the compiler to determine it.
Fixes: 9964b965ad69 ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to avoid drivers to set m->next to NULL in the early Rx path, since
this field is in the second 64B of the mbuf and its access could
trigger a cache miss
- rationalize the behavior of raw_alloc/raw_free: one is now the
symmetric of the other, and refcnt is never changed in these functions.
To optimize the freeing of the segments, we try try to only update
m->refcnt, m->next, and m->nb_segs when it's required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Mark ID in the completion queue entry is 24 bits, the remaining 8 bits are
reserved and may be nonzero. Do not take them into account when looking for
marked packets.
Fixes: ea3bc3b1df94 ("net/mlx5: support mark flow action")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
First segment size must be at least 18 bytes, packets not respecting this
are silently not sent by the NIC but counted as sent by the PMD. The only
way to figure out is compiling the PMD in debug mode.
Fixes: 6579c27c11a5 ("net/mlx5: remove gather loop on segments")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Since PKT_TX_TCP_SEG implies PKT_TX_TCP_CKSUM, the PMD must force this
flag.
The fix applied for both tunneled and non-tunneled packets.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Fixes: b247f346019b ("net/mlx5: support hardware TSO for VXLAN and GRE")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit adds support for hardware TSO for tunneled packets.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Prior to this commit Tx checksum offload was supported only for the
inner headers.
This commit adds support for the hardware to compute the checksum for the
outer headers as well.
The support is for tunneling protocols GRE and VXLAN.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Mark value is always reported even when not requested or invalid.
Fixes: ea3bc3b1df94 ("net/mlx5: support mark flow action")
Cc: stable@dpdk.org
Reported-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The indication on vlan stripping was taken from the wrong location in the
completion entry.
Fixes: 9964b965ad69 ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Since there is no "descriptor done" flag like on Intel drivers, the
approach is different on mlx5 driver.
- for Tx, we call txq_complete() to free descriptors processed by
the hw, then we check if the descriptor is between tail and head
- for Rx, we need to browse the cqes, managing compressed ones,
to get the number of used descriptors.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The total length field in descriptor of inlined multi-packet send must be
updated before closing a session. There's possibility of updating it
afterward. This bug might cause one packet out of MLX5_MPW_DSEG_MAX gets
silently dropped by HW and impact performance, especially lossless test.
Fixes: 230189d9ff22 ("net/mlx5: support multi-packet send")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
For some sizes of packets, the number of bytes copied in the work queue
element could be greater than the available size of the inline. In such
situation it could consume one more work queue element where it should
not.
Fixes: 0e8679fcddc4 ("net/mlx5: fix inline logic")
Cc: stable@dpdk.org
Reported-by: Elad Persiko <eladpe@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Fixes an issue which may occurs with the inline feature activated and a
packet greater than the max_inline requested.
In such situation, more work request elements can be consumed and in the
worst case override some still handled by the NIC, this can result in
sending garbage on the network or putting the work queue in error.
Fixes: 2a66cf378954 ("net/mlx5: support inline send")
Cc: stable@dpdk.org
Reported-by: Elad Persiko <eladpe@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
First two bytes of the Ethernet header was written twice at the same place.
Fixes: b8fe952ec5b6 ("net/mlx5: prepare Tx vectorization")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When the WQ is wrapped around, it wrongly checks the condition when
resetting the pointer. It should be compared against the end of the queue,
not the beginning of the queue. And this isn't even needed when the length
of the copying data crosses the boundary.
Fixes: fdcb0f53053b ("net/mlx5: use work queue buffer as a raw buffer")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
On receiving a compressed session of Rx completion, prefetch every entries
to be invalidated. Also, invalidate consumed completions per every 8
mini-completions, not to wait until the last entry is consumed. This helps
to reduce jitter in rx_burst.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>