In case on multi segment packet, the TSO segment size
was taken from the last segment. This may lead to incorrect
values in case not all segments are initialized with the field.
Fixing it by taking the value from the first segment.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit addresses various issues that may lead to undefined behavior
when configuring Rx interrupts.
While failure to create a Rx queue completion channel in rxq_ctrl_setup()
prevents that queue from being created, existing queues still have theirs.
Since the error handler disables dev_conf.intr_conf.rxq as well, subsequent
calls to rxq_ctrl_setup() create Rx queues without interrupts. This leads
to a scenario where not all Rx queues support interrupts; missing checks on
the presence of completion channels may crash the application.
Considering that the PMD is not supposed to disable user-provided
configuration parameters (dev_conf.intr_conf.rxq), and that these can
change for subsequent rxq_ctrl_setup() calls anyway, properly supporting a
mixed mode where not all Rx queues have interrupts enabled is a better
approach.
To do so with a minimum set of changes, priv_intr_efd_enable() and
priv_create_intr_vec() are first refactored as a single
priv_rx_intr_vec_enable() function (same for their "disable" counterparts).
Since they had to be used together, there was no point in keeping them
separate.
Remaining changes:
- Always clean up before reconfiguring interrupts to avoid memory leaks.
- Always clean up when closing the device.
- Use malloc()/free() instead of their rte_*() counterparts since there is
no need to store the vector in huge pages-backed memory.
- Allow more Rx queues than the size of the event file descriptor array as
long as Rx interrupts are not requested on all of them.
- Properly clean up interrupt handle when disabling Rx interrupts (nb_efd
and intr_vec reset to 0).
- Check completion channel presence while toggling Rx interrupts on a given
queue.
Fixes: 3c7d44af25 ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A negative return value is documented for that function in case of error.
Fixes: 3c7d44af25 ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Not exposing Rx interrupts callbacks when this feature is unsupported is
less intrusive than having two different versions for these functions.
Fixes: 3c7d44af25 ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
These functions do not belong to the data path. Their prototypes are also
misplaced.
Fixes: 3c7d44af25 ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Drop flows being created when the port is stop should not access to the
drop table hash queues as it is invalid.
Fixes: 028761059a ("net/mlx5: use an RSS drop queue")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The current drop action is implemented as a queue tail drop,
requiring to instantiate multiple WQs to maintain high drop rate.
This commit, implements the drop action in hardware classifier.
This enables to reduce the amount of contexts needed for the drop,
without affecting the drop rate.
Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In some environments, the PCI domain can be larger than 16 bits.
For example, a PCI device passed through in Azure gets a synthetic domain
id which is internally generated based on GUID. The PCI standard does
not restrict domain to be 16 bits.
This change breaks ABI for API's that expose PCI address structure.
The printf format for PCI remains unchanged, so that on most
systems (with only 16 bit domain) the output format is unchanged
and is 4 characters wide. For example: 0000:00:01.0
Only on sysetms with higher bits will the domain take up more
space; example: 12000:00:01.0
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Change the rte_eth_dev_callback_process function to return int,
and add a void *ret_param parameter.
The new parameter is used by ixgbe and i40e instead of abusing
the user data of the callback.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
build error:
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:
In function ‘fdir_filter_to_flow_desc’:
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:146:18:
error: this statement may fall through [-Werror=implicit-fallthrough=]
desc->dst_port = fdir_filter->input.flow.udp4_flow.dst_port;
~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:147:2: note: here
case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
^~~~
Fixed by adding fallthrough comment to the code.
Fixes: 76f5c99e68 ("mlx5: support flow director")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
SW completion ring of Tx (txq->elts) stores individual mbufs even if a
multi-segmented packet is sent. rte_pktmbuf_free_seg() must be used when
cleaning up the completion ring. Otherwise, chained mbufs are redundantly
freed and finally it would cause a crash.
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
CC: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
A sanity check is required in priv_fdir_disable(). If resizing Rx queue
fails, this can cause a crash by referencing a NULL pointer.
Fixes: 76f5c99e68 ("mlx5: support flow director")
Fixes: 0cdddf4d06 ("net/mlx5: split Rx queue structure")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since commit 8f094a9ac5 ("mbuf: set mbuf fields while in pool"), some
fields are already initialised and do not need to be modified by the PMD
anymore.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Flow rules must be applied in the same order as they have been created and
thus destroyed in the reverse order.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Completion buffer size was computed wrongly, causing
completion polling to wraparound too early and miss entries.
Fixing it by using Direct Verbs to query the CQ info.
Fixes: 6218063b39 ("net/mlx5: refactor Rx data path")
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Instead of many PMD define their own macro, define a generic one in
ethdev and use that in PMDs.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Different drivers use internal macros like force_inline for compiler
always inline feature.
Standardizing it through __rte_always_inline macro.
Verified the change by comparing the output binary file.
No difference found in the output binary file with this change.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
In mlx5_dev_start(), the spinlock isn't released when returning error.
Fixes: c8d4ee50cc ("net/mlx5: fix startup when flow cannot be applied")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In the main loop of mlx5_tx_burst(), pointers/indexes are advanced at the
beginning. Therefore, those should be rolled back if checking resource
availability fails and breaks the loop. And some of them are even
redundant.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In case of resource deficiency on Tx, mlx5_tx_burst() breaks the loop
without rolling back consumed resources (txq->wqes[] and txq->elts[]). This
can make application crash because unposted mbufs can be freed while
processing completions. Other Tx functions don't have this issue.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Fixes: f04f1d5156 ("net/mlx5: fix Tx WQE corruption caused by starvation")
Cc: stable@dpdk.org
Reported-by: Hanoch Haim <hhaim@cisco.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
If mlx5_dev_start() fails, it tries to rollback data structures related to
rte_flow including drop queue. The destruction code doesn't assume the
structures are created but priv_flow_delete_drop_queue() never does sanity
check. This can cause a crash.
Fixes: 028761059a ("net/mlx5: use an RSS drop queue")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When TSO is enabled, Verbs layer aggregates the TSO
inline size with the txq inline size for the Tx creation,
while the PMD takes the maximum among them.
Fixing it by adjusting the max inline parameter before
passing to to Verbs.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Some customers find adding MAC addr to VF sometimes can fail,
but it is still stored in dev->data->mac_addrs[ ]. So this
can lead to some errors that assumes the non-zero entry in
dev->data->mac_addrs[ ] is valid.
Following acknowledgements are from specific NIC PMD
maintainer for their managing part.
This patch changes the ethdev internal API, it should not be
backported to a stable/LTS release so far.
Fixes: af75078fec ("first public release")
Signed-off-by: Wei Dai <wei.dai@intel.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
The PCI code will move to the bus drivers directory.
Rename functions from rte_eal_pci_ to rte_pci_
to prepare the move of the driver out of EAL.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Only masked bits must be set in Verbs specification for a rule to be valid.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
With the Enhanced multi packet send addition, the defaults were made
in order to get the maximum out of the box performance.
Features like tso, don't use the enhanced send, however the defaults
are still valid. This cause Tx queue creation to fail.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Fixes: 6ce84bd889 ("net/mlx5: add enhanced multi-packet send for ConnectX-5")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Currently the argument process is done without indication which
parameter was forced by the application and which one is on it
default value.
This becomes problematic when different features requires different
defaults. For example, Enhanced multi packet send and TSO.
This commit modifies the argument process, enabling to differ
which parameter was forced by the application.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Current implementation is error-prone if the max inline size
(txq->max_inilne) is decoupled from txq->inline_en and becomes zero. If it
becomes zero, HW can crash due to WQ overflow.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Since the queue release API does not allow failures (return value is void)
and the flow API does not allow a queue to be released as long as a flow
rule depends on it, the only rational decision to avoid undefined behavior
is to panic in this situation.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Flow queues array offset is not properly computed causing all sorts of
issues. Address it by making the rte_flow structure less complex.
Fixes: 360663e1df ("net/mlx5: prepare support for RSS action rule")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Removing this check improves performance as VLAN and CRC stripping are
enabled most of the time.
Convert MLX5_CQE_VLAN_STRIPPED to network order to speed up the check
instead of doing it on the completion queue entry field.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The patch change the prototype of callback function
(rte_intr_callback_fn) by removing the unnecessary parameter.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Since patch "mbuf: structure reorganization" the compiler complains
sometimes (in some conditions):
.../drivers/net/mlx5/mlx5_rxtx.c: In function ‘mlx5_rx_burst’:
.../drivers/net/mlx5/mlx5_rxtx.c:2082:17: error: ‘len’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
len is not initialised as it will be at the first segment of a received
packet, but it remains hard for the compiler to determine it.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to avoid drivers to set m->next to NULL in the early Rx path, since
this field is in the second 64B of the mbuf and its access could
trigger a cache miss
- rationalize the behavior of raw_alloc/raw_free: one is now the
symmetric of the other, and refcnt is never changed in these functions.
To optimize the freeing of the segments, we try try to only update
m->refcnt, m->next, and m->nb_segs when it's required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Flow validation already processes the final actions to verify if a rule can
be applied or not and the same is done during creation. As the creation
function relies on validation to generate and apply a rule, this job can be
fully handled by the validation function.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Remove unnecessary interface queries and the Resource Domain when
creating the Queue Pair. Since mlx5 PMD data path is on top of native
APIs, such Verbs library calls are no longer needed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Currently mlx5_dev_rss_reta_update() just updates tables in the host,
therefore it isn't immediately effective until restarting the device by
calling mlx5_dev_stop()/mlx5_dev_start() to update the changes in the
device side. This patch adds rebuilding the device-specific data
structure and applying it to the device right away.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
When querying and updating RSS RETA table, it always uses the max size of
the device instead of configured value. This patch fixes it and removed the
related comments in the code.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Mark ID in the completion queue entry is 24 bits, the remaining 8 bits are
reserved and may be nonzero. Do not take them into account when looking for
marked packets.
Fixes: ea3bc3b1df ("net/mlx5: support mark flow action")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
First segment size must be at least 18 bytes, packets not respecting this
are silently not sent by the NIC but counted as sent by the PMD. The only
way to figure out is compiling the PMD in debug mode.
Fixes: 6579c27c11 ("net/mlx5: remove gather loop on segments")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Toggle Rx scatter mode based on the scatter_enable flag and the maximum
packet size only instead of deriving this information from the jumbo_frame
setting and the MTU configuration.
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When configuring Rx/Tx queue, if queue already exists, it is reused. But if
the queue size is changed, it must be resized to not access/overwrite
invalid memory.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>