The code does not return an error when the probe function fails to get the
device MAC address.
Because of this, the probe function may return success even though the
ETH device is not allocated.
Hence, the probe caller (for example the failsafe PMD) fails when it
tries to get the ETH device after the device was unplugged while mlx5
was probing it.
The fix reports an error to the probe caller when priv_get_mac() fails,
as well as in all other failure paths that were missing it.
This prevents the unexpected situation where the ETH device is missing
even though the device was probed successfully.
This bug was already present in the original code taken from mlx4.
Fixes: 771fa900b73a ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Fixes: 1371f4df16bc ("mlx5: check port is configured as ethernet device")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Allocate no more memory than necessary for the second call to
ETHTOOL_GLINKSETTINGS.
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
On Red Hat 7.2, clang reports the following error:
CC mlx5_rxmode.o
/drivers/net/mlx5/mlx5_ethdev.c:820:32: error: field 'edata' with
variable sized type 'struct ethtool_link_settings' not at the end
of a struct or class is a GNU extension
[-Werror,-Wgnu-variable-sized-type-not-at-end]
struct ethtool_link_settings edata;
Use an alternative approach to reserve the buffer space on the stack.
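A minimal sketch of the idea, with illustrative names and sizing (not the
exact mlx5 code): reserve raw, aligned stack space for the fixed-size header
plus the link-mode bitmaps whose word count was returned by the first
ioctl(), then alias it, instead of embedding the variable-sized struct in
another object.

#include <stdint.h>
#include <linux/ethtool.h>

static void
link_settings_space_sketch(const struct ethtool_link_settings *gcmd)
{
    /* gcmd->link_mode_masks_nwords holds the (positive) number of 32-bit
     * words per bitmap, as established by a first ETHTOOL_GLINKSETTINGS
     * ioctl() handshake (not shown). */
    uint64_t space[(sizeof(*gcmd) +
            sizeof(uint32_t) * gcmd->link_mode_masks_nwords * 3 +
            sizeof(uint64_t) - 1) / sizeof(uint64_t)];
    struct ethtool_link_settings *ecmd = (void *)space;

    *ecmd = *gcmd;
    /* Issue the second ETHTOOL_GLINKSETTINGS ioctl() with ecmd here;
     * the three bitmaps land in the extra space after the header. */
    (void)ecmd;
}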
Fixes: ef09a7fc7620 ("net/mlx5: fix inconsistent link status query")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
By default, Verbs maps the doorbell register as write combining.
Write combining is useful for drivers which use BlueFlame for the
doorbell write.
Since the mlx5 PMD uses only regular doorbells, and the write combining
mapping requires an extra memory barrier to flush the doorbell after
writing it, set the mapping to uncached by default.
This change is expected to reduce the maximum and average round-trip latency.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Alexander Solganik <solganik@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
A barrier is required between the txq writes and the doorbell record
write to avoid the case where the device reads the doorbell record's new
value before the txq writes are flushed to memory.
The current rte_wmb() is stronger than necessary for this purpose and can
be replaced by the more relaxed rte_io_wmb().
Replacing rte_wmb() is also expected to improve throughput.
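A minimal sketch of the ordering this change relies on; names and the
doorbell layout are illustrative, not the actual mlx5 code:

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_byteorder.h>
#include <rte_io.h>

static inline void
ring_doorbell_sketch(volatile uint32_t *db_rec, volatile void *db_reg,
             uint32_t wqe_ci, uint64_t ctrl)
{
    /* The WQEs of this burst have already been written to host memory. */

    /* Make those writes visible to the device before it can observe the
     * new doorbell record value; an I/O write barrier is sufficient, a
     * full rte_wmb() is not required. */
    rte_io_wmb();
    /* Publish the new producer index in the doorbell record. */
    *db_rec = rte_cpu_to_be_32(wqe_ci);
    /* Ring the doorbell register to notify the hardware. */
    rte_write64(rte_cpu_to_be_64(ctrl), db_reg);
}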
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Alexander Solganik <solganik@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Extend the debug log verbosity by printing the full error completion
along with the entire txq in case of error. For the Rx case no logs were
added, since such errors are counted and recovered by the Rx data path.
These prints are essential to understand the root cause of the error.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The code did not unlock the spinlock in the error flows of the xstats
get and reset functions.
Hence, when these errors occurred, the device spinlock was left locked
and much of the mlx5 device functionality was blocked.
The fix unlocks the spinlock in the places where it was missing.
Fixes: e62bc9e70608 ("net/mlx5: fix extended statistics")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This version of MLNX_OFED is no longer supported.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Since MLNX_OFED 4.1 this code is no longer useful.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The secondary process support is a copy/paste of the mlx4 driver's; it was
never tested and even segfaults at secondary process start in
mlx5_pci_probe().
It makes more sense to remove this non-working feature and rewrite a
working and functional version later.
Fixes: a48deada651b ("mlx5: allow operation in secondary processes")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Those are useless since DPDK headers have been cleaned up.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Those two if statements are useless, as the drop field of the flow is
already checked just above to jump to the end of the function.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The vector PMD returns buffers to the application without setting the
corresponding pointers in the Rx queue to NULL or reallocating them. When
the PMD cleans up the ring, it must take special care with those pointers
so that it neither frees mbufs before the application has used them nor
frees mbufs the application has already freed.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
To use the vector code, four extra mbufs must be added to the PMD Rx mbuf
ring to avoid memory corruption. These additional mbufs were added in
dev_start() whereas all other mbufs are allocated at queue setup.
This patch moves this allocation back to the same place as the other mbuf
allocations.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch prepares merging the fake mbuf allocation needed by the vector
code into rxq_alloc_elts(), where all mbufs of the queues should be
allocated.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The vector code is very young and may present some issues for users. To
avoid forcing them to modify the selection function by commenting out code
and recompiling the PMD, new device parameters are added to deactivate the
Tx and/or Rx vector code.
Using these device parameters, the user can fall back to the regular burst
functions.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
If there is an Rx completion with error (e.g., MTU mismatch), it is handled
later, out of the main burst loop, as a slow path for performance reasons.
Statistics should be corrected by subtracting the counters of errored
packets.
Also, the last entry of mlx5_ptype_table[] must be RTE_PTYPE_ALL_MASK to
mark an error in a completion.
Fixes: ea16068c0064 ("net/mlx5: fix L4 packet type support")
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The pinfo variable holds wrong data. It must hold the merged data of two
fields from the Rx completion - pkt_info and hdr_type_etc.
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The data_off field of newly allocated mbufs contains stale data. It should
not be used to calculate the Rx address for the device when posting free
buffers. RTE_PKTMBUF_HEADROOM should be used instead; the data_off of an
mbuf is reset on packet reception anyway.
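A minimal sketch of the corrected address computation (illustrative helper,
not the actual vectorized code):

#include <stdint.h>
#include <rte_mbuf.h>

/* Address posted to the Rx ring for a freshly allocated mbuf: derive it
 * from buf_addr plus the constant headroom rather than from data_off,
 * which may hold stale data until the next reception resets it. */
static inline uintptr_t
rx_replenish_addr_sketch(const struct rte_mbuf *buf)
{
    return (uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM;
}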
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Unlike mlx5_rx_burst(), mlx5_rx_burst_vec() does not replace completed
buffers one by one right after a completion is processed but replenishes
multiple buffers later with rte_mempool_get_bulk(). Therefore, some buffer
addresses may be left in the SW ring (rxq->elts[]) that have already been
delivered to the application. As the PMD does not own such buffers, they
must not be freed by the PMD. "Trimming" is needed before cleanup.
The problem can be seen when quitting testpmd with
CONFIG_RTE_LIBRTE_MBUF_DEBUG=y and CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y.
Trimming should be as simple as possible: it should not touch any indexes
and buffer allocation is not necessary.
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Changing the MTU is not related to changing the number of segments;
enabling or disabling multi-segment support should be handled by the
application.
Fixes: 9964b965ad69 ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Calculation of packet type is currently enabled only when HW checksum is
enabled.
This isn't related to HW checksum offload. Enable it regardless.
Fixes: 081f7eae242e ("mlx5: process offload flags only when requested")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When advancing the Tx ring index (txq->wqe_ci) in txq_scatter_v(), the title
descriptor of a multi-packet send is not taken into account if it does not
cross a 64B boundary.
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
mlx5_tx_complete() polls the completion queue multiple times until it
encounters an invalid entry. As Tx completions are suppressed by
MLX5_TX_COMP_THRESH, it is a waste of cycles to expect multiple
completions in one poll, and freeing too many buffers in one call can
cause high jitter.
This patch improves throughput a little.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
The ETHTOOL_GLINKSETTINGS ioctl call in the mlx5 PMD returns an
inconsistent link status, due to which any application relying on it
would not function correctly.
Fixes: 188408719888 ("net/mlx5: fix support for newer link speeds")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
On a host with a 128B cacheline size, some devices insert 64B of padding in
each completion entry to avoid a partial cacheline write by HW. But, as the
padding is ahead of the completion data, casting a completion entry to
compressed mini-completions must start from the middle of the completion.
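A hedged sketch of the resulting offset handling; sizes and names are
illustrative:

#include <stdint.h>
#include <stddef.h>

#define CQE_SIZE 128 /* completion entry size on a 128B-cacheline host */
#define CQE_PAD   64 /* zero padding inserted ahead of the data */

/* With the padding in front, the usable part of the entry - and therefore
 * the array of compressed mini-completions - starts in the middle of the
 * 128B entry instead of at its beginning. */
static inline const void *
cqe_payload_sketch(const uint8_t *cq_ring, uint32_t idx)
{
    return cq_ring + (size_t)idx * CQE_SIZE + CQE_PAD;
}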
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
To enable the vectorized burst routines, running on the x86_64 architecture
is required. If all the conditions are met, the vectorized burst functions
are enabled automatically. The decision is made individually for Rx and Tx.
There is no PMD option to make a selection.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The callbacks are global to a device, but the selection is made on every
queue configuration, which is redundant.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When searching for the LKEY, if the search key is the mempool pointer, the
second cacheline has to be accessed, and it even requires checking whether
a buffer is indirect on every search. Using the buffer address as the
search key instead reduces the cycles taken. Caching the last hit entry is
beneficial as well.
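A hedged sketch of the lookup strategy; structure and field names are
illustrative, not the actual mlx5 definitions:

#include <stdint.h>

struct mr_cache_sketch {
    uintptr_t start; /* first byte of the registered region */
    uintptr_t end;   /* one past the last byte */
    uint32_t lkey;   /* key placed in the WQE data segment */
};

/* Look up the LKEY by buffer address; the previously hit entry is tried
 * first since consecutive buffers usually come from the same region. */
static inline uint32_t
lookup_lkey_sketch(const struct mr_cache_sketch *tbl, unsigned int n,
           unsigned int *last_hit, uintptr_t addr)
{
    unsigned int i = *last_hit;

    if (i < n && addr >= tbl[i].start && addr < tbl[i].end)
        return tbl[i].lkey;
    for (i = 0; i < n; ++i) {
        if (addr >= tbl[i].start && addr < tbl[i].end) {
            *last_hit = i;
            return tbl[i].lkey;
        }
    }
    return UINT32_MAX; /* region not registered */
}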
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When processing a Tx completion, it is more efficient to free buffers in
bulk using rte_mempool_put_bulk() if the buffers come from the same mempool.
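A hedged sketch of the batching idea; names and the batch size are
illustrative:

#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
free_completed_sketch(struct rte_mbuf **elts, unsigned int n)
{
    void *pkts[64];
    unsigned int n_free = 0;
    struct rte_mempool *pool = NULL;
    unsigned int i;

    for (i = 0; i < n; ++i) {
        struct rte_mbuf *m = rte_pktmbuf_prefree_seg(elts[i]);

        if (m == NULL)
            continue;
        if (pool != NULL && (m->pool != pool || n_free == 64)) {
            /* Pool changed or batch full: flush the batch. */
            rte_mempool_put_bulk(pool, pkts, n_free);
            n_free = 0;
        }
        pool = m->pool;
        pkts[n_free++] = m;
    }
    if (n_free > 0)
        rte_mempool_put_bulk(pool, pkts, n_free);
}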
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
For the Tx SW ring (txq->elts[]), indexes are kept and used in
txq->elts_head/tail. Because of this, one entry must always be left unused,
and it also makes the code complex. Store counters instead of indexes to
make the code simpler and to save a few calculations.
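A sketch of the counter-based scheme with illustrative names:

#include <stdint.h>
#include <rte_mbuf.h>

struct txq_elts_sketch {
    uint16_t elts_head;     /* free-running enqueue counter */
    uint16_t elts_tail;     /* free-running completion counter */
    uint16_t elts_n;        /* ring size, power of two */
    struct rte_mbuf **elts; /* SW ring */
};

/* The fill level is a plain subtraction; unsigned wrap-around keeps it
 * correct across counter overflow and no ring entry has to stay unused. */
static inline uint16_t
txq_free_slots_sketch(const struct txq_elts_sketch *txq)
{
    return txq->elts_n - (uint16_t)(txq->elts_head - txq->elts_tail);
}

/* A counter is turned into a ring slot only when accessing the ring. */
static inline struct rte_mbuf **
txq_slot_sketch(struct txq_elts_sketch *txq, uint16_t counter)
{
    return &txq->elts[counter & (txq->elts_n - 1)];
}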
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit addresses a compilation issue against Glibc >= 2.25, which
implements assert() through a nonstandard ({ }) construct. Such constructs
can normally not be used without __extension__ keyword when -pedantic is
enabled, as is the case when compiling mlx4 and mlx5 PMDs in debug mode.
While assert.h checks for the compiler ability to support GNU extensions,
Clang, unlike GCC, does not allow the above syntax when combining
-std=gnu99 with -pedantic.
Work around missing keyword by moving these PMDs to a stricter compliance
standard without GNU extensions but properly checked by Glibc. Doing so is
supported on the DPDK side since includes have been cleaned up.
Even in C11, using types other than _Bool or signed/unsigned int for
bit-fields is an extension. Some GCC versions complain about that when
-pedantic checks are enabled.
The RTE_STD_C11 macro correctly prevented this issue with C99 but not with
C11 as it becomes a no-op. Forcing the extension keyword addresses it.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Tested-by: Yongseok Koh <yskoh@mellanox.com>
In case of a multi-segment packet, the TSO segment size was taken from the
last segment. This may lead to incorrect values if not all segments have
this field initialized.
Fix it by taking the value from the first segment.
Fixes: 3f13f8c23a7c ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit addresses various issues that may lead to undefined behavior
when configuring Rx interrupts.
While failure to create a Rx queue completion channel in rxq_ctrl_setup()
prevents that queue from being created, existing queues still have theirs.
Since the error handler disables dev_conf.intr_conf.rxq as well, subsequent
calls to rxq_ctrl_setup() create Rx queues without interrupts. This leads
to a scenario where not all Rx queues support interrupts; missing checks on
the presence of completion channels may crash the application.
Considering that the PMD is not supposed to disable user-provided
configuration parameters (dev_conf.intr_conf.rxq), and that these can
change for subsequent rxq_ctrl_setup() calls anyway, properly supporting a
mixed mode where not all Rx queues have interrupts enabled is a better
approach.
To do so with a minimum set of changes, priv_intr_efd_enable() and
priv_create_intr_vec() are first refactored as a single
priv_rx_intr_vec_enable() function (same for their "disable" counterparts).
Since they had to be used together, there was no point in keeping them
separate.
Remaining changes:
- Always clean up before reconfiguring interrupts to avoid memory leaks.
- Always clean up when closing the device.
- Use malloc()/free() instead of their rte_*() counterparts since there is
no need to store the vector in huge pages-backed memory.
- Allow more Rx queues than the size of the event file descriptor array as
long as Rx interrupts are not requested on all of them.
- Properly clean up interrupt handle when disabling Rx interrupts (nb_efd
and intr_vec reset to 0).
- Check completion channel presence while toggling Rx interrupts on a given
queue.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A negative return value is documented for that function in case of error.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Not exposing Rx interrupts callbacks when this feature is unsupported is
less intrusive than having two different versions for these functions.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
These functions do not belong to the data path. Their prototypes are also
misplaced.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Drop flows created while the port is stopped should not access the drop
table hash queues, as they are invalid.
Fixes: 028761059aeb ("net/mlx5: use an RSS drop queue")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The current drop action is implemented as a queue tail drop, requiring
multiple WQs to be instantiated to maintain a high drop rate.
This commit implements the drop action in the hardware classifier.
This reduces the number of contexts needed for the drop without affecting
the drop rate.
Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In some environments, the PCI domain can be larger than 16 bits.
For example, a PCI device passed through in Azure gets a synthetic domain
ID which is internally generated based on a GUID. The PCI standard does
not restrict the domain to 16 bits.
This change breaks the ABI for APIs that expose the PCI address structure.
The printf format for PCI remains unchanged, so that on most
systems (with only a 16-bit domain) the output format is unchanged
and is 4 characters wide. For example: 0000:00:01.0
Only on systems with higher domain bits will the domain take up more
space; example: 12000:00:01.0
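For illustration, the widened address layout and the formatting behavior
look roughly as follows (see rte_pci.h for the authoritative definition):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

struct pci_addr_sketch {
    uint32_t domain; /* widened from 16 to 32 bits */
    uint8_t bus;
    uint8_t devid;
    uint8_t function;
};

static void
pci_addr_print_sketch(const struct pci_addr_sketch *a)
{
    /* "%04x"-style padding keeps 16-bit domains 4 characters wide
     * (0000:00:01.0) while larger domains simply use more digits
     * (12000:00:01.0). */
    printf("%04" PRIx32 ":%02x:%02x.%x\n",
           a->domain, a->bus, a->devid, a->function);
}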
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Change the rte_eth_dev_callback_process function to return int,
and add a void *ret_param parameter.
The new parameter is used by ixgbe and i40e instead of abusing
the user data of the callback.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
build error:
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:
In function ‘fdir_filter_to_flow_desc’:
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:146:18:
error: this statement may fall through [-Werror=implicit-fallthrough=]
desc->dst_port = fdir_filter->input.flow.udp4_flow.dst_port;
~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../dpdk/drivers/net/mlx5/mlx5_fdir.c:147:2: note: here
case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
^~~~
Fixed by adding a fallthrough comment to the code.
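A minimal illustration of the pattern (not the mlx5 code itself):

#include <stdint.h>

static uint32_t
classify_sketch(int flow_type, uint32_t udp_bits, uint32_t ip_bits)
{
    uint32_t desc = 0;

    switch (flow_type) {
    case 1: /* e.g. non-fragmented IPv4/UDP */
        desc |= udp_bits;
        /* The UDP case intentionally continues into the generic IPv4
         * handling below; the comment marker silences the warning. */
        /* fallthrough */
    case 0: /* e.g. other non-fragmented IPv4 */
        desc |= ip_bits;
        break;
    }
    return desc;
}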
Fixes: 76f5c99e6840 ("mlx5: support flow director")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The Tx SW completion ring (txq->elts) stores individual mbufs even when a
multi-segmented packet is sent. rte_pktmbuf_free_seg() must be used when
cleaning up the completion ring. Otherwise, chained mbufs are redundantly
freed and this eventually causes a crash.
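A hedged sketch of the cleanup rule (illustrative helper, not the actual
txq_free_elts()):

#include <rte_mbuf.h>

/* Each completion-ring entry holds an individual segment, so it is freed
 * with rte_pktmbuf_free_seg(); rte_pktmbuf_free() would also walk the
 * chained segments, which are owned by other ring entries. */
static void
txq_free_elts_sketch(struct rte_mbuf **elts, unsigned int n)
{
    unsigned int i;

    for (i = 0; i < n; ++i) {
        if (elts[i] != NULL) {
            rte_pktmbuf_free_seg(elts[i]);
            elts[i] = NULL;
        }
    }
}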
Fixes: 1d88ba171942 ("net/mlx5: refactor Tx data path")
CC: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
A sanity check is required in priv_fdir_disable(). If resizing an Rx queue
fails, this can cause a crash by dereferencing a NULL pointer.
Fixes: 76f5c99e6840 ("mlx5: support flow director")
Fixes: 0cdddf4d0626 ("net/mlx5: split Rx queue structure")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool"), some
fields are already initialised and do not need to be modified by the PMD
anymore.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>