The hardware descriptor (WQE) length field is 6 bits wide,
which imposes a native limitation on the overall descriptor
length. To improve PCIe bandwidth, packet data can be
inlined into the descriptor. If the PMD was configured to inline
a large amount of data, it could happen that there was not enough
space remaining in the descriptor to specify all the packet data
segments, and the PMD rejected such problematic packets.
This patch adjusts the inline data length conservatively
to avoid the error.
Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Fixes: e2259f93ef ("net/mlx5: fix Tx when inlining is impossible")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
The mlx5 PMD can inline packet data into the transmit descriptor (WQE)
and free the mbuf immediately, as the data is no longer needed. For
non-inline packets, the mbuf pointer must be stored in the elts array
to be freed later on send completion. There was an optimization for
storing pointers in batches, but storing the mbuf was missed for a
single packet when non-inline mode was explicitly requested by a flag.
Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5 PMD supports the send scheduling feature, which allows
sending packets at a specified moment of time. To do that, the
PMD pushes a special wait descriptor (WQE) to the hardware
queue and then pushes the descriptor for the packet data as usual.
If the queue is close to full, or there are not enough elts
buffers to store the mbufs being sent, the data descriptors might
not be pushed, and an orphan wait WQE (not followed by the
data) might reside in the queue on tx_burst routine exit.
To avoid orphan wait WQEs, there was a check for enough
free space in the queue WQE buffer and a sufficient number of
free elts in the queue mbuf storage. This check was incomplete
and did not cover all the cases for Enhanced Multi-Packet
Write descriptors.
Fixes: 2f827f5ea6 ("net/mlx5: support scheduling on send routine template")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
ESP is one of the IPsec protocols over both IPv4 and IPv6 and is
considered a tunnel layer that cannot be followed by any other layer.
Taking that into consideration, ESP is treated as a layer 4 protocol.
Not defining ESP's priority makes it match with the same priority as
its preceding IP layer, which has layer 3 priority. This leads to
issues in matching: a packet is matched by the first matching
rule even if it doesn't have an ESP layer in its pattern, disregarding
any following rules that could have an ESP item and would actually be
a more accurate match, since they have a longer matching criterion.
This is fixed by defining the priority of the ESP item as a
layer 4 priority, making the match go to the rule with the more
accurate and longer matching criteria.
Fixes: 18ca4a4ec7 ("net/mlx5: support ESP SPI match and RSS hash")
Cc: stable@dpdk.org
Signed-off-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
One of the conditions to allow the LRO offload is the DV configuration.
The function incorrectly checks the DV configuration before it is
initialized from the user devargs; hence, LRO can never be allowed.
This patch moves this check to mlx5_shared_dev_ctx_args_config, where DV
configuration is initialized.
Fixes: c4b8620135 ("net/mlx5: refactor to detect operation by DevX")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reported-by: Gal Shalom <galshalom@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
FDs passed from rte_mp_msg are duplicated to the secondary process and
need to be closed.
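A minimal sketch of the required cleanup, assuming the handler receives
the message as 'struct rte_mp_msg *msg' (the 'fds' and 'num_fds' fields
are part of the EAL multiprocess API):

    #include <unistd.h>
    #include <rte_eal.h>

    /* Close every FD duplicated into this process via rte_mp_msg
     * once it is no longer needed. */
    static void
    close_mp_msg_fds(struct rte_mp_msg *msg)
    {
        int i;

        for (i = 0; i < msg->num_fds; i++)
            close(msg->fds[i]);
    }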
Fixes: 9a8ab29b84 ("net/mlx5: replace IPC socket with EAL API")
Cc: stable@dpdk.org
Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Such deprecation was commenced in DPDK 21.11.
Since then, no parties have objected. Remove.
The patch breaks ABI.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ori Kam <orika@nvidia.com>
Make rte_bus opaque for non internal users.
This will make extending this object possible without breaking the ABI.
Introduce a new driver header and move rte_bus definition and helpers.
Update drivers and library to use the internal header.
Some applications may have been dereferencing rte_bus objects; mark
this object's accessors as stable.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The pci bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.
While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macros manipulating the bus singleton
object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
inclusion.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
The auxiliary bus interface is for drivers only.
Mark as internal and move the header in the driver headers list.
While at it, cleanup the code:
- fix indentation,
- remove unneeded reference to bus specific singleton object,
- remove unneeded list head structure type,
- reorder the definitions and macros manipulating the bus singleton
object,
- remove inclusion of rte_bus.h and fix the code that relied on implicit
inclusion.
Signed-off-by: David Marchand <david.marchand@redhat.com>
The local variables become inconsistent in the data receiving routines
after queue error recovery.
The receive queue consumer index becomes wrong and needs to be reset
to the size of the queue (as the RQ was fully replenished in the
recovery procedure).
In the MPRQ case, the local consumed strd variable should also be reset.
CVE-2022-28199
Fixes: 88c0733535 ("net/mlx5: extend Rx completion with error handling")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
Start a new release cycle with empty release notes.
The ABI version becomes 23.0.
The map files are updated to the new ABI major number (23).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.
Special handling of removed drivers is also dropped in check-abi.sh and
a note has been added in libabigail.abignore as a reminder.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
A negative integrity item refers to the condition when the item value
mask is set, but the value spec is cleared:
... integrity value mask l4_ok value spec 0 ...
The ethdev library defines the integrity bits `l3_ok` and `l4_ok` as
accumulators for all hardware L3 and L4 integrity verifications
respectively.
The hardware `l3_ok` and `l4_ok` integrity bits refer to the L3 and L4
network headers only.
Therefore, the integrity bits `l3_ok` and `l4_ok` are not compatible
between the ethdev library and the hardware.
PMD translations for ethdev `l3_ok` are:
IPv4: `l3_ok` and `l3_csum_ok`
IPv6: `l3_ok`
ethdev `l4_ok` is translated into the PMD `l4_ok` and `l4_csum_ok` bits.
A positive IPv4 `l3_ok` flow item configuration is translated into
a single matcher that ANDs the corresponding hardware bits.
A negative IPv4 `l3_ok` is translated into 2 hardware conditions, where
each condition probes a single integrity bit:
ethdev::l3_ok is 0 => MLX5::l3_ok is 0 OR MLX5::l3_csum_ok is 0
MLX5 hardware does not support an OR condition in a flow rule item.
A negative IPv4 `l3_ok` must therefore be translated into 2 flow rules.
Similarly, a negative ethdev `l4_ok` condition is also translated into
2 hardware rules.
The current PMD roadmap does not allow implicit flow rule splitting.
Bugzilla ID: 948
Cc: stable@dpdk.org
Suggested-by: Raja Zidane <rzidane@nvidia.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add an mlx5 internal test for mapping and unmapping external RxQs.
This patch adds a runtime command to the testpmd application to test the
mapping API.
testpmd> mlx5 port (port_id) ext_rxq map (sw_queue_id) (hw_queue_id)
testpmd> mlx5 port (port_id) ext_rxq unmap (sw_queue_id)
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
Add an mlx5 internal command in testpmd, similar to the run-time
function "port attach", which adds another parameter named "socket" for
attaching a port and adds 2 devargs beforehand.
The arguments are "cmd_fd" and "pd_handle", used to import a device
created outside the PMD. The testpmd application imports it using IPC
and updates the devargs list before attaching.
These arguments were added in
commit 9d936f4f1a ("common/mlx5: support remote PD and CTX").
The syntax is:
testpmd> mlx5 port attach (identifier) socket=(path)
Where "path" is the IPC socket path agreed on by the remote process.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_drop_action_create() function uses mlx5_malloc() to allocate
'hrxq', but does not allocate space for 'rss_key'. This is wrong and
can cause a buffer overflow.
Detected with address sanitizer:
0 (/usr/lib64/libasan.so.4+0x7b8e2)
1 in mlx5_devx_tir_attr_set ../drivers/net/mlx5/mlx5_devx.c:765
2 in mlx5_devx_hrxq_new ../drivers/net/mlx5/mlx5_devx.c:800
3 in mlx5_devx_drop_action_create ../drivers/net/mlx5/mlx5_devx.c:1051
4 in mlx5_drop_action_create ../drivers/net/mlx5/mlx5_rxq.c:2846
5 in mlx5_dev_spawn ../drivers/net/mlx5/linux/mlx5_os.c:1743
6 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2501
7 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2647
8 in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2722
9 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:657
10 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:711
11 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:150
12 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:269
13 in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:353
14 in pci_probe ../drivers/bus/pci/pci_common.c:380
15 in rte_bus_probe ../lib/eal/common/eal_common_bus.c:72
16 in rte_eal_init ../lib/eal/linux/eal.c:1286
17 in main ../app/test-pmd/testpmd.c:4112
Fixes: 0c762e81da ("net/mlx5: share Rx queue drop action code")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When a meter is used by the E-Switch Manager port, there is an error
in that the correct port ID cannot be obtained.
This patch fixes it by using a specific parsing process to get the port
ID for the E-Switch Manager.
Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
For BF with old FW which doesn't expose the E-Switch Manager vport ID,
E-Switch Manager port matching works correctly only when BF is in
embedded CPU mode.
This patch adds the limitation description.
Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add command line options to support host shaper configuration.
- Command syntax:
mlx5 set port <port_id> host_shaper avail_thresh_triggered <0|1> rate
<rate_num>
- Example commands:
To enable avail_thresh_triggered on port 1 and disable current host
shaper:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 1 rate 0
To disable avail_thresh_triggered and current host shaper on port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 0
The rate unit is 100Mbps.
To disable avail_thresh_triggered and configure a shaper of 5Gbps on
port 1:
testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 50
Add sample code to handle the Rx queue available descriptor threshold
event: it delays a while so that the Rx queue empties, then disables the
host shaper and re-arms the available descriptor threshold event.
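A condensed sketch of such a handler, assuming the
rte_pmd_mlx5_host_shaper_config() and rte_eth_rx_avail_thresh_set()/
query() APIs introduced in this series (the 30% re-arm value is
illustrative):

    #include <rte_ethdev.h>
    #include <rte_pmd_mlx5.h>

    /* On the threshold event: find the queue(s) that fired, drop the
     * host shaper, and re-arm the threshold for the next event. */
    static int
    avail_thresh_event_cb(uint16_t port_id, enum rte_eth_event_type type,
                          void *param, void *ret_param)
    {
        uint16_t rxq_id = 0;

        (void)param;
        (void)ret_param;
        if (type != RTE_ETH_EVENT_RX_AVAIL_THRESH)
            return 0;
        while (rte_eth_rx_avail_thresh_query(port_id, &rxq_id, NULL) > 0) {
            /* Disable the host shaper: rate 0, trigger off. */
            rte_pmd_mlx5_host_shaper_config(port_id, 0, 0);
            /* Re-arm: fire when free descriptors drop below 30%. */
            rte_eth_rx_avail_thresh_set(port_id, rxq_id, 30);
            rxq_id++;
        }
        return 0;
    }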
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Host port shaper can be configured with QSHR (QoS Shaper Host Register).
Add check in build files to enable this function or not.
The host shaper configuration affects all the ethdev ports belonging to the
same host port.
Host shaper can configure shaper rate and lwm-triggered for a host port.
The shaper limits the rate of traffic from host port to wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
when one of the host port's Rx queues receives an available descriptor
threshold event.
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add mlx5 specific available descriptor threshold configuration
and query handler.
In mlx5 PMD, available descriptor threshold is also called
LWM (limit watermark).
When the Rx queue fullness reaches the LWM limit, the driver catches
a HW event and invokes the user callback.
The query handler finds the next Rx queue with pending LWM event
if any, starting from the given Rx queue index.
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When the LWM threshold is hit on an RQ WQE, the kernel driver raises an event to SW.
Use devx event_channel to catch this and to notify the user.
Allocate this channel per shared device.
The channel has a cookie that informs the specific event port and queue.
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
There is a lot of duplicated code for creating and initializing
rte_intr_handle. Add a new mlx5_os API to do this, and replace all
related PMD code with this API.
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add an LWM (Limit WaterMark) field to the Rxq object, which indicates
the percentage of the Rx queue size used by HW to raise a descriptor
event to the user.
Allow LWM setting in the modify_rq command.
Allow dynamic LWM configuration by adding the RDY2RDY state change.
Signed-off-by: Spike Du <spiked@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Use fgets instead of fscanf to resolve the following warning
reported by clang 14.0.0 in Fedora 37 (Rawhide):
drivers/net/mlx5/linux/mlx5_ethdev_os.c:1137:52: error:
'fscanf' may overflow; destination buffer in argument 3 has size 16,
but the corresponding specifier may require size 17
[-Werror,-Wfortify-source]
ret = fscanf(file, "%" RTE_STR(IF_NAMESIZE) "s", port_name);
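A minimal sketch of the fix, assuming the 16-byte destination from the
warning and a hypothetical read_port_name() helper:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Bound the read with fgets(): fscanf("%16s") may store up to
     * IF_NAMESIZE + 1 bytes (content plus NUL) into a 16-byte buffer. */
    static int
    read_port_name(FILE *file, char *port_name, int size)
    {
        if (fgets(port_name, size, file) == NULL)
            return -errno;
        port_name[strcspn(port_name, "\n")] = '\0'; /* strip newline */
        return 0;
    }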
Fixes: 63d1db710f ("net/mlx5: fix unlimited parsing of switch info")
Cc: stable@dpdk.org
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch introduces MODIFY_FIELD action support in meters. Users can
create a meter policy with the MODIFY_FIELD action in the green/yellow
actions.
For example:
testpmd> add port meter policy 0 21 g_actions modify_field op set
dst_type ipv4_ecn src_type value src_value 3 width 2 / ...
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds support for modifying the ECN field in the IPv4/IPv6 header.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add support for the represented_port item in patterns. If the spec and
mask are both NULL, the translate function will not add a source vport
match to the matcher.
For example, with testpmd started with PF, VF-rep0 and VF-rep1, the
below command will redirect packets from VF0 and VF1 to the wire:
testpmd> flow create 0 ingress transfer group 0 pattern eth /
represented_port / end actions represented_port ethdev_id is 0 / end
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The ESP item is not supported on Windows, yet it is expanded from the
expansion graph when trying to create a default flow to RSS all packets.
Support ESP item matching (without the ability to match on the SPI field
on Windows).
Split ESP validation per OS.
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_action_construct_data structure memory is managed by an ipool
named acts_ipool.
The size of one entry in this ipool is mistakenly defined as the size of
the rte_flow_hw structure.
This size is used to reset the allocated part. When the size is
incorrect, memory that does not belong to the entry is reset.
This patch defines the correct size.
Fixes: f13fab2392 ("net/mlx5: add flow jump action")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch encompasses a few fixes carried by a previous patch
that aimed to support bonding device stats counting.
- If mlx5_os_read_dev_stat() fails, it returns 1 instead of a
negative value, causing mlx5_xstats_get() to return an invalid
number of counters. Since this error is not blocking, do not
mix the ret value with the mlx5_os_read_dev_stat() return value.
This allows avoiding the very annoying log:
"n_xstats != n_xstats_names => skipping"
- Invert the check for mlx5_os_read_dev_stat(), currently leading
us to store the result if the function failed, and use a
backup value if it succeeded, which is the opposite of what we
actually want. Revert to the original (correct) test.
- Add missing test on _mlx5_os_read_dev_counters() to prevent
using trash stats values.
Fixes: 7ed15acdcd ("net/mlx5: improve xstats of bonding port")
Cc: stable@dpdk.org
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Geoffrey Le Gourriérec <geoffrey.le_gourrierec@6wind.com>
Tested-by: Bassam Zaid AlKilani <bzalkilani@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add two kinds of Rx drop counters, at physical port scope, to the DPDK
xstats:
1. rx_prio[0-7]_buf_discard
The number of unicast packets dropped due to lack of shared
buffer resources.
2. rx_prio[0-7]_cong_discard
The number of packets dropped by the Weighted Random
Early Detection (WRED) function.
Prio[0-7] is determined by the VLAN PCP value, which is 0 by default.
Both counters are retrieved via the kernel ethtool API, which ultimately
issues a PRM command.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When an error occurs in Tx and the queue is moved to the ERROR state,
it is not recoverable; during recovery its state cannot be modified
to INIT. To modify the state from RESET to INIT, the port must be
passed in the modify attributes, and in the ERROR-to-READY
modification path it was not provided.
Provide the port number when changing state from RESET to INIT.
Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Yellow meter action support is added in meter hierarchy validation.
If one color uses a meter action, the other can only use the NULL action
or the same meter action, and only a shared meter is supported.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When a hierarchy meter is shared by other ports, it is necessary to
iterate over all meter policies in the hierarchy to create tag rules,
to set packets with the next meter ID, which will be used by the related
meter drop count.
This patch adds the tag rule for yellow support in hierarchy, so both
green/yellow policy flows can set the correct meter ID.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds support of the meter action for yellow meter policy
flows, so the meter action can be used for both green and yellow policy
flows in a meter hierarchy.
Currently, the same meter must be used within one meter policy. Packets
passing the green/yellow policy flow will have the previous meter color
of green/yellow in the subsequent meter.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds support for previous color awareness for meters.
The start_color setting is set to UNDEFINED when creating a meter object
that is color aware.
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The AltiVec header file is defining "vector", except in C++ build.
The keyword "vector" may conflict easily.
As a rule, it is better to use the alternative keyword "__vector",
so we will be able to #undef vector after including AltiVec header.
Later it may become possible to #undef vector in rte_altivec.h
with a compatibility breakage.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
In packets with an ESP header, the inner IP is encrypted, and
its fields cannot be used for RSS hashing, so ESP packets
can be hashed only by the outer IP layer.
Hence, when using RSS on ESP packets, hashing may not be efficient,
because the fields used by the hash functions are only the outer IPs,
causing all traffic belonging to all tunnels between a given
pair of GWs to land on one core.
Adding the SPI hash field can extend the spreading of IPsec packets.
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When a meter with an RSS action is used, there might be several
sub-flows using different sub-policies in the flow splitting stage.
If there is no green action, there is an error where the same
sub-policy is always used for all sub-flows; some resources get
overwritten and cannot be released, leading to an assert during port
close.
This patch fixes this issue by checking both the green and yellow queue
index when getting a blank sub-policy, to avoid the incorrect
resource overwrite.
Fixes: b38a12272b ("net/mlx5: split meter color policy handling")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The driver wrongly set the LRO configuration in the TIR of the DevX
drop queue even when LRO is not supported.
Actually, the LRO configuration is not relevant to the drop queue at
all.
This caused a failure in the initialization of devices that don't
support LRO, where the drop queue is created.
Probably, the drop queue creation by DevX missed the fact that LRO is
set by default in the TIR creation function and didn't unset it in the
drop queue case like other cases that unset LRO.
Move the default LRO configuration so it is unset by default, and set it
only for TIR queues configured with LRO.
Fixes: bc5bee028e ("net/mlx5: create drop queue using DevX")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_rx_queue_setup() function gets the LRO offload from the user.
When LRO is configured, the LRO flag in rxq_data is set to 1.
This patch adds validation to make sure LRO is supported.
Fixes: 17ed314 ("net/mlx5: allow LRO per Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When an indirect action was created with an RSS action configured to
hash on both source and destination L3 addresses (or L4 ports), it caused
shared hrxq to be configured to hash only on destination address
(or port).
This patch fixes this behavior by refining RSS types specified in
configuration before calculating hash types used for hrxq. Refining RSS
types removes *_SRC_ONLY and *_DST_ONLY flags if they are both set.
Fixes: 212d17b6a6 ("net/mlx5: fix missing shared RSS hash types")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Queue statistics are being continuously updated in Rx/Tx burst
routines while handling traffic. In addition to that, statistics
can be reset (written with zeroes) on statistics reset in other
threads, causing a race condition, which in turn could result in
wrong stats.
The patch provides an approach with reference values, allowing
the actual counters to be writable within Rx/Tx burst threads
only, and updating reference values on stats reset.
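A minimal sketch of the reference-value scheme, with illustrative
struct and field names (not the PMD's actual ones):

    #include <stdint.h>

    struct sw_counter {
        uint64_t value; /* written only by the Rx/Tx burst thread */
        uint64_t ref;   /* snapshot taken at the last stats reset */
    };

    /* Report the counter relative to the last reset. */
    static uint64_t
    counter_read(const struct sw_counter *c)
    {
        return c->value - c->ref;
    }

    /* Reset records a new baseline without touching the hot counter. */
    static void
    counter_reset(struct sw_counter *c)
    {
        c->ref = c->value;
    }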
Fixes: 87011737b7 ("mlx5: add software counters")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
GTP items were ignored during conversion of modify header actions. This
caused modify TTL action to generate a wrong modify header command when
tunnel and inner headers used different IP versions.
This patch adds GTP item handling to modify header action conversion.
Fixes: 04233f36c7 ("net/mlx5: fix layer type in header modify action")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The Mlx5Devx library has new APIs for setting and getting the MTU.
Added new glue functions that wrap the new Mlx5Devx lib APIs.
Implemented the os_ethdev callbacks to use the new glue
functions on Windows.
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Support setting the promiscuous modes by calling the new API
in the Mlx5DevX lib.
Added a new glue API for Windows, which is used to communicate
with the Windows driver to enable/disable PROMISC or ALLMC.
Signed-off-by: Adham Masarwah <adham@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_rxq_is_hairpin() function checks whether an RxQ type is hairpin.
It does so by reading a flag in the Rx control structure coming from
the mlx5_rxq_ctrl_get() function.
The function verifies that the queue index is valid even though it has
already been checked within the mlx5_rxq_ctrl_get() function.
This patch removes the redundant check.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_rxq_get() function gets an RxQ index and returns the RxQ priv
accordingly.
When it gets an invalid index, it accesses outside the array bounds,
which might cause undefined behavior.
This patch adds a check for invalid indexes before accessing the array.
Fixes: 0cedf34da7 ("net/mlx5: move Rx queue reference count")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The flow_drv_rxq_flags_set() function sets the Rx queue flags (Mark/Flag
and Tunnel Ptypes) according to the device flow.
It tries to get the RxQ control structure to update its ptype. However,
external RxQs don't have a control structure to update, and this may
cause a crash.
This patch adds a check for whether the queue is external.
Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In rte_flow, if a counter action comes before a meter which has a
non-termination policy, the counter value only includes packets not
being dropped.
This patch fixes this issue by differentiating the order of counter and
non-termination meter:
1. counter + meter, counts all packets hitting this flow.
2. meter + counter, only counts packets not being dropped.
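For illustration, hedged testpmd rules for the two orderings (meter 1 is
assumed to already exist with a non-termination policy):
testpmd> flow create 0 ingress pattern eth / end actions count / meter mtr_id 1 / end
testpmd> flow create 0 ingress pattern eth / end actions meter mtr_id 1 / count / end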
Fixes: 51ec04dc7b ("net/mlx5: connect meter policy to created flows")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Users can probe the primary or secondary PCIe ID when bonding is
configured.
1. -a 0a:00.0,representor=pf[0-1]vf[0-1], the PMD probes 5 ports in
total: the bonding device plus 4 representor ports.
2. -a 0a:00.1,representor=pf[0-1]vf[0-1], the PMD only probes 2
representor ports.
Under the 2nd condition, the bonding IB device doesn't have the same
PCIe ID, and the PMD needs to check the bonding relationship; otherwise,
the probe fails.
Fixes: 6856efa54e ("net/mlx5: fix PF leak on PCI probing failure")
Cc: stable@dpdk.org
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When txq_inline_max is too large and an mbuf is multi-segment,
it may be impossible to inline data and build a valid WQE,
because the WQE length would be larger than HW can represent.
It is impossible to detect misconfiguration at startup,
because the condition depends on the mbuf composition.
The check on the data path to prevent the error
treated the length limit as expressed in 64B units,
while the calculated length and limit are in 16B units.
Fix the condition to avoid subsequent TxQ failure and recovery.
Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Multi-Packet Rx queue uses PMD-managed buffers to store packets.
These buffers are externally attached to user mbufs.
This conflicts with the feature that allows using user-managed
externally attached buffers in an application.
Fall back to SPRQ in case external buffers mempool is configured.
The limitation is already documented in mlx5 guide.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The default CPU socket ID was used while creating the Rx queue, and this
caused a creation failure in case the hardware did not reside on the
default socket.
The patch sets the correct CPU socket ID for the mlx5_rxq_ctrl before
calling mlx5_rxq_create_devx_rq_resources(), which eventually calls
mlx5_devx_rq_create() with the correct CPU socket ID.
Fixes: bc5bee028e ("net/mlx5: create drop queue using DevX")
Cc: stable@dpdk.org
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
If there is an explicit port match and a sample action in the same flow,
the mlx5 PMD pushes the explicit port match into the prefix subflow, and
uses the tag item match in the suffix subflow.
The explicit port match was translated into source vport match so
the sample suffix subflow lost this match after flow split.
This patch copies the explicit port match to the sample suffix subflow,
and the latter gets the correct source vport value in the flow matcher.
Fixes: b4c0ddbfcc ("net/mlx5: split sample flow into two sub-flows")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
A flow rule with a sample action was split into two sub-flows,
and an implicit tag action with a unique ID was added in the prefix
sub-flow; the suffix sub-flow used the tag item to match on that
unique ID, and the implicit set tag action was inserted next to
the sample action.
When either a PUSH VLAN action or an ENCAP action preceded the
sample action, the implicit set tag action was added after the PUSH
VLAN or ENCAP actions, causing a flow creation failure because
rdma-core does not support this action order.
This patch ensures the implicit set tag action is inserted before
either the PUSH VLAN or encap action (if any) in the prefix sub-flow.
Fixes: 6a951567c1 ("net/mlx5: support E-Switch mirroring and jump in one flow")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
For now, only one ASO action is supported in a single flow rule.
A flow rule with more than one ASO action should be rejected in the
validation stage.
A flow rule with non-shared AGE and COUNT actions together should be
treated as non-ASO, because AGE will fall back to using the HW counter,
not the ASO hit object.
Group 0 will use the HW counter for the AGE action even if there is no
COUNT action.
This commit will reject patterns (no matter which group, if transfer)
like:
1. group 1 pattern... / end actions age / meter / end
2. group 1 pattern... / end actions conntrack / meter / end
3. group 1 pattern... / end actions age / conntrack... / end
If AGE comes together with COUNT in the above patterns, it's allowed.
Fixes: daed4b6e ("net/mlx5: use aging by counter when counter exists")
Cc: stable@dpdk.org
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A flow rule with a sample action is split into two sub-flows,
and a tag action is added implicitly in the sample prefix sub-flow;
the reserved metadata regC index is used for this tag action.
The reserved metadata regC is shared with the metering action; for
ConnectX-5 trusted devices (VF/SF), the reserved metadata regC was
invalid since the PF only supported legacy metering.
This patch adds a check for the tag index and falls back to using the
application tag if a failure happened.
Fixes: a9b6ea45be ("net/mlx5: fix tag ID conflict with sample action")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The flow domain and direction were validated when the OF_PUSH_VLAN
action appeared in flow actions. The flow was rejected whenever this
action:
- was used in NIC domain, in ingress direction;
- was used in FDB domain, in ingress direction, on ConnectX-5.
This validation logic rejected a valid case when the OF_PUSH_VLAN
action was used when directing traffic to the hairpin queue,
configured in TX implicit mode.
This patch moves code responsible for OF_PUSH_VLAN validation of
domain and direction from flow_dv_validate_push_vlan() to
flow_dv_validate(). Domain and direction are now validated when either
non-hairpin queue is used or hairpin queue is configured in Tx explicit
mode.
Fixes: 96f85ec489 ("net/mlx5: check VLAN push/pop support")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Disabled means there is no packet drop in the meter. The meter is
always active but programmed with another CIR/CBS value.
If the user wants the meter disabled at creation, the PMD calls
the disable() API manually after the meter is initialized.
Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A DPDK application configured with no Rx queues should be supported.
In this mode, the NIC can be used to generate packets without
receiving any ingress traffic.
In the current implementation, once there is no Rx queue specified,
the array storing the queues' pointers is NULL after allocation.
Then the check of the array allocation prevents the application
from starting up.
By adding another condition checking the Rx queue number, an
application with no Rx queues can start up successfully.
Fixes: 4cda06c3c3 ("net/mlx5: split Rx queue into shareable and private")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
E-Switch DV flow is supported only when DV flow is supported and
enabled.
The mlx5_shared_dev_ctx_args_config() function ensures that when the
environment does not support DV, the "dv_esw_en" flag is turned off.
However, when the environment is supportive but the user has requested
to disable it, the "dv_esw_en" flag remains on and causes the PMD to try
to create an E-Switch through the Verbs engine.
This patch adds a check to ensure that the "dv_esw_en" flag is turned
off when DV flow is disabled.
Fixes: a13ec19c19 ("net/mlx5: add shared device context config structure")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When using the Verbs flow engine to create flows, the GRE Verbs spec was
put at the end of the specs list. This created problems for flows
matching MPLSoGRE packets: in the generated specs list, the MPLS spec
was put before the GRE spec, but the Verbs API requires that the MPLS
spec be put at its exact location in the protocol stack.
This patch fixes this behavior. Space for GRE Verbs spec is reserved at
its exact location. MPLS Verbs is inserted at its exact location as
well. GRE spec is filled after all flow items are parsed.
Fixes: 985b479267 ("net/mlx5: fix GRE protocol type translation for Verbs")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Flex item availability is restricted to BlueField-2 and BlueField-3
PF ports.
The patch validates port type compliance before proceeding to
flex item creation.
Fixes: db25cadc08 ("net/mlx5: add flex item operations")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Meter policy creation doesn't belong to the flow rule creation
process, so the thread workspace was not initialized, and there will be
an assert error when using it.
This patch removes the incorrect use of the thread workspace in meter
policy creation and adds a flag in the policy instead. When creating a
flow rule, this flag can be used to set the mark flag in the thread
workspace.
Fixes: 082becbf1f ("net/mlx5: fix mark enabling for Rx")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In the previous implementation, a count was used to record the number
of the references to a table resource, including the creation of the
table, the jumping to the table and the matchers created on the
table. Before releasing the table resource via the driver, it needed
to ensure that there is no reference to this table.
After the optimization of the resources management, the reference
count now is in the hash list entry as a unified solution for all the
resources management.
There is no need to keep the "refcnt" in the table resource
structure. It is removed to avoid unnecessary memory overhead.
Fixes: afd7a62514 ("net/mlx5: make flow table cache thread safe")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Certain flow rules containing a modify header action for an L4 port
could be erroneously rejected as invalid, because this action
was counted as consuming two HW actions, while it only requires one.
Fixes: 72a944dba1 ("net/mlx5: fix header modify action validation")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When an indirection table object is modified, it updates the reference
counter for each Rx queue related to it.
The reference counters for regular queues are indeed updated; however,
the reference counters for external RxQs are not.
This patch adds updating for external RxQs too.
Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When an indirection table is destroyed, each Rx queue related to it
should be dereferenced.
The regular queues are indeed dereferenced; however, the external RxQs
are not.
This patch adds dereferencing for external RxQs too.
Fixes: 311b17e669 ("net/mlx5: support queue/RSS actions for external Rx queue")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When E-Switch mode was enabled, the NIC egress flows were implicitly
appended with a source vport to match on. If the metadata register C0
was used to maintain the source vport, it was initialized to zero
on packet steering engine entry, so a flow could be hit only
if the source vport was zero; the register C0 of the packet was not
correct to match on in the Tx side, which caused egress flow misses.
This patch:
- removes the implicit source vport match for NIC egress flows.
- rejects NIC egress flows on the representor ports at validation.
- allows internal NIC egress flows containing the TX_QUEUE items in
order to not impact hairpins.
Fixes: ce777b147b ("net/mlx5: fix E-Switch flow without port item")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
When both shared and non-shared RSS actions are present in a single
flow rule, the shared RSS index is unset by mistake.
For example:
1. flow indirect_action 0 create action_id 3 ingress action RSS ...
2. set sample_actions 0 mark id 43690 / queue index 0 / end
3. flow create 0 ingress group 107 pattern eth / sample ratio 2
index 0 / indirect 3 / end
The PMD translates the indirect action to a shared RSS description at
first. In the split prefix flow, RSS->shared_RSS is unset when
translating the sample queue action, so the suffix flow treats the RSS
as non-shared.
Fixes: 8e61555657 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The RSS expansion scheme has 2 operational modes: default and specific.
The default mode expands into all valid options for a given network
layer. For example, Ethernet expands by default into VLAN, IPv4 and
IPv6, L3 expands into TCP and UDP, etc.
The specific mode expands according to the flow item next protocol
configuration provided by the item spec and mask parameters.
There are 3 outcomes for the specific expansion:
1. Back to default - that is the case when the result of (spec & mask)
allows all possibilities.
For example: eth type mask 0 type spec 0
2. No results - in that case the item configuration has no valid
expansion.
For example: eth type mask 0xffff type spec 101
3. Direct - in that case the flow item mask and spec configuration
return a valid expansion option.
Example: eth type mask 0x0fff type spec 0x0800.
The current PMD expands flow items with explicit spec and mask
configuration into Direct (3) or No results (2). Default expansions
were handled as No results.
Fixes: f3f1f576f4 ("net/mlx5: fix RSS expansion with explicit next protocol")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The flex item API provides support for network headers with fixed and
variable lengths.
When the PMD compiles a new flex item object configuration, it converts
RTE parameters into matching PMD PARSE_GRAPH parameters and checks
the parameter values against port capabilities.
The current implementation mismatched the PARSE_GRAPH configuration
fields for fixed-size headers.
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
On a TCP/IP-based layered network, ICMP is considered and implemented
as part of the layer 3 IP protocol. Actually, it is a user of the IP
protocol and must be encapsulated within IP packets. There is no
layer 4 protocol over ICMP.
The rule with a layer 4 pattern should be matched prior to the rule with
only a layer 3 pattern when:
1. Both rules are created in the same table
2. Both rules could be hit
3. The rules have the same priority
The steering result of the packet is nondeterministic if there are
rules with patterns IP and IP+ICMP in the same table with the same
priority. Like TCP / UDP, a packet should hit the rule with a longer
matching criterion.
By treating the priority of ICMP/ICMPv6 as a layer 4 priority inside the
PMD, the IP+ICMP rule will be hit prior to the IP-only rule.
Fixes: d53aa89aea ("net/mlx5: support matching on ICMP/ICMP6")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reduce flex item flow handle size from 32 bits to 8 bits for each
flow.
The patch will save memory in setups with millions of flows.
Fixes: a23e9b6e3e ("net/mlx5: handle flex item in flows")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
GRE item translation must set the inner protocol value.
For that reason, the item is not translated in place when PMD
translation iterates over the flow items, but is moved to after the
loop, when all inner types are discovered.
If the PMD does not translate the GRE flow item inside the translation
loop, it must save the GRE item for access outside the loop.
Fixes: 985b479267 ("net/mlx5: fix GRE protocol type translation for Verbs")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The AGE action can be implemented by either counters or ASO mechanism.
ASO is more efficient than generating counters just for the purpose of
aging, so when ASO is supported its use is preferable. On the other
hand, when there is count in the list of actions, the counter is already
generated, and it is best to use it for aging even if ASO is supported.
However, when the count action is "indirect", it cannot be
used for aging, since it may be updated from other flow rules in which
it participates.
Checking whether ASO is supported depends on both the capability of the
device and the flow rule group number; ASO is not supported for group 0.
However, the flow_dv_validate() function only checks the capability and
ignores the group, allowing inadmissible flow rules.
For example, when the device supports ASO and a flow rule is set that
combines an indirect counter with aging for group 0, the rule should be
rejected, but it is created and does not function properly.
This patch updates the counter validation which will also consider the
group number when deciding if there is ASO support.
Fixes: daed4b6e3d ("net/mlx5: use aging by counter when counter exists")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The AGE action can be implemented by either counters or ASO mechanism.
When the user asks for a count action in the flow rule, the AGE action
is implemented by the same counter. However, if the user asks for an
indirect count action, it cannot be used for AGE.
The flow_dv_validate() function has a flag named "shared_count" which
indicates whether AGE action validation depends on ASO support or not.
This flag is initialized to false and is updated if there is an indirect
count action in the action list.
This flag was mistakenly set within the loop that reads the action list,
and in each iteration it was reinitialized to false, regardless of the
existence of an indirect count action in the list.
This patch moves the flag initialization out of the loop.
Fixes: f3191849f2 ("net/mlx5: support flow count action handle")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The table remove callback function was trying to destroy the
matchers list associated with table entries without checking
whether the list is valid, which caused a null pointer dereference.
Fixed by validating the matchers list before destroying it.
The issue can be reproduced with testpmd on Windows, when you run:
port close all
Fixes: 1872635570 ("net/mlx5: make matcher list thread safe")
Cc: stable@dpdk.org
Signed-off-by: Adham Masarwah <adham@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
For an indexed pool with local cache, when a new trunk is allocated,
half of the trunk's indices were fetched into the local cache. If the
local cache size was less than half of the trunk size, a memory
overlap happened.
This commit adds a check of the fetch size: if the local cache size
is less than the fetch size, the fetch size is adjusted to be the local
cache size.
Fixes: d15c0946be ("net/mlx5: add indexed pool local cache")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Link status change takes time that depends on the HW and the kernel.
It was checked immediately after the change was issued at probing.
If the port had been down before probing, a "down" state may be read,
while the port would be "up" imminently.
After that, DPDK reported the port as "down" mistakenly
and "ifconfig $DEV up" did not trigger an LSC event,
because from the system's perspective the port was "up" already.
Install Netlink event handler at port probe before requesting the port
to come up in order to receive LSC event even if it comes up
between probe and start.
Fixes: a85a606ca5 ("net/mlx5: fix link status initialization")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Sometimes net/mlx5 devices did not detect link status change to "up".
Each shared device was monitoring IBV_EVENT_PORT_{ACTIVE,ERR}
and queried the link status upon receiving the event.
IBV_EVENT_PORT_ACTIVE is delivered when the logical link status
(UP flag) is set, but the physical link status (RUNNING flag)
may be down at that time, in which case the new link status
would be erroneously considered down.
IBV interface is insufficient for the task.
Monitor interface events using Netlink.
Fixes: 198a3c339a ("mlx5: handle link status interrupts")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Introduce mlx5_nl_read_events() to read Netlink events
(technically, messages) from a socket that was configured
to listen for them via a new mlx5_nl_init() parameter.
Add mlx5_nl_parse_link_status_update() helper
to extract information from link-related events.
This patch is a shared base for later fixes.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add support for queue/RSS actions for external Rx queues.
In indirection table creation, the queue index is taken from the
mapping array.
This feature supports neither LRO nor Hairpin.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
An external queue is a queue that has been created and managed outside
the PMD. The queues' owner might use the PMD to generate flow rules
using these external queues.
When the queue is created in hardware, it is given an ID represented by
32 bits. In contrast, the index of the queues in the PMD is represented
by 16 bits. To enable the use of the PMD to generate flow rules, the
queue owner must provide a mapping between the HW index and a 16-bit
index corresponding to the ethdev API.
This patch adds an API enabling insertion/cancellation of a mapping
between a HW queue ID and an ethdev queue ID.
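A minimal usage sketch, assuming the
rte_pmd_mlx5_external_rx_queue_id_map()/unmap() declarations from
rte_pmd_mlx5.h (the port, ethdev index and HW queue ID values are
illustrative; the valid ethdev index range is PMD-defined):

    #include <rte_pmd_mlx5.h>

    /* Bind HW queue object 0x1234 to ethdev Rx queue index 65000 on
     * port 0, then remove the mapping when it is no longer needed. */
    static int
    map_and_unmap_external_rxq(void)
    {
        uint16_t port_id = 0, dpdk_idx = 65000;
        uint32_t hw_idx = 0x1234;
        int ret;

        ret = rte_pmd_mlx5_external_rx_queue_id_map(port_id, dpdk_idx,
                                                    hw_idx);
        if (ret != 0)
            return ret;
        return rte_pmd_mlx5_external_rx_queue_id_unmap(port_id, dpdk_idx);
    }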
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The RxQ/TxQ control structure has a field named type. This type is an
enum with values for standard and hairpin.
This field is used to check whether the queue is of the hairpin type or
standard.
This patch replaces it with a boolean variable saving whether it is a
hairpin queue.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds matching on the optional fields (checksum/key/sequence)
of the GRE header. Matching on the checksum and sequence fields requires
support from rdma-core with the misc5 capability and tunnel_header 0-3.
For patterns without checksum and sequence specified, keep using misc
for matching as before; but for patterns with checksum or sequence,
validate the capability first and then use misc5 for the matching.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The HW steering header reformat action can work in bulk mode. In
this case, when creating the table, a bulk of header reformat
actions is allocated at the low level. Afterwards, when creating a
flow, simply specifying the action index in the bulk and the
encapsulation data for the action is enough.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
HW steering can support indirect actions as well. With an indirect
action, the flow can be created with more flexible shared RSS action
selection. This can save creating action templates for different RSS
actions.
This commit adds the flow queue operation callbacks (see the sketch
after this list) for:
rte_flow_async_action_handle_create();
rte_flow_async_action_handle_destroy();
rte_flow_async_action_handle_update();
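A condensed sketch of enqueueing an indirect action creation through
these callbacks, following the rte_flow_async_action_handle_create()
prototype (queue 0 and the ingress-only conf are illustrative):

    #include <rte_flow.h>

    /* Enqueue creation of an indirect (shared) action on flow queue 0;
     * the completion is fetched later with rte_flow_pull(). */
    static struct rte_flow_action_handle *
    async_indirect_create(uint16_t port_id,
                          const struct rte_flow_action *action,
                          struct rte_flow_error *error)
    {
        const struct rte_flow_op_attr op_attr = { .postpone = 0 };
        const struct rte_flow_indir_action_conf conf = { .ingress = 1 };

        return rte_flow_async_action_handle_create(port_id, 0 /* queue */,
                                                   &op_attr, &conf, action,
                                                   NULL /* user_data */,
                                                   error);
    }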
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mark action is covered by the tag action internally. When it is
added, the HW will add a tag to the packet. The mark value can be set as
fixed or dynamic, as the action mask indicates.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This commit adds the queue and RSS actions. Similar to the jump action,
dynamic ones will be added to the action construct list.
Because the queue and RSS actions in the template should not be
destroyed during port restart, the actions are created with a standalone
indirect table, as an indirect action does. When the port stops, the
indirect table is detached from the action; when the port starts, the
indirect table is attached back to the action.
One more change is made to accelerate action creation. Currently
the mlx5_hrxq_get() function returns the object index instead of the
object pointer. This introduces an extra conversion from the index to
the object by calling mlx5_ipool_get() in most cases, and that extra
conversion hurts multi-thread performance since mlx5_ipool_get() uses a
global lock inside. As the hash Rx queue object itself also contains the
index, returning the object directly achieves better performance without
the global lock.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The jump action connects different levels of flow tables and allows
packet handling in a chain of flows.
A new action construct data struct is also added in this commit to help
handle not only the dynamic jump action but also other generic
dynamic actions. Actions with an empty mask configuration are
dynamic actions, and the dedicated action will be created with the flow
action configuration during flow creation. In that dynamic action case,
the action will be appended to the table template's action list during
table creation.
When creating the flows, the action list is traversed, and the dynamic
action configuration details are picked from the flow actions as the
action construct data struct describes; then the dedicated dynamic
actions are created.
This commit adds the jump action and the generic dynamic action
construct mechanism.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
In case the port is being stopped, all created flows should be flushed.
This commit adds the flow flush helper function.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The HW steering uses async queue-based flow rules management
mechanism. The matcher and part of the actions have been
prepared during flow table creation. Some remaining actions
will be constructed during flow creation if needed.
A flow postpone attribute bit describes whether flow management
should be applied to the HW directly. An extra push function
is provided to force-push all the cached flows to the HW.
Once the flow has been applied to the HW, the pull function
will be called to get the enqueued creation/destruction flows.
The DR rule flow memory is allocated in the PMD layer instead
of being allocated by the HW steering layer. When destroying the
flow, the flow rule memory can only be freed after the CQE is
received.
The HW queue job descriptor is currently introduced to convey
the flow information and operation type between the flow
insertion/destruction in the pull function.
This commit adds the basic flow queue operations (see the sketch after
this list) for:
rte_flow_async_create();
rte_flow_async_destroy();
rte_flow_push();
rte_flow_pull();
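A minimal end-to-end sketch of these four calls; the table, pattern and
actions are assumed to come from previously created templates, and
template index 0 is illustrative:

    #include <rte_flow.h>

    /* Enqueue one rule, push it to HW, and poll until its completion
     * result arrives. */
    static int
    enqueue_one_flow(uint16_t port_id, uint32_t queue_id,
                     struct rte_flow_template_table *table,
                     const struct rte_flow_item pattern[],
                     const struct rte_flow_action actions[])
    {
        const struct rte_flow_op_attr op_attr = { .postpone = 1 };
        struct rte_flow_op_result res[1];
        struct rte_flow_error error;
        struct rte_flow *flow;
        int n;

        /* postpone = 1 keeps the rule cached until rte_flow_push(). */
        flow = rte_flow_async_create(port_id, queue_id, &op_attr, table,
                                     pattern, 0 /* pattern template */,
                                     actions, 0 /* actions template */,
                                     NULL /* user_data */, &error);
        if (flow == NULL)
            return -1;
        rte_flow_push(port_id, queue_id, &error);
        do {
            n = rte_flow_pull(port_id, queue_id, res, 1, &error);
        } while (n == 0);
        return (n == 1 && res[0].status == RTE_FLOW_OP_SUCCESS) ? 0 : -1;
    }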
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Flow table is a group of flows with the same matching criteria
and the same actions defined for them. The table defines rules
that have the same matching fields but with different matching
values. For example, matching on 5 tuple, the table will be
(IPv4 source + IPv4 dest + s_port + d_port + next_proto)
while the values for each rule will be different.
The templates' relevant matching criteria and action instances
will be created during table creation and saved in the table.
As table attributes indicate the supported flow number, the flow
memory will also be allocated at the same time.
This commit adds the table management functions.
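A minimal sketch of creating such a table from existing templates with
rte_flow_template_table_create() (group 1, ingress and the 1024-rule
size are illustrative):

    #include <rte_flow.h>

    static struct rte_flow_template_table *
    create_table(uint16_t port_id,
                 struct rte_flow_pattern_template *pt,
                 struct rte_flow_actions_template *at,
                 struct rte_flow_error *error)
    {
        const struct rte_flow_template_table_attr attr = {
            .flow_attr = { .group = 1, .ingress = 1 },
            .nb_flows = 1024, /* flow memory is preallocated here */
        };

        return rte_flow_template_table_create(port_id, &attr,
                                              &pt, 1, &at, 1, error);
    }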
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The action template holds a list of action types that will be
used together on the same rule. The template's action instances
will be created only when the template is bound to a dedicated
group. The created actions will be saved in each individual
group for best performance. The actions in a group will
not be shared with each other unless shared actions are specified.
This commit adds the action template management which stores the
flow action template.
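A minimal sketch of creating such a template, assuming the current
rte_flow template API; the fully masked queue action makes its value
fixed for every rule using the template (the index is an example):

    #include <stdint.h>
    #include <rte_flow.h>

    static struct rte_flow_actions_template *
    create_queue_actions_template(uint16_t port_id,
                                  struct rte_flow_error *error)
    {
        static const struct rte_flow_action_queue conf = { .index = 0 };
        static const struct rte_flow_action_queue mask = {
            .index = UINT16_MAX, /* fully masked -> fixed in template */
        };
        const struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &conf },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        const struct rte_flow_action masks[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &mask },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        const struct rte_flow_actions_template_attr attr = {
            .ingress = 1,
        };

        return rte_flow_actions_template_create(port_id, &attr,
                                                actions, masks, error);
    }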
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The pattern template defines flows that have the same matching fields
but different matching values. For example, matching on the 5-tuple of
a TCP flow, the template will be
(eth(null) + IPv4(source + dest) + TCP(s_port + d_port))
while the values for each rule will be different.
Since a pattern template can be used in different domains, the items
are only cached at the pattern template creation stage; when the
template is bound to a dedicated table, the HW criteria are created
and saved to the table. A pattern template can be used by multiple
tables, but tables creating the same criteria do not share the matcher
with each other, in order to have better performance.
This commit adds pattern template management.
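A minimal sketch of the 5-tuple TCP template above (assuming the
current rte_flow template API): item types and masks are fixed in the
template, while spec values are left NULL and supplied per rule.

    #include <stdint.h>
    #include <rte_flow.h>

    static struct rte_flow_pattern_template *
    create_5tuple_pattern_template(uint16_t port_id,
                                   struct rte_flow_error *error)
    {
        static const struct rte_flow_item_ipv4 ipv4_mask = {
            .hdr = { .src_addr = UINT32_MAX, .dst_addr = UINT32_MAX },
        };
        static const struct rte_flow_item_tcp tcp_mask = {
            .hdr = { .src_port = UINT16_MAX, .dst_port = UINT16_MAX },
        };
        const struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH }, /* eth(null) */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4, .mask = &ipv4_mask },
            { .type = RTE_FLOW_ITEM_TYPE_TCP, .mask = &tcp_mask },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        const struct rte_flow_pattern_template_attr attr = {
            .ingress = 1,
        };

        return rte_flow_pattern_template_create(port_id, &attr,
                                                pattern, error);
    }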
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The hardware steering is the backend that supports the rte_flow_async
API in the mlx5 PMD. The port configuration function creates the
queues and the needed flow management resources.
The PMD layer configuration function allocates the queues' context and
the per-queue job descriptor pool. The job descriptor pool size is
equal to the queue size, and job descriptors are popped from the pool
with a LIFO strategy to convey the flow information during flow
insertion/destruction. Then, while polling the queued operation
result, the flow information is extracted from the job descriptor and
the descriptor is pushed back to the LIFO pool.
The commit creates the flow port queues and the job descriptor pools.
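A minimal sketch of this configuration step from the application side:
one flow queue whose size also bounds the per-queue job descriptor
pool described above (the sizes are example values):

    #include <rte_flow.h>

    static int
    configure_flow_queues(uint16_t port_id, struct rte_flow_error *error)
    {
        const struct rte_flow_port_attr port_attr = { 0 };
        const struct rte_flow_queue_attr queue_attr = { .size = 256 };
        const struct rte_flow_queue_attr *queue_list[] = { &queue_attr };

        /* One queue; its size bounds the job descriptor pool. */
        return rte_flow_configure(port_id, &port_attr, 1, queue_list,
                                  error);
    }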
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The new hardware steering engine relies on using dedicated steering WQEs
instead of writing to the low-level steering table entries directly.
In the first implementation the hardware steering engine supports only
the new queue-based flow API; the existing synchronous non-queue-based
flow API is not supported.
A new dv_flow_en value 2 is added to select the mlx5 PMD steering
engine:

  dv_flow_en | rte_flow API  | rte_flow_async API
  -----------+---------------+--------------------
      0      | supported     | not supported
      1      | supported     | not supported
      2      | not supported | supported

This commit introduces the extra dv_flow_en=2 mode to select the new
flow initialization and management operation routines.
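For example (an illustrative testpmd invocation; the PCI address is a
placeholder), the new engine is engaged as:

    dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2 -- -i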
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The HW steering low-level implementation will be added later in
another patch series. To avoid linkage issues, abstract stub
replacements are provided for now.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The ConnectX steering is a hardware lookup mechanism that accesses flow
tables, matches packets to the rules, and performs specified actions.
Historically, the mlx5 PMD has implemented several software engines to
manage the steering hardware facility:
- FW Steering - Verbs/Direct Verbs, uses FW calls to manage flows
- SW Steering - DevX/mlx5dv, uses WQEs to access table memory directly
However, there are still some disadvantages:
- performance is limited: firmware must be invoked either to manage
the entire flow or to handle some internal steering objects
- organizing and preparing the flow infrastructure (actions, matchers,
groups, etc.) at flow insertion time inevitably slows down flow
insertion
- security: exposing the low-level steering entries directly to
userspace may pose security risks
A new hardware WQE-based steering operation, codenamed "HW Steering",
is introduced to eliminate the security risks. It takes advantage of
the recently introduced asynchronous queue-based rte_flow APIs to
prepare everything in advance and achieve a high insertion rate.
In this new HW steering engine, the original synchronous rte_flow API
is not supported in the first implementation; only the new
asynchronous queue-based flow operations are supported. A new steering
mode value for the dv_flow_en parameter is introduced so that the user
can engage the new steering engine.
This commit adds the basic driver operation.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Hardware starting from ConnectX-7 supports waiting until a specified
moment of time with the newly introduced wait descriptor. A timestamp
can be placed directly into the descriptor and pushed to the send
queue. Once the hardware encounters the wait descriptor, the queue
operation is suspended until the specified moment of time. This patch
updates the Tx datapath to handle this new hardware wait capability.
PMD documentation and release notes updated accordingly.
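From the application side, send scheduling is driven through the
dynamic mbuf timestamp field; a minimal sketch (the helper names are
illustrative):

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static int ts_offset;    /* dynamic field offset */
    static uint64_t ts_flag; /* dynamic flag mask */

    static int
    tx_sched_init(void)
    {
        int off = rte_mbuf_dynfield_lookup(
                      RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
        int bit = rte_mbuf_dynflag_lookup(
                      RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);

        if (off < 0 || bit < 0)
            return -1; /* scheduling is not enabled on the port */
        ts_offset = off;
        ts_flag = 1ULL << bit;
        return 0;
    }

    static void
    tx_schedule_at(struct rte_mbuf *mbuf, uint64_t moment)
    {
        /* The PMD pushes a wait descriptor for this moment of time. */
        *RTE_MBUF_DYNFIELD(mbuf, ts_offset, uint64_t *) = moment;
        mbuf->ol_flags |= ts_flag;
    }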
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The wait-on-time configuration flag is copied to the Tx queue
structure for performance considerations. The timestamp mask is
prepared and stored in the queue structure as well.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The "tx_db_nc" devarg forces doorbell register mapping to non-cached
region eliminating the extra write memory barrier. This argument was
used in creating the UAR for Tx and thus affected its performance.
Recently [1] its use has been extended to all UAR creation in all mlx5
drivers, and now its name is no longer so accurate.
This patch changes its name to "sq_db_nc" to suit any send queue that
uses it. The old name will still work for backward compatibility.
[1] commit 5dfa003db5 ("common/mlx5: fix post doorbell barrier")
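For example, both of the following engage the non-cached mapping (the
PCI address is a placeholder):

    dpdk-testpmd -a <PCI_BDF>,sq_db_nc=1 -- -i
    dpdk-testpmd -a <PCI_BDF>,tx_db_nc=1 -- -i   # deprecated alias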
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
One of the E-Switch vports plays a special role: it is assigned as the
"E-Switch manager" and has exclusive rights and duties - it maintains
all the representors, manages FDB domain flows, and so on. By default,
the E-Switch vport index was supposed to be zero on standalone NICs
(regular ConnectX) and 0xFFFE on SmartNICs (BlueField), but that is
not always correct - this index can be assigned any value by the
kernel/hypervisor.
Currently, the E-Switch manager vport ID is assumed to be the default
- 0 for standalone NICs and 0xFFFE for SmartNICs - and is deduced from
the device PCI ID.
To handle this without assuming any default values, use the DevX API
to query the E-Switch manager vport ID directly from the firmware
during initialization, and use that value by default. If the new
method is not provided (legacy firmware), fall back to the PCI ID
approach.
Fixes: a564038699 ("net/mlx5: support E-Switch manager egress traffic match")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Recently, shared Rx queues have been introduced. All shared Rx queues
with the same group and queue ID share the same rxq_ctrl, but each one
has its own mlx5_rxq_priv structure.
The mlx5_rx_queue_setup function generates a new rxq_priv structure
and looks for an rxq_ctrl structure to refer to. If a compatible
rxq_ctrl structure already exists, it is reused; otherwise the
mlx5_rxq_new function is called to generate a new one.
This patch makes the mlx5_rxq_new function "standalone": it generates
an rxq_ctrl structure regardless of any specific rxq_priv structure.
All operations on the rxq_ctrl structure that depend on the new
rxq_priv structure are performed in the mlx5_rx_queue_setup function,
at the same place for both new and existing rxq_ctrl structures.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5_rxq_new function creates the control structure and, if it
belongs to a shared group, inserts it into the shared RxQs list.
After that, some validations are performed; if they fail, the RxQ
control object is released while an invalid pointer to it is still in
the list, and accessing it may cause a crash.
Move the list insertion to the end of the function, where the RxQ
control object is surely valid.
Fixes: 09c2555303 ("net/mlx5: support shared Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Previously, the flow_dv_query_count_ptr API was defined to get the
counter's action pointer. This DV function was called directly, while
the better way is through a callback.
Add one argument to the mlx5_counter_query API and the related
counter_query callback. The added argument is for the counter's action
pointer.
Signed-off-by: Haifei Luo <haifeil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
If the meter policy action was RSS, the correct items were not
provided for sub-policy creation.
This patch fixes the issue by providing the original items in the
meter split, so the sub-policy creation gets the correct items.
Fixes: 3c481324ba ("net/mlx5: fix meter flow direction check")
Cc: stable@dpdk.org
Signed-off-by: Shun Hao <shunh@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5_l3t_prepare_entry() function is not used anymore.
This commit removes the unused function.
Fixes: 92ef4b8f16 ("ethdev: remove deprecated shared counter attribute")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When mlx5_hlist_create() failed, the rte_flow_error was not filled
with the corresponding error information.
This commit adds the missing rte_flow_error_set() for the failure case.
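A sketch of the added failure-path pattern (a hypothetical wrapper;
the PMD's actual error message may differ):

    #include <errno.h>
    #include <rte_flow.h>

    static void *
    check_hlist(void *hlist /* result of mlx5_hlist_create() */,
                struct rte_flow_error *error)
    {
        if (hlist == NULL) {
            /* Fill the caller's rte_flow_error before bailing out. */
            rte_flow_error_set(error, ENOMEM,
                               RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
                               "cannot create hash list");
            return NULL;
        }
        return hlist;
    }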
Fixes: f3020a331d ("net/mlx5: optimize hash list table allocate on demand")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Improve the devargs handling in two aspects:
- Parse the devargs string only once.
- Return an error and report unknown keys.
The common driver parses the devargs string once into a dictionary,
then provides it to all the drivers' probe functions. Each driver
marks within it which keys it has used; the common driver then
receives the updated dictionary and reports unknown devargs.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add a configuration structure for the port (ethdev). This structure
contains all port-oriented configurations coming from devargs. It is a
field of the mlx5_priv structure, and is updated in the spawn function
for each port.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add an inline function indicating whether HW object operations can be
created by DevX. It makes the code more readable.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add a configuration structure for the shared device context. This
structure contains all device-oriented configurations coming from
devargs. It is a field of the shared device context (SH) structure,
and is updated once in the mlx5_alloc_shared_dev_ctx() function.
This structure cannot be changed when probing again, so add a function
to enforce that. The mlx5_probe_again_args_validate() function creates
a temporary IB context configuration structure from the new devargs
attached in the probe again, then checks that the temporary structure
matches the existing IB context configuration structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Move all device configuration to the mlx5_os_cap_config() function
instead of the spawn function.
In addition, move all relevant fields from the mlx5_dev_config
structure to mlx5_dev_cap.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Rearrange the mlx5_os_get_dev_attr() function so that it first
executes the queries and only then updates the fields.
In addition, change its name in preparation for expanding its role to
configure the capabilities inside it.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds to the SH structure a flag indicating whether the
device is in E-Switch mode.
"dv_esw_en" is configured from devargs and is enabled only in E-Switch
mode, so once it has been configured, checking "dv_esw_en" alone is
enough.
This patch therefore also removes the separate E-Switch mode check
wherever "dv_esw_en" is already checked.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_flow_counter_mode_config function exists for both Linux and
Windows with the same name and content.
This patch moves its implementation to the folder shared between the
operating systems, removing the duplication.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The real-time timestamp configuration works the same on Linux and
Windows.
This patch moves it to a function implemented in the folder shared
between the operating systems, removing the duplication.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The check of whether a device is a VF works the same on Linux and
Windows.
This patch moves it to a function implemented in the folder shared
between the operating systems, removing the duplication.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The sharing device context structure has a field named "device_attr"
which is filled by mlx5_os_get_dev_attr() function.
The spawn function calls mlx5_os_get_dev_attr() again and save it to
local variable identical to "device_attr" field.
There is no need for this duplication, because there is a reference to
the sharing device context structure from spawn function.
This patch removes the local "device_attr" from spawn function, and uses
the context's field instead.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The sharing device context structure has a field named "devx" which
indicates if DevX is supported.
The common configure structure has also field named "devx" with the same
meaning.
There is no need for this duplication, because there is a reference to
the common structure from within the sharing device context structure.
This patch removes it from sharing device context structure and uses the
common config structure instead.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The HCA attribute structure is a field of the net configuration
structure. It is also a field of the common configuration structure.
There is no need for this duplication, because the net structures hold
a reference to the common structure.
This patch removes it from the net configuration structure and uses
the common config structure instead.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The device arguments are parsed and updated twice during spawning:
first before creating the shared device context, and again later after
updating the default value of one of the arguments.
This patch consolidates them into a single parsing pass and updates
the default values before it.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
These 4 functions are implemented in the mlx5_ethdev.c file:
- mlx5_dev_infos_get
- mlx5_fw_version_get
- mlx5_dev_set_mtu
- mlx5_hairpin_cap_get
In the mlx5.h file they are declared twice: first under the mlx5.c
section and again under the mlx5_ethdev.c section.
This patch removes the redundant declarations.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5_alloc_shared_dev_ctx() function has a local variable named
"err" which holds the errno value in case of failure.
When functions called by this function fail, this variable is updated
with their return value (which should be a positive errno value).
However, some functions do not update the errno value themselves, or
return a negative errno value. If one of them fails, the "err"
variable contains a negative value, which causes an assertion failure.
This patch updates all functions used by mlx5_alloc_shared_dev_ctx()
to update rte_errno, and takes that value instead of the "err" value.
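A minimal sketch of the convention being enforced (hypothetical
helpers, not the PMD code):

    #include <errno.h>
    #include <stdlib.h>
    #include <rte_errno.h>

    static void *
    callee_alloc(size_t size)
    {
        void *obj = malloc(size);

        if (obj == NULL) {
            rte_errno = ENOMEM; /* always a positive errno value */
            return NULL;
        }
        return obj;
    }

    static int
    caller(void)
    {
        void *obj = callee_alloc(64);

        if (obj == NULL)
            return rte_errno; /* read rte_errno, not a local "err" */
        free(obj);
        return 0;
    }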
Fixes: 5dfa003db5 ("common/mlx5: fix post doorbell barrier")
Fixes: 5d55a494f4 ("net/mlx5: split multi-thread flow handling per OS")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The ASO connection tracking structure is initialized once per shared
device context.
Its release takes place in the close function, which is called for
each ethdev individually, i.e. when there is more than one ethdev
under the same shared device context, the structure is destroyed when
the first of them is closed. If another ethdev wants to use it later,
a crash may occur.
In addition, the creation of this structure is performed in the spawn
function. If the creation of one of the objects following it fails,
the structure is supposed to be destroyed, but this does not happen.
This patch moves its release to the shared device context free
function and thus solves both problems.
Fixes: 0af8a2298a ("net/mlx5: release connection tracking management")
Fixes: ee9e5fad03 ("net/mlx5: initialize connection tracking management")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In "dv_xmeta_en" devarg there is an option of dv_xmeta_en=3 which
engages tunnel offload mode. In E-Switch configuration, that mode
implicitly activates dv_xmeta_en=1.
The update according to E-switch support is done immediately after the
first parsing of the devargs, but there is another adjustment later.
This patch moves the adjustment after the second parsing.
Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5 net driver supports "probe again", which creates a new ethdev
under an existing InfiniBand device context.
Sibling devices sharing an InfiniBand device context should have
compatible configurations, so some of the devargs given in the probe
again - the ones mainly relevant to the shared device context - are
sent to the mlx5_dev_check_sibling_config function, which makes sure
they are compatible with its siblings.
However, the arguments are adjusted according to the capabilities of
the device, and the function compares the probe-again arguments before
adjustment with the sibling arguments after adjustment. A user who
sends the same values to all siblings may thus fail the comparison
when requesting something the device does not support, which was
adjusted.
This patch moves the call to the mlx5_dev_check_sibling_config function
after the relevant adjustments.
Fixes: 92d5dd4834 ("net/mlx5: check sibling device configurations mismatch")
Fixes: 2d241515eb ("net/mlx5: add devarg for extensive metadata support")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Multiple PMDs have dummy/noop Rx/Tx packet burst functions. These
dummy functions are very simple; introduce a common function in ethdev
and update the drivers to use it instead of each driver having its
own.
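Such a shared no-op burst function has this shape (an illustrative
sketch; the exact name of the ethdev helper may differ):

    #include <rte_common.h>
    #include <rte_mbuf.h>

    static uint16_t
    dummy_pkt_burst(void *queue __rte_unused,
                    struct rte_mbuf **pkts __rte_unused,
                    uint16_t nb_pkts __rte_unused)
    {
        return 0; /* no packets received or sent */
    }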
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch removes a redundant assert in mlx5_tx_packet_multi_tso().
That assert ensured that the number of bytes requested to be inlined
was greater than or equal to the minimum number of bytes required
to be inlined. This requirement is either derived from the NIC
inlining mode or configured through devargs. When using TSO this
requirement can be disregarded, because on all NICs it is satisfied by
TSO inlining requirements, since TSO requires L2, L3, and L4 headers to
be inlined.
Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Meter capabilities reporting is not up to date.
Mellanox NICs support RFC2698 and RFC4115 metering as well as RFC2697.
Add these metering algorithms to the capabilities list.
Fixes: 6bc327b94f ("net/mlx5: fill meter capabilities using DevX")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The Committed Bucket Size (CBS) calculation tries to fit into the
8-bit-wide mantissa field by setting 256 as its maximum value.
To compensate for this increase in the mantissa, the exponent has to
be reduced by 8. But this yields a negative exponent for CBS values
below 128, and a negative exponent is not supported by the NIC.
Adjust the CBS calculation only for values bigger than 128 to allow
both small and big bucket sizes.
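A simplified sketch of the adjusted computation (illustrative names,
not the PMD's): frexp() yields cbs = m * 2^e with m in [0.5, 1), so
the mantissa is scaled up by 2^8 and the exponent reduced by 8 only
when the exponent stays non-negative:

    #include <math.h>
    #include <stdint.h>

    static void
    cbs_to_man_exp(uint64_t cbs, uint32_t *man, int *exp)
    {
        int e;
        double m = frexp((double)cbs, &e);

        if (e >= 8) {        /* CBS >= 128: exponent stays >= 0 */
            m = ldexp(m, 8); /* mantissa now in [128, 256) */
            e -= 8;
        }
        *man = (uint32_t)ceil(m); /* range clamping omitted here */
        *exp = e;
    }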
Fixes: 3bd26b23ce ("net/mlx5: support meter profile operations")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Due to the updated modify field action immediate value buffer
pattern [1], the implicit shift for the metadata is no longer needed
and should be removed.
[1] commit 40c8fb1fd3 ("net/mlx5: update modify field action")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
As a modify field action immediate source parameter, the metadata
should follow the CPU endianness (according to the SET_META action
structure format), but the mlx5 PMD wrongly handled the immediate
parameter metadata buffer as big-endian, resulting in a wrong metadata
set action with incorrect endianness.
Fixes: 40c8fb1fd3 ("net/mlx5: update modify field action")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Functions like free, rte_free, and rte_mempool_free already handle a
NULL pointer, so the checks here are not necessary.
Remove the redundant NULL pointer checks before free functions, as
found by nullfree.cocci.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The support for linking rte_pmd_mlx5.h functions with
C++ applications was missing.
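The fix is the conventional linkage guard around the declarations
(sketch):

    /* rte_pmd_mlx5.h */
    #ifdef __cplusplus
    extern "C" {
    #endif

    /* ... function declarations ... */

    #ifdef __cplusplus
    }
    #endif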
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, a root table as a destination is not supported.
A jump action that would ultimately be translated to an underlying
root table in rdma-core should be rejected.
Fixes: f78f747f41 ("net/mlx5: allow jump to group lower than current")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
To optimize the datapath, the mlx5 PMD checked for the mark action on
flow creation and flagged the possible destination Rx queues (through
queue/RSS actions), then enabled the mark action logic only for the
flagged Rx queues.
The mark action did not work if no queue/RSS action was in the same
flow, even when the user used multi-group logic to manage the flows.
So, if the mark action was performed in group X and the packet was
moved to group Y > X before being forwarded to the Rx queues, the SW
did not deliver the mark ID to the mbuf.
Flag the Rx datapath to report the mark action for any queue when the
driver detects the first mark action after the dev_start operation.
Fixes: 8e61555657 ("net/mlx5: fix shared RSS and mark actions combination")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The preparation of the stride size and the number of strides for
Multi-Packet RQ was updated recently to accommodate the hardware
limitation on the minimum WQE size. A wrong assertion was introduced
to ensure this limitation is met. Assert instead that the configured
WQE size is not less than the minimum supported size.
Fixes: 34776af600 ("net/mlx5: fix MPRQ stride devargs adjustment")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The maximum packet headers size for TSO is calculated as a sum of
Ethernet, VLAN, IPv6 and TCP headers (plus inner headers).
The rationale behind choosing IPv6 and TCP is that their headers are
bigger than IPv4 and UDP headers respectively, giving the maximum
possible headers size. But this is not true for L3 headers.
IPv4 header size (20 bytes) is smaller than IPv6 header size
(40 bytes) only in the default case. With IHL > 5, the IPv4 header
carries up to 10 additional 32-bit option words (Options), so the
maximum size of the IPv4 header is 60 bytes.
Choosing the wrong maximum packet headers size causes an inability to
transmit multi-segment TSO packets with IPv4 Options present.
The PMD checks that it is possible to inline all the packet headers,
and this check fails when the packet headers size exceeds the expected
maximum size.
The maximum packet headers size was set to 192 bytes before,
but its value has been reduced during Tx path refactor activity.
Restore the proper maximum packet headers size for TSO.
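For reference, the worst case being summed is: Ethernet 14 bytes plus
4 bytes per VLAN tag, IPv4 with full Options up to 60 bytes (versus
40 bytes for IPv6), and TCP with options up to 60 bytes. A single
header stack can thus take 14 + 4 + 60 + 60 = 138 bytes, and tunneled
packets add the inner headers on top, which the restored 192-byte
maximum accommodates.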
Fixes: 50724e1bba ("net/mlx5: update Tx definitions")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
A debug assertion in Single-Packet Receive Queue (SPRQ) mode required
all Rx mbufs to have a 128-byte headroom, based on the assumption that
rte_pktmbuf_init() sets it. However, rte_pktmbuf_init() may set a
smaller headroom if the dataroom is insufficient, e.g. this is a
natural case for split buffer segments. The headroom can also be
larger.
Only check the headroom size when vectorized Rx routines are used,
because they rely on it. Relax the assertion to require a sufficient
headroom size, not an exact one.
Fixes: a0a45e8af7 ("net/mlx5: configure Rx queue for buffer split")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When building with -Db_sanitize=thread, GCC gives a warning:
drivers/net/mlx5/mlx5_flow_meter.c: In function ‘mlx5_flow_meter_create’:
drivers/net/mlx5/mlx5_flow_meter.c:1170:33: warning: ‘legacy_fm’ may be
used uninitialized in this function [-Wmaybe-uninitialized]
This is a false positive: legacy_fm is initialized and used
if and only if priv->sh->meter_aso_en is false.
Work around this by initializing legacy_fm to NULL.
Add an assertion before legacy_fm use in case the logic changes.
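The workaround has this shape (a simplified sketch with illustrative
types and helpers, not the PMD code):

    #include <assert.h>
    #include <stddef.h>

    struct legacy_meter { int id; };

    static struct legacy_meter a_meter;

    static struct legacy_meter *
    lookup_legacy_meter(void) { return &a_meter; }

    static void
    meter_create(int aso_en)
    {
        struct legacy_meter *legacy_fm = NULL; /* silences warning */

        if (!aso_en)
            legacy_fm = lookup_legacy_meter(); /* only setting path */
        /* ... */
        if (!aso_en) {
            assert(legacy_fm != NULL); /* guards future logic changes */
            legacy_fm->id = 0;
        }
    }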
Fixes: 4443201863 ("net/mlx5: support meter creation with policy")
Cc: stable@dpdk.org
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add support for the imissed counter using the DevX API on Windows.
imissed is queried by creating a queue counter for the port, attaching
it to all created RQs and querying the "out_of_buffer" field.
If the counter cannot be created, imissed will always report 0.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When an application creates several flows to match on a GRE tunnel
without explicitly specifying the GRE protocol type value in the flow
rules, the PMD translates that to a zero mask.
RDMA-CORE cannot distinguish between different inner flow types and
produces identical matchers for each zero mask.
The patch extracts the inner header type from the flow rule and forces
it in the GRE protocol type, if the application did not specify any.
Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>