This commit adds the support for creating Tx hairpin queues.
Hairpin queue is a queue that is created using DevX and only used
by the HW.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Currently all Tx queues are created using Verbs.
This commit modify the naming so it will not include verbs,
since in next commit a new type will be introduce (hairpin)
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds the support for creating Rx hairpin queues.
Hairpin queue is a queue that is created using DevX and only used
by the HW. This results in that all the data part of the RQ is not being
used.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit query and store the hairpin capabilities from the device.
Those capabilities will be used when creating the hairpin queue.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Rx queue for LRO is created using DevX. Flows created on this queue
must use the DV flow engine.
This patch adds check of dv_flow_en=1 when configuring LRO support
on device spawn.
Documentation is updated accordingly.
Fixes: 175f1c21d033 ("net/mlx5: check conditions to enable LRO")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Trying to compile mlx5 pmd in debug mode with icc
will lead to compilation failures due to the fact that
icc doesn't have support for the pragma of pedantic.
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The VXLAN related definitions and structures are moved from
rte_ether.h to a new header file: rte_xvlan.h.
Also introducing a new define macro for VXLAN default port id:
RTE_VXLAN_DEFAULT_PORT
Signed-off-by: Flavia Musatescu <flavia.musatescu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
Added mlx5_rxtx_vec_altivec.h which supports vectorized RX
using Altivec vector code. Modified associated build files
to use the new code.
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Tested-by: Raslan Darawsheh <rasland@mellanox.com>
The DevX counter management triggers an asynchronous event to get back
the new counters values from the HW.
The counter management doesn't trigger 2 parallel events for the same
pool, hence, the pool cannot be updated again in the event waiting time.
When the port is stopped, the DevX event mechanism wrongly was
destroyed what remained all the waiting pools in waiting state forever.
As a result, the counters of the stuck pools were never updated again.
Separate the DevX interrupt installation from the dev installation and
remove the DevX interrupt unregistration\registration from the
stop\start operations.
Now, the DevX interrupt should be installed in probe and uninstalled in
close.
Cc: stable@dpdk.org
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When to create hrxq for the drop, it could fail on creating qp and goto
the error handle which will release created ind_table by calling drop
release function, which takes rte_ethdev as the only parameter and uses
the priv->drop_queue.hrxq as input to release.
Unfortunately, at this point, the hrxq is not allocated and
priv->drop_queue.hrxq is still NULL, which leads to a segfault.
This patch fixes the above by allocating the hrxq at first place and
when the error happens, hrxq is released as the last one.
This patch also release other allocated resources by the correct order,
which is missing previously.
Fixes: 78be885295b8 ("net/mlx5: handle drop queues as regular queues")
Cc: stable@dpdk.org
Reported-by: Zengmo Gao <gaozengmo@jd.com>
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for matching flows on Geneve headers.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit add querying the HCA which FLEX protocols are already
enabled.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
mlx5_link_update immediately returns when called with no-wait parameter
and its call for retrieving the link status returns with EAGAIN error.
This is too harsh on busy systems where a first call fails with EAGAIN
from time to time.
This patch adds a (very limited) retry on such cases in order to allow
retrieving the link status.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch converts some of the casts to unaligned integer types.
The memcpy call is replaced with explicit copying because it
may require type qualifiers in some environments.
Fixes: 18a1c20044c0 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org
Reported-by: Jeremy Plsek <jplsek@iol.unh.edu>
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The transmitter packets counter was not updated
correctly in the loop inside the tx_burst routines.
Fixes: f32a3f5216a3 ("net/mlx5: fix completion queue overflow for large burst")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The receive queues list size is based on the size of uint32_t, so
when allocating the memory, the correct value should be used. Or
else there is risk to corrupt the memory, depending on the queues
number, because there is some pad area for alignment. If the queue
number is not large enough, the issue couldn't be observed.
Fixes: dc9ceff73c99 ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The Item order validation between L2 and L3 is missing, which leading to
the following flow rule is accepted:
testpmd> flow create 0 ingress pattern ipv4 / eth / end actions drop /
end
Only the outer L3 layer should check whether the L2 layer is present,
because the L3 layer could directly follow the tunnel layer
without L2 layer.
Meanwhile inner L2 layer should check whether there is inner L3 layer
before it.
Fixes: 23c1d42c7138 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Now that -Wextra, and other warning flags are standard in the build, remove
any duplication in driver build specifications.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
The routine mlx5dv_query_devx_port() was called directly
instead of using the mlx5 glue thunk.
Fixes: d5c06b1b10ae ("net/mlx5: query vport index match mode and parameters")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
In LAG configuration the devices in the same switch domain
might be spawned on the base of different PCI devices, so
we should check all devices backed by mlx5 PMD whether they
belong to specified switch domain. When the new devices are
being created it is not possible to detect whether the
sibling devices created in the current probe() loop belong
to the driver, driver field is not filled yet (it will be
done on returned success of current probe()). This patch
updates the device scanning, allowing extra match on
current backing PCI device, is being used to create siblings.
Fixes: f7e95215ac7c ("net/mlx5: extend switch domain searching range")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The hardware may have limitations on maximal amount of
supported Tx descriptors building blocks (WQEBB). Application
requires the Tx queue must accept the specified amount of packets.
If inline data feature is engaged the packet may require more WQEBBs
and overall amount of blocks may exceed the hardware capabilities.
Application has to make a trade-off between Tx queue size and maximal
data inline size.
In case if the inline settings are not requested explicitly with
devarg keys the default values are used. This patch adjusts the
applied default values if large Tx queue size is requested and
default inline settings can not be satisfied due to hardware
limitations.
The explicitly requested inline setting may be aligned (enlarging
only) by configurations routines to provide better WQEBB filling,
this implicit alignment is the subject for adjustment either.
The warning message is emitted to the log if adjustment happens.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The devices backed by mlx5 PMD might share the same multiport
Infiniband device context. It regards representors and slaves
of bonding device. These ports are spawned with devargs.
These patch check whether configuration deduced from these
devargs is compatible with configurations if devices
sharing the same context. It prevents the incorrect
whitelists, like:
-w 82:00.0,representor=0,dv_flow_en=1
-w 82:00.0,representor=1,dv_flow_en=0
The representors with indices [0-1] are supposed to spawned
over the same PCi device, but there is dv_flow_en parameter
mismatch.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
With bonding configuration multiple PFs may represent the
single switching device with multiple ports as representors.
To distinguish representors belonging to different PFs we
should generated unique port ID. It is proposed to use
the PF index in bonding configuration to generate this
unique port IDs.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
With bonding configurations the switch domain may be shared
between multiple PCI devices, we should search the switch
sibling devices within the entire set of present ethernet
devices backed by the mlx5 PMD.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
There new kernel/rdma_core [1] supports matching on metadata
register instead of vport field to provide operations over
VF LAG bonding configurations. This patch provides correct
translations for flow matchers and destination port actions
if united E-Switch (for VF LAG) is configured and/or new vport
matching mode is engaged.
[1] http://patchwork.ozlabs.org/cover/1122170/
"Mellanox, mlx5 vport metadata matching"
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The routine mlx5_port_to_eswitch_info() is elaborated
to two ones (get E-Switch port parameters by port and
by device pointer) and simplified to returning structure
containing all parameters instead of copying.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
There new kernel/rdma_core [1] supports matching on metadata
register instead of vport field to provide operations over
VF LAG bonding configurations. The patch retrieves parameters
and information about the way is engaged to match vport on E-Switch.
[1] http://patchwork.ozlabs.org/cover/1122170/
"Mellanox, mlx5 vport metadata matching"
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
If bonding Infiniband device is found the unified E-Switch
is supposed and the extra rdma-core/kernel support is needed
to retrieve vport indices. The patch introduces this feature
defines, bonding support check is added to probe routine.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
If device is VF LAG bonding one the port name includes
the bonding Infiniband device name and looks like:
82:00.0_mlx5_bond_0 - for master device port PF0
82:00.1_mlx5_bond_0_representor_5 - for representor
VF5 over PF1
where bonding Infiniband device mlx5_bond_0 controls
the 82:00.0 as PF0 and 82:00.1 as PF1 PCI functions.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The Mellanox NICs starting from ConnectX-5 support LAG over
NIC ports internally, implemented by the NIC firmware and hardware.
The multiport NIC presents multiple physical PCI functions (PF),
with SR-IOV multiple virtual PCI functions (VFs) might be presented.
With switchdev mode the VF representors are engaged and PFs and their
VFs are connected by internal E-Switch feature. Each PF and related VFs
have dedicated E-Switch and belong to dedicated switch domain.
If NIC ports are combined to support NIC the kernel drivers introduce
the single unified Infiniband multiport devices, and all only one
unified E-Switch with single switch domain combines master PF
all all VFs. No extra DPDK bonding device is needed.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
At device probing the device list to spawn was allocated
as dynamic size local variable. It was no possible to have
one unified exit point from routine due to compiler warnings.
This patch allocates the spawn device list directly with
rte_zmalloc() and it is possible to goto to unified exit
label from anywhere of the routine.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The routine mlx5_ibv_device_to_pci_addr() takes Infiniband
device list object, takes the device sysfs path from there
and retrieves PCI address. The routine may be implemented
in more generic way by taking sysfs path directly as parameter
and can be used for getting PCI address of netdevs.
The generic routine is renamed to mlx5_dev_to_pci_addr()
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Now all devices created over the same multiport IB device
have shared context containing the backing PCI device field.
For the VF LAG configurations it becomes possible the
representors might be connected to VF created over different
PFs. In this case representors have the different backing
PCI devices and mentioned field should be moved to device
private area.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The PCI virtual function type was not recognized correctly
for ConnectX-6 VF.
Fixes: f0354d842344 ("net/mlx5: add ConnectX-6 device IDs")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The PCI virtual function type was not recognized correctly
for BlueField VF.
Fixes: f38c54571d62 ("net/mlx5: split PCI from generic probing")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
In the process of recovery from error CQE, when using vectorized Rx
burst, the initialization of data length in mbufs was not done.
As a result the wrong length was left written in mbuf, causing
memory overwrite or wrong error report.
This patch fixes the initialization of mbuf data length during
recovery from error CQE, when using vectorized Rx burst,
Fixes: 88c0733535d6 ("net/mlx5: extend Rx completion with error handling")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The txq_uar_init() routine uses the uninitialized uar_mmap_offset
field in 32-bit configurations due to this field is initialized
after txq_uar_init() call.
Fixes: 120dc4a7dcd3 ("net/mlx5: remove device register remap")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Enabling/disabling of allmulticast mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Change return value of the callbacks from void to int. Make
implementations across all drivers return negative errno
values in case of error conditions.
Both callbacks are updated together because a large number of
drivers assign the same function to both callbacks.
Signed-off-by: Igor Romanov <igor.romanov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Enabling/disabling of promiscuous mode is not always successful and
it should be taken into account to be able to handle it properly.
When correct return status is unclear from driver code, -EAGAIN is used.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Change eth_dev_infos_get_t return value from void to int.
Make eth_dev_infos_get_t implementations across all drivers to return
negative errno values if case of error conditions.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The Rx completion queue doorbell field needs to be updated after
the last CQE decompressed. For the weaker memory model processors,
the compiler barrier is not sufficient to guarantee the order of
these operations, so use the coherent I/O memory barrier to make
sure these fields are updated in order.
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org
Suggested-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Matan Azrad <matan@mellanox.com>
E-switch tables one and above provide higher insertion rate
than table zero, as well as enhanced functionality.
This patch adds a mechanism to utilize these advantages, by creating
a default rule on port start, which directs all packets from e-switch
table zero to table one.
Other flow rules, requested for group n, will be created in
e-switch table n+1.
Jump action to e-switch group n will be created to group n+1.
Utility function mlx5_flow_group_to_table() is added to translate the
rte_flow group value to HW table value, and is called by PMD flow
engine on flow rule validation and creation.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The mlx5 PMD uses Netlink socket to communicate with Infiniband
devices kernel drivers to perform some control and setup operations.
The kernel drivers send the information back to the user mode
with Netlink messages which are processed in libnl callback routine.
This routine perform reply message (or set of messages) processing
and returned the processing result in ibindex field of provided
context structure (of mlx5_nl_ifindex_data type). The zero ibindex
value meant an error of reply message processing. It was found in
some configurations the zero is valid value for ibindex and error
was wrongly raised. To avoid this the new flags field is provided
in context structure, attribute processing flags are introduced
and these flags are used to decide whether no error occurred and
valid queried values are returned.
Fixes: e505508a3858 ("net/mlx5: modify get ifindex routine for multiport IB")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch implements ethdev operations get_module_info and
get_module_eeprom, to support ethtool commands ETHTOOL_GMODULEINFO
and ETHTOOL_GMODULEEEPROM.
New functions mlx5_get_module_info() and mlx5_get_module_eeprom()
added in mlx5_ethdev.c.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for modifying the VID of the outermost VLAN
header already present in the packet.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for modifying the VLAN ID (VID) field
in an about-to-be-pushed VLAN header.
This feature can only modify the VID field of a new VLAN header yet
to be pushed. It does not support modifying an existing or already
pushed VLAN headers.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for modifying the VLAN priority (PCP) field
in about-to-be-pushed VLAN header.
This feature can only modify the PCP field of a new VLAN header yet
to be pushed. It does not support modifying an existing or already
pushed VLAN headers.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN using
direct verbs flow rules.
If present in the flow, The VLAN default values are taken from the
VLAN item configuration.
In this commit only the VLAN TPID value can be set since VLAN
modification actions are not supported yet.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>