The sampling feature introduces scaling of the flow group with a factor:
the scaled table value can then be used for the normal path table,
since that table is created implicitly.
But if the input group value is already scaled, for example the
group value of the sampling suffix flow, the 'skip_scale' flag is
used to avoid scaling it twice in the translation action.
Consider a flow with a jump action, where the jump action may be
created implicitly: the PMD may need to scale only the original flow
group value, only the jump group value, or both. So extend the
'skip_scale' flag to two bits:
If bit 0 of the 'skip_scale' flag is set to 1, skip scaling the
original flow group;
If bit 1 of the 'skip_scale' flag is set to 1, skip scaling the
jump flow group.
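An illustrative sketch of the two-bit layout (the macro names here are
hypothetical, not taken from the mlx5 sources):
#define SKIP_SCALE_FLOW_GROUP (1u << 0) /* bit 0: keep the original flow group unscaled */
#define SKIP_SCALE_JUMP_GROUP (1u << 1) /* bit 1: keep the jump target group unscaled */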
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
mlx5 E-Switch mirroring is implemented as a multiple-destination array in
one steering table. The array currently supports only port ID as a
destination action.
This patch adds jump action support to the array as one of the
destinations.
The packets can be mirrored to the port and jump to the next table in
the same destination array, allowing handling to continue in the new
table.
For example:
set sample_actions 0 port_id id 1 / end
flow create 0 ingress transfer pattern eth / end actions
sample ratio 1 index 0 / jump group 1 / end
flow create 1 ingress transfer group 1 pattern eth / end actions
set_mac_dst mac_addr 00:aa:bb:cc:dd:ee / port_id id 2 / end
As a result, all the matched ingress packets are mirrored
to port ID 1 and go to group 1. In group 1, packets are modified
with the destination MAC and sent to port ID 2.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The rte_ethdev_driver.h, rte_ethdev_vdev.h and rte_ethdev_pci.h files are
for drivers only and should be private to DPDK and not installed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Steven Webster <steven.webster@windriver.com>
The PMD validates the RSS action in the sample sub-actions list,
then translates it into an rdma-core action that will be used as the
sample path destination.
If the RSS action is in both the sample sub-actions list and the
original flow, the RSS level and RSS types in the sample sub-actions
list must be consistent with those in the original flow list, since the
expanding items for RSS should be the same for both actions.
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The RSS action is valid only in the NIC-Rx domain. This fix bypasses
the function that gets the RSS action from the flow action list
when the domain is not NIC-Rx.
Fixes: e745f90007 ("net/mlx5: optimize flow RSS struct")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The GENEVE TLV option matching flows must be created
using a translation function.
This function checks whether a DevX object
was already created for the matching, and either creates
the object or updates its reference counter.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds a validation routine for the GENEVE
header TLV option.
The GENEVE TLV option match must include all fields
with full masks, since the NIC does not support masking
on the option class, type and length.
Due to hardware limitations, the option data length must
be non-zero and the provided data pattern must not be
all zeros either.
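For illustration only, a testpmd-style rule matching a GENEVE option
with full masks might look as follows (the exact testpmd token names
are an assumption here, not verified against the testpmd sources):
flow create 0 ingress pattern eth / ipv4 / udp / geneve /
    geneve-opt class is 0x22 type is 1 length is 1 data is 0x12345678 /
    end actions queue index 1 / end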
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, firmware supports only one TLV object per device
to match on the GENEVE header option.
This patch adds the simple TLV object management to the mlx5 PMD.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds the translation function which
sets the QFI and PDU type.
The next extension header field, which indicates the following
extension header type, is set to 0x85 - a PDU session
container.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
In this patch we add a validation routine for
the GTP PSC extension header.
The GTP PSC extension header must follow the
GTP item.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Use a common function for DevX SQ creation for the rearm and clock queues.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Use a common function for CQ creation for the rearm and clock queues.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In ASO SQ creation, the PMD allocates a umem buffer for the SQ.
When the umem buffer allocation failed, the MR and CQ memory were not
freed, which caused a memory leak.
Free them.
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The data-path code does not take 'rxq_cqe_pad_en' into account and uses
a padded CQE in any case when the system cache-line size is 128B.
This makes the argument redundant.
Remove it.
Fixes: bc91e8db12 ("net/mlx5: add 128B padding of Rx completion entry")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
According to the current data-path implementation in the PMD, the CQE
size must follow the cache-line size.
So, the configuration of the CQE size should depend on
RTE_CACHE_LINE_SIZE.
Wrongly, some of the CQ creations did not follow this rule exactly,
which caused an incompatibility between HW and SW in the data-path when
working on systems with a 128B cache-line size.
Adjust the rule for all CQ creations.
Remove the cqe_size attribute from the DevX CQ creation command and set
it inside the command translation according to the cache-line size.
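A minimal sketch of the intended selection inside the command
translation (assuming the MLX5_CQE_SIZE_* values; a sketch, not the
actual driver code):
/* Pick the CQE size according to the cache-line size. */
cq_attr.cqe_size = (RTE_CACHE_LINE_SIZE == 128) ?
                   MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;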
Fixes: 79a7e409a2 ("common/mlx5: prepare support of packet pacing")
Fixes: 5cd0a83f41 ("common/mlx5: support more fields in DevX CQ create")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Previously, hairpin queues were identified using the
mlx5_rxq_get_type() function.
A recent patch replaced it with mlx5_rxq_get_hairpin_conf(),
checking that the returned conf is not NULL.
The case of a NULL return value (the queue is not hairpin) was not
handled. As a result, non-hairpin flows were handled incorrectly.
This patch adds the required check for a NULL return value.
Fixes: 509f8470de ("net/mlx5: do not split hairpin flow in explicit mode")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The multi-threaded flows feature uses the pthread function
pthread_key_create, but on Windows the destructor option of that
function is unimplemented.
To resolve it, Windows implements a destruction mechanism to clean up
the mlx5_flow_workspace object of each terminated thread.
The Linux flow keeps the current behavior.
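A minimal sketch of the Linux-side pattern being emulated (the
workspace type and cleanup body are illustrative only):
static pthread_key_t flow_ws_key;

static void
flow_ws_destructor(void *ws)
{
	/* Invoked for each terminated thread with its workspace. */
	free(ws); /* release the thread's flow workspace */
}

static int
flow_ws_init(void)
{
	return pthread_key_create(&flow_ws_key, flow_ws_destructor);
}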
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Khoa To <khot@microsoft.com>
Wrap glue calls dr_create_flow_action_sampler() and
dr_create_flow_action_dest_array() as OS-specific functions.
This is a follow up on
commit b293fbf967 ("net/mlx5: add OS specific flow actions operations")
On Windows, the sampling actions wrappers currently return ENOTSUP.
Under the configuration definitions HAVE_MLX5_DR_CREATE_ACTION_FLOW_SAMPLE
and HAVE_MLX5_DR_CREATE_ACTION_DEST_ARRAY, the missing sampling DV structs
are added as stubs to the windows/mlx5_glue.h file.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In Tx queue creation, there are two validations for the Tx
configuration.
When one of them failed, the MR btree memory was not freed, which caused
a memory leak.
Free it.
Fixes: f6d9ab4e76 ("net/mlx5: check Tx queue size overflow")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In Rx queue creation, there are some validations for the Rx
configuration.
When one of them failed, the MR btree memory was not freed, which caused
a memory leak.
Free it.
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The vxlan_decap action performs decapsulation of the VXLAN tunnel.
Currently we can create a flow with vxlan_decap without
matching on the VXLAN header.
To solve this issue, this patch adds validation verifying
that a VXLAN item was detected when the vxlan_decap
action is specified.
Fixes: 49d6465af3 ("net/mlx5: add VXLAN decap action to Direct Verbs")
Cc: stable@dpdk.org
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
In order to allow mbuf mark ID update in Rx data-path, there is a
mechanism in the PMD to enable it according to the rte_flows.
When a flow with mark ID and RSS/QUEUE action exists, all the relevant
Rx queues will be enabled to report the mark ID.
When a shared RSS action is combined with a mark action, the PMD
mechanism misses the Rx queue updates.
This commit handles the shared RSS case in the mechanism too.
Fixes: e1592b6c4d ("net/mlx5: make Rx queue thread safe")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The clang compiler warns on sign mismatches in several
comparisons:
warning: comparison of integers of different signs
To resolve those, the right types are used or casts are applied.
Fixes: 3e8edd0ef8 ("net/mlx5: update metadata register ID query")
Fixes: e554b672aa ("net/mlx5: support flow tag")
Fixes: c8f0abe7f8 ("net/mlx5: fix meter color register consideration")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
IPv6 broadcast flow creation is unsupported in Windows.
Do not fail on IPv6 broadcast flow creation in this case,
to avoid failure of the entire default rules creation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Use OS functions for flow_dv_sync_domain so that it compiles under
Windows.
mlx5_os_flow_dr_sync_domain is unsupported on Windows.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mutex mlx5_dev_ctx_list_mutex was initialized with the
PTHREAD_MUTEX_INITIALIZER global macro; however, this macro is
not supported by the Windows OS shim implementation of pthreads
in DPDK.
Move the initialization of this mutex to RTE_INIT to support this mutex
on both OSs.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Modify the ASO feature to use OS-independent code
so as not to break the Windows build.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The Windows DevX interface name is the same as the device name, but
with a different size than IF_NAMESIZE. To support it, MLX5_NAMESIZE
is defined with the IF_NAMESIZE value for Linux and the MLX5_FS_NAME_MAX
value for Windows.
Fixes: e9c0b96e35 ("net/mlx5: move Linux ifname function")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
There are three types of eth_dev_ops: primary, secondary and isolate,
represented in three callback tables per OS. In this commit the OS
specific eth dev tables are unified into shared tables in file mlx5.c.
Starting from this commit, all operating systems must implement the same
eth dev APIs. In case an OS does not support an API, its implementation
can return the error ENOTSUP.
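For example, a stub for an unsupported operation could take this shape
(an illustrative sketch, not the actual driver code):
static int
mlx5_unsupported_op(struct rte_eth_dev *dev __rte_unused)
{
	return -ENOTSUP; /* listed in the shared table, not supported on this OS */
}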
Fixes: 042f5c94fd ("net/mlx5: refactor device operations for Linux")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Use macro HAVE_INFINIBAND_VERBS_H to successfully compile files both
under Linux and Windows (or any non Linux in general). Under Windows
this macro:
1. Hides Verbs references.
2. Exposes required DV structs that are under ifdefs related to rdma
core.
Linux code under definitions such as #ifdef HAVE_IBV_FLOW_DV_SUPPORT is
required unconditionally under Windows; however, those definitions are
never effective without rdma-core presence. Therefore, update the #ifdef
condition to also consider HAVE_INFINIBAND_VERBS_H (a macro which is
undefined when running without an rdma-core library).
For example:
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements mlx5_flow_os_create_flow() API. It is equivalent
to Linux rdma-core implementation. The API receives the matcher mask,
matcher value and an array of actions. They are copied into a PRM-like
struct devx_fs_rule_add_in. Then glue API devx_fs_rule_add() is called.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements mlx5_flow_os_create_flow_action_dest_devx_tir()
API as the Linux rdma-core equivalent. Missing rdma-core parameters are
added to file mlx5_win_defs.h. The action TIR id and type
(MLX5_FLOW_CONTEXT_DEST_TYPE_TIR) are saved in the action struct. The
action struct will be added to array of actions and will be used later
by the flow creation API.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements the mlx5_flow_os_create_flow_matcher() API. It is
the Linux rdma-core equivalent implementation. Missing rdma-core
parameters (e.g. struct mlx5dv_flow_match_parameters) are added to file
mlx5_win_defs.h. The API allocates space to hold the PRM bits in PRM
fte_match_param format and copies the DV-translated PRM bits into the
matcher struct. This matcher struct will be used later by the flow
creation API.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch adds the initial flow framework under Windows OS. It supports
a subset of filters (ETH, IPV4, UDP) and a QUEUE action. It is based on
DevX mechanism to send commands to the NIC through the kernel. It does
not support steering rules (i.e. writing directly to the NIC memory).
The Windows framework uses the existing DV framework where file
mlx5_flow_dv.c remains intact.
Steps involved in flow creation:
1. Create a domain (RX, TX, FDB). Since domains are created by steering
rules and not with DevX, Windows does not require a domain object (this
means switch dev mode which requires an FDB domain is not supported).
2. Create a table object. Windows only supports table 0. The call to
mlx5_flow_os_create_flow_tbl() silently returns successfully.
3. Create a matcher object. A matcher struct is created by calling
mlx5_flow_os_create_flow_matcher(). The matcher validation and
translation are part of the DV implementation. The matcher bits that
were created by DV in standard PRM format are copied into the matcher
struct.
4. Create an action object. The call to
mlx5_flow_os_create_flow_action_dest_devx_tir() creates an action struct
with the TIR type and id. This struct will be a parameter later in a
call to flow creation. All other action calls (e.g. packet reformat,
header modification, jump to flow table, etc.) return a not-supported
error.
5. Create the flow. The call to mlx5_flow_os_create_flow() receives the
matcher struct and action structs, copies them into a Windows-specific
fs_rule struct, then calls glue API devx_fs_rule_add().
Details on additional APIs:
* mlx5_flow_os_get_type() is called during flow type selection. In
Windows it always returns MLX5_FLOW_TYPE_DV.
* mlx5_flow_os_item_supported() is called before starting DV items
validation or translation. It filters out in advance the items not
supported by the OS.
* mlx5_flow_os_action_supported() is called before starting DV actions
validation or translation. It filters out in advance the actions not
supported by the OS.
* mlx5_flow_adjust_priority() is an OS stub for flow priority
adjustment. Windows only supports flow priority 0.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap glue call dr_create_flow_action_default_miss() with an OS API. This
commit is a follow up on [1].
[1]
commit d4d85aa6f1 ("common/mlx5: add default miss action")
commit b293fbf967 ("net/mlx5: add OS specific flow actions operations")
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
mlx5_flow_adjust_priority() is used to adjust priorities according to
priorities levels. It is Verbs based and it is called from shared code
(mlx5_flow_dv.c). Therefore, wrap it in an OS API.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Support VF BDF scanning by checking both the BDF and raw BDF provided by
DevX. In Linux a PCI address is formatted as: domain, bus, device,
function (DBDF). This is right for both a PF and a VF. In Windows a PF
also has a DBDF format, but the domain is always 0, while a VF has a
special "domain" called "Virtual PCI Bus, Serial" (for example: "Virtual
PCI Bus Slot 2 Serial 2") or segment. The full VF format under Windows
is called raw BDF. The Windows special domain must be considered and
DevX must be called to support it.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements the mlx5_dev_spawn() API, which allocates an eth
device (struct rte_eth_dev) for each PCI device. When working with
representors of virtual functions (as in Linux), one PCI device may
spawn several eth devices: the master device for the main physical
function (PF) and several representors for the virtual functions (VFs).
However, Windows currently does not work in switch dev mode; therefore,
no VFs are created and no representors are spawned. In this case one eth
device is created per PCI main port. In addition to device creation, the
device configuration must be correctly set. The device arguments
(devargs - set by the user) are parsed, but they may be overridden by
Windows limitations or hardware configurations. Some associated network
parameters are stored in the eth device (e.g. ifindex, MAC address, MTU)
and some callbacks (e.g. burst functions) are set.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements the mlx5_os_pci_probe API under Windows. It does
all required initializations, then gets the PCI device list using the
glue API get_device_list(). Next, all devices that do not match MLX5 are
filtered out.
The supported NIC types are: CONNECTX4VF, CONNECTX4LXVF, CONNECTX5VF,
CONNECTX5EXVF, CONNECTX5BFVF, CONNECTX6VF, MELLANOX_CONNECTX6DXVF. Each
device in the list is assigned default configuration parameters, most of
them 0. The default dv_flow_en parameter value is 1 (which means Windows
match and action flows are based on DV code). Next, mlx5_dev_spawn() is
called for each PCI device to create an eth device (struct rte_eth_dev).
The implementation of device spawn is in the follow-up commit. Finally,
the device list is freed.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements mlx5_os_open_device() API. It calls glue API
open_device() then glue API query_device() to fill in 'struct
mlx5_context' with data for later usage.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit adds stubs for the VLAN VM operations. It is the Windows
equivalent implementation of [1]. The Linux implementation was based on
Netlink APIs which are not supported in Windows.
[1]
commit 7af10d29a4 ("net/mlx5/linux: refactor VLAN")
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements mlx5_is_removed() API. A new glue call
'init_shutdown_event' is added to support the new API.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit copies the interface name as saved in the device context
since its creation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements API mlx5_get_mtu(). It returns the MTU size as
saved in the device context since its creation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit adds a new glue function query_rt_values to support the new
API mlx5_read_clock().
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add support for mlx5_link_update() to get link speed and link state.
Other parameters are currently hard-coded.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit adds ethdev stubs. These APIs are called from shared code
that must compile under Linux and Windows. The following stubs are added:
mlx5_set_mtu
mlx5_os_read_dev_counters
mlx5_intr_callback_unregister
mlx5_os_get_stats_n
mlx5_os_stats_init
mlx5_set_link_down
mlx5_set_link_up
mlx5_dev_get_flow_ctrl
mlx5_dev_set_flow_ctrl
mlx5_get_module_info
mlx5_get_module_eeprom
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit implements API mlx5_get_mac(). It returns the MAC address
saved in the device context since its creation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Windows supports the primary process with no secondary process control.
This commit adds stubs for requests to start/stop the data-path to the
secondary process and for requests to start/stop a queue of the primary
process.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit is the Windows part implementation of
commit d5ed8aa944 ("net/mlx5: add memory region callbacks in per-device cache")
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
ibdev_name and ibdev_path sizes are defined in Windows DevX
differently from the sizes used in Linux with
IBV_SYSFS_NAME_MAX and IBV_SYSFS_PATH_MAX.
Added MLX5_FS_NAME_MAX and MLX5_FS_NAME_PATH in mlx5_os.h for both OSs.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
There are two types of eth_dev_ops used under Windows: primary and
isolate mode. Their function call initialization is added to the OS
specific file mlx5_os.c. The secondary process eth_dev_ops are nullified.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Get the list of MAC addresses and verify whether the input mac parameter
already exists. If not, return -ENOTSUP (as Windows does not support
adding new MAC addresses). If the MAC address exists (EEXIST), return 0
(the equivalent of the Linux implementation of this API).
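A sketch of that logic (the list variables and their layout are
illustrative only):
/* Windows cannot program a new MAC address: adding an existing
 * one succeeds, anything else is rejected. */
for (i = 0; i != mac_list_len; ++i)
	if (rte_is_same_ether_addr(&mac_list[i], mac))
		return 0;	/* behaves like the Linux EEXIST case */
return -ENOTSUP;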
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit is the Windows implementation of mlx5_os_get_dev_attr() API.
It follows the commit in [1]. A new file named mlx5_os.c is added under
windows directory as its Linux counterpart file: linux/mlx5_os.c.
[1].
commit e85f623e13 ("net/mlx5: remove attributes dependency on Verbs")
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Currently MR operations are Verbs based. This commit updates MR
operations prototypes such that DevX MR operations callbacks can be used
as well. Rename 'struct mlx5_verbs_ops' as 'struct mlx5_mr_ops' and
move it to shared file mlx5.h.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap the API to create/destroy event channel and to subscribe an event
with OS calls. In Linux those calls are implemented by glue functions
while in Windows they are not supported.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Glue function destroy_flow_action() was wrapped by the OS-specific
operation mlx5_flow_os_destroy_flow_action(), but the wrapping was
skipped in file mlx5.c.
Fixes: b293fbf967 ("net/mlx5: add OS specific flow actions operations")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap glue calls for UMEM registration and deregistration with generic OS
calls, since each OS (Linux or Windows) has different glue API
parameters.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Wrap glue calls alloc_pd() and dealloc_pd() with generic OS calls. In
Linux, protection domain allocation is implemented by the Verbs glue
API, while in Windows it is done by the DevX API.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Some Windows compilers consider static_assert() a call to another
function rather than a compiler directive which allows checking type
information at compile time. This only occurs if the static_assert call
appears inside a function scope. To solve it, move the static_assert
calls to the global scope in the files where they are used.
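For example (the asserted condition here is just a placeholder):
/* At file scope - accepted by the affected Windows compilers. */
static_assert(sizeof(uint64_t) == 8, "unexpected uint64_t size");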
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In Linux 'static_assert' is defined in file mlx5_defs.h:
#ifndef HAVE_STATIC_ASSERT
#define static_assert _Static_assert
#endif
The same definition can originate from Linux file /usr/include/assert.h.
In Windows static_assert is used while _Static_assert is unknown.
Therefore update the definition condition to be:
#if !defined(HAVE_STATIC_ASSERT) && !defined(RTE_EXEC_ENV_WINDOWS)
#define static_assert _Static_assert
#endif
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Functions mlx5_check_mprq_support(), mlx5_rxq_mprq_enabled(),
mlx5_mprq_enabled() are moved from source file mlx5_rxq.c to header file
mlx5_rxtx.h and their type is updated to 'static __rte_always_inline'.
Previously the functions were defined as 'inline' in the source file,
which was reported as an 'unresolved external symbol' error by some
Windows linkers.
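Schematically, the move follows this pattern (signatures abbreviated,
bodies elided):
-/* mlx5_rxq.c */
-inline int
-mlx5_mprq_enabled(struct rte_eth_dev *dev) { ... }
+/* mlx5_rxtx.h */
+static __rte_always_inline int
+mlx5_mprq_enabled(struct rte_eth_dev *dev) { ... }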
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Replace Linux API usleep() and nanosleep() with rte_delay_us_sleep().
The replacement occurs in shared files compiled under different
operating systems.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Packet pacing is allocated under the condition #ifdef HAVE_MLX5DV_PP_ALLOC.
In a similar way, free the packet pacing index under the same condition.
This update is required to successfully compile under operating systems
which do not support packet pacing.
Fixes: aef1e20ebe ("net/mlx5: allocate packet pacing context")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This commit removes Linux files flow_verbs.c and mlx5_rxtx_vec.c
from Windows compilation.
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Before this commit the PMD used:
const int elt_n = 8
const int *stack[elt_n];
On Windows the clang compiler complains:
net/mlx5/mlx5_flow.c:215:19: error: variable length array folded
to constant array as an extension [-Werror,-Wgnu-folding-constant]
Fix it by using a constant macro definition instead of a variable:
#define MLX5_RSS_EXP_ELT_N 8
const int *stack[MLX5_RSS_EXP_ELT_N];
Fixes: c7870bfe09 ("ethdev: move RSS expansion code to mlx5 driver")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The buffer split Rx offload is not compatible with Multi-Packet
Receiving Queue (MPRQ) Rx offload, hence, the buffer split
offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT and other related
values should be advertised only if there is no MPRQ engaged.
Fixes: 6c8f7f1c18 ("net/mlx5: report Rx buffer split capabilities")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
A wrong index is used to find the mbufs belonging to the application in
the rxq_free_elts_sprq() function in the case of vectorized MPRQ.
elts_ci points to the last allocated mbuf in this case, not rq_ci.
Use this field to avoid an mbuf double free and a segmentation fault.
Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Initialize flow descriptor tunnel member during flow creation.
Prevent access to stale data and pointers when flow descriptor is
reallocated after release.
Fix flow index validation.
Fixes: e7bfa3596a ("net/mlx5: separate the flow handle resource")
Fixes: 8bb81f2649 ("net/mlx5: use thread specific flow workspace")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, when creating the index pool, if the trunk size is not
configured, the index pool default trunk size is 4096.
The maximum number of tunnel offloads supported now is 256
(MLX5_MAX_TUNNELS), so creating the index pool with trunk size 4096
wastes memory.
This commit changes the tunnel offload index pool trunk size to
MLX5_MAX_TUNNELS to save memory.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Reviewed-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Currently, the hash list saves the hash key in the hash entry, and the
key is mostly used only to get the bucket index.
Saving the entire 64-bit key in the entry is not a good option if the
key is only used to get the bucket index: 64 bits cost more memory for
the entry, while the signature data in the key mostly uses only 32 bits.
Moreover, in the unregister function, the key in the entry causes an
extra bucket index calculation.
This commit saves the bucket index in the entry instead of the hash key.
For hash lists like table, tag and mreg_copy, which save signature data
in the key, the signature data is moved to the resource data struct
itself.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Since every hash table operation is related to one dedicated bucket,
the hash table lock and gen_cnt can be allocated per bucket.
Currently, the hash table uses one global lock to protect all the
buckets. That global lock prevents the buckets from being operated on
concurrently, which hurts hash table performance. And a gen_cnt updated
for the entire hash table causes incorrect redundant list searches.
This commit makes the lock and gen_cnt per bucket, allowing entries in
different buckets to be operated on more efficiently.
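A sketch of the per-bucket layout (type and field names are
illustrative, not the actual mlx5 definitions):
struct hlist_entry;		/* forward declaration */
struct hlist_bucket {
	struct hlist_entry *first;	/* entries hashed to this bucket */
	rte_rwlock_t lock;		/* protects only this bucket */
	uint32_t gen_cnt;		/* bumped only on this bucket's changes */
};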
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A previous patch added support for the shared age action.
This feature is supported on group 1 and higher, and validation was
added accordingly.
On the FDB table, group 0 is skipped to improve performance.
As a result, the mentioned validation is not relevant for transfer rules.
This patch adds the required check to ensure proper validation.
Fixes: f9bc5274a6 ("net/mlx5: allow age modes combination")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The rdma-core library uses callbacks to allocate and free memory
from DPDK. The memory allocation callback used a complicated
and incorrect way to get the NUMA socket ID from the context.
The context was wrong, which might have resulted in a wrong socket ID
and in allocating memory from the wrong node.
The callbacks are now assigned once, as the Infiniband device context
is created, allowing early access to shared DPDK memory for all
Verbs internal objects that need it.
Fixes: 36dabcea78 ("net/mlx5: use anonymous Direct Verbs allocator argument")
Fixes: 2eb4d0107a ("net/mlx5: refactor PCI probing on Linux")
Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In the function rte_flow_shared_action_destroy(),
the errno ETOOMANYREFS has been replaced with EBUSY in the
commit dc328d1c55 ("ethdev: rename a flow shared action error code").
Another occurrence of ETOOMANYREFS, added later by mistake,
is replaced with EBUSY errno.
Fixes: fa7ad49e96 ("net/mlx5: fix shared RSS action update")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
The PMD did not remove a tunnel offload object from the tunnels database
before it released the object memory. As a result, the tunnels database
became corrupted and subsequent search operations triggered a PMD crash.
The patch removes a tunnel offload object from the tunnels database when
the object is no longer in use by the PMD.
Fixes: bc1d90a3cf ("net/mlx5: fix build with Direct Verbs disabled")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
In the mlx5 internal hash list tool, there is a log print when an entry
allocation fails: Can't allocate hash list entry.
Some initialization checks trigger hash list registration in order to
check some capabilities. Here, a failure in registration doesn't
lead to a failure of the initialization flow, which is why the log level
can be lower.
Move the entry allocation failure log to debug level.
Signed-off-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
An invalid memory release order of DevX resources caused a PMD crash.
1. SQ and CQ memory must be unregistered with DevX before it is freed.
2. SQ objects reference CQ objects. Hence, an SQ should be destroyed in
advance of the CQ it references.
Fixes: 6deb19e1b2 ("net/mlx5: separate Rx queue object creations")
Fixes: 88f2e3f18c ("net/mlx5: rearrange SQ and CQ creation in DevX module")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A representor is a port in DPDK that is connected to a VF in such a way
that, assuming there are no offload flows, each packet that is sent
from the VF will be received by the corresponding representor, while
each packet that is sent to a representor will be received by the VF.
This is very useful in SR-IOV mode, where the first packet that
is sent by the VF will be received by the DPDK application, which will
decide if this flow should be offloaded to the E-Switch.
A representor shares the interrupt handler with the host PF over the
PCI address. Therefore, after the PF completes its interrupt handler
initialization, no additional actions are required for the representor.
Fixes: 26c08b979d ("net/mlx5: add port representor awareness")
Cc: stable@dpdk.org
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The shared RSS action update was not operational due to lack
of kernel driver support for TIR object modification.
This commit introduces a workaround to support shared RSS
action modification using an indirect queue table update instead of
touching the TIR object directly.
Limitations: the only RSS property supported for update is queues; the
rest of the properties are ignored.
Fixes: d2046c09aa ("net/mlx5: support shared action for RSS")
Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
If the user does not set dv_xmeta_en to 1 or 2,
in the flow_dv_convert_action_set_meta function:
- flow_dv_get_metadata_reg may return REG_NONE
when MLX5_METADATA_FDB is enabled for the metadata set action.
- reg_to_field(REG_NONE) returns MLX5_MODI_OUT_NONE,
which is invalid, and rdma-core fails.
The rdma-core call trace:
dr_action_create_modify_action
dr_actions_convert_modify_header
dr_action_modify_sw_to_hw
dr_action_modify_sw_to_hw_set
dr_ste_get_modify_hdr_hw_field
Fixes: fcc8d2f716 ("net/mlx5: extend flow metadata support")
Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Changing the allocation scheme to improve mbuf locality caused mbuf
overruns in some cases. Revert the previous replenish logic.
Calculate the number of unused mbufs and replenish at most this number
of mbufs. Mark the last 4 mbufs as fake mbufs to prevent overflowing
into consumed mbufs in the future. Keep the consumed index and the
produced index 4 mbufs apart for this purpose.
Replenish some mbufs only in case the consumed index is within the
replenish threshold of the produced index, in order to retain the cache
locality for the vectorized MPRQ routine.
Fixes: 5c68764377 ("net/mlx5: improve vectorized MPRQ descriptors locality")
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5 PMD refuses to update the link state if the link speed is
defined but the status is down, or if the link speed is undefined but
the status is up, even if the ioctl() succeeded.
This prevents applications from detecting link up/down events,
especially when the link speed is not correctly detected, which could
happen due to an old kernel driver bug.
Commit [1] allowed returning an unknown link speed, so now the PMD
allows returning an unknown link speed in the above case.
[1] http://git.dpdk.org/dpdk/commit/?id=810b17d116f03
Signed-off-by: Benoît Ganne <bganne@cisco.com>
Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The rte_flow_item_eth and rte_flow_item_vlan items are refined.
The structs do not exactly represent the packet bits captured on the
wire anymore.
The real header should be used instead of the whole struct.
Replace the rte_flow_item_* structs with the existing corresponding
rte_*_hdr structs.
Fixes: 09315fc838 ("ethdev: add VLAN attributes to ethernet and VLAN items")
Fixes: f9210259ca ("net/mlx5: fix raw encap/decap limit")
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In the experimental function rte_flow_shared_action_destroy()
introduced in DPDK 20.11, the errno ETOOMANYREFS was used.
This errno is not always available on Windows,
so it is preferred using EBUSY instead.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Existing code uses the previous API offered by rdma-core in order
to create an ASO Flow Hit action.
A general API to create an ASO action of any type is now formally
released. This patch moves the MLX5 PMD code to use the formal API.
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Existing code uses the hard-coded value REG_C_5 as input for function
mlx5dv_dr_action_create_flow_hit().
This patch updates function mlx5_flow_get_reg_id() to return the
selected REG_C value for ASO Flow Hit operation.
The returned value, after subtracting the REG_C_0 offset, is used as
input for function mlx5dv_dr_action_create_flow_hit().
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A recent patch introduced the use of the ASO flow hit action for the
age action. The relevant management struct uses dynamically allocated
memory. This memory was not freed on close.
This patch adds memory freeing as needed.
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Starting from ConnectX-6 Dx, the VF device ID is generic
and not per chip.
https://pci-ids.ucw.cz/v2.2/pci.ids
101e ConnectX Family mlx5Gen Virtual Function
This means that all such devices will have the same VF device ID.
Fixes: 5fc66630be ("net/mlx5: add ConnectX6-DX device ID")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The tunnel offload API provides applications with the ability to restore
packet outer headers after partial offload. Exact feature execution
depends on hardware abilities and the PMD implementation. Hardware that
is supported by the MLX5 PMD places a mark on a packet after partial
offload. The PMD decodes that mark and provides the application with the
required information.
An application can call the restore API both for packets that are part
of an offloaded tunnel and for packets that are not. It is up to the PMD
to provide correct information.
The current MLX5 tunnel offload implementation does not allow
applications to use flow MARK actions; they are restricted to tunnel
offload use only.
This fault was triggered by an application that did not activate tunnel
offload and called the restore API with a marked packet. The PMD tried
to decode the mark value and crashed. The patch decodes the mark value
only if tunnel offload is active.
Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When creating a flow with an eCPRI item, the mask and the value are
both needed in order to build the matching criteria.
In the current implementation, clearing the unused value bits was
missed when filling the mask and value fields. For the value, the
bits not required were not masked with the provided mask. Indeed,
this action is not mandatory. But when creating a flow in the root
table, the kernel driver gets involved and a check would prevent this
flow from being created. The same flow could be created successfully
with the userspace rdma-core on the non-root tables.
An AND operation needs to be added to clear the unused bits in the
value when building the matching criteria. Then the same flow can be
created successfully, whether with the kernel driver or with rdma-core.
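Conceptually, the fix masks the value before writing the matching
criteria; a sketch using the rte_flow eCPRI item (field path as in
rte_ecpri.h):
const struct rte_flow_item_ecpri *espec = item->spec;
const struct rte_flow_item_ecpri *emask = item->mask;
/* Clear the value bits not required by the mask. */
uint32_t common = rte_be_to_cpu_32(espec->hdr.common.u32) &
		  rte_be_to_cpu_32(emask->hdr.common.u32);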
Fixes: daa38a8924 ("net/mlx5: add flow translation of eCPRI header")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The sample and mirror action objects are maintained on a list
shared between the ports belonging to the same multiport Infiniband
device (between representors).
The actions in the NIC steering domains might contain references
to the sub-flow action objects created over a given port. The action
deletion might happen in the context of a different port, and on the
deletion of the referenced objects an incorrect port might be specified.
To avoid this, the port on which the sub-flow actions were created
should be saved, and this saved port should then be used for the
sub-flow action release.
This commit saves the creating device in the sample and mirror action
structs to avoid using the incorrect port device on release.
Fixes: 1978414169 ("net/mlx5: make sample and mirror action thread safe")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Reviewed-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, the header reformat action directly uses the 32-bit hash
list key generated in the header reformat register function. The key is
not recalculated in the hash list function.
As the 64-bit key is composed of the 32-bit attributes and the 32-bit
reformat buffer checksum, a hash list function that directly gets only
a 32-bit key takes the attribute part only, and the checksum part is
ignored.
For different header reformat actions the attributes can be the same,
while the buffer will be different. Taking only the attribute part
causes lots of conflicts.
This commit combines the attribute part and the significantly different
checksum part into the key.
Fixes: f961fd490f ("net/mlx5: make header reformat action thread safe")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The MLX5_ENCAPSULATION_DECISION_SIZE constant is used
to check the raw encap/decap actions for the raw header
size. The header is constructed of the rte_xxx_hdr
structures instead of rte items. Hence, the constant
must be defined using the rte_xxx_hdr structure sizes.
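A sketch of the corrected definition, assuming an
Ethernet/IPv4/UDP/VXLAN encapsulation header:
#define MLX5_ENCAPSULATION_DECISION_SIZE \
	(sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv4_hdr) + \
	 sizeof(struct rte_udp_hdr) + sizeof(struct rte_vxlan_hdr))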
Fixes: 50f576d657 ("net/mlx5: fix VLAN actions in meter")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
When the representor device was set to PF1 in bonding mode, the device
iterator looking for representors by the bonding device failed to match
the PF0 PCI address with the PF1 address. So detaching the PF bonding
device only detached the representors on PF0.
This patch registers all representors of PF1 with PF0 as the PCI device.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The following assertion fails in case RTE_ENABLE_ASSERT is enabled:
PANIC in mlx5_tx_handle_completion():
assert "(txq->fcqs[txq->cq_ci & txq->cqe_m] >> 16)
== cqe->wqe_counter" failed
The free completion queue only contains an expected WQE counter if
RTE_LIBRTE_MLX5_DEBUG is enabled as well. Thus enabling
RTE_ENABLE_ASSERT alone causes the assert to fail.
Compile the assert conditionally only if RTE_ENABLE_ASSERT is enabled.
Fixes: 0afacb04f5 ("common/mlx5: remove NDEBUG")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The kernel can use two approaches to distinguish the E-Switch
source vport in the packet metadata: either the dedicated
source_port field or register C0. To eliminate the extra source
vport matching in the hardware, the source_port field can be
set to a specific value (0xFFFF) for the wire source port.
This match can be applied to recognize the wire port only in the FDB
domain. A missing register C0 match in the NIC Rx domain causes
incorrect representor steering within shared IB device ports,
so the match must always be specified (if the kernel uses this
approach).
Three bugs in the rx_queue_count function:
- One entry may contain several segments, so 'used' must be multiplied
by the number of segments per entry to properly reflect the queue usage.
- The number of CQEs equals (1U << rxq->elts_n) - 1 in SPRQ mode.
The value returned by rx_queue_count should be the number of entries
used in the queue, so it ranges from 0 to the maximum number of entries
in the queue, not this number minus one.
- For MPRQ mode, the number of strides must be taken into account.
Fixes: 8788fec1f2 ("net/mlx5: implement descriptor status API")
Cc: stable@dpdk.org
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The commit d2d5760552 ("net/mlx5: fix Rx queue count calculation") is
incorrect because the count calculation is wrong for the next CQE:
Example:
Compressed Set of packets 1 | Compressed Set of packets 2
C | a | e0 | e1 | e2 | e3 | e4 | e5 | C | a | e0
There are 2 compressed sets of packets in the first queue. For the
first set, n is computed correctly.
But for the second set, n is not computed properly, because the zip
context belongs to the first set. The second set is not yet
decompressed, so there is no context for it.
To fix the issue, the zip context should only be used for the first CQE
series.
Fixes: d2d5760552 ("net/mlx5: fix Rx queue count calculation")
Cc: stable@dpdk.org
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
When the RSS queues' types are not uniform, i.e. normal Rx queues are
mixed with hairpin queues, the PMD accepts this flow after commit [1]
instead of rejecting it.
This is because commit [1] creates the Rx queue object as a DevX type
via the DevX API instead of an IBV type via Verbs: the latter checks
the queues' type when creating the Verbs indirection table, while the
former does not check it when creating the DevX indirection table.
However, in any case, the PMD should logically check whether the input
configuration of the RSS action is reasonable or not, which should
include a check of the queues' type as well as the other checks.
So add the check of the RSS queues' type in the validation function to
fix the issue.
[1]:
commit 6deb19e1b2 ("net/mlx5: separate Rx queue object creations")
Fixes: 63bd16292c ("net/mlx5: support RSS on hairpin")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Received packets can be aligned to the size of the cache line on
PCI transactions. This could improve performance by avoiding
partial cache line writes in exchange for increased PCI bandwidth.
This feature is supposed to be controlled by the rxq_pkt_pad_en
devarg, and this is true for an RxQ created via the Verbs API.
But in the DevX API case, it is erroneously controlled by the
rxq_cqe_pad_en devarg instead, which is in charge of the CQE
padding and should not control the RxQ creation.
Fix DevX RxQ creation by using the proper configuration flag for
Rx packet padding that is being set by the rxq_pkt_pad_en devarg.
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The new flow table resource management API triggered a PMD crash in
tunnel offload mode, when a tunnel match flow rule was inserted before
a tunnel set rule.
The reason for the crash was a double flow table registration. The
table was registered by the tunnel offload code for the first time and
once more by the PMD code, as part of general table processing. The
table counter was decremented only once during the rule destruction,
which caused a resource leak that triggered the crash.
The patch updates the PMD registration with tunnel offload parameters
and removes the table registration in tunnel-related code.
Fixes: afd7a62514 ("net/mlx5: make flow table cache thread safe")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The original patch removed active tunnel offload objects from the
tunnels db list without checking their reference counter value.
That action led to a PMD crash.
The current patch isolates the tunnels db list behind a separate API.
That API manages MT protection of the tunnel offload db.
Fixes: 5b38d8cd46 ("net/mlx5: make tunnel hub list thread safe")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The original patch allocated tunnel offload objects with invalid
indexes. As a result, PMD tunnel object allocation failed.
In this patch, the indexed pool provides both an index and memory for a
new tunnel offload object.
Also, the tunnel offload ipool is moved to DV-enabled code only.
Fixes: 4ae8825c50 ("net/mlx5: use indexed pool as id generator")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The tunnel offload implementation introduced the 64-bit flow_grp_info
bit-field structure. Since the structure size is 64 bits, the code
passed that type by value in function calls.
The patch changes the structure passing method to passing by reference.
Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The tunnel offload API is implemented for the Direct Verbs environment
only. The current patch re-arranges tunnel-related functions so that
they compile in non-Direct Verbs setups, to prevent compilation
failures. The patch does not introduce new functions.
Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Bonding adjustment is done only when DEVX_PORT is supported in
rdma-core.
A bonding condition was evaluated even when DEVX_PORT is not supported.
Remove it.
Fixes: 2eb4d0107a ("net/mlx5: refactor PCI probing on Linux")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In the ASO age pools resize, the PMD starts the ASO data-path.
When starting the ASO data-path failed, the pools memory was not freed,
which caused a memory leak.
Free it.
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
In Rx queue creation, there is a validation for the Rx configuration.
When the scatter offload validation for buffer split failed, the Rx
queue object memory was not freed, which caused a memory leak.
Free it.
Fixes: a0a45e8af7 ("net/mlx5: configure Rx queue for buffer split")
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The RSS flow expansion gets a memory buffer to fill the new patterns of
the expanded flows.
This memory management saves the next address to write into the buffer
in a dedicated variable.
The next address was wrongly also calculated when all the patterns were
already ready.
Remove it.
Fixes: 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
If xmeta mode 1 is enabled and a flow is created with RSS and mark
actions, there was an error where rdma-core failed to create the RQT
due to a wrong queue definition. This was due to mixed flow creation in
the thread-specific flow workspace.
This patch introduces nested flow workspaces (context data): each flow
uses a dedicated flow workspace, which is popped and restored when the
nested flow creation is done, so the original flow continues with the
original flow workspace. The total number of thread-specific flow
workspaces should be 2, since there is only one nested flow creation
scenario so far.
Fixes: 8bb81f2649 ("net/mlx5: use thread specific flow workspace")
Fixes: 3ac3d8234b ("net/mlx5: fix index when creating flow")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx_steering_dump_parser.py tool failed to dump flows because the
socket file name was changed.
Change the socket file name back to make it consistent.
Fixes: e4b7b8d082 ("common/mlx5: fix PCI driver name")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently, counter offset support is discovered by creating a rule with
an invalid offset counter and a jump action in the root table. If the
rule creation fails with the EINVAL errno, that means the counter
offset is not supported in the root table.
However, the jump action may not be supported in some rdma-core
versions. In this case, the discovery code will not work properly.
This commit changes the jump action to a generic drop action. That
makes the discovery code more compatible.
Fixes: 994829e695 ("net/mlx5: remove single counter container")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The implementation of mlx5_hairpin_unbind contained a copy-paste error:
if a single peer Rx port needed to be unbound, it would be
bound again by mistake.
All the hardware resources were released when stopping the device and
no mess of the configuration was introduced. But when trying to unbind
the ports again, the issue would appear.
The typo in the function call is fixed. If there is no hairpin queue
bound between two ports, the unbinding process should be considered
successful.
Fixes: 37cd4501e8 ("net/mlx5: support two ports hairpin mode")
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Currently the PMD only accepts flows in which item_mpls directly
follows item_gre, which means matching the GRE header without the GRE
optional key field in MPLSoGRE encapsulation.
However, for MPLSoGRE, the GRE header could have the optional field
(i.e. key) according to the RFC, so the PMD needs to accept this.
Add MLX5_FLOW_LAYER_GRE_KEY into the allowed prev_layer to fix this.
Fixes: a7a0365565 ("net/mlx5: match GRE key and present bits")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Based on the specification, eCPRI can only follow the ETH (VLAN) layer
or the UDP layer. When creating a flow with an eCPRI item, this should
be checked and an invalid layout of the layers should be rejected.
Fixes: c7eca23657 ("net/mlx5: add flow validation of eCPRI header")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The number of descriptors configured is returned to the user
via the rxq_info_get API. This number is incorrect for MPRQ.
For SPRQ it matches the number of mbufs allocated.
For MPRQ there are fewer external MPRQ buffers, each of which holds
multiple packets in the strides of that big buffer. Take that
into account and return the number of MPRQ buffers multiplied
by the number of strides in this case.
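A worked sketch of the reported count, assuming hypothetical names;
for example, 512 MPRQ buffers with 32 strides each report 16384
descriptors:

#include <stdbool.h>
#include <stdint.h>

/* For SPRQ each descriptor is one mbuf; for MPRQ each external buffer
 * holds one packet per stride, so the effective descriptor count is
 * buffers * strides. */
static uint32_t
rxq_desc_count(uint32_t sprq_mbufs_n, uint32_t mprq_bufs_n,
               uint32_t strides_n, bool mprq_enabled)
{
    return mprq_enabled ? mprq_bufs_n * strides_n : sprq_mbufs_n;
}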
Fixes: 26f1bae837 ("net/mlx5: add Rx/Tx burst mode info")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
There is a performance penalty for the replenish scheme
used in the vectorized Rx burst for both MPRQ and SPRQ.
Mbuf elements are filled at the end of the mbufs array and
replenished at the beginning, which increases cache misses
and drops performance. The more Rx descriptors are used,
the worse the situation gets.
Change the allocation scheme for the vectorized MPRQ Rx burst:
allocate new mbufs only when the consumed mbufs are almost
depleted (always keep one burst gap between the allocated and
consumed indices). Keeping a small number of mbufs allocated
improves cache locality and improves performance a lot.
Unfortunately, this approach cannot be applied to the SPRQ Rx
burst routine. In the MPRQ Rx burst we simply copy packets from
the external MPRQ buffers or attach these buffers to mbufs.
In the SPRQ Rx burst we let the NIC fill the mbufs for us;
keeping a small number of allocated mbufs would limit the NIC's
ability to fill as many buffers as possible, which offsets the
advantage of better cache locality.
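A sketch of the new replenish condition, with illustrative names; the
real code operates on the MPRQ Rx queue ring indices:

#include <stdbool.h>
#include <stdint.h>

#define BURST_SIZE 64u /* illustrative burst granularity */

/* Replenish only when the gap between the allocated and consumed mbuf
 * indices shrinks to one burst, instead of keeping the whole ring
 * full; the unsigned subtraction handles the 16-bit wraparound. */
static bool
need_replenish(uint16_t allocated_idx, uint16_t consumed_idx)
{
    return (uint16_t)(allocated_idx - consumed_idx) <= BURST_SIZE;
}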
Fixes: 0f20acbf5e ("net/mlx5: implement vectorized MPRQ burst")
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
As a shared RSS action is shared by multiple flows, it is created
as a global standalone action and managed only by the relevant
shared-action management functions.
Currently, hrxqs are created either by a shared RSS action or by a
general queue action. Hrxqs created by a shared RSS action should
also be released only through the shared RSS action; it is not
correct to release them the way general queue actions are released
in flow destroy.
This commit adds a new fate action type for the shared RSS action to
handle the shared RSS action hrxq release correctly.
Fixes: e1592b6c4d ("net/mlx5: make Rx queue thread safe")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
For a transfer flow with meter, packets passed through without the
flow actions applied: group level 65531 was wrongly multiplied by 10.
This patch fixes the issue by correcting the suffix table group level
calculation.
Fixes: 3e8f3e51fd ("net/mlx5: fix meter table definitions")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The Tx queue completion production index was not reset
on Tx queue stop, and completions remained from the previous
queue run. This caused wrong completion queue operation
and an overall Tx queue malfunction on queue restart.
Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The Rx queue completion consumer index could get a temporarily
wrong value pointing into the middle of a compressed CQE
session. If the application crashed at that moment, the next
queue restart handled the wrong CQEs pointed to by this index
and lost the consumer index synchronization, making a reliable
queue restart impossible.
Fixes: 88c0733535 ("net/mlx5: extend Rx completion with error handling")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The assert on the entry variable in the mlx5_hlist_register() function
is not correct. Remove the invalid entry variable.
Fixes: e69a59227d ("net/mlx5: support concurrent access for hash list")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
A recent patch used a local string array as the input for the
rte_flow_error_set() function.
This stack memory may later be reused by other code sections,
overwriting the desired error string.
This patch implements an error string for the specific case in
question: an ICMP item not supported by the Verbs flow engine.
Fixes: d51475d1bf ("net/mlx5: support item type error message in flow Verbs")
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The mlx5 PMD splits a sampling flow into a prefix flow and a suffix
flow. In the sample action translation function, the scaled group
value of the suffix flow is attached to the sample object and saved
into the sample resource.
The mlx5 PMD then fetches the group value from the sample resource
to create the suffix flow. In the mlx5_flow_group_to_table function
the group value of the suffix flow was scaled with the table factor
again before being translated into the HW table, which resulted in
an incorrect group value for the sample suffix flow.
The fix introduces a 'skip_scale' flag and sets it to 1 for the
sample suffix flow creation. The mlx5_flow_group_to_table function
then skips the scaling with the table factor and uses the correct
group value.
Fixes: 4ec6360de3 ("net/mlx5: implement tunnel offload")
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
In bonding configurations the switch port id for representors was
composed of the PF index in bonding as the 1 MSB and the representor
index as the remaining 15 LSBs. The special corner case of the host PF
representor on BlueField setups, with representor id 0xFFFF, was
missed as well.
The new switch port id consists of 4 MSBs for the PF bonding index and
the remaining 12 LSBs for the representor index. The switch port id
ranges for each type of representor are as follows:
Uplink representor (AKA master): 0xFFFF
Host PF representor: 0x<pf_bond>FFF
VF representor: 0x<pf_bond>[0-FFE]
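A minimal sketch of the new encoding, using hypothetical names (the
real composition is done in the PMD's switch info handling):

#include <stdint.h>

/* 4 MSBs carry the PF bonding index, 12 LSBs carry the representor
 * index; 0xFFF in the low bits marks the host PF representor, and
 * 0xFFFF stays reserved for the uplink (master) representor. */
static uint16_t
switch_port_id(uint16_t pf_bond, uint16_t repr_idx)
{
    return (uint16_t)(((pf_bond & 0xF) << 12) | (repr_idx & 0xFFF));
}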
Fixes: bee57a0a35 ("net/mlx5: update switch port id in bonding configuration")
Cc: stable@dpdk.org
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
A recent patch introduced a new SQ for ASO flow hit management.
This SQ uses two WQEBBs for each WQE.
The SQ producer index is 16 bits wide.
The enqueue loop posts new WQEs to the ASO SQ, using the WQE index
for the SQ management.
This 16-bit index multiplied by 2 was wrongly used for the SQ
doorbell ringing as well.
The multiplication caused the SW index wraparound to go out of sync
with the hardware index, causing the queue to get stuck.
This patch separates the WQE index management from the doorbell index
management: for each WQE index increment by 1, the doorbell index is
incremented by 2.
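A sketch of the separated index management, with hypothetical names:

#include <stdint.h>

/* Each ASO WQE occupies two WQEBBs, so the doorbell record must
 * advance by 2 per WQE; deriving it by multiplying the 16-bit WQE
 * index by 2 broke the wraparound synchronization with the HW. */
struct aso_sq_idx {
    uint16_t wqe_pi; /* WQE producer index, +1 per posted WQE */
    uint16_t db_pi;  /* doorbell index, +2 per posted WQE */
};

static void
aso_sq_advance(struct aso_sq_idx *sq)
{
    sq->wqe_pi += 1; /* next WQE slot in the SQ */
    sq->db_pi += 2;  /* two WQEBBs consumed; ring the doorbell with this */
}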
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The input header of an RTE flow item is in network byte order. On a
little-endian host, the bit field order follows the byte order.
When checking the eCPRI message type, the wrong field was selected.
Fix this by using the correct field.
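A hedged illustration of the pitfall (the field layout shown is
eCPRI-like, for demonstration only): in a bit-field overlay of a
network-order word, the fields must be declared in reverse order on
little-endian hosts, and the reader must pick the field from the
matching branch.

#include <stdint.h>

/* Illustrative overlay of a big-endian (network-order) 32-bit header.
 * On little-endian hosts the bit fields are declared in reverse so
 * that each field maps onto the right bits of the wire word. */
struct ecpri_like_hdr {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint32_t size:16;    /* declared low-to-high on LE hosts */
    uint32_t type:8;     /* the message type field to check */
    uint32_t c:1;
    uint32_t res:3;
    uint32_t revision:4;
#else
    uint32_t revision:4; /* natural wire order on BE hosts */
    uint32_t res:3;
    uint32_t c:1;
    uint32_t type:8;
    uint32_t size:16;
#endif
};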
Fixes: daa38a8924 ("net/mlx5: add flow translation of eCPRI header")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The Tx queue stop API doesn't call the PMD callback when the state of
the queue is already stopped.
The drivers should update the state to stopped when the queue stop
callback completes successfully or when the port is stopped.
The drivers should update the state to started when the queue start
callback completes successfully or when the port is started.
The driver wrongly didn't update the state to started when the port
start callback completed, which left the state as stopped. A following
call to the queue stop API was therefore not completed by the ethdev
layer, because the state was already stopped.
Move the state update from the Tx queue setup to the port start
callback.
Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
A Txq refcnt value of 1 means that there is no real reference to the
queue and only the control configuration is saved in the struct.
The patch below wrongly didn't take this into account and caused a
leak of the Txq object resource.
Revert the specific refcnt update.
Fixes: b5c8b3e70c ("net/mlx5: use C11 atomics for RxQ/TxQ refcounts")
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The mlx5 PMD created the MR (Memory Region) resource on the
mlx5_dma_map call to make the memory available for DMA
operations. On the mlx5_dma_unmap call the MR resource
was not freed but inserted into the MR free list for later
garbage collection.
The actual MR resource destruction happened on the device stop
call. That caused the runtime to go out of memory if an
application performed multiple DMA map/unmap calls.
The fix frees the MR resource immediately on the mlx5_dma_unmap
call, without engaging the list. The mlx5_mr_free function is
also exported from the common PMD part.
Fixes: 989e999d93 ("net/mlx5: support PCI device DMA map and unmap")
Cc: stable@dpdk.org
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Fix an error flow in which the function
mlx5_txq_release_devx_sq_resources is called twice, by setting the
released objects to NULL after the first call.
The incorrect flow was introduced in the work done on generic
object creation.
Once an error occurs inside mlx5_txq_create_devx_sq_resources,
the function calls mlx5_txq_release_devx_sq_resources;
however, the released pointers are not set to NULL after the release
calls, and undefined memory is released again in the same call in
mlx5_txq_release_devx_resources.
This results in calls to MLX5_FREE with
already released memory addresses and an assert in mlx5_release_dbr:
EAL: Error: Invalid memory
EAL: Error: Invalid memory
PANIC in mlx5_txq_release_devx_sq_resources():
assert "(mlx5_release_dbr(&txq_obj->txq_ctrl->priv->dbrpgs,
mlx5_os_get_umem_id (txq_obj->sq_dbrec_page->umem),
txq_obj->sq_dbrec_offset)) == 0" failed
The fix sets the released pointers to NULL after the first release
calls.
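The fix follows the usual release-and-clear pattern, sketched here
with generic names:

#include <stdlib.h>

/* Clearing the pointer right after the release turns a second pass of
 * the release path into a harmless no-op instead of a double free. */
static void
release_and_clear(void **obj)
{
    free(*obj);
    *obj = NULL;
}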
Fixes: 86d259cec8 ("net/mlx5: separate Tx queue object creations")
Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The space for the extra buffer pointers used by the MPRQ routines was
not allocated in the Rx queue object creation structure, causing
memory corruption.
The fix allocates the extra memory for these pointers in case MPRQ is
engaged.
Fixes: a0a45e8af7 ("net/mlx5: configure Rx queue for buffer split")
Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
A dedicated UAR was allocated for the ASO queues.
The shared UAR created for the Tx queues can be used instead.
Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch introduces a routine to allocate the UAR (User
Access Region) with various memory mapping types. The original
patch being fixed provided the UAR allocation workaround
for the mlx5 net PMD only. It was found that the other
mlx5-based drivers, vdpa and regex, are affected by the issue
as well and must be fixed.
Fixes: a0bfe9d56f ("net/mlx5: fix UAR memory mapping type")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>