numam-dpdk

Author	SHA1	Message	Date
Suanming Mou	994829e695	net/mlx5: remove single counter container A flow counter allocated by the batch API could not be assigned to a flow in the root table (group 0) with old rdma-core versions. Hence, a root table flow counter required a PMD mechanism to manage counters that were allocated singly. Batch counters are now supported in the root table, given a new rdma-core version that includes the MLX5_FLOW_ACTION_COUNTER_OFFSET enum and a kernel driver that includes the MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX_OFFSET enum. When the PMD uses the rdma-core API to assign a batch counter to a root table flow with an invalid counter offset, it should get an error only if batch counter assignment for the root table is supported. Running this trial at initialization time detects the support. If the support is detected, remove the management of the single counter container in the fast counter mechanism. Otherwise, move the counter mechanism to fallback mode. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	6b7c717ed1	net/mlx5: locate aging pools in the general container Commit [1] introduced different container for the aging counter pools. In order to save container memory the aging counter pools can be located in the general pool container. This patch locates the aging counter pools in the general pool container. Remove the aging container management. [1] commit `fd143711a6` ("net/mlx5: separate aging counter pool range") Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Matan Azrad	8dc775d8b1	net/mlx5: fix event queue number query When a Rx\Tx queue is created by DevX, its CQ configuration should include the EQ number of the interrupts. The EQ is managed by the kernel and there is a glue API in order to query the EQ number from the kernel. The EQ query API gets a vector number specifies the kernel vector of the interrupt handling. The vector number was wrongly detected according to the configuration CPU instead of using the device attributes of the supported vectors. The CPU was wrongly detected by the rte_lcore_to_cpu_id API without any check, and in case of non-EAL thread context the value was 0xFFFFFFFF which caused a failure in the EQ number query API. Use vector 0 for each EQ number query which must be supported by the kernel. Fixes: `08d1838f64` ("net/mlx5: implement CQ for Rx using DevX API") Fixes: `d133f4cdb7` ("net/mlx5: create clock queue for packet pacing") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 22:29:24 +01:00
Thomas Monjalon	8a5a0aad5d	ethdev: allow close function to return an error The API function rte_eth_dev_close() was returning void. The return type is changed to int for notifying of errors. If an error happens during a close operation, the status of the port is undefined, a maximum of resources having been freed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Liron Himi <lironh@marvell.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-10-16 22:26:41 +02:00
Jiawei Wang	00c10c2211	net/mlx5: update translate function for mirroring Translate the attribute of sample action that include sample ratio and sub actions list. PMD will check the destination action number in current flow, if found multiple destination actions, then create the new destination array rdma action that group actions for each destination. Currently only support port or queue for destination action, and only encap action can be attached into one port destination. Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-16 19:48:18 +02:00
Jiawei Wang	b4c0ddbfcc	net/mlx5: split sample flow into two sub-flows The flow with sample action will be split into two sub flows: the prefix sub flow with the all actions preceding the sample action and sample action itself, and the suffix sub flow with the actions following the sample action. The original items remain in the prefix sub flow, add the implicit tag action with unique id to set in metadata register, and suffix sub flow uses the tag item to match with that unique id. The flow split as below: Original flow: items / actions pre / sample / actions sfx -> prefix sub flow - items / actions pre / set_tag action / sample suffix sub flow - tag_item / actions sfx Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-16 19:48:18 +02:00
Michael Baum	e7055bbfbe	net/mlx5: reposition event queue number field The eqn field has become a field of sh directly since it is also relevant for Tx and Rx. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Thomas Monjalon	b142387b07	ethdev: allow drivers to return error on close The device operation .dev_close was returning void. This driver interface is changed to return an int. Note that the API rte_eth_dev_close() is still returning void, although a deprecation notice is pending to change it as well. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com> Reviewed-by: Liron Himi <lironh@marvell.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-30 19:19:13 +02:00
Ophir Munk	1f66ac5bbe	net/mlx5: remove more Direct Verbs dependencies Several DV-based structs of type 'struct mlx5dv_devx_XXX *' are replaced with 'void *' to enable compilation under non-Linux operating systems. New getter functions were added to retrieve the specific fields that were previously accessed directly. Replaced structs: 'struct mlx5dv_pp *' 'struct mlx5dv_devx_event_channel *' 'struct mlx5dv_devx_umem *' 'struct mlx5dv_devx_uar *' Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	f00f6562e1	net/mlx5: remove netlink dependency in shared code This commit adds Linux implementation of routine mlx5_os_mac_addr_flush as wrapper to Netlink API to avoid direct calls under non-Linux operating systems. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	f2e8b4556f	net/mlx5: remove unused includes Remove unused Linux included files: <sys/ioctl.h>, <arpa/inet.h> from file net/mlx5/mlx5_mac.c <sys/mman.h> from file net/mlx5/mlx5.c Fixes: `771fa900b7` ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters") Cc: stable@dpdk.org Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	e9c0b96e35	net/mlx5: move Linux ifname function mlx5_get_ifname() prototype includes 'IF_NAMESIZE' definition from Linux file net/if.h. Since this API is only used under Linux and to enable compilation under non-Linux OS - move this prototype from shared file mlx5.h to file linux/mlx5_os.h. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Viacheslav Ovsiienko	a0bfe9d56f	net/mlx5: fix UAR memory mapping type The User Access Region is a special mechanism to provide direct access to the hardware registers, and is the part of PCI address space that is mapped to CPU virtual address. The mapping can be performed with the type "Write-Combining" or "Non-Cached", and these ones might be supported or not on different setups. To prevent device probing failure the UAR allocation attempt with alternative mapping type is performed. The datapath takes the actual UAR mapping into account on queue creation. There was another issue with NULL UAR base address. OFED 5.0.x and Upstream rdma_core before v29 returned the NULL as UAR base address if UAR was not the first object in the UAR page. It caused the PMD failure and we should try to get another UAR till we get the first one with non-NULL base address returned. Fixes: `fc4d4f732b` ("net/mlx5: introduce shared UAR resource") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>	2020-07-30 00:41:23 +02:00
Michael Baum	f4a0873197	net/mlx5: optimize critical section in device free When the PMD releases the shared IB device context, it holds the mlx5_ibv_list_mutex lock throughout the function so that, while a device is being removed from the list, another process cannot insert another device into it. However, once the device has been removed from the list, even if it has not yet released all of its resources, it no longer needs to care about other processes and can release the lock. The PMD nevertheless keeps the lock and performs a number of operations, some of which include sleeps and may take long. To improve this, shorten the lock time to the minimum necessary. Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-30 00:41:23 +02:00
Parav Pandit	392bf9084d	common/mlx5: register class drivers through common layer Migrate mlx5 net, vdpa and regex PMD to start using mlx5 common class driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-28 19:01:30 +02:00
Parav Pandit	8208800163	common/mlx5: avoid class constructor priority mlx5_common is shared library between mlx5 net, VDPA and regex PMD. It is better to use common initialization helper instead of using RTE_PRIORITY_CLASS priority. Suggested-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2020-07-28 18:52:11 +02:00
Dekel Peled	08d1838f64	net/mlx5: implement CQ for Rx using DevX API This patch continues the work to use DevX API for different objects creation and management. On Rx control path, the RQ, RQT, and TIR objects can already be created using DevX API. This patch adds the support to create CQ for RxQ using DevX API. The corresponding event channel is also created and utilized using DevX API. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	9d60f54569	common/mlx5: remove inclusion of Verbs header files Several source files include Verbs header files as in (1). These source files will not compile under non-Linux operating systems. This commit removes this inclusion in two cases: Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source file so the inclusion in (1) can be safely removed. Case 2: Verbs symbols are used. Please note the inclusion in (1) already appears in file linux/mlx5_glue.h (which represents the interface to the rdma-core library). Therefore, replace (1) in the source file with (2). Under non-Linux operating systems - file mlx5_glue.h will not include (1). (1) #include <infiniband/verbs.h> #include <infiniband/mlx5dv.h> (2) #include <mlx5_glue.h> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	2e86c4e5c7	net/mlx5: refactor multi-process communication 1. The shared data communication between the primary and the secondary processes is implemented using Linux API. Move the Linux API code under linux directory (file linux/mlx5_os.c). 2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved under linux (new file linux/mlx5_mp_os.c). Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Suanming Mou	50f95b23c9	net/mlx5: add option to configure FCS or decapsulation There are some limitations on some NICs (at least on ConnectX-6 Dx and BlueField 2) with supporting FCS (frame checksum) scattering for tunnel decapsulated packets. On such NICs only one of the two features can be supported at a time, and the new devarg "decap_en" is introduced to let the user choose. If the FCS scattering feature is supposed to be engaged by the application, this new devarg should be specified as "decap_en=0", forcing the FCS feature to be enabled and rejecting tunnel decap actions in the rte_flow engine. If FCS scatter is not needed and the application intends to use tunnel decapsulation in rte_flow, the devarg can be omitted or set to a non-zero value (this is the default setting). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:46:30 +02:00
Suanming Mou	2175c4dc62	net/mlx5: convert configuration objects to unified malloc This commit allocates the miscellaneous configuration objects from the unified malloc function. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Suanming Mou	83c2047c5f	net/mlx5: convert control path memory to unified malloc This commit allocates the control path memory from unified malloc function. The objects be changed: 1. hlist; 2. rss key; 3. vlan vmwa; 4. indexed pool; 5. fdir objects; 6. meter profile; 7. flow counter pool; 8. hrxq and indirect table; 9. flow object cache resources; 10. temporary resources in flow create; Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Suanming Mou	5522da6b20	net/mlx5: add option to allocate memory from system Currently, for the MLX5 PMD, once millions of flows are created, the memory consumption of the flows is very large. For a system with limited memory, this means the system needs to reserve most of its memory as hugepage memory in advance to serve the flows, and other applications have no chance to use this reserved memory. Since most of the time the system will not have that many flows, the reserved hugepage memory is mostly wasted. The new sys_mem_en devarg, once set to true, allows the PMD to allocate memory from the system by default through the newly added mlx5 memory management functions. Only when the MLX5_MEM_RTE flag is set is the memory allocated from rte; otherwise, it is allocated from the system. In this case, a system with limited memory does not need to reserve most of its memory for hugepages. Only the memory needed for datapath objects has to be allocated with the explicit flag. Other memory is allocated from the system. For a system with enough memory, there is no need to care about the devarg; the memory will always come from rte hugepages. One restriction is that for a DPDK application with multiple PCI devices, if the sys_mem_en devargs differ between the devices, sys_mem_en takes the value from the first device's devargs only, and a warning message is printed. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	d7c49561d3	net/mlx5: add eCPRI flex parser capacity check If the NIC or the FW does not support the dynamic flex parser, it will return an error when trying to create the parser for eCPRI. It is then hard to know the detailed reason for the failure. Before creating the parser node and using the parser, the capacity bit saved in the HCA_CAP can be used to confirm whether the dynamic flex parser is supported. If not, an error is returned directly with ENOTSUP to prevent the following steps from being executed. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	1c5064044f	net/mlx5: create and destroy eCPRI flex parser eCPRI protocol has unified format layout for the variants, over ETH layer (including .1Q) and UDP layer. The common header of the message has 4 bytes fixed length, and the message payload layers are different based on the type field. Now only type #0, #2 and #5 will be supported, and 2 bytes are needed. When creating the flex parser, the header will be extended to 8 bytes and 2 DW samples are needed. The 1st DW starts from offset 0 and will be used for the type field of the common header. The 2nd DW starts from offset 4 and will be used for the physical channel ID, real-time control ID or measurement ID fields. The parser will be created once a flow with eCPRI item is observed for the first time. After creating, it will remain in the system and HW until the device is stopped. Right now, there is no need to destroy the eCPRI flex parser after the last flow with eCPRI item is destroyed. This is to get rid of the alternate states of creating and destroying eCPRI flex parser with a single eCPRI flow. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	daa38a8924	net/mlx5: add flow translation of eCPRI header In the translation stage, the eCPRI item should be translated into the format that the lower layer driver can use. All the fields that need to match must be in network byte order after translation, as must the mask. Since the header in the item belongs to the network layers stack, the input parameter of the header is considered to be in big-endian format already. Based on the definition in the PRM, the DW samples will be used for matching in the FTE/STE. For now, only the type field and the PC ID, RTC ID, and DLY MSR ID of the payload are supported. The masks should be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are needed to support such matching. The mask fields could be zeros to support some wildcard rules. But it makes no sense to support a rule matching only on the payload without matching the type field. The DW samples should be stored after the flex parser creation for eCPRI. There is no need to query the sample IDs each time a flow rule with an eCPRI item is created. It will not introduce significant insertion rate degradation. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	d133f4cdb7	net/mlx5: create clock queue for packet pacing This patch creates the special completion queue providing reference completions to schedule packet send from other transmitting queues. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	fc4d4f732b	net/mlx5: introduce shared UAR resource This is preparation step before moving the Tx queue creation to the DevX approach. Some features require the shared UAR for Tx queues and scheduling completion queues, the patch manages the shared UAR. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	24feb04596	net/mlx5: fix UAR lock sharing for multiport devices The master and representors might be created over the multiport Infiniband devices and the UAR resource allocated for sibling ports might belong to the same underlying Infiniband device. Hardware requires the write access to the UAR must be performed as atomic 64-bit write, on 32-bit systems this is two sequential writes, protected by lock. Due to possibility to share the same UAR between sibling devices the locks must be moved to shared context. Fixes: `f048f3d479` ("net/mlx5: switch to the shared IB device context") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	8f848f32fc	net/mlx5: introduce send scheduling devargs This patch introduces the new devargs: tx_pp - enables accurate packet send scheduling on mbuf timestamps in the PMD. On the device start if "rte_dynflag_timestamp" dynamic flag is registered and this devarg non-zero value is specified, the driver initializes all necessary internal infrastructure to provide packet scheduling. The parameter value specifies scheduling granularity in nanoseconds. tx_skew - the parameter adjusts the send packet scheduling on timestamps and represents the average delay between beginning of the transmitting descriptor processing by the hardware and appearance of actual packet data on the wire. The value should be provided in nanoseconds and is valid only if tx_pp parameter is specified. The default value is zero. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Ali Alnubani	28c9a7d7b4	net/mlx5: add ConnectX-6 Lx device ID This adds the ConnectX-6 Lx device id to the list of supported Mellanox devices that run the MLX5 PMD. The device is still in development stage. Signed-off-by: Ali Alnubani <alialnu@mellanox.com> Acked-by: Raslan Darawsheh <rasland@mellanox.com>	2020-07-11 06:18:53 +02:00
Jerin Jacob	9c99878aa1	log: introduce logtype register macro Introduce the RTE_LOG_REGISTER macro to avoid the code duplication in the logtype registration process. It is a wrapper macro for declaring the logtype, registering it and setting its level in the constructor context. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com> Acked-by: Sachin Saxena <sachin.saxena@nxp.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2020-07-03 15:52:51 +02:00
Shiri Kuzin	0f0ae73a32	net/mlx5: add parameter for LACP packets control The new devarg will control the steering of the lacp traffic. When setting dv_lacp_by_user = 0 the lacp traffic will be steered to kernel and managed there. When setting dv_lacp_by_user = 1 the lacp traffic will not be steered and the user will need to manage it. Signed-off-by: Shiri Kuzin <shirik@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Ori Kam	262c7ad0dd	common/mlx5: move doorbell record from net driver The creation of DBR can be used by a number of different Mellanox PMDs. for example RegEx / Net / VDPA. This commits moves the DBR creation and release functions to common folder. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-06-30 14:52:30 +02:00
Ophir Munk	391b8bcc81	common/mlx5: move some getter functions from net driver Getter functions such as: 'mlx5_os_get_ctx_device_name', 'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name', 'mlx5_os_get_umem_id' are implemented under net directory. To enable additional devices (e.g. regex, vdpa) to access these getter functions they are moved under common directory. As part of this commit string sizes DEV_SYSFS_NAME_MAX and DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the destination string size in strncpy() function is bigger than the source string size. This update will avoid GCC version 8 error -Werror=stringop-truncation. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Suanming Mou	ac79183dc6	net/mlx5: optimize free counter lookup Currently, allocating a new counter requires looping over the whole container pool list to find a free counter. With millions of counters allocated and all the pools empty, allocating a new counter still needs to loop over the whole container pool list first, and only then allocates a new pool to get a free counter. The pool list traversal wastes cycles. Adding a global free counter list to the container helps to get free counters more efficiently. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Suanming Mou	b1cc226644	net/mlx5: optimize single counter pool search For a single counter, allocating a new counter requires finding the pool it belongs to so that the query can be done together. Once there are millions of counters allocated, the pool array in the counter container becomes very large, and searching the pool array becomes extremely slow. Save the minimum and maximum counter ID to allow a quick check of the current counter ID range. Also, start searching from the last pool in the container, which will mostly find the needed pool since counter IDs increase sequentially. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:29 +02:00
Suanming Mou	632f0f1905	net/mlx5: manage shared counters in three-level table Currently, checking whether a shared counter with the same ID exists requires looping over the counter pools to search for the counter. Even adding the counters to a list does not help much when there are thousands of shared counters in the list. Using a three-level table to look up the counter index saved in the relevant table entry is more efficient. This patch introduces the three-level table to save the counter index for each ID in the table. The next time the same ID comes, checking the table entry of this ID gives the counter index directly. No search is needed. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:29 +02:00
David Marchand	63783b0172	net/mlx5: remove redundant newline from logs The DRV_LOG macro already appends a newline. Fixes: `46287eacc1` ("net/mlx5: introduce hash list") Fixes: `860897d289` ("net/mlx5: reorganize flow tables with hash list") Fixes: `e484e40323` ("net/mlx5: optimize tag traversal with hash list") Fixes: `6801116688` ("net/mlx5: fix multiple flow table hash list") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Xiaoyu Min <jackmin@mellanox.com>	2020-06-30 14:52:29 +02:00
Ophir Munk	d5ed8aa944	net/mlx5: add memory region callbacks in per-device cache Prior to this commit MR operations were verbs based and hard coded under common/mlx5/linux directory. This commit enables upper layers (e.g. net/mlx5) to determine which MR operations to use. For example the net layer could set devx based MR operations in non-Linux environments. The reg_mr and dereg_mr callbacks are added to the global per-device MR cache 'struct mlx5_mr_share_cache'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-17 16:32:01 +02:00
Ophir Munk	042f5c94fd	net/mlx5: refactor device operations for Linux There are three types of eth_dev_ops: primary, secondary and isolate. Their function calls assignments are moved from common file mlx5.c to the Linux specific file linux/mlx5_os.c. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	9138989036	net/mlx5: rename ib in names Renames in this commit: mlx5_ibv_list -> mlx5_dev_ctx_list mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx mlx5_ibv_shared_port -> mlx5_dev_shared_port ibv_port -> dev_port Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Gregory Etelson	5c76123810	net/mlx5: fix flow memory allocation size In DV enabled MLX5 PMD build mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size was initiated for DV structure. If RTE initialization encountered MLX5 PCI function with disabled DV support mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size was reduced to match legacy verbs flow size. Since mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW] is a global variable that change reflected on DV enabled MLX5 PCI functions too. Running flow with invalid ipool size crashes PMD. The patch adjusts ipool flow size for each active PCI function. Fixes: `b88341ca35` ("net/mlx5: convert flow dev handle to indexed") Cc: stable@dpdk.org Signed-off-by: Gregory Etelson <getelson@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	834a9019ec	net/mlx5: remove Verbs dependency in spawn struct 1. Replace 'struct ibv_device *' with 'void *' in 'struct mlx5_dev_spawn_data'. Define a getter function to retrieve the device name. 2. Rename ibv_dev and ibv_port as phys_dev and phys_port respectively. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	10f3581dfd	net/mlx5: add Linux-specific header file File drivers/net/linux/mlx5_os.h is added. It includes specific Linux definitions such as PCI driver flags, link state changes interrupts, link removal interrupts, etc. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	2eb4d0107a	net/mlx5: refactor PCI probing on Linux Refactor PCI probing related code. Move Linux specific functions (as well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c file. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	c7f6ba0e53	net/mlx5: remove umem field dependency on Direct Verbs umem field is used in several structs. Its type 'struct mlx5dv_devx_umem *' is changed to 'void *'. This change will allow non-Linux OS compilations. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	e85f623e13	net/mlx5: remove attributes dependency on Verbs Define 'struct mlx5_dev_attr' which is ibv and dv independent. It contains attribute that were originally contained in 'struct ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API mlx5_os_get_dev_attr() which fills in the new defined struct. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	f44b09f9e3	net/mlx5: add Linux-specific file with getter functions 'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct ibv_context *' to 'void *'. 'ctx' members which are verbs dependent (e.g. device_name) will be accessed through getter functions which are added to a new file under Linux directory: linux/mlx5_os.c. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	6e88bc42c7	net/mlx5: rename Verbs shared object Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Suanming Mou	a1da6f624c	net/mlx5: add reclaim memory mode Currently, when a flow is destroyed, some memory resources may still be kept cached to help create the next flow more efficiently. Some systems need the resources to be more flexible across flow creation and destruction. After peak time, with millions of flows destroyed, such a system would prefer the resources to be reclaimed completely, with no cache kept, so that they can be allocated and used by other components. Such a system is less sensitive to the flow insertion rate and cares more about the resources. Both the DPDK mlx5 PMD and the low-level component rdma-core allow the flow resources to be configured as cached or not, but no API or parameter is exposed to the user to configure the flow resources cache mode. Introduce a new PMD devarg to let the user configure it. This commit adds a new "reclaim_mem_mode" devarg to configure whether the cached resources of destroyed flows should be kept. Three modes can be chosen: 1. 0(none). The flow resources are cached as usual, which helps the flow insertion rate. 2. 1(light). Only the DPDK PMD level resource reclaim is enabled. 3. 2(aggressive). Both the DPDK PMD level and the rdma-core low level are configured in reclaim mode. With these three modes, the user can configure the resources cache mode at different levels. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-06-03 17:19:26 +02:00
Suanming Mou	33860cfab6	net/mlx5: fix interrupt installation timing Currently, the DevX counter query works asynchronously with Devx interrupt handler return the query result. When port closes, the interrupt handler will be uninstalled and the Devx comp obj will also be destroyed. Meanwhile the query is still not cancelled. In this case, counter query may use the invalid Devx comp which has been destroyed, and query failure with invalid FD will be reported. Adjust the shared interrupt install and uninstall timing to make the counter asynchronous query stop before interrupt uninstall. Fixes: `f15db67df0` ("net/mlx5: accelerate DV flow counter query") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-02 16:06:24 +02:00
Suanming Mou	2786b7bf90	net/mlx5: fix secondary process resources release When secondary process starts, it will allocate its own process private data, and also does remap to UAR register of the Tx queue. Once the secondary process exits, these resources should be released accordingly. And the shared resources owned by primary should not be touched. Currently, once one port in the secondary process spawn failed, all the other spawned ports will also be released during process exits. However, the mlx5_dev_close() function does not add the cases for secondary process, it means call the mlx5_dev_close() function directly in secondary process releases the resources it should not touch. Add the case for secondary process release to its own resources in mlx5_dev_close() function to help it quits gracefully. Fixes: `942d13e6e7` ("net/mlx5: fix sharing context destroy order") Fixes: `3a8207423a` ("net/mlx5: close all ports on remove") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-02 16:06:24 +02:00
Muhammad Bilal	5a448a55b4	fix same typo in multiple places Removed the typing error in doc/guides/eventdevs/index.rst, drivers/net/mlx5/mlx5.c and in lib/librte_vhost/rte_vhost.h Bugzilla ID: 477 Fixes: `0857b94211` ("doc: add event device and software eventdev") Fixes: `039253166a` ("vhost: add device op when notification to guest is sent") Fixes: `ad74bc6195` ("net/mlx5: support multiport IB device during probing") Cc: stable@dpdk.org Signed-off-by: Muhammad Bilal <m.bilal@emumba.com>	2020-05-19 15:55:57 +02:00
Bing Zhao	25a59a3076	net/mlx5: fix doorbell bitmap management offsets The doorbell record is organized with page and bitmap. When some new doorbell needs to be associated with a queue, the bit will be set in the bitmap to indicate the corresponding doorbell occupied. A counter is used to record the number of doorbell occupied to speed up the searching. If the number reaches the maximal value of a pre-defined number of a page, a new page will be allocated. If not, then the bitmap will be checked to find a free one. The LSHIFT and OR (AND NOT) operations are used to update the bitmap of a page. But 1 will be treated as a signed integer when compiling. When the shift number is 31, the shifted value will be considered as negative. Then a wrong extension will be done when setting it to a 64-bits variable. All the upper 32-bits will be set to 1 by such extension. Then a wrong offset value will be calculated because of this. The next 64 bits will be also treated as the bitmap and get corrupted through the bit set operation. The immediate value 1 needs to be used as 64 bits width explicitly. Fixes: `21cae8580f` ("net/mlx5: allocate door-bells via DevX") Cc: stable@dpdk.org Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-05-18 20:35:57 +02:00
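The sign-extension problem described in the commit above can be illustrated with a minimal sketch. The helper names and the 64-bit word width are assumptions made for this illustration and are not taken from the mlx5 sources. The sketch models the value that `1 << 31` typically produces as a 32-bit signed int (INT32_MIN), shows how widening it to 64 bits sets the upper 32 bits, and contrasts that with a mask built from an explicitly 64-bit literal.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one bitmap word is 64 bits wide. */
#define BITS_PER_WORD 64u

/* The value that `1 << 31` typically yields as a 32-bit signed int. */
static const int32_t signed_bit31 = INT32_MIN;

/* Widening the negative 32-bit value sign-extends it:
 * the upper 32 bits of the 64-bit result are all set to 1. */
static uint64_t widened_wrong(void)
{
	return (uint64_t)signed_bit31;
}

/* Correct mask: the literal is 64 bits wide before the shift. */
static uint64_t bit_mask(unsigned int bit)
{
	assert(bit < BITS_PER_WORD);
	return UINT64_C(1) << bit;
}

/* Mark a doorbell slot as occupied (OR with the mask). */
static uint64_t bitmap_set(uint64_t word, unsigned int bit)
{
	return word | bit_mask(bit);
}

/* Mark a doorbell slot as free (AND with the inverted mask). */
static uint64_t bitmap_clear(uint64_t word, unsigned int bit)
{
	return word & ~bit_mask(bit);
}
```

With the 64-bit literal, setting bit 31 touches only that bit, so neighbouring bitmap state is left intact.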
Matan Azrad	5af61440dd	net/mlx5: fix flow counter container resize The design of the counter container resize used a double buffer algorithm in order to synchronize between the query thread and the control thread. When the control thread detected the need for a resize, it created a new bigger buffer for the counter pools in a new container and changed the container index atomically. In case the query thread had not detected the previous resize before the need for a new one was detected by the control thread, the control thread returned EAGAIN to the flow creation API that used a COUNT action. The rte_flow API doesn't allow unblocked commands and doesn't expect to get the EAGAIN error type. So, when a lot of flows were created between 2 different periodic queries, 2 different resizes might be attempted, which caused the EAGAIN error. This behavior may cause flow creations to fail. Change the synchronization to use a lock instead of the double buffer algorithm. The critical section of this lock is very small, so the flow insertion rate should not decrease. Fixes: `ebbac312e4` ("net/mlx5: resize a full counter container") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-05-18 20:35:57 +02:00
Dong Zhou	fa2d01c87d	net/mlx5: support flow aging Currently, there is no flow aging check and age-out event callback mechanism for mlx5 driver, this patch implements it. It's included: - Splitting the current counter container to aged or no-aged container since reducing memory consumption. Aged container will allocate extra memory to save the aging parameter from user configuration. - Aging check and age-out event callback mechanism based on current counter. When a flow be checked aged-out, RTE_ETH_EVENT_FLOW_AGED event will be triggered to applications. - Implement the new API: rte_flow_get_aged_flows, applications can use this API to get aged flows. Signed-off-by: Dong Zhou <dongz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-05-05 15:54:27 +02:00
Dong Zhou	8d93c830e4	net/mlx5: modify ext-counter memory allocation Currently, the counter pool needs 512 ext-counter memory entries for non-batch counters. They are allocated separately in one piece, behind the 512 basic-counter memory entries. This makes it hard to get the ext-counter pointer from the corresponding basic-counter pointer. It also makes it hard to add other potential types of counter memory. So, allocate each ext-counter and basic-counter together, as a single piece of memory. The same will apply to further additional types of counter memory. In this case, one piece of memory contains all types of memory for one counter, and each type of memory is easy to get by using an offset. Signed-off-by: Dong Zhou <dongz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-05-05 15:54:27 +02:00
Asaf Penso	9b080425e3	net/mlx5: fix assert in doorbell lookup The asserts makes sure that 'i' doesn't exceed the expected value. This to prevent an out of bound access to dbr_bitmap. The current location of the assert protects the assignment of dbr_bitmap, but not the access to it. Moved the assert to the correct place, to protect both cases. Also, used an existing define for the assert. Fixes: `21cae8580f` ("net/mlx5: allocate door-bells via DevX") Cc: stable@dpdk.org Signed-off-by: Asaf Penso <asafp@mellanox.com> Reviewed-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-05-05 15:54:26 +02:00
Suanming Mou	0136df99a9	net/mlx5: reorganize flow API structure Currently, the rte flow structure is not fully aligned and has some bits wasted. The members can be optimized and reorganized to save memory. 1. The drv_type uses only limited bits, change the type to 2 bits what it needs. 2. Align the hairpin_flow_id, drv_type, fdir, copy_applied to 32 bits. As hairpin never uses the full 32 bits. 3. __rte_packed helps tight up the structure memory layout. The optimization totally helps save 14 bytes for the structure. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	ab612adc1e	net/mlx5: allocate flow API from indexed pool This commit allocates rte flow from indexed memory pool. Allocate rte flow memory from indexed memory pool helps save more than MALLOC_ELEM_OVERHEAD bytes memory from rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	90e6053a19	net/mlx5: convert mark copy resource to indexed Allocate mark copy resource from indexed pool helps rte flow saves the 4 bytes index instead of 8 bytes pointer. For mark copy resource itself, it helps save MALLOC_ELEM_OVERHEAD bytes from rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	8638e2b076	net/mlx5: allocate meter from indexed pool This patch allocate the meter object memory from indexed memory pool which will help to save the MALLOC_ELEM_OVERHEAD memory taken by rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	b88341ca35	net/mlx5: convert flow dev handle to indexed This commit converts flow dev handle to indexed. Change the mlx5 flow handle from pointer to uint32_t saves memory for flow. With million flow, it saves several MBytes memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	772dc0eb83	net/mlx5: convert hrxq to indexed This commit converts hrxq to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	7ac99475ce	net/mlx5: convert jump resource to indexed This commit convert jump resource to indexed. The table data struct is allocated from indexed memory. As it is add in the hash list, the pointer is still used for hash list search. The index is added to the table struct, and the pointer in flow handle is decrease to uint32_t type. For flow without jump flows, it saves 4 bytes memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	f3faf9ea11	net/mlx5: convert port id action to indexed This commit converts port id action to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	5f1142692a	net/mlx5: convert tag resource to indexed This commit convert tag resource to indexed. As tag resources are add in the hash list, to avoid introduce performance issue and keep the hash list, only the tag resource memory is allocated from indexed memory. The resources is still added to the hash list. Add four bytes index in the tag resource struct and change the tag resources in the flow handle from pointer to uint32_t seems be no benefit for tag resource, but it saves memory for flows without tag action. And also for sub flows share one tag action resource. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	8acf8ac9b7	net/mlx5: convert push VLAN resource to indexed This commit converts the push VLAN resource to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	014d1cbe51	net/mlx5: convert encap/decap resource to indexed This commit converts the flow encap/decap resource to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Vu Pham	b8dc6b0e29	common/mlx5: refactor memory management Refactor common memory btree and cache management to common driver. Replace some input parameters of MR APIs to more common data structure like PD, port_id, share_cache,... so that multiple PMD drivers can use those MR APIs. Modify mlx5 net pmd driver to use MR management APIs from common driver. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:08 +02:00
Vu Pham	a4de9586ac	common/mlx5: refactor IPC handling from net driver Refactor common multi-process handling codes from net PMD to common driver. Using tuple mp_id{name, port_id} as standard input parameter for all multi-process IPC APIs instead of using rte_eth_dev. Modify net PMD to use multi-process APIs from common driver. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:08 +02:00
Alexander Kozyrev	ecb160456a	net/mlx5: add device parameter for MPRQ stride size Define a device parameter to configure log 2 of a stride size for MPRQ - mprq_log_stride_size. User is able to specify a stride size in a range allowed by an underlying hardware. The default stride size is defined as 2048 bytes to encompass most commonly used packet sizes in the Internet (MTU 1518 and less) and will be used in case a maximum configured packet size cannot fit into the largest possible stride size. Otherwise a stride size is set to a large enough value to encompass a whole packet. Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-04-21 13:57:08 +02:00
Suanming Mou	826b8a8732	net/mlx5: split flow counter struct Currently, the counter struct saves both the members used by batch counters and none batch counters. The members which are only used by none batch counters cost 16 bytes extra memory for batch counters. As normally there will be limited none batch counters, mix the none batch counter and batch counter members becomes quite expensive for batch counter. If 1 million batch counters are created, it means 16 MB memory which will not be used by the batch counters are allocated. Split the mlx5_flow_counter struct for batch and none batch counters helps save the memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-04-21 13:57:07 +02:00
Bing Zhao	1ad9a3d09f	net/mlx5: introduce buffer size parameter for hairpin When creating a hairpin queue, the total data size and the maximal number of packets are interrelated. The differ is the stride size. Larger buffer size means big packet like jumbo could be supported, but in the meanwhile, it will introduce more cache misses and have a side effect on the performance. Now a new device parameter "hp_buf_log_sz" is introduced for applications to set the total data buffer size (the logarithm value). Then the maximal number of packets will also be calculated automatically by this value. Applications could also change this value to a larger one in order to support larger packets in hairpin case. A smaller value will be beneficial for memory consumption. If it is not set, the default value will be used. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:05 +02:00
Bing Zhao	e7bfa3596a	net/mlx5: separate the flow handle resource Only the members of flow handle structure will be used when trying to destroy a flow. Other members of mlx5 device flow resource will only be used for flow creating, and they could be reused for different flows. So only the device flow handle structure needs to be saved for further usage. This could be separated from the whole mlx5 device flow and stored with a list for each rte flow. Other members will be pre-allocated with an array, and an index will be used to help to apply each device flow to the hardware. The flow handle sizes of Verbs and DV mode will be different, and some calculation could be done before allocating a verbs handle. Then the total memory consumption will less for Verbs when there is no inbox driver being used. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-04-21 13:57:05 +02:00
Bing Zhao	8db7e3b698	net/mlx5: change operations for non-cached flows When stopping a mlx5 device, all the flows inserted will be flushed since they are with non-cached mode. And no more action will be done for these flows in the device closing stage. If the device restarts after stopped, no flow with non-cached mode will be re-inserted. The flush operation through rte interface will remain the same, and all the flows will be flushed actively. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-04-21 13:57:05 +02:00
Thomas Monjalon	ee76bddc76	doc: fix naming of Mellanox devices The devices of the family ConnectX may have two letters as suffix. Such suffix is preceded with a space and the second x is lowercase: - ConnectX-4 Lx - ConnectX-5 Ex - ConnectX-6 Dx Uppercase of the device family name BlueField is also fixed. The lists of supported devices are fixed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2020-02-25 15:55:54 +01:00
Raslan Darawsheh	58b4a2b13e	net/mlx5: add BlueField-2 device ID This adds new device id to the list of Mellanox devices that runs mlx5 PMD. - BlueField-2 integrated ConnectX-6 Dx network controller This device is not ready yet, it is in development stage. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-02-19 13:51:06 +01:00
Thomas Monjalon	43e34a229d	build: remove redundant config include The header file rte_config.h is always included by make or meson. If required in an exported API header file, it must be included in the public header file for external applications. In the internal files, explicit include of rte_config.h is useless, and can be removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Matan Azrad <matan@mellanox.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2020-02-11 16:50:59 +01:00
Michael Baum	4f8e6befe7	net/mlx5: fix memory regions release deadlock The mlx5 PMD maintains the list of devices for which the memory operation callback routines must be invoked to keep the device MRs (MR is the entity backing the hardware DMA transactions) consistent with the mapped memory. Each device context in the list is protected with a dedicated per-device lock, which might be taken inside the callback routine. When a device is closing, the PMD frees all MRs by calling mlx5_mr_release(), which might call rte_free() under the taken device lock. If this rte_free() call triggers freeing of the entire memory segment, it in turn invokes the callback routine, and the attempt to take the lock inside it causes the deadlock. The patch proposes to remove the device from the callback list first and then call mlx5_mr_release() and free the remaining device MRs explicitly. Fixes: `0e3d0525b2` ("net/mlx5: fix memory event callback list") Cc: stable@dpdk.org Signed-off-by: Michael Baum <michaelba@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-02-05 11:15:53 +01:00
Alexander Kozyrev	26f1bae837	net/mlx5: add Rx/Tx burst mode info Get a burst mode information for Rx/Tx queues in mlx5. Provide callback functions to show this information in a "show rxq info" and "show txq info" output. Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:21 +01:00
Alexander Kozyrev	8e46d4e18f	common/mlx5: improve assert control Use the MLX5_ASSERT macros instead of the standard assert clause. Depends on the RTE_LIBRTE_MLX5_DEBUG configuration option to define it. If RTE_LIBRTE_MLX5_DEBUG is enabled MLX5_ASSERT is equal to RTE_VERIFY to bypass the global CONFIG_RTE_ENABLE_ASSERT option. If RTE_LIBRTE_MLX5_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT can still make this assert active by calling RTE_VERIFY inside RTE_ASSERT. Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:21 +01:00
Alexander Kozyrev	0afacb04f5	common/mlx5: remove NDEBUG Use the RTE_LIBRTE_MLX5_DEBUG configuration flag to get rid of dependency on the NDEBUG definition. This is a preparation step to switch from standard assert clauses to DPDK RTE_ASSERT ones in MLX5 driver. Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:21 +01:00
Ori Kam	efa79e68c8	net/mlx5: support fine grain dynamic flag The inline feature is designed to save PCI bandwidth by copying some of the data to the wqe. This feature if enabled works for all packets. In some cases when using external memory, the PCI bandwidth is not relevant since the memory can be accessed by other means. This commit introduce the ability to control the inline with mbuf granularity. In order to use this feature the application should register the field name, and restart the port. Signed-off-by: Ori Kam <orika@mellanox.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-02-05 09:51:20 +01:00
Matan Azrad	f22442cb5d	net/mlx5: reduce Netlink commands dependencies As an arrangement for moving the Netlink commands to the common library, reduce the net/mlx5 dependencies. Replace ethdev class command parameters. Improve the Netlink sequence number mechanism to be controlled by the mlx5 Netlink mechanism. Move mlx5_nl_check_switch_info to mlx5_nl.c since it is the only one which uses it. Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Matan Azrad	d768f324d6	net/mlx5: select driver by class device argument There might be a case that one Mellanox device can be probed by multiple mlx5 drivers. One case is that any mlx5 vDPA device can be probed by both net/mlx5 and vdpa/mlx5. Add a new mlx5 common API to get the requested driver by devargs: class=[net/vdpa]. Skip net/mlx5 PMD probing while the device is selected to be probed by the vDPA driver. Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Matan Azrad	93e3098296	common/mlx5: share PCI device detection Move PCI detection by IB device from mlx5 PMD to the common code. Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Matan Azrad	7b4f1e6bd3	common/mlx5: introduce common library A new Mellanox vdpa PMD will be added to support vdpa operations by Mellanox adapters. This vdpa PMD design includes mlx5_glue and mlx5_devx operations and large parts of them are shared with the net/mlx5 PMD. Create a new common library in drivers/common for mlx5 PMDs. Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5 common library in drivers/common. The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c, mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to drivers/common/mlx5. Share the log mechanism macros. Separate also the log mechanism to allow different log level control to the common library. Build files and version files are adjusted accordingly. Include lines are adjusted accordingly. Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Matan Azrad	543e218fa5	net/mlx5: separate DevX commands interface The DevX commands interface is included in the mlx5.h file with a lot of other PMD interfaces. As an arrangement to make the DevX commands shared with different PMDs, this patch moves the DevX interface to a new file called mlx5_devx_cmds.h. Also remove shared device structure dependency on DevX commands. Replace the DevX commands log mechanism from the mlx5 driver log mechanism to the EAL log mechanism. Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Suanming Mou	792e749e92	net/mlx5: fix register usage in meter Flow with meter will split to three subflows, the prefix subflow with meter action do the color, the meter subflow filter the packets, the suffix subflow do all the left actions for packets pass the filter. Both the color and the subflow match between prefix and suffix use the register to store the tag. For some of the NICs with meter color register share capability, it only uses 8 LSB of the register for color, the left 24 MSB can be used for flow id match between meter prefix subflow and suffix subflow. Currently, one entire register is allocated for flow matching which causes the NICs with limited registers don't have enough register for other matching. Add the meter color share capability checking to fix lacking of registers issue. Fixes: `9ea9b049a9` ("net/mlx5: split meter flow") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
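The register sharing described in the commit above (8 LSB for the meter color, the remaining 24 MSB for the flow id) can be sketched as plain bit packing. This is a minimal illustration only: the macro and function names are hypothetical and do not come from the mlx5 sources. The maximum flow id follows from the 24 bits left over, which is also why a limit on allocated ids is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the 8/24 split is from the commit text. */
#define COLOR_BITS  8u
#define COLOR_MASK  ((UINT32_C(1) << COLOR_BITS) - 1)
#define FLOW_ID_MAX ((UINT32_C(1) << (32u - COLOR_BITS)) - 1)

/* Pack a flow id into the upper 24 bits and the color into the lower 8. */
static uint32_t reg_pack(uint32_t flow_id, uint8_t color)
{
	assert(flow_id <= FLOW_ID_MAX); /* an id must fit the remaining bits */
	return (flow_id << COLOR_BITS) | color;
}

/* Extract the meter color from the low 8 bits. */
static uint8_t reg_color(uint32_t reg)
{
	return (uint8_t)(reg & COLOR_MASK);
}

/* Extract the flow id from the high 24 bits. */
static uint32_t reg_flow_id(uint32_t reg)
{
	return reg >> COLOR_BITS;
}
```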
Suanming Mou	30a3687d99	net/mlx5: support maximum flow id allocation The id allocated is for the register unique id match. Some registers may not use the full 32 bits. Add the maximum id to avoid allocate id over the register restriction. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
Xueming Li	e6cdc54cc0	net/mlx5: add socket server for external tools Add pmd unix socket server to enable external tool applications to trigger flow dump. Socket path: /var/tmp/dpdk_mlx5_<pid> Socket format: io_raw: port_id of uint16 file: file descriptor of int Signed-off-by: Xueming Li <xuemingl@mellanox.com> Signed-off-by: Xiaoyu Min <jackmin@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-01-17 19:59:19 +01:00
Xiaoyu Min	6801116688	net/mlx5: fix multiple flow table hash list The eth devices which share one ibv device only need one hash list of flow tables. Currently, the flow table hash list is created per eth device regardless of whether the devices share one ibv device or not. If the devices share one ibv device, the previously created hash list becomes dangling because the pointer to it (sh->flow_tbls) is overwritten by the later created hash list. To fix this, just don't create the hash list if it is already created. Fixes: `54534725d2` ("net/mlx5: fix flow table hash list conversion") Cc: stable@dpdk.org Reported-by: Zhike Wang <wangzhike@jd.com> Signed-off-by: Xiaoyu Min <jackmin@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-01-17 19:45:23 +01:00
Suanming Mou	4acb96fd52	net/mlx5: add GENEVE in tunnel offloads capabilities GENEVE is available in tunnel offloads. Add it as the default support option. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>	2019-11-26 18:22:27 +01:00
Viacheslav Ovsiienko	82e75f8323	net/mlx5: fix legacy multi-packet Tx descriptors ConnectX-4 Lx supports multiple packets within a single Tx descriptor. This feature is named "Legacy Multi-Packet Write" and imposes a lot of limitations: - no ACLs, it means no NIC Tx Flows are supported and Tx metadata become meaningless - the required minimal inline data must be zero - no SR-IOV, it means no support in E-Switch configurations - no priority and DSCP forcing - no VLAN insertion - no TSO - all packets within MPW session must have the same size This legacy MPW feature is mainly intended for test purposes. To explicitly engage the feature on ConnectX-4 Lx the devargs should be specified: - txq_mpw_en=1 This feature was dropped in 19.08, this patch restores it. Fixes: `18a1c20044` ("net/mlx5: implement Tx burst template") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-26 18:22:27 +01:00
Dekel Peled	0adf23adcb	net/mlx5: fix flow engine choice Commit in fixes line sets the DV (Direct Verbs) flow engine as default. Newer versions of DV flow engine use the DR (Direct Rules) features. DR is supported from RDMA Core library version rdma-core-24.0. This cause failure to start port when using older rdma-core version, without DR support. This patch selects DV flow engine if rdma-core version is v24.0 or higher. Verbs flow engine is selected otherwise. Fixes: `cd4569d2bf` ("net/mlx5: change default flow engine to DV") Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Ori Kam <orika@mellanox.com>	2019-11-26 18:05:15 +01:00
Matan Azrad	1ef4cdef26	net/mlx5: fix flow tag hash list conversion When DR is not supported and DV is supported, tag action still can be used by the metadata feature. Wrongly, the tag hash list was not created what caused failure in metadata action creation. Create the tag hash list for each DV case. Fixes: `860897d289` ("net/mlx5: reorganize flow tables with hash list") Signed-off-by: Matan Azrad <matan@mellanox.com>	2019-11-26 18:05:15 +01:00
Viacheslav Ovsiienko	f078ceb6ae	net/mlx5: fix Tx doorbell write memory barrier As the result of testing it was found that some hosts have the performance penalty imposed by required write memory barrier after doorbell writing. Before 19.08 release there was some heuristics to decide whether write memory barrier should be performed. For the bursts of recommended size (or multiple) it was supposed there were some extra ongoing packets in the next burst and write memory barrier may be skipped (supposed to be performed in the next burst, at least after descriptor writing). This patch restores that behaviour, the devargs tx_db_nc=2 must be specified to engage this performance tuning feature. Fixes: `8409a28573` ("net/mlx5: control transmit doorbell register mapping") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-20 17:36:06 +01:00
Dekel Peled	cd4569d2bf	net/mlx5: change default flow engine to DV The default flow engine is Verbs flow engine, for legacy reasons. This patch changes the default to DV flow engine (dv_flow_en = 1). Documentation is updated accordingly. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-20 17:36:06 +01:00
Viacheslav Ovsiienko	85c4bcbcc5	net/mlx5: fix vport index in port action The rdma_core routine mlx5dv_dr_create_flow_action_dest_vport() requires the vport id parameter to create port action. The register c[0] value was used to deduce the port id value and it fails in bonding configuration. The correct way is to apply vport_num value queried from the rdma_core library. Fixes: `f07341e7ae` ("net/mlx5: update source and destination vport translations") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-20 17:36:06 +01:00
Matan Azrad	54534725d2	net/mlx5: fix flow table hash list conversion For the case when DR is not supported and DV is supported: multi-tables feature is off. In this case, only table 0 is supported. Table 0 structure wrongly was not created what prevented any matcher object to be created and even caused crashes. Create the table hash list in DV case too. Create table zero empty structure for each domain when DR is not supported. Allow NULL DR internal table object to be used. Fixes: `860897d289` ("net/mlx5: reorganize flow tables with hash list") Signed-off-by: Matan Azrad <matan@mellanox.com>	2019-11-20 17:36:06 +01:00
Viacheslav Ovsiienko	06f78b5ebc	net/mlx5: fix environment variable recovery The state of the environment variable MLX5_SHUT_UP_BF was not restored correctly if no tx_db_nc devarg was specified. Fixes: `8409a28573` ("net/mlx5: control transmit doorbell register mapping") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-20 17:36:05 +01:00
Bing Zhao	e484e40323	net/mlx5: optimize tag traversal with hash list A tag action for flow mark/flag can be reused by different flows. When creating a new flow with a mark, the existing tag resources are traversed to check whether the action has already been created. If only one linked list is used, the search rate drops significantly as the number of tag actions increases. Using a hash list table speeds up the search, and memory consumption stays small when only a small number of tag action resources are created (compared to other hash table implementations). The list heads array size could be optimized with an extendable hash table in the future. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-11 14:23:02 +01:00
Bing Zhao	860897d289	net/mlx5: reorganize flow tables with hash list In the current flow tables organization, arrays are used. This is fast for searching, creating related object that will be used in flow creation. But it introduces some limitation to the table index. Then we can reorganize the flow tables information with hash list. When using hash list, there is no need to maintain three arrays for NIC TX, RX and FDB tables object information. This attribute could be used together with the table ID to generate a 64-bits key that is unique for the hash list insertion, lookup and deletion. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	8409a28573	net/mlx5: control transmit doorbell register mapping The rdma core library can map the doorbell register in two ways, depending on the environment variable "MLX5_SHUT_UP_BF": - as regular cached memory, when the variable is either missing or set to zero. This type of mapping may cause significant doorbell register writing latency and requires an explicit memory write barrier to mitigate this issue and prevent write combining. - as non-cached memory, when the variable is present and set to a value other than "0". This type of mapping may impact performance under heavy load, but the explicit write memory barrier is not required and it may improve core performance. A new devarg "tx_db_nc" is introduced. If this parameter is set to zero, the doorbell register is forced to be mapped to cached memory and requires an explicit memory barrier after writing to it. If "tx_db_nc" is set to a non-zero value the doorbell will be mapped as non-cached memory, not requiring the memory barrier. If "tx_db_nc" is missing, the behaviour is defined by the presence of "MLX5_SHUT_UP_BF" in the environment. If the variable is missing, the default value zero is set for ARM64 hosts and one for others. At run time the code checks the mapping type and issues the memory barrier after writing to the Tx doorbell register if it is needed. The mapping type is extracted directly from the uar_mmap_offset field in the queue properties. Fixes: `18a1c20044` ("net/mlx5: implement Tx burst template") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	02e7646818	net/mlx5: clean meter resources When the port is closed or the program exits ungracefully, the meter rules should be flushed after the flows are destroyed. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	3f373f3523	net/mlx5: support basic meter operations This commit adds the basic meter operations for meter create and destroy. New internal functions in the rte_mtr_ops callback: 1. create() 2. destroy() The create() callback creates the corresponding flow rules on the meter table. The destroy() callback destroys the flow rules on the meter table. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	3bd26b23ce	net/mlx5: support meter profile operations This commit adds support for the meter profile add and delete operations. New internal functions in the rte_mtr_ops callback: 1. meter_profile_add() 2. meter_profile_delete() Only the RTE_MTR_SRTCM_RFC2697 algorithm is supported and can be added. Adding any other algorithm reports an error. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	27efd5dead	net/mlx5: allocate flow meter registers The meter needs a metadata REG_C to carry the color match between the prefix flow and the meter flow. As the user-defined and metadata features both use REG_C in the suffix flow, the color match register used by the meter does not affect register use in the later sub flow. Another case is that a tag is added before the meter flow. In this case, the meter should not touch the register the tag action is using. To avoid that case, the meter should reserve the REG_C's used by the user-defined MLX5_APP_TAG. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	6bc327b94f	net/mlx5: fill meter capabilities using DevX This commit adds support for filling and getting the meter capabilities from DevX. Supported items: 1. The srTCM color bind mode. 2. Meter share with multiple flows. 3. Action drop. The color aware mode and multiple meter chaining in a flow are not supported. New internal function in the rte_mtr_ops callback: 1. capabilities_get() Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Suanming Mou	d740eb5018	net/mlx5: add meter operation callback Add the new mlx5_flow_meter.c file for metering support. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	dd3c774f6f	net/mlx5: add metadata register copy table While reg_c[meta] can be copied to reg_b simply by modify-header action (it is supported by hardware), it is not possible to copy reg_c[mark] to the STE flow_tag as flow_tag is not a metadata register and this is not supported by hardware. Instead, it should be manually set by a flow per each unique MARK ID. For this purpose, there should be a dedicated flow table - RX_CP_TBL and all the Rx flow should pass by the table to properly copy values from the register to flow tag field. And for each MARK action, a copy flow should be added to RX_CP_TBL according to the MARK ID like: (if reg_c[mark] == mark_id), flow_tag := mark_id / reg_b := reg_c[meta] / jump to RX_ACT_TBL For SET_META action, there can be only one default flow like: reg_b := reg_c[meta] / jump to RX_ACT_TBL Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	71e254bc02	net/mlx5: split Rx flows to provide metadata copy Values set by MARK and SET_META actions should be carried over to the VF representor in case of flow miss on Tx path. However, as not all metadata registers are preserved across the different domains (NIC Rx/Tx and E-Switch FDB), as a workaround, those values should be carried by reg_c's which are preserved across domains and copied to STE flow_tag (MARK) and reg_b (META) fields in the last stage of flow steering, in order to scatter those values to flow_tag and flow_table_metadata of CQE. While reg_c[meta] can be copied to reg_b simply by modify-header action (it is supported by hardware), it is not possible to copy reg_c[mark] to the STE flow_tag as flow_tag is not a metadata register and this is not supported by hardware. Instead, it should be manually set by a flow per MARK ID. For this purpose, there should be a dedicated flow table - RX_CP_TBL and all the Rx flow should pass by the table to properly copy values. As the last action of Rx flow steering must be a terminal action such as QUEUE, RSS or DROP, if a user flow has Q/RSS action, the flow must be split in order to pass by the RX_CP_TBL. And the remained Q/RSS action will be performed by another dedicated action table - RX_ACT_TBL. For example, for an ingress flow: pattern, actions_having_QRSS it must be split into two flows. The first one is, pattern, actions_except_QRSS / copy (reg_c[2] := flow_id) / jump to RX_CP_TBL and the second one in RX_ACT_TBL. (if reg_c[2] == flow_id), action_QRSS where flow_id is uniquely allocated and managed identifier. This patch implements the Rx flow splitting and build the RX_ACT_TBL. Also, per each egress flow on NIC Tx, a copy action (reg_c[]= reg_a) should be added in order to transfer metadata from WQE. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	3913937151	net/mlx5: adjust shared register according to mask The metadata register reg_c[0] might be used by kernel or firmware for their internal purposes. The actual used mask can be queried from the kernel. The remaining bits can be used by PMD to provide META or MARK feature. The code queries the mask of reg_c[0] and adjust the resource usage dynamically. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	2d241515eb	net/mlx5: add devarg for extensive metadata support The PMD parameter dv_xmeta_en is added to control extensive metadata support. A nonzero value enables extensive flow metadata support if the device is capable and the driver supports it. This can enable extensive support of the MARK and META items of rte_flow. The newly introduced SET_TAG and SET_META actions do not depend on the dv_xmeta_en parameter, because there is no compatibility issue for new entities. The dv_xmeta_en parameter is disabled by default. There are some possible configurations, depending on the parameter value: - 0, this is the default value and defines the legacy mode: the MARK and META related actions and items operate only within the NIC Tx and NIC Rx steering domains, and no MARK and META information crosses the domain boundaries. The MARK item is 24 bits wide, the META item is 32 bits wide. - 1, this engages extensive metadata mode: the MARK and META related actions and items operate within all supported steering domains, including FDB, and MARK and META information may cross the domain boundaries. The MARK item is 24 bits wide, the META item width depends on kernel and firmware configurations and might be 0, 16 or 32 bits. Within the NIC Tx domain the META data width is 32 bits for compatibility; the actual width of data transferred to the FDB domain depends on kernel configuration and may vary. The actual supported width can be retrieved at runtime by a series of rte_flow_validate() trials. - 2, this engages extensive metadata mode: the MARK and META related actions and items operate within all supported steering domains, including FDB, and MARK and META information may cross the domain boundaries. The META item is 32 bits wide, the MARK item width depends on kernel and firmware configurations and might be 0, 16 or 24 bits. The actual supported width can be retrieved at runtime by a series of rte_flow_validate() trials. If there is no E-Switch configuration the dv_xmeta_en parameter is ignored and the device is configured to operate in legacy mode (0). Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko	5e61bcdd24	net/mlx5: check metadata registers availability The metadata registers reg_c provide support for TAG and SET_TAG features. Although 8 registers are available on current mlx5 devices, some of them can be reserved. Availability should be queried by the iterative trial-and-error implemented in the mlx5_flow_discover_mreg_c() routine. If reg_c is available, it can be inferred that extensive metadata support is possible, e.g. the metadata register copy action, support for 16 modify header actions (instead of 8 by default), preserving registers across different domains (FDB and NIC) and so on. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-11-11 14:23:01 +01:00
Raslan Darawsheh	5fc66630be	net/mlx5: add ConnectX6-DX device ID This adds new device IDs to the list of Mellanox devices that run the mlx5 PMD. - ConnectX-6DX device ID - ConnectX-6DX SRIOV device ID Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:05 +01:00
Dekel Peled	06fa6988d8	net/mlx5: remove redundant new line in logs DRV_LOG macro is used to print log messages, one per line. In several locations this macro is used with redundant '\n' character at the end of the log message, causing blank lines between log lines. This patch removes the '\n' character where it is redundant. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	d85c7b5ea5	net/mlx5: split hairpin flows Since the encap action is not supported in RX, we need to split the hairpin flow into RX and TX. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	830d209161	net/mlx5: add ID generation When splitting flows, for example in hairpin / metering, there is a need to combine the flows. This is done using an ID. This commit introduces a simple way to generate such IDs. A bitmap was not used because its release and allocation are O(n), while in the chosen approach allocation and release are O(1). Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	b6b3bf86bd	net/mlx5: get hairpin capabilities This commit adds the hairpin get capabilities function. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	ae18a1ae96	net/mlx5: support Tx hairpin queues This commit adds the support for creating Tx hairpin queues. Hairpin queue is a queue that is created using DevX and only used by the HW. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	894c4a8e5a	net/mlx5: prepare Tx queues to have different types Currently all Tx queues are created using Verbs. This commit modifies the naming so that it does not include Verbs, since the next commit introduces a new type (hairpin). Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Ori Kam	e79c9be915	net/mlx5: support Rx hairpin queues This commit adds the support for creating Rx hairpin queues. Hairpin queue is a queue that is created using DevX and only used by the HW. This results in that all the data part of the RQ is not being used. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Dekel Peled	2eb5dce8c0	net/mlx5: fix LRO dependency to include DV flow Rx queue for LRO is created using DevX. Flows created on this queue must use the DV flow engine. This patch adds check of dv_flow_en=1 when configuring LRO support on device spawn. Documentation is updated accordingly. Fixes: `175f1c21d0` ("net/mlx5: check conditions to enable LRO") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-11-08 23:15:04 +01:00
Matan Azrad	2324206337	net/mlx5: fix DevX event registration timing The DevX counter management triggers an asynchronous event to get back the new counter values from the HW. The counter management doesn't trigger 2 parallel events for the same pool, hence the pool cannot be updated again while waiting for the event. When the port is stopped, the DevX event mechanism was wrongly destroyed, which left all the waiting pools in the waiting state forever. As a result, the counters of the stuck pools were never updated again. Separate the DevX interrupt installation from the dev installation and remove the DevX interrupt unregistration\registration from the stop\start operations. Now, the DevX interrupt should be installed in probe and uninstalled in close. Cc: stable@dpdk.org Fixes: `f15db67df0` ("net/mlx5: accelerate DV flow counter query") Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-10-23 16:43:10 +02:00
Viacheslav Ovsiienko	cc8627bc6d	net/mlx5: fix direct call to rdma-core library The routine mlx5dv_query_devx_port() was called directly instead of using the mlx5 glue thunk. Fixes: `d5c06b1b10` ("net/mlx5: query vport index match mode and parameters") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-10-08 12:14:32 +02:00
Viacheslav Ovsiienko	fbc8341218	net/mlx5: fix device scan within switch domain In a LAG configuration the devices in the same switch domain might be spawned on the basis of different PCI devices, so we should check whether all devices backed by the mlx5 PMD belong to the specified switch domain. When new devices are being created it is not possible to detect whether the sibling devices created in the current probe() loop belong to the driver, because the driver field is not filled yet (it is filled when the current probe() returns success). This patch updates the device scanning, allowing an extra match on the current backing PCI device, which is being used to create siblings. Fixes: `f7e95215ac` ("net/mlx5: extend switch domain searching range") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-10-08 12:14:32 +02:00
Viacheslav Ovsiienko	92d5dd4834	net/mlx5: check sibling device configurations mismatch Devices backed by the mlx5 PMD might share the same multiport Infiniband device context. This applies to representors and slaves of a bonding device. These ports are spawned with devargs. This patch checks whether the configuration deduced from these devargs is compatible with the configurations of devices sharing the same context. It prevents incorrect whitelists, like: -w 82:00.0,representor=0,dv_flow_en=1 -w 82:00.0,representor=1,dv_flow_en=0 The representors with indices [0-1] are supposed to be spawned over the same PCI device, but there is a dv_flow_en parameter mismatch. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko	bee57a0a35	net/mlx5: update switch port id in bonding configuration With a bonding configuration multiple PFs may represent a single switching device with multiple ports as representors. To distinguish representors belonging to different PFs we should generate unique port IDs. It is proposed to use the PF index in the bonding configuration to generate these unique port IDs. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko	f7e95215ac	net/mlx5: extend switch domain searching range With bonding configurations the switch domain may be shared between multiple PCI devices, we should search the switch sibling devices within the entire set of present ethernet devices backed by the mlx5 PMD. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko	d5c06b1b10	net/mlx5: query vport index match mode and parameters The new kernel/rdma_core [1] supports matching on a metadata register instead of the vport field to provide operations over VF LAG bonding configurations. The patch retrieves the parameters and information about the way used to match the vport on E-Switch. [1] http://patchwork.ozlabs.org/cover/1122170/ "Mellanox, mlx5 vport metadata matching" Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	790164ce1d	net/mlx5: check kernel support for VF LAG bonding If a bonding Infiniband device is found, a unified E-Switch is assumed and extra rdma-core/kernel support is needed to retrieve vport indices. The patch introduces the defines for this feature and adds a bonding support check to the probe routine. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	10dadfcb8a	net/mlx5: generate bonding device name If the device is a VF LAG bonding one, the port name includes the bonding Infiniband device name and looks like: 82:00.0_mlx5_bond_0 - for master device port PF0 82:00.1_mlx5_bond_0_representor_5 - for representor VF5 over PF1 where the bonding Infiniband device mlx5_bond_0 controls 82:00.0 as PF0 and 82:00.1 as PF1 PCI functions. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	2e569a3703	net/mlx5: add VF LAG mode bonding device recognition The Mellanox NICs starting from ConnectX-5 support LAG over NIC ports internally, implemented by the NIC firmware and hardware. The multiport NIC presents multiple physical PCI functions (PFs); with SR-IOV, multiple virtual PCI functions (VFs) might be presented. In switchdev mode the VF representors are engaged, and PFs and their VFs are connected by the internal E-Switch feature. Each PF and its related VFs have a dedicated E-Switch and belong to a dedicated switch domain. If NIC ports are combined to support LAG, the kernel drivers introduce a single unified Infiniband multiport device, and only one unified E-Switch with a single switch domain combines the master PF and all VFs. No extra DPDK bonding device is needed. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	a62ec99161	net/mlx5: allocate device list explicitly At device probing the device list to spawn was allocated as a dynamic size local variable. It was not possible to have one unified exit point from the routine due to compiler warnings. This patch allocates the spawn device list directly with rte_zmalloc(), making it possible to jump to a unified exit label from anywhere in the routine. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	5cf5f710b0	net/mlx5: update PCI address retrieving routine The routine mlx5_ibv_device_to_pci_addr() takes Infiniband device list object, takes the device sysfs path from there and retrieves PCI address. The routine may be implemented in more generic way by taking sysfs path directly as parameter and can be used for getting PCI address of netdevs. The generic routine is renamed to mlx5_dev_to_pci_addr() Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	46e10a4c1b	net/mlx5: move backing PCI device to private context Now all devices created over the same multiport IB device have shared context containing the backing PCI device field. For the VF LAG configurations it becomes possible the representors might be connected to VF created over different PFs. In this case representors have the different backing PCI devices and mentioned field should be moved to device private area. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko	c930f02c74	net/mlx5: fix ConnectX-6 VF type recognition The PCI virtual function type was not recognized correctly for ConnectX-6 VF. Fixes: `f0354d8423` ("net/mlx5: add ConnectX-6 device IDs") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:57 +02:00
Viacheslav Ovsiienko	a40b734b5e	net/mlx5: fix BlueField VF type recognition The PCI virtual function type was not recognized correctly for BlueField VF. Fixes: `f38c54571d` ("net/mlx5: split PCI from generic probing") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-10-07 15:00:57 +02:00
Dekel Peled	8a6a09f853	net/mlx5: support reading module EEPROM data This patch implements ethdev operations get_module_info and get_module_eeprom, to support ethtool commands ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM. New functions mlx5_get_module_info() and mlx5_get_module_eeprom() added in mlx5_ethdev.c. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-09-20 10:19:41 +02:00
Moti Haimovsky	b41e47da25	net/mlx5: support pop flow action on VLAN header This commit adds support for RTE_FLOW_ACTION_TYPE_OF_POP_VLAN via direct verbs flow rules. Signed-off-by: Moti Haimovsky <motih@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-09-20 10:19:41 +02:00
David Marchand	8ac3591694	remove useless include of EAL memory config header Restrict this header inclusion to its real users. Fixes: `028669bc9f` ("eal: hide shared memory config") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-10-09 10:22:24 +02:00
Raslan Darawsheh	c9ba7523c4	net/mlx5: support UDP tunnel adding This adds support for adding a new UDP tunnel port for specific VXLAN types. Currently only VXLAN and VXLAN-GPE are supported, on ports 4789 and 4790 respectively, without having to configure anything in the NIC. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-09-06 17:15:14 +02:00
Viacheslav Ovsiienko	0e3d0525b2	net/mlx5: fix memory event callback list The shared Infiniband device context should be included into memory event callback list only once on context creation, and removed from the list only once on context destroying. Multiple insertions of the same object caused the infinite loop on the list processing. Fixes: `ccb3815346` ("net/mlx5: update memory event callback for shared context") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-08-06 17:42:12 +02:00
Viacheslav Ovsiienko	614de6c898	net/mlx5: fix default minimal data inline The patch [Fixes] sets the default value of required minimal inline data to 0 bytes. On some configurations (depends on switchdev/legacy settings and FW version/settings) the ConnectX-4LX NIC requires minimal 18 bytes of Tx descriptor inline data to operate correctly. Wrongly set to 0 default value may prevent NIC from operating with out-of-the-box settings, this patch reverts default value for ConnectX-4LX back to 18 bytes (inline L2). Fixes: `9f350504bb` ("net/mlx5: fix ConnectX-4LX minimal inline data limit") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-08-06 17:42:12 +02:00
David Christensen	20215627cd	net/mlx5: fix Tx inline minimum for ConnectX-5 The function mlx5_set_min_inline() includes a switch() that checks various PCI device IDs in order to set the txq_inline_min value. No value is set when the PCI device ID matches the ConnectX-5 adapters, resulting in an assert() failure later in the function mlx5_set_txlimit_params(). This error was encountered on an IBM Power 9 system running RHEL 7.6 w/o Mellanox OFED installed. Fixes: `38b4b397a5` ("net/mlx5: add Tx configuration and setup") Signed-off-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2019-08-06 17:42:12 +02:00
Viacheslav Ovsiienko	dfedf3e3f9	net/mlx5: add workaround for VLAN in virtual machine On some virtual setups (particularly on ESXi) with SR-IOV and E-Switch enabled there is a problem receiving VLAN traffic on VF interfaces. The NIC driver in the ESXi hypervisor does not set up the E-Switch vport settings correctly and VLAN traffic targeted to the VF is dropped. The patch provides a temporary workaround: if a rule containing a VLAN pattern is being installed for a VF, a VLAN network interface over the VF is created, as the following command does: ip link add link vf.if name mlx5.wa.1.100 type vlan id 100 The PMD in DPDK maintains a database of created VLAN interfaces for each existing VF and requested VLAN tags. When all of the RTE Flows using the given VLAN tag are removed, the created VLAN interface with this VLAN tag is deleted. The name of the created VLAN interface follows the format: evmlx.d1.d2, where d1 is VF interface ifindex, d2 - VLAN ifindex Implementation limitations: - mask in rules is ignored, rule must specify VLAN tags exactly, no wildcards (which are implemented by the masks) are allowed - virtual environment is detected via rte_hypervisor() call, and the type of hypervisor is checked. Currently we engage the workaround for ESXi and unrecognized hypervisors (which always happens on platforms other than x86 - it means the workaround is applied for the Flow over PCI VF). There is no confirmed data that the other hypervisors (HyperV, Qemu) need this workaround; we are trying to reduce the list of configurations on which the workaround should be applied. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-08-06 17:42:12 +02:00
Viacheslav Ovsiienko	9f350504bb	net/mlx5: fix ConnectX-4LX minimal inline data limit Mellanox ConnectX-4LX NIC in configurations with disabled E-Switch can operate without minimal required inline data into Tx descriptor. There was the hardcoded limit set to 18B in PMD, fixed to be no limit (0B). Fixes: `38b4b397a5` ("net/mlx5: add Tx configuration and setup") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2019-07-29 18:05:10 +02:00
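
Commit 830d209161 ("net/mlx5: add ID generation") in the list above states that its ID allocation and release are both O(1), unlike a bitmap. The log does not show how that is achieved, so the following is only a minimal sketch of one common way to get that property: a stack of released IDs plus a counter of never-used IDs. The names `id_pool`, `id_pool_create`, `id_pool_alloc`, `id_pool_release` and `id_pool_destroy` are hypothetical and are not taken from the mlx5 PMD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ID pool, for illustration only (not the mlx5 PMD API). */
struct id_pool {
	uint32_t *free_ids; /* stack of released IDs, ready for reuse */
	uint32_t free_cnt;  /* number of IDs currently on the stack */
	uint32_t next_id;   /* smallest ID that was never handed out */
	uint32_t max_id;    /* largest ID the pool may hand out */
};

/* Create a pool that hands out IDs in the range [1, max_id]. */
static struct id_pool *id_pool_create(uint32_t max_id)
{
	struct id_pool *p;

	if (max_id == 0 || max_id == UINT32_MAX)
		return NULL;
	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return NULL;
	p->free_ids = malloc(sizeof(*p->free_ids) * max_id);
	if (p->free_ids == NULL) {
		free(p);
		return NULL;
	}
	p->next_id = 1;
	p->max_id = max_id;
	return p;
}

/* O(1): pop a released ID if one exists, else take the next fresh one. */
static int id_pool_alloc(struct id_pool *p, uint32_t *id)
{
	if (p->free_cnt > 0) {
		*id = p->free_ids[--p->free_cnt];
		return 0;
	}
	if (p->next_id > p->max_id)
		return -1; /* pool exhausted */
	*id = p->next_id++;
	return 0;
}

/* O(1): push the ID onto the stack. Double release is not detected. */
static void id_pool_release(struct id_pool *p, uint32_t id)
{
	assert(id >= 1 && id < p->next_id);
	assert(p->free_cnt < p->max_id);
	p->free_ids[p->free_cnt++] = id;
}

static void id_pool_destroy(struct id_pool *p)
{
	if (p != NULL) {
		free(p->free_ids);
		free(p);
	}
}
```

Neither operation scans for a free slot, which is the O(n) step a bitmap allocator has to perform. The trade-off in this sketch is memory: the stack is sized for the worst case of every ID being released at once.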

536 Commits