Commit Graph

1854 Commits

Author SHA1 Message Date
Bing Zhao
02109eaeac net/mlx5: support getting hairpin peer ports
In a real-life deployment, a device could be attached and detached
dynamically. The hairpin configuration of this port to/from all the
other ports should be enabled and disabled accordingly.

The RTE ethdev library and the PMD should provide the ability to get
the list of peer ports in case the application does not save it. It is
recommended to size the array that stores the port IDs to
"RTE_MAX_ETHPORTS" to have the maximal capacity.

The order of the peer port IDs may differ from the order in which the
hairpin queues were set up in the initialization stage. The peer port
ID could be the same as the current device port ID when the hairpin
peer ports contain the port itself - the single port hairpin case.

The application should check the ports' status and decide if the
peer port should be bound / unbound when starting / stopping the
current device.
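
For illustration only (not part of this patch), a sketch of the
application-side query, assuming the rte_eth_hairpin_get_peer_ports()
ethdev API added for this purpose; the meaning of a non-zero direction
argument is an assumption here:

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Sketch: print the hairpin peer ports of "port_id" before binding. */
    static int
    print_hairpin_peers(uint16_t port_id)
    {
        uint16_t peer_ports[RTE_MAX_ETHPORTS]; /* maximal capacity */
        /* A non-zero direction is assumed to request the Tx-side peers. */
        int ret = rte_eth_hairpin_get_peer_ports(port_id, peer_ports,
                                                 RTE_MAX_ETHPORTS, 1);
        if (ret < 0)
            return ret; /* the port may have no hairpin configured */
        for (int i = 0; i < ret; i++) /* the list may include port_id itself */
            printf("hairpin peer of port %u: %u\n", port_id, peer_ports[i]);
        return 0;
    }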

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:04 +01:00
Bing Zhao
37cd4501e8 net/mlx5: support two ports hairpin mode
In order to support hairpin between two ports, the mlx5 PMD needs to
implement the bind/unbind functions and provide them as function
pointers.

The bind and unbind functions are executed per port pair. All the
hairpin queues between the two ports should have the same attributes
during queue setup. Different configurations among queue pairs of the
same ports are not supported. Two ports are allowed to have hairpin in
only one direction.

In order to set up the connection between two queues, the peer Rx queue
HW information must be fetched via the internal RTE API, and this queue
information is used to modify the SQ object. Then the RQ object is
modified with the Tx queue HW information. The reverse operation is not
supported right now.

When disconnecting the queue pair, the SQ and RQ objects should be
reset without any peer HW information. The unbinding operation will try
to disconnect all Tx queues of the port from the Rx queues of the peer
port.

Tx explicit mode attribute will be saved and used when creating a
hairpin flow.
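
A hedged application-side sketch of the per-pair sequence these
function pointers back, assuming the rte_eth_hairpin_bind() /
rte_eth_hairpin_unbind() ethdev APIs:

    #include <rte_ethdev.h>

    /* Sketch: bind hairpin in both directions between two started ports.
     * Each call binds the Tx queues of the first port to the Rx queues
     * of the second one; a single direction only is also allowed. */
    static int
    hairpin_bind_pair(uint16_t port0, uint16_t port1)
    {
        int ret = rte_eth_hairpin_bind(port0, port1); /* Tx(p0) -> Rx(p1) */
        if (ret != 0)
            return ret;
        ret = rte_eth_hairpin_bind(port1, port0);     /* Tx(p1) -> Rx(p0) */
        if (ret != 0)
            rte_eth_hairpin_unbind(port0, port1);     /* roll back */
        return ret;
    }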

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:03 +01:00
Bing Zhao
1a01264f62 net/mlx5: change hairpin queue peer checking
In the current implementation of single port hairpin mode, the peer
queue must belong to the same port as the current queue. When two
ports hairpin mode is introduced, this check should be removed so that
the hairpin queue setup succeeds, since having different Tx and Rx
ports is no longer an invalid condition.

In the meanwhile, different devices could have different queue
configurations, and the number of queues of the peer port is unknown
to the current device, so that check should be removed as well.

If the Tx and Rx port IDs of a hairpin peer are different, only manual
binding and explicit Tx flows are supported. Otherwise, all four
combinations of modes can be supported. The consistency of the mode
attributes will be checked when connecting the queue with its peer
queue.
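
For illustration only, a sketch of a Tx hairpin queue whose peer Rx
queue belongs to another port, assuming the manual_bind/tx_explicit
bits of struct rte_eth_hairpin_conf; the descriptor count is arbitrary:

    #include <rte_ethdev.h>

    /* Sketch: with different Tx/Rx ports only manual bind and explicit
     * Tx flows are supported. */
    static int
    setup_two_port_tx_hairpin(uint16_t tx_port, uint16_t queue_id,
                              uint16_t peer_rx_port, uint16_t peer_rx_queue)
    {
        struct rte_eth_hairpin_conf conf = {
            .peer_count = 1,
            .manual_bind = 1, /* application binds the ports explicitly */
            .tx_explicit = 1, /* application inserts the Tx flows itself */
        };

        conf.peers[0].port = peer_rx_port;
        conf.peers[0].queue = peer_rx_queue;
        /* 128 descriptors is an arbitrary example value. */
        return rte_eth_tx_hairpin_queue_setup(tx_port, queue_id, 128, &conf);
    }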

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:03 +01:00
Bing Zhao
e4b7b8d082 common/mlx5: fix PCI driver name
In the refactoring of the mlx5 common layer, the PCI driver name
reported to the RTE device was changed from "net_mlx5" to "mlx5_pci".
The name string "mlx5_pci" was used directly in the rte_pci_driver
structure.

In the past, the macro "MLX5_DRIVER_NAME" was used instead of any
literal string, and now it is missing. The functions that use
"MLX5_DRIVER_NAME", e.g. mlx5_eth_find_next, will hit a mismatch.

Use this macro again throughout the code to keep everything aligned.

Fixes: 8a41f4decc ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:03 +01:00
Viacheslav Ovsiienko
6c8f7f1c18 net/mlx5: report Rx buffer split capabilities
Add rte_eth_dev_info->rx_seg_capa parameters:
  - receiving to multiple pools is supported
  - buffer offsets are supported
  - no offset alignment requirement
  - reports the maximal number of segments
  - reports the buffer split offload flag

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
7f1620082b net/mlx5: support Rx buffer split on datapath
Only the regular rx_burst routine is updated to support split,
because the vectorized routines do not support scatter and MPRQ
does not support split at all.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
213e2727a2 net/mlx5: register multiple pool for Rx queue
The split feature for receiving packets was added to the mlx5 PMD.
Now an Rx queue can receive data into buffers belonging to different
pools, and the memory of all the involved pools must be registered
for DMA operations in order to allow the hardware to store the data.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
a0a45e8af7 net/mlx5: configure Rx queue for buffer split
The scatter-gather elements should be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets, and
the PMD pads the buffer chain (if needed) with the
attributes of the last specified segment to
accommodate packets of maximal length.

Some limitations are implied. The MPRQ
feature should be disengaged if split is requested,
since MPRQ neither supports pushing data to
dedicated pools nor follows flexible buffer sizes.
The vectorized rx_burst routines do not support
scattering (they are extremely simplified
and work over a single segment only) and cannot
handle split either.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
9f209b59c8 net/mlx5: support Rx buffer split description
A routine to provide Rx queue setup with an extended
receiving buffer description is added.
It allows the application to specify the desired segment
lengths, the data position offsets in the buffer,
and a dedicated memory pool for each segment.
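
A hedged sketch of the application-side setup this routine serves,
assuming the buffer split ethdev API of this release (struct
rte_eth_rxseg_split, rxconf.rx_seg/rx_nseg and the
DEV_RX_OFFLOAD_BUFFER_SPLIT offload flag); pool names, lengths and the
meaning of a zero last-segment length are illustrative assumptions:

    #include <rte_ethdev.h>

    /* Sketch: split every packet into a 128-byte header segment from
     * pool "hdr_mp" and the remaining data from pool "pay_mp". */
    static int
    setup_split_rx_queue(uint16_t port_id, uint16_t queue_id,
                         struct rte_mempool *hdr_mp,
                         struct rte_mempool *pay_mp)
    {
        union rte_eth_rxseg seg[2] = {
            { .split = { .mp = hdr_mp, .length = 128, .offset = 0 } },
            /* length 0 is assumed to mean "use the pool's data size". */
            { .split = { .mp = pay_mp, .length = 0, .offset = 0 } },
        };
        struct rte_eth_rxconf rxconf = {
            .offloads = DEV_RX_OFFLOAD_BUFFER_SPLIT,
            .rx_seg = seg,
            .rx_nseg = 2,
        };

        /* The mempool argument is NULL: pools come from the segments. */
        return rte_eth_rx_queue_setup(port_id, queue_id, 512,
                                      0 /* NUMA socket 0 for simplicity */,
                                      &rxconf, NULL);
    }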

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Gregory Etelson
4ec6360de3 net/mlx5: implement tunnel offload
The tunnel offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of partially offloaded packets;
 - the model is implemented as a set of helper functions.

Implementation details (a usage sketch follows this list):
* the tunnel_offload PMD parameter must be set to 1 to enable the feature.
* the application cannot use MARK and META flow actions with tunnels.
* the offload JUMP action is restricted to the tunnel steering rule only.
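
For reference, a rough application-side sketch of the helper-function
model, assuming the experimental rte_flow_tunnel_decap_set() and
rte_flow_tunnel_match() helpers; the VXLAN tunnel description and the
wrapper name are illustrative assumptions, not the PMD's own code:

    #include <rte_flow.h>

    /* Rough sketch: obtain the PMD-provided actions/items implementing
     * tunnel decap and tunnel matching for a VXLAN tunnel. */
    static int
    get_tunnel_helpers(uint16_t port_id,
                       struct rte_flow_action **pmd_actions,
                       uint32_t *n_actions,
                       struct rte_flow_item **pmd_items,
                       uint32_t *n_items,
                       struct rte_flow_error *err)
    {
        struct rte_flow_tunnel tunnel = {
            .type = RTE_FLOW_ITEM_TYPE_VXLAN, /* tunnel kind, illustrative */
        };
        int ret = rte_flow_tunnel_decap_set(port_id, &tunnel, pmd_actions,
                                            n_actions, err);
        if (ret != 0)
            return ret;
        return rte_flow_tunnel_match(port_id, &tunnel, pmd_items,
                                     n_items, err);
    }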

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
d2046c09aa net/mlx5: support shared action for RSS
Implement shared action create/destroy/update/query. The current
implementation is limited to the shared RSS action only. The shared
RSS action create operation prepares hash RX queue objects for all
supported permutations of the hash. The shared RSS action update
operation relies on the functionality to modify a hash RX queue
introduced in one of the previous commits in this patch series.

Implement the RSS shared action and handle shared RSS on flow apply
and release. When handling a shared RSS action, the lookup for the
hash RX queue object is limited to the set of objects stored in the
shared action itself, and that lookup is performed by hash only.

The current implementation is limited to DV flow driver operations,
i.e. the Verbs flow driver operations do not support shared actions.
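
As an illustration (not part of the patch), an application-side sketch
assuming the rte_flow_shared_action_create() API of this release and an
RSS action configuration; queue numbers and hash types are arbitrary:

    #include <rte_common.h>
    #include <rte_flow.h>

    /* Sketch: create a shared RSS action that can later be referenced
     * from flow rules through RTE_FLOW_ACTION_TYPE_SHARED. */
    static struct rte_flow_shared_action *
    create_shared_rss(uint16_t port_id, struct rte_flow_error *err)
    {
        static const uint16_t queues[] = { 0, 1, 2, 3 };
        struct rte_flow_action_rss rss = {
            .types = ETH_RSS_IP,          /* hash on IP fields */
            .queue_num = RTE_DIM(queues),
            .queue = queues,
        };
        struct rte_flow_action action = {
            .type = RTE_FLOW_ACTION_TYPE_RSS,
            .conf = &rss,
        };
        struct rte_flow_shared_action_conf conf = {
            .ingress = 1,
        };

        return rte_flow_shared_action_create(port_id, &conf, &action, err);
    }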

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
d7cfcddded net/mlx5: translate shared action for RSS action
Handle shared actions on flow validation/creation/destruction.
The mlx5 PMD translates a shared action into a regular one before
handling flow validation/creation. The shared action translation is
applied to utilize the same execution path for both shared and regular
actions. The current implementation supports shared action translation
for the shared RSS action only.

The RSS action validation is split so that the shared RSS action is
validated on its creation, in addition to the action validation in the
flow validation/creation path.

Implement rte_flow shared action API for mlx5 PMD, mostly forwarding
calls to flow driver operations (see struct mlx5_flow_driver_ops).

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
b8cc58c140 net/mlx5: modify hash Rx queue objects
Implement modification of the hash Rx queue object (see
mlx5_hrxq_modify()). This implementation relies on the capability to
modify the TIR object via the DevX API, i.e. the current implementation
doesn't support Verbs HW object operations. The functionality to modify
the hash Rx queue object is a prerequisite to implement
rte_flow_shared_action_update() for the shared RSS action in the mlx5
PMD.

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Alexander Kozyrev
0f20acbf5e net/mlx5: implement vectorized MPRQ burst
MPRQ (Multi-Packet Rx Queue) processes one packet at a time using
simple scalar instructions. MPRQ works by posting a single large buffer
(consisting of multiple fixed-size strides) in order to receive multiple
packets at once in this buffer. An Rx packet is then either copied to a
user-provided mbuf or attached by the PMD to the mbuf via a pointer to
an external buffer.

There is an opportunity to speed up the packet receiving by processing
4 packets simultaneously using SIMD (single instruction, multiple data)
extensions. Allocate mbufs in batches for every MPRQ buffer and process
the packets in groups of 4 until all the strides are exhausted. Then
switch to another MPRQ buffer and repeat the process over again.

The vectorized MPRQ burst routine is engaged automatically in case
the mprq_en=1 devarg is specified and the vectorization is not disabled
explicitly by providing rx_vec_en=0 devarg. There is a limitation:
LRO is not supported and scalar MPRQ is selected if it is on.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:24:25 +01:00
Alexander Kozyrev
1ded26239a net/mlx5: refactor vectorized Rx
Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
into a non-arch-specific file. Having all SIMD instructions
in a single reusable block is a first preparatory step to
implement vectorized Rx burst for the MPRQ feature.

Pass a pointer to the storage of mbufs directly to
rxq_copy_mbuf_v instead of calculating the pointer inside
this function. This is needed for the future vectorized Rx
routine which is going to pass a different pointer here.

Calculate the number of packets to replenish inside
mlx5_rx_replenish_bulk_mbuf. Containing this logic in one
place allows us to do the same for the MPRQ case.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:24:25 +01:00
Xueming Li
16dbba257c net/mlx5: fix port shared data reference count
When probing a representor, the tag cache hash table and the
modification cache hash table allocated memory for each port,
overwriting the previously existing cache in the shared context data.

This patch moves the reference check of the shared data before the
hash table allocation to avoid this issue.

Fixes: 6801116688 ("net/mlx5: fix multiple flow table hash list")
Fixes: 1ef4cdef26 ("net/mlx5: fix flow tag hash list conversion")
Cc: stable@dpdk.org

Acked-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2020-11-03 23:24:25 +01:00
Shiri Kuzin
42dcd453d9 net/mlx5: fix xstats reset reinitialization
The mlx5_xstats_reset clears the device extended statistics.
In this function the driver may reinitialize the structures
that are used to read device counters.

In case of reinitialization, the number of counters may
change, which wouldn't be taken into account by the
reset API callback and can cause a segmentation fault.

This issue is fixed by allocating the counters memory according to
the size known after the reinitialization.

Fixes: a4193ae3bc ("net/mlx5: support extended statistics")
Cc: stable@dpdk.org

Reported-by: Ralf Hoffmann <ralf.hoffmann@allegro-packets.com>
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
2b5b1aeb39 net/mlx5: optimize counter extend memory
Counter extend memory was allocated for non-batch counters to save the
extra DevX object. Currently, for a non-batch counter which does not
support aging, the entry field in the generic counter struct is used
only while the counter is free in the free list, and the bytes field in
the struct is used only while the counter is allocated and in use.

In this case, the DevX object can be saved in the generic counter
struct, in a union with the entry memory when the counter is allocated
and in a union with the bytes memory when the counter is free.
The pool type is also not needed, as non-fallback mode only has generic
counters and aging counters; a single bit indicating whether the pool
is aged or not is enough.

This eliminates the counter extend info struct and saves memory.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
cfbdc3f938 net/mlx5: rename flow counter macro
Add the MLX5_ prefix to the defined counter macro names.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
e7138997e0 net/mlx5: make shared counters thread safe
The shared counters save the counter index in a three-level table. As
the three-level table now supports multi-threaded operations, the
shared counters can take advantage of it to become thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
0796c7b1de net/mlx5: make three level table thread safe
This commit adds thread safety support in the three-level table using
a spinlock and a reference counter for each table entry.

A new mlx5_l3t_prepare_entry() function is added in order to support
multi-threaded operation.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
3aa279157f net/mlx5: synchronize flow counter pool creation
Currently, counter operations are not thread safe, as the counter
pools' array resize is not protected.

This commit protects the container pools' array resize using a spinlock.
The original counter pool statistic memory allocation is moved to the
host thread in order to minimize the critical section, since the pool
statistic memory is required only at query time. The container pools'
array must be resized by the user threads, because the new pool may be
used by other rte_flow APIs before the host thread resize is done; if
the pool were not saved to the pools' array, the specified counter
memory would not be found, as the pool would be missing from the
counter management pool array. The pool raw statistic memory will be
filled in the host thread.

The shared counters will be protected in another commit.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
994829e695 net/mlx5: remove single counter container
A flow counter which was allocated by a batch API could not be assigned
to a flow in the root table (group 0) with old rdma-core versions.
Hence, a root table flow counter required a PMD mechanism to manage
counters which were allocated singly.

Currently, batch counters are supported in the root table when using an
rdma-core version that includes the MLX5_FLOW_ACTION_COUNTER_OFFSET
enum and a kernel driver that includes the
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX_OFFSET enum.

When the PMD uses the rdma-core API to assign a batch counter to a root
table flow using an invalid counter offset, it should get an error only
if batch counter assignment for the root table is supported.
Using this trial at initialization time helps to detect the support.

Based on this trial, if the support is valid, remove the management of
the single counter container in the fast counter mechanism. Otherwise,
move the counter mechanism to fallback mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
df051a3e77 net/mlx5: optimize shared counter memory
Instead of using special memory to indicate a shared counter, this
patch uses the reserved memory in the counter handler to indicate it.
A counter index with MLX5_CNT_SHARED_OFFSET means a shared counter.

This patch is also a preparation for a new adjustment to use batch
counters as shared counters.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
6b7c717ed1 net/mlx5: locate aging pools in the general container
Commit [1] introduced a different container for the aging counter
pools. In order to save container memory, the aging counter pools
can be located in the general pool container.

This patch locates the aging counter pools in the general pool
container and removes the aging container management.

[1] commit fd143711a6 ("net/mlx5: separate aging counter pool range")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Dekel Peled
d5a7d04c79 net/mlx5: support query of age action
A recent patch [1] added to ethdev the API for querying the age action.
This patch implements the query of the age action in the MLX5 PMD using
this API.

[1] https://mails.dpdk.org/archives/dev/2020-October/184864.html
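
For context, a hedged sketch of the query from the application side,
using the rte_flow_query_age layout of the referenced API; the flow
handle and the wrapper function are assumptions for illustration:

    #include <stdio.h>
    #include <rte_flow.h>

    /* Sketch: query the AGE action of an existing flow rule. */
    static int
    query_flow_age(uint16_t port_id, struct rte_flow *flow,
                   struct rte_flow_error *err)
    {
        static const struct rte_flow_action age_action = {
            .type = RTE_FLOW_ACTION_TYPE_AGE,
        };
        struct rte_flow_query_age resp;
        int ret = rte_flow_query(port_id, flow, &age_action, &resp, err);

        if (ret == 0 && resp.sec_since_last_hit_valid)
            printf("aged=%u idle=%us\n", (unsigned int)resp.aged,
                   (unsigned int)resp.sec_since_last_hit);
        return ret;
    }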

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
613d64e412 net/mlx5: log LRO minimal size
Add a debug printout showing the HCA capability lro_min_mss_size - the
minimal size of a TCP segment required for coalescing.
The MLX5 PMD documentation is updated to note this condition.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
90e30c7488 net/mlx5: fix use of atomic cmpset for age state
According to the documentation [1], the rte_atomic16_cmpset() function
returns a non-zero value on success and 0 on failure.
In the existing code this function is called, and the return value
is compared to AGE_CANDIDATE, which is defined as 1.
Such a comparison is incorrect and can lead to unwanted behavior.

This patch updates the calls to rte_atomic16_cmpset() to check
whether the return value is zero or non-zero.

[1] https://doc.dpdk.org/api/rte__atomic_8h.html
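
A minimal sketch of the corrected usage (the wrapper is illustrative;
the point is that the return value is a success flag, not a state):

    #include <rte_atomic.h>

    /* Sketch: rte_atomic16_cmpset() returns non-zero on success and 0 on
     * failure, so the result must be treated as a boolean and never
     * compared against a state value such as AGE_CANDIDATE (1). */
    static int
    try_move_state(volatile uint16_t *state, uint16_t from, uint16_t to)
    {
        /* Correct: check for zero/non-zero only. */
        if (rte_atomic16_cmpset(state, from, to) == 0)
            return -1; /* another thread changed the state first */
        return 0;
    }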

Fixes: fa2d01c87d ("net/mlx5: support flow aging")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
491757372f net/mlx5: enforce limitation on IPv6 next protocol
Due to a PRM requirement, the IPv6 header item 'proto' field, indicating
the next header protocol, should not be set to an extension header.
This patch adds the relevant validation and documents the limitation.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
0e5a0d8f75 net/mlx5: support match on IPv6 fragment extension
An rte_flow update, following RFC [1], added the rte_flow item
ipv6_frag_ext to ethdev.
This patch adds to the MLX5 PMD the option to match on this item type.

[1] http://mails.dpdk.org/archives/dev/2020-March/160255.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
ad3d227ead net/mlx5: support match on IPv6 fragment packets
This patch adds to MLX5 PMD the support of matching on IPv6
fragmented and non-fragmented packets, using the new field
has_frag_ext, added to rte_flow following RFC [1].

[1] https://mails.dpdk.org/archives/dev/2020-August/177257.html
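
For illustration, a sketch of a matching pattern using the new bit
(field name as in the referenced rte_flow change; the pattern itself is
an example, not taken from this patch):

    #include <rte_flow.h>

    /* Sketch: match only fragmented IPv6 packets by setting has_frag_ext
     * to 1 in both spec and mask. */
    static const struct rte_flow_item_ipv6 ipv6_frag_spec = {
        .has_frag_ext = 1,
    };
    static const struct rte_flow_item_ipv6 ipv6_frag_mask = {
        .has_frag_ext = 1,
    };
    static const struct rte_flow_item pattern_frag[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV6,
          .spec = &ipv6_frag_spec, .mask = &ipv6_frag_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };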

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
6859e67ef6 net/mlx5: support match on IPv4 fragment packets
This patch adds to MLX5 PMD the support of matching on IPv4
fragmented and non-fragmented packets, using the IPv4 header
fragment_offset field.
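
For illustration, a sketch of matching non-fragmented IPv4 packets via
the fragment_offset field; the zero spec with the usual 0x3fff mask
(MF bit plus 13-bit offset) is an example pattern, not this patch's code:

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Sketch: fragment_offset == 0 under a mask covering the MF bit and
     * the offset matches only non-fragmented IPv4 packets. */
    static const struct rte_flow_item_ipv4 ipv4_nofrag_spec = {
        .hdr.fragment_offset = RTE_BE16(0),
    };
    static const struct rte_flow_item_ipv4 ipv4_nofrag_mask = {
        .hdr.fragment_offset = RTE_BE16(0x3fff),
    };
    static const struct rte_flow_item pattern_nofrag[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4,
          .spec = &ipv4_nofrag_spec, .mask = &ipv4_nofrag_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };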

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
d7e2ea627d net/mlx5: remove handling of ICMP fragmented packets
Commit [1] forced setting a match on the 'frag' bit with mask 1 and
value 0.
The previous patches in this series added support for matching on
fragmented and non-fragmented packets on L3 items, so this setting is
now redundant.

This patch removes the changes done in [1].

[1] commit 85407db9f60d ("net/mlx5: fix matching for ICMP fragments")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
3ec73abeed net/mlx5/linux: fix Tx queue operations decision
One of the conditions to create the Tx queue object by DevX is to be
sure that the DPDK mlx5 driver is not going to be the E-Switch manager
of the device. The issue is with the default FDB flows managed by the
kernel driver, which are not created by the kernel when the Tx queues
are created by DevX.

The correct decision is to create the Tx queues by Verbs when E-Switch
is enabled, while the current behavior uses the opposite condition and
creates them by DevX.

Create the Tx queues by Verbs when E-Switch is enabled.

Fixes: 86d259cec8 ("net/mlx5: separate Tx queue object creations")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
8dc775d8b1 net/mlx5: fix event queue number query
When an Rx/Tx queue is created by DevX, its CQ configuration should
include the EQ number for the interrupts.
The EQ is managed by the kernel, and there is a glue API to query the
EQ number from the kernel.
The EQ query API takes a vector number that specifies the kernel vector
used for the interrupt handling.

The vector number was wrongly detected according to the configured CPU
instead of using the device attributes of the supported vectors.
The CPU was wrongly detected by the rte_lcore_to_cpu_id API without any
check, and in a non-EAL thread context the value was 0xFFFFFFFF, which
caused a failure in the EQ number query API.

Use vector 0 for each EQ number query, which must be supported by the
kernel.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
9ab9d46ab9 net/mlx5: fix Tx queue release
The HW objects of the Tx queue are created/destroyed in the device
start/stop stage, while the ethdev configuration for the Tx queue
starts from the tx_queue_setup stage.
The PMD should save the last configuration it got from the ethdev
and apply it to the device in the dev_start operation.

Wrongly, the last code added to mitigate the reference counters did not
take the above rule into account and combined the configurations and HW
objects to be created/destroyed together.

This caused a memory leak and other memory issues.

Make sure the HW object is released in the stop operation when there is
no reference to it, while the configurations stay saved.

Fixes: 17a57183c0 ("net/mlx5: mitigate Tx queue reference counters")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
015d2cb628 net/mlx5: fix Rx queue release
The HW objects of the Rx queue are created/destroyed in the device
start/stop stage, while the ethdev configuration for the Rx queue
starts from the rx_queue_setup stage.
The PMD should save the last configuration it got from the ethdev
and apply it to the device in the dev_start operation.

Wrongly, the last code added to mitigate the reference counters did not
take the above rule into account and combined the configurations and HW
objects to be created/destroyed together.

This caused a memory leak and other memory issues.

Make sure the HW object is released in the stop operation when there is
no reference to it, while the configurations stay saved.

Fixes: 24e4b650ba ("net/mlx5: mitigate Rx queue reference counters")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Thomas Monjalon
af270529ad ethdev: include mbuf registration in Tx timestamp API
Previously, the Tx timestamp field and flag were registered in testpmd,
as described in mlx5 guide.
For consistency between Rx and Tx timestamps,
managing mbuf registrations inside the driver, as properly documented,
is a simpler expectation.

The only driver to support this feature (mlx5) is updated
as well as the testpmd application.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Thomas Monjalon
04840ecbcf net/mlx5: switch Rx timestamp to dynamic mbuf field
The mbuf timestamp is moved to a dynamic field
in order to allow removal of the deprecated static field.
The related mbuf flag is also replaced.

The dynamic offset and flag are stored in struct mlx5_rxq_data
to favor cache locality.
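
For reference, a hedged application-side sketch of reading the Rx
timestamp through the dynamic field and flag, assuming the
RTE_MBUF_DYNFIELD_TIMESTAMP_NAME / RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME
registrations from rte_mbuf_dyn.h; the helper names are illustrative:

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    /* Sketch: look up the dynamic timestamp field/flag once, then read
     * the Rx timestamp from received mbufs. */
    static int ts_off = -1;
    static uint64_t ts_flag;

    static int
    rx_timestamp_init(void)
    {
        int flag_bit = rte_mbuf_dynflag_lookup(
                RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME, NULL);
        ts_off = rte_mbuf_dynfield_lookup(
                RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
        if (ts_off < 0 || flag_bit < 0)
            return -1; /* the driver did not register them */
        ts_flag = 1ULL << flag_bit;
        return 0;
    }

    static uint64_t
    rx_timestamp_get(const struct rte_mbuf *m)
    {
        if ((m->ol_flags & ts_flag) == 0)
            return 0; /* no timestamp on this packet */
        return *RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *);
    }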

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Thomas Monjalon
042540e4ef net/mlx5: fix dynamic mbuf offset lookup check
The functions rte_mbuf_dynfield_lookup() and rte_mbuf_dynflag_lookup()
can return an offset starting with 0 or a negative error code.

In reality the first offsets are probably reserved forever,
but for the sake of strict API compliance,
the checks which considered 0 as an error are fixed.
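
A tiny sketch of the corrected check (offset 0 is a valid result; only
a negative return value is an error):

    #include <rte_errno.h>
    #include <rte_mbuf_dyn.h>

    /* Sketch: offset 0 must be accepted. */
    static int
    lookup_timestamp_offset(void)
    {
        int off = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
                                           NULL);
        if (off < 0) /* a check such as "off <= 0" would reject offset 0 */
            return -rte_errno;
        return off;
    }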

Fixes: efa79e68c8 ("net/mlx5: support fine grain dynamic flag")
Fixes: 3172c471b8 ("net/mlx5: prepare Tx queue structures to support timestamp")
Fixes: 0febfcce36 ("net/mlx5: prepare Tx to support scheduling")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Bruce Richardson
63b3907833 build: remove library name from version map file name
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:13:59 +02:00
Ciara Power
2c5e0dd21f net/mlx5: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.
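
A hedged sketch of the kind of check involved, assuming the
rte_vect_get_max_simd_bitwidth() EAL API introduced for this purpose;
the helper name is illustrative:

    #include <rte_vect.h>

    /* Sketch: pick the vector Rx path only when the configured max SIMD
     * bitwidth allows at least 128-bit vectors. */
    static int
    rx_vec_allowed(void)
    {
        return rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128;
    }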

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-19 16:45:02 +02:00
Ferruh Yigit
f30e69b41f ethdev: add device flag to bypass auto-filled queue xstats
Queue stats are stored in 'struct rte_eth_stats' as array and array size
is defined by 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag.

As a result of a technical board discussion, it was decided to remove
the queue statistics from 'struct rte_eth_stats' in the long term.

Instead PMDs should represent the queue statistics via xstats, this
gives more flexibility on the number of the queues supported.

Currently queue stats in the xstats are filled by ethdev layer, using
some basic stats, when queue stats removed from basic stats the
responsibility to fill the relevant xstats will be pushed to the PMDs.

During the switch period, a temporary 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS'
device flag is created. Initially all PMDs using xstats set this flag.
The PMDs that implement queue stats in xstats should clear the flag.

When all PMDs switch to the xstats for the queue stats, queue stats
related fields from 'struct rte_eth_stats' will be removed, as well as
'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag.
Later 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag also can be
removed.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 23:27:15 +02:00
Ivan Ilchenko
62024eb827 ethdev: change stop operation callback to return int
Change the eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers return
negative errno values in case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
8a5a0aad5d ethdev: allow close function to return an error
The API function rte_eth_dev_close() was returning void.
The return type is changed to int to notify about errors.

If an error happens during a close operation,
the status of the port is undefined,
with a maximum of resources having been freed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Jiawei Wang
00c10c2211 net/mlx5: update translate function for mirroring
Translate the attributes of the sample action, which include the sample
ratio and the sub-actions list.
The PMD checks the number of destination actions in the current flow;
if multiple destination actions are found, it creates a new RDMA
destination array action that groups the actions for each destination.
Currently only port or queue destination actions are supported, and
only an encap action can be attached to a port destination.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
50390aab11 net/mlx5: update flow mirroring validation
A mirroring flow uses the sample action with a ratio of 1, and it does
not support a jump action in the same flow.

For mirroring, the sample action must have destination actions such as
port or queue, and it does not need the split function used for
sampling flows.
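
As an illustration of the mirroring form described above, a sketch of
an rte_flow sample action with ratio 1 whose sub-action sends the
mirrored copy to a port; the port id is an arbitrary example:

    #include <rte_flow.h>

    /* Sketch: mirror every packet (ratio == 1) to port id 1 while the
     * main flow continues to its own destination. */
    static const struct rte_flow_action_port_id mirror_port = { .id = 1 };
    static const struct rte_flow_action mirror_sub_actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_PORT_ID, .conf = &mirror_port },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    static const struct rte_flow_action_sample mirror_sample = {
        .ratio = 1,                    /* 1 = mirror all packets */
        .actions = mirror_sub_actions, /* destination of the mirrored copy */
    };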

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
0756228b27 net/mlx5: update translate function for sample action
Translate the attributes of the sample action, which include the sample
ratio and the sub-actions list, then create the sample DR action.

The metadata register value will be lost in the default path after the
sampler in FDB due to a CX5 HW limitation.

Since the source vport is also shared with metadata register c0, the
MLX5 PMD sets the source vport in rdma-core and rdma-core restores the
regc0 value after the sampler.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
b4c0ddbfcc net/mlx5: split sample flow into two sub-flows
A flow with the sample action will be split into two sub-flows:
the prefix sub-flow with all the actions preceding the sample
action and the sample action itself, and the suffix sub-flow with
the actions following the sample action.

The original items remain in the prefix sub-flow; an implicit tag
action with a unique id is added to set it in a metadata register,
and the suffix sub-flow uses the tag item to match on that unique id.

The flow is split as below:

Original flow: items / actions pre / sample / actions sfx ->
    prefix sub flow -
            items / actions pre / set_tag action / sample
    suffix sub flow -
            tag_item / actions sfx

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
96b1f0273c net/mlx5: validate sample action
Add the sample action validation function.

Sample flow is supported in the NIC-RX and FDB domains. For NIC-RX,
the sample flow action list must include the destination queue action.

Only the NIC-RX domain supports the optional actions list. FDB does not
support any optional actions; the sampled packets are always forwarded
to the E-Switch manager port.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Li Zhang
b1088fccb5 net/mlx5: support ICMP identifier matching
The PRM exposes the "icmp_header_data" field in IPv4 ICMP.
Update the ICMP mask parameter with the ICMP identifier and sequence
number fields:
the ICMP sequence number spec with its mask sets the low 16 bits of
icmp_header_data;
the ICMP identifier spec with its mask sets the high 16 bits of
icmp_header_data.
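
For illustration, a sketch of the corresponding rte_flow pattern item
matching on the ICMP identifier; the identifier value is an arbitrary
example (icmp_ident is big-endian in the header):

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Sketch: match ICMP packets whose identifier equals 0x1234. */
    static const struct rte_flow_item_icmp icmp_spec = {
        .hdr.icmp_ident = RTE_BE16(0x1234),
    };
    static const struct rte_flow_item_icmp icmp_mask = {
        .hdr.icmp_ident = RTE_BE16(0xffff),
    };
    static const struct rte_flow_item icmp_item = {
        .type = RTE_FLOW_ITEM_TYPE_ICMP,
        .spec = &icmp_spec,
        .mask = &icmp_mask,
    };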

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:47:58 +02:00
Xueming Li
657df3ca0a net/mlx5: disable dump of Verbs flows
There was a segmentation fault when dumping flows with the device
argument dv_flow_en=0. In that case, the Verbs flow engine was enabled
and FDB resources were not initialized. It is suggested to use
mlx_fs_dump for Verbs flow dump.

This patch adds a Verbs engine check, prints a warning message and
returns gracefully.

Fixes: f6d7202402 ("net/mlx5: support flow dump API")
Cc: stable@dpdk.org

Reported-by: Jørgen Østergaard Sloth <jorgen.sloth@xci.dk>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e96242efa4 net/mlx5: remove Rx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Rx queue object structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
4c6d80f1c5 net/mlx5: separate Rx queue state modification
Separate Rx state modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
354cc08a2d net/mlx5: remove Tx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Tx queue object structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
a9c7930662 net/mlx5: share Tx queue object modification
Use new modify_qp functions for Tx object creation in DevX and Verbs
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
5d9f3c3f48 net/mlx5: separate Tx queue object modification
Separate Tx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e8390b3de0 net/mlx5: rearrange QP creation in Verbs module
1. Rename function to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
88f2e3f18c net/mlx5: rearrange SQ and CQ creation in DevX module
1. Rename functions to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
f49f44839d net/mlx5: share Tx control code
Move Tx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
86d259cec8 net/mlx5: separate Tx queue object creations
As a preparation for Windows OS support, the Verbs operations should be
separated into another file.
This way, the build can easily cut the unsupported Verbs APIs from
the compilation process.

Define an operation structure and a DevX module in addition to the
existing Linux Verbs module.
Separate the Tx object creation into the Verbs/DevX modules and update
the operation structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e7055bbfbe net/mlx5: reposition event queue number field
The eqn field has become a field of sh directly since it is also
relevant for Tx and Rx.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
31d92b5892 net/mlx5: reorder Tx queue Verbs object creation
Move the creation of the completion queue from the mlx5_txq_obj_new
function into an auxiliary function.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
3d3cfe67e6 net/mlx5: reorder Tx queue DevX object creation
Move the creation of the send queue and the completion queue resources
from the mlx5_txq_obj_devx_new function into auxiliary functions.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
17a57183c0 net/mlx5: mitigate Tx queue reference counters
The Tx queue structures manage two different reference counters per
queue: the txq_ctrl reference counter and the txq_obj reference counter.

There is no real need to use two different counters; it just complicates
the release functions.
Remove the txq_obj counter and use only the txq_ctrl counter.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
fa2dd3d4d6 net/mlx5: remove unused variable in Tx queue creation
When a CQ is not created by DevX, it is allocated by either a DV
function or a regular Verbs function.

The CQ DV attributes variable was wrongly defined and initialized in the
Tx queue creation while the CQ was created by the regular Verbs
function, which left the attributes variable unused.

Remove the unused variable.

Fixes: faf2667fe8 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
95b0e40b10 net/mlx5: fix send queue doorbell
As part of the SQ creation for Tx queue objects, HW doorbell memory
should be allocated and mapped to the HW.

The SQ doorbell handler was wrongly saved in the CQ fields, which caused
a wrong doorbell release in the Tx queue object destroy flow.

Save the SQ doorbell handler in the SQ fields.

Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Dekel Peled
c7870bfe09 ethdev: move RSS expansion code to mlx5 driver
Patch [1] added support for RSS flow expansion.
It was added in ethdev for public use, but until now it has been used
only by the MLX5 PMD.
To allow local changes in this code, this patch removes it from ethdev
and moves it to an MLX5 PMD file.

[1] commit 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Alexander Kozyrev
d2d5760552 net/mlx5: fix Rx queue count calculation
There are a few discrepancies in the Rx queue count calculation.

The wrong index is used to calculate the number of used descriptors
in an Rx queue in case of the compressed CQE processing. The global
CQ index is used while we really need an internal index in a single
compressed session to get the right number of elements processed.

The total number of CQs should be used instead of the number of mbufs
to find out about the maximum number of Rx descriptors. These numbers
are not equal for the Multi-Packet Rx queue.

Allow the Rx queue count calculation for all possible Rx bursts since
CQ handling is the same for regular, vectorized, and multi-packet Rx
queues.

Fixes: 26f0488344 ("net/mlx5: support Rx queue count API")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-08 19:58:11 +02:00
Suanming Mou
3e8f3e51fd net/mlx5: fix meter table definitions
As the metering and metadata features were developed at the same time,
the metering and metadata tables were defined in conflict with each
other.

This caused the meter suffix flow to jump to the same metadata table,
resulting in a flow dead loop.

Adjust the metering table definitions to fix that issue.

Fixes: 46a5e6bc6a ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
38f9369d24 net/mlx5: fix DevX CQ attributes values
A previous patch wrongly used rdma-core defined values when preparing
attributes for creating the DevX CQ object.
This patch adds the correct value definitions and uses them instead.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-08 19:58:11 +02:00
Phil Yang
ae3255bfd9 net/mlx5: relax atomic refcnt for multi-packet Rx buffer
Use C11 atomics with RELAXED ordering instead of the rte_atomic ops
which enforce unnecessary barriers on aarch64.
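
A generic before/after sketch of the substitution (the refcnt wrapper
and field are illustrative, not the PMD's exact code):

    #include <stdint.h>

    /* Before (illustrative): rte_atomic16_add(&buf->refcnt, 1) adds full
     * barriers on aarch64 that a plain reference count does not need. */

    /* After: a C11/GCC built-in with relaxed ordering is sufficient. */
    static inline void
    mprq_buf_ref(uint16_t *refcnt)
    {
        __atomic_fetch_add(refcnt, 1, __ATOMIC_RELAXED);
    }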

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:15 +02:00
Thomas Monjalon
fbd1913561 ethdev: remove old close behaviour
The temporary flag RTE_ETH_DEV_CLOSE_REMOVE is removed.
It was introduced in DPDK 18.11 in order to give time for PMDs to migrate.

The old behaviour was to free only queues when closing a port.
The new behaviour is calling rte_eth_dev_release_port() which does
three more tasks:
	- trigger event callback
	- reset state and few pointers
	- free all generic port resources

The private port resources must be released in the .dev_close callback.

The .remove callback should:
	- call .dev_close callback
	- call rte_eth_dev_release_port()
	- free multi-port device shared resources

Despite waiting two years, some drivers have not migrated,
so they may hit issues with the incompatible new behaviour.
After sending emails, adding logs, and announcing the deprecation,
the only last solution is to declare these drivers as unmaintained:
	ionic, liquidio, nfp
Below is a summary of what to implement in those drivers.

* The freeing of private port resources must be moved
from the ".remove(device)" function to the ".dev_close(port)" function.

* If a generic resource (.mac_addrs or .hash_mac_addrs) cannot be freed,
it must be set to NULL in ".dev_close" function to protect from
subsequent rte_eth_dev_release_port() freeing.

* Note 1:
The generic resources are freed in rte_eth_dev_release_port(),
after ".dev_close" is called in rte_eth_dev_close(), but not when
calling ".dev_close" directly from the ".remove" PMD function.
That's why rte_eth_dev_release_port() must still be called explicitly
from ".remove(device)" after calling the ".dev_close" PMD function.

* Note 2:
If a device can have multiple ports, the common resources must be freed
only in the ".remove(device)" function.

* Note 3:
The port is supposed to be in a stopped state when it is closed.
If it is not the case, it is free to the PMD implementation
how to react when trying to close a non-stopped port:
either try to stop it automatically or just return an error.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:14 +02:00
Thomas Monjalon
b142387b07 ethdev: allow drivers to return error on close
The device operation .dev_close was returning void.
This driver interface is changed to return an int.

Note that the API rte_eth_dev_close() is still returning void,
although a deprecation notice is pending to change it as well.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:13 +02:00
Suanming Mou
bf615b077d net/mlx5: manage header reformat actions with hashed list
To manage encap/decap header format actions, the mlx5 PMD used a single
linked list, and lookup and insertion operations took too long when
there were millions of objects, which impacted the flow
insertion/deletion rate.

In order to optimize the performance, a hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Xueming Li
c21e5facf7 net/mlx5: use bond index for netdev operations
In case of bonding, the device ifindex was detected as the PF ifindex,
so any operation using ifindex applied to the PF instead of the bond
device. These operations include MTU get/set, up/down and MAC address
manipulation, etc.

This patch detects the bond interface ifindex and name for a PF that
joins a bond interface, and uses them by default for netdev operations.

Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Viacheslav Ovsiienko
35e75f7816 net/mlx5: fix vectorized Rx burst check
The Rx queue start/stop feature is not supported if the vectorized
rx_burst routine is engaged. There was a typo in the routine address
and the rx_burst type check was wrong.

Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-30 19:19:09 +02:00
Phil Yang
f0f5d844d1 eal: remove deprecated coherent IO memory barriers
The 20.08 release deprecated the rte_cio_*mb APIs because these APIs
provide the same functionality as the rte_io_*mb APIs on all platforms,
so remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-23 13:40:26 +02:00
Michael Baum
b00f760354 net/mlx5: fix hairpin dependency on destination DevX TIR
The PMD supports hairpin only if DevX is supported and DV flow is
enabled.

When the destination DevX TIR is not supported, the PMD tries to create
a TIR action and fails.

Avoid supporting hairpin when the destination DevX TIR is not supported.

Fixes: b6b3bf86bd ("net/mlx5: get hairpin capabilities")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:11 +02:00
Michael Baum
7aa9892f79 net/mlx5: fix Rx objects creator selection
There are two creators for Rx objects, DevX and Verbs.
There are supported DR versions in which DevX destination TIR flow
action creation cannot be supported; with these versions the TIR object
should be created by Verbs, which forces all the Rx objects to be
created by Verbs.

The selection of the Rx objects creator wrongly did not take into
account the destination TIR action support, which caused a failure in
the Rx flows creation.

Select the Verbs creator when destination TIR action creation is not
supported by the DR version.

Fixes: 6deb19e1b2 ("net/mlx5: separate Rx queue object creations")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:11 +02:00
Maxime Leroy
e0d449513b net/mlx5: fix RSS RETA reset on start
The following sequence was working fine on mlx5:
   rte_eth_dev_configure(portid, ...);

   for (queueid = 0; queueid < nb_txq; queueid++)
      rte_eth_tx_queue_setup(portid, queueid, ...);

   for (queueid = 0; queueid < nb_rxq; queueid++)
      rte_eth_rx_queue_setup(portid, queueid, ...);

  // use a custom reta configuration
  rte_eth_dev_rss_reta_update(portid, reta_conf, reta_size);
  rte_eth_dev_start(portid);

We were able to configure a custom reta before starting the port.

The commit "net/mlx5: support RSS on hairpin" breaks this logic by
moving the code initializing the RSS reta from rte_eth_dev_configure
into rte_eth_dev_start.

To fix the issue, the skip_default_rss_reta flag is always set to 1 in
rte_eth_dev_rss_reta to avoid reconfiguring the RSS reta when the device
is started.

Fixes: 63bd16292c ("net/mlx5: support RSS on hairpin")
Cc: stable@dpdk.org

Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-09-18 18:55:11 +02:00
Ferruh Yigit
5723fbed4f ethdev: remove underscore prefix from internal API
The '_rte_eth_dev_callback_process()' & '_rte_eth_dev_reset()' internal
APIs have an unconventional underscore ('_') prefix.
Although this is not documented, most probably this is to mark them as
internal. Since we have the '__rte_internal' flag to mark this, remove
the '_' from the API names.

For '_rte_eth_dev_reset()', there is already a public API named
'rte_eth_dev_reset()', so renaming '_rte_eth_dev_reset()' to
'rte_eth_dev_internal_reset'.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
8682e492ed ethdev: use hairpin helper functions
Hairpin helper functions were not used by drivers, but were used only
locally in ethdev. They are:
'rte_eth_dev_is_rx_hairpin_queue()'
'rte_eth_dev_is_tx_hairpin_queue()'

Expose them as internal APIs and update the mlx5 driver (the only user
of hairpin) to use them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
cbfc6111b5 ethdev: move inline device operations
This patch is a preparation to hide the 'struct eth_dev_ops' from
applications by moving some device operations from 'struct eth_dev_ops'
to 'struct rte_eth_dev'.

Mentioned ethdev APIs are in the data path and implemented as inline
because of performance reasons.

Exposing 'struct eth_dev_ops' to applications is bad because it is a
contract between ethdev and PMDs, it does not really need to be known
by applications, and changes in the struct cause ABI breakages which
they shouldn't.

To be able to both keep APIs inline and hide the 'struct eth_dev_ops',
moving device operations used in ethdev inline APIs to 'struct
rte_eth_dev' to the same level with Rx/Tx burst functions.

The list of dev_ops moved:
eth_rx_queue_count_t       rx_queue_count;
eth_rx_descriptor_done_t   rx_descriptor_done;
eth_rx_descriptor_status_t rx_descriptor_status;
eth_tx_descriptor_status_t tx_descriptor_status;

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Michael Baum
0c762e81da net/mlx5: share Rx queue drop action code
Move Rx queue drop action similar resources allocations from Verbs
module to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5eaf882e94 net/mlx5: separate Rx queue drop
Separate Rx queue drop creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5a959cbfa6 net/mlx5: share Rx hash queue code
Move Rx hash queue object similar resources allocations from DevX and
Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
25ae7f1a5d net/mlx5: share Rx queue indirection table code
Move Rx indirection table object similar resources allocations from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
66b96fa6a6 net/mlx5: remove indirection table type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the indirection table structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
85552726d3 net/mlx5: separate Rx hash queue creation
Separate Rx hash queue creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
87e2db37ef net/mlx5: separate Rx indirection table object creation
Separate Rx indirection table object creation into both Verbs and DevX
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
fa2c85cc9c net/mlx5: share Rx queue object modification
Use new modify_wq functions for Rx object creation in DevX and Verbs
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
c279f187ee net/mlx5: separate Rx queue object modification
Separate Rx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
675911d033 net/mlx5: rearrange creation of WQ and CQ object
Rearrangement of WQ and CQ creation for Verbs Rx queue:
1. Rename the allocation function.
2. Reduce the number of arguments that the creation functions receive.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
f6dee90058 net/mlx5: rearrange creation of RQ and CQ resources
Rearrangement of RQ and CQ resource handling for DevX Rx queue:
1. Rename the allocation function to make clear that it allocates all
resources and not just the CQ or RQ.
2. Move the allocation and release of the doorbell into creation and
release functions.
3. Reduce the number of arguments that the creation functions receive.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
1260a87b28 net/mlx5: share Rx control code
Move the Rx object resource allocations and debug logs, which are
similar in the DevX and Verbs modules, to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
322870799e net/mlx5: separate Rx interrupt handling
Separate interrupt event handler into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
6deb19e1b2 net/mlx5: separate Rx queue object creations
In preparation for Windows OS support, the Verbs operations should be
separated into another file.
This way, the build can easily exclude the unsupported Verbs APIs from
the compilation process.

Define an operations structure and a DevX module in addition to the
existing Linux Verbs module.
Separate Rx object creation into the Verbs/DevX modules and update the
operations structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
24e4b650ba net/mlx5: mitigate Rx queue reference counters
The Rx queue structures manage two different reference counters per
queue: the rxq_ctrl reference counter and the rxq_obj reference counter.

There is no real need to use two different counters; it just complicates
the release functions.
Remove the rxq_obj counter and use only the rxq_ctrl counter.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
c902e264f6 net/mlx5: fix types differentiation in Rx queue create
Rx HW objects can be created by both Verbs and DevX operations.
The management of the two types of operations is done directly in the
main flow of the objects' creation.

Some arrangements and validations were wrongly applied to the
irrelevant type:

1. LRO related validations were done for Verbs type where LRO is not
supported at all.
2. Verbs allocation arrangements were done for DevX operations where it
is not needed.
3. Doorbell destroy was considered for Verbs types where it is
irrelevant.

Adjust the aforementioned points only for the relevant types.

Fixes: e79c9be915 ("net/mlx5: support Rx hairpin queues")
Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Fixes: 17ed314c6c ("net/mlx5: allow LRO per Rx queue")
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
6c29e209fe net/mlx5: fix Rx queue state update
In order to support DevX Rx queue stop and start operations, the state
of the queue should be updated in FW.
The state update PRM command requires setting both the current state
and the new requested state.

The settings of the current state and the new requested state fields
were mistakenly swapped.

Switch them back to the correct setting.

Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
93fa67fb11 net/mlx5: fix Rx hash queue creation error flow
The mlx5_hrxq_new function allocates several resources and if one of the
allocations fails, the function jumps to an error label where it
releases all the allocated resources.

When the TIR action creation fails, the hrxq memory is not released,
which can cause a resource leak.

Add an appropriate release to the hrxq pointer in the error flow.

Fixes: 772dc0eb83 ("net/mlx5: convert hrxq to indexed")
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Ophir Munk
7af10d29a4 net/mlx5/linux: refactor VLAN
The file mlx5_vlan.c contains Netlink APIs (Linux dependent) as part of
the VM workaround implementation. Move this implementation to the file
linux/mlx5_vlan_os.c. To remove the Netlink dependency from header
files, change the pointer of type 'struct mlx5_nl_vlan_vmwa_context *'
to 'void *'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
8bb2410ea3 net/mlx5: separate VLAN strip modification
When updating a queue VLAN stripping offload, either the WQ is modified
(Verbs) or the RQ is modified (DevX). Add a VLAN stripping modify
callback to 'struct mlx5_obj_ops' and assign it the specific Verbs and
DevX implementations: 'rxq_obj_modify_wq_vlan_strip' and
'rxq_obj_modify_rq_vlan_strip' respectively.
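
A minimal sketch of the callback pattern (only the two routine names
come from the text above; the struct layout and prototypes are assumed
for illustration):

 struct mlx5_rxq_obj;	/* opaque for this sketch */

 int rxq_obj_modify_wq_vlan_strip(struct mlx5_rxq_obj *rxq_obj, int on);
 int rxq_obj_modify_rq_vlan_strip(struct mlx5_rxq_obj *rxq_obj, int on);

 struct mlx5_obj_ops_sketch {
 	int (*rxq_obj_modify_vlan_strip)(struct mlx5_rxq_obj *rxq, int on);
 };

 /* The Verbs flavour modifies the WQ ... */
 static const struct mlx5_obj_ops_sketch ibv_ops_sketch = {
 	.rxq_obj_modify_vlan_strip = rxq_obj_modify_wq_vlan_strip,
 };

 /* ... while the DevX flavour modifies the RQ. */
 static const struct mlx5_obj_ops_sketch devx_ops_sketch = {
 	.rxq_obj_modify_vlan_strip = rxq_obj_modify_rq_vlan_strip,
 };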

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
fe7a54fd26 net/mlx5: remove Verbs dependency in Rx/Tx objects
Replace pointers to ibv structs with pointers to void (file
mlx5_rxtx.h). Specifically, the following pointer types were replaced:
'struct ibv_cq *', 'struct ibv_wq *', 'struct ibv_comp_channel *',
'struct ibv_rwq_ind_table *', 'struct ibv_qp *'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
1f66ac5bbe net/mlx5: remove more Direct Verbs dependencies
Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced
with 'void *' to enable compilation under non-Linux operating systems.
New getter functions were added to retrieve the specific fields that
were previously accessed directly.

Replaced structs:
'struct mlx5dv_pp *'
'struct mlx5dv_devx_event_channel *'
'struct mlx5dv_devx_umem *'
'struct mlx5dv_devx_uar *'

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
1e577c9e5f net/mlx5: call meter detach only if DR is supported
Flow metering is supported only in direct rules (DR). Currently the
meter action create and modify APIs are under #ifdef
HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER, while detaching the meter action
is executed unconditionally. This commit adds the same ifdef to the
mlx5_flow_meter_detach() API.
This avoids compilation failures on non-Linux operating systems which
do not support DR.
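
A simplified sketch of the change (the meter structure and the body of
the function are abridged):

 struct mlx5_flow_meter;

 void
 mlx5_flow_meter_detach(struct mlx5_flow_meter *fm)
 {
 #ifdef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
 	/* ... detach the DR meter action (unchanged logic using fm) ... */
 #else
 	(void)fm;	/* DR not supported: nothing to detach. */
 #endif
 }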

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
079f1ae5ac net/mlx5: remove unused log macros
Remove utility macros INFO, WARN, ERROR. They are not in use and
conflict with identical definitions when compiled under Windows.

Fixes: 80f2d0ed7f ("net/mlx5: add hardware flow debug dump")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
f00f6562e1 net/mlx5: remove netlink dependency in shared code
This commit adds a Linux implementation of the mlx5_os_mac_addr_flush
routine as a wrapper to the Netlink API, to avoid direct Netlink calls
under non-Linux operating systems.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
f2e8b4556f net/mlx5: remove unused includes
Remove unused Linux included files:

<sys/ioctl.h>, <arpa/inet.h> from file net/mlx5/mlx5_mac.c
<sys/mman.h> from file net/mlx5/mlx5.c

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
e9c0b96e35 net/mlx5: move Linux ifname function
The mlx5_get_ifname() prototype includes the 'IF_NAMESIZE' definition
from the Linux file net/if.h. Since this API is only used under Linux,
and to enable compilation under non-Linux OS, move this prototype from
the shared file mlx5.h to the file linux/mlx5_os.h.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
5d5a26f26d net/mlx5: rename constant conflicting with Windows
The enumerated value REG_NONE (defined in mlx5_prm.h) conflicts with a
Windows definition (winnt.h): #define REG_NONE ( 0ul ) // No value type
To enable mlx5 PMD Windows compilation, rename REG_NONE to REG_NON.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
3fe889617b net/mlx5: manage modify actions with hashed list
To manage header modify actions, the mlx5 PMD used a singly linked
list; lookup and insertion operations took too long when there were
millions of objects, and this impacted the flow insertion/deletion
rate.

In order to optimize the performance, a hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
095c397b43 net/mlx5: add hash list extended lookup and insert
The mlx5 PMD hashed list was designed to contain items with unique
keys only. Now there is a need to store objects with possible key
collisions. Many collisions are not expected (very likely only a few),
but the keys are no longer unique.

This commit adds the hash list extended functions in order to support
insertion and lookup for the lists with non-unique keys.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-09-18 18:55:06 +02:00
Ciara Power
3cc6ecfdfe build: remove makefiles
A decision was made [1] to no longer support Make in DPDK, this patch
removes all Makefiles that do not make use of pkg-config, along with
the mk directory previously used by make.

[1] https://mails.dpdk.org/archives/dev/2020-April/162839.html

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-09-08 00:09:50 +02:00
Thomas Monjalon
4f86c0ba19 version: 20.11-rc0
Start a new release cycle with empty release notes.

The ABI version becomes 21.0.
The ABI major is back to normal, having only one number (21 vs 20.0).
The map files are updated to the new ABI major number (21).
The ABI exceptions are dropped.
Travis ABI check is disabled because compatibility is not preserved.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-08-12 11:32:16 +02:00
Dekel Peled
38a5704629 net/mlx5: fix number of retries for UAR allocation
A previous fix added a definition of the number of retries for UAR
allocation.
This value is adequate for x86 systems with 4K pages.
On Power9 systems with 64K pages the required value is 32.
This patch updates the defined value from 2 to 32.

Fixes: a0bfe9d56f ("net/mlx5: fix UAR memory mapping type")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-08-05 16:10:50 +02:00
Viacheslav Ovsiienko
972a1bf812 common/mlx5: fix user mode register access command
To detect the timestamp mode configured on the NIC, the mlx5 PMD uses
the firmware command ACCESS_REGISTER_USER. This command is relatively
new and might not be supported by older firmware versions; when
rejected, it caused annoying messages in the kernel log.

This patch adds a check of the attribute flag indicating whether the
firmware supports the command, and avoids the call if it does not.

Fixes: bb7ef9a962 ("common/mlx5: add register access DevX routine")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:24 +02:00
Dekel Peled
1ccc479014 net/mlx5: fix Rx interrupt handling and cleanup
Recent patch added creation of Rx CQ using DevX API.
The reading of events from DevX channel was not done correctly.
This patch fixes the event reading, using the correct data structure.
Cleanup after CQ creation, in case of error, is also updated.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Gregory Etelson
1af7452113 net/mlx5: fix dynamic inline hint handling
The ConnectX NICs can transfer data from the host memory with two
approaches: provide the pointer to the data buffer, or inline the data,
i.e. copy the data to the transmit descriptor (WQE) entirely or only
partially. In some configurations the NIC hardware requires a minimal
amount of data to be inlined in the descriptor to operate correctly.
There is also a special dynamic flag to hint the PMD not to inline the
data on a per-packet basis (for example, if the buffer is located on
some other device such as storage or a GPU).

If a packet was shorter than the minimal inline data length required by
the NIC hardware and the no-inline hint was set, the PMD tried to
inline the packet with the minimal required length instead of the
actual packet length. This patch adds the missing length check to the
no-inline hint handling branch.

Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Viacheslav Ovsiienko
4ffab7b9e1 net/mlx5: fix metadata storing for NEON Rx
There was a typo introducing a bug that affected the mlx5 vectorized
rx_burst on ARM architectures when CQE compression was enabled.

Fixes: 6c55b622a9 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Viacheslav Ovsiienko
a0bfe9d56f net/mlx5: fix UAR memory mapping type
The User Access Region is a special mechanism to provide direct
access to the hardware registers, and is the part of the PCI address
space that is mapped to CPU virtual addresses. The mapping can be
performed with the type "Write-Combining" or "Non-Cached", and each of
these might or might not be supported on a given setup.

To prevent device probing failure, the UAR allocation is retried with
the alternative mapping type. The datapath takes the actual UAR
mapping into account on queue creation.

There was another issue with a NULL UAR base address.
OFED 5.0.x and upstream rdma-core before v29 returned NULL as the
UAR base address if the UAR was not the first object in the UAR page.
This caused a PMD failure, so we should try to get another UAR until
one with a non-NULL base address is returned.
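
A simplified sketch of the retry logic (the UAR calls follow rdma-core's
DevX API; the retry limit and the surrounding variables are
illustrative):

 	struct mlx5dv_devx_uar *uar = NULL;
 	uint32_t flags = 0;	/* default Write-Combining mapping */
 	int retry;

 	for (retry = 0; retry < uar_retry_limit; retry++) {
 		uar = mlx5dv_devx_alloc_uar(ctx, flags);
 		if (uar == NULL && flags == 0) {
 			/* WC mapping rejected: retry with Non-Cached. */
 			flags = MLX5DV_UAR_ALLOC_TYPE_NC;
 			uar = mlx5dv_devx_alloc_uar(ctx, flags);
 		}
 		if (uar != NULL && uar->base_addr != NULL)
 			break;	/* usable UAR obtained */
 		if (uar != NULL) {
 			/* NULL base address from old rdma-core:
 			 * free this UAR and allocate another one. */
 			mlx5dv_devx_free_uar(uar);
 			uar = NULL;
 		}
 	}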

Fixes: fc4d4f732b ("net/mlx5: introduce shared UAR resource")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2020-07-30 00:41:23 +02:00
Raslan Darawsheh
753dd70283 net/mlx5: fix VF MAC address set over BlueField
When trying to set the MAC address of an Ethernet device that is a
representor, the PMD sets the MAC on the corresponding VF instead.

For the case of the HPF (host PF representor on BlueField), the PMD
shouldn't attempt this, since the HPF doesn't have any corresponding VF
and the operation fails.

This fixes the issue by setting the MAC on the dev directly.

Fixes: 0d1d731708 ("net/mlx5: set VF MAC address from host")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Alexander Kozyrev
6f52bd3383 net/mlx5: fix vectorized mini-CQE prefetching
There was an optimization work to prefetch all the CQEs before
their invalidation. It allowed us to speed up the mini-CQE
decompression process by preheating the cache in the vectorized
Rx routine.

Prefetching of the next mini-CQE, on the other hand, showed no
difference in the performance on the x86 platform, so it was removed.
Unfortunately this caused a performance drop on ARM.

Prefetch the mini-CQE as well as all the soon to be
invalidated CQEs to get both CQE and mini-CQE on the hot path.

Fixes: 28a4b96321 ("net/mlx5: prefetch CQEs for a faster decompression")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
d462a83c65 net/mlx5: optimize stack memory in probe
The device configuration struct is too large to be passed as a
function argument by value.

Call the spawn function with the device configuration passed by
reference.
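
A sketch of the signature change (the prototype is abridged):

 /* Before: the whole configuration struct was copied on every call.
  *
  * static struct rte_eth_dev *
  * mlx5_dev_spawn(struct rte_device *dpdk_dev,
  *                struct mlx5_dev_spawn_data *spawn,
  *                struct mlx5_dev_config config);
  */

 /* After: only a pointer is passed, no large stack copy per call. */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	       struct mlx5_dev_spawn_data *spawn,
 	       struct mlx5_dev_config *config);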

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
7301d1923a net/mlx5: fix unnecessary init in mark conversion
The flow_dv_convert_action_mark function defines an array of
field_modify_info structures and initializes the first entity.

The id field of the first entity is initialized to 0, even though its
type is an enum that has no value of 0.
In fact, the function does not use this id field before assigning the
appropriate register id into it, so the initialization is unnecessary.
Moreover, this initialization assigns an int to an enum, and it is
better not to create a type conflict for no reason.

Postpone the first entity initialization until the appropriate register
id is known.

Fixes: 55deee1715 ("net/mlx5: extend flow mark support")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
f4a0873197 net/mlx5: optimize critical section in device free
When the PMD releases the shared IB device context, it holds the
mlx5_ibv_list_mutex lock throughout the function so that, while a
device is being removed from the list, another process cannot insert
another device into it.
On the other hand, once the device has been removed from the list, even
if it has not yet released all of its resources, the PMD no longer
needs to care about other processes and can release the lock.

However, the PMD does not release the lock even though it can, and
performs a number of operations, some of which include sleeping and may
be long.
To improve this, shorten the lock time to the minimum necessary.

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
63d1db710f net/mlx5: fix unlimited parsing of switch info
In the mlx5_sysfs_switch_info function, the driver gets the switch
information associated with a network interface.

The driver writes the port name into a buffer and translates it.
However, when it writes the name, it does not limit the write to the
buffer size.

Limit the write to the size of the buffer.
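
The principle of the fix, as a generic sketch (not the exact driver
code):

 #include <net/if.h>
 #include <stdio.h>

 static void
 store_port_name(char port_name[IF_NAMESIZE], const char *src_name)
 {
 	/* Unbounded (the bug pattern), may overflow port_name:
 	 *     sprintf(port_name, "%s", src_name);
 	 * Bounded (the fix pattern), never writes past the buffer: */
 	snprintf(port_name, IF_NAMESIZE, "%s", src_name);
 }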

Fixes: 1256805dd5 ("net/mlx5: move Linux-specific functions")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
8da2a608d0 net/mlx5: remove ineffective increment in hairpin split
The flow_hairpin_split function defines a pointer called addr that
points to the list of items.
When the function wants to advance through the list, it adds the size
of an item to the pointer.

At the end of the function, it advances the pointer one more time even
though it is not used afterwards. In fact, this line has no effect and
the operation of the function would be no different without it.

Remove the line where the pointer is advanced unnecessarily.

Fixes: d85c7b5ea5 ("net/mlx5: split hairpin flows")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
e71e90938b net/mlx5: fix crash in NVGRE item translation
The flow_dv_translate_item_nvgre function adds the NVGRE item to the
matcher and to the value.
It defines a pointer named nvgre_m that receives the item's mask, and
then copies some of it to the matcher.

Before copying, it checks the mask for validity, and in case the mask
is NULL the function gives it a pointer to rte_flow_item_nvgre_mask.
However, the function reads the vni field of the mask before the check,
and if there is no mask it actually dereferences the NULL pointer and
the program crashes with a segfault.

Move the vni field access to after the validation.
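
A sketch of the corrected ordering (simplified from the translation
function):

 #include <rte_flow.h>

 static const struct rte_flow_item_nvgre *
 nvgre_mask_or_default(const struct rte_flow_item *item)
 {
 	const struct rte_flow_item_nvgre *nvgre_m = item->mask;

 	/* Validate/choose the mask first ... */
 	if (nvgre_m == NULL)
 		nvgre_m = &rte_flow_item_nvgre_mask;
 	/* ... only then is it safe to read nvgre_m->tni and friends. */
 	return nvgre_m;
 }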

Fixes: cd18e1b72f ("net/mlx5: fix build on Arm")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
4868ae8322 net/mlx5: fix initialization of steering registers
The mlx5_flow_action_copy_mreg structure contains a field called src
of type enum modify_reg; similarly, the mlx5_rte_flow_item_tag
structure contains a field called id of type enum modify_reg.
The enum modify_reg type represents different registers in the
system and it also has a value called REG_NONE, equal to 0, which
means that the register does not exist.

The flow_mreg_add_copy_action function sets a variable of struct
mlx5_flow_action_copy_mreg type and initializes the src field to 0.
Similarly, the flow_create_split_metadata function sets a variable of
struct mlx5_rte_flow_item_tag type and initializes the id field to 0.
In both functions, an enum modify_reg variable is initialized with an
int value while modify_reg has an appropriate enumerator for that
value (REG_NONE).

Replace the assignment of 0 with REG_NONE in both functions.

Fixes: dd3c774f6f ("net/mlx5: add metadata register copy table")
Fixes: 71e254bc02 ("net/mlx5: split Rx flows to provide metadata copy")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Suanming Mou
e1293b10de net/mlx5: fix counter query
Currently, the counter query requires the counter ID to be 4-aligned.
In non-batch mode, the counter pool might get a counter ID that is not
4-aligned. In this case, the counter should be skipped, or the query
will fail.

Skip a counter whose ID is not 4-aligned as the first counter in the
non-batch counter pool to avoid an invalid counter query. Once a new
min_dcs ID lower than the skipped counters appears in the pool, the
skipped counters are returned to the pool free list for reuse.

Fixes: 5382d28c21 ("net/mlx5: accelerate DV flow counter transactions")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Suanming Mou
fd143711a6 net/mlx5: separate aging counter pool range
Currently, when allocating a counter or a counter-based age from group
0, the counter and the age may share the same counter dcs ID range.
Both the age and the pure counter need to sync up with each other's
container to check if the ID range exists and to update the min_dcs.

This has two disadvantages:
1. If the ID range is shared, this counter range will be queried twice,
   from both the age and the pure counter containers, within 1s.
2. The same-range counter check between the two containers makes the
   counter allocation synchronize min_dcs from time to time, with extra
   min_dcs updates.

This patch avoids sharing the same ID range when allocating a new
pool. If the same ID range exists in the other container, just add the
counter to the other container until a new range is obtained, which
saves the repeated min_dcs synchronization.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Raslan Darawsheh
d13f976086 net/mlx5: fix flow items size calculation
flow_dv_get_item_len is expected to return the actual header size of
an rte_flow item.

Changing any of the structs for rte_flow items by adding or removing
some extra fields would break this function.

This fixes the behavior by returning the actual header size of each
item.

Fixes: 34d41b7aa3 ("net/mlx5: add VXLAN encap action to Direct Verbs")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Ophir Munk
385c19397e net/mlx5: fix premature disabling of interrupt
RXQ interrupts under Linux are based on the epoll mechanism. An expected
order of operations is as follows:
1. Call rte_eth_dev_rx_intr_enable(), to arm the CQ for receiving events
   on data input.
2. Block on rte_epoll_wait() with an array of file descriptors
   representing the CQ events. Upon data arrival the kernel will signal
   an input event on the corresponding CQ fd.
3. Call rte_eth_dev_rx_intr_disable() after the event was received and
   continue in polling mode. The mlx5 implementation of
   rte_eth_dev_rx_intr_disable() is to get the CQ event and ack it.

In practice, applications may wake up from rte_epoll_wait() due to a
timeout with no event to ack but still call
rte_eth_dev_rx_intr_disable() unconditionally. In such cases the call
should return EAGAIN (since the file descriptors are non-blocking), as
opposed to EINVAL which indicates a real failure. In case of EAGAIN the
PMD should not warn with "Unable to disable interrupt on Rx queue".

This commit fixes an earlier commit where the value 0 returned from the
devx_get_event() function was considered an error.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Raslan Darawsheh <rasland@mellanox.com>
2020-07-30 00:41:22 +02:00
Parav Pandit
f6d099d7da common/mlx5: remove class check from class drivers
Now that the mlx5_pci PMD checks for enabled classes and performs
probe() and remove() of the associated classes, an individual class
driver does not need to check if another driver is enabled.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-28 19:01:30 +02:00
Parav Pandit
392bf9084d common/mlx5: register class drivers through common layer
Migrate the mlx5 net, vdpa and regex PMDs to start using the mlx5
common class driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-28 19:01:30 +02:00
Parav Pandit
8208800163 common/mlx5: avoid class constructor priority
mlx5_common is shared library between mlx5 net, VDPA and regex PMD.
It is better to use common initialization helper instead of using
RTE_PRIORITY_CLASS priority.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-07-28 18:52:11 +02:00
Gregory Etelson
750ff30a8f net/mlx5: fix tunnel flow priority
PMD flow priority is different from application flow priority. Flow
rules with higher match granularity are assigned a higher PMD priority.
The PMD also splits RSS flows internally according to the flow RSS
layer.

The final PMD flow rule priority is derived from the network level of
the last match item, after the PMD adjusts the flow rule, where an L4
match gets the highest priority and L2 the lowest.

The patch adjusts the tunnel flow rule priority calculation for the PMD
running over the Verbs API.

Introduce MLX5_TUNNEL_PRIO_GET macro.

Fixes: 4a78c88e3b ("net/mlx5: fix Verbs flow tunnel")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
210008309b net/mlx5: fix VLAN push action on hairpin queue
The push VLAN action is allowed on Tx only, same as the encap action.
Flow rules for a hairpin queue are created on Rx, and split by the PMD
into Rx and Tx rules, according to the above limitation.
In the current implementation the encap action is split into the Tx
rule. This patch adds the same handling for the push-VLAN action, as
well as its complementing actions set-vlan-vid and set-vlan-pcp.

Fixes: d85c7b5ea5 ("net/mlx5: split hairpin flows")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
dd6745da6d net/mlx5: fix VLAN pop with decap action validation
The combination of a decap action followed by a pop VLAN action is not
fully validated in the existing code.

This patch updates the validation function of the pop VLAN action.
Pop VLAN with a preceding decap requires an inner header with VLAN.
Pop VLAN without a preceding decap requires an outer header with VLAN.

Fixes: b41e47da25 ("net/mlx5: support pop flow action on VLAN header")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Shy Shyman
038e7fc085 net/mlx5: fix HW counters path in switchdev mode
When debugging the performance of a DPDK application, the user may
need to view the different statistics of DPDK (for example
out_of_buffer). This can be enabled by using the testpmd command
'show port xstats <port_id>', for example.

The current implementation assumes legacy mode, in which the counters
are at <ibdev_path>/<port_id>/hw_counters/<file_name>.
In switchdev mode the counters file is located right after the device
name, hence resides at <ibdev_path>/hw_counters.

The fix tries to open the path in the second location after a failure
to open the file from the first location.
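
A simplified sketch of the lookup order (the paths follow the
description above; everything else is illustrative):

 #include <limits.h>
 #include <stdio.h>

 static FILE *
 open_counter_file(const char *ibdev_path, unsigned int port_id,
 		  const char *file_name)
 {
 	char path[PATH_MAX];
 	FILE *file;

 	/* Legacy: <ibdev_path>/<port_id>/hw_counters/<file_name> */
 	snprintf(path, sizeof(path), "%s/%u/hw_counters/%s",
 		 ibdev_path, port_id, file_name);
 	file = fopen(path, "rb");
 	if (file == NULL) {
 		/* Switchdev: <ibdev_path>/hw_counters/<file_name> */
 		snprintf(path, sizeof(path), "%s/hw_counters/%s",
 			 ibdev_path, file_name);
 		file = fopen(path, "rb");
 	}
 	return file;
 }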

Fixes: 9c0a9eed37 ("net/mlx5: switch to the names in the shared IB context")
Cc: stable@dpdk.org

Signed-off-by: Shy Shyman <shys@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Viacheslav Ovsiienko
161d103b23 net/mlx5: add queue start and stop
The mlx5 PMD did not support the queue_start and queue_stop eth_dev API
routines, so a queue could not be suspended and resumed during device
operation.

There is a use case where this feature is crucial for applications:

- there is a secondary process handling the queue
- the secondary process crashed/aborted
- some mbufs were allocated or used by the secondary application
- some mbufs were allocated by Rx queues to receive packets
- some mbufs were placed in the send queue
- the queue goes into an undefined state

In this case there was no reliable way for a restarted secondary
process to recover queue handling other than resetting the queue to its
initial state: freeing all involved resources, including the buffers
involved in queue operations, resetting the mbuf pools, and then
reinitializing the queue to a working state (see the sketch after the
list below):

- reset the mbuf pool, allocate all mbufs to bring the pool into a
  safe state after the crash and allow safe mbuf free calls
- stop the queue, free all potentially involved mbufs
- reset the mbuf pool again
- start the queue, reallocate the mbufs needed
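
From the application side, the per-queue steps above map onto the
standard ethdev queue control API; a minimal sketch (port and queue IDs
are the application's own, the pool handling is left as a comment):

 #include <rte_ethdev.h>

 static int
 recover_rx_queue(uint16_t port_id, uint16_t queue_id)
 {
 	int ret;

 	/* Stop the queue and free the mbufs it still holds. */
 	ret = rte_eth_dev_rx_queue_stop(port_id, queue_id);
 	if (ret != 0)
 		return ret;
 	/* ... reset and refill the mbuf pool here, as described above ... */
 	/* Restart the queue; it reallocates the mbufs it needs. */
 	return rte_eth_dev_rx_queue_start(port_id, queue_id);
 }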

This patch introduces the queue start/stop feature with some
limitations:

- hairpin queues are not supported
- it is the application's responsibility to synchronize start/stop
  with the datapath routines; rx/tx_burst must be suspended during
  the queue_start/queue_stop calls
- it is the application's responsibility to track queue usage and
  provide coordinated queue_start/queue_stop calls from the
  secondary and primary processes
- Rx queues with a vectorized Rx routine and engaged CQE
  compression are not supported by this patch currently

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
08d1838f64 net/mlx5: implement CQ for Rx using DevX API
This patch continues the work to use DevX API for different objects
creation and management.
On Rx control path, the RQ, RQT, and TIR objects can already be
created using DevX API.
This patch adds the support to create CQ for RxQ using DevX API.
The corresponding event channel is also created and utilized using
DevX API.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
9d60f54569 common/mlx5: remove inclusion of Verbs header files
Several source files include Verbs header files as in (1). These source
files will not compile under non-Linux operating systems. This commit
removes this inclusion in two cases:

Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source
file so the inclusion in (1) can be safely removed.

Case 2: Verbs symbols are used. Please note the inclusion in (1) already
appears in file linux/mlx5_glue.h (which represents the interface
to the rdma-core library). Therefore, replace (1) in the source file
with (2).  Under non-Linux operating systems - file mlx5_glue.h will not
include (1).

(1)
 #include <infiniband/verbs.h>
 #include <infiniband/mlx5dv.h>

(2)
 #include <mlx5_glue.h>

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
2e86c4e5c7 net/mlx5: refactor multi-process communication
1. The shared data communication between the primary and the secondary
processes is implemented using Linux API. Move the Linux API code under
linux directory (file linux/mlx5_os.c).

2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary
processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved
under linux (new file linux/mlx5_mp_os.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ef9ee13f6e net/mlx5: cleanup header file
The cleanup refers to header file mlx5.h.
1. Remove unused prototypes.
2. Move prototypes under their correct title.
3. Change functions to static and remove their prototypes from the
header file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
98c4b12afa net/mlx5: eliminate dependency on Linux in shared header
This commit eliminates Linux dependencies in shared file mlx5.h.

1. All functions using 'struct ifreq' are moved to file
linux/mlx5_ethdev_os.c such that this struct can be removed from mlx5.h.
2. Function mlx5_set_flags() that uses Linux flags (e.g. IFF_UP) is
changed to static and its prototype is removed from mlx5.h.
3. Remove redundant member verbs_action from 'struct mlx5_priv'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
4d18abd130 net/mlx5: wrap Linux promiscuous and multicast functions
This commit adds the Linux implementation of the promiscuous and
all-multicast wrapper routines (e.g. mlx5_os_set_promisc()). The
routines call Netlink APIs.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ab27cdd93a net/mlx5: refactor Linux MAC operations
Move OS specific MAC operations add, remove, modify VF into file
linux/mlx5_os.c.
Remove unused function mlx5_get_mac().

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
2aba9fc725 net/mlx5: replace Linux specific calls
The following Linux calls are replaced by their matching rte APIs.

mmap ==> rte_mem_map()
munmap ==> rte_mem_unmap()
sysconf(_SC_PAGESIZE) ==> rte_mem_page_size()

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
3eca5f8a61 net/mlx5: move flow priority discovery to Verbs file
Function calls mlx5_flow_adjust_priority() and
mlx5_flow_discover_priorities() are Verbs based. Move them from file
mlx5_flow.c to file mlx5_flow_verbs.c

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
50f95b23c9 net/mlx5: add option to configure FCS or decapsulation
There are some limitations on some NICs (at least on ConnectX-6 Dx
and BlueField 2) with supporting FCS (frame checksum) scattering for
tunnel-decapsulated packets.

In such cases only one of the features can be supported at the same
time, and the new devarg "decap_en" is introduced to provide the choice
to the users.

If the FCS scattering feature is not supposed to be engaged by the
application, this new devarg should be specified as "decap_en=0",
forcing the FCS feature to be enabled and rejecting tunnel decap
actions in the rte_flow engine.
If FCS scattering is not needed and the application intends to use
tunnel decapsulation in rte_flow, the devarg can be omitted or set to a
non-zero value (this is the default setting).
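
For example (the PCI address below is hypothetical), the devarg can be
passed when probing the device:

 #include <rte_dev.h>

 /* "decap_en=0" forces FCS scattering support and makes rte_flow
  * reject tunnel decap actions. */
 static int
 probe_with_decap_disabled(void)
 {
 	return rte_dev_probe("0000:03:00.0,decap_en=0");
 }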

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
ac3fc732c4 net/mlx5: convert queue objects to unified malloc
This commit allocates the Rx/Tx queue objects from unified malloc
function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
2175c4dc62 net/mlx5: convert configuration objects to unified malloc
This commit allocates the miscellaneous configuration objects from the
unified malloc function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
83c2047c5f net/mlx5: convert control path memory to unified malloc
This commit allocates the control path memory from the unified malloc
function.

The changed objects:

1. hlist;
2. rss key;
3. vlan vmwa;
4. indexed pool;
5. fdir objects;
6. meter profile;
7. flow counter pool;
8. hrxq and indirect table;
9. flow object cache resources;
10. temporary resources in flow create;

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Suanming Mou
5522da6b20 net/mlx5: add option to allocate memory from system
Currently, for the MLX5 PMD, once millions of flows are created, the
memory consumption of the flows is very large. For a system with
limited memory, this means the system needs to reserve most of the
memory as hugepage memory to serve the flows in advance, and other
normal applications will have no chance to use this reserved memory.
Since most of the time the system will not have lots of flows, the
reserved hugepage memory is largely wasted most of the time.

With the new sys_mem_en devarg set to true, the PMD allocates memory
from the system by default, using the newly added mlx5 memory
management functions. Only when the MLX5_MEM_RTE flag is set is memory
allocated from rte; otherwise it is allocated from the system.

In this case, a system with limited memory does not need to reserve
most of its memory for hugepages. Allocating only the memory needed
for datapath objects with the explicit flag is enough; other memory is
allocated from the system. For a system with enough memory, there is
no need to care about the devarg; the memory will always come from rte
hugepages.

One restriction is that for a DPDK application with multiple PCI
devices, if the sys_mem_en devargs differ between the devices,
sys_mem_en only takes the value from the first device's devargs and
prints a message to warn about that.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
d7c49561d3 net/mlx5: add eCPRI flex parser capacity check
If the NIC or the FW does not support the dynamic flex parser,
an error is returned when trying to create the parser for eCPRI,
and it is hard to know the detailed reason of the failure.
Before creating the parser node and using the parser afterwards, the
capability bit saved in the HCA_CAP can be used to confirm whether the
dynamic flex parser is supported.
If not, an error with ENOTSUP is returned directly to prevent the
following steps from being executed.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
1c5064044f net/mlx5: create and destroy eCPRI flex parser
eCPRI protocol has unified format layout for the variants, over
ETH layer (including .1Q) and UDP layer.

The common header of the message has 4 bytes fixed length, and the
message payload layers are different based on the type field. Now
only type #0, #2 and #5 will be supported, and 2 bytes are needed.

When creating the flex parser, the header will be extended to 8
bytes and 2 DW samples are needed. The 1st DW starts from offset 0
and will be used for the type field of the common header. The 2nd
DW starts from offset 4 and will be used for the physical channel
ID, real-time control ID or measurement ID fields.

The parser will be created once a flow with an eCPRI item is observed
for the first time. After creation, it will remain in the system and
HW until the device is stopped. Right now, there is no need to destroy
the eCPRI flex parser after the last flow with an eCPRI item is
destroyed. This avoids alternating between creating and destroying the
eCPRI flex parser when there is a single eCPRI flow.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
711aedf187 common/mlx5: add flex parser DevX structures
The structures and other definitions will be used for the dynamic
flex parser creation via the DevX command interface. These structures
will be used as intermediate variables and input parameters for the
parser creation API.
It is better to keep all members consistent with the PRM definition
even though some of them will not be used.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
daa38a8924 net/mlx5: add flow translation of eCPRI header
In the translation stage, the eCPRI item should be translated into the
format that the lower layer driver can use. All the fields that need to
match must be in network byte order after translation, as well as the
mask. The header in the item belongs to the network protocol stack, so
the input header is considered to be in big-endian format already.

Based on the definition in the PRM, the DW samples will be used for
matching in the FTE/STE. Now, only the type field and the PC ID, RTC
ID, and DLY MSR ID of the payload will be supported. The masks should
be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are
needed to support such matching. The mask fields can be zeros to
support some wildcard rules, but it makes no sense to support rules
matching only on the payload without matching the type field.

The DW samples should be stored after the flex parser creation for
eCPRI. There is no need to query the sample IDs each time a flow rule
with an eCPRI item is created, so this does not introduce a significant
insertion rate degradation.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
c7eca23657 net/mlx5: add flow validation of eCPRI header
When creating a flow with eCPRI header item, the validation of it is
mandatory. The detailed limitations are listed below:
  1. Over Ether / VLAN, ethertype must be 0xAEFE.
  2. No tunnel support is described in the specification now.
  3. L3 layer is only supported when L4 is UDP, see #4.
  4. Over TCP is not supported from the specification, and over UDP
     is not supported right now.
  5. Concatenation indicator matching is not supported now.
  6. No need to check the revision.
  7. Only type field in the common header is mandatory, and one byte
     should be matched integrally.
  8. Fields in the message payload header are optional.
  9. Only messages with type #0, #2 and #5 are supported now.

Some limitations are only from software right now, because there is
no need to support all the message types and variants of protocol
stack listed in the specification.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
a2854c4de1 net/mlx5: convert Rx timestamps in real-time format
The ConnectX-6 Dx supports timestamps in various formats; the new
realtime format is introduced, where the upper 32-bit word of the
timestamp contains the UTC seconds and the lower 32-bit word contains
the nanoseconds. This patch detects which format is configured in the
NIC and performs the conversion accordingly.
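
With the split described above, the conversion itself is
straightforward; a sketch (the helper name is illustrative):

 #include <stdint.h>

 /* Realtime format: UTC seconds in the upper 32 bits, nanoseconds in
  * the lower 32 bits; convert to a plain nanosecond value. */
 static inline uint64_t
 rt_timestamp_to_ns(uint64_t ts)
 {
 	return (ts >> 32) * 1000000000ULL + (uint32_t)ts;
 }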

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3b025c0ca4 net/mlx5: provide send scheduling error statistics
The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - the Rearm Queue interrupt was not handled in
    time and the service routine might miss the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
b94d93ca73 net/mlx5: support reading device clock
If the send scheduling feature is engaged, a Clock Queue is created
that reliably reports the current device clock counter value. The
device clock counter can be read directly from the Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
2f827f5ea6 net/mlx5: support scheduling on send routine template
This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
0febfcce36 net/mlx5: prepare Tx to support scheduling
The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
085ff447f0 net/mlx5: convert timestamp to completion index
The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3172c471b8 net/mlx5: prepare Tx queue structures to support timestamp
The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
77522be0a5 net/mlx5: introduce clock queue service routine
The service routine is invoked periodically on Rearm Queue completion
interrupts, typically once every few milliseconds (1-16), to track
clock jitter and wander in a robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
aef1e20ebe net/mlx5: allocate packet pacing context
This patch allocates the Packet Pacing context from the kernel,
configures it according to the requested send scheduling granularity,
and assigns it to the Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3a87b964ed net/mlx5: create Tx queues with DevX
To provide packet send scheduling based on the mbuf timestamp, the Tx
queue must be attached to the same UAR as the Clock Queue.
The UAR is a special hardware-related resource mapped to the host
memory that provides doorbell registers; assigning a UAR to the queue
being created is possible via the DevX API only.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
551c94c83e net/mlx5: create rearm queue for packet pacing
The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross-channel
SEND_EN/WAIT operations to provide the requests to the
Clock Queue in a robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
d133f4cdb7 net/mlx5: create clock queue for packet pacing
This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
fc4d4f732b net/mlx5: introduce shared UAR resource
This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
24feb04596 net/mlx5: fix UAR lock sharing for multiport devices
The master and representors might be created over multiport
InfiniBand devices, and the UAR resource allocated for sibling
ports might belong to the same underlying InfiniBand device.
Hardware requires that write access to the UAR be performed as an
atomic 64-bit write; on 32-bit systems this is two sequential
writes protected by a lock. Due to the possibility of sharing the
same UAR between sibling devices, the locks must be moved to the
shared context.

Fixes: f048f3d479 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
8f848f32fc net/mlx5: introduce send scheduling devargs
This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On device start, if the "rte_dynflag_timestamp"
  dynamic flag is registered and a non-zero value is specified for
  this devarg, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies the scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.
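
For example (hypothetical PCI address and values), both parameters are
passed as device arguments:

 #include <rte_dev.h>

 /* Schedule sends with 500 ns granularity and compensate 64 ns of
  * wire/descriptor processing delay. */
 static int
 probe_with_tx_scheduling(void)
 {
 	return rte_dev_probe("0000:03:00.0,tx_pp=500,tx_skew=64");
 }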

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Ali Alnubani
28c9a7d7b4 net/mlx5: add ConnectX-6 Lx device ID
This adds the ConnectX-6 Lx device id to the list of supported
Mellanox devices that run the MLX5 PMD.
The device is still in development stage.

Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Raslan Darawsheh <rasland@mellanox.com>
2020-07-11 06:18:53 +02:00
Joyce Kong
428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while type qualifier
'__restrict' compiles ok in C with all language levels. This patch
is to replace the existing 'restrict' with '__rte_restrict' which
is a common wrapper supported by all compilers.
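
For example, a routine previously written with 'restrict' becomes:

 #include <stdint.h>
 #include <rte_common.h>

 /* '__rte_restrict' expands to the proper keyword for the compiler and
  * C language level in use. */
 static void
 copy_words(uint32_t *__rte_restrict dst,
 	   const uint32_t *__rte_restrict src, unsigned int n)
 {
 	while (n--)
 		*dst++ = *src++;
 }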

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Shy Shyman
5f3541724e net/mlx5: fix flow META item validation
When a flow with a META match item is inserted, it requires support of
a certain register.
As part of the validation of such flows, the validation function is
missing a check that the mlx5 driver is not in legacy mode in terms of
extended metadata support (the MLX5_XMETA_MODE_LEGACY flag).
If the driver is in legacy mode, the downstream function that
allocates the needed register for metadata will fail.

The fix explicitly checks the conditions for support of metadata in
FDB mode. If the conditions are not met, an error message is issued.

Fixes: 9bf26e1318 ("ethdev: move egress metadata to dynamic field")
Cc: stable@dpdk.org

Signed-off-by: Shy Shyman <shys@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
b293fbf967 net/mlx5: add OS specific flow actions operations
This patch introduces the OS specific functions, for flow actions
create and destroy operations.

In existing implementation, the functions to create flow actions
return a pointer to the created action object.

The new OS specific functions to create flow actions return 0 on
success, and (-1) on failure.
On success, a pointer to the created action object is returned
using an additional parameter.
On failure errno is set.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
e57b858710 net/mlx5: add OS specific flow create and destroy
This patch introduces the OS specific functions, for flow create
and flow destroy operations.

In existing implementation, the functions to create objects
(flow/table/matcher) return a pointer to the created object.
The functions to destroy objects return 0 on success and errno on
failure.

The new OS specific functions to create objects return 0 on success,
and (-1) on failure.
On success, a pointer to the created object is returned using an
additional parameter.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
e4ed8de39b net/mlx5: add OS specific flow type selection
In current implementation the flow type (DV/Verbs) is selected
using dedicated function flow_get_drv_type().

This patch adds OS specific function mlx5_flow_os_get_type(), to
allow OS specific flow type selection.
The new function is called by flow_get_drv_type(), and if it returns a
valid value (DV/Verbs) no more logic is required.
Otherwise the existing logic is executed.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
17ad3af9f4 net/mlx5: add OS specific flow related utilities
This patch introduces the first OS specific utility functions,
for use by flow engine in different OS implementation.

The first utility functions are:
bool mlx5_flow_os_item_supported(item)
bool mlx5_flow_os_action_supported(action)

They are implemented to check OS specific support for different
item types and action types.

New header file is added:
drivers/net/mlx5/linux/mlx5_flow_os.h

This file contains the utility functions mentioned above for Linux OS.
At this stage they are implemented as static inline, for efficiency,
and always return true.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
6ad7cfaa66 net/mlx5: rename Verbs action to generic name
As part of the effort to support DPDK on Windows and other OS,
rename 'verbs_action' to the generic name 'action'.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
341c894104 net/mlx5: rename Verbs flow to generic name
As part of the effort to support DPDK on Windows and other OS,
rename from IB related name to generic name.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Jerin Jacob
9c99878aa1 log: introduce logtype register macro
Introduce the RTE_LOG_REGISTER macro to avoid the code duplication
in the logtype registration process.

It is a wrapper macro for declaring the logtype, registering it and
setting its level in the constructor context.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-03 15:52:51 +02:00
Michael Baum
0f006468c5 net/mlx5: fix iterator type in Rx queue management
The mlx5_check_vec_rx_support function in the mlx5_rxtx_vec.c file
iterates over the Rx queues array in a loop. Similarly, the
mlx5_mprq_enabled function in the mlx5_rxq.c file iterates over the
Rx queues array in a loop.

In both cases the loop iterator is called i and the variable holding
the array size is called rxqs_n.
The i variable is of type uint16_t while the rxqs_n variable is of
type unsigned int. The range of rxqs_n is much larger than the number
of iterations allowed by the type of i, so theoretically rxqs_n may
hold a value greater than what 16 bits can represent and the loop
would never end.

Change the type of i to uint32_t.
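
A small sketch of the overflow pattern described above (illustrative
only, not the driver's code):

    #include <stdint.h>

    /* With a uint16_t iterator, i would wrap to 0 before ever reaching a
     * rxqs_n value above UINT16_MAX, so the loop would never terminate. */
    static unsigned int
    count_queues(unsigned int rxqs_n)
    {
            uint32_t i;             /* was uint16_t before the fix */
            unsigned int n = 0;

            for (i = 0; i < rxqs_n; ++i)
                    n++;            /* stands in for the per-queue check */
            return n;
    }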

Fixes: 7d6bf6b866 ("net/mlx5: add Multi-Packet Rx support")
Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Michael Baum
36dabcea78 net/mlx5: use anonymous Direct Verbs allocator argument
The mlx5_dev_spawn function declares a variable of type struct
mlx5dv_ctx_allocators several hundred lines after the function
starts, and its only use is being passed as a parameter to the
mlx5_glue->dv_set_context_attr function.
However, according to the DPDK Coding Style guidelines, variables
should be declared at the start of a block of code rather than in the
middle.
Therefore, to follow the coding style, the value is passed directly
to the function without declaring a variable beforehand.
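
A generic sketch of the pattern under invented names (not the
rdma-core API): passing a compound literal avoids the one-use local
variable while keeping declarations out of the middle of the block.

    #include <stddef.h>

    struct allocators {
            void *(*alloc)(size_t size, void *data);
            void (*free)(void *ptr, void *data);
            void *data;
    };

    static int set_attr(const struct allocators *a) { return a != NULL ? 0 : -1; }

    static int
    configure(void *ctx, void *(*my_alloc)(size_t, void *),
              void (*my_free)(void *, void *))
    {
            /* One-shot argument: no mid-function variable declaration needed. */
            return set_attr(&(struct allocators){
                    .alloc = my_alloc,
                    .free = my_free,
                    .data = ctx,
            });
    }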

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Michael Baum
a294e58c80 net/mlx5: use direct API to find port by device
Using the RTE_ETH_FOREACH_DEV_OF loop is not necessary when the
driver only needs the first match.

Use rte_eth_find_next_of() to find it.

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Shiri Kuzin
0f0ae73a32 net/mlx5: add parameter for LACP packets control
The new devarg controls the steering of LACP traffic.
When setting dv_lacp_by_user = 0, the LACP traffic is steered to the
kernel and managed there.

When setting dv_lacp_by_user = 1, the LACP traffic is not steered and
the user needs to manage it.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Shiri Kuzin
3c78124f0a net/mlx5: add default miss action to flow engine
The new action is an internal mlx5 action that is implemented via
the rdma-core MLX5DV_FLOW_ACTION_DEFAULT_MISS action.

The default miss action will be used when a bond is
configured to allow traffic related to the bond to
be managed in the kernel.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Viacheslav Ovsiienko
420bbdae89 net/mlx5: fix host physical function representor naming
Newer kernels add names like "pf0" for the host PCI physical
function representor on BlueField SmartNIC hosts. This patch
provides correct HPF representor recognition for kernel versions 5.7
and later.

The following port naming formats are supported (a small recognition
sketch in C follows the list):

  - missing physical port name (no sysfs/netlink key) at all,
    master is assumed

  - decimal digits (for example "12"), representor is
    assumed, the value is the index of attached VF

  - "p" followed by decimal digits, for example "p2", master
    is assumed

  - "pf" followed by PF index, for example "pf0", Host PF
     representor is assumed on SmartNIC systems.

  - "pf" followed by PF index concatenated with "vf" followed by
     VF index, for example "pf0vf1", representor is assumed.
     If index of VF is "-1" it is a special case of Host PF
     representor, this representor must be indexed in devargs
     as 65535, for example representor=[0-3,65535] will
     allow representors for VF0, VF1, VF2, VF3 and for host PF.
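
A hypothetical recognition sketch for the formats above (simplified
and illustrative only, not the driver's parser):

    #include <stdio.h>

    enum port_kind { PORT_MASTER, PORT_VF_REP, PORT_HPF_REP, PORT_UNKNOWN };

    /* Order matters: try the most specific "pfXvfY" form first. */
    static enum port_kind
    classify_phys_port_name(const char *name, int *pf, int *vf)
    {
            if (name == NULL || name[0] == '\0')
                    return PORT_MASTER;                 /* missing name */
            if (sscanf(name, "pf%dvf%d", pf, vf) == 2)
                    return *vf == -1 ? PORT_HPF_REP : PORT_VF_REP;
            if (sscanf(name, "pf%d", pf) == 1)
                    return PORT_HPF_REP;                /* host PF representor */
            if (sscanf(name, "p%d", pf) == 1)
                    return PORT_MASTER;
            if (sscanf(name, "%d", vf) == 1)
                    return PORT_VF_REP;                 /* plain VF index */
            return PORT_UNKNOWN;
    }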

Fixes: 79aa430721 ("common/mlx5: split common file under Linux directory")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Ori Kam
262c7ad0dd common/mlx5: move doorbell record from net driver
The creation of the DBR (doorbell record) can be used by a number of
different Mellanox PMDs, for example RegEx / Net / vDPA.

This commit moves the DBR creation and release functions to the
common folder.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-30 14:52:30 +02:00
Ophir Munk
391b8bcc81 common/mlx5: move some getter functions from net driver
Getter functions such as: 'mlx5_os_get_ctx_device_name',
'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name',
'mlx5_os_get_umem_id' are implemented under the net directory. To
enable additional devices (e.g. regex, vdpa) to access these getter
functions, they are moved under the common directory.

As part of this commit, the string sizes DEV_SYSFS_NAME_MAX and
DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the
destination string size in the strncpy() function is bigger than the
source string size.
This update avoids the GCC version 8 error -Werror=stringop-truncation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
ac79183dc6 net/mlx5: optimize free counter lookup
Currently, when allocating a new counter, the driver needs to loop
over the whole container pool list to get a free counter.

In the case where millions of counters are allocated and all the
pools are empty, allocating a new counter still loops over the whole
container pool list first and only then allocates a new pool to get a
free counter. The cycles spent on the pool list traversal are wasted.

Adding a global free counter list to the container helps to get free
counters more efficiently.
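
A conceptual sketch of such a free list, using sys/queue.h with
invented names (not the driver's actual structures):

    #include <stddef.h>
    #include <sys/queue.h>

    struct counter {
            TAILQ_ENTRY(counter) next;
            /* ... counter state ... */
    };

    struct container {
            TAILQ_HEAD(, counter) free_cnts;   /* global free counter list */
    };

    /* Allocation takes a free counter directly instead of scanning all pools. */
    static struct counter *
    counter_alloc(struct container *cont)
    {
            struct counter *cnt = TAILQ_FIRST(&cont->free_cnts);

            if (cnt != NULL)
                    TAILQ_REMOVE(&cont->free_cnts, cnt, next);
            return cnt;   /* NULL means a new pool must be allocated */
    }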

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
b1cc226644 net/mlx5: optimize single counter pool search
For a single counter, when allocating a new counter, the driver needs
to find the pool it belongs to in order to do the query together.

Once millions of counters are allocated, the pool array in the
counter container becomes very large. In this case, searching for the
pool in the pool array becomes extremely slow.

Save the minimum and maximum counter IDs to allow a quick check of
the current counter ID range, and start searching from the last pool
in the container, which usually yields the needed pool since counter
IDs increase sequentially.
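
An illustrative sketch of the range check plus backward pool scan
(field names and the pool size are invented for the example):

    #include <stddef.h>
    #include <stdint.h>

    #define POOL_SIZE 512u   /* illustrative number of counters per pool */

    struct pool { uint32_t base_id; };

    struct container {
            uint32_t min_id, max_id;   /* ID range covered by existing pools */
            uint32_t n_pools;
            struct pool **pools;
    };

    static struct pool *
    find_pool(const struct container *c, uint32_t id)
    {
            uint32_t i;

            if (id < c->min_id || id > c->max_id)
                    return NULL;                   /* cannot be in any pool */
            /* Newest IDs live in the newest pools: scan backwards. */
            for (i = c->n_pools; i > 0; i--) {
                    struct pool *p = c->pools[i - 1];

                    if (id >= p->base_id && id < p->base_id + POOL_SIZE)
                            return p;
            }
            return NULL;
    }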

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
632f0f1905 net/mlx5: manage shared counters in three-level table
Currently, to check whether a shared counter with the same ID already
exists, the driver has to loop over the counter pools to search for
the counter. Even adding the counters to a list does not help much
when there are thousands of shared counters in the list.

Looking up the counter index saved in the relevant entry of a
three-level table is more efficient.

This patch uses the three-level table to save the counter index for
each shared counter ID. The next time the same ID comes, checking the
table entry of this ID returns the counter index directly. No search
is needed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
bd81eaebd9 net/mlx5: add three-level table utility
For data associated with a sequentially increasing index, an array
table is more efficient than a hash table when a single entry has to
be found among large numbers of entries. A traditional hash table has
a fixed size, so when a huge amount of data is saved to it, many hash
conflicts occur.

A simple array table also has a fixed size, and allocating all the
needed memory at once wastes a lot of memory. When the exact number
of entries is unknown, it is impossible to size the array at all.

A multi-level table balances these two disadvantages. A global
high-level table with sub-table entries is allocated first, and a
sub-table is allocated only when the corresponding index entry needs
to be saved. For example, for an up to 32-bit index, a three-level
table with a 10-10-12 split and a sequentially increasing index grows
its memory usage with every 4K entries.

The current implementation introduces a three-level table with a
10-10-12 split of the 32-bit index to help the cases that have
millions of entries to save. An entry can be addressed directly by
its index; no search is needed.
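
A minimal sketch of the 10-10-12 split described above (structure
names are invented, not the driver's actual API):

    #include <stdint.h>
    #include <stdlib.h>

    struct l3t_leaf { void *entry[1 << 12]; };            /* low 12 bits    */
    struct l3t_mid  { struct l3t_leaf *leaf[1 << 10]; };  /* middle 10 bits */
    struct l3t      { struct l3t_mid *mid[1 << 10]; };    /* top 10 bits    */

    /* Sub-tables are allocated lazily; lookup/insert is direct addressing. */
    static int
    l3t_set(struct l3t *t, uint32_t idx, void *data)
    {
            uint32_t i = idx >> 22, j = (idx >> 12) & 0x3FF, k = idx & 0xFFF;

            if (t->mid[i] == NULL &&
                (t->mid[i] = calloc(1, sizeof(*t->mid[i]))) == NULL)
                    return -1;
            if (t->mid[i]->leaf[j] == NULL &&
                (t->mid[i]->leaf[j] = calloc(1, sizeof(*t->mid[i]->leaf[j]))) == NULL)
                    return -1;
            t->mid[i]->leaf[j]->entry[k] = data;
            return 0;
    }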

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-30 14:52:29 +02:00
David Marchand
63783b0172 net/mlx5: remove redundant newline from logs
The DRV_LOG macro already appends a newline.

Fixes: 46287eacc1 ("net/mlx5: introduce hash list")
Fixes: 860897d289 ("net/mlx5: reorganize flow tables with hash list")
Fixes: e484e40323 ("net/mlx5: optimize tag traversal with hash list")
Fixes: 6801116688 ("net/mlx5: fix multiple flow table hash list")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
2020-06-30 14:52:29 +02:00
Matan Azrad
aec086c9f1 common/mlx5: share kernel interface name getter
Some configurations of the mlx5 port are done via the kernel net
device associated with the IB device that represents the PCI device.

The DPDK mlx5 driver uses Linux system calls, for example ioctl, in
order to apply per-port configurations requested by the DPDK user.

One of the basic pieces of information required to access the correct
kernel net device is its name.

Move the function that gets the interface name from the IB device
path to the common library.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:29 +02:00
Ophir Munk
4f96d91396 net/mlx5/linux: add memory region callbacks to Verbs
Create a set of verbs callbacks in 'struct mlx5_verbs_ops'
and add MR operations to it (file net/mlx5/linux/mlx5_verbs.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Ophir Munk
d5ed8aa944 net/mlx5: add memory region callbacks in per-device cache
Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Dong Zhou
eb10fe7fb1 net/mlx5: fix LRO checksum
The TCP checksum includes the IPv4 pseudo-header checksum and the L3
payload checksum, which covers the TCP header and TCP payload.
When mlx5 LRO is enabled, the HW calculates the TCP payload
checksum, and the PMD needs to complete the IPv4 pseudo-header
checksum and the TCP header checksum.

The mlx5_lro_update_tcp_hdr function completes the TCP header
checksum, but it uses the lower 4 bits of the data-offset field in
the TCP header to get the TCP header length, which causes a wrong
TCP header checksum calculation.

Update the code to use the upper 4 bits of the data-offset field
instead of the lower 4 bits.
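
An illustrative helper showing the corrected extraction, assuming
DPDK's struct rte_tcp_hdr where the upper 4 bits of data_off hold the
header length in 32-bit words:

    #include <stdint.h>
    #include <rte_tcp.h>

    /* TCP header length in bytes: upper 4 bits of data_off, in 32-bit words. */
    static inline uint8_t
    tcp_hdr_len(const struct rte_tcp_hdr *tcp)
    {
            return (uint8_t)((tcp->data_off >> 4) * 4);  /* not data_off & 0x0F */
    }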

Fixes: e4c2a16eb1 ("net/mlx5: handle LRO packets in Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
e891b54a9e net/mlx5: fix descriptors number adjustment
The number of descriptors to configure in a Rx/Tx queue is passed to
the mlx5_tx/rx_queue_pre_setup() function by value. That means any
adjustments of this variable are local and cannot affect the actual
value that is used to allocate mbufs in the mlx5_txq/rxq_new()
functions. Pass the number as a reference to actually update it.
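
A small sketch of the pass-by-reference change (the function name and
the adjustment rule are illustrative only):

    #include <stdint.h>

    #define MIN_DESC 64u   /* illustrative lower bound */

    /* The count is taken by pointer so the caller sees the adjusted value. */
    static void
    queue_pre_setup(uint16_t *desc)
    {
            if (*desc < MIN_DESC)
                    *desc = MIN_DESC;
    }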

Fixes: 6218063b39 ("net/mlx5: refactor Rx data path")
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
a23d96ae59 net/mlx5: do not select legacy MPW implicitly
The Legacy MPW (multi-packet write) should not be engaged implicitly.
We should exclude this function from a Tx burst routine selection
process unless it is requested specifically by setting the txq_mpw_en
devarg.  Exclude this function from the selection process the same way
it is done for the Enhanced MPW in the mlx5_select_tx_function()
routine.

Fixes: eb8121ab9d ("net/mlx5: introduce Tx burst routine template")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
73bf9235e9 net/mlx5: refactor statistics
mlx5 statistics are calculated by several methods:
1. In software, when packets go through the datapath.
2. By calling ioctl with an ETHTOOL command (Linux specific).
3. By reading counters from the SYSFS device path (Linux specific).

The Linux related functions are moved to file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
042f5c94fd net/mlx5: refactor device operations for Linux
There are three types of eth_dev_ops: primary, secondary and isolate.
Their function call assignments are moved from the common file
mlx5.c to the Linux specific file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
1256805dd5 net/mlx5: move Linux-specific functions
File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for
functions which are Linux specific. Functions which are OS agnostic
remain in the mlx5_ethdev.c file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f484ffa1b1 net/mlx5: move socket files in Linux directory
The mlx5_socket.c file uses APIs which are Linux specific. Therefore,
move it (including mlx5_socket.h) from the net/mlx5 directory to the
net/mlx5/linux directory. This commit also updates the Makefile and
the meson files.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
9138989036 net/mlx5: rename ib in names
Renames in this commit:
mlx5_ibv_list -> mlx5_dev_ctx_list
mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx
mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx
mlx5_ibv_shared_port -> mlx5_dev_shared_port
ibv_port -> dev_port

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
21b7c452a6 net/mlx5: remove completion object dependency on DV
Replace 'struct mlx5dv_devx_cmd_comp *' with 'void *' in 'struct
mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Gregory Etelson
5c76123810 net/mlx5: fix flow memory allocation size
In a DV-enabled MLX5 PMD build, mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size
was initialized for the DV structure. If RTE initialization
encountered an MLX5 PCI function with DV support disabled,
mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size was reduced to match the
legacy Verbs flow size. Since mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW] is
a global variable, that change was reflected on DV-enabled MLX5 PCI
functions too.

Running a flow with an invalid ipool size crashes the PMD.

The patch adjusts the ipool flow size for each active PCI function.

Fixes: b88341ca35 ("net/mlx5: convert flow dev handle to indexed")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Dekel Peled
7842dfeacd net/mlx5: fix GTP mask definition location
A recent patch added the definition of the mask MLX5_GTP_FLAGS_MASK,
just above the function flow_dv_validate_item_gtp(), where it is used.

The patch was applied together with other patches which modified the
same file, so the mask ended up located further away from the
function it is used in.

This patch moves the mask definition to the proper location.

Fixes: 563ac307a4 ("net/mlx5: support match on GTP flags")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ali Alnubani
3efac8085e net/mlx5: fix typos in meter error messages
Fixes: 3bd26b23ce ("net/mlx5: support meter profile operations")
Cc: stable@dpdk.org

Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
834a9019ec net/mlx5: remove Verbs dependency in spawn struct
1. Replace 'struct ibv_device *' with 'void *' in 'struct
mlx5_dev_spawn_data'. Define a getter function to retrieve the
device name.
2. Rename ibv_dev and ibv_port as phys_dev and phys_port
respectively.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
10f3581dfd net/mlx5: add Linux-specific header file
The file drivers/net/mlx5/linux/mlx5_os.h is added. It includes
Linux-specific definitions such as PCI driver flags, link state
change interrupts, link removal interrupts, etc.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
2eb4d0107a net/mlx5: refactor PCI probing on Linux
Refactor PCI probing related code. Move Linux specific functions (as
well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c
file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c7f6ba0e53 net/mlx5: remove umem field dependency on Direct Verbs
The umem field is used in several structs. Its type 'struct
mlx5dv_devx_umem *' is changed to 'void *'. This change allows
compilation on non-Linux OSes.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
e85f623e13 net/mlx5: remove attributes dependency on Verbs
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attributes that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new
API mlx5_os_get_dev_attr() which fills in the newly defined struct.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c468501658 common/mlx5: remove protection domain dependency on Verbs
Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and
all function calls in mlx5 PMD.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f44b09f9e3 net/mlx5: add Linux-specific file with getter functions
'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct
ibv_context *' to 'void *'.  'ctx' members which are verbs dependent
(e.g. device_name) will be accessed through getter functions which are
added to a new file under Linux directory: linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
6e88bc42c7 net/mlx5: rename Verbs shared object
Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
c9cc554ba4 net/mlx5: fix vectorized Rx burst termination
The maximum burst size of the vectorized Rx burst routine is set to
MLX5_VPMD_RX_MAX_BURST (64). This limits the performance of any
application that would like to gather more than 64 packets from a
single Rx burst for batch processing (i.e. VPP).

The situation gets worse with a mix of zipped and unzipped CQEs.
They are processed separately and the Rx burst function returns a
small number of packets on every call.

Repeat the cycle of gathering packets from the vectorized Rx routine
until the requested number of packets is collected or there are no
more CQEs left to process.

Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-03 17:20:32 +02:00
Suanming Mou
a1da6f624c net/mlx5: add reclaim memory mode
Currently, when a flow is destroyed, some memory resources may still
be kept cached to help create the next flow more efficiently.

Some systems may need the resources to be more flexible with flow
create and destroy. After peak time, with millions of flows
destroyed, such a system would prefer the resources to be reclaimed
completely, with no cache kept, so that the resources can be
allocated and used by other components. Such a system is not so
sensitive to the flow insertion rate, but cares more about the
resources.

Both the DPDK mlx5 PMD and the low-level component rdma-core allow
the flow resources to be configured as cached or not, but there is no
API or parameter exposed to the user to configure the flow resources
cache mode. In this case, introducing a new PMD devarg to let the
user configure the flow resources cache mode is helpful.

This commit adds a new "reclaim_mem_mode" devarg to let the user
configure whether the cache resources of destroyed flows should be
kept or not.

Three modes can be chosen:
1. 0 (none). The flow resources are cached as usual; caching helps
the flow insertion rate.
2. 1 (light). Only the DPDK PMD level resources reclaim is enabled.
3. 2 (aggressive). Both the DPDK PMD level and the rdma-core low
level are configured in reclaim mode.

With these three modes, the user can configure the resources cache
mode at different levels.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-03 17:19:26 +02:00
Ophir Munk
72f7566056 common/mlx5: move glue files under Linux directory
The glue file mlx5_glue.c is based on Linux specifics APIs.
Move it (including file mlx5_glue.h) to common/mlx5/linux directory.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-03 17:19:26 +02:00
Suanming Mou
33860cfab6 net/mlx5: fix interrupt installation timing
Currently, the DevX counter query works asynchronously and the DevX
interrupt handler returns the query result. When the port closes, the
interrupt handler is uninstalled and the DevX completion object is
also destroyed, while the query is still not cancelled.

In this case, the counter query may use the invalid DevX completion
object that has been destroyed, and a query failure with an invalid
FD is reported.

Adjust the shared interrupt install and uninstall timing so that the
asynchronous counter query stops before the interrupt handler is
uninstalled.

Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Suanming Mou
2786b7bf90 net/mlx5: fix secondary process resources release
When a secondary process starts, it allocates its own process private
data and also remaps the UAR registers of the Tx queues. Once the
secondary process exits, these resources should be released
accordingly, while the shared resources owned by the primary process
should not be touched.

Currently, once one port spawn fails in the secondary process, all
the other spawned ports are also released when the process exits.
However, the mlx5_dev_close() function does not handle the secondary
process case, which means calling mlx5_dev_close() directly in the
secondary process releases resources it should not touch.

Add the secondary process case to the mlx5_dev_close() function so
that it releases only its own resources and quits gracefully.

Fixes: 942d13e6e7 ("net/mlx5: fix sharing context destroy order")
Fixes: 3a8207423a ("net/mlx5: close all ports on remove")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
01de93f245 net/mlx5: fix unreachable MPLS error path
The mlx5_flow_validate_item_mpls function checks MPLS item validation.
It first checks whether the device supports MPLS; this is done with an
ifdef condition so that, when support is missing, the code skips to
the endif and returns the appropriate error.

When MPLS is supported, the preprocessor keeps the body of the
function, ending with return 0, followed by the lines that report the
lack of MPLS support.
In fact, these lines are unreachable, because the function returns 0
before them, and in any case they are unnecessary.

Replace the endif with else and move the endif to the end of the
function.
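
A sketch of the restructuring (the macro name is an assumption used
only for illustration):

    #include <errno.h>

    static int
    validate_mpls_item(void)
    {
    #ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
            /* ... actual MPLS item validation would go here ... */
            return 0;
    #else
            /* Error path is now compiled only when support is missing. */
            return -ENOTSUP;
    #endif
    }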

Fixes: 23c1d42c71 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
c55ec83b58 net/mlx5: remove needless Tx queue initialization check
The mlx5_txq_obj_new function defines a pointer named txq_data and
assigns a value to it. After the assignment, the code is sure that
the variable does not point to NULL and even expresses this using an
assertion.

The function dereferences the pointer several times and at no point
changes its value. However, at the error label at the end of the
function, when it wants to free one of the fields of the structure
that txq_data points to, it checks again whether txq_data is invalid.
This check is unnecessary since txq_data is known to be valid.

Remove the aforementioned needless check.

Fixes: 6449068818 ("net/mlx5: add free on completion queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
50181d9965 net/mlx5: fix socket close
The mlx5_pmd_socket_handle function calls the accept function, which
returns the socket descriptor into the conn_sock variable. The socket
descriptor value can be 0 (according to the accept API) or positive,
so immediately after the call the function checks whether
conn_sock < 0. Later in the function, when other things fail, it
jumps to the error label and releases previously allocated resources
(such as the socket or a file).

During the resource release, it checks whether the conn_sock variable
holding the socket descriptor is positive and, if it is, releases it.
However, this check misses the case where conn_sock == 0; in this
case the socket is not released and there is a resource leak.

Extend the close condition to cover the 0 value too.
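
A sketch of the corrected cleanup condition (the variable name is
reused from the description above, the helper is illustrative):

    #include <unistd.h>

    /* accept() may legitimately return 0, so ">= 0" must be used on release. */
    static void
    release_conn(int conn_sock)
    {
            if (conn_sock >= 0)     /* was "> 0", leaking descriptor 0 */
                    close(conn_sock);
    }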

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
a943102fc6 net/mlx5: remove unnecessary init in socket creation
The mlx5_pmd_socket_handle function calls the recvmsg function,
which returns the number of bytes read. The function assigns this
return value to a ret variable defined at the beginning of the
function. Similarly, the mlx5_pmd_socket_init function calls the
socket function, which returns a file descriptor for the new socket.
This function also assigns the return value to a ret variable defined
at the beginning of the function.

Both functions initialize the variable when defining it; however, in
both cases the ret variable is not used before the return value from
the function is assigned to it, so the initialization is unnecessary.

Remove the aforementioned unnecessary initializations.

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
ebed623f62 net/mlx5: fix hairpin Rx queue creation error path
The mlx5_rxq_obj_hairpin_new function defines a pointer named tmpl and
allocates memory for it using the rte_zmalloc_socket function.
Later, this function allocates memory to a variable inside tmpl using
the mlx5_devx_cmd_create_rq function.

In both cases, if the allocation fails, the code jumps to the error
label and frees the allocated resources. However, at the first jump
there are still no resources to free, and a jump just to reach the
return NULL line is unnecessary. Even worse, when it jumps to the
error label with an invalid tmpl, it actually dereferences a NULL
pointer.
In contrast, the second jump needs to free the tmpl variable, but
instead the function tries to free the variable that it just failed
to allocate.
In addition, on another error, the function returns NULL without
freeing the tmpl variable first, causing a memory leak.

Delete the error label and replace each jump with a local return NULL,
freeing the tmpl variable if needed.

Fixes: e79c9be915 ("net/mlx5: support Rx hairpin queues")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
7e6eba619d net/mlx5: fix hairpin Tx queue creation error path
The mlx5_txq_obj_hairpin_new function defines a pointer named tmpl and
allocates memory for it using the rte_zmalloc_socket function.
Later, this function allocates memory to a variable inside tmpl using
the mlx5_devx_cmd_create_sq function.

In both cases, if the allocation fails, the code jumps to the error
label and frees the allocated resources. However, at the first jump
there are still no resources to free, and a jump just to reach the
return NULL line is unnecessary. Even worse, when it jumps to the
error label with an invalid tmpl, it actually dereferences a NULL
pointer.
In contrast, the second jump needs to free the tmpl variable, but
instead the function tries to free the variable that it just failed
to allocate, and another variable that has never been allocated.
In addition, on another error, the function returns NULL without
freeing the tmpl variable first, causing a memory leak.

Delete the error label and replace each jump with a local return NULL,
freeing the tmpl variable if needed.

Fixes: ae18a1ae96 ("net/mlx5: support Tx hairpin queues")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Dekel Peled
b0447b5470 net/mlx5: revert DevX preference for Rx objects
A recent patch exposed a minor performance issue, so it is reverted.

Fixes: d237d22fbe ("net/mlx5: prefer DevX API to create Rx objects")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-21 17:59:29 +02:00
Muhammad Bilal
5a448a55b4 fix same typo in multiple places
Removed the typing error in doc/guides/eventdevs/index.rst,
drivers/net/mlx5/mlx5.c and in lib/librte_vhost/rte_vhost.h

Bugzilla ID: 477
Fixes: 0857b94211 ("doc: add event device and software eventdev")
Fixes: 039253166a ("vhost: add device op when notification to guest is sent")
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Cc: stable@dpdk.org

Signed-off-by: Muhammad Bilal <m.bilal@emumba.com>
2020-05-19 15:55:57 +02:00
Bing Zhao
29e091cefe net/mlx5: fix port action resource initialization
After the memory optimization, the organization of some resources was
changed from a pointer-based LIST to the index-based ILIST. A lot of
code parts were touched by this change.
Some static code checking and analysis tools will complain and raise
a false warning about the use of an uninitialized value. E.g. in the
port action registration function, a stack variable with some
uninitialized fields is used as the right-hand value to initialize a
variable allocated from the heap. But it is not really an error,
because all the fields set from the uninitialized values are
overwritten in the following code and macros; all these fields are
used explicitly as left-hand values.
It makes no sense to clear the stack variable to 0 in this case, and
the extra memset would introduce some cycles of overhead. The false
warning from the tool, if any, just needs to be ignored.

Fixes: f3faf9ea11 ("net/mlx5: convert port id action to indexed")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Bing Zhao
d3b61f4b7c net/mlx5: fix port action assert timing
After the memory optimization, some action object handles were
changed to indexes to save overhead. An assertion in debug mode is
helpful for troubleshooting.
In the current implementation, only one port action is supported in
switchdev mode for one device flow. In debug mode, an assertion is
used to check if the port action is none, and it should be located
before the port action resource registration, not after it. The
action index in the handle should be 0 before registration.
Otherwise it will always cause a failure, because the port action is
registered and the index is not 0.

Fixes: f3faf9ea11 ("net/mlx5: convert port id action to indexed")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Suanming Mou
a95decbc9d net/mlx5: fix shared flow counter lookup
Currently, the shared counter search uses the wrong nested index,
namely the one used for the pool index. Using the incorrect nested
index makes the search go to an incorrect counter pool that does not
exist.

Add the counter index to fix the incorrect nested index use.

Fixes: 4001d7ad26 ("net/mlx5: change Direct Verbs counter to indexed")

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Bing Zhao
25a59a3076 net/mlx5: fix doorbell bitmap management offsets
The doorbell records are organized in pages with a bitmap. When a new
doorbell needs to be associated with a queue, a bit is set in the
bitmap to indicate that the corresponding doorbell is occupied. A
counter records the number of doorbells occupied, to speed up the
search.
If the number reaches the pre-defined maximum for a page, a new page
is allocated. If not, the bitmap is checked to find a free one.
The LSHIFT and OR (AND NOT) operations are used to update the bitmap
of a page. But the constant 1 is treated as a signed integer by the
compiler.
When the shift count is 31, the shifted value is considered negative.
A wrong sign extension is then done when assigning it to a 64-bit
variable: all the upper 32 bits are set to 1.
A wrong offset value is then calculated because of this. The next
64 bits are also treated as part of the bitmap and get corrupted by
the bit-set operation.
The immediate value 1 needs to be used explicitly as a 64-bit value.
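
A tiny sketch of the width fix (illustrative only, not the driver's
code):

    #include <stdint.h>

    /* "1 << 31" is a signed int and sign-extends to 64 bits; force a 64-bit
     * immediate instead so only the intended bit is set. */
    static inline uint64_t
    dbr_bit(unsigned int offset)    /* 0..63 within one bitmap word */
    {
            return UINT64_C(1) << offset;
    }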

Fixes: 21cae8580f ("net/mlx5: allocate door-bells via DevX")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Suanming Mou
d71d5b949c net/mlx5: fix Verbs counter pool allocation
When creating Verbs flows with a counter, a random SIGSEGV may also
occur. The reason is that the counter pool memory is not sufficiently
allocated and correctly initialized in the Verbs case.

As the mlx5_flow_counter array member was moved out of the counter
pool struct, the counter pool memory layout now implicitly contains
mlx5_flow_counter, mlx5_age_param (if the pool is an age pool) and
mlx5_flow_counter_ext (if the pool is a non-batch pool). When
allocating the pool memory, the pool size should be calculated based
on the pool type accordingly.

Currently, for the Verbs counter pool, both mlx5_flow_counter and
mlx5_flow_counter_ext need to be taken into account in the pool size,
and the pool type should also be initialized as CNT_POOL_TYPE_EXT.

This patch adds the missing size and type for the Verbs counter pool.

Fixes: 8d93c830e4 ("net/mlx5: modify ext-counter memory allocation")

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Dekel Peled
ff55182ce3 net/mlx5: fix VLAN flow action with wildcard VLAN item
A previous patch added support for a VLAN item without a VLAN ID
value, i.e. a wildcard VLAN item, to match VLAN traffic with any VLAN
ID.
The implication for VLAN actions was not taken into consideration.
VLAN actions (e.g. push VLAN) use the VLAN ID value from the VLAN
item and expect it to be valid.

This patch updates the function flow_dev_get_vlan_info_from_items()
to check the VLAN item contents before trying to use them.

Fixes: 92818d839e ("net/mlx5: fix match on empty VLAN item in DV mode")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Matan Azrad
5af61440dd net/mlx5: fix flow counter container resize
The design of the counter container resize used a double-buffer
algorithm in order to synchronize between the query thread and the
control thread.
When the control thread detected a resize need, it created a new,
bigger buffer for the counter pools in a new container and changed
the container index atomically.
In case the query thread had not detected the previous resize before
the control thread detected the need for a new one, the control
thread returned EAGAIN to a flow creation call that used a COUNT
action.

The rte_flow API doesn't allow non-blocking commands and doesn't
expect to get an EAGAIN error type.

So, when a lot of flows were created between 2 different periodic
queries, 2 different resizes might be attempted and cause an EAGAIN
error.
This behavior may fail flow creations.

Change the synchronization to use a lock instead of the double-buffer
algorithm.

The critical section of this lock is very small, so the flow
insertion rate should not decrease.

Fixes: ebbac312e4 ("net/mlx5: resize a full counter container")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Shiri Kuzin
4c204fe5e5 common/mlx5: disable relaxed ordering in unsuitable CPUs
Relaxed ordering is a PCI optimization that enables reordering of
reads/writes in order to improve performance.

Relaxed ordering was enabled for all processors, causing a
performance degradation on Haswell and Broadwell processors, which
don't support this optimization.

In order to avoid that, we check whether the processor is Haswell or
Broadwell and, if so, disable relaxed ordering.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Shiri Kuzin
ffd5b302ba common/mlx5: fix relaxed ordering count object
In order to improve performance, relaxed ordering was enabled when
creating the count object using DevX.

Currently this optimization is enabled by default when using DevX.

This causes an issue when using firmware that does not have this
capability, causing a count object failure.

In order to fix this issue, a check of the firmware capabilities is
added before enabling relaxed ordering.

Fixes: 53ac93f71a ("net/mlx5: create relaxed ordering memory regions")

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:56 +02:00
Dekel Peled
d237d22fbe net/mlx5: prefer DevX API to create Rx objects
Currently, DevX API is used to create Rx objects (RQ, RQT, TIR) only
if LRO or hairpin features are enabled on this RQ.

This patch uses DevX API by default, if DevX is supported and can be
used. Otherwise, Verbs API is used.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:56 +02:00
Dekel Peled
563ac307a4 net/mlx5: support match on GTP flags
This patch adds to the MLX5 PMD support for matching on the GTP
header item v_pt_rsv_flags.

This field is contained in 1 byte with the following format:
-------------------------------------------
| bit   | 0 - 2   | 3  | 4   | 5 | 6 | 7  |
|-----------------------------------------|
| value | Version | PT | Res | E | S | PN |
-------------------------------------------

Matching is supported only for GTP flags E, S, PN.
Therefore values 0 to 7 are supported.

Mask must be set accordingly:
... gtp v_pt_rsv_flags is 1 v_pt_rsv_flags mask 0x07 ...

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Alexander Kozyrev
776aec28fc net/mlx5: fix Tx queue release debug log timing
Program received signal SIGSEGV, Segmentation fault.
0x00000000008ef7c4 in mlx5_tx_queue_release (dpdk_txq=0x17ce01680) at
drivers/net/mlx5/mlx5_txq.c:302
301 mlx5_txq_release(ETH_DEV(priv), i);
302 DRV_LOG(DEBUG, "port %u removing Tx queue %u from list",
303         PORT_ID(priv), txq->idx);
The problem is that txq is freed inside the mlx5_txq_release() function
and is no longer valid in the debug log right after this invocation.
Move the debug log before the mlx5_txq_release() call to fix this.

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Michael Baum
c8f0abe7f8 net/mlx5: fix meter color register consideration
The mlx5_flow_get_reg_id() function translates tag ID to register
from the registers that are supported and available for use. The
user does not know which register is available at a time and therefore
there is an array that represents mapping to the available registers.
Usually the free registers are contiguous in the flow_mreg_c array,
but sometimes the mtr_color_reg register is between them and must be
skipped and the next register returned, in which case the function
returns the mapping of the next entry in the array.

When the function reads the next entry in the array, it does not
check whether such an entry exists, and in some situations an invalid
memory access occurs beyond the array boundaries.

So, when all the registers are valid from the HW perspective and the
meter color register is not the default one, tag ID 5 causes an
out-of-bounds access.

Validate registers availability when meter color register is not the
default.
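
An illustrative bounds check of the kind described (the array name,
its size and the helper are assumptions for the example):

    #define MREG_C_NUM 8   /* illustrative size of the available-register map */

    /* Return the next mapped register only if such an entry really exists. */
    static int
    next_available_reg(const int *flow_mreg_c, unsigned int idx)
    {
            if (idx + 1 >= MREG_C_NUM)
                    return -1;     /* report an error instead of reading past the end */
            return flow_mreg_c[idx + 1];
    }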

Coverity issue: 146355
Fixes: 792e749e92 ("net/mlx5: fix register usage in meter")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Raslan Darawsheh
8a2e026add net/mlx5: fix matching for UDP tunnels with Verbs
When creating a flow rule with zero specs, it causes matching of all
UDP packets, for example:
 eth / ipv4 / udp / vxlan / end
Such a rule will match all UDP packets.

This changes the behavior to match the DV flow engine, which
automatically sets the match on the relevant outer UDP port if the
user didn't specify any.

Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00