numam-dpdk

Author	SHA1	Message	Date
Ophir Munk	db12615b42	net/mlx5: prepare MR prototypes for DevX Currently MR operations are Verbs based. This commit updates MR operations prototypes such that DevX MR operations callbacks can be used as well. Rename 'struct mlx5_verbs_ops' as 'struct mlx5_mr_ops' and move it to shared file mlx5.h. Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-01-08 16:03:07 +01:00
Ophir Munk	1f29d15ec9	net/mlx5: extend device attributes getter This commit adds device attributes parameters to be reported by mlx5_os_get_dev_attr(): max_cqe, max_mr, max_pd, max_srq, max_srq_wr Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-01-08 16:03:07 +01:00
Viacheslav Ovsiienko	81c3b97735	net/mlx5: fix Verbs memory allocation callback The rdma-core library uses callbacks to allocate and free memory from DPDK. The memory allocation callback used a complicated and incorrect way to get the NUMA socket ID from the context. The context was wrong, which might result in a wrong socket ID and in allocating memory from the wrong node. The callbacks are assigned once, as the InfiniBand device context is created, allowing early access to shared DPDK memory for all Verbs internal objects that need it. Fixes: `36dabcea78` ("net/mlx5: use anonymous Direct Verbs allocator argument") Fixes: `2eb4d0107a` ("net/mlx5: refactor PCI probing on Linux") Fixes: `17e19bc4dd` ("net/mlx5: add IB shared context alloc/free functions") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-01-08 16:03:04 +01:00
Andrey Vesnovaty	fa7ad49e96	net/mlx5: fix shared RSS action update The shared RSS action update was not operational due to lack of kernel driver support of TIR object modification. This commit introduces the workaround to support shared RSS action modify using an indirect queue table update instead of touching TIR object directly. Limitations: the only supported RSS property to update is queues, the rest of the properties ignored. Fixes: `d2046c09aa` ("net/mlx5: support shared action for RSS") Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-22 16:40:03 +01:00
Xueming Li	e6818853c0	net/mlx5: set representor to first PF in bonding mode When the representor device was set to PF1 in bonding mode, iterating device iterator that looking for representors by bonding device failed to match PF0 pci address with PF1 address. So detaching PF bonding device only detached all representors on PF0. This patch registers all representors of PF1 with PF0 as PCI device. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-20 21:10:05 +01:00
Gregory Etelson	9cac7ded37	net/mlx5: fix tunnel offload object allocation The original patch allocated tunnel offload objects with invalid indexes. As the result, PMD tunnel object allocation failed. In this patch indexed pool provides both an index and memory for a new tunnel offload object. Also tunnel offload ipool moved to dv enabled code only. Fixes: `4ae8825c50` ("net/mlx5: use indexed pool as id generator") Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-20 21:10:04 +01:00
Suanming Mou	fabf8a3724	net/mlx5: fix shared RSS action release As shared RSS action will be shared by multiple flows, the action is created as global standalone action and managed only by the relevant shared action management functions. Currently, hrxqs will be created by shared RSS action or general queue action. For hrxqs created by shared RSS action, they should also only be released with shared RSS action. It's not correct to release the shared RSS action hrxqs as general queue actions do in flow destroy. This commit adds a new fate action type for shared RSS action to handle the shared RSS action hrxq release correctly. Fixes: `e1592b6c4d` ("net/mlx5: make Rx queue thread safe") Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-13 19:43:26 +01:00
Xueming Li	c3ba8ecb76	net/mlx5: fix missing meter packet For transfer flow with meter, packet was passed without applying flow action. The group level was multiplied by 10 for group level 65531. This patch fixes this issue by correcting suffix table group level calculation. Fixes: `3e8f3e51fd` ("net/mlx5: fix meter table definitions") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-13 19:43:26 +01:00
Dekel Peled	105d214965	net/mlx5: fix aging queue doorbell ringing Recent patch introduced a new SQ for ASO flow hit management. This SQ uses two WQEBB's for each WQE. The SQ producer index is 16 bits wide. The enqueue loop posts new WQEs to the ASO SQ, using WQE index for the SQ management. This 16 bits index multiplied by 2 was wrongly used also for SQ doorbell ringing. The multiplication caused the SW index overlapping to be out of sync with the hardware index, causing it to get stuck. This patch separates the WQE index management from the doorbell index management. So, for each WQE index incrementation by 1, the doorbell index is incremented by 2. Fixes: `f935ed4b64` ("net/mlx5: support flow hit action for aging") Signed-off-by: Dekel Peled <dekelp@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-13 16:26:54 +01:00
Viacheslav Ovsiienko	eb63ec0e56	net/mlx5: fix UAR used by ASO queues The dedicated UAR was allocated for the ASO queues. The shared UAR created for Tx queues can be used instead. Fixes: `f935ed4b64` ("net/mlx5: support flow hit action for aging") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-14 10:56:30 +01:00
Tal Shnaiderman	e82ddd28e3	common/mlx5: split PCI relaxed ordering for read and write The current DevX implementation of the relaxed ordering feature is enabling relaxed ordering usage only if both relaxed ordering read AND write are supported. In that case both relaxed ordering read and write are activated. This commit will optimize the usage of relaxed ordering by enabling it when the read OR write features are supported. Each relaxed ordering type will be activated according to its own capability bit. This will align the DevX flow with the verbs implementation of ibv_reg_mr when using the flag IBV_ACCESS_RELAXED_ORDERING Fixes: `53ac93f71a` ("net/mlx5: create relaxed ordering memory regions") Cc: stable@dpdk.org Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-04 19:16:24 +01:00
Matan Azrad	f9bc5274a6	net/mlx5: allow age modes combination ASO age action mode is not supported in group 0 while counter base age action mode supports group 0. Allow using the 2 modes of age action in parallel, so group 0 flows will use counter base age actions and group > 0 flows will use ASO age actions. Currently, counter base age action doesn't support shared action API so group 0 flows cannot share age actions. Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Dekel Peled <dekelp@nvidia.com>	2020-11-03 23:35:07 +01:00
Matan Azrad	81073e1f8c	net/mlx5: support shared age action Add support for rte_flow shared action API for ASO age action. First step here to support validate, create, query and destroy. The support is only for age ASO mode. Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Dekel Peled <dekelp@nvidia.com>	2020-11-03 23:35:07 +01:00
Matan Azrad	4a42ac1f1c	net/mlx5: optimize shared RSS action memory The RSS shared action was saved in flow memory by a pointer. It means that every flow memory includes 8B only for optional shared RSS case. Move the RSS objects to be used by indexed pool which reduces the flow handle memory to 4B. So, now, the shared action handler is also just a 4B index. Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Dekel Peled <dekelp@nvidia.com>	2020-11-03 23:35:07 +01:00
Dekel Peled	f935ed4b64	net/mlx5: support flow hit action for aging A new ASO (Advanced Steering Operation) feature was added in the last mlx5 adapters to support flow hit detection. Using this new steering action, the driver can detect flow traffic hit and to reset this indication any time. The ASO age action cannot support flows in table 0. Add support for flow aging action in rte_flow using this new feature. The counter aging mode will be taken only when the ASO feature is not supported for the user flow groups. Signed-off-by: Dekel Peled <dekelp@nvidia.com> Signed-off-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:07 +01:00
Alexander Kozyrev	54c2d46b16	net/mlx5: support flow tag and packet header miniCQEs CQE compression allows us to save the PCI bandwidth and improve the performance by compressing several CQEs together to a miniCQE. But the miniCQE size is only 8 bytes and this limits the ability to successfully keep the compression session in case of various traffic patterns. The current miniCQE format only keeps the compression session alive in case of uniform traffic with the Hash RSS as the only difference. There are requests to keep the compression session in case of tagged traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic. Add 2 new miniCQE formats in order to achieve the best performance for these traffic patterns: Flow Tag and Packet Header miniCQEs. The existing rxq_cqe_comp_en devarg is modified to specify the desired miniCQE format. Specifying 2 selects Flow Tag format for better compression rate in case of RTE Flow Mark traffic. Specifying 3 selects Checksum format (existing format for MPRQ). Specifying 4 selects L3/L4 Header format for better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:07 +01:00
Viacheslav Ovsiienko	41c2bb6357	net/mlx5: use C11 atomics in packet scheduling The rte_atomic API is deprecated and needs to be replaced with C11 atomic builtins. Use the relaxed ordering and explicit memory barrier for Clock Queue and timestamps synchronization. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:05 +01:00
Xueming Li	9fbe97f0ce	net/mlx5: remove shared context lock To support multi-thread flow insertion, this patch removes shared data lock since all resources should support concurrent protection. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:05 +01:00
Suanming Mou	cc608e4df4	net/mlx5: make shared action list thread safe This commit uses spinlock to protect the shared action list in multiple thread. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:05 +01:00
Suanming Mou	1978414169	net/mlx5: make sample and mirror action thread safe This commit uses cache list to make sample and mirror action thread safe. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	3422af2af2	net/mlx5: make push VLAN action cache thread safe To support multi-thread flow insertion, this patch converts push VLAN action cache list to thread safe cache list. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	0fd5f82aaa	net/mlx5: make port ID action cache thread safe To support multi-thread flow insertion, this patch converts the port ID action cache list to a thread safe cache list. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	1872635570	net/mlx5: make matcher list thread safe To support multi-thread flow insertion, this patch converts the matcher list to use the thread safe cache list API. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Suanming Mou	e1592b6c4d	net/mlx5: make Rx queue thread safe This commit applies the cache linked list to Rx queue to make it thread safe. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Suanming Mou	84d3389048	net/mlx5: optimize shared RSS list operation When creating a shared RSS hrxq, the hrxq is created directly and no hrxq is reused. In this case, adding the shared RSS hrxq to the queue list is redundant, and it also hurts the generic queue lookup. This commit avoids adding the shared RSS hrxq to the queue list. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Suanming Mou	ff7ab341af	net/mlx5: remove unused mreg copy After non-cache mode feature was implemented, the flows can only be created when port started. No need to check if the mreg flows are created in port stopped status, and apply the mreg flows after port start will also never happen. This commit removed the relevant not used mreg copy code. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	afd7a62514	net/mlx5: make flow table cache thread safe To support multi-thread flow insertion/removal, this patch uses thread safe hash list API for flow table cache hash list. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Suanming Mou	b80726dc51	net/mlx5: create global default miss action This commit creates the global default miss action instead of maintain it in flow insertion time. This makes the action to be thread safe. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	d163fc2d15	net/mlx5: make flow list thread safe To support multi-thread flow operations, this patch introduces list lock for the rte_flow list manages all the rte_flow handlers. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	4ae8825c50	net/mlx5: use indexed pool as id generator The ID generation API used an integer pool to save released IDs. To support multi-thread flow insertion, it has to be made thread safe. An indexed pool can be used to generate unique IDs by setting the size of the pool entry to zero. Since a bitmap is used, an extra benefit is reducing memory to about one bit per entry. Furthermore, an indexed pool can be made thread safe by enabling its lock. This patch leverages the indexed pool to generate IDs and removes the unused ID generating API. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	94b6d88438	net/mlx5: reuse flow id as hairpin id Hairpin flow matching required a unique flow ID for matching. This patch reuses flow ID as hairpin flow ID, this will save some code to generate a separate hairpin ID, also saves flow memory by removing hairpin ID. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Xueming Li	8bb81f2649	net/mlx5: use thread specific flow workspace As part of multi-thread flow support, this patch moves flow intermediate data to thread specific, makes them a flow workspace. The workspace is allocated per thread, destroyed along with thread life-cycle. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:04 +01:00
Alexander Kozyrev	cf7d1995b9	net/mlx5: use C11 atomics for flow tables The rte_atomic API is deprecated and needs to be replaced with C11 atomic builtins. Use the relaxed ordering for RTE flow tables. Enforce Acquire/Release model for managing DevX pools. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:04 +01:00
Alexander Kozyrev	b5c8b3e70c	net/mlx5: use C11 atomics for RxQ/TxQ refcounts The rte_atomic API is deprecated and needs to be replaced with C11 atomic builtins. Use the relaxed ordering for RxQ/TxQ refcounts. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:04 +01:00
Matan Azrad	89f170c0da	net/mlx5: fix Tx queue start The Tx queue stop/start operations update the HW state of the Tx queue object. The stop API should update the state from ready to reset in order to stop any queue traffic, and the start API should update the state from reset to ready in order to open the traffic path. The start API wrongly tried to change the state from ready to ready, which caused a failure in FW on the current state validation. Replace the ready to ready command by a reset to ready command in the Tx start API. Fixes: `161d103b23` ("net/mlx5: add queue start and stop") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Asaf Penso <asafp@nvidia.com>	2020-11-03 23:35:04 +01:00
Bing Zhao	02109eaeac	net/mlx5: support getting hairpin peer ports In real-life business, one device could be attached and detached dynamically. The hairpin configuration of this port to/from all the other ports should be enabled and disabled accordingly. The RTE ethdev lib and PMD should provide this ability to get the peer ports list in case that the application doesn't save it. It is recommended that the size of the array to save the port IDs is as large as the "RTE_MAX_ETHPORTS" to have the maximal capacity. The order of the peer port IDs may be different from that during hairpin queues set in the initialization stage. The peer port ID could be the same as the current device port ID when the hairpin peer ports contain itself - the single port hairpin. The application should check the ports' status and decide if the peer port should be bound / unbound when starting / stopping the current device. Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:04 +01:00
Bing Zhao	37cd4501e8	net/mlx5: support two ports hairpin mode In order to support hairpin between two ports, mlx5 PMD needs to implement the functions and provide them as the function pointers. The bind and unbind functions are executed per port pairs. All the hairpin queues between the two ports should have the same attributes during queues setup. Different configurations among queue pairs from the same ports are not supported. It is allowed that two ports only have one direction hairpin. In order to set up the connection between two queues, peer Rx queue HW information must be fetched via the internal RTE API and the queue information could be used to modify the SQ object. Then the RQ object will be modified with the Tx queue HW information. The reverse operation is not supported right now. When disconnecting the queues pair, SQ and RQ object should be reset without any peer HW information. The unbinding operation will try to disconnect all Tx queues from the port from the Rx queues of the peer port. Tx explicit mode attribute will be saved and used when creating a hairpin flow. Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:03 +01:00
Viacheslav Ovsiienko	a0a45e8af7	net/mlx5: configure Rx queue for buffer split The scatter-gather elements should be configured accordingly to support the buffer split feature. The application provides the desired settings for the segments at the beginning of the packets and PMD pads the buffer chain (if needed) with attributes of the last specified segment to accommodate the packet of maximal length. Some limitations are implied. The MPRQ feature should be disengaged if split is requested, because MPRQ neither supports pushing data to the dedicated pools nor follows the flexible buffer sizes. The vectorized rx_burst routines do not support scattering (they are extremely simplified and work over a single segment only) and cannot handle split either. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko	9f209b59c8	net/mlx5: support Rx buffer split description The routine to provide Rx queue setup with specifying extended receiving buffer description is added. It allows application to specify desired segment lengths, data position offsets in the buffer and dedicated memory pool for each segment. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:35:02 +01:00
Gregory Etelson	4ec6360de3	net/mlx5: implement tunnel offload Tunnel Offload API provides hardware independent, unified model to offload tunneled traffic. Key model elements are: - apply matches to both outer and inner packet headers during entire offload procedure; - restore outer header of partially offloaded packet; - model is implemented as a set of helper functions. Implementation details: * tunnel_offload PMD parameter must be set to 1 to enable the feature. * application cannot use MARK and META flow actions with tunnel. * offload JUMP action is restricted to steering tunnel rule only. Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:02 +01:00
Andrey Vesnovaty	d7cfcddded	net/mlx5: translate shared action for RSS action Handle shared action on flow validation/creation/destruction. mlx5 PMD translates shared action into a regular one before handling flow validation/creation. The shared action translation applied to utilize the same execution path for both shared and regular actions. The current implementation supports shared action translation for shared RSS action only. RSS action validation split to validate shared RSS action on its creation in addition to action validation in flow validation/creation path. Implement rte_flow shared action API for mlx5 PMD, mostly forwarding calls to flow driver operations (see struct mlx5_flow_driver_ops). Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:02 +01:00
Andrey Vesnovaty	b8cc58c140	net/mlx5: modify hash Rx queue objects Implement modification for hashed table of Rx queue object (see mlx5_hrxq_modify()). This implementation relies on the capability to modify TIR object via DevX API, i.e. current implementation doesn't support verbs HW object operations. The functionality to modify hashed table of Rx queue object is a prerequisite to implement rte_flow_shared_action_update() for shared RSS action in mlx5 PMD. Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-11-03 23:35:02 +01:00
Xueming Li	16dbba257c	net/mlx5: fix port shared data reference count When probe a representor, tag cache hash table and modification cache hash table allocated memory upon each port, overwrote previous existing cache in shared context data. This patch moves reference check of shared data prior to hash table allocation to avoid such issue. Fixes: `6801116688` ("net/mlx5: fix multiple flow table hash list") Fixes: `1ef4cdef26` ("net/mlx5: fix flow tag hash list conversion") Cc: stable@dpdk.org Acked-by: Matan Azrad <matan@nvidia.com> Signed-off-by: Xueming Li <xuemingl@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	2b5b1aeb39	net/mlx5: optimize counter extend memory Counter extend memory was allocated for non-batch counter to save the extra DevX object. Currently, for non-batch counter which does not support aging, entry in the generic counter struct is used only when counter is free in free list, and bytes in the struct is used only when counter is allocated in using. In this case, the DevX object can be saved to the generic counter struct union with entry memory when counter is allocated and union with bytes when counter is free. And pool type is also not needed as non-fallback mode only has generic counter and aging counter, just a bit to indicate the pool is aged or not will be enough. This eliminates the counter extend info struct saves the memory. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	cfbdc3f938	net/mlx5: rename flow counter macro Add the MLX5_ prefix to the defined counter macro names. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	e7138997e0	net/mlx5: make shared counters thread safe The shared counters save the counter index to three level table. As three level table supports multiple-thread operations now, the shared counters can take advantage of the table to support multiple-thread. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	3aa279157f	net/mlx5: synchronize flow counter pool creation Currently, counter operations are not thread safe as the counter pools' array resize is not protected. This commit protects the container pools' array resize using a spinlock. The original counter pool statistic memory allocate is moved to the host thread in order to minimize the critical section. Since that pool statistic memory is required only in query time. The container pools' array should be resized by the user threads, the new pool may be used by other rte_flow APIs before the host thread resize is done, if the pool is not saved to the pools' array, the specified counter memory will not be found as the pool is not saved to the counter management pool array. The pool raw statistic memory will be filled in host thread. The shared counters will be protected in other commit. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	994829e695	net/mlx5: remove single counter container A flow counter which was allocated by a batch API couldn't be assigned to a flow in the root table (group 0) in old rdma-core version. Hence, a root table flow counter required PMD mechanism to manage counters which were allocated singly. Currently, the batch counters have already been supported in root table includes a new rdma-core version with MLX5_FLOW_ACTION_COUNTER_OFFSET enum and with a kernel driver includes MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX_OFFSET enum. When the PMD uses rdma-core API to assign a batch counter to a root table flow using invalid counter offset, it should get an error only if the batch counter assignment for root table is supported. Using this trial in the initialization time can help to detect the support. Using the above trial, if the support is valid, remove the management of single counter container in the fast counter mechanism. Otherwise, move the counter mechanism to fallback mode. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	df051a3e77	net/mlx5: optimize shared counter memory Instead of using special memory to indicate shared counter, this patch does the optimization to use the counter handler reserved memory to indicate it. The counter index with MLX5_CNT_SHARED_OFFSET means the shared counter. This patch is also an arrangement for a new adjustment to use batch counter as shared counter. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Suanming Mou	6b7c717ed1	net/mlx5: locate aging pools in the general container Commit [1] introduced different container for the aging counter pools. In order to save container memory the aging counter pools can be located in the general pool container. This patch locates the aging counter pools in the general pool container. Remove the aging container management. [1] commit `fd143711a6` ("net/mlx5: separate aging counter pool range") Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 23:24:25 +01:00
Dekel Peled	d5a7d04c79	net/mlx5: support query of age action Recent patch [1] adds to ethdev the API for query of age action. This patch implements in MLX5 PMD the query of age action using this API. [1] https://mails.dpdk.org/archives/dev/2020-October/184864.html Signed-off-by: Dekel Peled <dekelp@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-11-03 22:29:25 +01:00
Ivan Ilchenko	62024eb827	ethdev: change stop operation callback to return int Change eth_dev_stop_t return value from void to int. Make eth_dev_stop_t implementations across all drivers return negative errno values in case of error conditions. Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2020-10-16 22:26:41 +02:00
Jiawei Wang	00c10c2211	net/mlx5: update translate function for mirroring Translate the attribute of sample action that include sample ratio and sub actions list. PMD will check the destination action number in current flow, if found multiple destination actions, then create the new destination array rdma action that group actions for each destination. Currently only support port or queue for destination action, and only encap action can be attached into one port destination. Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-16 19:48:18 +02:00
Jiawei Wang	b4c0ddbfcc	net/mlx5: split sample flow into two sub-flows The flow with sample action will be split into two sub flows: the prefix sub flow with the all actions preceding the sample action and sample action itself, and the suffix sub flow with the actions following the sample action. The original items remain in the prefix sub flow, add the implicit tag action with unique id to set in metadata register, and suffix sub flow uses the tag item to match with that unique id. The flow split as below: Original flow: items / actions pre / sample / actions sfx -> prefix sub flow - items / actions pre / set_tag action / sample suffix sub flow - tag_item / actions sfx Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-16 19:48:18 +02:00
Jiawei Wang	96b1f0273c	net/mlx5: validate sample action Add sample action validate function. Sample Flow is supported in NIC-RX and FDB domains. For the NIC-RX the Sample Flow action list must include the destination queue action. Only NIC-RX domain supports the optional actions list. FDB doesn't support any optional actions, the sampled packets are always forwarded to the E-Switch manager port. Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-16 19:48:18 +02:00
Michael Baum	e96242efa4	net/mlx5: remove Rx queue object type field Once the separation between Verbs and DevX is done using function pointers, the type field of the Rx queue object structure becomes redundant and no more code is used. Remove the unnecessary field from the structure. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	4c6d80f1c5	net/mlx5: separate Rx queue state modification Separate Rx state modification to the Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	354cc08a2d	net/mlx5: remove Tx queue object type field Once the separation between Verbs and DevX is done using function pointers, the type field of the Tx queue object structure becomes redundant and no more code is used. Remove the unnecessary field from the structure. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	5d9f3c3f48	net/mlx5: separate Tx queue object modification Separate Tx object modification to the Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	f49f44839d	net/mlx5: share Tx control code Move Tx object similar resources allocations and debug logs from DevX and Verbs modules to a shared location. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	86d259cec8	net/mlx5: separate Tx queue object creations As an arrangement to Windows OS support, the Verbs operations should be separated to another file. By this way, the build can easily cut the unsupported Verbs APIs from the compilation process. Define operation structure and DevX module in addition to the existing Linux Verbs module. Separate Tx object creation into the Verbs/DevX modules and update the operation structure according to the OS support and the user configuration. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Michael Baum	e7055bbfbe	net/mlx5: reposition event queue number field The eqn field has become a field of sh directly since it is also relevant for Tx and Rx. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-10-09 13:17:42 +02:00
Suanming Mou	3e8f3e51fd	net/mlx5: fix meter table definitions Because the metering and metadata features were developed at the same time, their table definitions conflict. This causes the meter suffix flow to jump to the same metadata table, resulting in a flow dead loop. Adjust the metering table definitions to fix the issue. Fixes: `46a5e6bc6a` ("net/mlx5: prepare meter flow tables") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-10-08 19:58:11 +02:00
Thomas Monjalon	b142387b07	ethdev: allow drivers to return error on close The device operation .dev_close was returning void. This driver interface is changed to return an int. Note that the API rte_eth_dev_close() is still returning void, although a deprecation notice is pending to change it as well. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com> Reviewed-by: Liron Himi <lironh@marvell.com> Reviewed-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2020-09-30 19:19:13 +02:00
Suanming Mou	bf615b077d	net/mlx5: manage header reformat actions with hashed list To manage encap decap header format actions mlx5 PMD used the single linked list and lookup and insertion operations took too long times if there were millions of objects and this impacted the flow insertion/deletion rate. In order to optimize the performance the hashed list is engaged. The list implementation is updated to support non-unique keys with few collisions. Signed-off-by: Suanming Mou <suanmingm@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-09-30 19:19:09 +02:00
Xueming Li	c21e5facf7	net/mlx5: use bond index for netdev operations In case of bonding, device ifindex was detected as the PF ifindex, so any operation using ifindex applied to PF instead of the bond device. These operations includes MTU get/set, up/down and mac address manipulation, etc. This patch detects bond interface ifindex and name for PF that join a bond interface, uses it by default for netdev operations. Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2020-09-30 19:19:09 +02:00
Michael Baum	0c762e81da	net/mlx5: share Rx queue drop action code Move Rx queue drop action similar resources allocations from Verbs module to a shared location. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	5eaf882e94	net/mlx5: separate Rx queue drop Separate Rx queue drop creation into both Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	5a959cbfa6	net/mlx5: share Rx hash queue code Move Rx hash queue object similar resources allocations from DevX and Verbs modules to a shared location. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	25ae7f1a5d	net/mlx5: share Rx queue indirection table code Move Rx indirection table object similar resources allocations from DevX and Verbs modules to a shared location. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	66b96fa6a6	net/mlx5: remove indirection table type field Once the separation between Verbs and DevX is done using function pointers, the type field of the indirection table structure becomes redundant and no more code is used. Remove the unnecessary field from the structure. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	85552726d3	net/mlx5: separate Rx hash queue creation Separate Rx hash queue creation into both Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	87e2db37ef	net/mlx5: separate Rx indirection table object creation Separate Rx indirection table object creation into both Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	c279f187ee	net/mlx5: separate Rx queue object modification Separate Rx object modification to the Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	1260a87b28	net/mlx5: share Rx control code Move Rx object similar resources allocations and debug logs from DevX and Verbs modules to a shared location. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	322870799e	net/mlx5: separate Rx interrupt handling Separate interrupt event handler into both Verbs and DevX modules. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Michael Baum	6deb19e1b2	net/mlx5: separate Rx queue object creations As an arrangement to Windows OS support, the Verbs operations should be separated to another file. By this way, the build can easily cut the unsupported Verbs APIs from the compilation process. Define operation structure and DevX module in addition to the existing linux Verbs module. Separate Rx object creation into the Verbs/DevX modules and update the operation structure according to the OS support and the user configuration. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2020-09-18 18:55:08 +02:00
Ophir Munk	7af10d29a4	net/mlx5/linux: refactor VLAN File mlx5_vlan.c contains Netlink APIs (Linux dependent) as part of VM workaround implementation. Move this implementation to file linux/mlx5_vlan_os.c. To remove Netlink dependency in header files change pointer of type 'struct mlx5_nl_vlan_vmwa_context *' to 'void *'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	8bb2410ea3	net/mlx5: separate VLAN strip modification When updating a queue vlan stripping offload - either the WQ is modified in Verbs or the RQ is modified in DevX. Add a vlan stripping modify callback to 'struct mlx5_obj_ops' and assign it with the specific Verbs and DevX implementations: 'rxq_obj_modify_wq_vlan_strip' and 'rxq_obj_modify_rq_vlan_strip' respectively. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	1f66ac5bbe	net/mlx5: remove more Direct Verbs dependencies Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced with 'void *' to enable compilation under non-Linux operating systems. New getter functions were added to retrieve the specific fields that were previously accessed directly. Replaced structs: 'struct mlx5dv_pp *' 'struct mlx5dv_devx_event_channel *' 'struct mlx5dv_devx_umem *' 'struct mlx5dv_devx_uar *' Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	f00f6562e1	net/mlx5: remove netlink dependency in shared code This commit adds Linux implementation of routine mlx5_os_mac_addr_flush as wrapper to Netlink API to avoid direct calls under non-Linux operating systems. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Ophir Munk	e9c0b96e35	net/mlx5: move Linux ifname function mlx5_get_ifname() prototype includes 'IF_NAMESIZE' definition from Linux file net/if.h. Since this API is only used under Linux and to enable compilation under non-Linux OS - move this prototype from shared file mlx5.h to file linux/mlx5_os.h. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-09-18 18:55:06 +02:00
Suanming Mou	3fe889617b	net/mlx5: manage modify actions with hashed list To manage header modify actions mlx5 PMD used the single linked list and lookup and insertion operations took too long times if there were millions of objects and this impacted the flow insertion/deletion rate. In order to optimize the performance the hashed list is engaged. The list implementation is updated to support non-unique keys with few collisions. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-09-18 18:55:06 +02:00
Suanming Mou	e1293b10de	net/mlx5: fix counter query Currently, the counter query requires the starting counter ID to be 4-aligned. In non-batch mode, the counter pool may get a counter ID that is not 4-aligned. In this case, the counter should be skipped, or the query will fail. Skip a counter whose ID is not 4-aligned when it is the first counter in the non-batch counter pool, to avoid an invalid counter query. Once a new min_dcs ID in the pool is less than the skipped counters, the skipped counters will be returned to the pool free list for use. Fixes: `5382d28c21` ("net/mlx5: accelerate DV flow counter transactions") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-30 00:41:23 +02:00
Parav Pandit	392bf9084d	common/mlx5: register class drivers through common layer Migrate mlx5 net, vdpa and regex PMD to start using mlx5 common class driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-28 19:01:30 +02:00
Viacheslav Ovsiienko	161d103b23	net/mlx5: add queue start and stop The mlx5 PMD did not support queue_start and queue_stop eth_dev API routines, queue could not be suspended and resumed during device operation. There is the use case when this feature is crucial for applications: - there is the secondary process handling the queue - secondary process crashed/aborted - some mbufs were allocated or used by secondary application - some mbufs were allocated by Rx queues to receive packets - some mbufs were placed to send queue - queue goes to undefined state In this case there was no reliable way to recover queue handling by restarted secondary process but reset queue to initial state freeing all involved resources, including buffers involved in queue operations, reset the mbuf pools, and then reinitialize queue to working state: - reset mbuf pool, allocate all mbuf to initialize pool into safe state after the crash and allow safe mbuf free calls - stop queue, free all potentially involved mbufs - reset mbuf pool again - start queue, reallocate mbufs needed This patch introduces the queue start/stop feature with some limitations: - hairpin queues are not supported - it is the application's responsibility to synchronize start/stop with datapath routines, rx/tx_burst must be suspended during the queue_start/queue_stop calls - it is the application's responsibility to track queue usage and provide coordinated queue_start/queue_stop calls from secondary and primary processes. - Rx queues with vectorized Rx routine and engaged CQE compression are not supported by this patch currently Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:46:30 +02:00
Dekel Peled	08d1838f64	net/mlx5: implement CQ for Rx using DevX API This patch continues the work to use DevX API for different objects creation and management. On Rx control path, the RQ, RQT, and TIR objects can already be created using DevX API. This patch adds the support to create CQ for RxQ using DevX API. The corresponding event channel is also created and utilized using DevX API. Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	9d60f54569	common/mlx5: remove inclusion of Verbs header files Several source files include Verbs header files as in (1). These source files will not compile under non-Linux operating systems. This commit removes this inclusion in two cases: Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source file so the inclusion in (1) can be safely removed. Case 2: Verbs symbols are used. Please note the inclusion in (1) already appears in file linux/mlx5_glue.h (which represents the interface to the rdma-core library). Therefore, replace (1) in the source file with (2). Under non-Linux operating systems - file mlx5_glue.h will not include (1). (1) #include <infiniband/verbs.h> #include <infiniband/mlx5dv.h> (2) #include <mlx5_glue.h> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	2e86c4e5c7	net/mlx5: refactor multi-process communication 1. The shared data communication between the primary and the secondary processes is implemented using Linux API. Move the Linux API code under linux directory (file linux/mlx5_os.c). 2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved under linux (new file linux/mlx5_mp_os.c). Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	ef9ee13f6e	net/mlx5: cleanup header file The cleanup refers to header file mlx5.h. 1. Remove unused prototypes. 2. Move prototypes under their correct title. 3. Change functions to static and remove their prototypes from the header file. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	98c4b12afa	net/mlx5: eliminate dependency on Linux in shared header This commit eliminates Linux dependencies in shared file mlx5.h. 1. All functions using 'struct ifreq' are moved to file linux/mlx5_ethdev_os.c such that this struct can be removed from mlx5.h. 2. Function mlx5_set_flags() that uses Linux flags (e.g. IFF_UP) is changed to static and its prototype is removed from mlx5.h. 3. Remove redundant member verbs_action from 'struct mlx5_priv'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	4d18abd130	net/mlx5: wrap Linux promiscuous and multicast functions This commit adds Linux implementation of routines mlx5_os_set_promisc() and mlx5_os_set_allmulti(). The routines call netlink APIs. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	ab27cdd93a	net/mlx5: refactor Linux MAC operations Move OS specific MAC operations add, remove, modify VF into file linux/mlx5_os.c. Remove unused function mlx5_get_mac(). Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Ophir Munk	3eca5f8a61	net/mlx5: move flow priority discovery to Verbs file Function calls mlx5_flow_adjust_priority() and mlx5_flow_discover_priorities() are Verbs based. Move them from file mlx5_flow.c to file mlx5_flow_verbs.c Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:46:30 +02:00
Suanming Mou	50f95b23c9	net/mlx5: add option to configure FCS or decapsulation There are some limitations on some NICs (at least on ConnectX-6 Dx and BlueField 2) with supporting FCS (frame checksum) scattering for the tunnel decapsulated packets. In this case only one of the features can be supported at a time, and the new devarg "decap_en" is introduced to let users choose. If the FCS scattering feature is supposed to be engaged by the application, this new devarg should be specified as "decap_en=0", forcing the FCS feature to be enabled and rejecting tunnel decap actions in the rte_flow engine. If FCS scatter is not needed and the application intends to use tunnel decapsulation in rte_flow, the devarg can be omitted or set to a non-zero value (this is the default setting). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:46:30 +02:00
Suanming Mou	5522da6b20	net/mlx5: add option to allocate memory from system Currently, for MLX5 PMD, once millions of flows created, the memory consumption of the flows are also very huge. For the system with limited memory, it means the system need to reserve most of the memory as huge page memory to serve the flows in advance. And other normal applications will have no chance to use this reserved memory any more. While most of the time, the system will not have lots of flows, the reserved huge page memory becomes a bit waste of memory at most of the time. By the new sys_mem_en devarg, once set it to be true, it allows the PMD allocate the memory from system by default with the new add mlx5 memory management functions. Only once the MLX5_MEM_RTE flag is set, the memory will be allocate from rte, otherwise, it allocates memory from system. So in this case, the system with limited memory no need to reserve most of the memory for hugepage. Only some needed memory for datapath objects will be enough to allocated with explicitly flag. Other memory will be allocated from system. For system with enough memory, no need to care about the devarg, the memory will always be from rte hugepage. One restriction is that for DPDK application with multiple PCI devices, if the sys_mem_en devargs are different between the devices, the sys_mem_en only gets the value from the first device devargs, and print out a message to warn that. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	1c5064044f	net/mlx5: create and destroy eCPRI flex parser eCPRI protocol has unified format layout for the variants, over ETH layer (including .1Q) and UDP layer. The common header of the message has 4 bytes fixed length, and the message payload layers are different based on the type field. Now only type #0, #2 and #5 will be supported, and 2 bytes are needed. When creating the flex parser, the header will be extended to 8 bytes and 2 DW samples are needed. The 1st DW starts from offset 0 and will be used for the type field of the common header. The 2nd DW starts from offset 4 and will be used for the physical channel ID, real-time control ID or measurement ID fields. The parser will be created once a flow with eCPRI item is observed for the first time. After creating, it will remain in the system and HW until the device is stopped. Right now, there is no need to destroy the eCPRI flex parser after the last flow with eCPRI item is destroyed. This is to get rid of the alternate states of creating and destroying eCPRI flex parser with a single eCPRI flow. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	711aedf187	common/mlx5: add flex parser DevX structures The structures and other definitions will be used for the dynamic flex parser creation via DevX command interface. These structures will be used as some intermediate variables and input parameters for the parser creation API. It is better to keep all members consistent with the PRM definition even though some of them will not be used. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Bing Zhao	daa38a8924	net/mlx5: add flow translation of eCPRI header In the translation stage, the eCPRI item should be translated into the format that lower layer driver could use. All the fields that need to match must be in network byte order after translation, as well as the mask. Since the header in the item belongs to the network layers stack, the input parameter of the header is considered to be in big-endian format already. Based on the definition in the PRM, the DW samples will be used for matching in the FTE/STE. Now, the type field and only the PC ID, RTC ID, and DLY MSR ID of the payload will be supported. The masks should be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are needed to support such matching. The mask fields could be zeros to support some wildcard rules. But it makes no sense to support the rule matching only on the payload but without matching type field. The DW samples should be stored after the flex parser creation for eCPRI. There is no need to query the sample IDs each time when creating a flow rule with eCPRI item. It will not introduce insertion rate degradation significantly. Signed-off-by: Bing Zhao <bingz@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	a2854c4de1	net/mlx5: convert Rx timestamps in real-time format The ConnectX-6DX supports the timestamps in various formats, the new realtime format is introduced - the upper 32-bit word of timestamp contains the UTC seconds and the lower 32-bit word contains the nanoseconds. This patch detects what format is configured in the NIC and performs the conversion accordingly. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	3b025c0ca4	net/mlx5: provide send scheduling error statistics The mlx5 PMD exposes the following newly introduced extended statistics counters to report the errors of packet send scheduling on timestamps: - txpp_err_miss_int - rearm queue interrupt was not handled in time and service routine might miss the completions - txpp_err_rearm_queue - reports errors in rearm queue - txpp_err_clock_queue - reports errors in clock queue - txpp_err_ts_past - timestamps in the packet being sent were found in the past, timestamps were ignored - txpp_err_ts_future - timestamps in the packet being sent were found in the too distant future (beyond HW/clock queue capabilities to schedule, typically it is about 16M of tx_pp devarg periods) - txpp_jitter - estimated jitter in device clocks between 8K completions of Clock Queue. - txpp_wander - estimated wander in device clocks between 16M completions of Clock Queue. - txpp_sync_lost - error flag, the Clock Queue completions synchronization is lost, accurate packet scheduling can not be handled, timestamps are being ignored, the restart of all ports using scheduling must be performed. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	b94d93ca73	net/mlx5: support reading device clock If send schedule feature is engaged there is the Clock Queue created, that reports reliable the current device clock counter value. The device clock counter can be read directly from the Clock Queue CQE. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	085ff447f0	net/mlx5: convert timestamp to completion index The application provides timestamps in Tx mbuf as clocks, the hardware performs scheduling on Clock Queue completion index match. This patch introduces the timestamp-to-completion-index inline routine. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	77522be0a5	net/mlx5: introduce clock queue service routine Service routine is invoked periodically on Rearm Queue completion interrupts, typically once per some milliseconds (1-16) to track clock jitter and wander in robust fashion. It performs the following: - fetches the completed CQEs for Rearm Queue - restarts Rearm Queue on errors - pushes new requests to Rearm Queue to make it continuously running and pushing cross-channel requests to Clock Queue - reads and caches the Clock Queue CQE to be used in datapath - gathers statistics to estimate clock jitter and wander - gathers Clock Queue errors statistics Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	aef1e20ebe	net/mlx5: allocate packet pacing context This patch allocates the Packet Pacing context from the kernel, configures one according to requested pace send scheduling granularity and assigns to Clock Queue. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	551c94c83e	net/mlx5: create rearm queue for packet pacing The dedicated Rearm Queue is needed to fire the work requests to the Clock Queue in realtime. The Clock Queue should never stop, otherwise the clock synchronization might be broken and packet send scheduling would fail. The Rearm Queue uses cross channel SEND_EN/WAIT operations to provides the requests to the Clock Queue in robust way. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	d133f4cdb7	net/mlx5: create clock queue for packet pacing This patch creates the special completion queue providing reference completions to schedule packet send from other transmitting queues. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	fc4d4f732b	net/mlx5: introduce shared UAR resource This is preparation step before moving the Tx queue creation to the DevX approach. Some features require the shared UAR for Tx queues and scheduling completion queues, the patch manages the shared UAR. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	24feb04596	net/mlx5: fix UAR lock sharing for multiport devices The master and representors might be created over the multiport Infiniband devices and the UAR resource allocated for sibling ports might belong to the same underlying Infiniband device. Hardware requires the write access to the UAR must be performed as atomic 64-bit write, on 32-bit systems this is two sequential writes, protected by lock. Due to possibility to share the same UAR between sibling devices the locks must be moved to shared context. Fixes: `f048f3d479` ("net/mlx5: switch to the shared IB device context") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko	8f848f32fc	net/mlx5: introduce send scheduling devargs This patch introduces the new devargs: tx_pp - enables accurate packet send scheduling on mbuf timestamps in the PMD. On the device start if "rte_dynflag_timestamp" dynamic flag is registered and this devarg non-zero value is specified, the driver initializes all necessary internal infrastructure to provide packet scheduling. The parameter value specifies scheduling granularity in nanoseconds. tx_skew - the parameter adjusts the send packet scheduling on timestamps and represents the average delay between beginning of the transmitting descriptor processing by the hardware and appearance of actual packet data on the wire. The value should be provided in nanoseconds and is valid only if tx_pp parameter is specified. The default value is zero. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-07-21 15:44:36 +02:00
Shiri Kuzin	0f0ae73a32	net/mlx5: add parameter for LACP packets control The new devarg will control the steering of the lacp traffic. When setting dv_lacp_by_user = 0 the lacp traffic will be steered to kernel and managed there. When setting dv_lacp_by_user = 1 the lacp traffic will not be steered and the user will need to manage it. Signed-off-by: Shiri Kuzin <shirik@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Shiri Kuzin	3c78124f0a	net/mlx5: add default miss action to flow engine The new action is an internal mlx5 action that will call the rdma-core function MLX5DV_FLOW_ACTION_DEFAULT_MISS. The default miss action will be used when a bond is configured to allow traffic related to the bond to be managed in the kernel. Signed-off-by: Shiri Kuzin <shirik@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Ori Kam	262c7ad0dd	common/mlx5: move doorbell record from net driver The creation of DBR can be used by a number of different Mellanox PMDs, for example RegEx / Net / VDPA. This commit moves the DBR creation and release functions to the common folder. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-06-30 14:52:30 +02:00
Ophir Munk	391b8bcc81	common/mlx5: move some getter functions from net driver Getter functions such as: 'mlx5_os_get_ctx_device_name', 'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name', 'mlx5_os_get_umem_id' are implemented under net directory. To enable additional devices (e.g. regex, vdpa) to access these getter functions they are moved under common directory. As part of this commit string sizes DEV_SYSFS_NAME_MAX and DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the destination string size in strncpy() function is bigger than the source string size. This update will avoid GCC version 8 error -Werror=stringop-truncation. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Suanming Mou	ac79183dc6	net/mlx5: optimize free counter lookup Currently, when allocate a new counter, it needs loop the whole container pool list to get a free counter. In the case with millions of counters allocated, and all the pools are empty, allocate the new counter will still need to loop the whole container pool list first, then allocate a new pool to get a free counter. It wastes the cycles during the pool list traversal. Add a global free counter list in the container helps to get the free counters more efficiently. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:30 +02:00
Suanming Mou	b1cc226644	net/mlx5: optimize single counter pool search For single counter, when allocate a new counter, it needs to find the pool it belongs in order to do the query together. Once there are millions of counters allocated, the pool array in the counter container will become very large. In this case, the pool search from the pool array will become extremely slow. Save the minimum and maximum counter ID to have a quick check of current counter ID range. And start searching the pool from the last pool in the container will mostly get the needed pool since counter ID increases sequentially. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:29 +02:00
Suanming Mou	632f0f1905	net/mlx5: manage shared counters in three-level table Currently, to check if any shared counter with same ID existing, it will have to loop the counter pools to search for the counter. Even add the counter to the list will also not so helpful while there are thousands of shared counters in the list. Change Three-Level table to look up the counter index saved in the relevant table entry will be more efficient. This patch introduces the Three-level table to save the ID relevant counter index in the table. Then the next while the same ID comes, just check the table entry of this ID will get the counter index directly. No search will be needed. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-30 14:52:29 +02:00
Matan Azrad	aec086c9f1	common/mlx5: share kernel interface name getter Some configuration of the mlx5 port are done by the kernel net device associated to the IB device represents the PCI device. The DPDK mlx5 driver uses Linux system calls, for example ioctl, in order to configure per port configurations requested by the DPDK user. One of the basic knowledges required to access the correct kernel net device is its name. Move function to get interface name from IB device path to the common library. Signed-off-by: Matan Azrad <matan@mellanox.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2020-06-30 14:52:29 +02:00
Ophir Munk	d5ed8aa944	net/mlx5: add memory region callbacks in per-device cache Prior to this commit MR operations were verbs based and hard coded under common/mlx5/linux directory. This commit enables upper layers (e.g. net/mlx5) to determine which MR operations to use. For example the net layer could set devx based MR operations in non-Linux environments. The reg_mr and dereg_mr callbacks are added to the global per-device MR cache 'struct mlx5_mr_share_cache'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-17 16:32:01 +02:00
Ophir Munk	73bf9235e9	net/mlx5: refactor statistics mlx5 statistics are calculated by several methods: 1. In software when packets go through datapath. 2. Calling ioctl with ETHTOOL command (Linux specific). 3. Reading counters from SYSFS device path (Linux specific). The Linux related functions are moved to file linux/mlx5_os.c. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	042f5c94fd	net/mlx5: refactor device operations for Linux There are three types of eth_dev_ops: primary, secondary and isolate. Their function calls assignments are moved from common file mlx5.c to the Linux specific file linux/mlx5_os.c. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	1256805dd5	net/mlx5: move Linux-specific functions File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for functions which are Linux specific. Functions which are Linux agnostics remain in mlx5_ethdev.c file. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	9138989036	net/mlx5: rename ib in names Renames in this commit: mlx5_ibv_list -> mlx5_dev_ctx_list mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx mlx5_ibv_shared_port -> mlx5_dev_shared_port ibv_port -> dev_port Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	21b7c452a6	net/mlx5: remove completion object dependency on DV Replace 'struct mlx5dv_devx_cmd_comp *' with 'void *' in 'struct mlx5_dev_ctx_shared'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	834a9019ec	net/mlx5: remove Verbs dependency in spawn struct 1. Replace 'struct ibv_device *' with 'void *' in 'struct mlx5_dev_spawn_data'. Define a getter function to retrieve the device name. 2. Rename ibv_dev and ibv_port as phys_dev and phys_port respectively. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	10f3581dfd	net/mlx5: add Linux-specific header file File drivers/net/linux/mlx5_os.h is added. It includes specific Linux definitions such as PCI driver flags, link state changes interrupts, link removal interrupts, etc. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	2eb4d0107a	net/mlx5: refactor PCI probing on Linux Refactor PCI probing related code. Move Linux specific functions (as well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c file. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	c7f6ba0e53	net/mlx5: remove umem field dependency on Direct Verbs umem field is used in several structs. Its type 'struct mlx5dv_devx_umem *' is changed to 'void *'. This change will allow non-Linux OS compilations. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	e85f623e13	net/mlx5: remove attributes dependency on Verbs Define 'struct mlx5_dev_attr' which is ibv and dv independent. It contains attribute that were originally contained in 'struct ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API mlx5_os_get_dev_attr() which fills in the new defined struct. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	c468501658	common/mlx5: remove protection domain dependency on Verbs Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and all function calls in mlx5 PMD. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	f44b09f9e3	net/mlx5: add Linux-specific file with getter functions 'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct ibv_context *' to 'void *'. 'ctx' members which are verbs dependent (e.g. device_name) will be accessed through getter functions which are added to a new file under Linux directory: linux/mlx5_os.c. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Ophir Munk	6e88bc42c7	net/mlx5: rename Verbs shared object Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-16 19:21:07 +02:00
Suanming Mou	a1da6f624c	net/mlx5: add reclaim memory mode Currently, when flow destroyed, some memory resources may still be kept as cached to help next time create flow more efficiently. Some system may need the resources to be more flexible with flow create and destroy. After peak time, with millions of flows destroyed, the system would prefer the resources to be reclaimed completely, no cache is needed. Then the resources can be allocated and used by other components. The system is not so sensitive about the flow insertion rate, but more care about the resources. Both DPDK mlx5 PMD driver and the low level component rdma-core have provided the flow resources to be configured cached or not, but there is no APIs or parameters exposed to user to configure the flow resources cache mode. In this case, introduce a new PMD devarg to let user configure the flow resources cache mode will be helpful. This commit is to add a new "reclaim_mem_mode" to help user configure if the destroyed flows' cache resources should be kept or not. Their will be three mode can be chosen: 1. 0(none). It means the flow resources will be cached as usual. The resources will be cached, helpful with flow insertion rate. 2. 1(light). It will only enable the DPDK PMD level resources reclaim. 3. 2(aggressive). Both DPDK PMD level and rdma-core low level will be configured as reclaimed mode. With these three mode, user can configure the resources cache mode with different levels. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-06-03 17:19:26 +02:00
Suanming Mou	33860cfab6	net/mlx5: fix interrupt installation timing Currently, the DevX counter query works asynchronously with Devx interrupt handler return the query result. When port closes, the interrupt handler will be uninstalled and the Devx comp obj will also be destroyed. Meanwhile the query is still not cancelled. In this case, counter query may use the invalid Devx comp which has been destroyed, and query failure with invalid FD will be reported. Adjust the shared interrupt install and uninstall timing to make the counter asynchronous query stop before interrupt uninstall. Fixes: `f15db67df0` ("net/mlx5: accelerate DV flow counter query") Cc: stable@dpdk.org Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-06-02 16:06:24 +02:00
Matan Azrad	5af61440dd	net/mlx5: fix flow counter container resize The design of counter container resize used double buffer algorithm in order to synchronize between the query thread to the control thread. When the control thread detected resize need, it created new bigger buffer for the counter pools in a new container and change the container index atomically. In case the query thread had not detect the previous resize before a new one need was detected by the control thread, the control thread returned EAGAIN to the flow creation API used a COUNT action. The rte_flow API doesn't allow unblocked commands and doesn't expect to get EAGAIN error type. So, when a lot of flows were created between 2 different periodic queries, 2 different resizes might try to be created and caused EAGAIN error. This behavior may blame flow creations. Change the synchronization way to use lock instead of double buffer algorithm. The critical section of this lock is very small, so flow insertion rate should not be decreased. Fixes: `ebbac312e4` ("net/mlx5: resize a full counter container") Cc: stable@dpdk.org Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-05-18 20:35:57 +02:00
Dong Zhou	fa2d01c87d	net/mlx5: support flow aging Currently, there is no flow aging check and age-out event callback mechanism for mlx5 driver, this patch implements it. It's included: - Splitting the current counter container to aged or no-aged container since reducing memory consumption. Aged container will allocate extra memory to save the aging parameter from user configuration. - Aging check and age-out event callback mechanism based on current counter. When a flow be checked aged-out, RTE_ETH_EVENT_FLOW_AGED event will be triggered to applications. - Implement the new API: rte_flow_get_aged_flows, applications can use this API to get aged flows. Signed-off-by: Dong Zhou <dongz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-05-05 15:54:27 +02:00
Dong Zhou	8d93c830e4	net/mlx5: modify ext-counter memory allocation Currently, the counter pool needs 512 ext-counter memory for no batch counters, it's allocated separately by once, behind the 512 basic-counter memory. This is not easy to get ext-counter pointer by corresponding basic-counter pointer. This is also no easy for expanding some other potential additional type of counter memory. So, need allocate every one of ext-counter and basic-counter together, as a single piece of memory. It's will be same for further additional type of counter memory. In this case, one piece of memory contains all type of memory for one counter, it's easy to get each type memory by using offsetting. Signed-off-by: Dong Zhou <dongz@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>	2020-05-05 15:54:27 +02:00
Alexander Kozyrev	6c55b622a9	net/mlx5: set dynamic flow metadata in Rx queues Using a global mbuf dynamic field for metadata incurs some performance penalty on a datapath. Store this information in the Rx queue descriptor for a better cache locality. Fixes: `a18ac61133` ("net/mlx5: add metadata support to Rx datapath") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 22:28:06 +02:00
Suanming Mou	ab612adc1e	net/mlx5: allocate flow API from indexed pool This commit allocates rte flow from indexed memory pool. Allocate rte flow memory from indexed memory pool helps save more than MALLOC_ELEM_OVERHEAD bytes memory from rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	e745f90007	net/mlx5: optimize flow RSS struct When destroy the flow with RSS, flow can invoke the queues information from hrxq index table object, since the queue number and list are both saved to the index table object. No need to save the duplicated data in rte flow. Save the RSS description information to the intermediate private data when create the flow with RSS action helps to save the memory for rte flow. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Wentao Cui	c2ddde7950	net/mlx5: optimize flow director filter memory This commit is for mlx5 fdir flow memory optimization. Currently for the fdir member in rte_flow structure. It saves the fdir memory pointer directly. As fdir is fading away, use one bit help to indicate the function in the flow and add the content to an extra list save the memory for the other widely usage cases. Signed-off-by: Wentao Cui <wentaoc@mellanox.com> Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	90e6053a19	net/mlx5: convert mark copy resource to indexed Allocate mark copy resource from indexed pool helps rte flow saves the 4 bytes index instead of 8 bytes pointer. For mark copy resource itself, it helps save MALLOC_ELEM_OVERHEAD bytes from rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	8638e2b076	net/mlx5: allocate meter from indexed pool This patch allocate the meter object memory from indexed memory pool which will help to save the MALLOC_ELEM_OVERHEAD memory taken by rte_malloc(). Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	b88341ca35	net/mlx5: convert flow dev handle to indexed This commit converts flow dev handle to indexed. Change the mlx5 flow handle from pointer to uint32_t saves memory for flow. With million flow, it saves several MBytes memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	772dc0eb83	net/mlx5: convert hrxq to indexed This commit converts hrxq to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	7ac99475ce	net/mlx5: convert jump resource to indexed This commit convert jump resource to indexed. The table data struct is allocated from indexed memory. As it is add in the hash list, the pointer is still used for hash list search. The index is added to the table struct, and the pointer in flow handle is decrease to uint32_t type. For flow without jump flows, it saves 4 bytes memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	f3faf9ea11	net/mlx5: convert port id action to indexed This commit converts port id action to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	5f1142692a	net/mlx5: convert tag resource to indexed This commit convert tag resource to indexed. As tag resources are add in the hash list, to avoid introduce performance issue and keep the hash list, only the tag resource memory is allocated from indexed memory. The resources is still added to the hash list. Add four bytes index in the tag resource struct and change the tag resources in the flow handle from pointer to uint32_t seems be no benefit for tag resource, but it saves memory for flows without tag action. And also for sub flows share one tag action resource. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	8acf8ac9b7	net/mlx5: convert push VLAN resource to indexed This commit converts the push VLAN resource to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00
Suanming Mou	014d1cbe51	net/mlx5: convert encap/decap resource to indexed This commit converts the flow encap/decap resource to indexed. Using the uint32_t index instead of pointer saves 4 bytes memory for the flow handle. For millions flows, it will save several MBytes of memory. Signed-off-by: Suanming Mou <suanmingm@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-04-21 13:57:09 +02:00

551 Commits