Commit Graph

551 Commits

Author SHA1 Message Date
Ophir Munk
db12615b42 net/mlx5: prepare MR prototypes for DevX
Currently MR operations are Verbs based. This commit updates MR
operations prototypes such that DevX MR operations callbacks can be used
as well.  Rename 'struct mlx5_verbs_ops' as 'struct mlx5_mr_ops' and
move it to shared file mlx5.h.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-01-08 16:03:07 +01:00
Ophir Munk
1f29d15ec9 net/mlx5: extend device attributes getter
This commit adds device attributes parameters to be reported by
mlx5_os_get_dev_attr(): max_cqe, max_mr, max_pd, max_srq, max_srq_wr

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-01-08 16:03:07 +01:00
Viacheslav Ovsiienko
81c3b97735 net/mlx5: fix Verbs memory allocation callback
The rdma-core library uses callbacks to allocate and free memory
from DPDK. The memory allocation callback used the complicated
and incorrect way to get the NUMA socket ID from the context.
The context was wrong that might result in wrong socket ID
and allocating memory from wrong node.

The callbacks are assigned once as Infinibande device context
is created allowing early access to shared DPDK memory for all
Verbs internal objects need that.

Fixes: 36dabcea78 ("net/mlx5: use anonymous Direct Verbs allocator argument")
Fixes: 2eb4d0107a ("net/mlx5: refactor PCI probing on Linux")
Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2021-01-08 16:03:04 +01:00
Andrey Vesnovaty
fa7ad49e96 net/mlx5: fix shared RSS action update
The shared RSS action update was not operational due to lack
of kernel driver support of TIR object modification.
This commit introduces the workaround to support shared RSS
action modify using an indirect queue table update instead of
touching TIR object directly.
Limitations: the only supported RSS property to update is queues, the
rest of the properties ignored.

Fixes: d2046c09aa ("net/mlx5: support shared action for RSS")

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-22 16:40:03 +01:00
Xueming Li
e6818853c0 net/mlx5: set representor to first PF in bonding mode
When the representor device was set to PF1 in bonding mode, iterating
device iterator that looking for representors by bonding device failed
to match PF0 pci address with PF1 address. So detaching PF bonding
device only detached all representors on PF0.

This patch registers all representors of PF1 with PF0 as PCI device.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-20 21:10:05 +01:00
Gregory Etelson
9cac7ded37 net/mlx5: fix tunnel offload object allocation
The original patch allocated tunnel offload objects with invalid
indexes. As the result, PMD tunnel object allocation failed.

In this patch indexed pool provides both an index and memory for a new
tunnel offload object.
Also tunnel offload ipool moved to dv enabled code only.

Fixes: 4ae8825c50 ("net/mlx5: use indexed pool as id generator")

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-20 21:10:04 +01:00
Suanming Mou
fabf8a3724 net/mlx5: fix shared RSS action release
As shared RSS action will be shared by multiple flows, the action
is created as global standalone action and managed only by the
relevant shared action management functions.

Currently, hrxqs will be created by shared RSS action or general
queue action. For hrxqs created by shared RSS action, they should
also only be released with shared RSS action. It's not correct to
release the shared RSS action hrxqs as general queue actions do
in flow destroy.

This commit adds a new fate action type for shared RSS action to
handle the shared RSS action hrxq release correctly.

Fixes: e1592b6c4d ("net/mlx5: make Rx queue thread safe")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-13 19:43:26 +01:00
Xueming Li
c3ba8ecb76 net/mlx5: fix missing meter packet
For transfer flow with meter, packet was passed without applying flow
action. The group level was multiplied by 10 for group level 65531.

This patch fixes this issue by correcting suffix table group level
calculation.

Fixes: 3e8f3e51fd ("net/mlx5: fix meter table definitions")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-13 19:43:26 +01:00
Dekel Peled
105d214965 net/mlx5: fix aging queue doorbell ringing
Recent patch introduced a new SQ for ASO flow hit management.
This SQ uses two WQEBB's for each WQE.
The SQ producer index is 16 bits wide.

The enqueue loop posts new WQEs to the ASO SQ, using WQE index for
the SQ management.
This 16 bits index multiplied by 2 was wrongly used also for SQ
doorbell ringing.
The multiplication caused the SW index overlapping to be out of sync
with the hardware index, causing it to get stuck.

This patch separates the WQE index management from the doorbell index
management.
So, for each WQE index incrementation by 1, the doorbell index is
incremented by 2.

Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-13 16:26:54 +01:00
Viacheslav Ovsiienko
eb63ec0e56 net/mlx5: fix UAR used by ASO queues
The dedicated UAR was allocated for the ASO queues.
The shared UAR created for Tx queues can be used instead.

Fixes: f935ed4b64 ("net/mlx5: support flow hit action for aging")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-14 10:56:30 +01:00
Tal Shnaiderman
e82ddd28e3 common/mlx5: split PCI relaxed ordering for read and write
The current DevX implementation of the relaxed ordering feature is
enabling relaxed ordering usage only if both relaxed ordering read AND
write are supported.  In that case both relaxed ordering read and write
are activated.

This commit will optimize the usage of relaxed ordering by enabling it
when the read OR write features are supported.  Each relaxed ordering
type will be activated according to its own capability bit.

This will align the DevX flow with the verbs implementation of
ibv_reg_mr when using the flag IBV_ACCESS_RELAXED_ORDERING

Fixes: 53ac93f71a ("net/mlx5: create relaxed ordering memory regions")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-04 19:16:24 +01:00
Matan Azrad
f9bc5274a6 net/mlx5: allow age modes combination
ASO age action mode is not supported in group 0 while counter base age
action mode supports group 0.

Allow using the 2 modes of age action in parallel, so group 0 flows will
use counter base age actions and group > 0 flows will use ASO age
actions.

Currently, counter base age action doesn't support shared action API so
group 0 flows cannot share age actions.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>
2020-11-03 23:35:07 +01:00
Matan Azrad
81073e1f8c net/mlx5: support shared age action
Add support for rte_flow shared action API for ASO age action.

First step here to support validate, create, query and destroy.

The support is only for age ASO mode.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>
2020-11-03 23:35:07 +01:00
Matan Azrad
4a42ac1f1c net/mlx5: optimize shared RSS action memory
The RSS shared action was saved in flow memory by a pointer.
It means that every flow memory includes 8B only for optional shared
RSS case.

Move the RSS objects to be used by indexed pool which reduces the flow
handle memory to 4B.

So, now, the shared action handler is also just a 4B index.

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Dekel Peled <dekelp@nvidia.com>
2020-11-03 23:35:07 +01:00
Dekel Peled
f935ed4b64 net/mlx5: support flow hit action for aging
A new ASO (Advanced Steering Operation) feature was added in the last
mlx5 adapters to support flow hit detection.

Using this new steering action, the driver can detect flow traffic hit
and to reset this indication any time.

The ASO age action cannot support flows in table 0.

Add support for flow aging action in rte_flow using this new feature.

The counter aging mode will be taken only when the ASO feature is not
supported for the user flow groups.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:07 +01:00
Alexander Kozyrev
54c2d46b16 net/mlx5: support flow tag and packet header miniCQEs
CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniCQE size is only 8 bytes and this limits the ability
to successfully keep the compression session in case of various
traffic patterns.

The current miniCQE format only keeps the compression session alive
in case of uniform traffic with the Hash RSS as the only difference.
There are requests to keep the compression session in case of tagged
traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic.
Add 2 new miniCQE formats in order to achieve the best performance
for these traffic patterns: Flow Tag and Packet Header miniCQEs.

The existing rxq_cqe_comp_en devarg is modified to specify the
desired miniCQE format. Specifying 2 selects Flow Tag format
for better compression rate in case of RTE Flow Mark traffic.
Specifying 3 selects Checksum format (existing format for MPRQ).
Specifying 4 selects L3/L4 Header format for better compression
rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:07 +01:00
Viacheslav Ovsiienko
41c2bb6357 net/mlx5: use C11 atomics in packet scheduling
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering and explicit
memory barrier for Clock Queue and timestamps synchronization.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:05 +01:00
Xueming Li
9fbe97f0ce net/mlx5: remove shared context lock
To support multi-thread flow insertion, this patch removes shared data
lock since all resources should support concurrent protection.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:05 +01:00
Suanming Mou
cc608e4df4 net/mlx5: make shared action list thread safe
This commit uses spinlock to protect the shared action list in multiple
thread.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:05 +01:00
Suanming Mou
1978414169 net/mlx5: make sample and mirror action thread safe
This commit uses cache list to make sample and mirror action thread
safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
3422af2af2 net/mlx5: make push VLAN action cache thread safe
To support multi-thread flow insertion, this patch converts push VLAN
action cache list to thread safe cache list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
0fd5f82aaa net/mlx5: make port ID action cache thread safe
To support multi-thread flow insertion, this patch convert port id
action cache list to thread safe cache list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
1872635570 net/mlx5: make matcher list thread safe
To support multi-thread flow insertion, this path converts matcher list
to use thread safe cache list API.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Suanming Mou
e1592b6c4d net/mlx5: make Rx queue thread safe
This commit applies the cache linked list to Rx queue to make it thread
safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Suanming Mou
84d3389048 net/mlx5: optimize shared RSS list operation
When create shared RSS hrxq, the hrxq will be created directly, no hrxq
will be reused.

In this case, add the shared RSS hrxq to the queue list is redundant.
And it also hurts the generic queue lookup.

This commit avoids add the shared RSS hrxq to the queue list.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Suanming Mou
ff7ab341af net/mlx5: remove unused mreg copy
After non-cache mode feature was implemented, the flows can only be
created when port started. No need to check if the mreg flows are
created in port stopped status, and apply the mreg flows after port
start will also never happen.

This commit removed the relevant not used mreg copy code.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
afd7a62514 net/mlx5: make flow table cache thread safe
To support multi-thread flow insertion/removal, this patch uses thread
safe hash list API for flow table cache hash list.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Suanming Mou
b80726dc51 net/mlx5: create global default miss action
This commit creates the global default miss action instead of maintain
it in flow insertion time. This makes the action to be thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
d163fc2d15 net/mlx5: make flow list thread safe
To support multi-thread flow operations, this patch introduces list lock
for the rte_flow list manages all the rte_flow handlers.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
4ae8825c50 net/mlx5: use indexed pool as id generator
The ID generation API used an integer pool to save released ID, To
support multiple flow, it has to be enhanced to be thread safe.

Indexed pool could be used to generate unique ID by setting size of pool
entry to zero. Since bitmap is used, an extra benefits is saving memory
to about one bit per entry. Further more indexed pool could be thread
safe by enabling lock.

This patch leverages indexed pool to generate ID, removes
unused ID generating API.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
94b6d88438 net/mlx5: reuse flow id as hairpin id
Hairpin flow matching required a unique flow ID for matching.
This patch reuses flow ID as hairpin flow ID, this will save some code
to generate a separate hairpin ID, also saves flow memory by removing
hairpin ID.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Xueming Li
8bb81f2649 net/mlx5: use thread specific flow workspace
As part of multi-thread flow support, this patch moves flow intermediate
data to thread specific, makes them a flow workspace. The workspace is
allocated per thread, destroyed along with thread life-cycle.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:04 +01:00
Alexander Kozyrev
cf7d1995b9 net/mlx5: use C11 atomics for flow tables
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RTE flow tables.
Enforce Acquire/Release model for managing DevX pools.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:04 +01:00
Alexander Kozyrev
b5c8b3e70c net/mlx5: use C11 atomics for RxQ/TxQ refcounts
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RxQ/TxQ refcounts.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:04 +01:00
Matan Azrad
89f170c0da net/mlx5: fix Tx queue start
The Tx queue stop\start operations update the HW state of the Tx queue
object. The stop API should update the state from ready to reset in
order to stop any queue traffic and the start API should update the
state from reset to ready in order to open the traffic path.

The start API wrongly tried to change the state from ready to ready what
caused a failure in FW on the current state validation.

Replace ready to ready command by reset to ready command in the Tx start
API.

Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Asaf Penso <asafp@nvidia.com>
2020-11-03 23:35:04 +01:00
Bing Zhao
02109eaeac net/mlx5: support getting hairpin peer ports
In real-life business, one device could be attached and detached
dynamically. The hairpin configuration of this port to/from all the
other ports should be enabled and disabled accordingly.

The RTE ethdev lib and PMD should provide this ability to get the
peer ports list in case that the application doesn't save it. It is
recommended that the size of the array to save the port IDs is as
large as the "RTE_MAX_ETHPORTS" to have the maximal capacity.

The order of the peer port IDs may be different from that during
hairpin queues set in the initialization stage. The peer port ID
could be the same as the current device port ID when the hairpin
peer ports contain itself - the single port hairpin.

The application should check the ports' status and decide if the
peer port should be bound / unbound when starting / stopping the
current device.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:04 +01:00
Bing Zhao
37cd4501e8 net/mlx5: support two ports hairpin mode
In order to support hairpin between two ports, mlx5 PMD needs to
implement the functions and provide them as the function pointers.

The bind and unbind functions are executed per port pairs. All the
hairpin queues between the two ports should have the same attributes
during queues setup. Different configurations among queue pairs from
the same ports are not supported. It is allowed that two ports only
have one direction hairpin.

In order to set up the connection between two queues, peer Rx queue
HW information must be fetched via the internal RTE API and the queue
information could be used to modify the SQ object. Then the RQ object
will be modified with the Tx queue HW information. The reverse
operation is not supported right now.

When disconnecting the queues pair, SQ and RQ object should be reset
without any peer HW information. The unbinding operation will try to
disconnect all Tx queues from the port from the Rx queues of the peer
port.

Tx explicit mode attribute will be saved and used when creating a
hairpin flow.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:03 +01:00
Viacheslav Ovsiienko
a0a45e8af7 net/mlx5: configure Rx queue for buffer split
The scatter-gather elements should be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets and
PMD pads the buffer chain (if needed) with attributes
of last specified segment to accommodate the packet
of maximal length.

There are some limitations are implied. The MPRQ
feature should be disengaged if split is requested,
due to MPRQ neither supports pushing data to the
dedicated pools nor follows the flexible buffer sizes.
The vectorized rx_burst routines does not support
the scattering (these ones are extremely simplified
and work over the single segment only) and can't
handle split as well.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
9f209b59c8 net/mlx5: support Rx buffer split description
The routine to provide Rx queue setup with specifying
extended receiving buffer description is added.
It allows application to specify desired segment
lengths, data position offsets in the buffer
and dedicated memory pool for each segment.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Gregory Etelson
4ec6360de3 net/mlx5: implement tunnel offload
Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel.
* offload JUMP action is restricted to steering tunnel rule only.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
d7cfcddded net/mlx5: translate shared action for RSS action
Handle shared action on flow validation/creation/destruction.
mlx5 PMD translates shared action into a regular one before handling
flow validation/creation. The shared action translation applied to
utilize the same execution path for both shared and regular actions.
The current implementation supports shared action translation for shared
RSS action only.

RSS action validation split to validate shared RSS action on its
creation in addition to action validation in flow validation/creation
path.

Implement rte_flow shared action API for mlx5 PMD, mostly forwarding
calls to flow driver operations (see struct mlx5_flow_driver_ops).

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
b8cc58c140 net/mlx5: modify hash Rx queue objects
Implement modification for hashed table of Rx queue object (see
mlx5_hrxq_modify()). This implementation relies on the capability to
modify TIR object via DevX API, i.e. current implementation doesn't
support verbs HW object operations. The functionality to modify hashed
table of Rx queue object is prerequisite to implement
rete_flow_shared_action_update() for shared RSS action in mlx5 PMD.

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Xueming Li
16dbba257c net/mlx5: fix port shared data reference count
When probe a representor, tag cache hash table and modification cache
hash table allocated memory upon each port, overwrote previous existing
cache in shared context data.

This patch moves reference check of shared data prior to hash table
allocation to avoid such issue.

Fixes: 6801116688 ("net/mlx5: fix multiple flow table hash list")
Fixes: 1ef4cdef26 ("net/mlx5: fix flow tag hash list conversion")
Cc: stable@dpdk.org

Acked-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
2b5b1aeb39 net/mlx5: optimize counter extend memory
Counter extend memory was allocated for non-batch counter to save the
extra DevX object. Currently, for non-batch counter which does not
support aging, entry in the generic counter struct is used only when
counter is free in free list, and bytes in the struct is used only when
counter is allocated in using.

In this case, the DevX object can be saved to the generic counter struct
union with entry memory when counter is allocated and union with bytes
when counter is free.
And pool type is also not needed as non-fallback mode only has generic
counter and aging counter, just a bit to indicate the pool is aged or
not will be enough.

This eliminates the counter extend info struct saves the memory.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
cfbdc3f938 net/mlx5: rename flow counter macro
Add the MLX5_ prefix to the defined counter macro names.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
e7138997e0 net/mlx5: make shared counters thread safe
The shared counters save the counter index to three level table. As
three level table supports multiple-thread operations now, the shared
counters can take advantage of the table to support multiple-thread.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
3aa279157f net/mlx5: synchronize flow counter pool creation
Currently, counter operations are not thread safe as the counter
pools' array resize is not protected.

This commit protects the container pools' array resize using a spinlock.
The original counter pool statistic memory allocate is moved to the
host thread in order to minimize the critical section. Since that pool
statistic memory is required only in query time. The container pools'
array should be resized by the user threads, the new pool may be used
by other rte_flow APIs before the host thread resize is done, if the
pool is not saved to the pools' array, the specified counter memory will
not be found as the pool is not saved to the counter management pool
array. The pool raw statistic memory will be filled in host thread.

The shared counters will be protected in other commit.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
994829e695 net/mlx5: remove single counter container
A flow counter which was allocated by a batch API couldn't be assigned
to a flow in the root table (group 0) in old rdma-core version.
Hence, a root table flow counter required PMD mechanism to manage
counters which were allocated singly.

Currently, the batch counters have already been supported in root table
includes a new rdma-core version with MLX5_FLOW_ACTION_COUNTER_OFFSET
enum and with a kernel driver includes
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX_OFFSET enum.

When the PMD uses rdma-core API to assign a batch counter to a root
table flow using invalid counter offset, it should get an error only
if the batch counter assignment for root table is supported.
Using this trial in the initialization time can help to detect the
support.

Using the above trial, if the support is valid, remove the management of
single counter container in the fast counter mechanism. Otherwise, move
the counter mechanism to fallback mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
df051a3e77 net/mlx5: optimize shared counter memory
Instead of using special memory to indicate shared counter, this patch
does the optimization to use the counter handler reserved memory to
indicate it.  The counter index with MLX5_CNT_SHARED_OFFSET means the
shared counter.

This patch is also an arrangement for a new adjustment to use batch
counter as shared counter.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
6b7c717ed1 net/mlx5: locate aging pools in the general container
Commit [1] introduced different container for the aging counter
pools. In order to save container memory the aging counter pools
can be located in the general pool container.

This patch locates the aging counter pools in the general pool
container. Remove the aging container management.

[1] commit fd143711a6 ("net/mlx5: separate aging counter pool range")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Dekel Peled
d5a7d04c79 net/mlx5: support query of age action
Recent patch [1] adds to ethdev the API for query of age action.
This patch implements in MLX5 PMD the query of age action using
this API.

[1] https://mails.dpdk.org/archives/dev/2020-October/184864.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Ivan Ilchenko
62024eb827 ethdev: change stop operation callback to return int
Change eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers to return
negative errno values if case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Jiawei Wang
00c10c2211 net/mlx5: update translate function for mirroring
Translate the attribute of sample action that include sample ratio
and sub actions list.
PMD will check the destination action number in current flow,
if found multiple destination actions, then create the new destination
array rdma action that group actions for each destination.
Currently only support port or queue for destination action, and only
encap action can be attached into one port destination.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
b4c0ddbfcc net/mlx5: split sample flow into two sub-flows
The flow with sample action will be split into two sub flows:
the prefix sub flow with the all actions preceding the sample
action and sample action itself, and the suffix sub flow with
the actions following the sample action.

The original items remain in the prefix sub flow, add the
implicit tag action with unique id to set in metadata register,
and suffix sub flow uses the tag item to match with that unique id.

The flow split as below:

Original flow: items / actions pre / sample / actions sfx ->
    prefix sub flow -
            items / actions pre / set_tag action / sample
    suffix sub flow -
            tag_item / actions sfx

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
96b1f0273c net/mlx5: validate sample action
Add sample action validate function.

Sample Flow is supported in NIC-RX and FDB domains. For the NIC-RX
the Sample Flow action list must include the destination queue action.

Only NIC-RX domain supports the optional actions list. FDB doesn't
support any optional actions, the sampled packets is always forwarded
to the E-Switch manager port.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Michael Baum
e96242efa4 net/mlx5: remove Rx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Rx queue object structure becomes
redundant and no more code is used.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
4c6d80f1c5 net/mlx5: separate Rx queue state modification
Separate Rx state modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
354cc08a2d net/mlx5: remove Tx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Tx queue object structure becomes
redundant and no more code is used.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
5d9f3c3f48 net/mlx5: separate Tx queue object modification
Separate Tx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
f49f44839d net/mlx5: share Tx control code
Move Tx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
86d259cec8 net/mlx5: separate Tx queue object creations
As an arrangement to Windows OS support, the Verbs operations should be
separated to another file.
By this way, the build can easily cut the unsupported Verbs APIs from
the compilation process.

Define operation structure and DevX module in addition to the existing
Linux Verbs module.
Separate Tx object creation into the Verbs/DevX modules and update the
operation structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e7055bbfbe net/mlx5: reposition event queue number field
The eqn field has become a field of sh directly since it is also
relevant for Tx and Rx.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Suanming Mou
3e8f3e51fd net/mlx5: fix meter table definitions
As metering and metadata features were developed at the same time. The
metering and metadata tables are defined conflicted.

This cause the meter suffix flow jump to the same metadata table and
cause flow deadloop.

Adjust the metering table define to fix that issue.

Fixes: 46a5e6bc6a ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-08 19:58:11 +02:00
Thomas Monjalon
b142387b07 ethdev: allow drivers to return error on close
The device operation .dev_close was returning void.
This driver interface is changed to return an int.

Note that the API rte_eth_dev_close() is still returning void,
although a deprecation notice is pending to change it as well.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:13 +02:00
Suanming Mou
bf615b077d net/mlx5: manage header reformat actions with hashed list
To manage encap decap header format actions mlx5 PMD used the single
linked list and lookup and insertion operations took too long times if
there were millions of objects and this impacted the flow
insertion/deletion rate.

In order to optimize the performance the hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Xueming Li
c21e5facf7 net/mlx5: use bond index for netdev operations
In case of bonding, device ifindex was detected as the PF ifindex, so
any operation using ifindex applied to PF instead of the bond device.
These operations includes MTU get/set, up/down and mac address
manipulation, etc.

This patch detects bond interface ifindex and name for PF that join a
bond interface, uses it by default for netdev operations.

Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Michael Baum
0c762e81da net/mlx5: share Rx queue drop action code
Move Rx queue drop action similar resources allocations from Verbs
module to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5eaf882e94 net/mlx5: separate Rx queue drop
Separate Rx queue drop creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5a959cbfa6 net/mlx5: share Rx hash queue code
Move Rx hash queue object similar resources allocations from DevX and
Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
25ae7f1a5d net/mlx5: share Rx queue indirection table code
Move Rx indirection table object similar resources allocations from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
66b96fa6a6 net/mlx5: remove indirection table type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the indirection table structure becomes
redundant and no more code is used.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
85552726d3 net/mlx5: separate Rx hash queue creation
Separate Rx hash queue creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
87e2db37ef net/mlx5: separate Rx indirection table object creation
Separate Rx indirection table object creation into both Verbs and DevX
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
c279f187ee net/mlx5: separate Rx queue object modification
Separate Rx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
1260a87b28 net/mlx5: share Rx control code
Move Rx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
322870799e net/mlx5: separate Rx interrupt handling
Separate interrupt event handler into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
6deb19e1b2 net/mlx5: separate Rx queue object creations
As an arrangement to Windows OS support, the Verbs operations should be
separated to another file.
By this way, the build can easily cut the unsupported Verbs APIs from
the compilation process.

Define operation structure and DevX module in addition to the existing
linux Verbs module.
Separate Rx object creation into the Verbs/DevX modules and update the
operation structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Ophir Munk
7af10d29a4 net/mlx5/linux: refactor VLAN
File mlx5_vlan.c contains Netlink APIs (Linux dependent) as part of VM
workaround implementation. Move this implementation to file
linux/mlx5_vlan_os.c.  To remove Netlink dependency in header files
change pointer of type 'struct mlx5_nl_vlan_vmwa_context *' to 'void *'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
8bb2410ea3 net/mlx5: separate VLAN strip modification
When updating a queue vlan stripping offload - either the WQ is modified
in Verbs or the RQ is modified in DevX.  Add a vlan stripping modify
callback to 'struct mlx5_obj_ops' and assign it with the specific Verbs
and DevX implementations: 'rxq_obj_modify_wq_vlan_strip' and
'rxq_obj_modify_rq_vlan_strip' respectively.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
1f66ac5bbe net/mlx5: remove more Direct Verbs dependencies
Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced
with 'void *' to enable compilation under non-Linux operating systems.
New getter functions were added to retrieve the specific fields that
were previously accessed directly.

Replaced structs:
'struct mlx5dv_pp *'
'struct mlx5dv_devx_event_channel *'
'struct mlx5dv_devx_umem *'
'struct mlx5dv_devx_uar *'

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
f00f6562e1 net/mlx5: remove netlink dependency in shared code
This commit adds Linux implementation of routine mlx5_os_mac_addr_flush
as wrapper to Netlink API to avoid direct calls under non-Linux
operating systems.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
e9c0b96e35 net/mlx5: move Linux ifname function
mlx5_get_ifname() prototype includes 'IF_NAMESIZE' definition from Linux
file net/if.h. Since this API is only used under Linux and to enable
compilation under non-Linux OS - move this prototype from shared file
mlx5.h to file linux/mlx5_os.h.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
3fe889617b net/mlx5: manage modify actions with hashed list
To manage header modify actions mlx5 PMD used the single linked list and
lookup and insertion operations took too long times if there were
millions of objects and this impacted the flow insertion/deletion rate.

In order to optimize the performance the hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
e1293b10de net/mlx5: fix counter query
Currently, the counter query requires the counter ID should start
with 4 aligned. In none-batch mode, the counter pool might have the
chance to get the counter ID not 4 aligned. In this case, the counter
should be skipped, or the query will be failed.

Skip the counter with ID not 4 aligned as the first counter in the
none-batch count pool to avoid invalid counter query. Once having
new min_dcs ID in the poll less than the skipped counters, the
skipped counters will be returned to the pool free list to use.

Fixes: 5382d28c21 ("net/mlx5: accelerate DV flow counter transactions")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Parav Pandit
392bf9084d common/mlx5: register class drivers through common layer
Migrate mlx5 net, vdpa and regex PMD to start using mlx5 common class
driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-28 19:01:30 +02:00
Viacheslav Ovsiienko
161d103b23 net/mlx5: add queue start and stop
The mlx5 PMD did not support queue_start and queue_stop eth_dev API
routines, queue could not be suspended and resumed during device
operation.

There is the use case when this feature is crucial for applications:

- there is the secondary process handling the queue
- secondary process crashed/aborted
- some mbufs were allocated or used by secondary application
- some mbufs were allocated by Rx queues to receive packets
- some mbufs were placed to send queue
- queue goes to undefined state

In this case there was no reliable way to recovery queue handling
by restarted secondary process but reset queue to initial state
freeing all involved resources, including buffers involved in queue
operations, reset the mbuf pools, and then reinitialize queue
to working state:

- reset mbuf pool, allocate all mbuf to initialize pool into
  safe state after the crush and allow safe mbuf free calls
- stop queue, free all potentially involved mbufs
- reset mbuf pool again
- start queue, reallocate mbufs needed

This patch introduces the queue start/stop feature with some
limitations:

- hairpin queues are not supported
- it is application responsibility to synchronize start/stop
  with datapath routines, rx/tx_burst must be suspended during
  the queue_start/queue_stop calls
- it is application responsibility to track queue usage and
  provide coordinated queue_start/queue_stop calls from
  secondary and primary processes.
- Rx queues with vectorized Rx routine and engaged CQE
  compression are not supported by this patch currently

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
08d1838f64 net/mlx5: implement CQ for Rx using DevX API
This patch continues the work to use DevX API for different objects
creation and management.
On Rx control path, the RQ, RQT, and TIR objects can already be
created using DevX API.
This patch adds the support to create CQ for RxQ using DevX API.
The corresponding event channel is also created and utilized using
DevX API.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
9d60f54569 common/mlx5: remove inclusion of Verbs header files
Several source files include Verbs header files as in (1). These source
files will not compile under non-Linux operating systems. This commit
removes this inclusion in two cases:

Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source
file so the inclusion in (1) can be safely removed.

Case 2: Verbs symbols are used. Please note the inclusion in (1) already
appears in file linux/mlx5_glue.h (which represents the interface
to the rdma-core library). Therefore, replace (1) in the source file
with (2).  Under non-Linux operating systems - file mlx5_glue.h will not
include (1).

(1)
 #include <infiniband/verbs.h>
 #include <infiniband/mlx5dv.h>

(2)
 #include <mlx5_glue.h>

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
2e86c4e5c7 net/mlx5: refactor multi-process communication
1. The shared data communication between the primary and the secondary
processes is implemented using Linux API. Move the Linux API code under
linux directory (file linux/mlx5_os.c).

2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary
processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved
under linux (new file linux/mlx5_mp_os.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ef9ee13f6e net/mlx5: cleanup header file
The cleanup refers to header file mlx5.h.
1. Remove unused prototypes.
2. Move prototypes under their correct title.
3. Change functions to static and remove their prototye from the header
file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
98c4b12afa net/mlx5: eliminate dependency on Linux in shared header
This commit eliminates Linux dependencies in shared file mlx5.h.

1. All functions using 'struct ifreq' are moved to file
linux/mlx5_ethdev_os.c such that this struct can be removed from mlx5.h.
2. Function mlx5_set_flags() that uses Linux flags (e.g. IFF_UP) is
changed to static and its prototype is removed from mlx5.h.
3. Remove redundant member verbs_action from 'struct mlx5_priv'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
4d18abd130 net/mlx5: wrap Linux promiscuous and multicast functions
This commit adds Linux implementation of routines mlx5_os_set_promisc()
and mlx5_os_set_promisc(). The routines call netlink APIs.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ab27cdd93a net/mlx5: refactor Linux MAC operations
Move OS specific MAC operations add, remove, modify VF into file
linux/mlx5_os.c.
Remove unused function mlx5_get_mac().

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
3eca5f8a61 net/mlx5: move flow priority discovery to Verbs file
Function calls mlx5_flow_adjust_priority() and
mlx5_flow_discover_priorities() are Verbs based. Move them from file
mlx5_flow.c to file mlx5_flow_verbs.c

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
50f95b23c9 net/mlx5: add option to configure FCS or decapsulation
There are some limitations on some NICs (at least on ConnectX-6 Dx
and BlueField 2) with supporting FCS (frame checksum) scattering for
the tunnel decapsulated packets.

For the case only one of the features can be supported in the same time,
and the new devarg "decap_en" is introduced to provide the choice to the
users.

If FCS scattering feature is not supposed to be engaged by application,
this new devarg should be specified as "decap_en=0", forcing the FCS
feature enable and rejecting tunnel decap actions in the rte_flow engine.
If FCS scatter is not needed and application supposes to use tunnel
decapsulation in rte_flow, the devarg can be omitted or set to non-zero
value (this is default settings).

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
5522da6b20 net/mlx5: add option to allocate memory from system
Currently, for MLX5 PMD, once millions of flows created, the memory
consumption of the flows are also very huge. For the system with limited
memory, it means the system need to reserve most of the memory as huge
page memory to serve the flows in advance. And other normal applications
will have no chance to use this reserved memory any more. While most of
the time, the system will not have lots of flows, the  reserved huge
page memory becomes a bit waste of memory at most of the time.

By the new sys_mem_en devarg, once set it to be true, it allows the PMD
allocate the memory from system by default with the new add mlx5 memory
management functions. Only once the MLX5_MEM_RTE flag is set, the memory
will be allocate from rte, otherwise, it allocates memory from system.

So in this case, the system with limited memory no need to reserve most
of the memory for hugepage. Only some needed memory for datapath objects
will be enough to allocated with explicitly flag. Other memory will be
allocated from system. For system with enough memory, no need to care
about the devarg, the memory will always be from rte hugepage.

One restriction is that for DPDK application with multiple PCI devices,
if the sys_mem_en devargs are different between the devices, the
sys_mem_en only gets the value from the first device devargs, and print
out a message to warn that.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
1c5064044f net/mlx5: create and destroy eCPRI flex parser
eCPRI protocol has unified format layout for the variants, over
ETH layer (including .1Q) and UDP layer.

The common header of the message has 4 bytes fixed length, and the
message payload layers are different based on the type field. Now
only type #0, #2 and #5 will be supported, and 2 bytes are needed.

When creating the flex parser, the header will be extended to 8
bytes and 2 DW samples are needed. The 1st DW starts from offset 0
and will be used for the type field of the common header. The 2nd
DW starts from offset 4 and will be used for the physical channel
ID, real-time control ID or measurement ID fields.

The parser will be created once a flow with eCPRI item is observed
for the first time. After creating, it will remain in the system
and HW until the device is stopped. Right now, there is no need to
destroy the eCPRI flex parser after the last flow with eCPRI item
is destroyed. This is to get rid of the alternate states of creating
and destroying eCPRI flex parser with a single eCPRI flow.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
711aedf187 common/mlx5: add flex parser DevX structures
The structures and other definitions will be used for the dynamic
flex parser creation via Devx command interface. These structures
will be used as some some intermediate variables and input
parameters for the parser creation API.
It is better to keep all members consistent with the PRM definition
even though some of them will not be used.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
daa38a8924 net/mlx5: add flow translation of eCPRI header
In the translation stage, the eCPRI item should be translated into
the format that lower layer driver could use. All the fields that
need to match must be in network byte order after translation, as
well as the mask. Since the header in the item belongs to the network
layers stack, and the input parameter of the header is considered to
be in big-endian format already.

Base on the definition in the PRM, the DW samples will be used for
matching in the FTE/STE. Now, the type field and only the PC ID, RTC
ID, and DLY MSR ID of the payload will be supported. The masks should
be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are
needed to support such matching. The mask fields could be zeros to
support some wildcard rules. But it makes no sense to support the
rule matching only on the payload but without matching type field.

The DW samples should be stored after the flex parser creation for
eCPRI. There is no need to query the sample IDs each time when
creating a flow rule with eCPRI item. It will not introduce
insertion rate degradation significantly.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
a2854c4de1 net/mlx5: convert Rx timestamps in real-time format
The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects what format is
configured in the NIC and performs the conversion accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3b025c0ca4 net/mlx5: provide send scheduling error statistics
The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - rearm queue interrupt was not handled
    was not handled in time and service routine might miss
    the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
b94d93ca73 net/mlx5: support reading device clock
If send schedule feature is engaged there is the Clock Queue
created, that reports reliable the current device clock counter
value. The device clock counter can be read directly from the
Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
085ff447f0 net/mlx5: convert timestamp to completion index
The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
77522be0a5 net/mlx5: introduce clock queue service routine
Service routine is invoked periodically on Rearm Queue
completion interrupts, typically once per some milliseconds
(1-16) to track clock jitter and wander in robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
aef1e20ebe net/mlx5: allocate packet pacing context
This patch allocates the Packet Pacing context from the kernel,
configures one according to requested pace send scheduling
granularity and assigns to Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
551c94c83e net/mlx5: create rearm queue for packet pacing
The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross channel
SEND_EN/WAIT operations to provides the requests to the
Clock Queue in robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
d133f4cdb7 net/mlx5: create clock queue for packet pacing
This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
fc4d4f732b net/mlx5: introduce shared UAR resource
This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
24feb04596 net/mlx5: fix UAR lock sharing for multiport devices
The master and representors might be created over the multiport
Infiniband devices and the UAR resource allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires the write access to the UAR must be performed
as atomic 64-bit write, on 32-bit systems this is two sequential
writes, protected by lock. Due to possibility to share the same
UAR between sibling devices the locks must be moved to shared
context.

Fixes: f048f3d479 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
8f848f32fc net/mlx5: introduce send scheduling devargs
This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On the device start if "rte_dynflag_timestamp"
  dynamic flag is registered and this devarg non-zero value is
  specified, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Shiri Kuzin
0f0ae73a32 net/mlx5: add parameter for LACP packets control
The new devarg will control the steering of the lacp traffic.
When setting dv_lacp_by_user = 0 the lacp traffic will be
steered to kernel and managed there.

When setting dv_lacp_by_user = 1 the lacp traffic will
not be steered and the user will need to manage it.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Shiri Kuzin
3c78124f0a net/mlx5: add default miss action to flow engine
The new action is an internal mlx5 action that will call
the rdma-core function MLX5DV_FLOW_ACTION_DEFAULT_MISS.

The default miss action will be used when a bond is
configured to allow traffic related to the bond to
be managed in the kernel.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Ori Kam
262c7ad0dd common/mlx5: move doorbell record from net driver
The creation of DBR can be used by a number of different
Mellanox PMDs. for example RegEx / Net / VDPA.

This commits moves the DBR creation and release functions to common
folder.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-30 14:52:30 +02:00
Ophir Munk
391b8bcc81 common/mlx5: move some getter functions from net driver
Getter functions such as: 'mlx5_os_get_ctx_device_name',
'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name',
'mlx5_os_get_umem_id' are implemented under net directory. To enable
additional devices (e.g. regex, vdpa) to access these getter functions
they are moved under common directory.

As part of this commit string sizes DEV_SYSFS_NAME_MAX and
DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the destination
string size in strncpy() function is bigger than the source string size.
This update will avoid GCC version 8 error -Werror=stringop-truncation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
ac79183dc6 net/mlx5: optimize free counter lookup
Currently, when allocate a new counter, it needs loop the whole
container pool list to get a free counter.

In the case with millions of counters allocated, and all the pools
are empty, allocate the new counter will still need to loop the
whole container pool list first, then allocate a new pool to get a
free counter. It wastes the cycles during the pool list traversal.

Add a global free counter list in the container helps to get the free
counters more efficiently.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
b1cc226644 net/mlx5: optimize single counter pool search
For single counter, when allocate a new counter, it needs to find the pool
it belongs in order to do the query together.

Once there are millions of counters allocated, the pool array in the
counter container will become very large. In this case, the pool search
from the pool array will become extremely slow.

Save the minimum and maximum counter ID to have a quick check of current
counter ID range. And start searching the pool from the last pool in the
container will mostly get the needed pool since counter ID increases
sequentially.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
632f0f1905 net/mlx5: manage shared counters in three-level table
Currently, to check if any shared counter with same ID existing, it will
have to loop the counter pools to search for the counter. Even add the
counter to the list will also not so helpful while there are thousands
of shared counters in the list.

Change Three-Level table to look up the counter index saved in the
relevant table entry will be more efficient.

This patch introduces the Three-level table to save the ID relevant
counter index in the table. Then the next while the same ID comes, just
check the table entry of this ID will get the counter index directly.
No search will be needed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Matan Azrad
aec086c9f1 common/mlx5: share kernel interface name getter
Some configuration of the mlx5 port are done by the kernel net device
associated to the IB device represents the PCI device.

The DPDK mlx5 driver uses Linux system calls, for example ioctl, in
order to configure per port configurations requested by the DPDK user.

One of the basic knowledges required to access the correct kernel net
device is its name.

Move function to get interface name from IB device path to the common
library.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:29 +02:00
Ophir Munk
d5ed8aa944 net/mlx5: add memory region callbacks in per-device cache
Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Ophir Munk
73bf9235e9 net/mlx5: refactor statistics
mlx5 statistics are calculated by several methods:
1. In software when packets go through datapath.
2. Calling ioctl with ETHTOOL command (Linux specific).
3. Reading counters from SYSFS device path (Linux specific).

The Linux related functions are moved to file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
042f5c94fd net/mlx5: refactor device operations for Linux
There are three types of eth_dev_ops: primary, secondary and isolate.
Their function calls assignments are moved from common file
mlx5.c to the Linux specific file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
1256805dd5 net/mlx5: move Linux-specific functions
File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for
functions which are Linux specific. Functions which are Linux agnostics
remain in mlx5_ethdev.c file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
9138989036 net/mlx5: rename ib in names
Renames in this commit:
mlx5_ibv_list -> mlx5_dev_ctx_list
mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx
mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx
mlx5_ibv_shared_port -> mlx5_dev_shared_port
ibv_port -> dev_port

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
21b7c452a6 net/mlx5: remove completion object dependency on DV
Replace 'struct mlx5dv_devx_cmd_comp *' with 'void *' in 'struct
mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
834a9019ec net/mlx5: remove Verbs dependency in spawn struct
1. Replace 'struct ibv_device *' with 'void *' in 'struct
mlx5_dev_spawn_data'. Define a getter function to retrieve the
device name.
2. Rename ibv_dev and ibv_port as phys_dev and phys_port
respectively.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
10f3581dfd net/mlx5: add Linux-specific header file
File drivers/net/linux/mlx5_os.h is added. It includes specific
Linux definitions such as PCI driver flags, link state changes
interrupts, link removal interrupts, etc.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
2eb4d0107a net/mlx5: refactor PCI probing on Linux
Refactor PCI probing related code. Move Linux specific functions (as
well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c
file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c7f6ba0e53 net/mlx5: remove umem field dependency on Direct Verbs
umem field is used in several structs. Its type 'struct mlx5dv_devx_umem
*' is changed to 'void *'. This change will allow non-Linux OS
compilations.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
e85f623e13 net/mlx5: remove attributes dependency on Verbs
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attribute that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API
mlx5_os_get_dev_attr() which fills in the new defined struct.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c468501658 common/mlx5: remove protection domain dependency on Verbs
Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and
all function calls in mlx5 PMD.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f44b09f9e3 net/mlx5: add Linux-specific file with getter functions
'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct
ibv_context *' to 'void *'.  'ctx' members which are verbs dependent
(e.g. device_name) will be accessed through getter functions which are
added to a new file under Linux directory: linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
6e88bc42c7 net/mlx5: rename Verbs shared object
Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Suanming Mou
a1da6f624c net/mlx5: add reclaim memory mode
Currently, when flow destroyed, some memory resources may still be kept
as cached to help next time create flow more efficiently.

Some system may need the resources to be more flexible with flow create
and destroy.  After peak time, with millions of flows destroyed, the
system would prefer the resources to be reclaimed completely, no cache
is needed. Then the resources can be allocated and used by other
components. The system is not so sensitive about the flow insertion
rate, but more care about the resources.

Both DPDK mlx5 PMD driver and the low level component rdma-core have
provided the flow resources to be configured cached or not, but there is
no APIs or parameters exposed to user to configure the flow resources
cache mode. In this case, introduce a new PMD devarg to let user
configure the flow resources cache mode will be helpful.

This commit is to add a new "reclaim_mem_mode" to help user configure if
the destroyed flows' cache resources should be kept or not.

Their will be three mode can be chosen:
1. 0(none). It means the flow resources will be cached as usual. The
resources will be cached, helpful with flow insertion rate.
2. 1(light). It will only enable the DPDK PMD level resources reclaim.
3. 2(aggressive). Both DPDK PMD level and rdma-core low level will be
configured as reclaimed mode.

With these three mode, user can configure the resources cache mode with
different levels.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-03 17:19:26 +02:00
Suanming Mou
33860cfab6 net/mlx5: fix interrupt installation timing
Currently, the DevX counter query works asynchronously with Devx
interrupt handler return the query result. When port closes, the
interrupt handler will be uninstalled and the Devx comp obj will
also be destroyed. Meanwhile the query is still not cancelled.

In this case, counter query may use the invalid Devx comp which
has been destroyed, and query failure with invalid FD will be
reported.

Adjust the shared interrupt install and uninstall timing to make
the counter asynchronous query stop before interrupt uninstall.

Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Matan Azrad
5af61440dd net/mlx5: fix flow counter container resize
The design of counter container resize used double buffer algorithm in
order to synchronize between the query thread to the control thread.
When the control thread detected resize need, it created new bigger
buffer for the counter pools in a new container and change the container
index atomically.
In case the query thread had not detect the previous resize before a new
one need was detected by the control thread, the control thread returned
EAGAIN to the flow creation API used a COUNT action.

The rte_flow API doesn't allow unblocked commands and doesn't expect to
get EAGAIN error type.

So, when a lot of flows were created between 2 different periodic
queries, 2 different resizes might try to be created and caused EAGAIN
error.
This behavior may blame flow creations.

Change the synchronization way to use lock instead of double buffer
algorithm.

The critical section of this lock is very small, so flow insertion
rate should not be decreased.

Fixes: ebbac312e4 ("net/mlx5: resize a full counter container")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Dong Zhou
fa2d01c87d net/mlx5: support flow aging
Currently, there is no flow aging check and age-out event callback
mechanism for mlx5 driver, this patch implements it. It's included:
- Splitting the current counter container to aged or no-aged container
  since reducing memory consumption. Aged container will allocate extra
  memory to save the aging parameter from user configuration.
- Aging check and age-out event callback mechanism based on current
  counter. When a flow be checked aged-out, RTE_ETH_EVENT_FLOW_AGED
  event will be triggered to applications.
- Implement the new API: rte_flow_get_aged_flows, applications can use
  this API to get aged flows.

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-05 15:54:27 +02:00
Dong Zhou
8d93c830e4 net/mlx5: modify ext-counter memory allocation
Currently, the counter pool needs 512 ext-counter memory for no batch
counters, it's allocated separately by once, behind the 512
basic-counter memory. This is not easy to get ext-counter pointer by
corresponding basic-counter pointer. This is also no easy for expanding
some other potential additional type of counter memory.

So, need allocate every one of ext-counter and basic-counter together,
as a single piece of memory. It's will be same for further additional
type of counter memory. In this case, one piece of memory contains all
type of memory for one counter, it's easy to get each type memory by
using offsetting.

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-05 15:54:27 +02:00
Alexander Kozyrev
6c55b622a9 net/mlx5: set dynamic flow metadata in Rx queues
Using a global mbuf dynamic field for metadata incurs some
performance penalty on a datapath. Store this information in
the Rx queue descriptor for a better cache locality.

Fixes: a18ac61133 ("net/mlx5: add metadata support to Rx datapath")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 22:28:06 +02:00
Suanming Mou
ab612adc1e net/mlx5: allocate flow API from indexed pool
This commit allocates rte flow from indexed memory pool.

Allocate rte flow memory from indexed memory pool helps save more than
MALLOC_ELEM_OVERHEAD bytes memory from rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
e745f90007 net/mlx5: optimize flow RSS struct
When destroy the flow with RSS, flow can invoke the queues information
from hrxq index table object, since the queue number and list are both
saved to the index table object. No need to save the duplicated data in
rte flow.

Save the RSS description information to the intermediate private data
when create the flow with RSS action helps to save the memory for rte
flow.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Wentao Cui
c2ddde7950 net/mlx5: optimize flow director filter memory
This commit is for mlx5 fdir flow memory optimization.

Currently for the fdir member in rte_flow structure. It saves the fdir
memory pointer directly. As fdir is fading away, use one bit help to
indicate the function in the flow and add the content to an extra list
save the memory for the other widely usage cases.

Signed-off-by: Wentao Cui <wentaoc@mellanox.com>
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
90e6053a19 net/mlx5: convert mark copy resource to indexed
Allocate mark copy resource from indexed pool helps rte flow saves the 4
bytes index instead of 8 bytes pointer. For mark copy resource itself,
it helps save MALLOC_ELEM_OVERHEAD bytes from rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
8638e2b076 net/mlx5: allocate meter from indexed pool
This patch allocate the meter object memory from indexed memory pool
which will help to save the MALLOC_ELEM_OVERHEAD memory taken by
rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
b88341ca35 net/mlx5: convert flow dev handle to indexed
This commit converts flow dev handle to indexed.

Change the mlx5 flow handle from pointer to uint32_t saves memory for
flow. With million flow, it saves several MBytes memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
772dc0eb83 net/mlx5: convert hrxq to indexed
This commit converts hrxq to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
7ac99475ce net/mlx5: convert jump resource to indexed
This commit convert jump resource to indexed.

The table data struct is allocated from indexed memory. As it is add in
the hash list, the pointer is still used for hash list search. The index
is added to the table struct, and the pointer in flow handle is decrease
to uint32_t type. For flow without jump flows, it saves 4 bytes memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
f3faf9ea11 net/mlx5: convert port id action to indexed
This commit converts port id action to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
5f1142692a net/mlx5: convert tag resource to indexed
This commit convert tag resource to indexed.

As tag resources are add in the hash list, to avoid introduce
performance issue and keep the hash list, only the tag resource memory
is allocated from indexed memory. The resources is still added to the
hash list. Add four bytes index in the tag resource struct and change
the tag resources in the flow handle from pointer to uint32_t seems be
no benefit for tag resource, but it saves memory for flows without tag
action. And also for sub flows share one tag action resource.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
8acf8ac9b7 net/mlx5: convert push VLAN resource to indexed
This commit converts the push VLAN resource to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
014d1cbe51 net/mlx5: convert encap/decap resource to indexed
This commit converts the flow encap/decap resource to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00