Commit Graph

338 Commits

Author SHA1 Message Date
Ophir Munk
391b8bcc81 common/mlx5: move some getter functions from net driver
Getter functions such as: 'mlx5_os_get_ctx_device_name',
'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name',
'mlx5_os_get_umem_id' are implemented under net directory. To enable
additional devices (e.g. regex, vdpa) to access these getter functions
they are moved under common directory.

As part of this commit string sizes DEV_SYSFS_NAME_MAX and
DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the destination
string size in strncpy() function is bigger than the source string size.
This update will avoid GCC version 8 error -Werror=stringop-truncation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
ac79183dc6 net/mlx5: optimize free counter lookup
Currently, when allocate a new counter, it needs loop the whole
container pool list to get a free counter.

In the case with millions of counters allocated, and all the pools
are empty, allocate the new counter will still need to loop the
whole container pool list first, then allocate a new pool to get a
free counter. It wastes the cycles during the pool list traversal.

Add a global free counter list in the container helps to get the free
counters more efficiently.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
b1cc226644 net/mlx5: optimize single counter pool search
For single counter, when allocate a new counter, it needs to find the pool
it belongs in order to do the query together.

Once there are millions of counters allocated, the pool array in the
counter container will become very large. In this case, the pool search
from the pool array will become extremely slow.

Save the minimum and maximum counter ID to have a quick check of current
counter ID range. And start searching the pool from the last pool in the
container will mostly get the needed pool since counter ID increases
sequentially.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
632f0f1905 net/mlx5: manage shared counters in three-level table
Currently, to check if any shared counter with same ID existing, it will
have to loop the counter pools to search for the counter. Even add the
counter to the list will also not so helpful while there are thousands
of shared counters in the list.

Change Three-Level table to look up the counter index saved in the
relevant table entry will be more efficient.

This patch introduces the Three-level table to save the ID relevant
counter index in the table. Then the next while the same ID comes, just
check the table entry of this ID will get the counter index directly.
No search will be needed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Matan Azrad
aec086c9f1 common/mlx5: share kernel interface name getter
Some configuration of the mlx5 port are done by the kernel net device
associated to the IB device represents the PCI device.

The DPDK mlx5 driver uses Linux system calls, for example ioctl, in
order to configure per port configurations requested by the DPDK user.

One of the basic knowledges required to access the correct kernel net
device is its name.

Move function to get interface name from IB device path to the common
library.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:29 +02:00
Ophir Munk
d5ed8aa944 net/mlx5: add memory region callbacks in per-device cache
Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Ophir Munk
73bf9235e9 net/mlx5: refactor statistics
mlx5 statistics are calculated by several methods:
1. In software when packets go through datapath.
2. Calling ioctl with ETHTOOL command (Linux specific).
3. Reading counters from SYSFS device path (Linux specific).

The Linux related functions are moved to file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
042f5c94fd net/mlx5: refactor device operations for Linux
There are three types of eth_dev_ops: primary, secondary and isolate.
Their function calls assignments are moved from common file
mlx5.c to the Linux specific file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
1256805dd5 net/mlx5: move Linux-specific functions
File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for
functions which are Linux specific. Functions which are Linux agnostics
remain in mlx5_ethdev.c file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
9138989036 net/mlx5: rename ib in names
Renames in this commit:
mlx5_ibv_list -> mlx5_dev_ctx_list
mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx
mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx
mlx5_ibv_shared_port -> mlx5_dev_shared_port
ibv_port -> dev_port

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
21b7c452a6 net/mlx5: remove completion object dependency on DV
Replace 'struct mlx5dv_devx_cmd_comp *' with 'void *' in 'struct
mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
834a9019ec net/mlx5: remove Verbs dependency in spawn struct
1. Replace 'struct ibv_device *' with 'void *' in 'struct
mlx5_dev_spawn_data'. Define a getter function to retrieve the
device name.
2. Rename ibv_dev and ibv_port as phys_dev and phys_port
respectively.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
10f3581dfd net/mlx5: add Linux-specific header file
File drivers/net/linux/mlx5_os.h is added. It includes specific
Linux definitions such as PCI driver flags, link state changes
interrupts, link removal interrupts, etc.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
2eb4d0107a net/mlx5: refactor PCI probing on Linux
Refactor PCI probing related code. Move Linux specific functions (as
well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c
file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c7f6ba0e53 net/mlx5: remove umem field dependency on Direct Verbs
umem field is used in several structs. Its type 'struct mlx5dv_devx_umem
*' is changed to 'void *'. This change will allow non-Linux OS
compilations.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
e85f623e13 net/mlx5: remove attributes dependency on Verbs
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attribute that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API
mlx5_os_get_dev_attr() which fills in the new defined struct.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c468501658 common/mlx5: remove protection domain dependency on Verbs
Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and
all function calls in mlx5 PMD.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f44b09f9e3 net/mlx5: add Linux-specific file with getter functions
'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct
ibv_context *' to 'void *'.  'ctx' members which are verbs dependent
(e.g. device_name) will be accessed through getter functions which are
added to a new file under Linux directory: linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
6e88bc42c7 net/mlx5: rename Verbs shared object
Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Suanming Mou
a1da6f624c net/mlx5: add reclaim memory mode
Currently, when flow destroyed, some memory resources may still be kept
as cached to help next time create flow more efficiently.

Some system may need the resources to be more flexible with flow create
and destroy.  After peak time, with millions of flows destroyed, the
system would prefer the resources to be reclaimed completely, no cache
is needed. Then the resources can be allocated and used by other
components. The system is not so sensitive about the flow insertion
rate, but more care about the resources.

Both DPDK mlx5 PMD driver and the low level component rdma-core have
provided the flow resources to be configured cached or not, but there is
no APIs or parameters exposed to user to configure the flow resources
cache mode. In this case, introduce a new PMD devarg to let user
configure the flow resources cache mode will be helpful.

This commit is to add a new "reclaim_mem_mode" to help user configure if
the destroyed flows' cache resources should be kept or not.

Their will be three mode can be chosen:
1. 0(none). It means the flow resources will be cached as usual. The
resources will be cached, helpful with flow insertion rate.
2. 1(light). It will only enable the DPDK PMD level resources reclaim.
3. 2(aggressive). Both DPDK PMD level and rdma-core low level will be
configured as reclaimed mode.

With these three mode, user can configure the resources cache mode with
different levels.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-03 17:19:26 +02:00
Suanming Mou
33860cfab6 net/mlx5: fix interrupt installation timing
Currently, the DevX counter query works asynchronously with Devx
interrupt handler return the query result. When port closes, the
interrupt handler will be uninstalled and the Devx comp obj will
also be destroyed. Meanwhile the query is still not cancelled.

In this case, counter query may use the invalid Devx comp which
has been destroyed, and query failure with invalid FD will be
reported.

Adjust the shared interrupt install and uninstall timing to make
the counter asynchronous query stop before interrupt uninstall.

Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Matan Azrad
5af61440dd net/mlx5: fix flow counter container resize
The design of counter container resize used double buffer algorithm in
order to synchronize between the query thread to the control thread.
When the control thread detected resize need, it created new bigger
buffer for the counter pools in a new container and change the container
index atomically.
In case the query thread had not detect the previous resize before a new
one need was detected by the control thread, the control thread returned
EAGAIN to the flow creation API used a COUNT action.

The rte_flow API doesn't allow unblocked commands and doesn't expect to
get EAGAIN error type.

So, when a lot of flows were created between 2 different periodic
queries, 2 different resizes might try to be created and caused EAGAIN
error.
This behavior may blame flow creations.

Change the synchronization way to use lock instead of double buffer
algorithm.

The critical section of this lock is very small, so flow insertion
rate should not be decreased.

Fixes: ebbac312e4 ("net/mlx5: resize a full counter container")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Dong Zhou
fa2d01c87d net/mlx5: support flow aging
Currently, there is no flow aging check and age-out event callback
mechanism for mlx5 driver, this patch implements it. It's included:
- Splitting the current counter container to aged or no-aged container
  since reducing memory consumption. Aged container will allocate extra
  memory to save the aging parameter from user configuration.
- Aging check and age-out event callback mechanism based on current
  counter. When a flow be checked aged-out, RTE_ETH_EVENT_FLOW_AGED
  event will be triggered to applications.
- Implement the new API: rte_flow_get_aged_flows, applications can use
  this API to get aged flows.

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-05 15:54:27 +02:00
Dong Zhou
8d93c830e4 net/mlx5: modify ext-counter memory allocation
Currently, the counter pool needs 512 ext-counter memory for no batch
counters, it's allocated separately by once, behind the 512
basic-counter memory. This is not easy to get ext-counter pointer by
corresponding basic-counter pointer. This is also no easy for expanding
some other potential additional type of counter memory.

So, need allocate every one of ext-counter and basic-counter together,
as a single piece of memory. It's will be same for further additional
type of counter memory. In this case, one piece of memory contains all
type of memory for one counter, it's easy to get each type memory by
using offsetting.

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-05 15:54:27 +02:00
Alexander Kozyrev
6c55b622a9 net/mlx5: set dynamic flow metadata in Rx queues
Using a global mbuf dynamic field for metadata incurs some
performance penalty on a datapath. Store this information in
the Rx queue descriptor for a better cache locality.

Fixes: a18ac61133 ("net/mlx5: add metadata support to Rx datapath")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 22:28:06 +02:00
Suanming Mou
ab612adc1e net/mlx5: allocate flow API from indexed pool
This commit allocates rte flow from indexed memory pool.

Allocate rte flow memory from indexed memory pool helps save more than
MALLOC_ELEM_OVERHEAD bytes memory from rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
e745f90007 net/mlx5: optimize flow RSS struct
When destroy the flow with RSS, flow can invoke the queues information
from hrxq index table object, since the queue number and list are both
saved to the index table object. No need to save the duplicated data in
rte flow.

Save the RSS description information to the intermediate private data
when create the flow with RSS action helps to save the memory for rte
flow.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Wentao Cui
c2ddde7950 net/mlx5: optimize flow director filter memory
This commit is for mlx5 fdir flow memory optimization.

Currently for the fdir member in rte_flow structure. It saves the fdir
memory pointer directly. As fdir is fading away, use one bit help to
indicate the function in the flow and add the content to an extra list
save the memory for the other widely usage cases.

Signed-off-by: Wentao Cui <wentaoc@mellanox.com>
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
90e6053a19 net/mlx5: convert mark copy resource to indexed
Allocate mark copy resource from indexed pool helps rte flow saves the 4
bytes index instead of 8 bytes pointer. For mark copy resource itself,
it helps save MALLOC_ELEM_OVERHEAD bytes from rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
8638e2b076 net/mlx5: allocate meter from indexed pool
This patch allocate the meter object memory from indexed memory pool
which will help to save the MALLOC_ELEM_OVERHEAD memory taken by
rte_malloc().

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
b88341ca35 net/mlx5: convert flow dev handle to indexed
This commit converts flow dev handle to indexed.

Change the mlx5 flow handle from pointer to uint32_t saves memory for
flow. With million flow, it saves several MBytes memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
772dc0eb83 net/mlx5: convert hrxq to indexed
This commit converts hrxq to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
7ac99475ce net/mlx5: convert jump resource to indexed
This commit convert jump resource to indexed.

The table data struct is allocated from indexed memory. As it is add in
the hash list, the pointer is still used for hash list search. The index
is added to the table struct, and the pointer in flow handle is decrease
to uint32_t type. For flow without jump flows, it saves 4 bytes memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
f3faf9ea11 net/mlx5: convert port id action to indexed
This commit converts port id action to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
5f1142692a net/mlx5: convert tag resource to indexed
This commit convert tag resource to indexed.

As tag resources are add in the hash list, to avoid introduce
performance issue and keep the hash list, only the tag resource memory
is allocated from indexed memory. The resources is still added to the
hash list. Add four bytes index in the tag resource struct and change
the tag resources in the flow handle from pointer to uint32_t seems be
no benefit for tag resource, but it saves memory for flows without tag
action. And also for sub flows share one tag action resource.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
8acf8ac9b7 net/mlx5: convert push VLAN resource to indexed
This commit converts the push VLAN resource to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Suanming Mou
014d1cbe51 net/mlx5: convert encap/decap resource to indexed
This commit converts the flow encap/decap resource to indexed.

Using the uint32_t index instead of pointer saves 4 bytes memory for the
flow handle. For millions flows, it will save several MBytes of memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:09 +02:00
Vu Pham
b8dc6b0e29 common/mlx5: refactor memory management
Refactor common memory btree and cache management to common driver.
Replace some input parameters of MR APIs to more common data structure
like PD, port_id, share_cache,... so that multiple PMD drivers can
use those MR APIs.

Modify mlx5 net pmd driver to use MR management APIs from common driver.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:08 +02:00
Vu Pham
a4de9586ac common/mlx5: refactor IPC handling from net driver
Refactor common multi-process handling codes from net PMD to common
driver. Using tuple mp_id{name, port_id} as standard input parameter
for all multi-process IPC APIs instead of using rte_eth_dev.

Modify net PMD to use multi-process APIs from common driver.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:08 +02:00
Suanming Mou
9dbaf7eef6 net/mlx5: fix meter suffix table leak
Currently, the meter suffix table is created and saved in the mlx5
shared struct. It causes the suffix table will never be released
even without any meter rules.

Move the suffix table to meter domain struct to help the suffix table
be released when all the meter rules are destroyed.

Fixes: 46a5e6bc6a ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:08 +02:00
Alexander Kozyrev
ecb160456a net/mlx5: add device parameter for MPRQ stride size
Define a device parameter to configure log 2 of a stride size for MPRQ
- mprq_log_stride_size. User is able to specify a stride size in a range
allowed by an underlying hardware. The default stride size is defined as
2048 bytes to encompass most commonly used packet sizes in the Internet
(MTU 1518 and less) and will be used in case a maximum configured packet
size cannot fit into the largest possible stride size. Otherwise a
stride size is set to a large enough value to encompass a whole packet.

Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:08 +02:00
Bing Zhao
3ac3d8234b net/mlx5: fix index when creating flow
When creating a flow, usually the creating routine is called in
serial. No parallel execution is supported right now. The same
function will be called only once for a single flow creation.

But there is a special case that the creating routine will be called
nested. If the xmeta feature is enabled and there is FLAG / MARK in
the actions list, some metadata reg copy flow needs to be created
before the original flow is applied to the hardware.
In the flow non-cached mode, resources only for flow creation will
not be saved anymore. The memory space is pre-allocated and reused
for each flow. A global index for each device is used to indicate
the memory address of the resources. If the function is called in a
nested mode, then the index will be reset and make everything get
corrupted.

To solve this, a nested index is introduced to save the position for
the original flow creation. Currently, only one level nested call
of the flow creating routine is supported.

Fixes: e7bfa3596a ("net/mlx5: separate the flow handle resource")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
261bb99a21 net/mlx5: reorganize fallback counter management
Currently, the fallback counter is also allocated from the pool, the
specify fallback function code becomes a bit duplicate.

Reorganize the fallback counter code to make it reuse from the normal
counter code.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
826b8a8732 net/mlx5: split flow counter struct
Currently, the counter struct saves both the members used by batch
counters and none batch counters. The members which are only used
by none batch counters cost 16 bytes extra memory for batch counters.
As normally there will be limited none batch counters, mix the none
batch counter and batch counter members becomes quite expensive for
batch counter. If 1 million batch counters are created, it means 16 MB
memory which will not be used by the batch counters are allocated.

Split the mlx5_flow_counter struct for batch and none batch counters
helps save the memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
956d5c74d7 net/mlx5: optimize flow counter handle type
Currently, DV and verbs counters are both changed to indexed. It means
while creating the flow with counter, flow can save the indexed value to
address the counter.

Save the 4 bytes indexed value in the rte_flow instead of 8 bytes
pointer helps to save memory with millions of flows.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
4001d7ad26 net/mlx5: change Direct Verbs counter to indexed
This part of the counter optimize change the DV counter to indexed as
what have already done in verbs. In this case, all the mlx5 flow counter
can be addressed by index.

The counter index is composed of pool index and the counter offset in
the pool counter array. The batch and none batch counter dcs ID offset
0x800000 is used to avoid the mix up for the index. As batch counter dcs
ID starts from 0x800000 and none batch counter dcs starts from 0, the
0x800000 offset is added to the batch counter index to indicate the
index of batch counter.

The counter pointer in rte_flow struct will be aligned to index instead
of pointer. It will save 4 bytes memory for every rte_flow. With
millions of rte_flow, it will save MBytes memory.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
c3d3b14099 net/mlx5: change verbs counter allocator to indexed
This is part of the counter optimize which will save the indexed counter
id instead of the counter pointer in the rte_flow.

Place the verbs counter into the container pool helps the counter to be
indexed correctly independent with the raw counter.

The counter pointer in rte_flow will be changed to indexed value after
the DV counter is also changed to indexed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Suanming Mou
c989f49a38 net/mlx5: optimize counter release query generation
Query generation was introduced to avoid counter to be reallocated
before the counter statistics be fully updated. Since the counters
be released between query trigger and query handler may miss the
packets arrived in the trigger and handler gap period. In this case,
user can only allocate the counter while pool query_gen is greater
than the counter query_gen + 1 which indicates a new round of query
finished, the statistic is fully updated.

Split the pool query_gen to start_query_gen and end_query_gen helps
to have a better identify for the counter released in the gap period.
And it helps the counter released before query trigger or after query
handler can be reallocated more efficiently.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-04-21 13:57:07 +02:00
Jiawei Wang
c5193a0bbe net/mlx5: fix imissed counter overflow
The Hw counters is defined as 32bit unsigned value and read from
the sysfs. Firstly read the base value while application start,
then fetch the new value while do query and minus the base value.
If the new value is less than base value, will result in a
negative value and convert to the big value as unsigned 64bit.

PMD add xstats field to store the last successfully read counter,
use it if failed to read hw counter from sysfs.
PMD also record the last output value to handle the wrap around case,
if overflow happened, increase the wrap count by 1 and save into the
higher 32bit, and update the new value into lower 32bit, finally
return the 64bit counter value.

Fixes: ce9494d76c ("net/mlx5: report imissed statistics")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Wang <jiaweiw@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:05 +02:00
Bing Zhao
1ad9a3d09f net/mlx5: introduce buffer size parameter for hairpin
When creating a hairpin queue, the total data size and the maximal
number of packets are interrelated. The differ is the stride size.
Larger buffer size means big packet like jumbo could be supported,
but in the meanwhile, it will introduce more cache misses and have a
side effect on the performance.
Now a new device parameter "hp_buf_log_sz" is introduced for
applications to set the total data buffer size (the logarithm value).
Then the maximal number of packets will also be calculated
automatically by this value.
Applications could also change this value to a larger one in order
to support larger packets in hairpin case. A smaller value will be
beneficial for memory consumption.
If it is not set, the default value will be used.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-04-21 13:57:05 +02:00