Commit Graph

1854 Commits

Author SHA1 Message Date
Bing Zhao
02109eaeac net/mlx5: support getting hairpin peer ports
In a real-life deployment, a device could be attached and detached
dynamically. The hairpin configuration of this port to/from all the
other ports should be enabled and disabled accordingly.

The RTE ethdev library and the PMD should provide the ability to get
the list of peer ports in case the application does not save it. It is
recommended to size the array that stores the port IDs to
"RTE_MAX_ETHPORTS" to have the maximal capacity.

The order of the peer port IDs may differ from the order in which the
hairpin queues were set up in the initialization stage. The peer port
ID could be the same as the current device port ID when the hairpin
peer ports contain the port itself - the single port hairpin case.

The application should check the ports' status and decide if the
peer port should be bound / unbound when starting / stopping the
current device.
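
For illustration only (not part of this patch), a sketch of the
application-side query, assuming the rte_eth_hairpin_get_peer_ports()
ethdev API added for this purpose; the meaning of a non-zero direction
argument is an assumption here:

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Sketch: print the hairpin peer ports of "port_id" before binding. */
    static int
    print_hairpin_peers(uint16_t port_id)
    {
        uint16_t peer_ports[RTE_MAX_ETHPORTS]; /* maximal capacity */
        /* A non-zero direction is assumed to request the Tx-side peers. */
        int ret = rte_eth_hairpin_get_peer_ports(port_id, peer_ports,
                                                 RTE_MAX_ETHPORTS, 1);
        if (ret < 0)
            return ret; /* the port may have no hairpin configured */
        for (int i = 0; i < ret; i++) /* the list may include port_id itself */
            printf("hairpin peer of port %u: %u\n", port_id, peer_ports[i]);
        return 0;
    }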

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:04 +01:00
Bing Zhao
37cd4501e8 net/mlx5: support two ports hairpin mode
In order to support hairpin between two ports, the mlx5 PMD needs to
implement the bind/unbind functions and provide them as function
pointers.

The bind and unbind functions are executed per port pair. All the
hairpin queues between the two ports should have the same attributes
during queue setup. Different configurations among queue pairs of the
same ports are not supported. Two ports are allowed to have hairpin in
only one direction.

In order to set up the connection between two queues, the peer Rx queue
HW information must be fetched via the internal RTE API, and this queue
information is used to modify the SQ object. Then the RQ object is
modified with the Tx queue HW information. The reverse operation is not
supported right now.

When disconnecting the queue pair, the SQ and RQ objects should be
reset without any peer HW information. The unbinding operation will try
to disconnect all Tx queues of the port from the Rx queues of the peer
port.

Tx explicit mode attribute will be saved and used when creating a
hairpin flow.
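
A hedged application-side sketch of the per-pair sequence these
function pointers back, assuming the rte_eth_hairpin_bind() /
rte_eth_hairpin_unbind() ethdev APIs:

    #include <rte_ethdev.h>

    /* Sketch: bind hairpin in both directions between two started ports.
     * Each call binds the Tx queues of the first port to the Rx queues
     * of the second one; a single direction only is also allowed. */
    static int
    hairpin_bind_pair(uint16_t port0, uint16_t port1)
    {
        int ret = rte_eth_hairpin_bind(port0, port1); /* Tx(p0) -> Rx(p1) */
        if (ret != 0)
            return ret;
        ret = rte_eth_hairpin_bind(port1, port0);     /* Tx(p1) -> Rx(p0) */
        if (ret != 0)
            rte_eth_hairpin_unbind(port0, port1);     /* roll back */
        return ret;
    }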

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:03 +01:00
Bing Zhao
1a01264f62 net/mlx5: change hairpin queue peer checking
In the current implementation of single port hairpin mode, the peer
queue must belong to the same port as the current queue. When two
ports hairpin mode is introduced, this check should be removed so that
the hairpin queue setup succeeds, since having different Tx and Rx
ports is no longer an invalid condition.

In the meanwhile, different devices could have different queue
configurations, and the number of queues of the peer port is unknown
to the current device, so that check should be removed as well.

If the Tx and Rx port IDs of a hairpin peer are different, only manual
binding and explicit Tx flows are supported. Otherwise, all four
combinations of modes can be supported. The consistency of the mode
attributes will be checked when connecting the queue with its peer
queue.
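
For illustration only, a sketch of a Tx hairpin queue whose peer Rx
queue belongs to another port, assuming the manual_bind/tx_explicit
bits of struct rte_eth_hairpin_conf; the descriptor count is arbitrary:

    #include <rte_ethdev.h>

    /* Sketch: with different Tx/Rx ports only manual bind and explicit
     * Tx flows are supported. */
    static int
    setup_two_port_tx_hairpin(uint16_t tx_port, uint16_t queue_id,
                              uint16_t peer_rx_port, uint16_t peer_rx_queue)
    {
        struct rte_eth_hairpin_conf conf = {
            .peer_count = 1,
            .manual_bind = 1, /* application binds the ports explicitly */
            .tx_explicit = 1, /* application inserts the Tx flows itself */
        };

        conf.peers[0].port = peer_rx_port;
        conf.peers[0].queue = peer_rx_queue;
        /* 128 descriptors is an arbitrary example value. */
        return rte_eth_tx_hairpin_queue_setup(tx_port, queue_id, 128, &conf);
    }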

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:03 +01:00
Bing Zhao
e4b7b8d082 common/mlx5: fix PCI driver name
In the refactoring of the mlx5 common layer, the PCI driver name
reported to the RTE device was changed from "net_mlx5" to "mlx5_pci".
The name string "mlx5_pci" was used directly in the rte_pci_driver
structure.

In the past, the macro "MLX5_DRIVER_NAME" was used instead of any
literal string, and now it is missing. The functions that use
"MLX5_DRIVER_NAME", e.g. mlx5_eth_find_next, will hit a mismatch.

Use this macro again throughout the code to keep everything aligned.

Fixes: 8a41f4decc ("common/mlx5: introduce layer for multiple class drivers")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:03 +01:00
Viacheslav Ovsiienko
6c8f7f1c18 net/mlx5: report Rx buffer split capabilities
Add rte_eth_dev_info->rx_seg_capa parameters:
  - receiving to multiple pools is supported
  - buffer offsets are supported
  - no offset alignment requirement
  - reports the maximal number of segments
  - reports the buffer split offload flag

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
7f1620082b net/mlx5: support Rx buffer split on datapath
Only the regular rx_burst routine is updated to support split,
because the vectorized routines do not support scatter and MPRQ
does not support split at all.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
213e2727a2 net/mlx5: register multiple pool for Rx queue
The split feature for receiving packets was added to the mlx5 PMD.
Now an Rx queue can receive data into buffers belonging to different
pools, and the memory of all the involved pools must be registered
for DMA operations in order to allow the hardware to store the data.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
a0a45e8af7 net/mlx5: configure Rx queue for buffer split
The scatter-gather elements should be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets, and
the PMD pads the buffer chain (if needed) with the
attributes of the last specified segment to
accommodate packets of maximal length.

Some limitations are implied. The MPRQ
feature should be disengaged if split is requested,
since MPRQ neither supports pushing data to
dedicated pools nor follows flexible buffer sizes.
The vectorized rx_burst routines do not support
scattering (they are extremely simplified
and work over a single segment only) and cannot
handle split either.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Viacheslav Ovsiienko
9f209b59c8 net/mlx5: support Rx buffer split description
A routine to provide Rx queue setup with an extended
receiving buffer description is added.
It allows the application to specify the desired segment
lengths, the data position offsets in the buffer,
and a dedicated memory pool for each segment.
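
A hedged sketch of the application-side setup this routine serves,
assuming the buffer split ethdev API of this release (struct
rte_eth_rxseg_split, rxconf.rx_seg/rx_nseg and the
DEV_RX_OFFLOAD_BUFFER_SPLIT offload flag); pool names, lengths and the
meaning of a zero last-segment length are illustrative assumptions:

    #include <rte_ethdev.h>

    /* Sketch: split every packet into a 128-byte header segment from
     * pool "hdr_mp" and the remaining data from pool "pay_mp". */
    static int
    setup_split_rx_queue(uint16_t port_id, uint16_t queue_id,
                         struct rte_mempool *hdr_mp,
                         struct rte_mempool *pay_mp)
    {
        union rte_eth_rxseg seg[2] = {
            { .split = { .mp = hdr_mp, .length = 128, .offset = 0 } },
            /* length 0 is assumed to mean "use the pool's data size". */
            { .split = { .mp = pay_mp, .length = 0, .offset = 0 } },
        };
        struct rte_eth_rxconf rxconf = {
            .offloads = DEV_RX_OFFLOAD_BUFFER_SPLIT,
            .rx_seg = seg,
            .rx_nseg = 2,
        };

        /* The mempool argument is NULL: pools come from the segments. */
        return rte_eth_rx_queue_setup(port_id, queue_id, 512,
                                      0 /* NUMA socket 0 for simplicity */,
                                      &rxconf, NULL);
    }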

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:35:02 +01:00
Gregory Etelson
4ec6360de3 net/mlx5: implement tunnel offload
The tunnel offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of partially offloaded packets;
 - the model is implemented as a set of helper functions.

Implementation details (a usage sketch follows this list):
* the tunnel_offload PMD parameter must be set to 1 to enable the feature.
* the application cannot use MARK and META flow actions with tunnels.
* the offload JUMP action is restricted to the tunnel steering rule only.
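
For reference, a rough application-side sketch of the helper-function
model, assuming the experimental rte_flow_tunnel_decap_set() and
rte_flow_tunnel_match() helpers; the VXLAN tunnel description and the
wrapper name are illustrative assumptions, not the PMD's own code:

    #include <rte_flow.h>

    /* Rough sketch: obtain the PMD-provided actions/items implementing
     * tunnel decap and tunnel matching for a VXLAN tunnel. */
    static int
    get_tunnel_helpers(uint16_t port_id,
                       struct rte_flow_action **pmd_actions,
                       uint32_t *n_actions,
                       struct rte_flow_item **pmd_items,
                       uint32_t *n_items,
                       struct rte_flow_error *err)
    {
        struct rte_flow_tunnel tunnel = {
            .type = RTE_FLOW_ITEM_TYPE_VXLAN, /* tunnel kind, illustrative */
        };
        int ret = rte_flow_tunnel_decap_set(port_id, &tunnel, pmd_actions,
                                            n_actions, err);
        if (ret != 0)
            return ret;
        return rte_flow_tunnel_match(port_id, &tunnel, pmd_items,
                                     n_items, err);
    }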

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
d2046c09aa net/mlx5: support shared action for RSS
Implement shared action create/destroy/update/query. The current
implementation is limited to the shared RSS action only. The shared
RSS action create operation prepares hash RX queue objects for all
supported permutations of the hash. The shared RSS action update
operation relies on the functionality to modify a hash RX queue
introduced in one of the previous commits in this patch series.

Implement the RSS shared action and handle shared RSS on flow apply
and release. When handling a shared RSS action, the lookup for the
hash RX queue object is limited to the set of objects stored in the
shared action itself, and that lookup is performed by hash only.

The current implementation is limited to DV flow driver operations,
i.e. the Verbs flow driver operations do not support shared actions.
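
As an illustration (not part of the patch), an application-side sketch
assuming the rte_flow_shared_action_create() API of this release and an
RSS action configuration; queue numbers and hash types are arbitrary:

    #include <rte_common.h>
    #include <rte_flow.h>

    /* Sketch: create a shared RSS action that can later be referenced
     * from flow rules through RTE_FLOW_ACTION_TYPE_SHARED. */
    static struct rte_flow_shared_action *
    create_shared_rss(uint16_t port_id, struct rte_flow_error *err)
    {
        static const uint16_t queues[] = { 0, 1, 2, 3 };
        struct rte_flow_action_rss rss = {
            .types = ETH_RSS_IP,          /* hash on IP fields */
            .queue_num = RTE_DIM(queues),
            .queue = queues,
        };
        struct rte_flow_action action = {
            .type = RTE_FLOW_ACTION_TYPE_RSS,
            .conf = &rss,
        };
        struct rte_flow_shared_action_conf conf = {
            .ingress = 1,
        };

        return rte_flow_shared_action_create(port_id, &conf, &action, err);
    }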

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
d7cfcddded net/mlx5: translate shared action for RSS action
Handle shared actions on flow validation/creation/destruction.
The mlx5 PMD translates a shared action into a regular one before
handling flow validation/creation. The shared action translation is
applied to utilize the same execution path for both shared and regular
actions. The current implementation supports shared action translation
for the shared RSS action only.

The RSS action validation is split so that the shared RSS action is
validated on its creation, in addition to the action validation in the
flow validation/creation path.

Implement rte_flow shared action API for mlx5 PMD, mostly forwarding
calls to flow driver operations (see struct mlx5_flow_driver_ops).

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Andrey Vesnovaty
b8cc58c140 net/mlx5: modify hash Rx queue objects
Implement modification of the hash Rx queue object (see
mlx5_hrxq_modify()). This implementation relies on the capability to
modify the TIR object via the DevX API, i.e. the current implementation
doesn't support Verbs HW object operations. The functionality to modify
the hash Rx queue object is a prerequisite to implement
rte_flow_shared_action_update() for the shared RSS action in the mlx5
PMD.

Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:35:02 +01:00
Alexander Kozyrev
0f20acbf5e net/mlx5: implement vectorized MPRQ burst
MPRQ (Multi-Packet Rx Queue) processes one packet at a time using
simple scalar instructions. MPRQ works by posting a single large buffer
(consisting of multiple fixed-size strides) in order to receive multiple
packets at once in this buffer. An Rx packet is then either copied to a
user-provided mbuf or attached by the PMD to the mbuf via a pointer to
an external buffer.

There is an opportunity to speed up the packet receiving by processing
4 packets simultaneously using SIMD (single instruction, multiple data)
extensions. Allocate mbufs in batches for every MPRQ buffer and process
the packets in groups of 4 until all the strides are exhausted. Then
switch to another MPRQ buffer and repeat the process over again.

The vectorized MPRQ burst routine is engaged automatically in case
the mprq_en=1 devarg is specified and the vectorization is not disabled
explicitly by providing rx_vec_en=0 devarg. There is a limitation:
LRO is not supported and scalar MPRQ is selected if it is on.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:24:25 +01:00
Alexander Kozyrev
1ded26239a net/mlx5: refactor vectorized Rx
Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
into a non-arch-specific file. Having all SIMD instructions
in a single reusable block is a first preparatory step to
implement vectorized Rx burst for the MPRQ feature.

Pass a pointer to the storage of mbufs directly to
rxq_copy_mbuf_v instead of calculating the pointer inside
this function. This is needed for the future vectorized Rx
routine which is going to pass a different pointer here.

Calculate the number of packets to replenish inside
mlx5_rx_replenish_bulk_mbuf. Containing this logic in one
place allows us to do the same for the MPRQ case.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 23:24:25 +01:00
Xueming Li
16dbba257c net/mlx5: fix port shared data reference count
When probing a representor, the tag cache hash table and the
modification cache hash table allocated memory for each port,
overwriting the previously existing cache in the shared context data.

This patch moves the reference check of the shared data before the
hash table allocation to avoid this issue.

Fixes: 6801116688 ("net/mlx5: fix multiple flow table hash list")
Fixes: 1ef4cdef26 ("net/mlx5: fix flow tag hash list conversion")
Cc: stable@dpdk.org

Acked-by: Matan Azrad <matan@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2020-11-03 23:24:25 +01:00
Shiri Kuzin
42dcd453d9 net/mlx5: fix xstats reset reinitialization
The mlx5_xstats_reset clears the device extended statistics.
In this function the driver may reinitialize the structures
that are used to read device counters.

In case of reinitialization, the number of counters may
change, which wouldn't be taken into account by the
reset API callback and can cause a segmentation fault.

This issue is fixed by allocating the counters memory according to
the size known after the reinitialization.

Fixes: a4193ae3bc ("net/mlx5: support extended statistics")
Cc: stable@dpdk.org

Reported-by: Ralf Hoffmann <ralf.hoffmann@allegro-packets.com>
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
2b5b1aeb39 net/mlx5: optimize counter extend memory
Counter extend memory was allocated for non-batch counters to save the
extra DevX object. Currently, for a non-batch counter which does not
support aging, the entry field in the generic counter struct is used
only while the counter is free in the free list, and the bytes field in
the struct is used only while the counter is allocated and in use.

In this case, the DevX object can be saved in the generic counter
struct, in a union with the entry memory when the counter is allocated
and in a union with the bytes memory when the counter is free.
The pool type is also not needed, as non-fallback mode only has generic
counters and aging counters; a single bit indicating whether the pool
is aged or not is enough.

This eliminates the counter extend info struct and saves memory.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
cfbdc3f938 net/mlx5: rename flow counter macro
Add the MLX5_ prefix to the defined counter macro names.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
e7138997e0 net/mlx5: make shared counters thread safe
The shared counters save the counter index in a three-level table. As
the three-level table now supports multi-threaded operations, the
shared counters can take advantage of it to become thread safe.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
0796c7b1de net/mlx5: make three level table thread safe
This commit adds thread safety support in the three-level table using
a spinlock and a reference counter for each table entry.

A new mlx5_l3t_prepare_entry() function is added in order to support
multi-threaded operation.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
3aa279157f net/mlx5: synchronize flow counter pool creation
Currently, counter operations are not thread safe, as the counter
pools' array resize is not protected.

This commit protects the container pools' array resize using a spinlock.
The original counter pool statistic memory allocation is moved to the
host thread in order to minimize the critical section, since the pool
statistic memory is required only at query time. The container pools'
array must be resized by the user threads, because the new pool may be
used by other rte_flow APIs before the host thread resize is done; if
the pool were not saved to the pools' array, the specified counter
memory would not be found, as the pool would be missing from the
counter management pool array. The pool raw statistic memory will be
filled in the host thread.

The shared counters will be protected in another commit.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
994829e695 net/mlx5: remove single counter container
A flow counter which was allocated by a batch API could not be assigned
to a flow in the root table (group 0) with old rdma-core versions.
Hence, a root table flow counter required a PMD mechanism to manage
counters which were allocated singly.

Currently, batch counters are supported in the root table when using an
rdma-core version that includes the MLX5_FLOW_ACTION_COUNTER_OFFSET
enum and a kernel driver that includes the
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX_OFFSET enum.

When the PMD uses the rdma-core API to assign a batch counter to a root
table flow using an invalid counter offset, it should get an error only
if batch counter assignment for the root table is supported.
Using this trial at initialization time helps to detect the support.

Based on this trial, if the support is valid, remove the management of
the single counter container in the fast counter mechanism. Otherwise,
move the counter mechanism to fallback mode.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
df051a3e77 net/mlx5: optimize shared counter memory
Instead of using special memory to indicate a shared counter, this
patch uses the reserved memory in the counter handler to indicate it.
A counter index with MLX5_CNT_SHARED_OFFSET means a shared counter.

This patch is also a preparation for a new adjustment to use batch
counters as shared counters.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Suanming Mou
6b7c717ed1 net/mlx5: locate aging pools in the general container
Commit [1] introduced a different container for the aging counter
pools. In order to save container memory, the aging counter pools
can be located in the general pool container.

This patch locates the aging counter pools in the general pool
container and removes the aging container management.

[1] commit fd143711a6 ("net/mlx5: separate aging counter pool range")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 23:24:25 +01:00
Dekel Peled
d5a7d04c79 net/mlx5: support query of age action
A recent patch [1] added to ethdev the API for querying the age action.
This patch implements the query of the age action in the MLX5 PMD using
this API.

[1] https://mails.dpdk.org/archives/dev/2020-October/184864.html
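
For context, a hedged sketch of the query from the application side,
using the rte_flow_query_age layout of the referenced API; the flow
handle and the wrapper function are assumptions for illustration:

    #include <stdio.h>
    #include <rte_flow.h>

    /* Sketch: query the AGE action of an existing flow rule. */
    static int
    query_flow_age(uint16_t port_id, struct rte_flow *flow,
                   struct rte_flow_error *err)
    {
        static const struct rte_flow_action age_action = {
            .type = RTE_FLOW_ACTION_TYPE_AGE,
        };
        struct rte_flow_query_age resp;
        int ret = rte_flow_query(port_id, flow, &age_action, &resp, err);

        if (ret == 0 && resp.sec_since_last_hit_valid)
            printf("aged=%u idle=%us\n", (unsigned int)resp.aged,
                   (unsigned int)resp.sec_since_last_hit);
        return ret;
    }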

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
613d64e412 net/mlx5: log LRO minimal size
Add a debug printout showing the HCA capability lro_min_mss_size - the
minimal size of a TCP segment required for coalescing.
The MLX5 PMD documentation is updated to note this condition.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
90e30c7488 net/mlx5: fix use of atomic cmpset for age state
According to the documentation [1], the rte_atomic16_cmpset() function
returns a non-zero value on success and 0 on failure.
In the existing code this function is called, and the return value
is compared to AGE_CANDIDATE, which is defined as 1.
Such a comparison is incorrect and can lead to unwanted behavior.

This patch updates the calls to rte_atomic16_cmpset() to check
whether the return value is zero or non-zero.

[1] https://doc.dpdk.org/api/rte__atomic_8h.html
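
A minimal sketch of the corrected usage (the wrapper is illustrative;
the point is that the return value is a success flag, not a state):

    #include <rte_atomic.h>

    /* Sketch: rte_atomic16_cmpset() returns non-zero on success and 0 on
     * failure, so the result must be treated as a boolean and never
     * compared against a state value such as AGE_CANDIDATE (1). */
    static int
    try_move_state(volatile uint16_t *state, uint16_t from, uint16_t to)
    {
        /* Correct: check for zero/non-zero only. */
        if (rte_atomic16_cmpset(state, from, to) == 0)
            return -1; /* another thread changed the state first */
        return 0;
    }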

Fixes: fa2d01c87d ("net/mlx5: support flow aging")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
491757372f net/mlx5: enforce limitation on IPv6 next protocol
Due to a PRM requirement, the IPv6 header item 'proto' field, indicating
the next header protocol, should not be set to an extension header.
This patch adds the relevant validation and documents the limitation.

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
0e5a0d8f75 net/mlx5: support match on IPv6 fragment extension
An rte_flow update, following RFC [1], added the rte_flow item
ipv6_frag_ext to ethdev.
This patch adds to the MLX5 PMD the option to match on this item type.

[1] http://mails.dpdk.org/archives/dev/2020-March/160255.html

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
ad3d227ead net/mlx5: support match on IPv6 fragment packets
This patch adds to MLX5 PMD the support of matching on IPv6
fragmented and non-fragmented packets, using the new field
has_frag_ext, added to rte_flow following RFC [1].

[1] https://mails.dpdk.org/archives/dev/2020-August/177257.html
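
For illustration, a sketch of a matching pattern using the new bit
(field name as in the referenced rte_flow change; the pattern itself is
an example, not taken from this patch):

    #include <rte_flow.h>

    /* Sketch: match only fragmented IPv6 packets by setting has_frag_ext
     * to 1 in both spec and mask. */
    static const struct rte_flow_item_ipv6 ipv6_frag_spec = {
        .has_frag_ext = 1,
    };
    static const struct rte_flow_item_ipv6 ipv6_frag_mask = {
        .has_frag_ext = 1,
    };
    static const struct rte_flow_item pattern_frag[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV6,
          .spec = &ipv6_frag_spec, .mask = &ipv6_frag_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };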

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
6859e67ef6 net/mlx5: support match on IPv4 fragment packets
This patch adds to MLX5 PMD the support of matching on IPv4
fragmented and non-fragmented packets, using the IPv4 header
fragment_offset field.
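
For illustration, a sketch of matching non-fragmented IPv4 packets via
the fragment_offset field; the zero spec with the usual 0x3fff mask
(MF bit plus 13-bit offset) is an example pattern, not this patch's code:

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Sketch: fragment_offset == 0 under a mask covering the MF bit and
     * the offset matches only non-fragmented IPv4 packets. */
    static const struct rte_flow_item_ipv4 ipv4_nofrag_spec = {
        .hdr.fragment_offset = RTE_BE16(0),
    };
    static const struct rte_flow_item_ipv4 ipv4_nofrag_mask = {
        .hdr.fragment_offset = RTE_BE16(0x3fff),
    };
    static const struct rte_flow_item pattern_nofrag[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4,
          .spec = &ipv4_nofrag_spec, .mask = &ipv4_nofrag_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };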

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:25 +01:00
Dekel Peled
d7e2ea627d net/mlx5: remove handling of ICMP fragmented packets
Commit [1] forced setting a match on the 'frag' bit with mask 1 and
value 0.
The previous patches in this series added support for matching on
fragmented and non-fragmented packets on L3 items, so this setting is
now redundant.

This patch removes the changes done in [1].

[1] commit 85407db9f60d ("net/mlx5: fix matching for ICMP fragments")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
3ec73abeed net/mlx5/linux: fix Tx queue operations decision
One of the conditions to create the Tx queue object by DevX is to be
sure that the DPDK mlx5 driver is not going to be the E-Switch manager
of the device. The issue is with the default FDB flows managed by the
kernel driver, which are not created by the kernel when the Tx queues
are created by DevX.

The correct decision is to create the Tx queues by Verbs when E-Switch
is enabled, while the current behavior uses the opposite condition and
creates them by DevX.

Create the Tx queues by Verbs when E-Switch is enabled.

Fixes: 86d259cec8 ("net/mlx5: separate Tx queue object creations")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
8dc775d8b1 net/mlx5: fix event queue number query
When an Rx/Tx queue is created by DevX, its CQ configuration should
include the EQ number for the interrupts.
The EQ is managed by the kernel, and there is a glue API to query the
EQ number from the kernel.
The EQ query API takes a vector number that specifies the kernel vector
used for the interrupt handling.

The vector number was wrongly detected according to the configured CPU
instead of using the device attributes of the supported vectors.
The CPU was wrongly detected by the rte_lcore_to_cpu_id API without any
check, and in a non-EAL thread context the value was 0xFFFFFFFF, which
caused a failure in the EQ number query API.

Use vector 0 for each EQ number query, which must be supported by the
kernel.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
9ab9d46ab9 net/mlx5: fix Tx queue release
The HW objects of the Tx queue are created/destroyed in the device
start/stop stage, while the ethdev configuration for the Tx queue
starts from the tx_queue_setup stage.
The PMD should save the last configuration it got from the ethdev
and apply it to the device in the dev_start operation.

Wrongly, the last code added to mitigate the reference counters did not
take the above rule into account and combined the configurations and HW
objects to be created/destroyed together.

This caused a memory leak and other memory issues.

Make sure the HW object is released in the stop operation when there is
no reference to it, while the configurations stay saved.

Fixes: 17a57183c0 ("net/mlx5: mitigate Tx queue reference counters")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Matan Azrad
015d2cb628 net/mlx5: fix Rx queue release
The HW objects of the Rx queue are created/destroyed in the device
start/stop stage, while the ethdev configuration for the Rx queue
starts from the rx_queue_setup stage.
The PMD should save the last configuration it got from the ethdev
and apply it to the device in the dev_start operation.

Wrongly, the last code added to mitigate the reference counters did not
take the above rule into account and combined the configurations and HW
objects to be created/destroyed together.

This caused a memory leak and other memory issues.

Make sure the HW object is released in the stop operation when there is
no reference to it, while the configurations stay saved.

Fixes: 24e4b650ba ("net/mlx5: mitigate Rx queue reference counters")

Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-11-03 22:29:24 +01:00
Thomas Monjalon
af270529ad ethdev: include mbuf registration in Tx timestamp API
Previously, the Tx timestamp field and flag were registered in testpmd,
as described in mlx5 guide.
For consistency between Rx and Tx timestamps,
managing mbuf registrations inside the driver, as properly documented,
is a simpler expectation.

The only driver to support this feature (mlx5) is updated
as well as the testpmd application.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Thomas Monjalon
04840ecbcf net/mlx5: switch Rx timestamp to dynamic mbuf field
The mbuf timestamp is moved to a dynamic field
in order to allow removal of the deprecated static field.
The related mbuf flag is also replaced.

The dynamic offset and flag are stored in struct mlx5_rxq_data
to favor cache locality.
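
For reference, a hedged application-side sketch of reading the Rx
timestamp through the dynamic field and flag, assuming the
RTE_MBUF_DYNFIELD_TIMESTAMP_NAME / RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME
registrations from rte_mbuf_dyn.h; the helper names are illustrative:

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    /* Sketch: look up the dynamic timestamp field/flag once, then read
     * the Rx timestamp from received mbufs. */
    static int ts_off = -1;
    static uint64_t ts_flag;

    static int
    rx_timestamp_init(void)
    {
        int flag_bit = rte_mbuf_dynflag_lookup(
                RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME, NULL);
        ts_off = rte_mbuf_dynfield_lookup(
                RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
        if (ts_off < 0 || flag_bit < 0)
            return -1; /* the driver did not register them */
        ts_flag = 1ULL << flag_bit;
        return 0;
    }

    static uint64_t
    rx_timestamp_get(const struct rte_mbuf *m)
    {
        if ((m->ol_flags & ts_flag) == 0)
            return 0; /* no timestamp on this packet */
        return *RTE_MBUF_DYNFIELD(m, ts_off, uint64_t *);
    }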

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Thomas Monjalon
042540e4ef net/mlx5: fix dynamic mbuf offset lookup check
The functions rte_mbuf_dynfield_lookup() and rte_mbuf_dynflag_lookup()
can return an offset starting with 0 or a negative error code.

In reality the first offsets are probably reserved forever,
but for the sake of strict API compliance,
the checks which considered 0 as an error are fixed.
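
A tiny sketch of the corrected check (offset 0 is a valid result; only
a negative return value is an error):

    #include <rte_errno.h>
    #include <rte_mbuf_dyn.h>

    /* Sketch: offset 0 must be accepted. */
    static int
    lookup_timestamp_offset(void)
    {
        int off = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
                                           NULL);
        if (off < 0) /* a check such as "off <= 0" would reject offset 0 */
            return -rte_errno;
        return off;
    }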

Fixes: efa79e68c8 ("net/mlx5: support fine grain dynamic flag")
Fixes: 3172c471b8 ("net/mlx5: prepare Tx queue structures to support timestamp")
Fixes: 0febfcce36 ("net/mlx5: prepare Tx to support scheduling")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-11-03 16:21:15 +01:00
Bruce Richardson
63b3907833 build: remove library name from version map file name
Since each version map file is contained in the subdirectory of the library
it refers to, there is no need to include the library name in the filename.
This makes things simpler in case of library renaming.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Rosen Xu <rosen.xu@intel.com>
2020-10-19 22:13:59 +02:00
Ciara Power
2c5e0dd21f net/mlx5: check max SIMD bitwidth
When choosing a vector path to take, an extra condition must be
satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
path.
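
A hedged sketch of the kind of check involved, assuming the
rte_vect_get_max_simd_bitwidth() EAL API introduced for this purpose;
the helper name is illustrative:

    #include <rte_vect.h>

    /* Sketch: pick the vector Rx path only when the configured max SIMD
     * bitwidth allows at least 128-bit vectors. */
    static int
    rx_vec_allowed(void)
    {
        return rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128;
    }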

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-19 16:45:02 +02:00
Ferruh Yigit
f30e69b41f ethdev: add device flag to bypass auto-filled queue xstats
Queue stats are stored in 'struct rte_eth_stats' as array and array size
is defined by 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag.

As a result of a technical board discussion, it was decided to remove
the queue statistics from 'struct rte_eth_stats' in the long term.

Instead PMDs should represent the queue statistics via xstats, this
gives more flexibility on the number of the queues supported.

Currently queue stats in the xstats are filled by ethdev layer, using
some basic stats, when queue stats removed from basic stats the
responsibility to fill the relevant xstats will be pushed to the PMDs.

During the switch period, a temporary 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS'
device flag is created. Initially all PMDs using xstats set this flag.
The PMDs that implement queue stats in xstats should clear the flag.

When all PMDs switch to the xstats for the queue stats, queue stats
related fields from 'struct rte_eth_stats' will be removed, as well as
'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag.
Later 'RTE_ETHDEV_QUEUE_STAT_CNTRS' compile time flag also can be
removed.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-10-16 23:27:15 +02:00
Ivan Ilchenko
62024eb827 ethdev: change stop operation callback to return int
Change the eth_dev_stop_t return value from void to int.
Make eth_dev_stop_t implementations across all drivers return
negative errno values in case of error conditions.

Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Thomas Monjalon
8a5a0aad5d ethdev: allow close function to return an error
The API function rte_eth_dev_close() was returning void.
The return type is changed to int to notify about errors.

If an error happens during a close operation,
the status of the port is undefined,
with a maximum of resources having been freed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-16 22:26:41 +02:00
Jiawei Wang
00c10c2211 net/mlx5: update translate function for mirroring
Translate the attributes of the sample action, which include the sample
ratio and the sub-actions list.
The PMD checks the number of destination actions in the current flow;
if multiple destination actions are found, it creates a new RDMA
destination array action that groups the actions for each destination.
Currently only port or queue destination actions are supported, and
only an encap action can be attached to a port destination.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
50390aab11 net/mlx5: update flow mirroring validation
A mirroring flow uses the sample action with a ratio of 1, and it does
not support a jump action in the same flow.

For mirroring, the sample action must have destination actions such as
port or queue, and it does not need the split function used for
sampling flows.
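
As an illustration of the mirroring form described above, a sketch of
an rte_flow sample action with ratio 1 whose sub-action sends the
mirrored copy to a port; the port id is an arbitrary example:

    #include <rte_flow.h>

    /* Sketch: mirror every packet (ratio == 1) to port id 1 while the
     * main flow continues to its own destination. */
    static const struct rte_flow_action_port_id mirror_port = { .id = 1 };
    static const struct rte_flow_action mirror_sub_actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_PORT_ID, .conf = &mirror_port },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    static const struct rte_flow_action_sample mirror_sample = {
        .ratio = 1,                    /* 1 = mirror all packets */
        .actions = mirror_sub_actions, /* destination of the mirrored copy */
    };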

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
0756228b27 net/mlx5: update translate function for sample action
Translate the attributes of the sample action, which include the sample
ratio and the sub-actions list, then create the sample DR action.

The metadata register value will be lost in the default path after the
sampler in FDB due to a CX5 HW limitation.

Since the source vport is also shared with metadata register c0, the
MLX5 PMD sets the source vport in rdma-core and rdma-core restores the
regc0 value after the sampler.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
b4c0ddbfcc net/mlx5: split sample flow into two sub-flows
A flow with the sample action will be split into two sub-flows:
the prefix sub-flow with all the actions preceding the sample
action and the sample action itself, and the suffix sub-flow with
the actions following the sample action.

The original items remain in the prefix sub-flow; an implicit tag
action with a unique id is added to set it in a metadata register,
and the suffix sub-flow uses the tag item to match on that unique id.

The flow is split as below:

Original flow: items / actions pre / sample / actions sfx ->
    prefix sub flow -
            items / actions pre / set_tag action / sample
    suffix sub flow -
            tag_item / actions sfx

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Jiawei Wang
96b1f0273c net/mlx5: validate sample action
Add the sample action validation function.

Sample flow is supported in the NIC-RX and FDB domains. For NIC-RX,
the sample flow action list must include the destination queue action.

Only the NIC-RX domain supports the optional actions list. FDB does not
support any optional actions; the sampled packets are always forwarded
to the E-Switch manager port.

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-16 19:48:18 +02:00
Li Zhang
b1088fccb5 net/mlx5: support ICMP identifier matching
The PRM exposes the "icmp_header_data" field in IPv4 ICMP.
Update the ICMP mask parameter with the ICMP identifier and sequence
number fields:
the ICMP sequence number spec with its mask sets the low 16 bits of
icmp_header_data;
the ICMP identifier spec with its mask sets the high 16 bits of
icmp_header_data.
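
For illustration, a sketch of the corresponding rte_flow pattern item
matching on the ICMP identifier; the identifier value is an arbitrary
example (icmp_ident is big-endian in the header):

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Sketch: match ICMP packets whose identifier equals 0x1234. */
    static const struct rte_flow_item_icmp icmp_spec = {
        .hdr.icmp_ident = RTE_BE16(0x1234),
    };
    static const struct rte_flow_item_icmp icmp_mask = {
        .hdr.icmp_ident = RTE_BE16(0xffff),
    };
    static const struct rte_flow_item icmp_item = {
        .type = RTE_FLOW_ITEM_TYPE_ICMP,
        .spec = &icmp_spec,
        .mask = &icmp_mask,
    };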

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-10-16 19:47:58 +02:00
Xueming Li
657df3ca0a net/mlx5: disable dump of Verbs flows
There was a segmentation fault when dumping flows with the device
argument dv_flow_en=0. In that case, the Verbs flow engine was enabled
and FDB resources were not initialized. It is suggested to use
mlx_fs_dump for Verbs flow dump.

This patch adds a Verbs engine check, prints a warning message and
returns gracefully.

Fixes: f6d7202402 ("net/mlx5: support flow dump API")
Cc: stable@dpdk.org

Reported-by: Jørgen Østergaard Sloth <jorgen.sloth@xci.dk>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e96242efa4 net/mlx5: remove Rx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Rx queue object structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
4c6d80f1c5 net/mlx5: separate Rx queue state modification
Separate Rx state modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
354cc08a2d net/mlx5: remove Tx queue object type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the Tx queue object structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
a9c7930662 net/mlx5: share Tx queue object modification
Use new modify_qp functions for Tx object creation in DevX and Verbs
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
5d9f3c3f48 net/mlx5: separate Tx queue object modification
Separate Tx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e8390b3de0 net/mlx5: rearrange QP creation in Verbs module
1. Rename function to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
88f2e3f18c net/mlx5: rearrange SQ and CQ creation in DevX module
1. Rename functions to mention the internal resources.
2. Reduce the number of function arguments.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
f49f44839d net/mlx5: share Tx control code
Move Tx object similar resources allocations and debug logs from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
86d259cec8 net/mlx5: separate Tx queue object creations
As a preparation for Windows OS support, the Verbs operations should be
separated into another file.
This way, the build can easily cut the unsupported Verbs APIs from
the compilation process.

Define an operation structure and a DevX module in addition to the
existing Linux Verbs module.
Separate the Tx object creation into the Verbs/DevX modules and update
the operation structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
e7055bbfbe net/mlx5: reposition event queue number field
The eqn field has become a field of sh directly since it is also
relevant for Tx and Rx.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
31d92b5892 net/mlx5: reorder Tx queue Verbs object creation
Move the creation of the completion queue from the mlx5_txq_obj_new
function into an auxiliary function.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
3d3cfe67e6 net/mlx5: reorder Tx queue DevX object creation
Move the creation of the send queue and the completion queue resources
from the mlx5_txq_obj_devx_new function into auxiliary functions.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
17a57183c0 net/mlx5: mitigate Tx queue reference counters
The Tx queue structures manage two different reference counters per
queue: the txq_ctrl reference counter and the txq_obj reference counter.

There is no real need to use two different counters; it just complicates
the release functions.
Remove the txq_obj counter and use only the txq_ctrl counter.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
fa2dd3d4d6 net/mlx5: remove unused variable in Tx queue creation
When a CQ is not created by DevX, it is allocated by either a DV
function or a regular Verbs function.

The CQ DV attributes variable was wrongly defined and initialized in the
Tx queue creation while the CQ was created by the regular Verbs
function, which left the attributes variable unused.

Remove the unused variable.

Fixes: faf2667fe8 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Michael Baum
95b0e40b10 net/mlx5: fix send queue doorbell
As part of the SQ creation for Tx queue objects, HW doorbell memory
should be allocated and mapped to the HW.

The SQ doorbell handler was wrongly saved in the CQ fields, which caused
a wrong doorbell release in the Tx queue object destroy flow.

Save the SQ doorbell handler in the SQ fields.

Fixes: 3a87b964ed ("net/mlx5: create Tx queues with DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-09 13:17:42 +02:00
Dekel Peled
c7870bfe09 ethdev: move RSS expansion code to mlx5 driver
Patch [1] added support for RSS flow expansion.
It was added in ethdev for public use, but until now it has been used
only by the MLX5 PMD.
To allow local changes in this code, this patch removes it from ethdev
and moves it to an MLX5 PMD file.

[1] commit 4ed05fcd44 ("ethdev: add flow API to expand RSS flows")

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-10-08 19:58:11 +02:00
Alexander Kozyrev
d2d5760552 net/mlx5: fix Rx queue count calculation
There are a few discrepancies in the Rx queue count calculation.

The wrong index is used to calculate the number of used descriptors
in an Rx queue in case of the compressed CQE processing. The global
CQ index is used while we really need an internal index in a single
compressed session to get the right number of elements processed.

The total number of CQs should be used instead of the number of mbufs
to find out about the maximum number of Rx descriptors. These numbers
are not equal for the Multi-Packet Rx queue.

Allow the Rx queue count calculation for all possible Rx bursts since
CQ handling is the same for regular, vectorized, and multi-packet Rx
queues.

Fixes: 26f0488344 ("net/mlx5: support Rx queue count API")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-08 19:58:11 +02:00
Suanming Mou
3e8f3e51fd net/mlx5: fix meter table definitions
As the metering and metadata features were developed at the same time,
the metering and metadata tables were defined in conflict with each
other.

This caused the meter suffix flow to jump to the same metadata table,
resulting in a flow dead loop.

Adjust the metering table definitions to fix that issue.

Fixes: 46a5e6bc6a ("net/mlx5: prepare meter flow tables")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-10-08 19:58:11 +02:00
Dekel Peled
38f9369d24 net/mlx5: fix DevX CQ attributes values
A previous patch wrongly used rdma-core defined values when preparing
attributes for creating the DevX CQ object.
This patch adds the correct value definitions and uses them instead.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-10-08 19:58:11 +02:00
Phil Yang
ae3255bfd9 net/mlx5: relax atomic refcnt for multi-packet Rx buffer
Use C11 atomics with RELAXED ordering instead of the rte_atomic ops
which enforce unnecessary barriers on aarch64.
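
A generic before/after sketch of the substitution (the refcnt wrapper
and field are illustrative, not the PMD's exact code):

    #include <stdint.h>

    /* Before (illustrative): rte_atomic16_add(&buf->refcnt, 1) adds full
     * barriers on aarch64 that a plain reference count does not need. */

    /* After: a C11/GCC built-in with relaxed ordering is sufficient. */
    static inline void
    mprq_buf_ref(uint16_t *refcnt)
    {
        __atomic_fetch_add(refcnt, 1, __ATOMIC_RELAXED);
    }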

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:15 +02:00
Thomas Monjalon
fbd1913561 ethdev: remove old close behaviour
The temporary flag RTE_ETH_DEV_CLOSE_REMOVE is removed.
It was introduced in DPDK 18.11 in order to give time for PMDs to migrate.

The old behaviour was to free only queues when closing a port.
The new behaviour is calling rte_eth_dev_release_port() which does
three more tasks:
	- trigger event callback
	- reset state and few pointers
	- free all generic port resources

The private port resources must be released in the .dev_close callback.

The .remove callback should:
	- call .dev_close callback
	- call rte_eth_dev_release_port()
	- free multi-port device shared resources

Despite waiting two years, some drivers have not migrated,
so they may hit issues with the incompatible new behaviour.
After sending emails, adding logs, and announcing the deprecation,
the only last solution is to declare these drivers as unmaintained:
	ionic, liquidio, nfp
Below is a summary of what to implement in those drivers.

* The freeing of private port resources must be moved
from the ".remove(device)" function to the ".dev_close(port)" function.

* If a generic resource (.mac_addrs or .hash_mac_addrs) cannot be freed,
it must be set to NULL in ".dev_close" function to protect from
subsequent rte_eth_dev_release_port() freeing.

* Note 1:
The generic resources are freed in rte_eth_dev_release_port(),
after ".dev_close" is called in rte_eth_dev_close(), but not when
calling ".dev_close" directly from the ".remove" PMD function.
That's why rte_eth_dev_release_port() must still be called explicitly
from ".remove(device)" after calling the ".dev_close" PMD function.

* Note 2:
If a device can have multiple ports, the common resources must be freed
only in the ".remove(device)" function.

* Note 3:
The port is supposed to be in a stopped state when it is closed.
If it is not the case, it is free to the PMD implementation
how to react when trying to close a non-stopped port:
either try to stop it automatically or just return an error.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:14 +02:00
Thomas Monjalon
b142387b07 ethdev: allow drivers to return error on close
The device operation .dev_close was returning void.
This driver interface is changed to return an int.

Note that the API rte_eth_dev_close() is still returning void,
although a deprecation notice is pending to change it as well.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Reviewed-by: Sachin Saxena <sachin.saxena@oss.nxp.com>
Reviewed-by: Liron Himi <lironh@marvell.com>
Reviewed-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2020-09-30 19:19:13 +02:00
Suanming Mou
bf615b077d net/mlx5: manage header reformat actions with hashed list
To manage encap/decap header format actions, the mlx5 PMD used a single
linked list, and lookup and insertion operations took too long when
there were millions of objects, which impacted the flow
insertion/deletion rate.

In order to optimize the performance, a hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Xueming Li
c21e5facf7 net/mlx5: use bond index for netdev operations
In case of bonding, the device ifindex was detected as the PF ifindex,
so any operation using ifindex applied to the PF instead of the bond
device. These operations include MTU get/set, up/down and MAC address
manipulation, etc.

This patch detects the bond interface ifindex and name for a PF that
joins a bond interface, and uses them by default for netdev operations.

Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2020-09-30 19:19:09 +02:00
Viacheslav Ovsiienko
35e75f7816 net/mlx5: fix vectorized Rx burst check
The Rx queue start/stop feature is not supported if the vectorized
rx_burst routine is engaged. There was a typo in the routine address
and the rx_burst type check was wrong.

Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-30 19:19:09 +02:00
Phil Yang
f0f5d844d1 eal: remove deprecated coherent IO memory barriers
The 20.08 release deprecated the rte_cio_*mb APIs because these APIs
provide the same functionality as the rte_io_*mb APIs on all platforms,
so remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-23 13:40:26 +02:00
Michael Baum
b00f760354 net/mlx5: fix hairpin dependency on destination DevX TIR
The PMD supports hairpin only if DevX is supported and DV flow is
enabled.

When the destination DevX TIR is not supported, the PMD tries to create
a TIR action and fails.

Avoid supporting hairpin when the destination DevX TIR is not supported.

Fixes: b6b3bf86bd ("net/mlx5: get hairpin capabilities")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:11 +02:00
Michael Baum
7aa9892f79 net/mlx5: fix Rx objects creator selection
There are two creators for Rx objects, DevX and Verbs.
There are supported DR versions in which DevX destination TIR flow
action creation cannot be supported; with these versions the TIR object
should be created by Verbs, which forces all the Rx objects to be
created by Verbs.

The selection of the Rx objects creator wrongly did not take into
account the destination TIR action support, which caused a failure in
the Rx flows creation.

Select the Verbs creator when destination TIR action creation is not
supported by the DR version.

Fixes: 6deb19e1b2 ("net/mlx5: separate Rx queue object creations")

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:11 +02:00
Maxime Leroy
e0d449513b net/mlx5: fix RSS RETA reset on start
The following sequence was working fine on mlx5:
   rte_eth_dev_configure(portid, ...);

   for (queueid = 0; queueid < nb_txq; queueid++)
      rte_eth_tx_queue_setup(portid, queueid, ...);

   for (queueid = 0; queueid < nb_rxq; queueid++)
      rte_eth_rx_queue_setup(portid, queueid, ...);

  // use a custom reta configuration
  rte_eth_dev_rss_reta_update(portid, reta_conf, reta_size);
  rte_eth_dev_start(portid);

We were able to configure a custom reta before starting the port.

The commit "net/mlx5: support RSS on hairpin" breaks this logic by
moving the code initializing the RSS reta from rte_eth_dev_configure
into rte_eth_dev_start.

To fix the issue, the skip_default_rss_reta flag is always set to 1 in
rte_eth_dev_rss_reta to avoid reconfiguring the RSS reta when the device
is started.

Fixes: 63bd16292c ("net/mlx5: support RSS on hairpin")
Cc: stable@dpdk.org

Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Ori Kam <orika@nvidia.com>
2020-09-18 18:55:11 +02:00
Ferruh Yigit
5723fbed4f ethdev: remove underscore prefix from internal API
The '_rte_eth_dev_callback_process()' & '_rte_eth_dev_reset()' internal
APIs have an unconventional underscore ('_') prefix.
Although this is not documented, most probably this is to mark them as
internal. Since we have the '__rte_internal' flag to mark this, remove
the '_' from the API names.

For '_rte_eth_dev_reset()', there is already a public API named
'rte_eth_dev_reset()', so renaming '_rte_eth_dev_reset()' to
'rte_eth_dev_internal_reset'.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
8682e492ed ethdev: use hairpin helper functions
Hairpin helper functions were not used by drivers, but were used only
locally in ethdev. They are:
'rte_eth_dev_is_rx_hairpin_queue()'
'rte_eth_dev_is_tx_hairpin_queue()'

Expose them as internal APIs and update the mlx5 driver (the only user
of hairpin) to use them.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-09-18 18:55:08 +02:00
Ferruh Yigit
cbfc6111b5 ethdev: move inline device operations
This patch is a preparation to hide the 'struct eth_dev_ops' from
applications by moving some device operations from 'struct eth_dev_ops'
to 'struct rte_eth_dev'.

Mentioned ethdev APIs are in the data path and implemented as inline
because of performance reasons.

Exposing 'struct eth_dev_ops' to applications is bad because it is a
contract between ethdev and PMDs, it does not really need to be known
by applications, and changes in the struct cause ABI breakages which
they shouldn't.

To be able to both keep APIs inline and hide the 'struct eth_dev_ops',
moving device operations used in ethdev inline APIs to 'struct
rte_eth_dev' to the same level with Rx/Tx burst functions.

The list of dev_ops moved:
eth_rx_queue_count_t       rx_queue_count;
eth_rx_descriptor_done_t   rx_descriptor_done;
eth_rx_descriptor_status_t rx_descriptor_status;
eth_tx_descriptor_status_t tx_descriptor_status;

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
2020-09-18 18:55:08 +02:00
Michael Baum
0c762e81da net/mlx5: share Rx queue drop action code
Move Rx queue drop action similar resources allocations from Verbs
module to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5eaf882e94 net/mlx5: separate Rx queue drop
Separate Rx queue drop creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
5a959cbfa6 net/mlx5: share Rx hash queue code
Move Rx hash queue object similar resources allocations from DevX and
Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
25ae7f1a5d net/mlx5: share Rx queue indirection table code
Move Rx indirection table object similar resources allocations from DevX
and Verbs modules to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
66b96fa6a6 net/mlx5: remove indirection table type field
Once the separation between Verbs and DevX is done using function
pointers, the type field of the indirection table structure becomes
redundant and is no longer used in the code.
Remove the unnecessary field from the structure.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
85552726d3 net/mlx5: separate Rx hash queue creation
Separate Rx hash queue creation into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
87e2db37ef net/mlx5: separate Rx indirection table object creation
Separate Rx indirection table object creation into both Verbs and DevX
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
fa2c85cc9c net/mlx5: share Rx queue object modification
Use new modify_wq functions for Rx object creation in DevX and Verbs
modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
c279f187ee net/mlx5: separate Rx queue object modification
Separate Rx object modification to the Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
675911d033 net/mlx5: rearrange creation of WQ and CQ object
Rearrangement of WQ and CQ creation for Verbs Rx queue:
1. Rename the allocation function.
2. Reduce the number of arguments that the creation functions receive.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
f6dee90058 net/mlx5: rearrange creation of RQ and CQ resources
Rearrangement of RQ and CQ resource handling for DevX Rx queue:
1. Rename the allocation function to make clear that it allocates all
resources and not just the CQ or RQ.
2. Move the allocation and release of the doorbell into creation and
release functions.
3. Reduce the number of arguments that the creation functions receive.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
1260a87b28 net/mlx5: share Rx control code
Move the Rx object resource allocations and debug logs, which are
similar in the DevX and Verbs modules, to a shared location.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
322870799e net/mlx5: separate Rx interrupt handling
Separate interrupt event handler into both Verbs and DevX modules.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
6deb19e1b2 net/mlx5: separate Rx queue object creations
In preparation for Windows OS support, the Verbs operations should be
separated into another file.
This way, the build can easily exclude the unsupported Verbs APIs from
the compilation process.

Define an operations structure and a DevX module in addition to the
existing Linux Verbs module.
Separate Rx object creation into the Verbs/DevX modules and update the
operations structure according to the OS support and the user
configuration.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
24e4b650ba net/mlx5: mitigate Rx queue reference counters
The Rx queue structures manage two different reference counters per
queue: the rxq_ctrl reference counter and the rxq_obj reference counter.

There is no real need to use two different counters; it just complicates
the release functions.
Remove the rxq_obj counter and use only the rxq_ctrl counter.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
c902e264f6 net/mlx5: fix types differentiation in Rx queue create
Rx HW objects can be created by both Verbs and DevX operations.
The management of the two types of operations is done directly in the
main flow of the objects' creation.

Some arrangements and validations were wrongly applied to the
irrelevant type:

1. LRO related validations were done for Verbs type where LRO is not
supported at all.
2. Verbs allocation arrangements were done for DevX operations where it
is not needed.
3. Doorbell destroy was considered for Verbs types where it is
irrelevant.

Adjust the aforementioned points only for the relevant types.

Fixes: e79c9be915 ("net/mlx5: support Rx hairpin queues")
Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")
Fixes: 17ed314c6c ("net/mlx5: allow LRO per Rx queue")
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
6c29e209fe net/mlx5: fix Rx queue state update
In order to support DevX Rx queue stop and start operations, the state
of the queue should be updated in FW.
The state update PRM command requires setting both the current state
and the new requested state.

The settings of the current state and the new requested state fields
were mistakenly swapped.

Switch them back to the correct setting.

Fixes: 161d103b23 ("net/mlx5: add queue start and stop")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Michael Baum
93fa67fb11 net/mlx5: fix Rx hash queue creation error flow
The mlx5_hrxq_new function allocates several resources and if one of the
allocations fails, the function jumps to an error label where it
releases all the allocated resources.

When the TIR action creation fails, the hrxq memory is not released,
which can cause a resource leak.

Add an appropriate release to the hrxq pointer in the error flow.

Fixes: 772dc0eb83 ("net/mlx5: convert hrxq to indexed")
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
2020-09-18 18:55:08 +02:00
Ophir Munk
7af10d29a4 net/mlx5/linux: refactor VLAN
The file mlx5_vlan.c contains Netlink APIs (Linux dependent) as part of
the VM workaround implementation. Move this implementation to the file
linux/mlx5_vlan_os.c. To remove the Netlink dependency from header
files, change the pointer of type 'struct mlx5_nl_vlan_vmwa_context *'
to 'void *'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
8bb2410ea3 net/mlx5: separate VLAN strip modification
When updating a queue VLAN stripping offload, either the WQ is modified
(Verbs) or the RQ is modified (DevX). Add a VLAN stripping modify
callback to 'struct mlx5_obj_ops' and assign it the specific Verbs and
DevX implementations: 'rxq_obj_modify_wq_vlan_strip' and
'rxq_obj_modify_rq_vlan_strip' respectively.
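
A minimal sketch of the callback pattern (only the two routine names
come from the text above; the struct layout and prototypes are assumed
for illustration):

 struct mlx5_rxq_obj;	/* opaque for this sketch */

 int rxq_obj_modify_wq_vlan_strip(struct mlx5_rxq_obj *rxq_obj, int on);
 int rxq_obj_modify_rq_vlan_strip(struct mlx5_rxq_obj *rxq_obj, int on);

 struct mlx5_obj_ops_sketch {
 	int (*rxq_obj_modify_vlan_strip)(struct mlx5_rxq_obj *rxq, int on);
 };

 /* The Verbs flavour modifies the WQ ... */
 static const struct mlx5_obj_ops_sketch ibv_ops_sketch = {
 	.rxq_obj_modify_vlan_strip = rxq_obj_modify_wq_vlan_strip,
 };

 /* ... while the DevX flavour modifies the RQ. */
 static const struct mlx5_obj_ops_sketch devx_ops_sketch = {
 	.rxq_obj_modify_vlan_strip = rxq_obj_modify_rq_vlan_strip,
 };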

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
fe7a54fd26 net/mlx5: remove Verbs dependency in Rx/Tx objects
Replace pointers to ibv structs with pointers to void (file
mlx5_rxtx.h). Specifically, the following pointer types were replaced:
'struct ibv_cq *', 'struct ibv_wq *', 'struct ibv_comp_channel *',
'struct ibv_rwq_ind_table *', 'struct ibv_qp *'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
1f66ac5bbe net/mlx5: remove more Direct Verbs dependencies
Several DV-based structs of type 'struct mlx5dv_devx_XXX' are replaced
with 'void *' to enable compilation under non-Linux operating systems.
New getter functions were added to retrieve the specific fields that
were previously accessed directly.

Replaced structs:
'struct mlx5dv_pp *'
'struct mlx5dv_devx_event_channel *'
'struct mlx5dv_devx_umem *'
'struct mlx5dv_devx_uar *'

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
1e577c9e5f net/mlx5: call meter detach only if DR is supported
Flow metering is supported only in direct rules (DR). Currently the
meter action create and modify APIs are under #ifdef
HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER, while detaching the meter action
is executed unconditionally. This commit adds the same ifdef to the
mlx5_flow_meter_detach() API.
This avoids compilation failures on non-Linux operating systems which
do not support DR.
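
A simplified sketch of the change (the meter structure and the body of
the function are abridged):

 struct mlx5_flow_meter;

 void
 mlx5_flow_meter_detach(struct mlx5_flow_meter *fm)
 {
 #ifdef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
 	/* ... detach the DR meter action (unchanged logic using fm) ... */
 #else
 	(void)fm;	/* DR not supported: nothing to detach. */
 #endif
 }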

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
079f1ae5ac net/mlx5: remove unused log macros
Remove utility macros INFO, WARN, ERROR. They are not in use and
conflict with identical definitions when compiled under Windows.

Fixes: 80f2d0ed7f ("net/mlx5: add hardware flow debug dump")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
f00f6562e1 net/mlx5: remove netlink dependency in shared code
This commit adds a Linux implementation of the mlx5_os_mac_addr_flush
routine as a wrapper to the Netlink API, to avoid direct Netlink calls
under non-Linux operating systems.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
f2e8b4556f net/mlx5: remove unused includes
Remove unused Linux included files:

<sys/ioctl.h>, <arpa/inet.h> from file net/mlx5/mlx5_mac.c
<sys/mman.h> from file net/mlx5/mlx5.c

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
e9c0b96e35 net/mlx5: move Linux ifname function
The mlx5_get_ifname() prototype includes the 'IF_NAMESIZE' definition
from the Linux file net/if.h. Since this API is only used under Linux,
and to enable compilation under non-Linux OS, move this prototype from
the shared file mlx5.h to the file linux/mlx5_os.h.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Ophir Munk
5d5a26f26d net/mlx5: rename constant conflicting with Windows
The enumerated value REG_NONE (defined in mlx5_prm.h) conflicts with a
Windows definition (winnt.h): #define REG_NONE ( 0ul ) // No value type
To enable mlx5 PMD Windows compilation, rename REG_NONE to REG_NON.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
3fe889617b net/mlx5: manage modify actions with hashed list
To manage header modify actions, the mlx5 PMD used a singly linked
list; lookup and insertion operations took too long when there were
millions of objects, and this impacted the flow insertion/deletion
rate.

In order to optimize the performance, a hashed list is engaged. The
list implementation is updated to support non-unique keys with few
collisions.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-09-18 18:55:06 +02:00
Suanming Mou
095c397b43 net/mlx5: add hash list extended lookup and insert
The mlx5 PMD hashed list was designed to contain items with unique
keys only. Now there is a need to store objects with possible key
collisions. Many collisions are not expected (very likely only a few),
but the keys are no longer unique.

This commit adds the hash list extended functions in order to support
insertion and lookup for the lists with non-unique keys.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-09-18 18:55:06 +02:00
Ciara Power
3cc6ecfdfe build: remove makefiles
A decision was made [1] to no longer support Make in DPDK, this patch
removes all Makefiles that do not make use of pkg-config, along with
the mk directory previously used by make.

[1] https://mails.dpdk.org/archives/dev/2020-April/162839.html

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-09-08 00:09:50 +02:00
Thomas Monjalon
4f86c0ba19 version: 20.11-rc0
Start a new release cycle with empty release notes.

The ABI version becomes 21.0.
The ABI major is back to normal, having only one number (21 vs 20.0).
The map files are updated to the new ABI major number (21).
The ABI exceptions are dropped.
Travis ABI check is disabled because compatibility is not preserved.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-08-12 11:32:16 +02:00
Dekel Peled
38a5704629 net/mlx5: fix number of retries for UAR allocation
A previous fix added a definition of the number of retries for UAR
allocation.
This value is adequate for x86 systems with 4K pages.
On Power9 systems with 64K pages the required value is 32.
This patch updates the defined value from 2 to 32.

Fixes: a0bfe9d56f ("net/mlx5: fix UAR memory mapping type")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-08-05 16:10:50 +02:00
Viacheslav Ovsiienko
972a1bf812 common/mlx5: fix user mode register access command
To detect the timestamp mode configured on the NIC, the mlx5 PMD uses
the firmware command ACCESS_REGISTER_USER. This command is relatively
new and might not be supported by older firmware versions; when
rejected, it caused annoying messages in the kernel log.

This patch adds a check of the attribute flag indicating whether the
firmware supports the command, and avoids the call if it does not.

Fixes: bb7ef9a962 ("common/mlx5: add register access DevX routine")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:24 +02:00
Dekel Peled
1ccc479014 net/mlx5: fix Rx interrupt handling and cleanup
Recent patch added creation of Rx CQ using DevX API.
The reading of events from DevX channel was not done correctly.
This patch fixes the event reading, using the correct data structure.
Cleanup after CQ creation, in case of error, is also updated.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Gregory Etelson
1af7452113 net/mlx5: fix dynamic inline hint handling
The ConnectX NICs can transfer data from the host memory with two
approaches: provide the pointer to the data buffer, or inline the data,
i.e. copy the data to the transmit descriptor (WQE) entirely or only
partially. In some configurations the NIC hardware requires a minimal
amount of data to be inlined in the descriptor to operate correctly.
There is also a special dynamic flag to hint the PMD not to inline the
data on a per-packet basis (for example, if the buffer is located on
some other device such as storage or a GPU).

If a packet was shorter than the minimal inline data length required by
the NIC hardware and the no-inline hint was set, the PMD tried to
inline the packet with the minimal required length instead of the
actual packet length. This patch adds the missing length check to the
no-inline hint handling branch.

Fixes: cacb44a099 ("net/mlx5: add no-inline Tx flag")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Viacheslav Ovsiienko
4ffab7b9e1 net/mlx5: fix metadata storing for NEON Rx
There was a typo introducing a bug that affected the mlx5 vectorized
rx_burst on ARM architectures when CQE compression was enabled.

Fixes: 6c55b622a9 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Viacheslav Ovsiienko
a0bfe9d56f net/mlx5: fix UAR memory mapping type
The User Access Region is a special mechanism to provide direct
access to the hardware registers, and is the part of the PCI address
space that is mapped to CPU virtual addresses. The mapping can be
performed with the type "Write-Combining" or "Non-Cached", and each of
these might or might not be supported on a given setup.

To prevent device probing failure, the UAR allocation is retried with
the alternative mapping type. The datapath takes the actual UAR
mapping into account on queue creation.

There was another issue with a NULL UAR base address.
OFED 5.0.x and upstream rdma-core before v29 returned NULL as the
UAR base address if the UAR was not the first object in the UAR page.
This caused a PMD failure, so we should try to get another UAR until
one with a non-NULL base address is returned.
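
A simplified sketch of the retry logic (the UAR calls follow rdma-core's
DevX API; the retry limit and the surrounding variables are
illustrative):

 	struct mlx5dv_devx_uar *uar = NULL;
 	uint32_t flags = 0;	/* default Write-Combining mapping */
 	int retry;

 	for (retry = 0; retry < uar_retry_limit; retry++) {
 		uar = mlx5dv_devx_alloc_uar(ctx, flags);
 		if (uar == NULL && flags == 0) {
 			/* WC mapping rejected: retry with Non-Cached. */
 			flags = MLX5DV_UAR_ALLOC_TYPE_NC;
 			uar = mlx5dv_devx_alloc_uar(ctx, flags);
 		}
 		if (uar != NULL && uar->base_addr != NULL)
 			break;	/* usable UAR obtained */
 		if (uar != NULL) {
 			/* NULL base address from old rdma-core:
 			 * free this UAR and allocate another one. */
 			mlx5dv_devx_free_uar(uar);
 			uar = NULL;
 		}
 	}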

Fixes: fc4d4f732b ("net/mlx5: introduce shared UAR resource")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2020-07-30 00:41:23 +02:00
Raslan Darawsheh
753dd70283 net/mlx5: fix VF MAC address set over BlueField
When trying to set the MAC address of an Ethernet device that is a
representor, the PMD sets the MAC on the corresponding VF instead.

For the case of the HPF (host PF representor on BlueField), the PMD
shouldn't attempt this, since the HPF doesn't have any corresponding VF
and the operation fails.

This fixes the issue by setting the MAC on the dev directly.

Fixes: 0d1d731708 ("net/mlx5: set VF MAC address from host")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Alexander Kozyrev
6f52bd3383 net/mlx5: fix vectorized mini-CQE prefetching
There was an optimization work to prefetch all the CQEs before
their invalidation. It allowed us to speed up the mini-CQE
decompression process by preheating the cache in the vectorized
Rx routine.

Prefetching of the next mini-CQE, on the other hand, showed no
difference in the performance on the x86 platform, so it was removed.
Unfortunately this caused a performance drop on ARM.

Prefetch the mini-CQE as well as all the soon to be
invalidated CQEs to get both CQE and mini-CQE on the hot path.

Fixes: 28a4b96321 ("net/mlx5: prefetch CQEs for a faster decompression")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
d462a83c65 net/mlx5: optimize stack memory in probe
The device configuration struct is too large to be passed as a
function argument by value.

Call the spawn function with the device configuration passed by
reference.
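
A sketch of the signature change (the prototype is abridged):

 /* Before: the whole configuration struct was copied on every call.
  *
  * static struct rte_eth_dev *
  * mlx5_dev_spawn(struct rte_device *dpdk_dev,
  *                struct mlx5_dev_spawn_data *spawn,
  *                struct mlx5_dev_config config);
  */

 /* After: only a pointer is passed, no large stack copy per call. */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	       struct mlx5_dev_spawn_data *spawn,
 	       struct mlx5_dev_config *config);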

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
7301d1923a net/mlx5: fix unnecessary init in mark conversion
The flow_dv_convert_action_mark function defines an array of
field_modify_info structures and initializes the first entity.

The id field of the first entity is initialized to 0, even though its
type is an enum that has no value of 0.
In fact, the function does not use this id field before assigning the
appropriate register id into it, so the initialization is unnecessary.
Moreover, this initialization assigns an int to an enum, and it is
better not to create a type conflict for no reason.

Postpone the first entity initialization until the appropriate register
id is known.

Fixes: 55deee1715 ("net/mlx5: extend flow mark support")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
f4a0873197 net/mlx5: optimize critical section in device free
When the PMD releases the shared IB device context, it holds the
mlx5_ibv_list_mutex lock throughout the function so that, while a
device is being removed from the list, another process cannot insert
another device into it.
On the other hand, once the device has been removed from the list, even
if it has not yet released all of its resources, the PMD no longer
needs to care about other processes and can release the lock.

However, the PMD does not release the lock even though it can, and
performs a number of operations, some of which include sleeping and may
be long.
To improve this, shorten the lock time to the minimum necessary.

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
63d1db710f net/mlx5: fix unlimited parsing of switch info
In the mlx5_sysfs_switch_info function, the driver gets the switch
information associated with a network interface.

The driver writes the port name into a buffer and translates it.
However, when it writes the name, it does not limit the write to the
buffer size.

Limit the write to the size of the buffer.
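
The principle of the fix, as a generic sketch (not the exact driver
code):

 #include <net/if.h>
 #include <stdio.h>

 static void
 store_port_name(char port_name[IF_NAMESIZE], const char *src_name)
 {
 	/* Unbounded (the bug pattern), may overflow port_name:
 	 *     sprintf(port_name, "%s", src_name);
 	 * Bounded (the fix pattern), never writes past the buffer: */
 	snprintf(port_name, IF_NAMESIZE, "%s", src_name);
 }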

Fixes: 1256805dd5 ("net/mlx5: move Linux-specific functions")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
8da2a608d0 net/mlx5: remove ineffective increment in hairpin split
The flow_hairpin_split function defines a pointer called addr that
points to the list of items.
When the function wants to advance through the list, it adds the size
of an item to the pointer.

At the end of the function, it advances the pointer one more time even
though it is not used afterwards. In fact, this line has no effect and
the operation of the function would be no different without it.

Remove the line where the pointer is advanced unnecessarily.

Fixes: d85c7b5ea5 ("net/mlx5: split hairpin flows")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
e71e90938b net/mlx5: fix crash in NVGRE item translation
The flow_dv_translate_item_nvgre function adds the NVGRE item to the
matcher and to the value.
It defines a pointer named nvgre_m that receives the item's mask, and
then copies some of it to the matcher.

Before copying, it checks the mask for validity, and in case the mask
is NULL the function gives it a pointer to rte_flow_item_nvgre_mask.
However, the function reads the vni field of the mask before the check,
and if there is no mask it actually dereferences the NULL pointer and
the program crashes with a segfault.

Move the vni field access to after the validation.
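
A sketch of the corrected ordering (simplified from the translation
function):

 #include <rte_flow.h>

 static const struct rte_flow_item_nvgre *
 nvgre_mask_or_default(const struct rte_flow_item *item)
 {
 	const struct rte_flow_item_nvgre *nvgre_m = item->mask;

 	/* Validate/choose the mask first ... */
 	if (nvgre_m == NULL)
 		nvgre_m = &rte_flow_item_nvgre_mask;
 	/* ... only then is it safe to read nvgre_m->tni and friends. */
 	return nvgre_m;
 }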

Fixes: cd18e1b72f ("net/mlx5: fix build on Arm")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Michael Baum
4868ae8322 net/mlx5: fix initialization of steering registers
The mlx5_flow_action_copy_mreg structure contains a field called src
of type enum modify_reg; similarly, the mlx5_rte_flow_item_tag
structure contains a field called id of type enum modify_reg.
The enum modify_reg type represents different registers in the
system and it also has a value called REG_NONE, equal to 0, which
means that the register does not exist.

The flow_mreg_add_copy_action function sets a variable of struct
mlx5_flow_action_copy_mreg type and initializes the src field to 0.
Similarly, the flow_create_split_metadata function sets a variable of
struct mlx5_rte_flow_item_tag type and initializes the id field to 0.
In both functions, an enum modify_reg variable is initialized with an
int value while modify_reg has an appropriate enumerator for that
value (REG_NONE).

Replace the assignment of 0 with REG_NONE in both functions.

Fixes: dd3c774f6f ("net/mlx5: add metadata register copy table")
Fixes: 71e254bc02 ("net/mlx5: split Rx flows to provide metadata copy")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Suanming Mou
e1293b10de net/mlx5: fix counter query
Currently, the counter query requires the counter ID to be 4-aligned.
In non-batch mode, the counter pool might get a counter ID that is not
4-aligned. In this case, the counter should be skipped, or the query
will fail.

Skip a counter whose ID is not 4-aligned as the first counter in the
non-batch counter pool to avoid an invalid counter query. Once a new
min_dcs ID lower than the skipped counters appears in the pool, the
skipped counters are returned to the pool free list for reuse.

Fixes: 5382d28c21 ("net/mlx5: accelerate DV flow counter transactions")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Suanming Mou
fd143711a6 net/mlx5: separate aging counter pool range
Currently, when allocating a counter or a counter-based age from group
0, the counter and the age may share the same counter dcs ID range.
Both the age and the pure counter need to sync up with each other's
container to check if the ID range exists and to update the min_dcs.

This has two disadvantages:
1. If the ID range is shared, this counter range will be queried twice,
   from both the age and the pure counter containers, within 1s.
2. The same-range counter check between the two containers makes the
   counter allocation synchronize min_dcs from time to time, with extra
   min_dcs updates.

This patch avoids sharing the same ID range when allocating a new
pool. If the same ID range exists in the other container, just add the
counter to the other container until a new range is obtained, which
saves the repeated min_dcs synchronization.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-30 00:41:23 +02:00
Raslan Darawsheh
d13f976086 net/mlx5: fix flow items size calculation
flow_dv_get_item_len is expected to return the actual header size of
an rte_flow item.

Changing any of the structs for rte_flow items by adding or removing
some extra fields would break this function.

This fixes the behavior by returning the actual header size of each
item.

Fixes: 34d41b7aa3 ("net/mlx5: add VXLAN encap action to Direct Verbs")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-30 00:41:23 +02:00
Ophir Munk
385c19397e net/mlx5: fix premature disabling of interrupt
RXQ interrupts under Linux are based on the epoll mechanism. An expected
order of operations is as follows:
1. Call rte_eth_dev_rx_intr_enable(), to arm the CQ for receiving events
   on data input.
2. Block on rte_epoll_wait() with an array of file descriptors
   representing the CQ events. Upon data arrival the kernel will signal
   an input event on the corresponding CQ fd.
3. Call rte_eth_dev_rx_intr_disable() after the event was received and
   continue in polling mode. The mlx5 implementation of
   rte_eth_dev_rx_intr_disable() is to get the CQ event and ack it.

In practice, applications may wake up from rte_epoll_wait() due to a
timeout with no event to ack but still call
rte_eth_dev_rx_intr_disable() unconditionally. In such cases the call
should return EAGAIN (since the file descriptors are non-blocking), as
opposed to EINVAL which indicates a real failure. In case of EAGAIN the
PMD should not warn with "Unable to disable interrupt on Rx queue".

This commit fixes an earlier commit where the value 0 returned from the
devx_get_event() function was considered an error.

Fixes: 08d1838f64 ("net/mlx5: implement CQ for Rx using DevX API")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Raslan Darawsheh <rasland@mellanox.com>
2020-07-30 00:41:22 +02:00
Parav Pandit
f6d099d7da common/mlx5: remove class check from class drivers
Now that the mlx5_pci PMD checks for enabled classes and performs
probe() and remove() of the associated classes, an individual class
driver does not need to check if another driver is enabled.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-28 19:01:30 +02:00
Parav Pandit
392bf9084d common/mlx5: register class drivers through common layer
Migrate the mlx5 net, vdpa and regex PMDs to start using the mlx5
common class driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-28 19:01:30 +02:00
Parav Pandit
8208800163 common/mlx5: avoid class constructor priority
mlx5_common is shared library between mlx5 net, VDPA and regex PMD.
It is better to use common initialization helper instead of using
RTE_PRIORITY_CLASS priority.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2020-07-28 18:52:11 +02:00
Gregory Etelson
750ff30a8f net/mlx5: fix tunnel flow priority
PMD flow priority is different from application flow priority. Flow
rules with higher match granularity are assigned a higher PMD priority.
The PMD also splits RSS flows internally according to the flow RSS
layer.

The final PMD flow rule priority is derived from the network level of
the last match item, after the PMD adjusts the flow rule, where an L4
match gets the highest priority and L2 the lowest.

The patch adjusts the tunnel flow rule priority calculation for the PMD
running over the Verbs API.

Introduce MLX5_TUNNEL_PRIO_GET macro.

Fixes: 4a78c88e3b ("net/mlx5: fix Verbs flow tunnel")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
210008309b net/mlx5: fix VLAN push action on hairpin queue
The push VLAN action is allowed on Tx only, same as the encap action.
Flow rules for a hairpin queue are created on Rx, and split by the PMD
into Rx and Tx rules, according to the above limitation.
In the current implementation the encap action is split into the Tx
rule. This patch adds the same handling for the push-VLAN action, as
well as its complementing actions set-vlan-vid and set-vlan-pcp.

Fixes: d85c7b5ea5 ("net/mlx5: split hairpin flows")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
dd6745da6d net/mlx5: fix VLAN pop with decap action validation
The combination of a decap action followed by a pop VLAN action is not
fully validated in the existing code.

This patch updates the validation function of the pop VLAN action.
Pop VLAN with a preceding decap requires an inner header with VLAN.
Pop VLAN without a preceding decap requires an outer header with VLAN.

Fixes: b41e47da25 ("net/mlx5: support pop flow action on VLAN header")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Shy Shyman
038e7fc085 net/mlx5: fix HW counters path in switchdev mode
When debugging the performance of a DPDK application, the user may
need to view the different statistics of DPDK (for example
out_of_buffer). This can be enabled by using the testpmd command
'show port xstats <port_id>', for example.

The current implementation assumes legacy mode, in which the counters
are at <ibdev_path>/<port_id>/hw_counters/<file_name>.
In switchdev mode the counters file is located right after the device
name, hence resides at <ibdev_path>/hw_counters.

The fix tries to open the path in the second location after a failure
to open the file from the first location.
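
A simplified sketch of the lookup order (the paths follow the
description above; everything else is illustrative):

 #include <limits.h>
 #include <stdio.h>

 static FILE *
 open_counter_file(const char *ibdev_path, unsigned int port_id,
 		  const char *file_name)
 {
 	char path[PATH_MAX];
 	FILE *file;

 	/* Legacy: <ibdev_path>/<port_id>/hw_counters/<file_name> */
 	snprintf(path, sizeof(path), "%s/%u/hw_counters/%s",
 		 ibdev_path, port_id, file_name);
 	file = fopen(path, "rb");
 	if (file == NULL) {
 		/* Switchdev: <ibdev_path>/hw_counters/<file_name> */
 		snprintf(path, sizeof(path), "%s/hw_counters/%s",
 			 ibdev_path, file_name);
 		file = fopen(path, "rb");
 	}
 	return file;
 }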

Fixes: 9c0a9eed37 ("net/mlx5: switch to the names in the shared IB context")
Cc: stable@dpdk.org

Signed-off-by: Shy Shyman <shys@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Viacheslav Ovsiienko
161d103b23 net/mlx5: add queue start and stop
The mlx5 PMD did not support the queue_start and queue_stop eth_dev API
routines, so a queue could not be suspended and resumed during device
operation.

There is a use case where this feature is crucial for applications:

- there is a secondary process handling the queue
- the secondary process crashed/aborted
- some mbufs were allocated or used by the secondary application
- some mbufs were allocated by Rx queues to receive packets
- some mbufs were placed in the send queue
- the queue goes into an undefined state

In this case there was no reliable way for a restarted secondary
process to recover queue handling other than resetting the queue to its
initial state: freeing all involved resources, including the buffers
involved in queue operations, resetting the mbuf pools, and then
reinitializing the queue to a working state (see the sketch after the
list below):

- reset the mbuf pool, allocate all mbufs to bring the pool into a
  safe state after the crash and allow safe mbuf free calls
- stop the queue, free all potentially involved mbufs
- reset the mbuf pool again
- start the queue, reallocate the mbufs needed
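
From the application side, the per-queue steps above map onto the
standard ethdev queue control API; a minimal sketch (port and queue IDs
are the application's own, the pool handling is left as a comment):

 #include <rte_ethdev.h>

 static int
 recover_rx_queue(uint16_t port_id, uint16_t queue_id)
 {
 	int ret;

 	/* Stop the queue and free the mbufs it still holds. */
 	ret = rte_eth_dev_rx_queue_stop(port_id, queue_id);
 	if (ret != 0)
 		return ret;
 	/* ... reset and refill the mbuf pool here, as described above ... */
 	/* Restart the queue; it reallocates the mbufs it needs. */
 	return rte_eth_dev_rx_queue_start(port_id, queue_id);
 }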

This patch introduces the queue start/stop feature with some
limitations:

- hairpin queues are not supported
- it is the application's responsibility to synchronize start/stop
  with the datapath routines; rx/tx_burst must be suspended during
  the queue_start/queue_stop calls
- it is the application's responsibility to track queue usage and
  provide coordinated queue_start/queue_stop calls from the
  secondary and primary processes
- Rx queues with a vectorized Rx routine and engaged CQE
  compression are not supported by this patch currently

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Dekel Peled
08d1838f64 net/mlx5: implement CQ for Rx using DevX API
This patch continues the work to use DevX API for different objects
creation and management.
On Rx control path, the RQ, RQT, and TIR objects can already be
created using DevX API.
This patch adds the support to create CQ for RxQ using DevX API.
The corresponding event channel is also created and utilized using
DevX API.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
9d60f54569 common/mlx5: remove inclusion of Verbs header files
Several source files include Verbs header files as in (1). These source
files will not compile under non-Linux operating systems. This commit
removes this inclusion in two cases:

Case 1: There is no usage of ibv_* or mlx5dv_* symbols in the source
file so the inclusion in (1) can be safely removed.

Case 2: Verbs symbols are used. Please note the inclusion in (1) already
appears in file linux/mlx5_glue.h (which represents the interface
to the rdma-core library). Therefore, replace (1) in the source file
with (2).  Under non-Linux operating systems - file mlx5_glue.h will not
include (1).

(1)
 #include <infiniband/verbs.h>
 #include <infiniband/mlx5dv.h>

(2)
 #include <mlx5_glue.h>

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
2e86c4e5c7 net/mlx5: refactor multi-process communication
1. The shared data communication between the primary and the secondary
processes is implemented using Linux API. Move the Linux API code under
linux directory (file linux/mlx5_os.c).

2. File net/mlx5/mlx5_mp.c handles requests to the primary and secondary
processes (e.g. start_rxtx, stop_rxtx). It is Linux based so it is moved
under linux (new file linux/mlx5_mp_os.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ef9ee13f6e net/mlx5: cleanup header file
The cleanup refers to header file mlx5.h.
1. Remove unused prototypes.
2. Move prototypes under their correct title.
3. Change functions to static and remove their prototypes from the
header file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
98c4b12afa net/mlx5: eliminate dependency on Linux in shared header
This commit eliminates Linux dependencies in shared file mlx5.h.

1. All functions using 'struct ifreq' are moved to file
linux/mlx5_ethdev_os.c such that this struct can be removed from mlx5.h.
2. Function mlx5_set_flags() that uses Linux flags (e.g. IFF_UP) is
changed to static and its prototype is removed from mlx5.h.
3. Remove redundant member verbs_action from 'struct mlx5_priv'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
4d18abd130 net/mlx5: wrap Linux promiscuous and multicast functions
This commit adds the Linux implementation of the promiscuous and
all-multicast wrapper routines (e.g. mlx5_os_set_promisc()). The
routines call Netlink APIs.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
ab27cdd93a net/mlx5: refactor Linux MAC operations
Move OS specific MAC operations add, remove, modify VF into file
linux/mlx5_os.c.
Remove unused function mlx5_get_mac().

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
2aba9fc725 net/mlx5: replace Linux specific calls
The following Linux calls are replaced by their matching rte APIs.

mmap ==> rte_mem_map()
munmap ==> rte_mem_unmap()
sysconf(_SC_PAGESIZE) ==> rte_mem_page_size()

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Ophir Munk
3eca5f8a61 net/mlx5: move flow priority discovery to Verbs file
Function calls mlx5_flow_adjust_priority() and
mlx5_flow_discover_priorities() are Verbs based. Move them from file
mlx5_flow.c to file mlx5_flow_verbs.c

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
50f95b23c9 net/mlx5: add option to configure FCS or decapsulation
There are some limitations on some NICs (at least on ConnectX-6 Dx
and BlueField 2) with supporting FCS (frame checksum) scattering for
tunnel-decapsulated packets.

In such cases only one of the features can be supported at the same
time, and the new devarg "decap_en" is introduced to provide the choice
to the users.

If the FCS scattering feature is not supposed to be engaged by the
application, this new devarg should be specified as "decap_en=0",
forcing the FCS feature to be enabled and rejecting tunnel decap
actions in the rte_flow engine.
If FCS scattering is not needed and the application intends to use
tunnel decapsulation in rte_flow, the devarg can be omitted or set to a
non-zero value (this is the default setting).
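
For example (the PCI address below is hypothetical), the devarg can be
passed when probing the device:

 #include <rte_dev.h>

 /* "decap_en=0" forces FCS scattering support and makes rte_flow
  * reject tunnel decap actions. */
 static int
 probe_with_decap_disabled(void)
 {
 	return rte_dev_probe("0000:03:00.0,decap_en=0");
 }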

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
ac3fc732c4 net/mlx5: convert queue objects to unified malloc
This commit allocates the Rx/Tx queue objects from unified malloc
function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
2175c4dc62 net/mlx5: convert configuration objects to unified malloc
This commit allocates the miscellaneous configuration objects from the
unified malloc function.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:46:30 +02:00
Suanming Mou
83c2047c5f net/mlx5: convert control path memory to unified malloc
This commit allocates the control path memory from the unified malloc
function.

The changed objects:

1. hlist;
2. rss key;
3. vlan vmwa;
4. indexed pool;
5. fdir objects;
6. meter profile;
7. flow counter pool;
8. hrxq and indirect table;
9. flow object cache resources;
10. temporary resources in flow create;

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Suanming Mou
5522da6b20 net/mlx5: add option to allocate memory from system
Currently, for the MLX5 PMD, once millions of flows are created, the
memory consumption of the flows is very large. For a system with
limited memory, this means the system needs to reserve most of the
memory as hugepage memory to serve the flows in advance, and other
normal applications will have no chance to use this reserved memory.
Since most of the time the system will not have lots of flows, the
reserved hugepage memory is largely wasted most of the time.

With the new sys_mem_en devarg set to true, the PMD allocates memory
from the system by default, using the newly added mlx5 memory
management functions. Only when the MLX5_MEM_RTE flag is set is memory
allocated from rte; otherwise it is allocated from the system.

In this case, a system with limited memory does not need to reserve
most of its memory for hugepages. Allocating only the memory needed
for datapath objects with the explicit flag is enough; other memory is
allocated from the system. For a system with enough memory, there is
no need to care about the devarg; the memory will always come from rte
hugepages.

One restriction is that for a DPDK application with multiple PCI
devices, if the sys_mem_en devargs differ between the devices,
sys_mem_en only takes the value from the first device's devargs and
prints a message to warn about that.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
d7c49561d3 net/mlx5: add eCPRI flex parser capacity check
If the NIC or the FW does not support the dynamic flex parser,
an error is returned when trying to create the parser for eCPRI,
and it is hard to know the detailed reason of the failure.
Before creating the parser node and using the parser afterwards, the
capability bit saved in the HCA_CAP can be used to confirm whether the
dynamic flex parser is supported.
If not, an error with ENOTSUP is returned directly to prevent the
following steps from being executed.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
1c5064044f net/mlx5: create and destroy eCPRI flex parser
eCPRI protocol has unified format layout for the variants, over
ETH layer (including .1Q) and UDP layer.

The common header of the message has 4 bytes fixed length, and the
message payload layers are different based on the type field. Now
only type #0, #2 and #5 will be supported, and 2 bytes are needed.

When creating the flex parser, the header will be extended to 8
bytes and 2 DW samples are needed. The 1st DW starts from offset 0
and will be used for the type field of the common header. The 2nd
DW starts from offset 4 and will be used for the physical channel
ID, real-time control ID or measurement ID fields.

The parser will be created once a flow with an eCPRI item is observed
for the first time. After creation, it will remain in the system and
HW until the device is stopped. Right now, there is no need to destroy
the eCPRI flex parser after the last flow with an eCPRI item is
destroyed. This avoids alternating between creating and destroying the
eCPRI flex parser when there is a single eCPRI flow.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
711aedf187 common/mlx5: add flex parser DevX structures
The structures and other definitions will be used for the dynamic
flex parser creation via the DevX command interface. These structures
will be used as intermediate variables and input parameters for the
parser creation API.
It is better to keep all members consistent with the PRM definition
even though some of them will not be used.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
daa38a8924 net/mlx5: add flow translation of eCPRI header
In the translation stage, the eCPRI item should be translated into the
format that the lower layer driver can use. All the fields that need to
match must be in network byte order after translation, as well as the
mask. The header in the item belongs to the network protocol stack, so
the input header is considered to be in big-endian format already.

Based on the definition in the PRM, the DW samples will be used for
matching in the FTE/STE. Now, only the type field and the PC ID, RTC
ID, and DLY MSR ID of the payload will be supported. The masks should
be 00 ff 00 00 ff ff(00) 00 00 in the network order. Two DWs are
needed to support such matching. The mask fields can be zeros to
support some wildcard rules, but it makes no sense to support rules
matching only on the payload without matching the type field.

The DW samples should be stored after the flex parser creation for
eCPRI. There is no need to query the sample IDs each time a flow rule
with an eCPRI item is created, so this does not introduce a significant
insertion rate degradation.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Bing Zhao
c7eca23657 net/mlx5: add flow validation of eCPRI header
When creating a flow with eCPRI header item, the validation of it is
mandatory. The detailed limitations are listed below:
  1. Over Ether / VLAN, ethertype must be 0xAEFE.
  2. No tunnel support is described in the specification now.
  3. L3 layer is only supported when L4 is UDP, see #4.
  4. Over TCP is not supported from the specification, and over UDP
     is not supported right now.
  5. Concatenation indicator matching is not supported now.
  6. No need to check the revision.
  7. Only type field in the common header is mandatory, and one byte
     should be matched integrally.
  8. Fields in the message payload header are optional.
  9. Only messages with type #0, #2 and #5 are supported now.

Some limitations are only from software right now, because there is
no need to support all the message types and variants of protocol
stack listed in the specification.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
a2854c4de1 net/mlx5: convert Rx timestamps in real-time format
The ConnectX-6 Dx supports timestamps in various formats; the new
realtime format is introduced, where the upper 32-bit word of the
timestamp contains the UTC seconds and the lower 32-bit word contains
the nanoseconds. This patch detects which format is configured in the
NIC and performs the conversion accordingly.
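
With the split described above, the conversion itself is
straightforward; a sketch (the helper name is illustrative):

 #include <stdint.h>

 /* Realtime format: UTC seconds in the upper 32 bits, nanoseconds in
  * the lower 32 bits; convert to a plain nanosecond value. */
 static inline uint64_t
 rt_timestamp_to_ns(uint64_t ts)
 {
 	return (ts >> 32) * 1000000000ULL + (uint32_t)ts;
 }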

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3b025c0ca4 net/mlx5: provide send scheduling error statistics
The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - the Rearm Queue interrupt was not handled in
    time and the service routine might miss the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
b94d93ca73 net/mlx5: support reading device clock
If the send scheduling feature is engaged, a Clock Queue is created
that reliably reports the current device clock counter value. The
device clock counter can be read directly from the Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
2f827f5ea6 net/mlx5: support scheduling on send routine template
This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
0febfcce36 net/mlx5: prepare Tx to support scheduling
The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
085ff447f0 net/mlx5: convert timestamp to completion index
The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3172c471b8 net/mlx5: prepare Tx queue structures to support timestamp
The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
77522be0a5 net/mlx5: introduce clock queue service routine
The service routine is invoked periodically on Rearm Queue completion
interrupts, typically once every few milliseconds (1-16), to track
clock jitter and wander in a robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
aef1e20ebe net/mlx5: allocate packet pacing context
This patch allocates the Packet Pacing context from the kernel,
configures it according to the requested send scheduling granularity,
and assigns it to the Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
3a87b964ed net/mlx5: create Tx queues with DevX
To provide packet send scheduling based on the mbuf timestamp, the Tx
queue must be attached to the same UAR as the Clock Queue.
The UAR is a special hardware-related resource mapped to the host
memory that provides doorbell registers; assigning a UAR to the queue
being created is possible via the DevX API only.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
551c94c83e net/mlx5: create rearm queue for packet pacing
The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization might be broken and packet
send scheduling would fail. The Rearm Queue uses cross-channel
SEND_EN/WAIT operations to provide the requests to the
Clock Queue in a robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
d133f4cdb7 net/mlx5: create clock queue for packet pacing
This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
fc4d4f732b net/mlx5: introduce shared UAR resource
This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
24feb04596 net/mlx5: fix UAR lock sharing for multiport devices
The master and representors might be created over multiport
InfiniBand devices, and the UAR resource allocated for sibling
ports might belong to the same underlying InfiniBand device.
Hardware requires that write access to the UAR be performed as an
atomic 64-bit write; on 32-bit systems this is two sequential
writes protected by a lock. Due to the possibility of sharing the
same UAR between sibling devices, the locks must be moved to the
shared context.

Fixes: f048f3d479 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Viacheslav Ovsiienko
8f848f32fc net/mlx5: introduce send scheduling devargs
This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On device start, if the "rte_dynflag_timestamp"
  dynamic flag is registered and a non-zero value is specified for
  this devarg, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies the scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.
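
For example (hypothetical PCI address and values), both parameters are
passed as device arguments:

 #include <rte_dev.h>

 /* Schedule sends with 500 ns granularity and compensate 64 ns of
  * wire/descriptor processing delay. */
 static int
 probe_with_tx_scheduling(void)
 {
 	return rte_dev_probe("0000:03:00.0,tx_pp=500,tx_skew=64");
 }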

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-21 15:44:36 +02:00
Ali Alnubani
28c9a7d7b4 net/mlx5: add ConnectX-6 Lx device ID
This adds the ConnectX-6 Lx device id to the list of supported
Mellanox devices that run the MLX5 PMD.
The device is still in development stage.

Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Raslan Darawsheh <rasland@mellanox.com>
2020-07-11 06:18:53 +02:00
Joyce Kong
428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while type qualifier
'__restrict' compiles ok in C with all language levels. This patch
is to replace the existing 'restrict' with '__rte_restrict' which
is a common wrapper supported by all compilers.
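
For example, a routine previously written with 'restrict' becomes:

 #include <stdint.h>
 #include <rte_common.h>

 /* '__rte_restrict' expands to the proper keyword for the compiler and
  * C language level in use. */
 static void
 copy_words(uint32_t *__rte_restrict dst,
 	   const uint32_t *__rte_restrict src, unsigned int n)
 {
 	while (n--)
 		*dst++ = *src++;
 }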

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Shy Shyman
5f3541724e net/mlx5: fix flow META item validation
When a flow with a META match item is inserted, it requires support of
a certain register.
As part of the validation of such flows, the validation function is
missing a check that the mlx5 driver is not in legacy mode in terms of
extended metadata support (the MLX5_XMETA_MODE_LEGACY flag).
If the driver is in legacy mode, the downstream function that
allocates the needed register for metadata will fail.

The fix explicitly checks the conditions for support of metadata in
FDB mode. If the conditions are not met, an error message is issued.

Fixes: 9bf26e1318 ("ethdev: move egress metadata to dynamic field")
Cc: stable@dpdk.org

Signed-off-by: Shy Shyman <shys@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
b293fbf967 net/mlx5: add OS specific flow actions operations
This patch introduces the OS specific functions, for flow actions
create and destroy operations.

In existing implementation, the functions to create flow actions
return a pointer to the created action object.

The new OS specific functions to create flow actions return 0 on
success, and (-1) on failure.
On success, a pointer to the created action object is returned
using an additional parameter.
On failure errno is set.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
e57b858710 net/mlx5: add OS specific flow create and destroy
This patch introduces the OS specific functions, for flow create
and flow destroy operations.

In existing implementation, the functions to create objects
(flow/table/matcher) return a pointer to the created object.
The functions to destroy objects return 0 on success and errno on
failure.

The new OS specific functions to create objects return 0 on success,
and (-1) on failure.
On success, a pointer to the created object is returned using an
additional parameter.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
e4ed8de39b net/mlx5: add OS specific flow type selection
In current implementation the flow type (DV/Verbs) is selected
using dedicated function flow_get_drv_type().

This patch adds OS specific function mlx5_flow_os_get_type(), to
allow OS specific flow type selection.
The new function is called by flow_get_drv_type(), and if it returns a
valid value (DV/Verbs) no more logic is required.
Otherwise the existing logic is executed.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
17ad3af9f4 net/mlx5: add OS specific flow related utilities
This patch introduces the first OS specific utility functions,
for use by flow engine in different OS implementation.

The first utility functions are:
bool mlx5_flow_os_item_supported(item)
bool mlx5_flow_os_action_supported(action)

They are implemented to check OS specific support for different
item types and action types.

New header file is added:
drivers/net/mlx5/linux/mlx5_flow_os.h

This file contains the utility functions mentioned above for Linux OS.
At this stage they are implemented as static inline, for efficiency,
and always return true.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
6ad7cfaa66 net/mlx5: rename Verbs action to generic name
As part of the effort to support DPDK on Windows and other OS,
rename 'verbs_action' to the generic name 'action'.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Dekel Peled
341c894104 net/mlx5: rename Verbs flow to generic name
As part of the effort to support DPDK on Windows and other OS,
rename from IB related name to generic name.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-07-07 23:38:26 +02:00
Jerin Jacob
9c99878aa1 log: introduce logtype register macro
Introduce the RTE_LOG_REGISTER macro to avoid the code duplication
in the logtype registration process.

It is a wrapper macro for declaring the logtype, registering it and
setting its level in the constructor context.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
2020-07-03 15:52:51 +02:00
Michael Baum
0f006468c5 net/mlx5: fix iterator type in Rx queue management
The mlx5_check_vec_rx_support function in the mlx5_rxtx_vec.c file
iterates over the Rx queues array in a loop. Similarly, the
mlx5_mprq_enabled function in the mlx5_rxq.c file iterates over the
Rx queues array in a loop.

In both cases the loop iterator is called i and the variable holding
the array size is called rxqs_n.
The i variable is of type uint16_t while the rxqs_n variable is of
type unsigned int. The range of rxqs_n is much larger than the number
of iterations allowed by the type of i, so theoretically rxqs_n may
hold a value greater than what 16 bits can represent and the loop
would never end.

Change the type of i to uint32_t.
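
A small sketch of the overflow pattern described above (illustrative
only, not the driver's code):

    #include <stdint.h>

    /* With a uint16_t iterator, i would wrap to 0 before ever reaching a
     * rxqs_n value above UINT16_MAX, so the loop would never terminate. */
    static unsigned int
    count_queues(unsigned int rxqs_n)
    {
            uint32_t i;             /* was uint16_t before the fix */
            unsigned int n = 0;

            for (i = 0; i < rxqs_n; ++i)
                    n++;            /* stands in for the per-queue check */
            return n;
    }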

Fixes: 7d6bf6b866 ("net/mlx5: add Multi-Packet Rx support")
Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Michael Baum
36dabcea78 net/mlx5: use anonymous Direct Verbs allocator argument
The mlx5_dev_spawn function declares a variable of type struct
mlx5dv_ctx_allocators several hundred lines after the function
starts, and its only use is being passed as a parameter to the
mlx5_glue->dv_set_context_attr function.
However, according to the DPDK Coding Style guidelines, variables
should be declared at the start of a block of code rather than in the
middle.
Therefore, to follow the coding style, the value is passed directly
to the function without declaring a variable beforehand.
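
A generic sketch of the pattern under invented names (not the
rdma-core API): passing a compound literal avoids the one-use local
variable while keeping declarations out of the middle of the block.

    #include <stddef.h>

    struct allocators {
            void *(*alloc)(size_t size, void *data);
            void (*free)(void *ptr, void *data);
            void *data;
    };

    static int set_attr(const struct allocators *a) { return a != NULL ? 0 : -1; }

    static int
    configure(void *ctx, void *(*my_alloc)(size_t, void *),
              void (*my_free)(void *, void *))
    {
            /* One-shot argument: no mid-function variable declaration needed. */
            return set_attr(&(struct allocators){
                    .alloc = my_alloc,
                    .free = my_free,
                    .data = ctx,
            });
    }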

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Michael Baum
a294e58c80 net/mlx5: use direct API to find port by device
Using the RTE_ETH_FOREACH_DEV_OF loop is not necessary when the
driver only needs the first match.

Use rte_eth_find_next_of() to find it.

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Shiri Kuzin
0f0ae73a32 net/mlx5: add parameter for LACP packets control
The new devarg controls the steering of LACP traffic.
When setting dv_lacp_by_user = 0, the LACP traffic is steered to the
kernel and managed there.

When setting dv_lacp_by_user = 1, the LACP traffic is not steered and
the user needs to manage it.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Shiri Kuzin
3c78124f0a net/mlx5: add default miss action to flow engine
The new action is an internal mlx5 action that is implemented via
the rdma-core MLX5DV_FLOW_ACTION_DEFAULT_MISS action.

The default miss action will be used when a bond is
configured to allow traffic related to the bond to
be managed in the kernel.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Viacheslav Ovsiienko
420bbdae89 net/mlx5: fix host physical function representor naming
Newer kernels add names like "pf0" for the host PCI physical
function representor on BlueField SmartNIC hosts. This patch
provides correct HPF representor recognition for kernel versions 5.7
and later.

The following port naming formats are supported (a small recognition
sketch in C follows the list):

  - missing physical port name (no sysfs/netlink key) at all,
    master is assumed

  - decimal digits (for example "12"), representor is
    assumed, the value is the index of attached VF

  - "p" followed by decimal digits, for example "p2", master
    is assumed

  - "pf" followed by PF index, for example "pf0", Host PF
     representor is assumed on SmartNIC systems.

  - "pf" followed by PF index concatenated with "vf" followed by
     VF index, for example "pf0vf1", representor is assumed.
     If index of VF is "-1" it is a special case of Host PF
     representor, this representor must be indexed in devargs
     as 65535, for example representor=[0-3,65535] will
     allow representors for VF0, VF1, VF2, VF3 and for host PF.
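
A hypothetical recognition sketch for the formats above (simplified
and illustrative only, not the driver's parser):

    #include <stdio.h>

    enum port_kind { PORT_MASTER, PORT_VF_REP, PORT_HPF_REP, PORT_UNKNOWN };

    /* Order matters: try the most specific "pfXvfY" form first. */
    static enum port_kind
    classify_phys_port_name(const char *name, int *pf, int *vf)
    {
            if (name == NULL || name[0] == '\0')
                    return PORT_MASTER;                 /* missing name */
            if (sscanf(name, "pf%dvf%d", pf, vf) == 2)
                    return *vf == -1 ? PORT_HPF_REP : PORT_VF_REP;
            if (sscanf(name, "pf%d", pf) == 1)
                    return PORT_HPF_REP;                /* host PF representor */
            if (sscanf(name, "p%d", pf) == 1)
                    return PORT_MASTER;
            if (sscanf(name, "%d", vf) == 1)
                    return PORT_VF_REP;                 /* plain VF index */
            return PORT_UNKNOWN;
    }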

Fixes: 79aa430721 ("common/mlx5: split common file under Linux directory")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Ori Kam
262c7ad0dd common/mlx5: move doorbell record from net driver
The creation of the DBR (doorbell record) can be used by a number of
different Mellanox PMDs, for example RegEx / Net / vDPA.

This commit moves the DBR creation and release functions to the
common folder.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-30 14:52:30 +02:00
Ophir Munk
391b8bcc81 common/mlx5: move some getter functions from net driver
Getter functions such as: 'mlx5_os_get_ctx_device_name',
'mlx5_os_get_ctx_device_path', 'mlx5_os_get_dev_device_name',
'mlx5_os_get_umem_id' are implemented under the net directory. To
enable additional devices (e.g. regex, vdpa) to access these getter
functions, they are moved under the common directory.

As part of this commit, the string sizes DEV_SYSFS_NAME_MAX and
DEV_SYSFS_PATH_MAX are increased by 1 to make sure that the
destination string size in the strncpy() function is bigger than the
source string size.
This update avoids the GCC version 8 error -Werror=stringop-truncation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
ac79183dc6 net/mlx5: optimize free counter lookup
Currently, when allocating a new counter, the driver needs to loop
over the whole container pool list to get a free counter.

In the case where millions of counters are allocated and all the
pools are empty, allocating a new counter still loops over the whole
container pool list first and only then allocates a new pool to get a
free counter. The cycles spent on the pool list traversal are wasted.

Adding a global free counter list to the container helps to get free
counters more efficiently.
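
A conceptual sketch of such a free list, using sys/queue.h with
invented names (not the driver's actual structures):

    #include <stddef.h>
    #include <sys/queue.h>

    struct counter {
            TAILQ_ENTRY(counter) next;
            /* ... counter state ... */
    };

    struct container {
            TAILQ_HEAD(, counter) free_cnts;   /* global free counter list */
    };

    /* Allocation takes a free counter directly instead of scanning all pools. */
    static struct counter *
    counter_alloc(struct container *cont)
    {
            struct counter *cnt = TAILQ_FIRST(&cont->free_cnts);

            if (cnt != NULL)
                    TAILQ_REMOVE(&cont->free_cnts, cnt, next);
            return cnt;   /* NULL means a new pool must be allocated */
    }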

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:30 +02:00
Suanming Mou
b1cc226644 net/mlx5: optimize single counter pool search
For a single counter, when allocating a new counter, the driver needs
to find the pool it belongs to in order to do the query together.

Once millions of counters are allocated, the pool array in the
counter container becomes very large. In this case, searching for the
pool in the pool array becomes extremely slow.

Save the minimum and maximum counter IDs to allow a quick check of
the current counter ID range, and start searching from the last pool
in the container, which usually yields the needed pool since counter
IDs increase sequentially.
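
An illustrative sketch of the range check plus backward pool scan
(field names and the pool size are invented for the example):

    #include <stddef.h>
    #include <stdint.h>

    #define POOL_SIZE 512u   /* illustrative number of counters per pool */

    struct pool { uint32_t base_id; };

    struct container {
            uint32_t min_id, max_id;   /* ID range covered by existing pools */
            uint32_t n_pools;
            struct pool **pools;
    };

    static struct pool *
    find_pool(const struct container *c, uint32_t id)
    {
            uint32_t i;

            if (id < c->min_id || id > c->max_id)
                    return NULL;                   /* cannot be in any pool */
            /* Newest IDs live in the newest pools: scan backwards. */
            for (i = c->n_pools; i > 0; i--) {
                    struct pool *p = c->pools[i - 1];

                    if (id >= p->base_id && id < p->base_id + POOL_SIZE)
                            return p;
            }
            return NULL;
    }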

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
632f0f1905 net/mlx5: manage shared counters in three-level table
Currently, to check whether a shared counter with the same ID already
exists, the driver has to loop over the counter pools to search for
the counter. Even adding the counters to a list does not help much
when there are thousands of shared counters in the list.

Looking up the counter index saved in the relevant entry of a
three-level table is more efficient.

This patch uses the three-level table to save the counter index for
each shared counter ID. The next time the same ID comes, checking the
table entry of this ID returns the counter index directly. No search
is needed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-30 14:52:29 +02:00
Suanming Mou
bd81eaebd9 net/mlx5: add three-level table utility
For data associated with a sequentially increasing index, an array
table is more efficient than a hash table when a single entry has to
be found among large numbers of entries. A traditional hash table has
a fixed size, so when a huge amount of data is saved to it, many hash
conflicts occur.

A simple array table also has a fixed size, and allocating all the
needed memory at once wastes a lot of memory. When the exact number
of entries is unknown, it is impossible to size the array at all.

A multi-level table balances these two disadvantages. A global
high-level table with sub-table entries is allocated first, and a
sub-table is allocated only when the corresponding index entry needs
to be saved. For example, for an up to 32-bit index, a three-level
table with a 10-10-12 split and a sequentially increasing index grows
its memory usage with every 4K entries.

The current implementation introduces a three-level table with a
10-10-12 split of the 32-bit index to help the cases that have
millions of entries to save. An entry can be addressed directly by
its index; no search is needed.
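
A minimal sketch of the 10-10-12 split described above (structure
names are invented, not the driver's actual API):

    #include <stdint.h>
    #include <stdlib.h>

    struct l3t_leaf { void *entry[1 << 12]; };            /* low 12 bits    */
    struct l3t_mid  { struct l3t_leaf *leaf[1 << 10]; };  /* middle 10 bits */
    struct l3t      { struct l3t_mid *mid[1 << 10]; };    /* top 10 bits    */

    /* Sub-tables are allocated lazily; lookup/insert is direct addressing. */
    static int
    l3t_set(struct l3t *t, uint32_t idx, void *data)
    {
            uint32_t i = idx >> 22, j = (idx >> 12) & 0x3FF, k = idx & 0xFFF;

            if (t->mid[i] == NULL &&
                (t->mid[i] = calloc(1, sizeof(*t->mid[i]))) == NULL)
                    return -1;
            if (t->mid[i]->leaf[j] == NULL &&
                (t->mid[i]->leaf[j] = calloc(1, sizeof(*t->mid[i]->leaf[j]))) == NULL)
                    return -1;
            t->mid[i]->leaf[j]->entry[k] = data;
            return 0;
    }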

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-30 14:52:29 +02:00
David Marchand
63783b0172 net/mlx5: remove redundant newline from logs
The DRV_LOG macro already appends a newline.

Fixes: 46287eacc1 ("net/mlx5: introduce hash list")
Fixes: 860897d289 ("net/mlx5: reorganize flow tables with hash list")
Fixes: e484e40323 ("net/mlx5: optimize tag traversal with hash list")
Fixes: 6801116688 ("net/mlx5: fix multiple flow table hash list")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
2020-06-30 14:52:29 +02:00
Matan Azrad
aec086c9f1 common/mlx5: share kernel interface name getter
Some configurations of the mlx5 port are done via the kernel net
device associated with the IB device that represents the PCI device.

The DPDK mlx5 driver uses Linux system calls, for example ioctl, in
order to apply per-port configurations requested by the DPDK user.

One of the basic pieces of information required to access the correct
kernel net device is its name.

Move the function that gets the interface name from the IB device
path to the common library.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-06-30 14:52:29 +02:00
Ophir Munk
4f96d91396 net/mlx5/linux: add memory region callbacks to Verbs
Create a set of verbs callbacks in 'struct mlx5_verbs_ops'
and add MR operations to it (file net/mlx5/linux/mlx5_verbs.c).

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Ophir Munk
d5ed8aa944 net/mlx5: add memory region callbacks in per-device cache
Prior to this commit MR operations were verbs based and hard coded under
common/mlx5/linux directory. This commit enables upper layers (e.g.
net/mlx5) to determine which MR operations to use. For example the net
layer could set devx based MR operations in non-Linux environments. The
reg_mr and dereg_mr callbacks are added to the global per-device MR
cache 'struct mlx5_mr_share_cache'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-17 16:32:01 +02:00
Dong Zhou
eb10fe7fb1 net/mlx5: fix LRO checksum
The TCP checksum includes the IPv4 pseudo-header checksum and the L3
payload checksum, which covers the TCP header and TCP payload.
When mlx5 LRO is enabled, the HW calculates the TCP payload
checksum, and the PMD needs to complete the IPv4 pseudo-header
checksum and the TCP header checksum.

The mlx5_lro_update_tcp_hdr function completes the TCP header
checksum, but it uses the lower 4 bits of the data-offset field in
the TCP header to get the TCP header length, which causes a wrong
TCP header checksum calculation.

Update the code to use the upper 4 bits of the data-offset field
instead of the lower 4 bits.
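
An illustrative helper showing the corrected extraction, assuming
DPDK's struct rte_tcp_hdr where the upper 4 bits of data_off hold the
header length in 32-bit words:

    #include <stdint.h>
    #include <rte_tcp.h>

    /* TCP header length in bytes: upper 4 bits of data_off, in 32-bit words. */
    static inline uint8_t
    tcp_hdr_len(const struct rte_tcp_hdr *tcp)
    {
            return (uint8_t)((tcp->data_off >> 4) * 4);  /* not data_off & 0x0F */
    }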

Fixes: e4c2a16eb1 ("net/mlx5: handle LRO packets in Rx queue")
Cc: stable@dpdk.org

Signed-off-by: Dong Zhou <dongz@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
e891b54a9e net/mlx5: fix descriptors number adjustment
The number of descriptors to configure in a Rx/Tx queue is passed to
the mlx5_tx/rx_queue_pre_setup() function by value. That means any
adjustments of this variable are local and cannot affect the actual
value that is used to allocate mbufs in the mlx5_txq/rxq_new()
functions. Pass the number as a reference to actually update it.
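
A small sketch of the pass-by-reference change (the function name and
the adjustment rule are illustrative only):

    #include <stdint.h>

    #define MIN_DESC 64u   /* illustrative lower bound */

    /* The count is taken by pointer so the caller sees the adjusted value. */
    static void
    queue_pre_setup(uint16_t *desc)
    {
            if (*desc < MIN_DESC)
                    *desc = MIN_DESC;
    }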

Fixes: 6218063b39 ("net/mlx5: refactor Rx data path")
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
a23d96ae59 net/mlx5: do not select legacy MPW implicitly
The Legacy MPW (multi-packet write) should not be engaged implicitly.
We should exclude this function from a Tx burst routine selection
process unless it is requested specifically by setting the txq_mpw_en
devarg.  Exclude this function from the selection process the same way
it is done for the Enhanced MPW in the mlx5_select_tx_function()
routine.

Fixes: eb8121ab9d ("net/mlx5: introduce Tx burst routine template")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
73bf9235e9 net/mlx5: refactor statistics
mlx5 statistics are calculated by several methods:
1. In software, when packets go through the datapath.
2. By calling ioctl with an ETHTOOL command (Linux specific).
3. By reading counters from the SYSFS device path (Linux specific).

The Linux related functions are moved to file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
042f5c94fd net/mlx5: refactor device operations for Linux
There are three types of eth_dev_ops: primary, secondary and isolate.
Their function call assignments are moved from the common file
mlx5.c to the Linux specific file linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
1256805dd5 net/mlx5: move Linux-specific functions
File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for
functions which are Linux specific. Functions which are OS agnostic
remain in the mlx5_ethdev.c file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f484ffa1b1 net/mlx5: move socket files in Linux directory
The mlx5_socket.c file uses APIs which are Linux specific. Therefore,
move it (including mlx5_socket.h) from the net/mlx5 directory to the
net/mlx5/linux directory. This commit also updates the Makefile and
the meson files.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
9138989036 net/mlx5: rename ib in names
Renames in this commit:
mlx5_ibv_list -> mlx5_dev_ctx_list
mlx5_alloc_shared_ibctx -> mlx5_alloc_shared_dev_ctx
mlx5_free_shared_ibctx -> mlx5_free_shared_dev_ctx
mlx5_ibv_shared_port -> mlx5_dev_shared_port
ibv_port -> dev_port

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
21b7c452a6 net/mlx5: remove completion object dependency on DV
Replace 'struct mlx5dv_devx_cmd_comp *' with 'void *' in 'struct
mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Gregory Etelson
5c76123810 net/mlx5: fix flow memory allocation size
In a DV-enabled MLX5 PMD build, mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size
was initialized for the DV structure. If RTE initialization
encountered an MLX5 PCI function with DV support disabled,
mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW].size was reduced to match the
legacy Verbs flow size. Since mlx5_ipool_cfg[MLX5_IPOOL_MLX5_FLOW] is
a global variable, that change was reflected on DV-enabled MLX5 PCI
functions too.

Running a flow with an invalid ipool size crashes the PMD.

The patch adjusts the ipool flow size for each active PCI function.

Fixes: b88341ca35 ("net/mlx5: convert flow dev handle to indexed")
Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Dekel Peled
7842dfeacd net/mlx5: fix GTP mask definition location
A recent patch added the definition of the mask MLX5_GTP_FLAGS_MASK,
just above the function flow_dv_validate_item_gtp(), where it is used.

The patch was applied together with other patches which modified the
same file, so the mask ended up located further away from the
function it is used in.

This patch moves the mask definition to the proper location.

Fixes: 563ac307a4 ("net/mlx5: support match on GTP flags")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ali Alnubani
3efac8085e net/mlx5: fix typos in meter error messages
Fixes: 3bd26b23ce ("net/mlx5: support meter profile operations")
Cc: stable@dpdk.org

Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
834a9019ec net/mlx5: remove Verbs dependency in spawn struct
1. Replace 'struct ibv_device *' with 'void *' in 'struct
mlx5_dev_spawn_data'. Define a getter function to retrieve the
device name.
2. Rename ibv_dev and ibv_port as phys_dev and phys_port
respectively.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
10f3581dfd net/mlx5: add Linux-specific header file
The file drivers/net/mlx5/linux/mlx5_os.h is added. It includes
Linux-specific definitions such as PCI driver flags, link state
change interrupts, link removal interrupts, etc.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
2eb4d0107a net/mlx5: refactor PCI probing on Linux
Refactor PCI probing related code. Move Linux specific functions (as
well as verbs and dv related code) from mlx5.c file to linux/mlx5_os.c
file.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c7f6ba0e53 net/mlx5: remove umem field dependency on Direct Verbs
The umem field is used in several structs. Its type 'struct
mlx5dv_devx_umem *' is changed to 'void *'. This change allows
compilation on non-Linux OSes.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
e85f623e13 net/mlx5: remove attributes dependency on Verbs
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attributes that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new
API mlx5_os_get_dev_attr() which fills in the newly defined struct.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
c468501658 common/mlx5: remove protection domain dependency on Verbs
Replace 'struct ibv_pd *' with 'void *' in struct mlx5_ctx_shared and
all function calls in mlx5 PMD.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
f44b09f9e3 net/mlx5: add Linux-specific file with getter functions
'ctx' type (field in 'struct mlx5_ctx_shared') is changed from 'struct
ibv_context *' to 'void *'.  'ctx' members which are verbs dependent
(e.g. device_name) will be accessed through getter functions which are
added to a new file under Linux directory: linux/mlx5_os.c.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Ophir Munk
6e88bc42c7 net/mlx5: rename Verbs shared object
Replace all 'mlx5_ibv_shared' appearances with 'mlx5_dev_ctx_shared'.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-16 19:21:07 +02:00
Alexander Kozyrev
c9cc554ba4 net/mlx5: fix vectorized Rx burst termination
The maximum burst size of the vectorized Rx burst routine is set to
MLX5_VPMD_RX_MAX_BURST (64). This limits the performance of any
application that would like to gather more than 64 packets from a
single Rx burst for batch processing (i.e. VPP).

The situation gets worse with a mix of zipped and unzipped CQEs.
They are processed separately and the Rx burst function returns a
small number of packets on every call.

Repeat the cycle of gathering packets from the vectorized Rx routine
until the requested number of packets is collected or there are no
more CQEs left to process.

Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-03 17:20:32 +02:00
Suanming Mou
a1da6f624c net/mlx5: add reclaim memory mode
Currently, when a flow is destroyed, some memory resources may still
be kept cached to help create the next flow more efficiently.

Some systems may need the resources to be more flexible with flow
create and destroy. After peak time, with millions of flows
destroyed, such a system would prefer the resources to be reclaimed
completely, with no cache kept, so that the resources can be
allocated and used by other components. Such a system is not so
sensitive to the flow insertion rate, but cares more about the
resources.

Both the DPDK mlx5 PMD and the low-level component rdma-core allow
the flow resources to be configured as cached or not, but there is no
API or parameter exposed to the user to configure the flow resources
cache mode. In this case, introducing a new PMD devarg to let the
user configure the flow resources cache mode is helpful.

This commit adds a new "reclaim_mem_mode" devarg to let the user
configure whether the cache resources of destroyed flows should be
kept or not.

Three modes can be chosen:
1. 0 (none). The flow resources are cached as usual; caching helps
the flow insertion rate.
2. 1 (light). Only the DPDK PMD level resources reclaim is enabled.
3. 2 (aggressive). Both the DPDK PMD level and the rdma-core low
level are configured in reclaim mode.

With these three modes, the user can configure the resources cache
mode at different levels.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-06-03 17:19:26 +02:00
Ophir Munk
72f7566056 common/mlx5: move glue files under Linux directory
The glue file mlx5_glue.c is based on Linux specifics APIs.
Move it (including file mlx5_glue.h) to common/mlx5/linux directory.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-03 17:19:26 +02:00
Suanming Mou
33860cfab6 net/mlx5: fix interrupt installation timing
Currently, the DevX counter query works asynchronously and the DevX
interrupt handler returns the query result. When the port closes, the
interrupt handler is uninstalled and the DevX completion object is
also destroyed, while the query is still not cancelled.

In this case, the counter query may use the invalid DevX completion
object that has been destroyed, and a query failure with an invalid
FD is reported.

Adjust the shared interrupt install and uninstall timing so that the
asynchronous counter query stops before the interrupt handler is
uninstalled.

Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Suanming Mou
2786b7bf90 net/mlx5: fix secondary process resources release
When a secondary process starts, it allocates its own process private
data and also remaps the UAR registers of the Tx queues. Once the
secondary process exits, these resources should be released
accordingly, while the shared resources owned by the primary process
should not be touched.

Currently, once one port spawn fails in the secondary process, all
the other spawned ports are also released when the process exits.
However, the mlx5_dev_close() function does not handle the secondary
process case, which means calling mlx5_dev_close() directly in the
secondary process releases resources it should not touch.

Add the secondary process case to the mlx5_dev_close() function so
that it releases only its own resources and quits gracefully.

Fixes: 942d13e6e7 ("net/mlx5: fix sharing context destroy order")
Fixes: 3a8207423a ("net/mlx5: close all ports on remove")
Cc: stable@dpdk.org

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
01de93f245 net/mlx5: fix unreachable MPLS error path
The mlx5_flow_validate_item_mpls function checks MPLS item validation.
It first checks whether the device supports MPLS; this is done with an
ifdef condition so that, when support is missing, the code skips to
the endif and returns the appropriate error.

When MPLS is supported, the preprocessor keeps the body of the
function, ending with return 0, followed by the lines that report the
lack of MPLS support.
In fact, these lines are unreachable, because the function returns 0
before them, and in any case they are unnecessary.

Replace the endif with else and move the endif to the end of the
function.
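
A sketch of the restructuring (the macro name is an assumption used
only for illustration):

    #include <errno.h>

    static int
    validate_mpls_item(void)
    {
    #ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
            /* ... actual MPLS item validation would go here ... */
            return 0;
    #else
            /* Error path is now compiled only when support is missing. */
            return -ENOTSUP;
    #endif
    }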

Fixes: 23c1d42c71 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
c55ec83b58 net/mlx5: remove needless Tx queue initialization check
The mlx5_txq_obj_new function defines a pointer named txq_data and
assigns a value to it. After the assignment, the code is sure that
the variable does not point to NULL and even expresses this using an
assertion.

The function dereferences the pointer several times and at no point
changes its value. However, at the error label at the end of the
function, when it wants to free one of the fields of the structure
that txq_data points to, it checks again whether txq_data is invalid.
This check is unnecessary since txq_data is known to be valid.

Remove the aforementioned needless check.

Fixes: 6449068818 ("net/mlx5: add free on completion queue")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:24 +02:00
Michael Baum
50181d9965 net/mlx5: fix socket close
The mlx5_pmd_socket_handle function calls the accept function, which
returns the socket descriptor into the conn_sock variable. The socket
descriptor value can be 0 (according to the accept API) or positive,
so immediately after the call the function checks whether
conn_sock < 0. Later in the function, when other things fail, it
jumps to the error label and releases previously allocated resources
(such as the socket or a file).

During the resource release, it checks whether the conn_sock variable
holding the socket descriptor is positive and, if it is, releases it.
However, this check misses the case where conn_sock == 0; in this
case the socket is not released and there is a resource leak.

Extend the close condition to cover the 0 value too.
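
A sketch of the corrected cleanup condition (the variable name is
reused from the description above, the helper is illustrative):

    #include <unistd.h>

    /* accept() may legitimately return 0, so ">= 0" must be used on release. */
    static void
    release_conn(int conn_sock)
    {
            if (conn_sock >= 0)     /* was "> 0", leaking descriptor 0 */
                    close(conn_sock);
    }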

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
a943102fc6 net/mlx5: remove unnecessary init in socket creation
The mlx5_pmd_socket_handle function calls the recvmsg function,
which returns the number of bytes read. The function assigns this
return value to a ret variable defined at the beginning of the
function. Similarly, the mlx5_pmd_socket_init function calls the
socket function, which returns a file descriptor for the new socket.
This function also assigns the return value to a ret variable defined
at the beginning of the function.

Both functions initialize the variable when defining it; however, in
both cases the ret variable is not used before the return value from
the function is assigned to it, so the initialization is unnecessary.

Remove the aforementioned unnecessary initializations.

Fixes: e6cdc54cc0 ("net/mlx5: add socket server for external tools")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
ebed623f62 net/mlx5: fix hairpin Rx queue creation error path
The mlx5_rxq_obj_hairpin_new function defines a pointer named tmpl and
allocates memory for it using the rte_zmalloc_socket function.
Later, this function allocates memory to a variable inside tmpl using
the mlx5_devx_cmd_create_rq function.

In both cases, if the allocation fails, the code jumps to the error
label and frees the allocated resources. However, at the first jump
there are still no resources to free, and a jump just to reach the
return NULL line is unnecessary. Even worse, when it jumps to the
error label with an invalid tmpl, it actually dereferences a NULL
pointer.
In contrast, the second jump needs to free the tmpl variable, but
instead the function tries to free the variable that it just failed
to allocate.
In addition, on another error, the function returns NULL without
freeing the tmpl variable first, causing a memory leak.

Delete the error label and replace each jump with a local return NULL,
freeing the tmpl variable if needed.

Fixes: e79c9be915 ("net/mlx5: support Rx hairpin queues")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Michael Baum
7e6eba619d net/mlx5: fix hairpin Tx queue creation error path
The mlx5_txq_obj_hairpin_new function defines a pointer named tmpl and
allocates memory for it using the rte_zmalloc_socket function.
Later, this function allocates memory to a variable inside tmpl using
the mlx5_devx_cmd_create_sq function.

In both cases, if the allocation fails, the code jumps to the error
label and frees the allocated resources. However, at the first jump
there are still no resources to free, and a jump just to reach the
return NULL line is unnecessary. Even worse, when it jumps to the
error label with an invalid tmpl, it actually dereferences a NULL
pointer.
In contrast, the second jump needs to free the tmpl variable, but
instead the function tries to free the variable that it just failed
to allocate, and another variable that has never been allocated.
In addition, on another error, the function returns NULL without
freeing the tmpl variable first, causing a memory leak.

Delete the error label and replace each jump with a local return NULL,
freeing the tmpl variable if needed.

Fixes: ae18a1ae96 ("net/mlx5: support Tx hairpin queues")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-06-02 16:06:23 +02:00
Dekel Peled
b0447b5470 net/mlx5: revert DevX preference for Rx objects
A recent patch exposed a minor performance issue, so it is reverted.

Fixes: d237d22fbe ("net/mlx5: prefer DevX API to create Rx objects")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-21 17:59:29 +02:00
Muhammad Bilal
5a448a55b4 fix same typo in multiple places
Removed the typing error in doc/guides/eventdevs/index.rst,
drivers/net/mlx5/mlx5.c and in lib/librte_vhost/rte_vhost.h

Bugzilla ID: 477
Fixes: 0857b94211 ("doc: add event device and software eventdev")
Fixes: 039253166a ("vhost: add device op when notification to guest is sent")
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Cc: stable@dpdk.org

Signed-off-by: Muhammad Bilal <m.bilal@emumba.com>
2020-05-19 15:55:57 +02:00
Bing Zhao
29e091cefe net/mlx5: fix port action resource initialization
After the memory optimization, the organization of some resources was
changed from a pointer-based LIST to the index-based ILIST. A lot of
code parts were touched by this change.
Some static code checking and analysis tools will complain and raise
a false warning about the use of an uninitialized value. E.g. in the
port action registration function, a stack variable with some
uninitialized fields is used as the right-hand value to initialize a
variable allocated from the heap. But it is not really an error,
because all the fields set from the uninitialized values are
overwritten in the following code and macros; all these fields are
used explicitly as left-hand values.
It makes no sense to clear the stack variable to 0 in this case, and
the extra memset would introduce some cycles of overhead. The false
warning from the tool, if any, just needs to be ignored.

Fixes: f3faf9ea11 ("net/mlx5: convert port id action to indexed")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Bing Zhao
d3b61f4b7c net/mlx5: fix port action assert timing
After the memory optimization, some action object handles were
changed to indexes to save overhead. An assertion in debug mode is
helpful for troubleshooting.
In the current implementation, only one port action is supported in
switchdev mode for one device flow. In debug mode, an assertion is
used to check if the port action is none, and it should be located
before the port action resource registration, not after it. The
action index in the handle should be 0 before registration.
Otherwise it will always cause a failure, because the port action is
registered and the index is not 0.

Fixes: f3faf9ea11 ("net/mlx5: convert port id action to indexed")

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Reviewed-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Suanming Mou
a95decbc9d net/mlx5: fix shared flow counter lookup
Currently, the shared counter search uses the wrong nested index,
namely the one used for the pool index. Using the incorrect nested
index makes the search go to an incorrect counter pool that does not
exist.

Add the counter index to fix the incorrect nested index use.

Fixes: 4001d7ad26 ("net/mlx5: change Direct Verbs counter to indexed")

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Bing Zhao
25a59a3076 net/mlx5: fix doorbell bitmap management offsets
The doorbell records are organized in pages with a bitmap. When a new
doorbell needs to be associated with a queue, a bit is set in the
bitmap to indicate that the corresponding doorbell is occupied. A
counter records the number of doorbells occupied, to speed up the
search.
If the number reaches the pre-defined maximum for a page, a new page
is allocated. If not, the bitmap is checked to find a free one.
The LSHIFT and OR (AND NOT) operations are used to update the bitmap
of a page. But the constant 1 is treated as a signed integer by the
compiler.
When the shift count is 31, the shifted value is considered negative.
A wrong sign extension is then done when assigning it to a 64-bit
variable: all the upper 32 bits are set to 1.
A wrong offset value is then calculated because of this. The next
64 bits are also treated as part of the bitmap and get corrupted by
the bit-set operation.
The immediate value 1 needs to be used explicitly as a 64-bit value.
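
A tiny sketch of the width fix (illustrative only, not the driver's
code):

    #include <stdint.h>

    /* "1 << 31" is a signed int and sign-extends to 64 bits; force a 64-bit
     * immediate instead so only the intended bit is set. */
    static inline uint64_t
    dbr_bit(unsigned int offset)    /* 0..63 within one bitmap word */
    {
            return UINT64_C(1) << offset;
    }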

Fixes: 21cae8580f ("net/mlx5: allocate door-bells via DevX")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Suanming Mou
d71d5b949c net/mlx5: fix Verbs counter pool allocation
When creating Verbs flows with a counter, a random SIGSEGV may also
occur. The reason is that the counter pool memory is not sufficiently
allocated and correctly initialized in the Verbs case.

As the mlx5_flow_counter array member was moved out of the counter
pool struct, the counter pool memory layout now implicitly contains
mlx5_flow_counter, mlx5_age_param (if the pool is an age pool) and
mlx5_flow_counter_ext (if the pool is a non-batch pool). When
allocating the pool memory, the pool size should be calculated based
on the pool type accordingly.

Currently, for the Verbs counter pool, both mlx5_flow_counter and
mlx5_flow_counter_ext need to be taken into account in the pool size,
and the pool type should also be initialized as CNT_POOL_TYPE_EXT.

This patch adds the missing size and type for the Verbs counter pool.

Fixes: 8d93c830e4 ("net/mlx5: modify ext-counter memory allocation")

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Dekel Peled
ff55182ce3 net/mlx5: fix VLAN flow action with wildcard VLAN item
A previous patch added support for a VLAN item without a VLAN ID
value, i.e. a wildcard VLAN item, to match VLAN traffic with any VLAN
ID.
The implication for VLAN actions was not taken into consideration.
VLAN actions (e.g. push VLAN) use the VLAN ID value from the VLAN
item and expect it to be valid.

This patch updates the function flow_dev_get_vlan_info_from_items()
to check the VLAN item contents before trying to use them.

Fixes: 92818d839e ("net/mlx5: fix match on empty VLAN item in DV mode")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Matan Azrad
5af61440dd net/mlx5: fix flow counter container resize
The design of the counter container resize used a double-buffer
algorithm in order to synchronize between the query thread and the
control thread.
When the control thread detected a resize need, it created a new,
bigger buffer for the counter pools in a new container and changed
the container index atomically.
In case the query thread had not detected the previous resize before
the control thread detected the need for a new one, the control
thread returned EAGAIN to a flow creation call that used a COUNT
action.

The rte_flow API doesn't allow non-blocking commands and doesn't
expect to get an EAGAIN error type.

So, when a lot of flows were created between 2 different periodic
queries, 2 different resizes might be attempted and cause an EAGAIN
error.
This behavior may fail flow creations.

Change the synchronization to use a lock instead of the double-buffer
algorithm.

The critical section of this lock is very small, so the flow
insertion rate should not decrease.

Fixes: ebbac312e4 ("net/mlx5: resize a full counter container")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-05-18 20:35:57 +02:00
Shiri Kuzin
4c204fe5e5 common/mlx5: disable relaxed ordering in unsuitable CPUs
Relaxed ordering is a PCI optimization that enables reordering of
reads/writes in order to improve performance.

Relaxed ordering was enabled for all processors, causing a
performance degradation on Haswell and Broadwell processors, which
don't support this optimization.

In order to avoid that, we check whether the processor is Haswell or
Broadwell and, if so, disable relaxed ordering.

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:57 +02:00
Shiri Kuzin
ffd5b302ba common/mlx5: fix relaxed ordering count object
In order to improve performance, relaxed ordering was enabled when
creating the count object using DevX.

Currently this optimization is enabled by default when using DevX.

This causes an issue when using firmware that does not have this
capability, causing a count object failure.

In order to fix this issue, a check of the firmware capabilities is
added before enabling relaxed ordering.

Fixes: 53ac93f71a ("net/mlx5: create relaxed ordering memory regions")

Signed-off-by: Shiri Kuzin <shirik@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:56 +02:00
Dekel Peled
d237d22fbe net/mlx5: prefer DevX API to create Rx objects
Currently, DevX API is used to create Rx objects (RQ, RQT, TIR) only
if LRO or hairpin features are enabled on this RQ.

This patch uses DevX API by default, if DevX is supported and can be
used. Otherwise, Verbs API is used.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-18 20:35:56 +02:00
Dekel Peled
563ac307a4 net/mlx5: support match on GTP flags
This patch adds to the MLX5 PMD support for matching on the GTP
header item v_pt_rsv_flags.

This field is contained in 1 byte with the following format:
-------------------------------------------
| bit   | 0 - 2   | 3  | 4   | 5 | 6 | 7  |
|-----------------------------------------|
| value | Version | PT | Res | E | S | PN |
-------------------------------------------

Matching is supported only for GTP flags E, S, PN.
Therefore values 0 to 7 are supported.

Mask must be set accordingly:
... gtp v_pt_rsv_flags is 1 v_pt_rsv_flags mask 0x07 ...

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Alexander Kozyrev
776aec28fc net/mlx5: fix Tx queue release debug log timing
Program received signal SIGSEGV, Segmentation fault.
0x00000000008ef7c4 in mlx5_tx_queue_release (dpdk_txq=0x17ce01680) at
drivers/net/mlx5/mlx5_txq.c:302
301 mlx5_txq_release(ETH_DEV(priv), i);
302 DRV_LOG(DEBUG, "port %u removing Tx queue %u from list",
303         PORT_ID(priv), txq->idx);
The problem is that txq is freed inside the mlx5_txq_release() function
and is no longer valid in the debug log right after this invocation.
Move the debug log before the mlx5_txq_release() call to fix this.

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Michael Baum
c8f0abe7f8 net/mlx5: fix meter color register consideration
The mlx5_flow_get_reg_id() function translates tag ID to register
from the registers that are supported and available for use. The
user does not know which register is available at a time and therefore
there is an array that represents mapping to the available registers.
Usually the free registers are contiguous in the flow_mreg_c array,
but sometimes the mtr_color_reg register is between them and must be
skipped and the next register returned, in which case the function
returns the mapping of the next entry in the array.

When the function reads the next entry in the array, it does not
check whether such an entry exists, and in some situations an invalid
memory access occurs beyond the array boundaries.

So, when all the registers are valid from the HW perspective and the
meter color register is not the default one, tag ID 5 causes an
out-of-bounds access.

Validate registers availability when meter color register is not the
default.
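
An illustrative bounds check of the kind described (the array name,
its size and the helper are assumptions for the example):

    #define MREG_C_NUM 8   /* illustrative size of the available-register map */

    /* Return the next mapped register only if such an entry really exists. */
    static int
    next_available_reg(const int *flow_mreg_c, unsigned int idx)
    {
            if (idx + 1 >= MREG_C_NUM)
                    return -1;     /* report an error instead of reading past the end */
            return flow_mreg_c[idx + 1];
    }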

Coverity issue: 146355
Fixes: 792e749e92 ("net/mlx5: fix register usage in meter")
Cc: stable@dpdk.org

Signed-off-by: Michael Baum <michaelba@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00
Raslan Darawsheh
8a2e026add net/mlx5: fix matching for UDP tunnels with Verbs
When creating a flow rule with zero specs, it causes matching of all
UDP packets, for example:
 eth / ipv4 / udp / vxlan / end
Such a rule will match all UDP packets.

This changes the behavior to match the DV flow engine, which
automatically sets the match on the relevant outer UDP port if the
user didn't specify any.

Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-05-11 22:27:39 +02:00