The associated device index is retrieved via a Netlink request to the
underlying Infiniband device driver. This network device index
is permanent throughout the lifetime of the device. We do not
spawn rte_eth_dev ports without an associated network device, and
if the network device is being unbound we get the remove notification
message and the rte_eth_dev port is also detached. So we may store
the ifindex in the mlx5_device_spawn() routine at rte_eth_dev port
creation and initialization time and use the cached value afterwards
instead of issuing an actual Netlink request.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch fills the tx_desc_lim.nb_seg_max and
tx_desc_lim.nb_mtu_seg_max fields of the rte_eth_dev_info
structure to report the maximal number of packet
segments. The requested inline data configuration is
taken into account in a conservative way.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch adds the implementation of the tx_burst routine template.
The template supports all Tx offloads, and multiple optimized
tx_burst routines can be generated by the compiler from this one.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Mellanox NICs support a wide set of Tx offloads. The supported
offloads are reported by the mlx5 PMD in the rte_eth_dev_info
tx_offload_capa field.
An application may choose any combination of supported offloads
and configure the device appropriately. Some of the Tx offloads may
not be requested by the application, or even all of them may be omitted.
Most of the Tx offloads require some code branches in the tx_burst
routine to support them. If a Tx offload is not requested, the tx_burst
routine code may be significantly simplified and consume fewer CPU cycles.
For example, if the application does not engage the TSO offload, this code
can be omitted; if multi-segment packets are not expected, tx_burst
may assume single-mbuf packets only, etc.
Currently, the mlx5 PMD implements multiple tx_burst subroutines
for the most common combinations of requested Tx offloads, and each
branch has its own dedicated implementation. Such code is not easy to
update, support and develop, because every change has to be repeated
in multiple branches. Also, many frequently requested
offload combinations are not supported yet. That leads to selecting a
tx_burst routine that does not match completely, which harms performance.
This patch introduces a new approach for the tx_burst code. It is proposed
to develop a unified template for the tx_burst routine, which supports
all the Tx offloads and takes a compile-time defined parameter
describing the expected set of supported offloads. Based on
this template, the compiler is able to generate multiple tx_burst
routines highly optimized for the statically specified set of
Tx offloads.
Next, at runtime, the best matching optimized implementation of
tx_burst is chosen at Tx queue configuration.
This patch intentionally omits the template internal implementation
and just introduces the template itself to illustrate the approach of
multiple specially tuned tx_burst routines.
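The approach can be sketched as follows. This is a minimal illustration
and not the PMD code: all demo_* names and DEMO_TXOFF_* flags are
hypothetical, the packet structure is a stand-in for an mbuf, and the
always_inline attribute assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical offload flags; the real PMD uses its own definitions. */
#define DEMO_TXOFF_MULTI (1u << 0) /* multi-segment packets */
#define DEMO_TXOFF_TSO   (1u << 1) /* TCP segmentation offload */
#define DEMO_TXOFF_CSUM  (1u << 2) /* checksum offload */

/* Stand-in for an mbuf, reduced to what the sketch needs. */
struct demo_pkt {
	uint16_t nb_segs; /* number of segments in the packet */
	uint8_t tso;      /* packet requests TSO */
	uint8_t csum;     /* packet requests checksum offload */
};

/*
 * Unified template. 'olx' is a compile-time constant at every call site,
 * so after inlining the compiler can drop the branches of offloads that
 * are not part of 'olx'. Returns a bitmask of the code paths taken, only
 * to make the behaviour observable.
 */
static inline __attribute__((always_inline)) unsigned int
demo_tx_burst_tmpl(const struct demo_pkt *pkts, uint16_t n, unsigned int olx)
{
	unsigned int paths = 0;
	uint16_t i;

	for (i = 0; i < n; i++) {
		if ((olx & DEMO_TXOFF_MULTI) && pkts[i].nb_segs > 1)
			paths |= DEMO_TXOFF_MULTI;
		if ((olx & DEMO_TXOFF_TSO) && pkts[i].tso)
			paths |= DEMO_TXOFF_TSO;
		if ((olx & DEMO_TXOFF_CSUM) && pkts[i].csum)
			paths |= DEMO_TXOFF_CSUM;
	}
	return paths;
}

/* Generate one specialized routine per statically specified offload set. */
#define DEMO_TXOFF_DECL(name, olx) \
static unsigned int \
demo_tx_burst_##name(const struct demo_pkt *pkts, uint16_t n) \
{ \
	return demo_tx_burst_tmpl(pkts, n, (olx)); \
}

DEMO_TXOFF_DECL(none, 0)
DEMO_TXOFF_DECL(full, DEMO_TXOFF_MULTI | DEMO_TXOFF_TSO | DEMO_TXOFF_CSUM)

typedef unsigned int (*demo_tx_burst_t)(const struct demo_pkt *, uint16_t);

/* Pick a matching routine once, at queue configuration time. */
static demo_tx_burst_t
demo_select_tx_burst(unsigned int requested)
{
	return requested == 0 ? demo_tx_burst_none : demo_tx_burst_full;
}
```

With only two generated routines the selection is trivial; a real table
would hold more offload combinations and pick the closest superset.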
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch updates the Tx datapath control and configuration
structures and code for managing Tx datapath settings.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch extends the NIC attributes query via DevX.
The appropriate interface structures are borrowed from
kernel driver headers and DevX calls are added to
mlx5_devx_cmd_query_hca_attr() routine.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch updates Tx datapath definitions, mostly hardware related.
The Tx descriptor structures are redefined with the required fields,
and size definitions are renamed to reflect their meaning more
accurately. This is a preparation step before introducing
the new Tx datapath implementation.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch introduces new mlx5 PMD devarg options:
- txq_inline_min - specifies the minimal amount of data to be inlined into
the WQE during Tx operations. NICs may require this minimal data amount
to operate correctly. The exact value may depend on the NIC operation
mode, requested offloads, etc.
- txq_inline_max - specifies the maximal packet length to be completely
inlined into the WQE Ethernet Segment for the ordinary SEND method. If the
packet is larger than the specified value, the packet data won't be copied
by the driver at all and the data buffer is addressed with a pointer. If
the packet length is less than or equal to the specified value, all packet
data will be copied into the WQE.
- txq_inline_mpw - specifies the maximal packet length to be completely
inlined into the WQE for the Enhanced MPW method.
Driver documentation is also updated.
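The txq_inline_max rule described above can be sketched as follows. This
is an illustration of the stated threshold only: the function and enum
names are hypothetical, and txq_inline_min and other constraints of the
real datapath are ignored.

```c
#include <stdint.h>

enum demo_tx_data_mode {
	DEMO_TX_DATA_INLINE,  /* whole packet is copied into the WQE */
	DEMO_TX_DATA_POINTER, /* WQE references the data buffer by pointer */
};

/* Ordinary SEND method: inline only if the packet fits txq_inline_max. */
static enum demo_tx_data_mode
demo_tx_select_mode(uint32_t pkt_len, uint32_t txq_inline_max)
{
	return pkt_len <= txq_inline_max ?
	       DEMO_TX_DATA_INLINE : DEMO_TX_DATA_POINTER;
}
```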
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes the existing Tx datapath code
as a preparation step before introducing the new
implementation. The following entities are being
removed:
- deprecated devargs support
- tx_burst() routines
- related PRM definitions
- SQ configuration code
- Tx routine selection code
- incompatible Tx completion code
The following devargs are deprecated and ignored or converted:
- "txq_inline" is going to be converted to "txq_inline_max"
for compatibility
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
to "txq_inline_mpw" for compatibility
The deprecated devarg keys are recognized by the PMD
and ignored or converted to the new ones in order not
to block device probing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
In functions flow_dv_translate() and flow_dv_validate(), the flow
items are scanned and each item is marked in item_flags bitmap.
The code handling some of the items was ported from another project,
where items are marked in a slightly different manner.
This patch fixes the setting of items in bitmap, adapting it to the
required manner.
Fixes: d53aa89aea ("net/mlx5: support matching on ICMP/ICMP6")
Fixes: 5865955ad994 ("net/mlx5: match GRE key and present bits")
Fixes: 2e4c987aad ("net/mlx5: validate Direct Rule E-Switch")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Xiaoyu Min <jackmin@mellanox.com>
In case the asynchronous DevX commands are not supported in RDMA core,
fall back to basic counter management.
Here, the PMD counters cache is redundant and the host thread doesn't
update it. Hence, each counter operation will go to the FW and the
acceleration is lost.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
All the DV counters are cached in the PMD memory and are contained in
pools, which are contained in containers according to the counters
allocation type - batch or single.
Currently, the flow counter query is done synchronously in pool
resolution, which means that on a user request a FW command is triggered
to read all the counters in the pool.
A new DevX feature to asynchronously read a batch of flow counters
allows accelerating the user query operation.
Using the DPDK host thread, the PMD periodically triggers an asynchronous
query in pool resolution for all the counter pools, and an interrupt is
triggered by the FW when the values are updated.
In the interrupt handler the raw data of the pool counter values is
replaced using a double buffer algorithm (very fast).
In the user query, the PMD just returns the last query values from the
PMD cache - no system calls and no FW commands are triggered from the user
control thread on a query operation!
More synchronization is added with the host thread:
Container resize uses double buffer algorithm.
Pools growing in container uses atomic operation.
Pool query buffer replace uses a spinlock.
Pool minimum devX counter ID uses atomic operation.
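The double buffer replacement under a spinlock can be sketched as
follows. This is a simplified illustration, not the PMD code: all demo_*
names are hypothetical, a C11 atomic_flag stands in for the spinlock, and
the pool holds 4 counters to keep the example short.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DEMO_COUNTERS_PER_POOL 4

struct demo_raw {
	uint64_t hits[DEMO_COUNTERS_PER_POOL];
};

struct demo_pool {
	atomic_flag lock;           /* stand-in for the pool spinlock */
	struct demo_raw bufs[2];    /* the two raw data buffers */
	struct demo_raw *ready;     /* last completed query, read by users */
	struct demo_raw *in_flight; /* buffer filled by the async query */
};

static void
demo_pool_init(struct demo_pool *p)
{
	memset(p->bufs, 0, sizeof(p->bufs));
	p->ready = &p->bufs[0];
	p->in_flight = &p->bufs[1];
	atomic_flag_clear(&p->lock);
}

static void
demo_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag is released */
}

static void
demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Completion handler: only two pointers are swapped, no data is copied. */
static void
demo_query_done(struct demo_pool *p)
{
	struct demo_raw *tmp;

	demo_lock(&p->lock);
	tmp = p->ready;
	p->ready = p->in_flight;
	p->in_flight = tmp;
	demo_unlock(&p->lock);
}

/* User query: served from the cached buffer, no FW command involved. */
static uint64_t
demo_counter_read(struct demo_pool *p, unsigned int idx)
{
	uint64_t v;

	demo_lock(&p->lock);
	v = p->ready->hits[idx];
	demo_unlock(&p->lock);
	return v;
}
```

The user sees the previous values until the completion handler swaps the
buffers, so a read never observes a half-updated pool.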
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When the counter container has no more space to store more counter
pools, try to resize the container to allow more pools to be created.
So, the only limitation on the maximum number of counters is memory.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The DevX interface exposes a new feature to the PMD that can allocate a
batch of counters by one FW command. It can improve the flow
transaction rate (with count action).
Add a new counter pools mechanism to manage HW counters in the PMD.
So, for each flow created with a counter, the PMD will try to find a free
counter in the PMD pools container, and only if there is no free
counter will it allocate a new DevX batch of counters.
Currently we cannot support batch counters for a group 0 flow, so
create 2 container types: one which allocates counters one by
one and one which allocates X counters by the batch feature.
The allocated counter objects are never released back to the HW,
assuming the maximum number of flows will be close to the actual
number of flows.
Later, this can be updated and a dynamic release mechanism can be added.
The counters are contained in pools, each pool with 512 counters.
The pools are contained in counter containers according to the
allocation resolution type - single or batch.
The cache memory of the counters statistics is saved as raw data per
pool.
All the raw data memory for the whole container is allocated in one
memory allocation and is managed by the counter_stats_mem_mng structure,
which registers all the raw memory to the HW.
Each pool points to one raw data structure.
The query operation is in pool resolution which updates all the pool
counter raw data by one operation.
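The "reuse a free counter first, allocate a new batch only when needed"
policy can be sketched as follows. This is an illustration under stated
assumptions, not the PMD code: all demo_* names are hypothetical, malloc()
stands in for the DevX batch allocation command, and counter IDs are
simply consecutive integers.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_COUNTERS_PER_POOL 512 /* pool size stated above */

struct demo_pool {
	struct demo_pool *next;
	uint32_t base_id; /* ID of the first counter of the batch */
	uint16_t n_free;  /* number of entries in free_idx[] */
	uint16_t free_idx[DEMO_COUNTERS_PER_POOL]; /* stack of free slots */
};

struct demo_container {
	struct demo_pool *pools;
	uint32_t n_pools;
};

/* Stand-in for one batch allocation; the pool is never released. */
static struct demo_pool *
demo_pool_create(struct demo_container *c)
{
	struct demo_pool *p = malloc(sizeof(*p));
	uint16_t i;

	if (p == NULL)
		return NULL;
	p->base_id = c->n_pools * DEMO_COUNTERS_PER_POOL;
	p->n_free = DEMO_COUNTERS_PER_POOL;
	for (i = 0; i < DEMO_COUNTERS_PER_POOL; i++)
		p->free_idx[i] = (uint16_t)(DEMO_COUNTERS_PER_POOL - 1 - i);
	p->next = c->pools;
	c->pools = p;
	c->n_pools++;
	return p;
}

/* Returns a counter ID, or -1 if a new pool could not be allocated. */
static int64_t
demo_counter_alloc(struct demo_container *c)
{
	struct demo_pool *p;

	for (p = c->pools; p != NULL; p = p->next)
		if (p->n_free > 0)
			break;
	if (p == NULL) {
		p = demo_pool_create(c);
		if (p == NULL)
			return -1;
	}
	return (int64_t)p->base_id + p->free_idx[--p->n_free];
}

/* Return a counter to its pool; the pool itself stays allocated. */
static void
demo_counter_free(struct demo_container *c, uint32_t id)
{
	struct demo_pool *p;

	for (p = c->pools; p != NULL; p = p->next) {
		if (id - p->base_id < DEMO_COUNTERS_PER_POOL) {
			p->free_idx[p->n_free++] = (uint16_t)(id - p->base_id);
			return;
		}
	}
}
```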
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Enable IP-in-IP tunnel type support on the DV/DR flow engine.
This includes the following combinations:
- IPv4 over IPv4
- IPv4 over IPv6
- IPv6 over IPv4
- IPv6 over IPv6
MLX5 NICs support IP-in-IP tunnels via the FLEX Parser, so
the FW must be configured to use FLEX Parser profile 0:
mlxconfig -d <mst device> -y set FLEX_PARSER_PROFILE_ENABLE=0
The example testpmd commands would be:
- Match on IPv4 over IPv4 packets and do inner RSS:
testpmd> flow create 0 ingress pattern eth / ipv4 proto is 0x04 /
ipv4 / udp / end actions rss level 2 queues 0 1 2 3 end / end
- Match on IPv6 over IPv4 packets and do inner RSS:
testpmd> flow create 0 ingress pattern eth / ipv4 proto is 0x29 /
ipv6 / udp / end actions rss level 2 queues 0 1 2 3 end / end
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Support matching on the present bits (C,K,S)
as well as the optional key field.
If rte_flow_item_gre_key is specified in the pattern,
the match on the K present bit is set automatically.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The DR engine supports matching on the GRE protocol field without MPLS
support, so bypass the MPLS check when DR is enabled.
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and was used to
let DPDK processes run as non-root (such processes do not have access to
physical addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.
We can check that a driver requirement wrt IOVA mode is fulfilled
before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
On the DV/DR flow engine, MLX5 can match on the ICMP/ICMP6 code and type
fields via the FLEX Parser, which can be enabled by configuring the FW to
use FLEX Parser profile 2:
mlxconfig -d <mst device> -y set FLEX_PARSER_PROFILE_ENABLE=2
The testpmd commands could be:
testpmd> flow create 0 ingress pattern eth / ipv4 /
icmp type is 8 code is 0 / end
actions rss queues 0 1 end / end
testpmd> flow create 0 ingress pattern eth / ipv6 /
icmp6 type is 128 code is 0 / end
actions rss queues 0 1 end / end
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Mellanox NICs do not support UDP checksum hardware Tx offload over IPv6.
This limitation becomes critical for UDP based tunnels like VXLAN.
Although IPv6 requires a valid UDP checksum, there is an option
in Linux to accept a zero UDP checksum (see udp6zerocsumrx in the
iproute2 package).
This patch zeroes out the UDP checksum field for encapsulation headers
in the raw encap action.
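The zeroing step can be sketched as follows. This is a simplified
illustration, not the PMD code: the function name and DEMO_* constants are
hypothetical, and the sketch assumes an untagged Ethernet header (no VLAN)
followed by a fixed IPv6 header with no extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN       14     /* untagged Ethernet header */
#define DEMO_IPV6_HLEN      40     /* fixed IPv6 header */
#define DEMO_UDP_HLEN       8
#define DEMO_ETHERTYPE_IPV6 0x86DD
#define DEMO_IPPROTO_UDP    17

/*
 * Zero the UDP checksum of an Ethernet/IPv6/UDP encapsulation header.
 * Returns 1 if the checksum was zeroed, 0 if the header is not
 * IPv6/UDP, and -1 if the buffer is too short.
 */
static int
demo_zero_encap_udp_csum(uint8_t *buf, size_t len)
{
	uint16_t ether_type;
	uint8_t *ipv6;
	uint8_t *udp;

	if (len < DEMO_ETH_HLEN)
		return -1;
	/* EtherType is stored in network byte order at offset 12. */
	ether_type = (uint16_t)((buf[12] << 8) | buf[13]);
	if (ether_type != DEMO_ETHERTYPE_IPV6)
		return 0;
	if (len < DEMO_ETH_HLEN + DEMO_IPV6_HLEN)
		return -1;
	ipv6 = buf + DEMO_ETH_HLEN;
	/* The Next Header field is byte 6 of the IPv6 header. */
	if (ipv6[6] != DEMO_IPPROTO_UDP)
		return 0;
	if (len < DEMO_ETH_HLEN + DEMO_IPV6_HLEN + DEMO_UDP_HLEN)
		return -1;
	udp = ipv6 + DEMO_IPV6_HLEN;
	/* The checksum occupies bytes 6 and 7 of the UDP header. */
	udp[6] = 0;
	udp[7] = 0;
	return 1;
}
```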
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Currently mlx4/mlx5 support only Linux.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The rte_calloc function returns a non-null pointer on
success and a null pointer on failure.
The return value should be checked and the function flow
should take that into consideration.
This patch adds a check for the rte_calloc return value in function
flow_list_create.
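The kind of check meant here can be sketched as follows. This is an
illustration only: the structure and function names are hypothetical, and
the allocator is passed in as a calloc()-like function pointer so that the
failure path can be exercised without rte_calloc.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* calloc()-like allocator, a stand-in for rte_calloc in this sketch. */
typedef void *(*demo_calloc_t)(size_t num, size_t size);

/* Allocator that always fails, used to exercise the error path. */
static void *
demo_calloc_fail(size_t num, size_t size)
{
	(void)num;
	(void)size;
	return NULL;
}

struct demo_flow {
	size_t n_actions;
	int *actions;
};

/* Returns 0 on success, -ENOMEM if the allocation failed. */
static int
demo_flow_create(struct demo_flow *flow, size_t n_actions,
		 demo_calloc_t alloc)
{
	flow->actions = alloc(n_actions, sizeof(*flow->actions));
	if (flow->actions == NULL) {
		/* Do not touch the array, report the failure instead. */
		flow->n_actions = 0;
		return -ENOMEM;
	}
	flow->n_actions = n_actions;
	return 0;
}
```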
Fixes: 84c406e745 ("net/mlx5: add flow translate function")
Cc: stable@dpdk.org
Signed-off-by: Asaf Penso <asafp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
mlx5_link_update uses the newer ethtool command
ETHTOOL_GLINKSETTINGS to determine interface capabilities but falls
back to the older (deprecated) ETHTOOL_GSET command if the new
method fails for any reason.
The older method only supports reporting of capabilities up to 40G.
However, mlx5_link_update_unlocked_gs can return a failure for a
number of reasons (including the link being down).
Using the older method in cases of transient failure of the newer method
can result in reporting of reduced capabilities to the application.
The older method (mlx5_link_update_unlocked_gset) should only be
invoked if the newer method returns EOPNOTSUPP.
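The intended control flow can be sketched as follows. This is an
illustration, not the PMD code: all demo_* names are hypothetical, the
query routines are stubs, and the sketch assumes the convention of
returning 0 on success and a negative errno value on failure.

```c
#include <errno.h>

/* Query routine: fills *speed and returns 0, or a negative errno. */
typedef int (*demo_query_t)(int *speed);

/* Stub: the newer ethtool command is not supported by the kernel. */
static int
demo_query_unsupported(int *speed)
{
	(void)speed;
	return -EOPNOTSUPP;
}

/* Stub: transient failure of the newer command. */
static int
demo_query_transient(int *speed)
{
	(void)speed;
	return -EAGAIN;
}

/* Stub: older command, limited to 40G capabilities. */
static int
demo_query_legacy(int *speed)
{
	*speed = 40000;
	return 0;
}

/* Fall back to the older method only when the newer one is unsupported. */
static int
demo_link_update(demo_query_t query_new, demo_query_t query_old, int *speed)
{
	int ret = query_new(speed);

	if (ret == -EOPNOTSUPP)
		ret = query_old(speed);
	return ret;
}
```

A transient error is returned to the caller unchanged, so the application
never sees capabilities reduced by the legacy path.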
Fixes: 7d2e32f76c ("net/mlx5: fix ethtool link setting call order")
Cc: stable@dpdk.org
Reported-by: Srinivas Narayan <srinivas.narayan@att.com>
Signed-off-by: Asaf Penso <asafp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit removes the support for configuring the device E-switch
using TCF, since it is now possible to configure it via DR (direct
verbs rules), and thereby also removes the PMD dependency on libmnl.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
mlx5 implements mlx5_flow_null_drv_ops to be used when a specific
flow type/driver is not available or invalid.
These routines return an error without modifying the rte_flow_error
parameter passed to them, which causes testpmd, for example, to crash.
This commit addresses the issue by modifying the rte_flow_error
parameter in these routines.
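The pattern can be sketched as follows. This is an illustration only: the
error structure is a simplified stand-in for rte_flow_error, and the
function name, type value and message text are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct rte_flow_error. */
struct demo_flow_error {
	int type;            /* error category, 1 = unspecified here */
	const void *cause;   /* object responsible for the error */
	const char *message; /* human-readable description */
};

/*
 * Null driver operation: always fails, but fills the error structure
 * so that a caller printing error->message does not read garbage.
 */
static int
demo_flow_null_validate(struct demo_flow_error *error)
{
	if (error != NULL) {
		error->type = 1;
		error->cause = NULL;
		error->message = "flow driver not available";
	}
	return -ENOTSUP;
}
```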
Fixes: 0c76d1c9a1 ("net/mlx5: add abstraction for multiple flow drivers")
Fixes: 684dafe795 ("net/mlx5: add flow query abstraction interface")
Cc: stable@dpdk.org
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This patch implements additional actions of packet header
modifications.
Add actions:
- INC_TCP_SEQ - Increase sequence number in the outermost TCP header.
- DEC_TCP_SEQ - Decrease sequence number in the outermost TCP header.
- INC_TCP_ACK - Increase acknowledgment number in the outermost TCP
header.
- DEC_TCP_ACK - Decrease acknowledgment number in the outermost TCP
header.
Original work by Xiaoyu Min.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Now that everything that has ever accessed the shared memory
config is doing so through the public APIs, we can make it
internal. Since we're removing quite a few headers from
rte_eal_memconfig.h, we need to add them back in places
where this header is used.
This bumps the ABI, so also change all build files and
update documentation.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Currently, the memory hotplug is locked automatically by all
memory-related _walk() functions, but sometimes locking the
memory subsystem outside of them is needed. There is no
public API to do that, so it creates a dependency on shared
memory config to be public. Fix this by introducing a new
API to lock/unlock the memory hotplug subsystem.
Create a new common file for all things mem config, and a
new API namespace rte_mcfg_*, and search-and-replace all
usages of the locks with the new API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
For each driver where we optionally disable it, add in the reason why it's
being disabled, so the user knows how to fix it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
This is to fix the error:
```
drivers/net/mlx5/mlx5_defs.h:14:26:
error: format '%lx' expects argument of type 'long unsigned int',
but argument 5 has type 'off_t {aka long long int}' [-Werror=format=]
drivers/net/mlx5/mlx5_txq.c:569:48: note: format string is defined here
DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx"
~~^
%llx
```
Which reproduces with gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0.
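One common portable way to print an off_t is sketched below. This is an
illustration and not necessarily the change made by this patch: the
function name is hypothetical, and the sketch casts the value to uint64_t
and uses the PRIx64 format macro from <inttypes.h>, which works whether
off_t is long or long long.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an offset independently of the width of off_t. */
static int
demo_format_offset(char *buf, size_t len, off_t offset)
{
	return snprintf(buf, len, "uar_mmap_offset 0x%" PRIx64,
			(uint64_t)offset);
}
```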
Fixes: 6bf10ab69b ("net/mlx5: support 32-bit systems")
Cc: stable@dpdk.org
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Patch [1] uses the master device Netlink socket
to retrieve the master device link settings. This is not thread safe
because this resource may be in use by another call to the master
device itself. Using the same Netlink socket concurrently from
multiple threads causes Netlink requests to malfunction and
must be eliminated. This patch replaces the master Netlink socket
with the socket from the representor device.
[1] http://patches.dpdk.org/patch/53120/
Fixes: 0333b2f584 ("net/mlx5: inherit master link settings for representors")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The SQ errors recovery mechanism in the PMD invokes Verbs
functions to modify the SQ states in order to reset the SQ and to
reactivate it.
These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured
by secondary process queues.
Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.
Add support for secondary process Tx errors recovery.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The RQ errors recovery mechanism in the PMD invokes Verbs functions to
modify the RQ states in order to reset the RQ and to reactivate it.
These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured by
secondary process queues.
Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.
Add support for secondary process Rx errors recovery.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When WQEs are posted to the HW to send packets, the PMD may get a
completion report with an error from the HW, aka an error CQE, which is
associated with a bad WQE.
The error reason may be a bad address, wrong lkey, bad sizes, etc.
that can be wrongly configured by the PMD or by the user.
Checking for all the possible mistakes to prevent error CQEs doesn't make
sense due to the performance impact and huge complexity.
An error CQE changes the SQ state to the error state, which causes all
subsequently posted WQEs to be completed with a CQE flush error forever.
Currently, the PMD doesn't handle Tx error CQEs and may even crash
when one of them appears.
Extend the Tx data-path to detect these error CQEs, to report them in
the statistics error counters, and to recover the SQ by moving the state
to ready again and adjusting the management variables appropriately.
Sometimes the error CQE root cause is very hard to debug and may even
be related to corner cases which are not easily reproducible. Hence,
a dump file with debug information will be created for the first few
error CQEs; their number can be configured by the PMD probe
parameters.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When WQEs are posted to the HW to receive packets, the PMD may receive
a completion report with an error from the HW, aka an error CQE, which is
associated with a bad WQE.
The error reason may be a bad address, wrong lkey, small buffer size,
etc. that can be wrongly configured by the PMD or by the user.
Checking for all the possible mistakes to prevent error CQEs doesn't make
sense due to the performance impact. Moreover, some error CQEs can be
triggered by packets coming from the wire, over which the DPDK
application has no control.
Most of the error CQE types change the RQ state to the error state, which
causes all subsequently received packets to be dropped by the HW and to be
completed with a CQE flush error forever.
The current solution detects these error CQEs and even reports the
errors to the user in the statistics error counters, but without
recovery, so once the RQ enters the error state it never moves to the
ready state again and all subsequent packets are dropped.
Extend the error CQE handling with recovery by moving the state to
ready again and rearranging all the RQ WQEs and the management
variables appropriately.
Sometimes the error CQE root cause is very hard to debug and may even
be related to corner cases which are not easily reproducible.
Hence, a dump file with debug information will be created for the first
few error CQEs; their number can be configured by the PMD probe
parameters.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Move the RQ WQEs initialization code to a separate function, in
preparation for CQE error recovery, so that the code can be reused.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The RQ WQEs must be written to memory before the HW gets the RQ
doorbell, hence a memory barrier is required after writing the WQEs
and before writing the doorbell.
The current code uses the rte_wmb barrier, which ensures that all
memory stores are completed, while the rte_cio_wmb barrier is sufficient
for local memory stores because the WQEs are in local memory.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.
This behavior doesn't make sense, because the user intends to force
some device parameters and expects to get an error if there is a
problem with the arguments.
Stop probing and report an error in case of problematic command line
arguments.
Fixes: e72dd09b61 ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Add a global function in the PMD which dumps debug information to
a specific file.
The data can be printed in hexadecimal format or as a regular string.
The number of debug files per PMD entity should be limited by a new PMD
probe parameter called max_dump_files_num.
The files will be created in the /var/log directory or in the current
directory.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
There is a full correlation between the CQE indexes and the WQE indexes
in the vectorized Rx queues management.
When the RQ is moved to the reset state, the correlation may break
because the HW starts the RQ polling from index 0 while the CQ polling
continues regularly.
In preparation for CQE error handling, where the RQ can be reset,
the correlation dependence should be removed from all the Rx queue
index management.
Remove the aforementioned dependence from the vectorized Rx burst
functions.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The device private pointer (dev_private) is of type void *
therefore no cast is necessary in C.
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When a device is being closed and tries to unregister its interrupt
callback, there is a chance the handler is still active (called in the
context of the eal_intr_thread_main thread). If so,
rte_intr_callback_unregister returns -EAGAIN and keeps the handler
registered, causing a crash when the underlying resource is gone.
This race condition may happen if event handling in the application takes
a long time. We should check the return code of the unregistering routine
and try again to unregister the handler. The diagnostic messages are
shown once a second while trying to unregister.
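The retry loop can be sketched as follows. This is an illustration, not
the PMD code: all demo_* names are hypothetical, the unregister routine is
a stub mimicking the -EAGAIN behaviour described above, and the optional
wait hook stands in for the delay and the once-a-second diagnostic
message.

```c
#include <errno.h>
#include <stddef.h>

/* Unregister routine: negative errno on failure, else callbacks removed. */
typedef int (*demo_unregister_t)(void *ctx);
/* Optional hook called between attempts (sleep, log a message, etc.). */
typedef void (*demo_wait_t)(void);

/* Stub: reports "handler still active" while *ctx is positive. */
static int
demo_busy_unregister(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return -EAGAIN;
	}
	return 1; /* one callback removed */
}

/* Retry while the handler is active; 'attempts' reports the call count. */
static int
demo_unregister_retry(demo_unregister_t unregister, void *ctx,
		      demo_wait_t wait, unsigned int *attempts)
{
	unsigned int n = 0;
	int ret;

	do {
		ret = unregister(ctx);
		n++;
		if (ret == -EAGAIN && wait != NULL)
			wait();
	} while (ret == -EAGAIN);
	if (attempts != NULL)
		*attempts = n;
	return ret;
}
```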
Fixes: 028b2a28c3 ("net/mlx5: update event handler for multiport IB devices")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Previous patch added handling of metadata for multi-segment packet.
Function txq_scatter_v in file mlx5_rxtx_vec_neon.h was updated
incorrectly, items were inserted into WQE in wrong order.
This patch fixes the issue, inserting items into WQE correctly.
Fixes: 7f4019d370 ("net/mlx5: fix Tx metadata for multi-segment packet")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Implement support for read_clock in the mlx5 driver. mlx5 supports
hardware timestamp offload, setting the packet timestamp field to the
device clock. rte_eth_read_clock allows reading the device's current
clock value and therefore comparing values on a similar time base.
See rxtx_callbacks for an example.
Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Transmit errors must not be reported in q_errors[] which is for
reception.
Fixes: 87011737b7 ("mlx5: add software counters")
Fixes: 9f9a48eb29 ("net/mlx5: fix Tx stats error counter definition")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.
For secondary processes, IPC will be crucial, so leave all
of the error handling as is.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Since we change these macros, we might as well avoid triggering complaints
from checkpatch because of mixed case.
old=RTE_IPv4
new=RTE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv4
new=RTE_ETHER_TYPE_IPV4
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
old=RTE_ETHER_TYPE_IPv6
new=RTE_ETHER_TYPE_IPV6
git grep -lw $old | xargs sed -i -e "s/\<$old\>/$new/g"
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
In function mlx5_rxq_ibv_new(), pointer *tmpl allocation is attempted
at the start, but not validated or freed in case of error.
In function mlx5_txq_ibv_new(), pointer *txq_ibv allocation is
attempted at the start, but not freed in case of error.
This patch adds pointers initialization, validation and freeing.
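The initialize, validate and free pattern can be sketched as follows.
This is an illustration, not the PMD code: all demo_* names are
hypothetical, malloc() stands in for the Verbs object creation, and the
fail_step parameter only exists to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_rxq_obj {
	void *cq; /* stand-in for a completion queue object */
	void *wq; /* stand-in for a work queue object */
};

static void
demo_rxq_obj_free(struct demo_rxq_obj *obj)
{
	if (obj == NULL)
		return;
	free(obj->wq);
	free(obj->cq);
	free(obj);
}

/*
 * Create the object. 'fail_step' (1 or 2) forces the matching step to
 * fail; 0 means no forced failure. On error, everything allocated so
 * far is released, *err holds an errno value and NULL is returned.
 */
static struct demo_rxq_obj *
demo_rxq_obj_new(int fail_step, int *err)
{
	struct demo_rxq_obj *tmpl = NULL; /* initialized before any goto */

	tmpl = calloc(1, sizeof(*tmpl));
	if (tmpl == NULL)
		goto error;
	tmpl->cq = fail_step == 1 ? NULL : malloc(16);
	if (tmpl->cq == NULL)
		goto error;
	tmpl->wq = fail_step == 2 ? NULL : malloc(16);
	if (tmpl->wq == NULL)
		goto error;
	*err = 0;
	return tmpl;
error:
	/* Safe for partially built objects: free(NULL) is a no-op. */
	demo_rxq_obj_free(tmpl);
	*err = ENOMEM;
	return NULL;
}
```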
Fixes: 09cb5b5817 ("net/mlx5: separate DPDK from verbs Rx queue objects")
Fixes: faf2667fe8 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Patch [1] added, among other definitions, the macro MLX5_ST_SZ_DB.
Patch [2] later added the macro MLX5_ST_SZ_BYTES, which is exactly
the same macro with a different name.
Each of these macros was used in very few places.
This patch removes the definition of MLX5_ST_SZ_DB, and replaces it
with MLX5_ST_SZ_BYTES wherever it was used.
Macro MLX5_ST_SZ_BYTES was preferred since it is the same macro
name used in kernel code, see [3].
[1] http://patches.dpdk.org/patch/45254/
[2] http://patches.dpdk.org/patch/49403/
[3] https://lists.openwall.net/netdev/2014/10/02/75
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Add support to match all TCP control bits (flags)
except "NS (ECN-nonce)" via Direct Verbs (DV) or Direct Rule (DR)
engine.
Signed-off-by: Jack Min <jackmin@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Multiple functions were declared in header file mlx5_rxtx.h,
implemented in mlx5_rxq.c, and called only in mlx5_rxq.c.
This patch moves all these function declarations into mlx5_rxq.c,
as static functions.
Some function implementations were moved higher in the file to
precede the function calls.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The return value of function mlx5_rxq_releasable() was not described
correctly in the function description.
This patch updates the description to correctly describe the possible
return values.
Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Function mlx5_rxq_ibv_release() is called in several places.
Before each call except one, the input parameter is validated to make
sure it is not null.
This patch adds the validation where it is missing.
It also changes a priv_ prefix, left in a comment, to mlx5_ prefix.
Fixes: af4f09f282 ("net/mlx5: prefix all functions with mlx5")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Functions implemented but never called:
mlx5_rxq_ibv_releasable()
mlx5_rxq_cleanup()
mlx5_txq_ibv_releasable()
Function declared but not implemented:
rxq_alloc_mprq_buf()
This patch removes these functions from code and header file.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
IBV_EVENT_DEVICE_FATAL event is generated by the driver once for
the entire multiport Infiniband device, not for each existing port.
The port index in the event is zero, which causes the device removal
event to be dropped. We should invoke the removal event processing
routine for each port we have installed a handler for.
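
A minimal sketch of that dispatch, assuming a shared context that keeps
one optional handler per port (struct shared_ctx, MAX_IB_PORTS and the
function names are illustrative, not the PMD's actual definitions):

```c
#include <stddef.h>

#define MAX_IB_PORTS 4 /* hypothetical limit for this sketch */

typedef void (*rmv_handler_t)(unsigned int port, void *arg);

struct shared_ctx {
	rmv_handler_t handler[MAX_IB_PORTS]; /* NULL when not installed */
	void *arg[MAX_IB_PORTS];
};

/* Example handler: counts how many removal notifications it received. */
static void
count_removal(unsigned int port, void *arg)
{
	(void)port;
	++*(unsigned int *)arg;
}

/*
 * Deliver one device-wide fatal event to every installed port handler
 * instead of looking up a single port by the (zero) port index.
 * Returns the number of handlers invoked.
 */
static unsigned int
dispatch_device_fatal(const struct shared_ctx *sh)
{
	unsigned int i, delivered = 0;

	for (i = 0; i < MAX_IB_PORTS; i++) {
		if (sh->handler[i] == NULL)
			continue;
		sh->handler[i](i, sh->arg[i]);
		delivered++;
	}
	return delivered;
}
```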
Fixes: 028b2a28c3 ("net/mlx5: update event handler for multiport IB devices")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When Direct Rules API is not supported we don't set the errno.
This results in the function failing with errno equal to zero.
As a result, a function that failed is treated as
a function that worked correctly.
This commit fixes this issue by setting the errno to ENOTSUP and
returning this error when an error value should be returned.
Since RDMA-CORE returns positive errno values, we also return
positive error values.
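
The convention can be sketched with two stand-in stubs, assuming they
are what gets compiled when the Direct Rules API is unavailable (the
names dr_create_stub() and dr_destroy_stub() are hypothetical):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stub for a call that returns an object pointer. */
static void *
dr_create_stub(void)
{
	errno = ENOTSUP; /* the caller can now tell why NULL came back */
	return NULL;
}

/*
 * Hypothetical stub for a call that returns a status code. The value
 * is positive, following the positive errno convention described above.
 */
static int
dr_destroy_stub(void *obj)
{
	(void)obj;
	errno = ENOTSUP;
	return ENOTSUP;
}
```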
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Function mlx5_rx_intr_disable() calls mlx5_rxq_ibv_get() and performs
some actions on the returned rxq_ibv.
It doesn't release the rxq_ibv when everything completes successfully.
This patch adds a call to mlx5_rxq_ibv_release() where it's missing.
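
The get/release pairing can be illustrated with a toy reference-counted
object. This is only a sketch: struct rxq_obj and the three functions
below are simplified stand-ins, not the driver's real types.

```c
#include <stddef.h>

struct rxq_obj {
	int refcnt; /* number of outstanding references */
	int armed;  /* interrupt armed flag */
};

/* Take a reference; NULL is passed through untouched. */
static struct rxq_obj *
rxq_obj_get(struct rxq_obj *obj)
{
	if (obj != NULL)
		obj->refcnt++;
	return obj;
}

/* Drop a reference; NULL is tolerated. */
static void
rxq_obj_release(struct rxq_obj *obj)
{
	if (obj != NULL)
		obj->refcnt--;
}

/* Returns 0 on success, -1 when there is no queue object. */
static int
rx_intr_disable(struct rxq_obj *slot)
{
	struct rxq_obj *obj = rxq_obj_get(slot);

	if (obj == NULL)
		return -1;
	obj->armed = 0;
	rxq_obj_release(obj); /* balance the get on the success path too */
	return 0;
}
```

After a successful call the reference count is back at its initial
value, which is the property the fix restores.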
Fixes: 09cb5b5817 ("net/mlx5: separate DPDK from verbs Rx queue objects")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Currently, the name of MPRQ mempool is set by
snprintf(name, sizeof(name), "%s-mprq", dev->device->name);
For a port representor, the name duplicates that of its master, so
creating a mempool with the same name fails. The port ID is used in
the name instead.
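
A sketch of a per-port name follows. The "port-%u-mprq" format string
and the helper name mprq_pool_name() are assumptions made for
illustration; the commit message does not state the exact format.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build a mempool name from the port ID, so that a representor and its
 * master never produce the same string. Returns snprintf()'s result.
 */
static int
mprq_pool_name(char *buf, size_t len, uint16_t port_id)
{
	return snprintf(buf, len, "port-%u-mprq", (unsigned int)port_id);
}
```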
Fixes: 7d6bf6b866 ("net/mlx5: add Multi-Packet Rx support")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Recent patch [1] added, at the end of mlx5_dev_configure(), a call to
mlx5_proc_priv_init(), initializing process_private data of eth_dev.
This call is not reached if PMD is started with zero Rx queues.
In this case mlx5_dev_configure() returns earlier due to the check:
if (rxqs_n == priv->rxqs_n)
return 0;
In such a scenario, later references to uninitialized process_private
data will result in segmentation fault.
For example see in function txq_uar_init().
This patch changes the check logic. The following code is executed
if (rxqs_n != priv->rxqs_n), and skipped otherwise.
Function mlx5_proc_priv_init() is always invoked, to ensure
process_private data is initialized.
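
The reshaped control flow can be sketched as follows, with a toy
struct dev_priv and helper names that are illustrative only:

```c
struct dev_priv {
	unsigned int rxqs_n;
	int proc_priv_ready;  /* set once process private data exists */
	int rss_reconfigured; /* set when the queue-dependent code ran */
};

/* Stand-in for the process private data initialization. */
static int
proc_priv_init(struct dev_priv *priv)
{
	priv->proc_priv_ready = 1;
	return 0;
}

static int
dev_configure(struct dev_priv *priv, unsigned int rxqs_n)
{
	if (rxqs_n != priv->rxqs_n) {
		/* Queue-dependent work runs only when the count changed. */
		priv->rxqs_n = rxqs_n;
		priv->rss_reconfigured = 1;
	}
	/* No early return: this is reached even with zero Rx queues. */
	return proc_priv_init(priv);
}
```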
[1] http://patches.dpdk.org/patch/52629/
Fixes: 120dc4a7dc ("net/mlx5: remove device register remap")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The RDMA-CORE Direct Rules API was changed in the latest upstream code.
This commit updates the API usage accordingly.
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
BlueField SmartNIC has 0xa2d2 as PCI device ID on both ARM and x86 hosts.
On the ARM side, Tx inlining need not be used as PCI bandwidth is not a
bottleneck. Vectorized Tx can still be used up to 16 queues. For other
archs (e.g., x86), keep using the default value.
Fixes: 09d8b41699 ("net/mlx5: make vectorized Tx threshold configurable")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
If Tx packet inlining is enabled, the rdma-core library should allocate a
Tx WQ large enough to support it. It is better for the PMD to calculate
the size of the WQ based on the parameters and return an error with an
appropriate message if it exceeds the device capability.
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When creating a flow rule without the port_id pattern item, the PF was
always selected.
This commit fixes this issue: if no port_id pattern item is available,
we use the port that the flow was created on as the source port.
Fixes: 822fb31953 ("net/mlx5: add port id item to Direct Verbs")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
ibv_destroy_flow_action() refers to QP. QP must not be freed until
corresponding action is destroyed.
Fixes: 3eb0044310 ("net/mlx5: fix release of jump to queue action")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Mellanox mlx5 PMD maintains a list of devices for processing memory
free events, so that Memory Regions reflect the actual memory state.
Because this list contains the devices and devices may share the
same context, the callback routine may be called multiple times
with the same parameter, which is not optimal. This patch modifies
the list to contain the device contexts instead of device objects,
so a shared context is included in the list only once.
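
As a sketch of the deduplication idea only (the PMD uses its own list
type; struct ctx_list and ctx_list_add() here are hypothetical), a
context is added only when it is not already present:

```c
#include <stddef.h>

#define MAX_CTX 8 /* hypothetical capacity for this sketch */

struct ctx_list {
	const void *ctx[MAX_CTX];
	size_t n;
};

/*
 * Add a context unless it is already listed.
 * Returns 1 if added, 0 if already present, -1 if the list is full.
 */
static int
ctx_list_add(struct ctx_list *l, const void *ctx)
{
	size_t i;

	for (i = 0; i < l->n; i++)
		if (l->ctx[i] == ctx)
			return 0;
	if (l->n == MAX_CTX)
		return -1;
	l->ctx[l->n++] = ctx;
	return 1;
}
```

Walking such a list on a memory free event invokes the callback once
per context, however many ports share it.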
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The multiport Infiniband device support was introduced in [1].
All active ports, belonging to the same Infiniband device use the single
shared Infiniband context of that device and share the resources:
- QPs are created within shared context
- Verbs flows are also created with the port index specified
- DV/DR resources
- Protection Domain
- Event Handlers
This patchset adds support for sharing Memory Regions between
ports created on the base of a multiport Infiniband device.
The datapath of mlx5 uses the layered cache subsystem for
allocating/releasing Memory Regions; only the lowest layer, L3,
is shared, for performance reasons.
[1] http://patches.dpdk.org/cover/51800/
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Some physical link settings can be queried from
Ethernet devices: link status, link speed, speed capabilities,
duplex mode, etc. These settings do not make much sense for
representors because they have no physical link. The new kernel
drivers dropped the link settings query for representors, causing the
ioctl call to fail. This patch adds an emulation
of link settings to the PMD: representors inherit the link parameters
from the master device. The actual link status (up/down)
is retrieved from the representor device.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When creating the modify action using Direct Rules, we need to
add flags that mark whether the action will be done on the root table
or on a private table.
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
If DevX support is exposed by rdma-core but
DevX is not supported by or disabled for the specific interface,
the mlx5_devx_cmd_query_hca_attr() routine returns an error,
preventing the device from probing successfully. The routine
should be invoked only when DevX is enabled.
Fixes: e2b4925ef7 ("net/mlx5: support Direct Rules E-Switch")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In mlx5_rxq.c, in some comments, text includes "Tx" instead of "Rx".
In mlx5_txq.c, in some comments, text includes "Rx" instead of "Tx".
This patch fixes these typos.
Fixes: faf2667fe8 ("net/mlx5: separate DPDK from verbs Tx queue objects")
Fixes: a1366b1a2b ("net/mlx5: add reference counter on DPDK Rx queues")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
All the library calls must be called via the glue layer.
Fixes: b2177648b8 ("net/mlx5: add Direct Rules flow data alloc/free routines")
Fixes: 79e35d0d59 ("net/mlx5: share Direct Rules/Verbs flow related structures")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Looking for an ethdev port is better (and more efficient)
with an ethdev API than an EAL one.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The flow group should be initialized.
For example, the choice of whether the encapsulation is for root or
private tables is based on the flow->group value.
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This commit adds support for drop action when creating E-Switch flow
using DV.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Actions like encap/decap and modify header require setting the flow
table type. Until now we supported only NIC Rx and NIC Tx; this commit
adds support for the FDB table type for those actions.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
In the current implementation, DV steering supports only NIC steering.
This commit adds the transfer attribute in order to create a matcher
on the FDB tables.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This commit checks for DR E-Switch support.
The support depends on both the device and the kernel.
This commit also enables the user to manually disable this feature.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The meson build was missing the define for Direct Rules.
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Rename the translate vport function to match the naming conventions
of the other translate item functions.
Fixes: 0fe3f18f78 ("net/mlx5: add source vport match to the ingress rules")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
According to rte_flow, actions should be applied in the order in which
they were given.
In the case of modify actions, the action was always placed
last.
This commit solves this issue by saving the position of the first modify
action and then storing the pointer to the modify action at this position.
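
A simplified sketch of that slot reservation, assuming all modify
actions collapse into a single output entry (the enum and the function
name translate_actions() are illustrative, not the PMD's code):

```c
#include <stddef.h>

enum act_type { ACT_COUNT, ACT_MODIFY, ACT_QUEUE };

#define MAX_ACTS 8 /* capacity of the output array in this sketch */

/*
 * Translate the input actions into out[]. Every ACT_MODIFY entry is
 * merged into one output entry placed where the first ACT_MODIFY
 * appeared, so the relative order of actions is preserved.
 * Returns the number of output entries.
 */
static size_t
translate_actions(const enum act_type *in, size_t n, enum act_type *out)
{
	size_t out_n = 0, modify_pos = 0, i;
	int have_modify = 0;

	for (i = 0; i < n && out_n < MAX_ACTS; i++) {
		if (in[i] == ACT_MODIFY) {
			if (!have_modify) {
				have_modify = 1;
				modify_pos = out_n++; /* reserve the slot */
			}
			continue;
		}
		out[out_n++] = in[i];
	}
	if (have_modify)
		out[modify_pos] = ACT_MODIFY; /* fill the reserved slot */
	return out_n;
}
```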
Fixes: 4bb14c83df ("net/mlx5: support modify header using Direct Verbs")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Currently the jump to QP action is allocated in flow apply,
which results in a memory leak.
This patch fixes this issue by making the hrxq responsible for the
allocation and release of the jump to QP action.
Fixes: cbb66daa3c ("net/mlx5: prepare Direct Verbs for Direct Rule")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
On BlueField platform we have the new entity - PF representor.
This one represents the PCI PF attached to external host on the
side of ARM. The traffic sent by the external host to the NIC
via PF will be seen by ARM on this PF representor.
This patch refactors port recognizing capability on the base of
physical port name. We have two groups of name formats. Legacy
name formats are supported by kernels before ver 5.0 (being
more precise - before the patch [1]) or before Mellanox OFED 4.6,
and new naming formats added by the patch [1].
Legacy naming formats are supported:
- missing physical port name (no sysfs/netlink key) at all,
master is assumed
- decimal digits (for example "12"), representor is assumed,
the value is the index of attached VF
New naming formats are supported:
- "p" followed by decimal digits, for example "p2", master
is assumed
- "pf" followed by PF index concatenated with "vf" followed by
VF index, for example "pf0vf1", representor is assumed.
If index of VF is "-1" it is a special case of host PF
representor, this representor must be indexed in devargs
as 65535, for example representor=[0-3,65535] will
allow representors for VF0, VF1, VF2, VF3 and for host PF.
Note: do not specify representor=[0-65535], it causes devargs
processing error, because number of ports (rte_eth_dev) is
limited.
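
The recognition rules listed above can be sketched with sscanf(). This
is an illustrative parser, not the PMD's implementation: the type and
function names are hypothetical, and the index field is given an
assumed meaning (port number for a master, VF index for a representor).

```c
#include <stdio.h>

#define HOST_PF_REPRESENTOR 65535 /* devargs index for VF "-1" */

enum port_kind { PORT_UNKNOWN, PORT_MASTER, PORT_REPRESENTOR };

struct port_info {
	enum port_kind kind;
	int index; /* -1 when the name carries no index */
};

static struct port_info
parse_phys_port_name(const char *name)
{
	struct port_info info = { PORT_UNKNOWN, -1 };
	int pf, vf, port;
	char tail; /* catches trailing garbage after a full match */

	if (name == NULL || name[0] == '\0') {
		info.kind = PORT_MASTER;      /* legacy: no name at all */
	} else if (sscanf(name, "pf%dvf%d%c", &pf, &vf, &tail) == 2) {
		info.kind = PORT_REPRESENTOR; /* new: "pf0vf1" */
		info.index = (vf == -1) ? HOST_PF_REPRESENTOR : vf;
	} else if (sscanf(name, "p%d%c", &port, &tail) == 1) {
		info.kind = PORT_MASTER;      /* new: "p2" */
		info.index = port;
	} else if (sscanf(name, "%d%c", &port, &tail) == 1) {
		info.kind = PORT_REPRESENTOR; /* legacy: "12" */
		info.index = port;
	}
	return info;
}
```

The trailing %c conversion makes each pattern match the whole string,
so a name such as "eth0" falls through to PORT_UNKNOWN.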
Applications should distinguish representors and master devices
exclusively by device flag RTE_ETH_DEV_REPRESENTOR and should not
rely on switch port_id values (mlx5 PMD deduces them from
representor_id) returned by dev_infos_get() API.
[1] https://www.spinics.net/lists/netdev/msg547007.html
Linux-tree: c12ecc23 (Or Gerlitz 2018-04-25 17:32 +0300)
"net/mlx5e: Move to use common phys port names for vport representors"
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
mlx5 driver has a global list of Memory Regions created by
device, and there is a mlx5_mr_release() routine which performs
memory cleanup at device closing. The head of the device MR list
was fetched outside the rwlock protected section. Also some
noticed typos are fixed.
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Some port iterations are manually checking against RTE_ETH_DEV_UNUSED
instead of using the iterators based on rte_eth_find_next().
A new macro RTE_ETH_FOREACH_VALID_DEV() is introduced, but kept private
because there should be no need of iterating over all devices in the
API. The public iterators have additional filters for ownership, parent
device or sibling ports.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
As stated in the deprecation notice from December 2016,
"the legacy filter API, including rte_eth_dev_filter_supported(),
rte_eth_dev_filter_ctrl() as well as filter types MACVLAN, ETHERTYPE,
FLEXIBLE, SYN, NTUPLE, TUNNEL, FDIR, HASH and L2_TUNNEL, is superseded
by the generic flow API (rte_flow)".
After a long wait of more than two years, the legacy filter API
is marked as deprecated, while still tested with testpmd and
the tep_termination example.
The next step will be to announce a deadline for complete removal.
As preparation of the removal of rte_eth_ctrl.h,
RTE_ETH_FLOW_*, RTE_TUNNEL_TYPE_* and RTE_ETH_HASH_FUNCTION_* definitions
are moved to rte_ethdev.h and rte_flow.h.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
The RSS validation function was missing the verification that
if RSS is requested on inner packet, the flow must have tunnel data.
Fixes: 23c1d42c71 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
If MLNX_OFED is installed, there's no .pc file installed for libraries and
dependency() can't find libraries by pkg-config. By adding a fallback to
cc.find_library(), libraries are properly located.
Fixes: e30b4e566f ("build: improve dependency handling")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Luca Boccassi <bluca@debian.org>
UAR (User Access Region) register does not need to be remapped for
the primary process; it should be remapped only for secondary processes.
UAR register table is in the process private structure in
rte_eth_devices[],
(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private
The actual UAR table follows the data structure and the table is used
for both Tx and Rx.
For Tx, BlueFlame in UAR is used to ring the doorbell.
MLX5_TX_BFREG(txq) is defined to get a register for the txq. Each process
accesses its own private data to acquire the register from the UAR table.
For Rx, the doorbell in UAR is required in arming CQ event. However, it
is a known issue that the register isn't remapped for secondary process.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Queue index is redundantly stored for both Rx and Tx structures,
e.g. txq_ctrl->idx and txq->stats.idx. Both are consolidated to single
storage - rxq->idx and txq->idx.
Also, rxq and txq are moved to the beginning of their control structures
(rxq_ctrl and txq_ctrl) for cacheline alignment.
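
A sketch of the layout idea with toy structures (the field sets are
invented for illustration): when the datapath structure is the first
member, it starts at the address of the control structure, so whatever
alignment the control structure is allocated with applies to it too.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy datapath structure holding the single copy of the queue index. */
struct rxq_data {
	uint16_t idx;
	uint32_t elts_n;
};

/* Toy control structure with the datapath part placed first. */
struct rxq_ctrl {
	struct rxq_data rxq;
	unsigned int socket;
};

_Static_assert(offsetof(struct rxq_ctrl, rxq) == 0,
	       "rxq must be the first member");

/* Recover the control structure from a pointer to its datapath part. */
static struct rxq_ctrl *
rxq_to_ctrl(struct rxq_data *rxq)
{
	return (struct rxq_ctrl *)((char *)rxq -
				   offsetof(struct rxq_ctrl, rxq));
}
```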
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In case of cross compilation on aarch64 we must add include for
stdlib in order to use the free function.
Fixes: cbb66daa3c ("net/mlx5: prepare Direct Verbs for Direct Rule")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
On mlx5 device close, the shared IB context was destroyed
before the cleanup routines completed. On some
setups (Netlink fails with old kernel drivers and we have to use
sysfs to retrieve the interface index, which requires the IB device
name stored in the shared context) mlx5_nl_mac_addr_flush()
requires the IB device name, and if the shared context has already
been removed a segmentation fault occurs.
Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Retrieving the network interface index via Netlink fails when
an old ib_core kernel driver is installed: the mlx5_nl_ifindex()
routine fails because the RDMA_NLDEV_ATTR_NDEV_INDEX attribute is not
supported by the old driver.
Retrieving the network interface index and name via Netlink was
enabled by the kernel patch [1]. So, the problem depends on ib_core
module version - 4.16 supports getting ifindex via Netlink, 4.15 does not.
This error was ignored in previous versions of the MLX5 PMD probing
routine. For a single device the ifindex was retrieved via sysfs
and link control was not lost, so the problem was simply not noticed.
In order to support the MLX5 PMD functioning over an old kernel driver,
this patch adds ifindex retrieval via sysfs to the probing routine.
It is worth noting that this method works for master/standalone
devices only.
[1] https://www.spinics.net/lists/linux-rdma/msg62948.html
Linux tree: 5b2cc79d (Leon Romanovsky 2018-03-27 20:40:49 +0300 270)
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
We are going to share the Direct Rules and Direct Verbs flow
device data structures between master and representors in the
E-Switch configurations over multiport IB device.
The code initializing and destroying this data is
moved to dedicated routines; this is just a preparation
step for actual data sharing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When using Direct Rules we can add actions to jump between tables.
This is especially useful since rule insertion rate is much higher on other
tables compared to table zero.
If no group is selected the rule is added to group 0.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Add calls to the Direct Rules API inside the glue functions.
Due to the difference in parameters between Direct Rules and Direct
Verbs, the API of some glue functions was updated.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This is the first patch of a series that is designed to enable the
Direct Rules API.
The main difference between Direct Verbs and Direct Rules from an API
perspective is that in Direct Rules each action has its own create
function and the object itself is of type void.
In this patch I'm adding functions to generate actions that currently
are done without create action, and I'm changing the action type to be
void *, so in next patches only the glue functions will need to change.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The API that was defined in OFED 4.5 was replaced both in OFED 4.6 and
in upstream.
This commit updates the API to match the upstream one.
Fixes: f5bf91de73 ("net/mlx5: support flow counters using devx")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Iterating over siblings was done with RTE_ETH_FOREACH_DEV()
which skips the owned ports.
The new iterators RTE_ETH_FOREACH_DEV_SIBLING()
and RTE_ETH_FOREACH_DEV_OF() are more appropriate and more correct.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The Memory Region (MR) for DMA memory can't be created from secondary
process due to lib/driver limitation. Whenever it is needed, secondary
process can make a request to primary process through the EAL IPC
channel (rte_mp_msg) which is established on initialization. Once a MR
is created by primary process, it is immediately visible to secondary
process because the MR list is global per device. Thus, secondary
process can look up the list after the request is successfully returned.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.
If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.
To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Secondary process is not allowed to register MR due to a restriction of
library and kernel driver.
Fixes: 7e43a32ee0 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
There is a growing need for a PMD global data structure. It should be
initialized once per process regardless of how many PMD instances are
probed. mlx5_init_once() is called during probing and makes sure all the
init functions are called once per process. Currently, such global
data and its initialization functions are scattered. Rather than
'extern'-ing such variables and calling such functions one by one, making
sure each is called only once by checking the validity of such variables,
it is better to have a global storage to hold such data and a
consolidated function having all the initializations. The existing shared
memory gets more extensively used for this purpose. As there could be
multiple secondary processes, a static storage (local to process) is also
added.
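
The pattern can be sketched as below. All names are hypothetical and
the sketch is single-threaded on purpose; it only illustrates one
consolidated init function guarded by process-local storage, not the
locking or shared memory handling a real PMD needs.

```c
/* Process-local storage: one instance per process, shared by all ports. */
struct pmd_local_data {
	int init_done;
	int ipc_ready;
};

static struct pmd_local_data local_data;
static int init_calls; /* counts real initializations, for illustration */

/* Consolidated initialization: does real work only on the first call. */
static int
pmd_init_once(void)
{
	if (local_data.init_done)
		return 0;
	local_data.ipc_ready = 1; /* stand-in for the actual init steps */
	local_data.init_done = 1;
	init_calls++;
	return 0;
}
```

Probing a second device calls pmd_init_once() again, but the guarded
section is skipped and the existing data is reused.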
As the reserved virtual address for UAR remap is a PMD global resource,
this doesn't need to be stored in the device priv structure, but in the
PMD global data.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Socket API is used for IPC in order for secondary process to acquire
Verb command file descriptor. The FD is used to remap UAR address.
The multi-process APIs (rte_mp) in EAL are newly introduced.
mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs.
As it is PMD global infrastructure, only one IPC channel is established.
All the IPC message types may have port_id in the message if there is
need to reference a specific device.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
As the memory event is propagated to secondary processes, the event is
processed redundantly. This should be processed once because the data
structure used for MR and the event is global across the processes.
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
loaded. Non-x86 processors (mostly RISC such as ARM and Power) are more
vulnerable to load stall. For x86, reducing the number of instructions
seems to matter most.
For x86, this is simply a load but for other architectures, it is
calculated from the address of mbuf structure by rte_mbuf_buf_addr()
without having to load the first cacheline of the mbuf.
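
The address arithmetic can be sketched with a toy header, assuming the
buffer follows the header and a private area of known size (struct
toy_mbuf and toy_buf_addr() are illustrative; the real helper takes the
private size from the mempool):

```c
#include <stddef.h>

/* Toy mbuf header; the real one is much larger. */
struct toy_mbuf {
	void *buf_addr; /* stored in the header's first cacheline */
};

/*
 * Compute the buffer address from the header address alone, so the
 * header memory does not have to be read.
 * Assumed layout: [ header | private area | data buffer ].
 */
static void *
toy_buf_addr(struct toy_mbuf *mb, size_t priv_size)
{
	return (char *)mb + sizeof(*mb) + priv_size;
}
```

The computed value equals the stored buf_addr whenever the object
follows the assumed layout, which is what allows the load to be skipped.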
Fixes: 12d468a62b ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
We are implementing the support for multiport Infiniband device
with representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to provide IB port index for the Verbs
objects being created - QPs and Verbs Flows.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
representors created over the single multiport IB device. All
Verbs and DevX objects will be created within this shared context.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multiple ports.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The PMD code is updated to use the Protection Domain from the
shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If the IB device has only one port, the PD is not shared, because
there is only one Ethernet device created over the IB one.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.
Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The mlx5_pci_probe() routine is refactored to probe the ports
of the Infiniband devices found. All active ports (those with an attached
network interface) belonging to the same Infiniband device
will use the single shared Infiniband context of that device.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The routine mlx5_nl_portnum() is added to get
the number of ports of a multiport Infiniband device.
It is assumed that the Uplink/VF representors are attached
to these ports.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The routine mlx5_nl_ifindex() returns the
network interface index associated with an Infiniband device.
To support multiport IB devices, the function now
takes the IB port as an argument and returns the ifindex associated
with the tuple <IB device, IB port>.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The master device and VF representors were distinguished by the
presence of a port name: the master device did not have one. Linux
kernels starting from 5.0 provide the port name for the master device,
so the implemented representor recognition method no longer works.
The new recognition method is based on querying the number of VFs
created on the base of the device.
The IFLA_NUM_VF attribute is returned by the kernel if the IFLA_EXT_MASK
attribute is specified in the Netlink request message.
Also, a presence check of the device symlink in the device sysfs folder
is added to distinguish representors with the sysfs-based method.
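A minimal, Linux-only sketch of how such a request could be laid out,
assuming a hypothetical structure demo_getlink_req and helper
demo_build_getlink() that are not part of the PMD. It only fills an
RTM_GETLINK message carrying IFLA_EXT_MASK with RTEXT_FILTER_VF; sending
it and parsing IFLA_NUM_VF from the reply are left out.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, link info, one 32-bit attribute. */
struct demo_getlink_req {
	struct nlmsghdr nh;
	struct ifinfomsg info;
	struct rtattr rta;
	uint32_t extmask;
};

/* Fill an RTM_GETLINK request asking the kernel to include VF info. */
static void
demo_build_getlink(struct demo_getlink_req *req, int ifindex)
{
	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(req->info) +
					 RTA_LENGTH(sizeof(uint32_t)));
	req->nh.nlmsg_type = RTM_GETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST;
	req->info.ifi_family = AF_UNSPEC;
	req->info.ifi_index = ifindex;
	/* Without IFLA_EXT_MASK the reply does not carry IFLA_NUM_VF. */
	req->rta.rta_type = IFLA_EXT_MASK;
	req->rta.rta_len = RTA_LENGTH(sizeof(uint32_t));
	req->extmask = RTEXT_FILTER_VF;
}
```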
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This patch fixes the build failure with message:
drivers/net/mlx5/mlx5_ethdev.c: In function ‘mlx5_sysfs_switch_info’:
drivers/net/mlx5/mlx5_ethdev.c:1381:3:
error: ignoring return value of ‘fscanf’, declared with attribute
warn_unused_result [-Werror=unused-result]
fscanf(file, "%s", port_name);
^
This reproduces on Ubuntu 16.04 LTS with
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609.
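A minimal sketch of the kind of change this warning calls for, assuming
a hypothetical helper read_port_name() and buffer size; the actual fix
in mlx5_sysfs_switch_info() may look different. The return value of
fscanf() is checked, and a field width bounds the copy.

```c
#include <assert.h>
#include <stdio.h>

#define PORT_NAME_LEN 16 /* hypothetical buffer size, illustration only */

/*
 * Read one whitespace-delimited token from 'file' into 'port_name'.
 * Returns 0 on success, -1 if no token could be read.
 */
static int
read_port_name(FILE *file, char port_name[PORT_NAME_LEN])
{
	assert(file != NULL);
	/* "%15s" leaves room for the terminating NUL in a 16-byte buffer. */
	if (fscanf(file, "%15s", port_name) != 1)
		return -1;
	return 0;
}
```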
Fixes: b2f3a38101 ("net/mlx5: support new representor naming format")
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Dekel Peled <dekelp@mellanox.com>
The implementation reuses the external memory registration work done by
commit[1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices sharing the same PCI function, so mapping will
happen only on one of the representors and not all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK 19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee0
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Move the memory region creation to a separate function to
prepare the ground for its reuse in the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
A kernel update [1] introduces a new format of representor names.
This patch implements RFC [2], updating MLX5 PMD to support the new
format, while maintaining support of the existing format.
[1] https://github.com/torvalds/linux/commit/c12ecc2
[2] http://mails.dpdk.org/archives/dev/2019-March/125676.html
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Inlining a packet into a WQE that crosses the WQ wraparound, i.e. the WQE
starts at the end of the ring and ends at the beginning, is not
supported and is blocked by the data path logic.
However, in the case of TSO, an extra inline header is required before
inlining. This inline header is not taken into account when checking whether
there is enough room left for the required inline size.
In some corner cases where
(ring_tailroom - inline header) < inline size < ring_tailroom,
this can lead to the WQE being written outside of the ring buffer.
Fix it by always assuming the worst case, i.e. that inlining the packet will
require the inline header.
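The worst-case check can be sketched as follows. The helper name
demo_inline_fits() and the header size are hypothetical and only
illustrate the arithmetic; they are not taken from the data path code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size of the extra inline header, illustration only. */
#define DEMO_INLINE_HDR_SIZE 16u

/*
 * Return true if 'inline_size' bytes can be inlined without crossing the
 * end of the ring. The inline header is always counted, whether or not
 * the packet is a TSO one (the worst-case assumption).
 */
static bool
demo_inline_fits(size_t ring_tailroom, size_t inline_size)
{
	if (ring_tailroom < DEMO_INLINE_HDR_SIZE)
		return false;
	return inline_size <= ring_tailroom - DEMO_INLINE_HDR_SIZE;
}
```

With a tailroom of 64 and a 16-byte header, an inline size of 56 falls
in the corner case above (48 < 56 < 64) and is rejected, while 48 is
accepted.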
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Function mlx5_tx_complete() reads completion entry information
from Tx queue.
On processors without a strongly ordered memory model,
there has to be a memory barrier between reading the entry index
and the entry fields, in order to guarantee the data is valid.
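The ordering requirement can be sketched with C11 atomics. This is an
illustration under assumptions: struct demo_cqe is a simplified,
hypothetical entry, an ownership byte stands in for the validity check,
and the C11 fence stands in for whichever DPDK read barrier the PMD
actually uses.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified completion entry; not the real CQE layout. */
struct demo_cqe {
	uint16_t wqe_counter;   /* payload field written by the device */
	_Atomic uint8_t op_own; /* ownership bit, written last */
};

/*
 * Return 0 and store the counter if the entry belongs to software,
 * or -1 if the entry is not ready yet.
 */
static int
demo_poll_cqe(struct demo_cqe *cqe, uint8_t expected_owner,
	      uint16_t *counter)
{
	uint8_t own = atomic_load_explicit(&cqe->op_own,
					   memory_order_relaxed);

	if ((own & 1u) != expected_owner)
		return -1;
	/*
	 * Read barrier: without it a weakly ordered CPU may load
	 * wqe_counter before op_own and observe stale data.
	 */
	atomic_thread_fence(memory_order_acquire);
	*counter = cqe->wqe_counter;
	return 0;
}
```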
Fixes: 54d3fe948d ("net/mlx5: poll completion queue once per a call")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
struct mlx5_cqe is defined in MLX5 PMD code (mlx5_prm.h).
It includes 64 bytes padding in case of (RTE_CACHE_LINE_SIZE == 128).
struct mlx5_err_cqe is defined in kernel, and doesn't include padding.
When running in debug mode, in case an error CQE is detected
it is printed using rte_hexdump().
The size of data to print should be sizeof(*cqe) instead of
sizeof(*err_cqe), to handle the case of (RTE_CACHE_LINE_SIZE == 128),
and print the full data in any case.
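The size difference can be illustrated with two hypothetical structures
that only mimic the padding described above; the field layout is an
assumption and is not copied from mlx5_prm.h or the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 128 /* stands in for RTE_CACHE_LINE_SIZE */

/* Hypothetical 64-byte error CQE without padding. */
struct demo_err_cqe {
	uint8_t raw[64];
};

/* Hypothetical CQE, padded when the cache line is 128 bytes. */
struct demo_cqe {
#if (DEMO_CACHE_LINE_SIZE == 128)
	uint8_t padding[64];
#endif
	uint8_t raw[64];
};

/* Number of bytes a hexdump must cover to show the whole entry. */
static size_t
demo_dump_len(const struct demo_cqe *cqe)
{
	return sizeof(*cqe);
}
```

In this sketch sizeof(struct demo_err_cqe) covers only half of the
padded entry, so a dump sized by the error structure would stop before
the end of the entry.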
Fixes: c771499209 ("net/mlx5: extend debug logs verbosity")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The call to strlcpy uses either libc, libbsd or internal rte_strlcpy.
No need to call the DPDK flavor explicitly.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The mlx5 PMD probes the supported Verbs flow priorities with the
ibv_create_flow() function. If rdma-core or the kernel fails for
some reason, the error path returns without destroying the drop
queue, and the PD stays locked by the resource that was not freed.
Also, mlx5_flow_discover_priorities() returned a negative value
on error, and this code was reported "as is", without changing
the sign (eventually triggering assert(err > 0)).
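A minimal sketch of the two points above, using hypothetical stand-ins
(demo_drop_queue, demo_discover_priorities(), demo_probe()) rather than
the real PMD functions: the temporary resource is released on both
paths, and the negative error code is converted to a positive one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a resource that must always be released. */
struct demo_drop_queue {
	bool allocated;
};

/* Hypothetical probe: returns a priority count or a negative errno. */
static int
demo_discover_priorities(bool fail)
{
	return fail ? -ENOTSUP : 8;
}

/*
 * Probe priorities using a temporary drop queue.
 * Returns 0 on success or a positive errno value on failure.
 */
static int
demo_probe(struct demo_drop_queue *dq, bool fail, int *priorities)
{
	int err = 0;
	int ret;

	dq->allocated = true;
	ret = demo_discover_priorities(fail);
	if (ret < 0)
		err = -ret; /* flip the sign: the caller expects err > 0 */
	else
		*priorities = ret;
	dq->allocated = false; /* cleanup is not skipped on error */
	assert(err >= 0);
	return err;
}
```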
Fixes: 2815702bae ("net/mlx5: replace verbs priorities by flow")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Whenever possible (if the library ships a pkg-config file) use meson's
dependency() function to look for it, as it will automatically add it
to the Requires.private list if needed, to allow for static builds to
succeed for reverse dependencies of DPDK. Otherwise the recursive
dependencies are not parsed, and users doing static builds have to
resolve them manually by themselves.
When using this API avoid additional checks that are superfluous and
take extra time, and avoid adding the linker flag manually which causes
it to be duplicated.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
The API function rte_eth_dev_fw_version_get() queries drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions copy the same ibverbs field fw_ver,
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.
It is tested with command "drvinfo" of examples/ethtool/.
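A sketch of what such a callback might do, assuming the contract is to
return 0 on success and the required buffer size (including the
terminating NUL) when the buffer is too small. The helper
demo_fw_version_get() and its source-string argument are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the NUL-terminated firmware version 'src' into 'fw_ver'.
 * Returns 0 on success, or the number of bytes needed if 'fw_size'
 * is too small.
 */
static int
demo_fw_version_get(const char *src, char *fw_ver, size_t fw_size)
{
	size_t size = strlen(src) + 1; /* bytes needed, including the NUL */

	if (fw_size < size)
		return (int)size;
	if (fw_ver != NULL)
		memcpy(fw_ver, src, size);
	return 0;
}
```

A caller can pass a zero-sized buffer first to learn the required size,
then call again with a buffer of that size.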
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>