If MLNX_OFED is installed, there's no .pc file installed for libraries and
dependency() can't find libraries by pkg-config. By adding fallback of
using cc.find_library(), libraries are properly located.
Fixes: e30b4e566f ("build: improve dependency handling")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Luca Boccassi <bluca@debian.org>
UAR (User Access Region) register does not need to be remapped for
primary process but it should be remapped only for secondary process.
UAR register table is in the process private structure in
rte_eth_devices[],
(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private
The actual UAR table follows the data structure and the table is used
for both Tx and Rx.
For Tx, BlueFlame in UAR is used to ring the doorbell.
MLX5_TX_BFREG(txq) is defined to get a register for the txq. Processes
access its own private data to acquire the register from the UAR table.
For Rx, the doorbell in UAR is required in arming CQ event. However, it
is a known issue that the register isn't remapped for secondary process.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Queue index is redundantly stored for both Rx and Tx structures.
E.g. txq_ctrl->idx and txq->stats.idx. Both are consolidated to single
storage - rxq->idx and txq->idx.
Also, rxq and txq are moved to the beginning of its control structure
(rxq_ctrl and txq_ctrl) for cacheline alignment.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In case of cross compilation on aarch64 we must add include for
stdlib in order to use the free function.
Fixes: cbb66daa3c ("net/mlx5: prepare Direct Verbs for Direct Rule")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
At the mlx5 device closing the shared IB context was destroyed
before cleanup routines completion. As it was found on some
setups (Netlink fails with old kernel drivers and we have to use
sysfs to retrieve interface index, this requires IB device name,
which is stored in shared context) the mlx5_nl_mac_addr_flush()
requires IB device name, and if shared context is removed it
causes the segmentation fault.
Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Retrieving network interface index via Netlink fails in
case of old ib_core kernel driver installed - mlx5_nl_ifindex()
routine fails due to RDMA_NLDEV_ATTR_NDEV_INDEX attribute is not
supported by the old driver.
The patch allowing to retrieve the network interface index and
name via Netlink [1]. So, the problem depends on ib_core module
version - 4.16 supports getting ifindex via Netlink, 4.15 does not.
This error was ignored in previous versions of MLX5 PMD probing
routine. For single device ifindex was retrieved via sysfs
and link control was not lost, so problem just was not noticed.
In order to support MLX5 PMD functioning over old kernel driver
this patch adds ifindex retrieving via sysfs into probing routine.
It is worth to note this method works for master/standalone
device only.
[1] https://www.spinics.net/lists/linux-rdma/msg62948.html
Linux tree: 5b2cc79d (Leon Romanovsky 2018-03-27 20:40:49 +0300 270)
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
We are going to share the Direct Rules and Direct Verbs flow
device data structures between master and representors in the
E-Switch configurations over multiport IB device.
The code of initializing and destroying these data is
moved to dedicated routines, this is just a preparation
step for actual data sharing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When using Direct Rules we can add actions to jump between tables.
This is extra useful since rule insertion rate is much higher on other
tables compared to table zero.
If no group is selected the rule is added to group 0.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Adds calls to the Direct Rules API inside the glue functions.
Due to difference in parameters between the Direct Rules and Direct
Verbs some of the glue functions API was updated.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This is the first patch of a series that is designed to enable the
Direct Rules API.
The main difference between Direct Verbs and Direct Rules from API
perspective is that in Direct Rules each action has it's own create
function and the object itself is of type void.
In this patch I'm adding functions to generate actions that currently
are done without create action, and I'm changing the action type to be
void *, so in next patches only the glue functions will need to change.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The API that was defined in OFED 4.5 was replaced both in OFED 4.6 and
in upstream.
This commit updates the API to match the upstream one.
Fixes: f5bf91de73 ("net/mlx5: support flow counters using devx")
Cc: stable@dpdk.org
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Iterating over siblings was done with RTE_ETH_FOREACH_DEV()
which skips the owned ports.
The new iterators RTE_ETH_FOREACH_DEV_SIBLING()
and RTE_ETH_FOREACH_DEV_OF() are more appropriate and more correct.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The Memory Region (MR) for DMA memory can't be created from secondary
process due to lib/driver limitation. Whenever it is needed, secondary
process can make a request to primary process through the EAL IPC
channel (rte_mp_msg) which is established on initialization. Once a MR
is created by primary process, it is immediately visible to secondary
process because the MR list is global per a device. Thus, secondary
process can look up the list after the request is successfully returned.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.
If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.
To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Secondary process is not allowed to register MR due to a restriction of
library and kernel driver.
Fixes: 7e43a32ee0 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
There's more need to have PMD global data structure. This should be
initialized once per a process regardless of how many PMD instances are
probed. mlx5_init_once() is called during probing and make sure all the
init functions are called once per a process. Currently, such global
data and its initialization functions are even scattered. Rather than
'extern'-ing such variables and calling such functions one by one making
sure it is called only once by checking the validity of such variables, it
will be better to have a global storage to hold such data and a
consolidated function having all the initializations. The existing shared
memory gets more extensively used for this purpose. As there could be
multiple secondary processes, a static storage (local to process) is also
added.
As the reserved virtual address for UAR remap is a PMD global resource,
this doesn't need to be stored in the device priv structure, but in the
PMD global data.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Socket API is used for IPC in order for secondary process to acquire
Verb command file descriptor. The FD is used to remap UAR address.
The multi-process APIs (rte_mp) in EAL are newly introduced.
mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs.
As it is PMD global infrastructure, only one IPC channel is established.
All the IPC message types may have port_id in the message if there is
need to reference a specific device.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
As the memory event is propagated to secondary processes, the event is
processed redundantly. This should be processed once because the data
structure used for MR and the event is global across the processes.
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
loaded. non-x86 processors (mostly RISC such as ARM and Power) are more
vulnerable to load stall. For x86, reducing the number of instructions
seems to matter most.
For x86, this is simply a load but for other architectures, it is
calculated from the address of mbuf structure by rte_mbuf_buf_addr()
without having to load the first cacheline of the mbuf.
Fixes: 12d468a62b ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
We are implementing the support for multiport Infiniband device
with representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to provide IB port index for the Verbs
objects being created - QPs and Verbs Flows.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created within this shared context.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multiple ports.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The PMD code is updated to use Protected Domain from the
shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.
Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the single shared Infiniband context of that device.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on these ports.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on querying the VF number,
has been created on the base of the device.
The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.
Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This patch fixes the build failure with message:
drivers/net/mlx5/mlx5_ethdev.c: In function ‘mlx5_sysfs_switch_info’:
drivers/net/mlx5/mlx5_ethdev.c:1381:3:
error: ignoring return value of ‘fscanf’, declared with attribute
warn_unused_result [-Werror=unused-result]
fscanf(file, "%s", port_name);
^
Which reproduces on Ubuntu 16.04 LTS with
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609.
Fixes: b2f3a38101 ("net/mlx5: support new representor naming format")
Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Dekel Peled <dekelp@mellanox.com>
The implementation reuses the external memory registration work done by
commit[1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices share the same PCI function, so mapping will
happen only on one of the representors and not all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee0
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Move the memory region creation to a separate function to
prepare the ground for the reuse of it on the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Kernel update [1] introduce new format of representors names.
This patch implements RFC [2], updating MLX5 PMD to support the new
format, while maintaining support of the existing format.
[1] https://github.com/torvalds/linux/commit/c12ecc2
[2] http://mails.dpdk.org/archives/dev/2019-March/125676.html
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Inlining a packet to WQE that cross the WQ wraparound, i.e. the WQE
starts on the end of the ring and ends on the beginning, is not
supported and blocked by the data path logic.
However, in case of TSO, an extra inline header is required before
inlining. This inline header is not taken into account when checking if
there is enough room left for the required inline size.
On some corner cases were
(ring_tailroom - inline header) < inline size < ring_tailroom ,
this can lead to WQE being written outsize of the ring buffer.
Fixing it by always assuming the worse case that inline of packet will
require the inline header.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Function mlx5_tx_complete() reads completion entry information
from Tx queue.
For some processors not having strongly-ordered memory model,
there has to be a memory barrier between reading the entry index
and the entry fields, in order to guarantee data is valid.
Fixes: 54d3fe948d ("net/mlx5: poll completion queue once per a call")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
struct mlx5_cqe is defined in MLX5 PMD code (mlx5_prm.h).
It includes 64 bytes padding in case of (RTE_CACHE_LINE_SIZE == 128).
struct mlx5_err_cqe is defined in kernel, and doesn't include padding.
When running in debug mode, in case an error CQE is detected
it is printed using rte_hexdump().
The size of data to print should be sizeof(*cqe) instead of
sizeof(*err_cqe), to handle the case of (RTE_CACHE_LINE_SIZE == 128),
and print the full data in any case.
Fixes: c771499209 ("net/mlx5: extend debug logs verbosity")
Cc: stable@dpdk.org
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The call to strlcpy uses either libc, libbsd or internal rte_strlcpy.
No need to call the DPDK flavor explicitly.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The mlx5 PMD probes the Verbs flow priorities supported with
ibv_create_flow() function. If rdma-core or kernel fails for
some reason, the returned error causes the drop queue is not
destroyed, and pd is locked by not freed resource.
Also the mlx5_flow_discover_priorities() returned negative value
as error, and this code was reported "as is", without sign
changing (eventually causing assert(err > 0)).
Fixes: 2815702bae ("net/mlx5: replace verbs priorities by flow")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Whenever possible (if the library ships a pkg-config file) use meson's
dependency() function to look for it, as it will automatically add it
to the Requires.private list if needed, to allow for static builds to
succeed for reverse dependencies of DPDK. Otherwise the recursive
dependencies are not parsed, and users doing static builds have to
resolve them manually by themselves.
When using this API avoid additional checks that are superfluous and
take extra time, and avoid adding the linker flag manually which causes
it to be duplicated.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
The API function rte_eth_dev_fw_version_get() is querying drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions are copying the same ibverbs field fw_ver
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.
It is tested with command "drvinfo" of examples/ethtool/.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>