The call to strlcpy uses either libc, libbsd or internal rte_strlcpy.
No need to call the DPDK flavor explicitly.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The mlx5 PMD probes the Verbs flow priorities supported with
ibv_create_flow() function. If rdma-core or kernel fails for
some reason, the returned error causes the drop queue is not
destroyed, and pd is locked by not freed resource.
Also the mlx5_flow_discover_priorities() returned negative value
as error, and this code was reported "as is", without sign
changing (eventually causing assert(err > 0)).
Fixes: 2815702baea7 ("net/mlx5: replace verbs priorities by flow")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The API function rte_eth_dev_fw_version_get() is querying drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions are copying the same ibverbs field fw_ver
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.
It is tested with command "drvinfo" of examples/ethtool/.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In rdma-core library IBV_WQ_FLAG_RX_END_PADDING is renamed to
IBV_WQ_FLAGS_PCI_WRITE_END_PADDING. Way to query the capability is also
changed.
Fixes: 43e9d9794cde ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Rx packet padding is supposed to be set by an environment variable -
MLX5_PMD_ENABLE_PADDING, but it has been missing for some time by mistake.
Rather than using such a variable, a PMD parameter (rxq_pkt_pad_en) is
added instead.
Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Rename options CONFIG_RTE_LIBRTE_MLX4_DLOPEN_DEPS and
CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS to a single option
CONFIG_RTE_IBVERBS_LINK_DLOPEN.
Rename meson option enable_driver_mlx_glue to ibverbs_link.
There was no good reason for setting a different link option
for mlx4 and mlx5. Having a single common option makes it
easier to understand and unify make and meson systems.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This commit adds counters support when creating flows via direct
verbs. The implementation uses devx interface in order to create
query and delete the counters.
This support requires MLNX_OFED_LINUX-4.5-0.1.0.1 installation.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This commit includes the add of:
- ConnectX-6 device ID
- ConnectX-6 SRIOV device ID
Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The ethdev flag RTE_ETH_DEV_CLOSE_REMOVE is set for drivers
having migrated to the new behaviour of rte_eth_dev_close().
As any other flag, it can be useful to know about its value
as soon as the port is probed.
Unfortunately, it was set inside the close operation,
just before being erased by memset() in rte_eth_dev_release_port().
The flag assignment is moved to the probing stage, so it can
be checked by the application in order to anticipate the behaviour.
Fixes: 42603bbdb58e ("net/mlx5: release port on close")
Fixes: 6c99085d972b ("net/vmxnet3: fix hot-unplug")
Fixes: 4d7877fde2ef ("net/ena: remove resources when port is being closed")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds support for the rx_queue_count API in mlx5 driver
Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Add txqs_max_vec parameter to configure the maximum number of Tx queues to
enable vectorized Tx. And its default value is set according to the
architecture and device type.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When a device is spawned, it does make more sense that the configuration
parameters are passed by callee. Furthermore, setting default value for
some configuration would need PCIe device ID which can be found in the
probe function.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on
RX side. The size of CQE is aligned with the size of a cacheline of the
core. If cacheline size is 128B, the CQE size is configured to be 128B even
though the device writes only 64B data on the cacheline. This is to avoid
unnecessary cache invalidation by device's two consecutive writes on to one
cacheline. However in some architecture, it is more beneficial to update
entire cacheline with padding the rest 64B rather than striding because
read-modify-write could drop performance a lot. On the other hand, writing
extra data will consume more PCIe bandwidth and could also drop the maximum
throughput. It is recommended to empirically set this parameter. Disabled
by default.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
With the introduction of representors several eth devices are using
the same rte device (e.g. a PCI bus). When calling port detach on one
eth device it is required that all eth devices belonging to the
same rte device have been closed in advance, then the rte device
itself can be removed/detached.
This commit implements this requirement implicitly by adding a
remove callback to struct rte_pci_driver.
The new behavior can be demonstrated in testpmd.
First we attach a representor 0 using PCI address 0000:08:00.0
testpmd> port attach 0000:08:00.0,representor=[0]
Attaching a new port...
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1013 net_mlx5
Port 0 is attached.
Done
Port 1 is attached.
Done
Port 0 is the master device (PF) - an ethdev of the PCI address.
Port 1 is representor 0 - another ethdev (representing a VF) using the
same PCI address. Next we detach port 1
testpmd> port detach 1
Removing a device...
Port 0 is closed
Port 1 is closed
Now total ports is 0
Done
Since port 0 has been implicitly closed we cannot act on it anymore.
testpmd> port stop 0
Invalid port 0
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
With the introduction of representors several eth devices are using
the same rte device (e.g. a PCI bus). It is therefore required to
release the eth device resources during an eth device close operation
rather than during an rte device removal (detach) operation.
In current version many PMDs are still releasing the eth device as
part of the rte device removal. In order to allow a smooth transition
for all PMDs to behave correctly an ethdev flag RTE_ETH_DEV_CLOSE_REMOVE
is used. When this flag is set it indicates to rte_eth_dev_close() to
call rte_eth_dev_release_port(), so the port is freed during the close
operation.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Implement probing of a rte device multiple times, see [1].
Set PCI driver RTE_PCI_DRV_PROBE_AGAIN flag to enable multiple probing
of the PCI device by the PCI common driver.
Consecutive probing requests with a devargs string may contain
repetitive master and representors devices for which eth device should
be created only once. In case an eth device already exists - silently
ignore it.
[1]
commit e9d159c3d534 ("eal: allow probing a device again")
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In case that the library doesn't support DV flow, if enabled by
'dv_flow_en=1', print out a warning message and disable it.
Fixes: 51e72d386c99 ("net/mlx5: add runtime parameter to enable Direct Verbs")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
The redundant check of Flow counters support in runtime is removed.
The flag flow_counter_en is eliminated from the code. The Verbs
create counter function just returns an error if no counter
support presented in the system.
If there is no any of Flow counters configuration macro defined
the log message is emited, indicating the missing counter support.
mlx5_flow_validate_action_count() fuctnion is also updated due to
flow_counter_en flag removal.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT is replaced with
HAVE_IBV_DEVICE_COUNTERS_SET_V42. At this stage it is just
macro renaming. This macro is defined if system supports
the "old" Flow counters functionality, MLNX_OFED version
from 4.2 to 4.4 is required.
We need to do this preparation before introducing the new
configuration macro (HAVE_IBV_DEVICE_COUNTERS_SET_V45) for
the "new" Flow counters support.
Both makefile and meson.build are changed.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The representor id is added in rte_eth_dev_data in order to be able
to match a port with its representor id in devargs.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This commit refactors tc_flow as a preparation to coming commits
that sends different type of messages and expect differ type of replies
while still using the same underlying routines.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
This is a clean-up of common ethdev data freeing.
All data freeing are moved to rte_eth_dev_release_port()
and done only in case of primary process.
It is probably fixing some memory leaks for PMDs which were
not freeing all data.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa5e ("bus/pci: reference driver structure before mapping")
However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
Flows having 'transfer' attribute have to be inserted to E-Switch on the
NIC and the control path uses Linux TC flower interface via Netlink
socket.
This patch adds the flow driver on top of the new flow engine.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Netlink based E-Switch flow engine will be migrated to the new flow
engine.
nl_flow will be renamed to flow_tcf as it goes through Linux TC flower
interface.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Flow engine has to support multiple driver paths. Verbs/DV for NIC flow
steering and Linux TC flower for E-Switch flow steering. In the future,
another flow driver could be added (devX).
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
DV flow API is based on new kernel API and is
missing some functionality like counter but add other functionality
like encap.
In order not to affect current users even if the kernel supports
the new DV API it should be enabled only manually.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This commit modify the conversion of the input parameters into Verbs
spec, in order to support all previous changes.
Some of those changes are:
removing the use of the parser,
storing each flow in its own flow structure.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered
un-secure, as on some cases were the application provides incorrect mbufs
on the Tx burst the host or NIC can get stuck.
Hence, disabling the feature by default for this specific NIC.
Users can still enable this feature and enjoy the performance gain
(mostly for low number of cores) by using the txq_mpw_en devarg.
This patch will impact the out of the box performance of some application
using ConnectX-4 Lx for the sack of security and robustness.
Since we need different defaults based on the underlying device the mpw
field in the configuration struct was extended to contain also the
MLX5_ARG_UNSET option.
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling allmulti
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as
rte_eth_allmulticast_enable/disable() fail to set the flag
(dev->data->all_multicast). The flag is used when starting traffic by
mlx5_traffic_enable(). When switching out of flow isolation mode, allmulti
mode will not be set even though it has been enabled.
Fixes: 0887aa7f27f3 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling promiscuous
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as rte_eth_promiscuous_enable/disable()
fail to set the flag (dev->data->promiscuous). The flag is used when
starting traffic by mlx5_traffic_enable(). When switching out of flow
isolation mode, promiscuous mode will not be set even though it has been
enabled.
Fixes: 0887aa7f27f3 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Route Netlink message socket is wrongly initialized by registering to
the route link group. This causes the socket to receive all link
message related to routes whereas the PMD do not expect to receive such
information. In some situation it ends by filling the socket at a point
that any new message cannot be exchanged.
As the PMD is not expected to process such broadcast messages, the
parameter in the nl_group in the function is also remove.
Fixes: ccdcba53a3f4 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org
Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
On systems where the required Netlink commands are not supported but
Mellanox OFED is installed, representors information must be retrieved
through sysfs.
Fixes: 26c08b979d26 ("net/mlx5: add port representor awareness")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).
This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).
Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.
[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.
The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
of an on-going write of a DoorBell over a given UAR page.
The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Previous work introduce verbs priorities, whereas the PMD is making
translation between Flow priority into Verbs. Rename this to make more
sense on what the PMD has to translate.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Drop queues are essentially used in flows due to Verbs API, the
information if the fate of the flow is a drop or not is already present
in the flow. Due to this, drop queues can be fully mapped on regular
queues.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.
This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.
This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Probe existing port representors in addition to their master device and
associate them automatically.
To avoid collision between Ethernet devices, they are named as follows:
- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
representors.
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.
Problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.
This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:
- There is only one matching device which isn't identified as a
representor, in that case use it.
- Otherwise log an error and do not probe the device.
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.
This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).
No functional impact.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.
This patch divides the original function, which has become huge over time,
as follows:
1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).
(Patch based on prior work from Yuanhan Liu)
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.
While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.
No functional impact.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
There are several attribute objects in this function:
- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).
"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>