The buffer split Rx offload is not compatible with Multi-Packet
Receiving Queue (MPRQ) Rx offload, hence, the buffer split
offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT and other related
values should be advertised only if there is no MPRQ engaged.
Fixes: 6c8f7f1c18 ("net/mlx5: report Rx buffer split capabilities")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
In the bonding configurations the port switch id for representors was
composed of pf index in bonding as the 1 MSB and the representor's index
as the remaining 15 LSBs. The special corner case for the host PF
representor on BF setups with representor id 0xFFFF was missed as well.
The new switch port id consists of 4 MSBs for the pf bonding index and
the remaining 12 LSBs for the representor index. The switch port id
ranges for each type of representors are as follows:
Uplink representor(AKA master): 0xFFFF
Host PF representor: 0x<pf_bond>FFF
VF representor: 0x<pf_bond>[0-FFE]
Fixes: bee57a0a35 ("net/mlx5: update switch port id in bonding configuration")
Cc: stable@dpdk.org
Signed-off-by: Dong Zhou <dongzhou@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Rx RSS hash offload should be controlled by the user
and should not be enforced by RSS multi-queue Rx mode.
Fixes: 8b945a7f7d ("drivers/net: update Rx RSS hash offload capabilities")
Cc: stable@dpdk.org
Author: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The rte_atomic API is deprecated and needs to be replaced with
C11 atomic builtins. Use the relaxed ordering for RxQ/TxQ refcounts.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add rte_eth_dev_info->rx_seg_capa parameters:
- receiving to multiple pools is supported
- buffer offsets are supported
- no offset alignment requirement
- reports the maximal number of segments
- reports the buffer split offload flag
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
MPRQ (Multi-Packet Rx Queue) processes one packet at a time using
simple scalar instructions. MPRQ works by posting a single large buffer
(consisted of multiple fixed-size strides) in order to receive multiple
packets at once on this buffer. A Rx packet is then copied to a
user-provided mbuf or PMD attaches the Rx packet to the mbuf by the
pointer to an external buffer.
There is an opportunity to speed up the packet receiving by processing
4 packets simultaneously using SIMD (single instruction, multiple data)
extensions. Allocate mbufs in batches for every MPRQ buffer and process
the packets in groups of 4 until all the strides are exhausted. Then
switch to another MPRQ buffer and repeat the process over again.
The vectorized MPRQ burst routine is engaged automatically in case
the mprq_en=1 devarg is specified and the vectorization is not disabled
explicitly by providing rx_vec_en=0 devarg. There is a limitation:
LRO is not supported and scalar MPRQ is selected if it is on.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Previously, the Tx timestamp field and flag were registered in testpmd,
as described in mlx5 guide.
For consistency between Rx and Tx timestamps,
managing mbuf registrations inside the driver, as properly documented,
is a simpler expectation.
The only driver to support this feature (mlx5) is updated
as well as the testpmd application.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In case of bonding, device ifindex was detected as the PF ifindex, so
any operation using ifindex applied to PF instead of the bond device.
These operations includes MTU get/set, up/down and mac address
manipulation, etc.
This patch detects bond interface ifindex and name for PF that join a
bond interface, uses it by default for netdev operations.
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The PMD supports hairpin only if DevX is supported and DV flow is
enabled.
When destination DevX TIR is not supported, the PMD tries to create TIR
action, and fails.
Avoid supporting hairpin when destination DevX TIR is not supported.
Fixes: b6b3bf86bd ("net/mlx5: get hairpin capabilities")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
File mlx5_ethdev.c is partially moved to linux/mlx5_ethdev_os.c for
functions which are Linux specific. Functions which are Linux agnostics
remain in mlx5_ethdev.c file.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Define 'struct mlx5_dev_attr' which is ibv and dv independent. It
contains attribute that were originally contained in 'struct
ibv_device_attr_ex' and 'struct mlx5dv_context dv_attr'. Add a new API
mlx5_os_get_dev_attr() which fills in the new defined struct.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Currently, the DevX counter query works asynchronously with Devx
interrupt handler return the query result. When port closes, the
interrupt handler will be uninstalled and the Devx comp obj will
also be destroyed. Meanwhile the query is still not cancelled.
In this case, counter query may use the invalid Devx comp which
has been destroyed, and query failure with invalid FD will be
reported.
Adjust the shared interrupt install and uninstall timing to make
the counter asynchronous query stop before interrupt uninstall.
Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Replace checking against 65535 limit,
with a simpler form using RTE_MIN and UINT16_MAX macros.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
Use the MLX5_ASSERT macros instead of the standard assert clause.
Depends on the RTE_LIBRTE_MLX5_DEBUG configuration option to define it.
If RTE_LIBRTE_MLX5_DEBUG is enabled MLX5_ASSERT is equal to RTE_VERIFY
to bypass the global CONFIG_RTE_ENABLE_ASSERT option.
If RTE_LIBRTE_MLX5_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT
can still make this assert active by calling RTE_VERIFY inside RTE_ASSERT.
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move Netlink mechanism and its dependencies from net/mlx5 to
common/mlx5 in order to be ready to use by other mlx5 drivers.
The dependencies are BITFIELD defines, the ppc64 compilation workaround
for bool type and the function mlx5_translate_port_name.
Update build mechanism accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
As an arrangment for Netlink command moving to the common library,
reduce the net/mlx5 dependencies.
Replace ethdev class command parameters.
Improve Netlink sequence number mechanism to be controlled by the
mlx5 Netlink mechanism.
Move mlx5_nl_check_switch_info to mlx5_nl.c since it is the only one
which uses it.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Move PCI detection by IB device from mlx5 PMD to the common code.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
A new Mellanox vdpa PMD will be added to support vdpa operations by
Mellanox adapters.
This vdpa PMD design includes mlx5_glue and mlx5_devx operations and
large parts of them are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Separate also the log mechanism to allow different log level control to
the common library.
Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The DevX commands interface is included in the mlx5.h file with a lot
of other PMD interfaces.
As an arrangement to make the DevX commands shared with different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
Also remove shared device structure dependency on DevX commands.
Replace the DevX commands log mechanism from the mlx5 driver log
mechanism to the EAL log mechanism.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
If configuring the number of tx/rx queue with rte_eth_dev_configure
to nr_queues + hairpin_nr_queues, and setting tx/rx queues to
nr_queues with rte_eth_tx/rx_queue_setup. But not configuring the
hairpin queues via rte_eth_tx/rx_hairpin_queue_setup.
When starting the netdev, there is a crash because of NULL accessing.
Fixes: cf5516696d ("ethdev: add hairpin queue")
Cc: stable@dpdk.org
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Ori Kam <orika@mellanox.com>
As the result of testing it was found that some hosts have
the performance penalty imposed by required write memory barrier
after doorbell writing. Before 19.08 release there was some
heuristics to decide whether write memory barrier should be
performed. For the bursts of recommended size (or multiple)
it was supposed there were some extra ongoing packets in the
next burst and write memory barrier may be skipped (supposed
to be performed in the next burst, at least after descriptor
writing).
This patch restores that behaviour, the devargs tx_db_nc=2
must be specified to engage this performance tuning feature.
Fixes: 8409a28573 ("net/mlx5: control transmit doorbell register mapping")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
By default RSS hash delivery (offload) is bound to RSS mode and
it is incorrect to advertise it as enabled if Rx multi-queue mode
has no RSS.
Fixes: 8b945a7f7d ("drivers/net: update Rx RSS hash offload capabilities")
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch implements use of the API for LRO aggregated packet
max size.
Rx queue create is updated to use the relevant configuration.
Documentation is updated accordingly.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Add DEV_RX_OFFLOAD_RSS_HASH flag for all PMDs that support RSS hash
delivery.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The metadata registers reg_c provide support for TAG and
SET_TAG features. Although there are 8 registers are available
on the current mlx5 devices, some of them can be reserved.
The availability should be queried by iterative trial-and-error
implemented by mlx5_flow_discover_mreg_c() routine.
If reg_c is available, it can be regarded inclusively that
the extensive metadata support is possible. E.g. metadata
register copy action, supporting 16 modify header actions
(instead of 8 by default) preserving register across
different domains (FDB and NIC) and so on.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
This commits adds the hairpin get capabilities function.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The DevX counter management triggers an asynchronous event to get back
the new counters values from the HW.
The counter management doesn't trigger 2 parallel events for the same
pool, hence, the pool cannot be updated again in the event waiting time.
When the port is stopped, the DevX event mechanism wrongly was
destroyed what remained all the waiting pools in waiting state forever.
As a result, the counters of the stuck pools were never updated again.
Separate the DevX interrupt installation from the dev installation and
remove the DevX interrupt unregistration\registration from the
stop\start operations.
Now, the DevX interrupt should be installed in probe and uninstalled in
close.
Cc: stable@dpdk.org
Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
mlx5_link_update immediately returns when called with no-wait parameter
and its call for retrieving the link status returns with EAGAIN error.
This is too harsh on busy systems where a first call fails with EAGAIN
from time to time.
This patch adds a (very limited) retry on such cases in order to allow
retrieving the link status.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
In LAG configuration the devices in the same switch domain
might be spawned on the base of different PCI devices, so
we should check all devices backed by mlx5 PMD whether they
belong to specified switch domain. When the new devices are
being created it is not possible to detect whether the
sibling devices created in the current probe() loop belong
to the driver, driver field is not filled yet (it will be
done on returned success of current probe()). This patch
updates the device scanning, allowing extra match on
current backing PCI device, is being used to create siblings.
Fixes: f7e95215ac ("net/mlx5: extend switch domain searching range")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
With bonding configuration multiple PFs may represent the
single switching device with multiple ports as representors.
To distinguish representors belonging to different PFs we
should generated unique port ID. It is proposed to use
the PF index in bonding configuration to generate this
unique port IDs.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
With bonding configurations the switch domain may be shared
between multiple PCI devices, we should search the switch
sibling devices within the entire set of present ethernet
devices backed by the mlx5 PMD.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The routine mlx5_port_to_eswitch_info() is elaborated
to two ones (get E-Switch port parameters by port and
by device pointer) and simplified to returning structure
containing all parameters instead of copying.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The routine mlx5_ibv_device_to_pci_addr() takes Infiniband
device list object, takes the device sysfs path from there
and retrieves PCI address. The routine may be implemented
in more generic way by taking sysfs path directly as parameter
and can be used for getting PCI address of netdevs.
The generic routine is renamed to mlx5_dev_to_pci_addr()
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Change eth_dev_infos_get_t return value from void to int.
Make eth_dev_infos_get_t implementations across all drivers to return
negative errno values if case of error conditions.
Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch implements ethdev operations get_module_info and
get_module_eeprom, to support ethtool commands ETHTOOL_GMODULEINFO
and ETHTOOL_GMODULEEEPROM.
New functions mlx5_get_module_info() and mlx5_get_module_eeprom()
added in mlx5_ethdev.c.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When the link is down, the link speed returned by ethtool is
UINT32_MAX and the link status is 0.
In this case, the DPDK ethdev link speed should be set to
ETH_SPEED_NUM_NONE.
Otherwise since link speed is non-zero but link status is zero, this
is an inconsistent situation and -EAGAIN is returned, which is not right.
Fixes: 1884087198 ("net/mlx5: fix support for newer link speeds")
Cc: stable@dpdk.org
Signed-off-by: Xiaoyu Min <jackmin@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Enabling LRO offload per queue makes sense because the user will
probably want to allocate different mempool for LRO queues - the LRO
mempool mbuf size may be bigger than non LRO mempool.
Change the LRO offload to be per queue instead of per port.
If one of the queues is with LRO enabled, all the queues will be
configured via DevX.
If RSS flows direct TCP packets to queues with different LRO enabling,
these flows will not be offloaded with LRO.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
LRO support was only for MPRQ, hence mprq Rx burst was selected when
LRO was configured in the port.
The current support for MPRQ is suffering from bad memory utilization
since an external mempool is allocated by the PMD for the packets data
in addition to the user mempool, besides that, the user may get packet
data addresses which were not configured by him.
Even though MPRQ has the best performance for packet receiving in the
most cases and because of the above facts it is better to remove the
automatic MPRQ select when LRO is configured.
Move MPRQ to be selected only when the user force it by the PMD
arguments including LRO case.
Allow LRO offload using the regular RQ with the regular Rx burst
function.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Use DevX API to read device LRO capabilities.
Check if LRO is supported and can be enabled.
Check if MPRQ is supported and can be used.
Enable MPRQ for LRO use if not enabled by user.
Added note for mlx5_mprq_enabled(), to emphasize that LRO
enables MPRQ.
Disable CQE compression and CRC stripping if LRO is enabled.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Add command-line argument to set LRO session timeout.
Add LRO settings struct in PMD configuration struct.
Add support of LRO offload in port configuration.
Add macros and function to check if LRO is supported and enabled.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The associated device index is retrieved via Netlink request to
underlying Infiniband device driver. This network device index
is permanent throughout the lifetime of device. We do not
spawn the rte_eth_dev ports without associated network device, and
if network device is being unbound we get the remove notification
message and rte_eth_dev port is also detached. So, we may store
the ifindex in mlx5_device_spawn() routine at rte_eth_dev port
creation and initialization time and use the cached value further
instead of doing actual Netlink request.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch fills the tx_desc_lim.nb_seg_max and
tx_desc_lim.nb_mtu_seg_max fields of rte_eth_dev_info
structure to report thee maximal number of packet
segments, requested inline data configuration is
taken into account in conservative way.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes the existing Tx datapath code
as preparation step before introducing the new
implementation. The following entities are being
removed:
- deprecated devargs support
- tx_burst() routines
- related PRM definitions
- SQ configuration code
- Tx routine selection code
- incompatible Tx completion code
The following devargs are deprecated and ignored:
- "txq_inline" is going to be converted to "txq_inline_max"
for compatibility issue
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
to "txq_inline_mpw" for compatibility issue
The deprecated devarg keys are recognized by PMD
and ignored/converted to the new ones in order not
to block device probing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
All the DV counters are cashed in the PMD memory and are contained in
pools which are contained in containers according to the counters
allocation type - batch or single.
Currently, the flow counter query is done synchronously in pool
resolution means that on the user request a FW command is triggered to
read all the counters in the pool.
A new feature of devX to asynchronously read batch of flow counters
allows to accelerate the user query operation.
Using the DPDK host thread, the PMD periodically triggers asynchronous
query in pool resolution for all the counter pools and an interrupt is
triggered by the FW when the values are updated.
In the interrupt handler the pool counter values raw data is replaced
using a double buffer algorithm (very fast).
In the user query, the PMD just returns the last query values from the
PMD cache - no system-calls and FW commands are triggered from the user
control thread on query operation!
More synchronization is added with the host thread:
Container resize uses double buffer algorithm.
Pools growing in container uses atomic operation.
Pool query buffer replace uses a spinlock.
Pool minimum devX counter ID uses atomic operation.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
mlx5_link_update uses the newer ethtool command
ETHTOOL_GLINKSETTINGS to determine interface capabilities but falls
back to the older (deprecated) ETHTOOL_GSET command if the new
method fails for any reason.
The older method only supports reporting of capabilities up to 40G.
However, mlx5_link_update_unlocked_gs can return a failure for a
number of reasons (including the link being down).
Using the older method in cases of transient failure of the method
can result in reporting of reduced capabilities to the application.
The older method (mlx5_link_update_unlocked_gset) should only be
invoked if the newer method returns EOPNOTSUPP.
Fixes: 7d2e32f76c ("net/mlx5: fix ethtool link setting call order")
Cc: stable@dpdk.org
Reported-by: Srinivas Narayan <srinivas.narayan@att.com>
Signed-off-by: Asaf Penso <asafp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>