Commit Graph

166 Commits

Author SHA1 Message Date
Yongseok Koh
dceb502942 net/mlx5: add control of excessive memory pinning by kernel
A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
2aac5b5d11 net/mlx5: sync stop/start with secondary process
Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
7be600c8d8 net/mlx5: rework PMD global data init
There is a growing need for a PMD global data structure. It should be
initialized once per process regardless of how many PMD instances are
probed. mlx5_init_once() is called during probing and makes sure all the
init functions are called once per process. Currently, such global
data and its initialization functions are scattered. Rather than
'extern'-ing such variables and calling such functions one by one, making
sure each is called only once by checking the validity of such variables, it
is better to have a global storage to hold such data and a
consolidated function having all the initializations. The existing shared
memory gets more extensively used for this purpose. As there could be
multiple secondary processes, a static storage (local to process) is also
added.
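
The once-per-process pattern described above can be sketched roughly as
follows. This is only an illustration under assumptions: the structure, its
fields and the pmd_* function names are hypothetical, and locking and the
shared-memory part are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process PMD data; field names are illustrative only. */
struct pmd_local_data {
	bool initialized;        /* set once the init routines have run */
	unsigned int init_count; /* how many times init really ran (demo) */
	void *uar_base;          /* example of a PMD-global resource */
};

/* Static storage, local to each process. */
static struct pmd_local_data pmd_local;

/* Run all one-time initializers; safe to call from every probe. */
static int
pmd_init_once(void)
{
	if (pmd_local.initialized)
		return 0;
	/* ... every init function would be called here, in one place ... */
	pmd_local.uar_base = NULL;
	pmd_local.init_count++;
	pmd_local.initialized = true;
	return 0;
}

/* Hypothetical probe entry point, called once per PMD instance. */
static int
pmd_probe(void)
{
	return pmd_init_once();
}
```

Probing two instances calls pmd_probe() twice, but the initializers run only
once.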

As the reserved virtual address for UAR remap is a PMD global resource,
this doesn't need to be stored in the device priv structure, but in the
PMD global data.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
9a8ab29b84 net/mlx5: replace IPC socket with EAL API
Socket API is used for IPC in order for secondary process to acquire
Verb command file descriptor. The FD is used to remap UAR address.
The multi-process APIs (rte_mp) in EAL are newly introduced.
mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs.

As it is PMD global infrastructure, only one IPC channel is established.
All the IPC message types may have port_id in the message if there is
need to reference a specific device.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Viacheslav Ovsiienko
53e5a82fd1 net/mlx5: update install/uninstall event handlers
We are implementing the support for multiport Infiniband device
with representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
f048f3d479 net/mlx5: switch to the shared IB device context
The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
representors created over the single multiport IB device. All
Verbs and DevX objects will be created within this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
d485cdca01 net/mlx5: switch to the shared context IB attributes
The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multiple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
1b782252cb net/mlx5: switch to the shared protection domain
The PMD code is updated to use the Protection Domain from the
shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If the IB device has only one port, the PD is not shared, because
there is only one Ethernet device created over the IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
9c0a9eed37 net/mlx5: switch to the names in the shared IB context
The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
17e19bc4dd net/mlx5: add IB shared context alloc/free functions
The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
bbfad6427b net/mlx5: add getting IB ports number for multiport IB
There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on these ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
e505508a38 net/mlx5: modify get ifindex routine for multiport IB
There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
To support multiport IB devices, the function now
takes the IB port as an argument and returns the ifindex associated
with the tuple <IB device, IB port>.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
299d7dc28c net/mlx5: add representor recognition on Linux 5.x
The master device and VF representors were distinguished by the
presence of a port name: the master device did not have one. Linux
kernels starting from 5.0 provide the port name for the master device
too, so the implemented representor recognition method does not work.
The new recognition method is based on querying the number of VFs
created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Dekel Peled
b2f3a38101 net/mlx5: support new representor naming format
Kernel update [1] introduces a new format of representor names.
This patch implements RFC [2], updating MLX5 PMD to support the new
format, while maintaining support of the existing format.

[1] https://github.com/torvalds/linux/commit/c12ecc2
[2] http://mails.dpdk.org/archives/dev/2019-March/125676.html

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-20 18:15:42 +01:00
Thomas Monjalon
dbeba4cf18 net/mlx: prefix private structure
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-03-01 18:17:35 +01:00
Thomas Monjalon
714bf46ebb net/mlx: support firmware version query
The API function rte_eth_dev_fw_version_get() is querying drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions are copying the same ibverbs field fw_ver
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.

It is tested with command "drvinfo" of examples/ethtool/.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-02-13 12:55:38 +01:00
Moti Haimovsky
f5bf91de73 net/mlx5: support flow counters using devx
This commit adds counters support when creating flows via direct
verbs. The implementation uses devx interface in order to create
query and delete the counters.
This support requires MLNX_OFED_LINUX-4.5-0.1.0.1 installation.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:29 +01:00
Wisam Jaddo
f0354d8423 net/mlx5: add ConnectX-6 device IDs
This commit adds:
- ConnectX-6 device ID
- ConnectX-6 SRIOV device ID

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-03 13:07:06 +01:00
Dekel Peled
4bb14c83df net/mlx5: support modify header using Direct Verbs
This patch implements the set of actions to support offload
of packet header modifications to MLX5 NIC.

Implementation is based on RFC [1].

[1] http://mails.dpdk.org/archives/dev/2018-November/119971.html

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-03 12:56:43 +01:00
Tom Barbette
ce9494d76c net/mlx5: report imissed statistics
The imissed counters (number of packets dropped because the queues were
full) were reported through xstats as "rx_out_of_buffer"
but were not reported through stats.

Following a recent discussion on the ML, as there is no way to tell the
user if a counter is implemented or not, this should be considered a
bug. For example, user looking at imissed will think the packets are
lost before reaching the device.

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-12-13 16:31:06 +00:00
Yongseok Koh
09d8b41699 net/mlx5: make vectorized Tx threshold configurable
Add txqs_max_vec parameter to configure the maximum number of Tx queues to
enable vectorized Tx. And its default value is set according to the
architecture and device type.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Dekel Peled
c513f05cde net/mlx5: add caching of encap/decap actions
Make flow encap and decap Verbs actions cacheable resources.
Store created actions in local database.
This enables MLX5 PMD reuse of existing actions.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Yongseok Koh
bc91e8db12 net/mlx5: add 128B padding of Rx completion entry
A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on
RX side. The size of CQE is aligned with the size of a cacheline of the
core. If cacheline size is 128B, the CQE size is configured to be 128B even
though the device writes only 64B data on the cacheline. This is to avoid
unnecessary cache invalidation by device's two consecutive writes on to one
cacheline. However in some architecture, it is more beneficial to update
entire cacheline with padding the rest 64B rather than striding because
read-modify-write could drop performance a lot. On the other hand, writing
extra data will consume more PCIe bandwidth and could also drop the maximum
throughput. It is recommended to empirically set this parameter. Disabled
by default.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Viacheslav Ovsiienko
2dd8b72167 net/mlx5: simplify flow counters support check
The redundant runtime check of Flow counters support is removed.
The flag flow_counter_en is eliminated from the code. The Verbs
create counter function just returns an error if no counter
support is present in the system.

If none of the Flow counters configuration macros is defined,
a log message is emitted, indicating the missing counter support.

The mlx5_flow_validate_action_count() function is also updated due to
the flow_counter_en flag removal.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2018-10-26 22:14:06 +02:00
Moti Haimovsky
d53180afe3 net/mlx5: refactor TC-flow infrastructure
This commit refactors tc_flow in preparation for coming commits
that send different types of messages and expect different types of replies
while still using the same underlying routines.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
2018-10-26 22:14:05 +02:00
Shahaf Shuler
7dd7be29b4 net/mlx5: always use representor ifindex for ioctl
In the current code, in some cases the representor ethdev uses the
PF interface to query link status information or pause parameters.

This was done because previous kernel versions did not provide
the representor info.

Using the PF i/f for such ioctls is error prone and does not always work
because:
 * In some cases there is no PF at all, only representors (e.g. Bluefield
   with host representors).
 * Querying the up/down status from the representor and the link status
   from the PF is inconsistent.
 * The PF link being down doesn't necessarily mean the representor is down.
 * Setting different pause configurations for the PF and the
   representors will result in undefined behaviour.

Make the code cleaner and more robust by using only the representor
i/f for the ioctl. Whatever the kernel provides for this query will
be used. There is no need for a workaround for missing kernel functionality.

Note:
 1. Setting pause parameters obviously won't work on representors.
 2. Old kernels will not report all the possible representor info.

Fixes: 2b73026388 ("net/mlx5: probe all port representors")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2018-10-11 18:56:02 +02:00
Shahaf Shuler
1a611fdaf6 net/mlx5: support missing counter in extended statistics
The current code would fail if one of the DPDK counters was not
found among the device counters.

As representors and the PF port have different counters, both cannot work
together.

Address this issue by making the counter init more flexible: it keeps
all the counters found and skips the missing ones instead of failing.
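
The tolerant matching described above could look roughly like the sketch
below. It is not the mlx5 implementation: map_counters(), struct ctr_map and
the counter names other than "rx_out_of_buffer" are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Pairs an index in the wanted list with an index in the device list. */
struct ctr_map {
	size_t wanted_idx;
	size_t dev_idx;
};

/*
 * Match each wanted counter name against the names the device exposes.
 * Counters the device does not expose are skipped instead of treated as
 * an error. Returns the number of counters found; map[] must have room
 * for n_wanted entries.
 */
static size_t
map_counters(const char *const wanted[], size_t n_wanted,
	     const char *const dev[], size_t n_dev,
	     struct ctr_map map[])
{
	size_t found = 0;
	size_t i, j;

	for (i = 0; i < n_wanted; i++) {
		for (j = 0; j < n_dev; j++) {
			if (strcmp(wanted[i], dev[j]) == 0) {
				map[found].wanted_idx = i;
				map[found].dev_idx = j;
				found++;
				break;
			}
		}
		/* Not found: skip this counter and carry on. */
	}
	return found;
}
```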

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2018-10-11 18:56:02 +02:00
Yongseok Koh
40c9ccf9e9 net/mlx5: remove Netlink flow driver
Netlink based E-Switch flow engine will be migrated to the new flow
engine.
nl_flow will be renamed to flow_tcf as it goes through Linux TC flower
interface.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
51e72d386c net/mlx5: add runtime parameter to enable Direct Verbs
The DV flow API is based on a new kernel API. It is
missing some functionality, such as counters, but adds other functionality,
such as encap.

In order not to affect current users, even if the kernel supports
the new DV API it must be enabled manually.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
865a0c1567 net/mlx5: add Direct Verbs prepare function
This function allocates the Direct Verbs device flow, and
introduces the relevant PRM structures.

This commit also adds the matcher object. The matcher object acts as a
mask and should be shared between flows. For example all rules that
should match source IP with full mask should use the same matcher. A
flow that should match dest IP or source IP but without full mask should
have a new matcher allocated.
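
The sharing rule above (same mask, same matcher) can be sketched as a
lookup-or-create helper. Everything here is an assumption made for
illustration: struct matcher, matcher_get(), the list and the 16-byte mask
are not the PMD's actual data structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MASK_LEN 16 /* arbitrary size of the match mask, in bytes */

/* Hypothetical matcher object, identified by its mask only. */
struct matcher {
	struct matcher *next;
	uint8_t mask[MASK_LEN];
	unsigned int refcnt; /* number of flows using this matcher */
};

static struct matcher *matchers; /* head of the list of known matchers */

/* Return the matcher for this mask, allocating one only if none exists. */
static struct matcher *
matcher_get(const uint8_t mask[MASK_LEN])
{
	struct matcher *m;

	for (m = matchers; m != NULL; m = m->next) {
		if (memcmp(m->mask, mask, MASK_LEN) == 0) {
			m->refcnt++; /* same mask: share the matcher */
			return m;
		}
	}
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return NULL;
	memcpy(m->mask, mask, MASK_LEN);
	m->refcnt = 1;
	m->next = matchers;
	matchers = m;
	return m;
}
```

Two flows with a full mask get the same pointer back, while a flow with a
partial mask gets a newly allocated matcher.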

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
c322c0e558 net/mlx5: add bluefield VF support
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-09-28 01:41:01 +02:00
Shahaf Shuler
f9de87187b net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered
insecure, as in some cases where the application provides incorrect mbufs
on the Tx burst the host or NIC can get stuck.

Hence, disabling the feature by default for this specific NIC.
Users can still enable this feature and enjoy the performance gain
(mostly for low number of cores) by using the txq_mpw_en devarg.

This patch will impact the out of the box performance of some applications
using ConnectX-4 Lx for the sake of security and robustness.

Since we need different defaults based on the underlying device the mpw
field in the configuration struct was extended to contain also the
MLX5_ARG_UNSET option.
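
A tri-state setting of this kind might be resolved as sketched below. The
numeric value of the unset marker, the enum and resolve_mpw() are
assumptions, and the "enabled" default for other devices is chosen purely
for illustration.

```c
/* Stands in for MLX5_ARG_UNSET; the value used here is an assumption. */
#define ARG_UNSET (-1)

/* Hypothetical device classification. */
enum nic_type {
	NIC_CX4_LX,
	NIC_OTHER,
};

/*
 * Resolve the effective MPW setting.
 * user_mpw is ARG_UNSET when the user gave no txq_mpw_en devarg,
 * otherwise 0 or non-zero as requested by the user.
 */
static int
resolve_mpw(int user_mpw, enum nic_type nic)
{
	if (user_mpw != ARG_UNSET)
		return user_mpw != 0; /* an explicit user choice always wins */
	/* No user choice: fall back to a per-device default. */
	return nic == NIC_CX4_LX ? 0 : 1;
}
```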

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-28 15:27:39 +02:00
Adrien Mazarguil
3f8cb05df5 net/mlx5: fix invalid network interface index
Network interface indices being unsigned, an invalid index or error is
normally expressed through a zero value (see if_nametoindex()).

mlx5_ifindex() has a signed return type for negative values in case of
error. Since mlx5_nl.c does not check for errors, these may be fed back as
invalid interface indices to subsequent system calls. This usage would
have been correct if mlx5_ifindex() returned a zero value instead.

This patch makes mlx5_ifindex() unsigned for convenience.
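
The convention referred to above is the one used by if_nametoindex(): an
unsigned result where zero means failure. A minimal sketch, with a
hypothetical wrapper name:

```c
#include <net/if.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: resolve an interface name to its index.
 * Returns 0 on failure, mirroring if_nametoindex(), so an error can never
 * be fed back to later calls as if it were a valid index.
 */
static unsigned int
ifname_to_index(const char *name)
{
	unsigned int idx = if_nametoindex(name);

	if (idx == 0)
		fprintf(stderr, "no such interface: %s\n", name);
	return idx;
}
```

Callers only need to compare the result with zero before using it.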

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
5366074b01 net/mlx5: fix route Netlink message overflow
Route Netlink message socket is wrongly initialized by registering to
the route link group.  This causes the socket to receive all link
messages related to routes whereas the PMD does not expect to receive such
information.  In some situations this fills the socket to the point
that no new message can be exchanged.
As the PMD is not expected to process such broadcast messages, the
nl_group parameter of the function is also removed.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
f872b4b99d net/mlx5: fix representors detection
On systems where the required Netlink commands are not supported but
Mellanox OFED is installed, representors information must be retrieved
through sysfs.

Fixes: 26c08b979d ("net/mlx5: add port representor awareness")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
8f9059ccee net/mlx5: add framework for switch flow rules
Because mlx5 switch flow rules are configured through Netlink (TC
interface) and have little in common with Verbs, this patch adds a separate
parser function to handle them.

- mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
  and stores the result in a buffer.

- mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.

- mlx5_nl_flow_create() instantiates a flow rule on the device based on
  such a buffer.

- mlx5_nl_flow_destroy() performs the reverse operation.

These functions are called by the existing implementation when encountering
flow rules which must be offloaded to the switch (currently relying on the
transfer attribute).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
20b71e92ef net/mlx5: lay groundwork for switch offloads
With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Moti Haimovsky
6bf10ab69b net/mlx5: support 32-bit systems
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes, each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
  of an on-going write of a DoorBell over a given UAR page.

The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.
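
The two rules quoted above could be sketched as follows, assuming one lock
per UAR page. The function name, the C11 spinlock and the fence are
stand-ins: a real driver would use its own lock and an I/O write barrier
suited to device memory.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* One lock per UAR page; a C11 flag is used here as a simple spinlock. */
static atomic_flag uar_lock = ATOMIC_FLAG_INIT;

/*
 * Ring a 64-bit doorbell with two 32-bit writes.
 * db points at the doorbell location; val is the 64-bit doorbell value,
 * assumed to be laid out in memory already the way the device expects.
 */
static void
doorbell_write64_split(volatile uint32_t *db, uint64_t val)
{
	uint32_t half[2];

	memcpy(half, &val, sizeof(half)); /* split by memory layout */
	while (atomic_flag_test_and_set_explicit(&uar_lock,
						 memory_order_acquire))
		; /* spin: no other doorbell may start on this page */
	db[0] = half[0]; /* lower address first */
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for a wmb */
	db[1] = half[1]; /* then the upper address */
	atomic_flag_clear_explicit(&uar_lock, memory_order_release);
}
```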

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 14:34:59 +02:00
Nelio Laranjeiro
60bd8c9747 net/mlx5: add count flow action
This is only supported by Mellanox OFED.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:27 +02:00
Nelio Laranjeiro
2815702bae net/mlx5: replace verbs priorities by flow
Previous work introduced Verbs priorities, whereas the PMD translates
Flow priorities into Verbs ones.  Rename them to better reflect
what the PMD has to translate.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
78be885295 net/mlx5: handle drop queues as regular queues
Drop queues are essentially used in flows due to the Verbs API; whether
the fate of the flow is a drop or not is already recorded
in the flow.  Due to this, drop queues can be fully mapped onto regular
queues.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Adrien Mazarguil
2b73026388 net/mlx5: probe all port representors
Probe existing port representors in addition to their master device and
associate them automatically.

To avoid collision between Ethernet devices, they are named as follows:

- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
  representors.
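
The naming scheme above can be illustrated with a small helper. The function
name and the use of a negative number to mean "master device" are
assumptions made for this sketch.

```c
#include <stdio.h>

/*
 * Build an Ethernet device name from a PCI address string (DBDF).
 * rep < 0 selects the master/switch device; rep >= 0 selects the port
 * representor with that number. Returns what snprintf() returns.
 */
static int
eth_dev_name(char *buf, size_t len, const char *dbdf, int rep)
{
	if (rep < 0)
		return snprintf(buf, len, "%s", dbdf);
	return snprintf(buf, len, "%s_representor_%d", dbdf, rep);
}
```

For a device at 0000:03:00.0, the master is named "0000:03:00.0" and its
first representor "0000:03:00.0_representor_0".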

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:19 +02:00
Adrien Mazarguil
26c08b979d net/mlx5: add port representor awareness
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.

Problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.

This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:

- There is only one matching device which isn't identified as a
  representor, in that case use it.
- Otherwise log an error and do not probe the device.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:14 +02:00
Adrien Mazarguil
9083982ce7 net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:36:55 +02:00
Matan Azrad
1f106da2bf net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
Add support for MPLS over GRE and MPLS over UDP tunnel types as
described in the following RFCs:
1. https://tools.ietf.org/html/rfc4023
2. https://tools.ietf.org/html/rfc7510
3. https://tools.ietf.org/html/rfc4385

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
dd3331c6f1 net/mlx5: add Bluefield device id
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Yongseok Koh
7d6bf6b866 net/mlx5: add Multi-Packet Rx support
Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per packet, one large buffer is posted in order to
receive multiple packets on the buffer. An MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.

Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
974f1e7ef1 net/mlx5: add new memory region support
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().
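
The L0/L1 step can be sketched as below. The types and names are simplified
stand-ins rather than the PMD's definitions, and the bottom half (L2) is
reduced to a stub that always misses.

```c
#include <stdint.h>

#define MR_CACHE_N 8            /* size of the L1 array; arbitrary here */
#define LKEY_INVALID UINT32_MAX /* hypothetical "miss" marker */

struct mr_cache_entry {
	uintptr_t start; /* first address covered */
	uintptr_t end;   /* one past the last address covered */
	uint32_t lkey;
};

struct mr_ctrl {
	uint16_t mru; /* index of the last-hit entry (L0) */
	uint16_t len; /* number of valid entries in cache[] */
	struct mr_cache_entry cache[MR_CACHE_N]; /* L1 */
};

/* Stand-in for the L2 bottom half; in this sketch it always misses. */
static uint32_t
mr_lookup_bh(struct mr_ctrl *ctrl, uintptr_t addr)
{
	(void)ctrl;
	(void)addr;
	return LKEY_INVALID;
}

static inline uint32_t
mr_lookup_cache(struct mr_ctrl *ctrl, uintptr_t addr)
{
	const struct mr_cache_entry *e;
	uint16_t i;

	/* L0: check the most recently used entry first. */
	if (ctrl->len > 0) {
		e = &ctrl->cache[ctrl->mru];
		if (addr >= e->start && addr < e->end)
			return e->lkey;
	}
	/* L1: linear search of the small fixed-size array. */
	for (i = 0; i < ctrl->len; i++) {
		e = &ctrl->cache[i];
		if (addr >= e->start && addr < e->end) {
			ctrl->mru = i; /* remember the hit for next time */
			return e->lkey;
		}
	}
	/* Both missed: fall back to the slower bottom half. */
	return mr_lookup_bh(ctrl, addr);
}
```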

If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.

If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.

If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will help a lot to
reduce misses on per-queue caches anyway.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
d561b5dc13 net/mlx5: remove memory region support
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
df428ceef4 net/mlx5: change device reference for secondary process
rte_eth_devices[] is not shared between primary and secondary process, but
is a static array in each process. The reverse pointer of device (priv->dev)
is therefore invalid. Instead, priv has the pointer to shared data of the device,
  struct rte_eth_dev_data *dev_data;

Two macros are added,
  #define PORT_ID(priv) ((priv)->dev_data->port_id)
  #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
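
How the two macros resolve can be shown with cut-down stand-ins for the
DPDK types. The struct definitions and MAX_PORTS below are mock-ups written
for this sketch, not the real DPDK declarations; only the two macros are
taken from the text above.

```c
#include <stdint.h>

#define MAX_PORTS 4 /* arbitrary size for the mock array */

/* Mock of the shared per-device data (lives in shared memory in DPDK). */
struct rte_eth_dev_data {
	uint16_t port_id;
};

/* Mock of the per-process device structure. */
struct rte_eth_dev {
	struct rte_eth_dev_data *data;
};

/* Each process has its own copy of this array. */
static struct rte_eth_dev rte_eth_devices[MAX_PORTS];

/* Mock of the private structure holding the pointer to shared data. */
struct priv {
	struct rte_eth_dev_data *dev_data;
};

#define PORT_ID(priv) ((priv)->dev_data->port_id)
#define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
```

Because ETH_DEV() indexes the local array by the shared port_id, each
process obtains a pointer that is valid in its own address space.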

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00