206 Commits

Author SHA1 Message Date
Thomas Monjalon
391797f042 drivers/bus: move driver assignment to end of probing
The PCI mapping requires knowing the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")

However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving this assignment back to the end of the device probing,
it becomes possible to make the status of a rte_device clear.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
2018-10-17 10:26:59 +02:00
Yongseok Koh
57123c00c1 net/mlx5: add Linux TC flower driver for E-Switch flow
Flows having 'transfer' attribute have to be inserted to E-Switch on the
NIC and the control path uses Linux TC flower interface via Netlink
socket.
This patch adds the flow driver on top of the new flow engine.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Yongseok Koh
40c9ccf9e9 net/mlx5: remove Netlink flow driver
Netlink based E-Switch flow engine will be migrated to the new flow
engine.
nl_flow will be renamed to flow_tcf as it goes through Linux TC flower
interface.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Yongseok Koh
0c76d1c9a1 net/mlx5: add abstraction for multiple flow drivers
Flow engine has to support multiple driver paths. Verbs/DV for NIC flow
steering and Linux TC flower for E-Switch flow steering. In the future,
another flow driver could be added (devX).

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
51e72d386c net/mlx5: add runtime parameter to enable Direct Verbs
DV flow API is based on a new kernel API and is
missing some functionality like counters but adds other functionality
like encap.

In order not to affect current users, even if the kernel supports
the new DV API, it should be enabled only manually.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
84c406e745 net/mlx5: add flow translate function
This commit modifies the conversion of the input parameters into Verbs
spec, in order to support all previous changes.

Some of those changes are:
removing the use of the parser,
storing each flow in its own flow structure.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Anatoly Burakov
5282bb1c36 mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 10:24:29 +02:00
Ori Kam
c322c0e558 net/mlx5: add bluefield VF support
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-09-28 01:41:01 +02:00
Shahaf Shuler
f9de87187b net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered
insecure, as in some cases where the application provides incorrect mbufs
on the Tx burst, the host or NIC can get stuck.

Hence, disabling the feature by default for this specific NIC.
Users can still enable this feature and enjoy the performance gain
(mostly for low number of cores) by using the txq_mpw_en devarg.

This patch will impact the out-of-the-box performance of some applications
using ConnectX-4 Lx for the sake of security and robustness.

Since we need different defaults based on the underlying device, the mpw
field in the configuration struct was extended to also contain the
MLX5_ARG_UNSET option.
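
The device-dependent default described above can be pictured as a tri-state option. This is a minimal sketch, not the PMD's code: ARG_UNSET_SKETCH and resolve_mpw() are hypothetical stand-ins for MLX5_ARG_UNSET and for the handling of the txq_mpw_en devarg, and the numeric values are assumptions.

```c
#include <assert.h>

/* Stand-in for MLX5_ARG_UNSET; the real value is an assumption here. */
#define ARG_UNSET_SKETCH (-1)

/*
 * Pick the effective Multi Packet Send setting.
 * user_value:     ARG_UNSET_SKETCH when the devarg was not given, else 0 or 1.
 * device_default: per-NIC default (0 would model ConnectX-4 Lx after this
 *                 commit).
 */
static int
resolve_mpw(int user_value, int device_default)
{
	assert(device_default == 0 || device_default == 1);
	if (user_value == ARG_UNSET_SKETCH)
		return device_default; /* nothing requested: use NIC default */
	return user_value != 0;        /* an explicit request always wins */
}
```

With this shape, a user who passes the devarg explicitly still gets the feature on a NIC whose default is off.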

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-28 15:27:39 +02:00
Yongseok Koh
2547ee7458 net/mlx5: preserve allmulticast flag for flow isolation mode
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling allmulti
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as
rte_eth_allmulticast_enable/disable() fail to set the flag
(dev->data->all_multicast). The flag is used when starting traffic by
mlx5_traffic_enable(). When switching out of flow isolation mode, allmulti
mode will not be set even though it has been enabled.

Fixes: 0887aa7f27 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-05 08:47:41 +02:00
Yongseok Koh
24b068ad71 net/mlx5: preserve promiscuous flag for flow isolation mode
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling promiscuous
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as rte_eth_promiscuous_enable/disable()
fail to set the flag (dev->data->promiscuous). The flag is used when
starting traffic by mlx5_traffic_enable(). When switching out of flow
isolation mode, promiscuous mode will not be set even though it has been
enabled.

Fixes: 0887aa7f27 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-05 08:47:40 +02:00
Nelio Laranjeiro
5366074b01 net/mlx5: fix route Netlink message overflow
Route Netlink message socket is wrongly initialized by registering to
the route link group.  This causes the socket to receive all link
messages related to routes, whereas the PMD does not expect to receive such
information.  In some situations this ends up filling the socket to the point
where no new message can be exchanged.
As the PMD is not expected to process such broadcast messages, the
nl_group parameter of the function is also removed.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
f872b4b99d net/mlx5: fix representors detection
On systems where the required Netlink commands are not supported but
Mellanox OFED is installed, representors information must be retrieved
through sysfs.

Fixes: 26c08b979d ("net/mlx5: add port representor awareness")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
20b71e92ef net/mlx5: lay groundwork for switch offloads
With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Moti Haimovsky
6bf10ab69b net/mlx5: support 32-bit systems
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes, each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
 of an on-going write of a DoorBell over a given UAR page.

The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required when ringing DoorBells on different
UAR pages.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 14:34:59 +02:00
Nelio Laranjeiro
af689f1f04 net/mlx5: support flow Ethernet item along with drop action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
2815702bae net/mlx5: replace verbs priorities by flow
Previous work introduced Verbs priorities, whereas the PMD actually
translates Flow priorities into Verbs priorities.  Rename this to better
reflect what the PMD has to translate.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
78be885295 net/mlx5: handle drop queues as regular queues
Drop queues are essentially used in flows due to the Verbs API; the
information on whether the fate of the flow is a drop is already present
in the flow.  Due to this, drop queues can be fully mapped onto regular
queues.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Adrien Mazarguil
6de569f5ec net/mlx5: add parameter for port representors
Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.

This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:29 +02:00
Adrien Mazarguil
116f90ad7e net/mlx5: probe port representors in natural order
Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.

This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.
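
One way to picture a "natural" order is to sort the detected devices so that the master comes first and representors follow by ascending ID. The sketch below is an assumption about what such an ordering could look like, not the patch's code; struct verbs_dev_sketch and sort_natural() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical summary of one detected Verbs device. */
struct verbs_dev_sketch {
	int is_representor; /* 0 for the master device */
	int rep_id;         /* representor ID, ignored for the master */
};

/* Master first, then representors by ascending ID. */
static int
cmp_natural(const void *a, const void *b)
{
	const struct verbs_dev_sketch *x = a;
	const struct verbs_dev_sketch *y = b;

	if (x->is_representor != y->is_representor)
		return x->is_representor - y->is_representor;
	return (x->rep_id > y->rep_id) - (x->rep_id < y->rep_id);
}

static void
sort_natural(struct verbs_dev_sketch *list, size_t n)
{
	assert(list != NULL || n == 0);
	if (n > 1)
		qsort(list, n, sizeof(*list), cmp_natural);
}
```

Probing the sorted list in sequence would then hand out DPDK port IDs in the same order as representor IDs.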

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:37:26 +02:00
Adrien Mazarguil
2b73026388 net/mlx5: probe all port representors
Probe existing port representors in addition to their master device and
associate them automatically.

To avoid collision between Ethernet devices, they are named as follows:

- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
  representors.
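
The naming scheme above can be sketched with snprintf(). The helper name format_port_name() and the convention of passing a negative value for the master device are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build an Ethernet device name from a PCI address string (DBDF).
 * rep < 0 denotes the master/switch device (a convention of this sketch).
 * Returns what snprintf() returns: the length the full name needs.
 */
static int
format_port_name(char *buf, size_t len, const char *dbdf, int rep)
{
	assert(buf != NULL && dbdf != NULL && len > 0);
	if (rep < 0)
		return snprintf(buf, len, "%s", dbdf);
	return snprintf(buf, len, "%s_representor_%d", dbdf, rep);
}
```

For instance, an address of "0000:03:00.0" with rep 0 would yield "0000:03:00.0_representor_0".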

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:19 +02:00
Adrien Mazarguil
26c08b979d net/mlx5: add port representor awareness
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.

The problem is that, more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.

This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:

- There is only one matching device which isn't identified as a
  representor, in that case use it.
- Otherwise log an error and do not probe the device.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:14 +02:00
Adrien Mazarguil
681289345e net/mlx5: re-indent generic probing function
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.

This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:10 +02:00
Adrien Mazarguil
f38c54571d net/mlx5: split PCI from generic probing
All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.

This patch divides the original function, which has become huge over time,
as follows:

1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:03 +02:00
Adrien Mazarguil
9083982ce7 net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:36:55 +02:00
Adrien Mazarguil
3ff4b0866f net/mlx5: remove redundant objects in probe function
This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:52 +02:00
Adrien Mazarguil
6057a10b3b net/mlx5: rename confusing object in probe function
There are several attribute objects in this function:

- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).

"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:46 +02:00
Thomas Monjalon
f8e9989606 remove useless constructor headers
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-12 00:00:35 +02:00
Matan Azrad
1ff30d182c net/mlx5: activate Verbs cleanup on removal
Starting from rdma-core v19, Mellanox OFED 4.4, the Verbs resources
cleanup is properly activated in plug-out process when setting the
MLX5_DEVICE_FATAL_CLEANUP environment variable to 1.

Set the aforementioned variable to 1.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-04 16:48:53 +02:00
Yongseok Koh
5c0e2db619 net/mlx5: add warning message for Multi-Packet RQ
If Multi-Packet RQ is enabled but not supported by device or
kernel/library, print out a warning message.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-03 01:35:58 +02:00
Stephen Hemminger
3d96644aa3 net/mlx5: fix log initialization
The mlx5 driver had two init functions, but this could
cause log initialization to be done after the
other initialization. Also, the name of the function does
not match convention (cut/paste error?).

Fix by initializing log type first at start of the pmd_init.
This also gets rid of having two constructor functions.

Fixes: a170a30d22 ("net/mlx5: use dynamic logging")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-06-17 10:17:53 +02:00
Xueming Li
a9fc0b0ef0 net/mlx5: fix crash in device probe
This patch initializes the counter descriptor struct before invoking the
Verbs API to avoid a segmentation fault.

Fixes: 9a761de8ea ("net/mlx5: flow counter support")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
93068a9d5a net/mlx5: fix error message in probe function
Error values passed to strerror() must be positive.
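
A minimal sketch of the pitfall, assuming a function that follows the negative-errno convention: the value has to be negated before it reaches strerror(). The helper format_probe_error() is hypothetical and only illustrates the sign handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an error message for a failed probe step.
 * ret is a negative errno value (e.g. -ENODEV); strerror() expects the
 * positive code, so the sign is flipped before the call.
 */
static int
format_probe_error(char *buf, size_t len, int ret)
{
	assert(buf != NULL && len > 0);
	assert(ret < 0);
	return snprintf(buf, len, "probe failed: %s", strerror(-ret));
}
```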

Fixes: 012ad9944d ("net/mlx5: fix probe return value polarity")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
c6ce7e34ad net/mlx5: fix missing errno in probe function
Fixes: b43802b4bd ("net/mlx5: support 16 hardware priorities")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
8c3c2372ed net/mlx5: fix errno object in probe function
Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
c93adccc97 net/mlx5: remove limitation on number of instances
This artificial limitation was inherited from the mlx4 code base and has no
purpose other than adding unnecessary noise.

This patch is a port of commit f2318196c7 ("net/mlx4: remove limitation
on number of instances").

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
David Marchand
44b1d513d5 net/mlx5: register memory callback only when probing
The callback should be invoked only for memory that has been registered
in a device, hence, no need to track cleanup events if no device is
present.

Bugzilla ID: 56
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-05-30 21:16:43 +02:00
Xueming Li
0ace586dee net/mlx5: fix memory region cache init
MR cache init takes place on the device configuration.
When the device is re-configured multiple times, for example when
changing the number of queues on the fly, a deadlock can happen.

This patch moves MR cache init from the device configuration function to
the probe function to make sure it is initialized only once.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 16:28:43 +02:00
Adrien Mazarguil
e89c15b697 net/mlx5: fix crash when configure is not called
Although uncommon, applications may destroy a device immediately after
probing it without going through dev_configure() first.

This patch addresses a crash which occurs when mlx5_dev_close() calls
mlx5_mr_release() due to an uninitialized entry in the private structure.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 07:50:38 +02:00
Matan Azrad
1f106da2bf net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
Add support for MPLS over GRE and MPLS over UDP tunnel types as
described in the following RFCs:
1. https://tools.ietf.org/html/rfc4023
2. https://tools.ietf.org/html/rfc7510
3. https://tools.ietf.org/html/rfc4385

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
dd3331c6f1 net/mlx5: add Bluefield device id
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Andy Green
f11a4a7d8a net/mlx5: fix uninitialized variable in probing
Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")

Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-15 22:29:22 +02:00
Thomas Monjalon
fbe90cdd77 ethdev: add probing finish function
A new hook function is added and called inside the PMDs at the end
of the device probing:
	- in primary process, after allocating, init and config
	- in secondary process, after attaching and local init

This new function is almost empty for now.
It will be used later to add some post-initialization processing.

For the PMDs calling the helpers rte_eth_dev_create() or
rte_eth_dev_pci_generic_probe(), the hook rte_eth_dev_probing_finish()
is called from here, and not in the PMD itself.

Note that the helper rte_eth_dev_create() could be used more,
especially for vdevs, avoiding some code duplication in PMDs.

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-14 22:31:53 +01:00
Yongseok Koh
7d6bf6b866 net/mlx5: add Multi-Packet Rx support
Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per packet, one large buffer is posted in order to
receive multiple packets on the buffer. A MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.

Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.
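
The stride layout and the copy-versus-attach decision can be sketched as below. The helper names, the bounds check, and the copy_threshold parameter are assumptions for illustration; the source does not state the threshold, and the PMD's actual Rx path is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of stride idx inside one MPRQ buffer of fixed-size strides. */
static inline uint8_t *
stride_addr(uint8_t *buf, size_t stride_size, unsigned int idx,
	    unsigned int n_strides)
{
	assert(buf != NULL && idx < n_strides);
	return buf + (size_t)idx * stride_size;
}

/*
 * Return 1 when a packet is small enough to be copied into the
 * user-provided mbuf, 0 when it should be attached as an external
 * buffer instead. copy_threshold is a hypothetical tunable.
 */
static inline int
rx_should_copy(size_t pkt_len, size_t copy_threshold)
{
	return pkt_len <= copy_threshold;
}
```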

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
974f1e7ef1 net/mlx5: add new memory region support
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().
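
The L0/L1 stage can be sketched as a last-hit check followed by a linear scan of a small fixed array. The structures, the array size, and the miss value below are assumptions for this example and do not mirror the PMD's mr_ctrl layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_N_SKETCH 8            /* hypothetical size of the L1 array */
#define LKEY_MISS_SKETCH UINT32_MAX /* hypothetical "not found" value */

struct mr_entry_sketch {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
	uint32_t lkey;
};

struct mr_ctrl_sketch {
	uint16_t mru; /* index of the most recently used entry (L0) */
	uint16_t len; /* number of valid entries in cache[] */
	struct mr_entry_sketch cache[CACHE_N_SKETCH];
};

/* L0: check the last hit. L1: linear search of the fixed array. */
static inline uint32_t
lookup_l0_l1(struct mr_ctrl_sketch *c, uintptr_t addr)
{
	uint16_t i;

	assert(c != NULL && c->len <= CACHE_N_SKETCH);
	if (c->mru < c->len) {
		const struct mr_entry_sketch *e = &c->cache[c->mru];

		if (addr >= e->start && addr < e->end)
			return e->lkey; /* L0 hit */
	}
	for (i = 0; i < c->len; i++) {
		const struct mr_entry_sketch *e = &c->cache[i];

		if (addr >= e->start && addr < e->end) {
			c->mru = i; /* remember for the next L0 check */
			return e->lkey;
		}
	}
	return LKEY_MISS_SKETCH; /* caller falls through to L2 */
}
```

A miss here is the point at which a caller would invoke the slower bottom-half lookup.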

If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.

If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.

If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.

To guarantee optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. It is also recommended to configure the
per-lcore cache of Mempool. Even when there are many MRs for a device or
MRs are highly fragmented, the Mempool cache will be very helpful in
reducing misses on per-queue caches.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
d561b5dc13 net/mlx5: remove memory region support
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
df428ceef4 net/mlx5: change device reference for secondary process
rte_eth_devices[] is not shared between primary and secondary processes; it
is a static array local to each process. The reverse pointer of device
(priv->dev) is therefore invalid. Instead, priv has the pointer to shared
data of the device,
  struct rte_eth_dev_data *dev_data;

Two macros are added,
  #define PORT_ID(priv) ((priv)->dev_data->port_id)
  #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
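
The idea behind the two macros can be shown with stand-in types: the private structure keeps only a pointer to shared data, and each process resolves its own local device entry through the port ID. All names ending in _sketch are hypothetical replacements for the real ethdev structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS_SKETCH 4

/* Stand-in for the device data that lives in shared memory. */
struct dev_data_sketch { uint16_t port_id; };
/* Stand-in for a per-process device entry. */
struct eth_dev_sketch { struct dev_data_sketch *data; };
/* Stand-in for rte_eth_devices[]: each process has its own copy. */
static struct eth_dev_sketch eth_devices_sketch[MAX_PORTS_SKETCH];

struct priv_sketch { struct dev_data_sketch *dev_data; };

#define PORT_ID(priv) ((priv)->dev_data->port_id)
#define ETH_DEV(priv) (&eth_devices_sketch[PORT_ID(priv)])

/* Resolve the calling process's own device entry from shared data. */
static struct eth_dev_sketch *
local_eth_dev(const struct priv_sketch *priv)
{
	assert(priv != NULL && priv->dev_data != NULL);
	assert(PORT_ID(priv) < MAX_PORTS_SKETCH);
	return ETH_DEV(priv);
}
```

Since only the port ID is read from shared memory, the resulting pointer is always valid in the process that computes it.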

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Raslan Darawsheh
690de2850b net/mlx5: fix resource leak in case of error
If something goes wrong in mlx5_pci_probe, the allocated eth dev
will cause a memory leak.

This commit releases the eth dev that was previously allocated.

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:50 +01:00
Raslan Darawsheh
e9f4166014 net/mlx5: fix double free on error handling
When attr_ctx is NULL it will attempt to free the list of devices twice.
Avoid double freeing the list by directly going to error handling.

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:49 +01:00
Xueming Li
5afda2c6ac net/mlx5: fix SW parsing feature detection
Fixes: 5f8ba81c42 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:49 +01:00