769 Commits

Author SHA1 Message Date
Shahaf Shuler
ed4c7fd92a net/mlx5: fix flow count action for shared counter
According to commit fb8fd96d42 ("ethdev: add shared counter to flow
API") the counter id should be taken into account only when the shared
flag is set.
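
The rule above can be sketched as a lookup that honors the counter ID only for shared requests. This is a minimal illustration, not the PMD's code; `struct counter` and `counter_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter record; not the mlx5 PMD's real structure. */
struct counter {
	bool shared;  /* true when several flows may reuse this counter */
	uint32_t id;  /* meaningful only when shared is true */
	bool in_use;
};

/*
 * Return an existing counter only for a shared request with a matching ID.
 * A non-shared request ignores the ID and never reuses an entry, so the
 * caller is expected to allocate a fresh counter when NULL is returned.
 */
static struct counter *
counter_lookup(struct counter *pool, size_t n, bool shared, uint32_t id)
{
	assert(pool != NULL || n == 0);
	if (!shared)
		return NULL;
	for (size_t i = 0; i < n; ++i)
		if (pool[i].in_use && pool[i].shared && pool[i].id == id)
			return &pool[i];
	return NULL;
}
```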

Fixes: 60bd8c9747 ("net/mlx5: add count flow action")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2018-08-02 12:34:17 +02:00
Yaroslav Brustinov
2854ba22c5 net/mlx5: fix linkage of glue lib with gcc 4.7.2
This addresses a gcc 4.7.2 bug that cannot be reproduced with later
versions:

"bin/ld: Warning: alignment 8 of symbol `mlx5_glue' in
src/dpdk/drivers/net/mlx5/mlx5_glue.c.21.o is smaller than 16 in
src/dpdk/drivers/net/mlx5/mlx5_rxq.c.21.o"

Fix it by forcing the alignment of the glue lib.

Fixes: 0e83b8e536 ("net/mlx5: move rdma-core calls to separate file")
Cc: stable@dpdk.org

Signed-off-by: Yaroslav Brustinov <ybrustin@cisco.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
3f8cb05df5 net/mlx5: fix invalid network interface index
Network interface indices being unsigned, an invalid index or error is
normally expressed through a zero value (see if_nametoindex()).

mlx5_ifindex() has a signed return type for negative values in case of
error. Since mlx5_nl.c does not check for errors, these may be fed back as
invalid interface indices to subsequent system calls. This usage would
have been correct if mlx5_ifindex() returned a zero value instead.

This patch makes mlx5_ifindex() unsigned for convenience.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
919d53ad78 net/mlx5: fix count query when flow has not counter
Querying a counter on a flow without a counter ends with a
segmentation fault.
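
The kind of guard that avoids such a crash can be sketched as follows. The structures, the function name and the -EINVAL return value are assumptions made for illustration, not the PMD's actual definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
struct flow_counter {
	uint64_t hits;
};

struct flow {
	struct flow_counter *counter; /* NULL when the flow has no COUNT action */
};

/* Report an error instead of dereferencing a missing counter. */
static int
flow_query_count(const struct flow *flow, uint64_t *hits)
{
	if (flow->counter == NULL)
		return -EINVAL;
	*hits = flow->counter->hits;
	return 0;
}
```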

Fixes: 60bd8c9747 ("net/mlx5: add count flow action")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Yongseok Koh
24f653a7e8 net/mlx5: fix queue rollback when starting device
mlx5_rxq_start() and mlx5_rxq_stop() must be strictly paired because an
internal reference counter is incremented or decremented inside them. Also,
mlx5_rxq_get() must be paired with mlx5_rxq_release().
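
The pairing requirement can be illustrated with a rollback that stops exactly the queues that were started. This is a sketch with hypothetical `queue_start()`/`queue_stop()` helpers, not the driver's functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue with a reference counter. */
struct queue {
	int refcnt;
	int fail_on_start; /* test hook: make queue_start() fail */
};

static int
queue_start(struct queue *q)
{
	assert(q != NULL);
	if (q->fail_on_start)
		return -1;
	++q->refcnt;
	return 0;
}

static void
queue_stop(struct queue *q)
{
	assert(q->refcnt > 0);
	--q->refcnt;
}

/* Start all queues; on failure, stop only those already started. */
static int
start_all(struct queue *qs, size_t n)
{
	size_t i;

	for (i = 0; i < n; ++i)
		if (queue_start(&qs[i]) != 0)
			goto rollback;
	return 0;
rollback:
	while (i-- > 0)
		queue_stop(&qs[i]);
	return -1;
}
```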

Fixes: 7d6bf6b866 ("net/mlx5: add Multi-Packet Rx support")
Fixes: a1366b1a2b ("net/mlx5: add reference counter on DPDK Rx queues")
Fixes: 6e78005a9b ("net/mlx5: add reference counter on DPDK Tx queues")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Yongseok Koh
c20d4a70ca net/mlx5: fix endless loop when clearing flow flags
If one of (*priv->rxqs)[] is null, the for loop can iterate infinitely as
idx can't be increased.
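
A loop shape that cannot hang on a null entry keeps the increment in the loop header, so `continue` still advances the index. The sketch below uses a hypothetical `struct rxq`; it is not the original function.

```c
#include <stddef.h>

/* Hypothetical Rx queue with a mark flag. */
struct rxq {
	unsigned int mark;
};

/* Clear the mark flag on every configured queue, skipping empty slots. */
static unsigned int
clear_flags(struct rxq **rxqs, unsigned int n)
{
	unsigned int cleared = 0;

	for (unsigned int i = 0; i != n; ++i) {
		if (rxqs[i] == NULL)
			continue; /* safe: i is advanced by the loop header */
		rxqs[i]->mark = 0;
		++cleared;
	}
	return cleared;
}
```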

Fixes: cd24d52639 ("net/mlx5: add mark/flag flow action")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
5366074b01 net/mlx5: fix route Netlink message overflow
The route Netlink message socket is wrongly initialized by registering to
the route link group.  This causes the socket to receive all link
messages related to routes, whereas the PMD does not expect to receive such
information.  In some situations the socket fills up to the point
that no new message can be exchanged.
As the PMD is not expected to process such broadcast messages, the
nl_group parameter of the function is also removed.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-26 14:05:52 +02:00
Yongseok Koh
c618e7e82b net/mlx5: fix assert for Tx completion queue count
There should be at least one Tx CQE remaining if Tx WQ and txq->elts[] have
available slots to send a packet, because the size of the Tx CQ is calculated
exactly from the sizes of the other resources. As this is guaranteed, it is
checked by an assertion.

max_elts is checked after the assertion for Tx CQ. If no slot is available
in txq->elts[], the assertion would be wrong.
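
The ordering issue can be sketched as follows: the completion-queue assertion only holds once the other resources are known to have room, so the early return has to come first. Names and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Return how many packets can be sent in this burst.
 * The CQE assertion is valid only when both elts and WQE slots are
 * available, so it must come after the availability checks.
 */
static unsigned int
tx_burst_room(unsigned int max_elts, unsigned int max_wqe,
	      unsigned int free_cqe)
{
	if (max_elts == 0 || max_wqe == 0)
		return 0;
	assert(free_cqe >= 1);
	return max_elts < max_wqe ? max_elts : max_wqe;
}
```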

Fixes: 2eefbec531 ("net/mlx5: add missing sanity checks for Tx completion queue")
Fixes: 6ce84bd889 ("net/mlx5: add enhanced multi-packet send for ConnectX-5")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
f872b4b99d net/mlx5: fix representors detection
On systems where the required Netlink commands are not supported but
Mellanox OFED is installed, representors information must be retrieved
through sysfs.

Fixes: 26c08b979d ("net/mlx5: add port representor awareness")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
2bc98393ac net/mlx5: fix TCI mask filter
In mlx5_traffic_enable() the TCI mask for the VLAN is wrong causing the
sub flow engine to reject the rule.

Fixes: 272733b5eb ("net/mlx5: use flow to enable unicast traffic")
Cc: stable@dpdk.org

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2018-07-26 14:05:52 +02:00
Moti Haimovsky
79d0989213 net/mlx5: fix build with old kernels
This commit fixes compilation errors due to missing definitions
found when compiling mlx5 PMD from DPDK 17.11-LTS on Ubuntu 12.4
with kernel 3.15.

Fixes: 75ef62a943 ("net/mlx5: fix link speed capability information")
Fixes: 5bfc9fc112 ("net/mlx5: use static assert for compile-time sanity checks")
Cc: stable@dpdk.org

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
21174f2a5c net/mlx5: add port ID pattern item to switch flow rules
This enables flow rules to match traffic coming from a different DPDK port
ID associated with the device (PORT_ID pattern item), mainly for the
convenience of applications that want to deal with a single port ID for all
flow rules associated with some physical device.

Testpmd example:

- Creating a flow rule on port ID 1 to consume all traffic from port ID 0
  and direct it to port ID 2:

  flow create 1 ingress transfer pattern port_id id is 0 / end actions
     port_id id 2 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
7ac6778d50 net/mlx5: add VLAN item and actions to switch flow rules
This enables flow rules to explicitly match VLAN traffic (VLAN pattern
item) and perform various operations on VLAN headers at the switch level
(OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID and OF_SET_VLAN_PCP actions).

Testpmd examples:

- Directing all VLAN traffic received on port ID 1 to port ID 0:

  flow create 1 ingress transfer pattern eth / vlan / end actions
     port_id id 0 / end

- Adding a VLAN header to IPv6 traffic received on port ID 1 and directing
  it to port ID 0:

  flow create 1 ingress transfer pattern eth / ipv6 / end actions
     of_push_vlan ethertype 0x8100 / of_set_vlan_vid vlan_vid 42 /
     port_id id 0 / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
2bfc777e07 net/mlx5: add L2-L4 pattern items to switch flow rules
This enables flow rules to explicitly match supported combinations of
Ethernet, IPv4, IPv6, TCP and UDP headers at the switch level.

Testpmd example:

- Dropping TCPv4 traffic with a specific destination on port ID 2:

  flow create 2 ingress transfer pattern eth / ipv4 / tcp dst is 42 / end
     actions drop / end

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
9b33df8e0c net/mlx5: add fate actions to switch flow rules
This patch enables creation of rte_flow rules that direct matching traffic
to a different port (e.g. another VF representor) or drop it directly at
the switch level (PORT_ID and DROP actions).

Testpmd examples:

- Directing all traffic to port ID 0:

  flow create 1 ingress transfer pattern end actions port_id id 0 / end

- Dropping all traffic normally received by port ID 1:

  flow create 1 ingress transfer pattern end actions drop / end

Note the presence of the transfer attribute, which requests them to be
applied at the switch level. All traffic is matched due to empty pattern.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
8f9059ccee net/mlx5: add framework for switch flow rules
Because mlx5 switch flow rules are configured through Netlink (TC
interface) and have little in common with Verbs, this patch adds a separate
parser function to handle them.

- mlx5_nl_flow_transpose() converts a rte_flow rule to its TC equivalent
  and stores the result in a buffer.

- mlx5_nl_flow_brand() gives a unique handle to a flow rule buffer.

- mlx5_nl_flow_create() instantiates a flow rule on the device based on
  such a buffer.

- mlx5_nl_flow_destroy() performs the reverse operation.

These functions are called by the existing implementation when encountering
flow rules which must be offloaded to the switch (currently relying on the
transfer attribute).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
20b71e92ef net/mlx5: lay groundwork for switch offloads
With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Moti Haimovsky
6bf10ab69b net/mlx5: support 32-bit systems
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes, each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
  of an on-going write of a DoorBell over a given UAR page.

The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such
synchronization is not required when ringing DoorBells on different
UAR pages.
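
A minimal sketch of the two-write scheme under a per-page lock, assuming a plain memory buffer stands in for the UAR page. `struct uar_page` and `ring_doorbell_32()` are hypothetical, and a real driver would use an I/O write barrier suited to device memory rather than a C11 fence.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a UAR page shared by several threads. */
struct uar_page {
	atomic_flag lock;        /* serializes DoorBells on this page */
	volatile uint32_t db[2]; /* db[0] is the lower address */
};

/*
 * Ring a 64-bit DoorBell as two 32-bit writes, lower address first,
 * without letting another DoorBell start on the same page in between.
 */
static void
ring_doorbell_32(struct uar_page *uar, uint32_t lower_dword,
		 uint32_t upper_dword)
{
	while (atomic_flag_test_and_set_explicit(&uar->lock,
						 memory_order_acquire))
		; /* spin until the page is free */
	uar->db[0] = lower_dword;
	atomic_thread_fence(memory_order_release); /* keep the write order */
	uar->db[1] = upper_dword;
	atomic_flag_clear_explicit(&uar->lock, memory_order_release);
}
```

Threads ringing DoorBells on different pages take different locks and therefore do not contend.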

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 14:34:59 +02:00
Shahaf Shuler
06b1fe3f6d net/mlx5: fix build with rdma-core v19
The flow counter support introduced by
commit 9a761de8ea ("net/mlx5: flow counter support") was intended to
work only with MLNX_OFED_4.3, as the upstream rdma-core
libraries lacked such support.

On rdma-core v19 the support for the flow counters was added but with
different user APIs, hence causing compilation issues on the PMD.

This patch fixes the compilation errors by forcing the flow counters
to be enabled only with MLNX_OFED APIs.
Once the MLNX_OFED and rdma-core APIs are aligned, a proper patch to
support the new API will be submitted.

Fixes: 9a761de8ea ("net/mlx5: flow counter support")
Cc: stable@dpdk.org

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2018-07-12 12:53:59 +02:00
Nelio Laranjeiro
60bd8c9747 net/mlx5: add count flow action
This is only supported by Mellanox OFED.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:27 +02:00
Nelio Laranjeiro
a4a5cd21d2 net/mlx5: add flow MPLS item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:26 +02:00
Nelio Laranjeiro
f4b901a46a net/mlx5: add flow GRE item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:26 +02:00
Nelio Laranjeiro
77182481c5 net/mlx5: add flow VXLAN-GPE item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:25 +02:00
Nelio Laranjeiro
f4f06e3615 net/mlx5: add flow VXLAN item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:24 +02:00
Nelio Laranjeiro
fd0b70316b net/mlx5: support inner RSS computation
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:12:18 +02:00
Nelio Laranjeiro
df6afd377a net/mlx5: remove useless arguments in hrxq API
The RSS level is only needed to add a bit in hash_fields, which is already
provided to this API. For tunnels, it is necessary to request that such a
queue compute the checksum on the innermost headers; this should
always be activated.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:04 +02:00
Nelio Laranjeiro
592f05b29a net/mlx5: add RSS flow action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:04 +02:00
Nelio Laranjeiro
c388a2f6d7 net/mlx5: use a macro for the RSS key size
ConnectX 4-5 support only 40 bytes of RSS key, using a compiled size
hash key is not necessary.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
cd24d52639 net/mlx5: add mark/flag flow action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
89464c8e89 net/mlx5: add flow TCP item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
535f686e54 net/mlx5: add flow UDP item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:03 +02:00
Nelio Laranjeiro
62b2c4d925 net/mlx5: add flow IPv6 item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
4899185ff9 net/mlx5: add flow IPv4 item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
109723ed9b net/mlx5: add flow VLAN item
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
5747b170b3 net/mlx5: add flow stop/start
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
9944ab8a13 net/mlx5: add flow queue action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:02 +02:00
Nelio Laranjeiro
af689f1f04 net/mlx5: support flow Ethernet item along with drop action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
2815702bae net/mlx5: replace verbs priorities by flow
Previous work introduced Verbs priorities, whereas the PMD translates
flow priorities into Verbs priorities.  Rename this to better reflect
what the PMD has to translate.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
78be885295 net/mlx5: handle drop queues as regular queues
Drop queues are essentially used in flows due to the Verbs API; the
information on whether the fate of the flow is a drop is already present
in the flow.  Due to this, drop queues can be fully mapped on regular
queues.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Nelio Laranjeiro
b42c000e37 net/mlx5: remove flow support
This starts a series to rework the flow engine in mlx5 to easily support
flow conversion to Verbs or TC.  This is necessary to handle both regular
flows and representor flows.

As the full file needs to be cleaned up to rewrite all items/actions
processing, this patch starts by disabling the regular code and only lets
the PMD start in isolated mode.

After this patch, the flow API will not be usable.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00
Adrien Mazarguil
6de569f5ec net/mlx5: add parameter for port representors
Prior to this patch, all port representors detected on a given device were
probed and Ethernet devices instantiated for each of them.

This patch adds support for the standard "representor" parameter, which
implies that port representors are not probed by default anymore, except
for the list provided through device arguments.

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:29 +02:00
Adrien Mazarguil
116f90ad7e net/mlx5: probe port representors in natural order
Port representors are probed in whatever unspecified order
ibv_get_device_list() returns them.

This is counterintuitive to users since DPDK port IDs assignment almost
never follows the same sequence as representor IDs. Additionally, the
master device does not necessarily inherit the lowest DPDK port ID.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:37:26 +02:00
Adrien Mazarguil
2b73026388 net/mlx5: probe all port representors
Probe existing port representors in addition to their master device and
associate them automatically.

To avoid collision between Ethernet devices, they are named as follows:

- "{DBDF}" for master/switch devices.
- "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
  representors.
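
The naming scheme above can be sketched with snprintf(). The helper name and the convention of passing a negative index for the master device are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build an Ethernet device name from a PCI address string (DBDF).
 * A negative rep selects the master/switch device name.
 * Returns the snprintf() result (length that was or would be written).
 */
static int
representor_name(char *buf, size_t len, const char *dbdf, int rep)
{
	assert(buf != NULL && dbdf != NULL);
	if (rep < 0)
		return snprintf(buf, len, "%s", dbdf);
	return snprintf(buf, len, "%s_representor_%d", dbdf, rep);
}
```

With the PCI address "0000:03:00.0", this yields "0000:03:00.0" for the master device and "0000:03:00.0_representor_0" for the first representor.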

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:19 +02:00
Adrien Mazarguil
26c08b979d net/mlx5: add port representor awareness
The current PCI probing method is not aware of Verbs port representors,
which appear as standard Verbs devices bound to the same PCI address and
cannot be distinguished.

Problem is that more often than not, the wrong Verbs device is used,
resulting in unexpected traffic.

This patch makes the driver discard representors to only use the master
device. If unable to identify it (e.g. kernel drivers not recent enough),
either:

- There is only one matching device which isn't identified as a
  representor, in that case use it.
- Otherwise log an error and do not probe the device.
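
The selection rule described above might look like the following sketch. The `enum role` classification and `pick_master()` are hypothetical; they only model the decision, not how roles are detected.

```c
/* Hypothetical classification of Verbs devices sharing one PCI address. */
enum role {
	ROLE_UNKNOWN,     /* could not be identified (e.g. old kernel) */
	ROLE_MASTER,
	ROLE_REPRESENTOR,
};

/*
 * Return the index of the device to use, or -1 when none can be chosen
 * safely (the caller would then log an error and skip probing).
 */
static int
pick_master(const enum role *roles, int n)
{
	int unknown = -1;
	int unknown_n = 0;

	for (int i = 0; i < n; ++i) {
		if (roles[i] == ROLE_MASTER)
			return i;
		if (roles[i] == ROLE_UNKNOWN) {
			unknown = i;
			++unknown_n;
		}
	}
	/* Fall back only on a single candidate that is not a representor. */
	return unknown_n == 1 ? unknown : -1;
}
```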

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:14 +02:00
Adrien Mazarguil
681289345e net/mlx5: re-indent generic probing function
Since commit "net/mlx5: drop useless support for several Verbs ports"
removed an inner loop, mlx5_dev_spawn() is left with an unnecessary indent
level.

This patch eliminates a block, moves its local variables to function scope,
and re-indents its contents (diff best viewed with --ignore-all-space).

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:10 +02:00
Adrien Mazarguil
f38c54571d net/mlx5: split PCI from generic probing
All the generic probing code needs is an IB device. While this device is
currently supplied by a PCI lookup, other methods will be added soon.

This patch divides the original function, which has become huge over time,
as follows:

1. PCI-specific (mlx5_pci_probe()).
2. Verbs device (mlx5_dev_spawn()).

(Patch based on prior work from Yuanhan Liu)

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:37:03 +02:00
Adrien Mazarguil
9083982ce7 net/mlx5: drop useless support for several Verbs ports
Unlike mlx4 from which this capability was inherited, mlx5 devices expose
exactly one Verbs port per PCI bus address. Each physical port gets
assigned its own bus address with a single Verbs port.

While harmless, this code requires an extra loop that would get in the way
of subsequent refactoring.

No functional impact.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-11 15:36:55 +02:00
Adrien Mazarguil
3ff4b0866f net/mlx5: remove redundant objects in probe function
This patch gets rid of redundant calls to open the device and query its
attributes in order to simplify the code.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:52 +02:00
Adrien Mazarguil
6057a10b3b net/mlx5: rename confusing object in probe function
There are several attribute objects in this function:

- IB device attributes (struct ibv_device_attr_ex device_attr).
- Direct Verbs attributes (struct mlx5dv_context attrs_out).
- Port attributes (struct ibv_port_attr).
- IB device attributes again (struct ibv_device_attr_ex device_attr_ex).

"attrs_out" is both odd and initialized using a nonstandard syntax. Rename
it "dv_attr" for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Xueming Li <xuemingl@mellanox.com>
2018-07-11 15:36:46 +02:00
Thomas Monjalon
f8e9989606 remove useless constructor headers
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.
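
As an illustration, assuming the RTE_INIT* macros expand to something like the GCC/Clang constructor attribute, a static constructor needs no separate prototype. The names below are hypothetical.

```c
/* Set by the constructor before main() runs. */
static int driver_registered;

/*
 * No forward declaration is needed: the attribute is placed directly
 * on the definition of the static function.
 */
static void __attribute__((constructor))
register_driver(void)
{
	driver_registered = 1;
}
```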

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-12 00:00:35 +02:00
Matan Azrad
1ff30d182c net/mlx5: activate Verbs cleanup on removal
Starting from rdma-core v19, Mellanox OFED 4.4, the Verbs resources
cleanup is properly activated in plug-out process when setting the
MLX5_DEVICE_FATAL_CLEANUP environment variable to 1.

Set the aforementioned variable to 1.
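
Setting the variable from within the process can be sketched with POSIX setenv(). The helper name is hypothetical; the variable name comes from the message above.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/*
 * Request Verbs resource cleanup on device removal.
 * Must run before the Verbs library reads its environment.
 * Returns 0 on success, -1 on failure (errno set by setenv()).
 */
static int
enable_verbs_cleanup(void)
{
	return setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
}
```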

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-04 16:48:53 +02:00
Ferruh Yigit
70815c9eca ethdev: add new offload flag to keep CRC
DEV_RX_OFFLOAD_KEEP_CRC offload flag is added. PMDs that support
keeping CRC should advertise this offload capability.

The DEV_RX_OFFLOAD_CRC_STRIP flag will remain for one more release;
the default behavior in PMDs is to keep the CRC until this flag is removed.

Until DEV_RX_OFFLOAD_CRC_STRIP flag is removed:
- Setting both KEEP_CRC & CRC_STRIP is INVALID
- Setting only CRC_STRIP PMD should strip the CRC
- Setting only KEEP_CRC PMD should keep the CRC
- Not setting both PMD should keep the CRC
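
The four cases above form a small truth table, sketched here. The bit values and names are placeholders chosen for the example and are not the real DEV_RX_OFFLOAD_* definitions.

```c
#include <stdint.h>

/* Placeholder bit values, NOT the real ethdev flag definitions. */
#define EX_OFFLOAD_CRC_STRIP (UINT64_C(1) << 0)
#define EX_OFFLOAD_KEEP_CRC  (UINT64_C(1) << 1)

enum crc_mode {
	CRC_INVALID = -1, /* both flags set */
	CRC_STRIPPED = 0,
	CRC_KEPT = 1,
};

/* Resolve the transitional semantics of the two Rx offload flags. */
static enum crc_mode
crc_mode_from_offloads(uint64_t offloads)
{
	int strip = (offloads & EX_OFFLOAD_CRC_STRIP) != 0;
	int keep = (offloads & EX_OFFLOAD_KEEP_CRC) != 0;

	if (strip && keep)
		return CRC_INVALID;
	if (strip)
		return CRC_STRIPPED;
	return CRC_KEPT; /* KEEP_CRC only, or neither flag */
}
```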

A helper function rte_eth_dev_is_keep_crc() has been added to be able to
change the no flag behavior with minimal changes in PMDs.

The PMDs that don't report the DEV_RX_OFFLOAD_KEEP_CRC offload can
remove rte_eth_dev_is_keep_crc() checks in the next release; the related
code is commented to help the maintenance task.

DEV_RX_OFFLOAD_CRC_STRIP has also been added to virtual drivers since
they don't use CRC at all. When an application requests this offload,
virtual PMDs should not return an error.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-03 01:35:58 +02:00
Adrien Mazarguil
f264c7980c net/mlx5: fix invalid error check
Since its return type is unsigned, if_nametoindex() returns 0 in case of
error, never -1.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
f9aaa6ac44 net/mlx5: increase number of strides
If WQE ID is used in CQE for Multi-Packet RQ, the ratio of CQE compression
drops a little bit.  In order to reach 100Gbps with 64B traffic, it is
needed to further save PCIe bandwidth by increasing the number of strides
in a WQE. It is now 64 by default but adjustable by a PMD parameter -
mprq_log_stride_num.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
1787eb7b4d net/mlx5: use stride index in Rx completion entry
Multi-Packet Receive Queue is to receive multiple packets on a single large
buffer. The number of consumed strides in CQE is accumulated to keep track
of the current stride index. However, it is safer to directly use stride
index in CQE to avoid out-of-order situation which can possibly be caused
by introducing LRO in the future.

If Rx CQE compression is enabled, HW can be configured to store the stride
index in a mini-CQE but this will need newer version of library/driver.
Therefore, since this change, MPRQ is only supported with the newer
library/driver and Rx hash result is not supported if MPRQ is enabled along
with Rx CQE compression.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
5c0e2db619 net/mlx5: add warning message for Multi-Packet RQ
If Multi-Packet RQ is enabled but not supported by device or
kernel/library, print out a warning message.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
a49b7c75de net/mlx5: add new fields in Rx completion entry
Stride index is added to mlx5_mini_cqe8 structure and WQE ID is added to
mlx5_cqe structure.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
2e633f1f6d net/mlx5: change return value of Rx completion poll
mlx5_rx_poll_len() returns Rx hash result extracted from either mini CQE or
regular CQE. As mini CQE may not have the hash result if configured
otherwise, it shouldn't assume the first DWORD of mini CQE is always hash
result. mlx5_rx_poll_len() is changed to return pointer to the mini CQE if
compressed.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
e10245a13b net/mlx5: fix Rx buffer replenishment threshold
The threshold of buffer replenishment for vectorized Rx burst is a constant
value (64). If the size of Rx queue is comparatively small, device could
run out of buffers. For example, if the size of Rx queue is 128, buffers
are replenished only twice per wraparound. This can cause jitter in
receiving packets and the jitter can cause unnecessary retransmission for
TCP connections.
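
One way to make the threshold follow the queue size is sketched below. The policy used here, a quarter of the queue capped at 64, is an assumption made for illustration; the message above does not state the formula that was adopted.

```c
/* Cap matching the former constant threshold. */
#define EX_RPLNSH_THRESH_MAX 64u

/*
 * Hypothetical policy: replenish after a quarter of the queue has been
 * consumed, but never wait for more than 64 buffers.
 */
static unsigned int
rplnsh_thresh(unsigned int rxq_size)
{
	unsigned int quarter = rxq_size >> 2;

	return quarter < EX_RPLNSH_THRESH_MAX ? quarter : EX_RPLNSH_THRESH_MAX;
}
```

Under this assumed policy, a 128-entry queue is replenished every 32 buffers, four times per wraparound instead of twice.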

Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Fixes: 570acdb1da ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-03 01:35:58 +02:00
Shahaf Shuler
e46821e9fc net/mlx5: separate generic tunnel TSO from the standard one
Generic tunnel TSO depended on the capabilities of the regular one to
be enabled.

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:58 +02:00
Yongseok Koh
5cffc8b28d net/mlx5: fix error number handling
rte_errno should be saved only if an error has occurred, because rte_errno
could otherwise hold a garbage value.
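
The pattern can be sketched with plain errno standing in for rte_errno. All function names are hypothetical; the point is that the error number is captured only on the failure path and restored after cleanup.

```c
#include <errno.h>

/* Hypothetical step that fails with EINVAL on request. */
static int
step(int fail)
{
	if (fail) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Hypothetical cleanup that clobbers errno as a side effect. */
static void
cleanup(void)
{
	errno = EBUSY;
}

/* Returns 0 on success, a negative errno value otherwise. */
static int
setup(int fail)
{
	int ret = step(fail);
	int err = 0;

	if (ret != 0)
		err = errno; /* save only when an error actually occurred */
	cleanup();
	if (ret != 0) {
		errno = err; /* restore the original error */
		return -err;
	}
	return 0;
}
```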

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:57 +02:00
Nelio Laranjeiro
c44fbc7cc2 net/mlx5: clean-up developer logs
Split maintainers logs from user logs.

A lot of debug logs are present providing internal information on how
the PMD works to users.  Such logs should not be available for them and
thus should remain available only when the PMD is compiled in debug
mode.

This commit removes some useless debug logs, moves the maintainers' ones
under DEBUG and also moves dumps into debug mode only.

Cc: stable@dpdk.org

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-07-03 01:35:57 +02:00
Stephen Hemminger
3d96644aa3 net/mlx5: fix log initialization
The mlx5 driver had two init functions, but this could
cause log initialization to be done after the
other initialization. Also, the name of the function does
not match convention (cut/paste error?).

Fix by initializing log type first at start of the pmd_init.
This also gets rid of having two constructor functions.

Fixes: a170a30d22 ("net/mlx5: use dynamic logging")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-06-17 10:17:53 +02:00
Xueming Li
a9fc0b0ef0 net/mlx5: fix crash in device probe
This patch initializes counter descriptor struct before invoking Verbs
api to avoid segmentation fault.

Fixes: 9a761de8ea ("net/mlx5: flow counter support")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
93068a9d5a net/mlx5: fix error message in probe function
Error values passed to strerror() must be positive.
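
Since the driver standardized on negative errno return values, the sign has to be flipped before the value reaches strerror(). A minimal sketch with a hypothetical helper name:

```c
#include <errno.h>
#include <string.h>

/*
 * Describe a return code that follows the "negative errno" convention.
 * strerror() expects a positive error number.
 */
static const char *
probe_error_string(int ret)
{
	return strerror(ret < 0 ? -ret : ret);
}
```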

Fixes: 012ad9944d ("net/mlx5: fix probe return value polarity")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
c6ce7e34ad net/mlx5: fix missing errno in probe function
Fixes: b43802b4bd ("net/mlx5: support 16 hardware priorities")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
8c3c2372ed net/mlx5: fix errno object in probe function
Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
Adrien Mazarguil
c93adccc97 net/mlx5: remove limitation on number of instances
This artificial limitation was inherited from the mlx4 code base and has no
purpose other than adding unnecessary noise.

This patch is a port of commit f2318196c7 ("net/mlx4: remove limitation
on number of instances").

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-06-17 10:04:48 +02:00
David Marchand
44b1d513d5 net/mlx5: register memory callback only when probing
The callback should be invoked only for memory that has been registered
in a device, hence, no need to track cleanup events if no device is
present.

Bugzilla ID: 56
Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-05-30 21:16:43 +02:00
Xueming Li
0ace586dee net/mlx5: fix memory region cache init
MR cache init takes place on the device configuration.
When the device is re-configured multiple times, for example when
changing the number of queues on the fly, a deadlock can happen.

This patch moves MR cache init from the device configuration function to
the probe function to make sure it is initialized only once.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 16:28:43 +02:00
Adrien Mazarguil
e89c15b697 net/mlx5: fix crash when configure is not called
Although uncommon, applications may destroy a device immediately after
probing it without going through dev_configure() first.

This patch addresses a crash which occurs when mlx5_dev_close() calls
mlx5_mr_release() due to an uninitialized entry in the private structure.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-28 07:50:38 +02:00
Shahaf Shuler
1aa88b5bd9 net/mlx5: fix generic tunnel offload compatibility check
On some distros, the inbox rdma-core tree can contain the Software
Parser enum while the remaining structs are still missing.

Fixes: 5f8ba81c42 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-25 17:07:40 +02:00
Yongseok Koh
52056a99c6 net/mlx5: fix SW parser offset
This is to fix the offloads introduced by commits
5f8ba81 net/mlx5: support generic tunnel offloading
5355f44 ethdev: introduce generic IP/UDP tunnel checksum and TSO

Fixes: 8589e944d0 ("net/mlx5: fix setting offsets for SW parser")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-25 17:07:40 +02:00
Yongseok Koh
8f6d9e13a9 net/mlx5: remove redundant checks
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
2018-05-23 00:35:01 +02:00
Yongseok Koh
8589e944d0 net/mlx5: fix setting offsets for SW parser
Since ConnectX-5, SW parser just complements HW parser. SW parser starts to
engage only if HW parser can't reach a header. For the older devices, HW
parser will not kick in if any of SWP offsets is set. Therefore, all of the
L3 offsets should be set regardless of HW offload. As IPv6 doesn't have
header checksum, the mbuf can't have PKT_TX_[OUTER_]IP_CKSUM if outer or
inner L3 is IPv6.

And if inner packet isn't IP, the inner offsets shouldn't be set.

Fixes: 5f8ba81c42 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
2018-05-23 00:35:01 +02:00
David Marchand
5fd4d04969 net/mlx5: fix count in xstats
With the commit af4f09f282 ("net/mlx5: prefix all functions with mlx5"),
mlx5_xstats_get() is no longer compliant with the API.
It always returns the caller max entries count while it should return how
many entries it wrote/wanted to write.
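
The return convention described above can be sketched as follows. This is a
minimal stand-alone illustration, not the driver code: the function name
xstats_get_sketch and its parameters are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the return convention: "avail" counters exist and
 * the caller provides room for "n" entries in "out".
 * Entries are written only when the table is large enough. The return value
 * is the number of entries written or wanted, never simply "n".
 */
int
xstats_get_sketch(uint64_t *out, unsigned int n,
		  const uint64_t *counters, unsigned int avail)
{
	unsigned int i;

	if (out != NULL && n >= avail)
		for (i = 0; i < avail; ++i)
			out[i] = counters[i];
	return (int)avail;
}
```

A caller can pass a NULL table with n = 0 to learn the required size, then
call again with a table of that size.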

Fixes: af4f09f282 ("net/mlx5: prefix all functions with mlx5")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-23 00:35:01 +02:00
Gavin Hu
74572f23cd net/mlx5: fix build with clang on ARM
This patch adds a pair of "()" to embrace the argument
input to the function-like macro invocation.

drivers/net/mlx5/mlx5_rxtx_vec.c:37:
drivers/net/mlx5/mlx5_rxtx_vec_neon.h:170:24: error: too many arguments
provided to function-like macro invocation
	(uint16x8_t) { 0, 0, cs_flags, rte_cpu_to_be_16(len),

Fixes: 570acdb1da ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Sirshak Das <sirshak.das@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-21 00:55:39 +02:00
Shahaf Shuler
607fc8e4a9 net/mlx5: fix default RSS level
Using inner RSS by default for GRE leads to memory corruption as the
extra flow items added for the inner RSS are not counted in the flow
attributes buffer size.

Fix this by enforcing the default RSS level to be outer. This greatly
simplifies the flow engine and makes it more robust.
Future optimization for out-of-the-box RSS can be done in subsequent
commits.

Fixes: d4a405186b ("net/mlx5: support tunnel RSS level")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 19:06:29 +02:00
Shahaf Shuler
34511c25d5 net/mlx5: fix build without tunnel RSS support
IBV_RX_HASH_INNER should be referenced only when having tunnel support
in the Verbs headers.

Fixes: 80f2d0ed7f ("net/mlx5: add hardware flow debug dump")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-17 12:31:42 +02:00
Matan Azrad
1f106da2bf net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
Add support for MPLS over GRE and MPLS over UDP tunnel types as
described in the following RFCs:
1. https://tools.ietf.org/html/rfc4023
2. https://tools.ietf.org/html/rfc7510
3. https://tools.ietf.org/html/rfc4385

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
dd3331c6f1 net/mlx5: add Bluefield device id
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-17 12:31:42 +02:00
Shahaf Shuler
8fe576ad17 net/mlx5: fix flow director drop rule deletion crash
Drop flow rules are created on the ETH queue even though the parser layer
matches the flow rule layer (L3/L4)

Fixes: 6f2f4948b2 ("net/mlx5: fix flow director rule deletion crash")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-05-17 12:31:42 +02:00
Andy Green
f11a4a7d8a net/mlx5: fix uninitialized variable in probing
Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")

Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-15 22:29:22 +02:00
Yongseok Koh
c9ec2192ff net/mlx5: use correct field in a union structure
This is not a bug but it is better to use semantically correct field.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:23 +01:00
Yongseok Koh
0cfdc1808d net/mlx5: use coherent I/O memory barrier
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:22 +01:00
Yongseok Koh
5f44cfd011 net/mlx5: fix inlining segmented TSO packet
When a multi-segmented packet is inlined, data can be further inlined even
after the first segment. In case of TSO packet, extra inline data after TSO
header should be carried by an inline DSEG which has 4B inline header
recording the length of the inline data. If more than one segment is
inlined, the length doesn't count from the second segment. This will cause
a fault in HW and CQE will have an error, which is ignored by PMD.

Fixes: f895536be4 ("net/mlx5: enable inlining data from multiple segments")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:32:22 +01:00
Thomas Monjalon
fbe90cdd77 ethdev: add probing finish function
A new hook function is added and called inside the PMDs at the end
of the device probing:
	- in primary process, after allocating, init and config
	- in secondary process, after attaching and local init

This new function is almost empty for now.
It will be used later to add some post-initialization processing.

For the PMDs calling the helpers rte_eth_dev_create() or
rte_eth_dev_pci_generic_probe(), the hook rte_eth_dev_probing_finish()
is called from here, and not in the PMD itself.

Note that the helper rte_eth_dev_create() could be used more,
especially for vdevs, avoiding some code duplication in PMDs.

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-05-14 22:31:53 +01:00
Yongseok Koh
7d6bf6b866 net/mlx5: add Multi-Packet Rx support
Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per a packet, one large buffer is posted in order to
receive multiple packets on the buffer. A MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.

Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
18bee13096 net/mlx5: add a function to rdma-core glue
mlx5dv_create_wq() is added for the Multi-Packet RQ (a.k.a Striding RQ).

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
3e1f82a1f1 net/mlx5: separate filling Rx flags
Filling in fields of mbuf becomes a separate inline function so that this
can be reused.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:52 +01:00
Yongseok Koh
974f1e7ef1 net/mlx5: add new memory region support
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().
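
The L0/L1 stage described above can be sketched roughly as below. The struct
layout, the array size and every name ending in _sketch are assumptions made
for illustration. Only the idea of checking the most recently used entry
first and then searching a fixed-size array linearly comes from the text.

```c
#include <stdint.h>

#define MR_CACHE_N 8        /* assumed size of the fixed-size L1 array */
#define MR_MISS UINT32_MAX  /* assumed "not found" marker */

struct mr_entry {
	uintptr_t start; /* first address covered by the entry */
	uintptr_t end;   /* one past the last address covered */
	uint32_t lkey;   /* key returned on a hit */
};

struct mr_ctrl_sketch {
	uint16_t mru;                      /* index of the last-hit entry (L0) */
	uint16_t len;                      /* number of valid entries */
	struct mr_entry cache[MR_CACHE_N]; /* linearly searched array (L1) */
};

uint32_t
mr_lookup_cache_sketch(struct mr_ctrl_sketch *ctrl, uintptr_t addr)
{
	const struct mr_entry *e;
	uint16_t i;

	/* L0: check the most recently used entry first. */
	if (ctrl->mru < ctrl->len) {
		e = &ctrl->cache[ctrl->mru];
		if (addr >= e->start && addr < e->end)
			return e->lkey;
	}
	/* L1: linear search through the fixed-size array. */
	for (i = 0; i < ctrl->len; ++i) {
		e = &ctrl->cache[i];
		if (addr >= e->start && addr < e->end) {
			ctrl->mru = i; /* remember the hit for the next L0 check */
			return e->lkey;
		}
	}
	/* Miss: the caller would fall back to the slower L2 lookup. */
	return MR_MISS;
}
```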

If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.

If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.

If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even if there are many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be very helpful to
reduce misses on per-queue caches anyway.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
d561b5dc13 net/mlx5: remove memory region support
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Wei Dai
a4996bd89c ethdev: new Rx/Tx offloads API
This patch checks whether a requested offload is valid.
Any requested offload must be supported in the device capabilities.
Any offload is disabled by default if it is not set in the parameter
dev_conf->[rt]xmode.offloads to rte_eth_dev_configure() and
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If an offload is enabled in rte_eth_dev_configure() by the application,
it is enabled on all queues no matter whether it is of per-queue or
per-port type and no matter whether it is set or cleared in
[rt]x_conf->offloads to rte_eth_[rt]x_queue_setup().
If a per-queue offload hasn't been enabled in rte_eth_dev_configure(),
it can be enabled or disabled for an individual queue in
rte_eth_[rt]x_queue_setup().
A newly added offload is one which hasn't been enabled in
rte_eth_dev_configure() and is requested to be enabled in
rte_eth_[rt]x_queue_setup(). It must be of per-queue type,
otherwise an error log is triggered.
The underlying PMD must be aware that the offloads passed to the
PMD-specific queue_setup() function only carry those newly added
offloads of per-queue type.

This patch performs the above checks in a common way in the rte_ethdev
layer to avoid duplicating them in each underlying PMD.

This patch assumes that all PMDs in 18.05-rc2 have already been
converted to the offload API defined in 17.11. It also assumes
that all PMDs can return correct offload capabilities
in rte_eth_dev_infos_get().

At the beginning of [rt]x_queue_setup() of the underlying PMD,
add offloads = [rt]xconf->offloads |
dev->data->dev_conf.[rt]xmode.offloads; to keep the same behavior as the
offload API defined in 17.11 and avoid breaking applications due to the
offload API change.
A PMD can rely on the fact that the input [rt]xconf->offloads only carries
the newly added per-queue offloads to do some optimization or some
code change on top of this patch.
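
The rules above can be condensed into a small bitmask sketch. It is an
illustration only: the function names and the plain uint64_t parameters are
assumptions, and the real checks live in the rte_ethdev layer with its own
structures and log messages.

```c
#include <stdint.h>

/*
 * Hypothetical validation helper.
 * Returns 0 when the request is acceptable, -1 otherwise.
 */
int
check_queue_offloads(uint64_t port_offloads, uint64_t queue_offloads,
		     uint64_t port_capa, uint64_t queue_capa)
{
	uint64_t added;

	/* Port-level offloads must all be supported by the device. */
	if (port_offloads & ~port_capa)
		return -1;
	/* Offloads requested for the queue but not enabled on the port. */
	added = queue_offloads & ~port_offloads;
	/* Newly added offloads must be of per-queue type. */
	if (added & ~queue_capa)
		return -1;
	return 0;
}

/* Effective offloads as computed at the start of a PMD queue setup. */
uint64_t
effective_offloads(uint64_t port_offloads, uint64_t queue_offloads)
{
	return port_offloads | queue_offloads;
}
```

With port_offloads = 0x1 and a queue request of 0x2 that is within the
per-queue capabilities, the check passes and the queue runs with 0x3.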

Signed-off-by: Wei Dai <wei.dai@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
df428ceef4 net/mlx5: change device reference for secondary process
rte_eth_devices[] is not shared between primary and secondary process, but
a static array to each process. The reverse pointer of device (priv->dev)
is invalid. Instead, priv has the pointer to shared data of the device,
  struct rte_eth_dev_data *dev_data;

Two macros are added,
  #define PORT_ID(priv) ((priv)->dev_data->port_id)
  #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
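
A self-contained sketch of how the two macros resolve a device is given
below. The struct definitions and the array size are simplified stand-ins
written for this example, since the real DPDK types carry many more fields.
Only the two macros are taken from the text above.

```c
#include <stdint.h>

/* Simplified stand-ins for the DPDK types (assumptions for this sketch). */
struct rte_eth_dev_data {
	uint16_t port_id;
};

struct rte_eth_dev {
	struct rte_eth_dev_data *data;
};

struct priv {
	struct rte_eth_dev_data *dev_data; /* pointer to shared device data */
};

#define MAX_PORTS 4 /* assumed table size for this sketch */

/* Per-process array: its address differs between processes. */
struct rte_eth_dev rte_eth_devices[MAX_PORTS];

/* Macros as given in the commit message. */
#define PORT_ID(priv) ((priv)->dev_data->port_id)
#define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
```

Because the port ID is read from shared data and the array is indexed
locally, each process ends up with a pointer that is valid in its own
address space.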

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:51 +01:00
Yongseok Koh
95d7e115be net/mlx5: fix calculation of Tx TSO inline room size
rdma-core doesn't add up max_tso_header size to max_inline_data size. The
library takes bigger value between the two.
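
The difference can be written as a one-line helper. This is only a sketch of
the arithmetic stated above, with a hypothetical function name. It is not
the code of the fix itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the room is the larger of the two sizes,
 * not their sum.
 */
uint32_t
tso_inline_room_sketch(uint32_t max_inline_data, uint32_t max_tso_header)
{
	return max_inline_data > max_tso_header ?
	       max_inline_data : max_tso_header;
}
```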

Fixes: 43e9d9794c ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-05-14 22:31:50 +01:00
Raslan Darawsheh
690de2850b net/mlx5: fix resource leak in case of error
If something goes wrong in mlx5_pci_probe, the allocated eth dev
will cause a memory leak.

This commit releases the eth dev that was previously allocated.

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:50 +01:00
Raslan Darawsheh
e9f4166014 net/mlx5: fix double free on error handling
When attr_ctx is NULL it will attempt to free the list of devices twice.
Avoid double freeing the list by directly going to error handling.

Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:49 +01:00
Xueming Li
32d4246c90 net/mlx5: fix SW parser enabling
Fixes: 5f8ba81c42 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:49 +01:00
Xueming Li
5afda2c6ac net/mlx5: fix SW parsing feature detection
Fixes: 5f8ba81c42 ("net/mlx5: support generic tunnel offloading")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-05-14 22:31:49 +01:00
Nélio Laranjeiro
b7a7c97a40 net/mlx5: fix flow validation
Item spec and last are wrongly compared to the NIC capability causing a
validation failure when the mask is null.
This validation function should only verify the user is not configuring
unsupported matching fields.

Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14 22:31:48 +01:00
Shahaf Shuler
012ad9944d net/mlx5: fix probe return value polarity
mlx5 prefixed functions return a negative errno value.
The error handler in mlx5_pci_probe does the same.

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14 22:31:48 +01:00
Shahaf Shuler
eac9cd58de net/mlx5: fix socket connection return value
Upon success, mlx5_socket_connect should return the fd descriptor of the
primary process

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14 22:31:48 +01:00
Shahaf Shuler
d11d651f6d net/mlx5: add Rx and Tx tuning parameters
A new ethdev API was exposed by
commit 3be82f5cc5 ("ethdev: support PMD-tuned Tx/Rx parameters")

This enables the PMD to provide default parameters when there is no strict
request from the application, in order to improve the out-of-the-box
experience.

While the current API lacks the means for the PMD to provide the best
possible value, the PMD provides the best default it can guess.
The values are based on the Mellanox performance report and depend on the
underlying NIC capabilities.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14 22:31:48 +01:00
Shahaf Shuler
7d2e32f76c net/mlx5: fix ethtool link setting call order
According to the ethtool_link_setting API recommendation,
ETHTOOL_GLINKSETTINGS should be called before ETHTOOL_GSET as the latter
is deprecated.

Fixes: f47ba80080 ("net/mlx5: remove kernel version check")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-14 22:31:48 +01:00
Adrien Mazarguil
6f2f4948b2 net/mlx5: fix flow director rule deletion crash
Flow director rules matching traffic properties above layer 2 do not
target a fixed hash Rx queue (HASH_RXQ_ETH), it actually depends on the
highest protocol layer specified by each flow rule.

mlx5_fdir_filter_delete() makes this wrong assumption and causes a crash
when attempting to destroy flow rules with L3/L4 specifications.

Fixes: 4c3e9bcdd5 ("net/mlx5: support flow director")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-05-02 19:28:48 +02:00
Declan Doherty
fb8fd96d42 ethdev: add shared counter to flow API
Add rte_flow_action_count action data structure to enable shared
counters across multiple flows on a single port or across multiple
flows on multiple ports within the same switch domain. Also this enables
multiple count actions to be specified in a single flow action.

This patch also modifies the existing rte_flow_query API to take the
rte_flow_action structure as an input parameter instead of the
rte_flow_action_type enumeration to allow querying a specific action
from a flow rule when multiple actions of the same type are specified.

This patch also contains updates for the bonding, failsafe and mlx5 PMDs
and testpmd application which are affected by this API change.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
2018-04-27 18:00:57 +01:00
Xueming Li
bd315baecf net/mlx5: allow flow tunnel ID 0 with outer pattern
A tunnel pattern without a tunnel ID could match any non-tunneled packet;
this patch allows a tunnel pattern without a tunnel ID after a proper
outer spec.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
05dda761bd net/mlx5: introduce VXLAN-GPE tunnel type
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
80f2d0ed7f net/mlx5: add hardware flow debug dump
Dump verb flow detail including flow spec type and size for debugging
purpose.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
d4a405186b net/mlx5: support tunnel RSS level
Tunnel RSS level of flow RSS action offers user a choice to do RSS hash
calculation on inner or outer RSS fields. Testpmd flow command examples:

GRE flow inner RSS:
  flow create 0 ingress pattern eth / ipv4 proto is 47 / gre / end
actions rss queues 1 2 end level 1 / end

GRE tunnel flow outer RSS:
  flow create 0 ingress pattern eth  / ipv4 proto is 47 / gre / end
actions rss queues 1 2 end level 0 / end

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
8486125353 net/mlx5: split flow RSS handling logic
This patch splits out the flow RSS hash field handling logic into a
dedicated function.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
6ba07449ed net/mlx5: cleanup tunnel checksum offloads
Once a tunnel packet type (RTE_PTYPE_TUNNEL_xxx) is identified,
PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent the checksum result
of the inner headers; the outer L3 and L4 header checksums are always valid
as soon as a tunnel is identified. If no tunnel is identified,
PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent the checksum result
of the outer L3 and L4 headers.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
3cc08bc6dd net/mlx5: support Rx tunnel type identification
This patch introduces tunnel type identification based on flow rules.
If flows of multiple tunnel types are built on the same queue, no tunnel
type will be returned. A user application could use bits in the flow mark
as a tunnel type identifier.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
78a54648ff net/mlx5: support L3 VXLAN flow
This patch supports L3 VXLAN, which has no inner L2 header compared to the
standard VXLAN protocol. L3 VXLAN uses a specific overlay UDP destination
port to discriminate against standard VXLAN; device parameters and FW have
to be configured to support it:
  sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_EN=1
  sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_PORT=<port>

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
96c6c65a10 net/mlx5: support GRE tunnel flow
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Xueming Li
b43802b4bd net/mlx5: support 16 hardware priorities
This patch supports the new 16 Verbs flow priorities by trying to create a
simple flow of priority 15. If 16 priorities are not available, it falls
back to the traditional 8 priorities.

Verb priority mapping:
			8 priorities	>=16 priorities
Control flow:		4-7		8-15
User normal flow:	1-3		4-7
User tunnel flow:	0-2		0-3
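
The mapping table above can be expressed as a small lookup. The enum, the
struct and the function name below are hypothetical and chosen only for this
sketch. The numeric ranges are copied from the table.

```c
/* Hypothetical flow classes matching the three rows of the table. */
enum flow_class {
	FLOW_CONTROL,
	FLOW_USER_NORMAL,
	FLOW_USER_TUNNEL,
};

struct prio_range {
	unsigned int min; /* lowest Verbs priority value of the range */
	unsigned int max; /* highest Verbs priority value of the range */
};

/* Returns the Verbs priority range for a flow class. */
struct prio_range
verbs_prio_range(enum flow_class cls, unsigned int n_prio)
{
	static const struct prio_range map[2][3] = {
		/* 8 priorities: control, user normal, user tunnel */
		{ { 4, 7 }, { 1, 3 }, { 0, 2 } },
		/* 16 or more priorities */
		{ { 8, 15 }, { 4, 7 }, { 0, 3 } },
	};

	return map[n_prio >= 16][cls];
}
```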

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:56 +01:00
Adrien Mazarguil
76e9a55b5b ethdev: add transfer attribute to flow API
This new attribute enables applications to create flow rules that do not
simply match traffic whose origin is specified in the pattern (e.g. some
non-default physical port or VF), but actively affect it by applying the
flow rule at the lowest possible level in the underlying device.

It breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_validate()

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-27 18:00:54 +01:00
Adrien Mazarguil
e58638c324 ethdev: fix TPID handling in flow API
TPID handling in rte_flow VLAN and E_TAG pattern item definitions is not
consistent with the normal stacking order of pattern items, which is
confusing to applications.

Problem is that when followed by one of these layers, the EtherType field
of the preceding layer keeps its "inner" definition, and the "outer" TPID
is provided by the subsequent layer, the reverse of how a packet looks
on the wire:

 Wire:     [ ETH TPID = A | VLAN EtherType = B | B DATA ]
 rte_flow: [ ETH EtherType = B | VLAN TPID = A | B DATA ]

Worse, when QinQ is involved, the stacking order of VLAN layers is
unspecified. It is unclear whether it should be reversed (innermost to
outermost) as well given TPID applies to the previous layer:

 Wire:       [ ETH TPID = A | VLAN TPID = B | VLAN EtherType = C | C DATA ]
 rte_flow 1: [ ETH EtherType = C | VLAN TPID = B | VLAN TPID = A | C DATA ]
 rte_flow 2: [ ETH EtherType = C | VLAN TPID = A | VLAN TPID = B | C DATA ]

While specifying EtherType/TPID is hopefully rarely necessary, the stacking
order in case of QinQ and the lack of documentation remain an issue.

This patch replaces TPID in the VLAN pattern item with an inner
EtherType/TPID as is usually done everywhere else (e.g. struct vlan_hdr),
clarifies documentation and updates all relevant code.

It breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()

Summary of changes for PMDs that implement ETH, VLAN or E_TAG pattern
items:

- bnxt: EtherType matching is supported with and without VLAN, but TPID
  matching is not and triggers an error.

- e1000: EtherType matching is only supported with the ETHERTYPE filter,
  which does not support VLAN matching, therefore no impact.

- enic: same as bnxt.

- i40e: same as bnxt with existing FDIR limitations on allowed EtherType
  values. The remaining filter types (VXLAN, NVGRE, QINQ) do not support
  EtherType matching.

- ixgbe: same as e1000, with additional minor change to rely on the new
  E-Tag macro definition.

- mlx4: EtherType/TPID matching is not supported, no impact.

- mlx5: same as bnxt.

- mvpp2: same as bnxt.

- sfc: same as bnxt.

- tap: same as bnxt.

Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Fixes: 99e7003831 ("net/ixgbe: parse L2 tunnel filter")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-27 18:00:54 +01:00
Adrien Mazarguil
18aee2861a ethdev: add encap level to RSS flow API action
RSS hash types (ETH_RSS_* macros defined in rte_ethdev.h) describe the
protocol header fields of a packet that must be taken into account while
computing RSS.

When facing encapsulated (e.g. tunneled) packets, there is an ambiguity as
to whether these should apply to inner or outer packets. Applications need
the ability to tell exactly "where" RSS must be performed.

This is addressed by adding encapsulation level information to the RSS flow
action. Its default value is 0 and stands for the usual unspecified
behavior. Other values provide a specific encapsulation level.

Contrary to the change announced by commit 676b605182 ("doc: announce
ethdev API change for RSS configuration"), this patch does not affect
struct rte_eth_rss_conf but struct rte_flow_action_rss as the former is not
used anymore by the RSS flow action. ABI impact is therefore limited to
rte_flow.

This breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-27 18:00:54 +01:00
Adrien Mazarguil
929e331934 ethdev: add hash function to RSS flow API action
By definition, RSS involves some kind of hash algorithm, usually Toeplitz.

Until now it could not be modified on a flow rule basis and PMDs had to
always assume RTE_ETH_HASH_FUNCTION_DEFAULT, which remains the default
behavior when unspecified (0).

This breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-27 18:00:54 +01:00
Adrien Mazarguil
ac8d22de23 ethdev: flatten RSS configuration in flow API
Since its inception, the rte_flow RSS action has been relying in part on
external struct rte_eth_rss_conf for compatibility with the legacy RSS API.
This structure lacks parameters such as the hash algorithm to use, and more
recently, a method to tell which layer RSS should be performed on [1].

Given struct rte_eth_rss_conf will never be flexible enough to represent a
complete RSS configuration (e.g. RETA table), this patch supersedes it by
extending the rte_flow RSS action directly.

A subsequent patch will add a field to use a non-default RSS hash
algorithm. To that end, a field named "types" replaces the field formerly
known as "rss_hf" and standing for "RSS hash functions" as it was
confusing. Actual RSS hash function types are defined by enum
rte_eth_hash_function.

This patch updates all PMDs and example applications accordingly.

It breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()

[1] commit 676b605182 ("doc: announce ethdev API change for RSS
    configuration")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-27 18:00:53 +01:00
Adrien Mazarguil
19b3bc47c6 ethdev: fix C99 flexible arrays from flow API
This patch replaces C99-style flexible arrays in struct rte_flow_action_rss
and struct rte_flow_item_raw with standard pointers to the same data.

They proved difficult to use in the field (e.g. no possibility of static
initialization) and unsuitable for C++ applications.
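
The static initialization problem can be sketched as follows. This is a minimal illustration, not the actual rte_flow definitions: struct rss_flex, struct rss_ptr and every example_* name are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old layout: trailing C99 flexible array.
 * Standard C cannot statically initialize the trailing array, so users
 * have to allocate the structure and its storage together at run time. */
struct rss_flex {
	uint16_t num;
	uint16_t queue[];
};

/* Hypothetical stand-in for the new layout: pointer to the same data. */
struct rss_ptr {
	uint16_t num;
	const uint16_t *queue;
};

/* With a pointer, both the queue list and the action can be static. */
static const uint16_t example_queues[] = { 0, 1, 2, 3 };

static const struct rss_ptr example_rss = {
	.num = sizeof(example_queues) / sizeof(example_queues[0]),
	.queue = example_queues,
};

/* Sum the queue indices, only to show the data is reachable. */
static unsigned int
example_queue_sum(const struct rss_ptr *rss)
{
	unsigned int sum = 0;

	for (uint16_t i = 0; i != rss->num; ++i)
		sum += rss->queue[i];
	return sum;
}
```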

Affected PMDs and examples are updated accordingly.

This breaks ABI compatibility for the following public functions:

- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()

Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 18:00:53 +01:00
Adrien Mazarguil
cc17feb904 ethdev: alter behavior of flow API actions
This patch makes the following changes to flow rule actions:

- List order now matters: actions are redefined as performed first to last
  instead of "all simultaneously".

- Repeated actions are now supported (e.g. specifying QUEUE multiple times
  now duplicates traffic among them). Previously only the last action of
  any given kind was taken into account.

- No more distinction between terminating/non-terminating/meta actions.
  Flow rules themselves are now defined as always terminating unless a
  PASSTHRU action is specified.

These changes alter the behavior of flow rules in corner cases in order to
prepare the flow API for actions that modify traffic contents or properties
(e.g. encapsulation, compression) and for which order matters when combined.

Previously one would have to do so through multiple flow rules by combining
PASSTHRU with priority levels, however this proved overly complex to
implement at the PMD level, hence this simpler approach.

This breaks ABI compatibility for the following public functions:

- rte_flow_create()
- rte_flow_validate()

PMDs with rte_flow support are modified accordingly:

- bnxt: no change, implementation already forbids multiple actions and does
  not support PASSTHRU.

- e1000: no change, same as bnxt.

- enic: modified to forbid redundant actions, no support for default drop.

- failsafe: no change needed.

- i40e: no change, implementation already forbids multiple actions.

- ixgbe: same as i40e.

- mlx4: modified to forbid multiple fate-deciding actions and drop when
  unspecified.

- mlx5: same as mlx4, with other redundant actions also forbidden.

- sfc: same as mlx4.

- tap: implementation already complies with the new behavior except for
  the default pass-through modified as a default drop.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-04-27 18:00:53 +01:00
Xueming Li
3d140329ca net/mlx5: allow max 192B TSO inline header length
Change max inline header length to 192B to allow IPv6 VXLAN TSO headers
and headers with options that exceed 128B.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-27 17:34:43 +01:00
Xueming Li
5f8ba81c42 net/mlx5: support generic tunnel offloading
This commit adds support for generic tunnel TSO and checksum offload.
PMD will compute the inner/outer headers offset according to the
mbuf fields. Hardware will do calculation based on offsets and types.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-27 17:34:43 +01:00
Xueming Li
593f472c40 net/mlx5: separate TSO function in Tx data path
Separate TSO function to make logic of mlx5_tx_burst clear.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-27 17:34:43 +01:00
Nélio Laranjeiro
e0586a8d1e net/mlx5: implement multicast add list devop
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 17:34:43 +01:00
Nélio Laranjeiro
18c01b98b5 net/mlx5: split MAC address add/remove code
Move the code that adds/removes MAC addresses out of the DPDK callbacks
and into internal functions.  This modification is necessary to implement
the devop set_mc_addr_list.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 17:34:43 +01:00
Nélio Laranjeiro
fa80b3c9ed net/mlx5: add more checks on MAC addresses
Verify the MAC address before further processing.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 17:34:43 +01:00
Nélio Laranjeiro
3c0db1ab51 net/mlx5: fix flow director mask
During the transition to resurrect flow director on top of rte_flow, mask
handling was removed by mistake.

Fixes: 4c3e9bcdd5 ("net/mlx5: support flow director")
Cc: stable@dpdk.org

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-27 17:34:41 +01:00
Nélio Laranjeiro
ca42b8a8b7 net/mlx5: split L3/L4 in flow director
This will help to bring back the mask handler which was removed when this
feature was rewritten on top of rte_flow.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-27 17:34:40 +01:00
Xueming Li
2323cc3c2e net/mlx5: fix invalid flow item check
This patch fixes the invalid flow item check.

Fixes: a6d83b6a92 ("net/mlx5: standardize on negative errno values")

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 17:34:00 +01:00
Yongseok Koh
a2ceae5940 net/mlx5: fix alignment of memory region
The memory region is [start, end), so if the memseg of 'end' isn't
allocated yet, the returned memseg will have zero entries and this will
make 'end' zero (nil).

Fixes: 718e35999c ("net/mlx5: use virt2memseg instead of iteration")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-27 15:54:56 +01:00
Ferruh Yigit
3fef0822ec drivers/net: update link status
Update link status related feature document items and minor updates in
some link status related functions.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-27 15:54:56 +01:00
Adrien Mazarguil
e68744e53e net/mlx5: fix RSS flow action bounds check
The number of queues provided by the application is not checked against
parser's supported maximum.
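
A minimal sketch of such a bounds check, assuming a hypothetical limit EXAMPLE_PARSER_MAX_QUEUES and a hypothetical helper name (neither is taken from the mlx5 sources). It returns a negative errno value on failure:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capacity of the parser's queue array. */
#define EXAMPLE_PARSER_MAX_QUEUES 16

/* Reject an RSS action whose queue count exceeds what the parser can store. */
static int
example_rss_check_queue_count(uint32_t queue_num)
{
	if (queue_num > EXAMPLE_PARSER_MAX_QUEUES)
		return -EINVAL;
	return 0;
}
```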

Fixes: 3d821d6fea ("net/mlx5: support RSS action flow rule")
Cc: stable@dpdk.org

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-04-27 15:54:56 +01:00
Olivier Matz
caccf8b318 ethdev: return diagnostic when setting MAC address
Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
return code is added to notify the caller (librte_ether) if an error
occurred in the PMD.

The new default MAC address is now copied in dev->data->mac_addrs[0]
only if the operation is successful.
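
The new behavior can be sketched roughly as below. All types and names are hypothetical simplifications rather than the ethdev definitions. The point is only that the stored default address changes after the driver callback reports success:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the ethdev structures. */
struct example_mac {
	uint8_t bytes[6];
};

struct example_dev {
	struct example_mac mac_addrs[1];
	/* Driver callback: 0 on success, negative errno on failure. */
	int (*mac_addr_set)(struct example_dev *dev,
			    const struct example_mac *addr);
};

/* Library-side wrapper: copy the address only if the driver accepted it. */
static int
example_default_mac_addr_set(struct example_dev *dev,
			     const struct example_mac *addr)
{
	int ret = dev->mac_addr_set(dev, addr);

	if (ret < 0)
		return ret;
	memcpy(&dev->mac_addrs[0], addr, sizeof(*addr));
	return 0;
}

/* Two stub drivers used to exercise both paths. */
static int
example_accept(struct example_dev *dev, const struct example_mac *addr)
{
	(void)dev;
	(void)addr;
	return 0;
}

static int
example_reject(struct example_dev *dev, const struct example_mac *addr)
{
	(void)dev;
	(void)addr;
	return -EIO;
}
```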

The patch also updates all the PMDs accordingly.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-14 00:43:30 +02:00
Shahaf Shuler
a85a606ca5 net/mlx5: fix link status initialization
Following commit 7ba5320baa ("net/mlx5: fix link status behavior"),
the initial link status is no longer set as part of the port start.

When LSC interrupts are enabled, the ethdev layer reads the link status
directly from the device data instead of using the PMD callback.
This may cause the application to see the link as down while in fact it
was already up before the DPDK application started (and no interrupt
arrives to fix it).

Fixes: 7ba5320baa ("net/mlx5: fix link status behavior")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-04-14 00:43:30 +02:00
Ferruh Yigit
cd8c7c7ce2 ethdev: replace bus specific struct with generic dev
Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
although it is common for all ethdev in all buses.

Replace the PCI-specific struct with the generic device struct and update
the places that use the PCI device so that they get this information from
the generic device.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-04-14 00:41:44 +02:00
Nélio Laranjeiro
db209cc32a net/mlx5: add parameter for Netlink support in VF
All Netlink requests the PMD makes can also be issued through the iproute2
command line interface, enabling VF behavior configuration without having
to modify the application or reaching PMD limits (e.g. MAC address number
limit).

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-14 00:41:44 +02:00
Nélio Laranjeiro
dd4bb90bc3 net/mlx5: use Netlink to enable promisc/allmulti mode
VF devices are not able to receive promisc or allmulti traffic unless they
fully request it through Netlink.  This will cause the request to be
processed by the PF which will handle the request and enable it.

This requires the VF to be trusted by the PF.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-14 00:41:44 +02:00
Nélio Laranjeiro
ccdcba53a3 net/mlx5: use Netlink to add/remove MAC addresses
VF devices are not able to receive traffic unless they fully request it
through Netlink.  This will cause the request to be processed by the PF
which will add/remove the MAC address to the VF table if the VF is trusted.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-14 00:41:44 +02:00
Yongseok Koh
f84411be9e net/mlx5: remove excessive data prefetch
In Enhanced Multi-Packet Send (eMPW), entire packet data is prefetched to
LLC if it isn't inlined. Even though this helps reduce jitter when HW
fetches data by DMA, this can thrash the LLC by evicting precious data.
And if the size of queue is large and there are many queues, this might not
be effective. Also, if the application runs on a remote node from the PCIe
link, it may not be helpful and can even cause bad results.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-14 00:40:21 +02:00
Bin Huang
0915e287a6 net/mlx5: add packet type index for TCP ack
According to CQE format:
- l4_hdr_type:
     0 - None
     1 - TCP header was present in the packet
     2 - UDP header was present in the packet
     3 - TCP header was present in the packet with Empty
         TCP ACK indication. (TCP packet <ACK> flag is set,
         and packet carries no data)
     4 - TCP header was present in the packet with TCP ACK indication.
         (TCP packet <ACK> flag is set, and packet carries data).

A packet should be identified as TCP packet if l4_hdr_type is 1, 3 or 4.
Add corresponding idx of TCP ACK to ptype table.
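
The classification rule above can be sketched as a small helper. The enum and function names are hypothetical. Only the numeric values come from the CQE format quoted above:

```c
#include <stdbool.h>
#include <stdint.h>

/* l4_hdr_type values as listed above; the identifiers are hypothetical. */
enum example_l4_hdr_type {
	EXAMPLE_L4_NONE = 0,
	EXAMPLE_L4_TCP = 1,
	EXAMPLE_L4_UDP = 2,
	EXAMPLE_L4_TCP_EMPTY_ACK = 3, /* ACK set, no payload */
	EXAMPLE_L4_TCP_ACK = 4,       /* ACK set, with payload */
};

/* A packet is TCP when l4_hdr_type is 1, 3 or 4. */
static bool
example_l4_is_tcp(uint8_t l4_hdr_type)
{
	switch (l4_hdr_type) {
	case EXAMPLE_L4_TCP:
	case EXAMPLE_L4_TCP_EMPTY_ACK:
	case EXAMPLE_L4_TCP_ACK:
		return true;
	default:
		return false;
	}
}
```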

Previous discussion:
https://www.mail-archive.com/users@dpdk.org/msg02980.html

Signed-off-by: Bin Huang <bin.huang@hxt-semitech.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-14 00:40:21 +02:00
Bruce Richardson
a11dfe9b65 net/mlx: fix warnings for unused compiler arguments
When linking the mlx glue code libraries using CC, the linker arguments in
LDFLAGS are not prefixed with -Wl. [The EXTRA_LDFLAGS are though.] This
leads to warning messages on build:

clang-5.0: warning: argument unused during compilation: '-e xport-dynamic'

Fix this by checking for $LINK_USING_CC in the Makefiles and prefixing the
LDFLAGS appropriately if set.

Fixes: 27cea11686 ("net/mlx4: spawn rdma-core dependency plug-in")
Fixes: 59b91bec12 ("net/mlx5: spawn rdma-core dependency plug-in")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-04-14 00:40:21 +02:00
Anatoly Burakov
66cc45e293 mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.
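
The "whichever is the smaller one" arithmetic can be sketched as below. This is not the EAL code: the function names are hypothetical, and the limits are passed as parameters because their values are not stated in this message.

```c
#include <stdint.h>

/* Smaller of "max_segs pages of page_sz bytes" and "max_mem_mb megabytes". */
static uint64_t
example_limit_bytes(uint64_t page_sz, uint64_t max_segs, uint64_t max_mem_mb)
{
	uint64_t by_segs = page_sz * max_segs;
	uint64_t by_mem = max_mem_mb << 20;

	return by_segs < by_mem ? by_segs : by_mem;
}

/* Number of lists needed to cover per_type bytes with per_list-byte lists. */
static uint64_t
example_list_count(uint64_t per_type, uint64_t per_list)
{
	return (per_type + per_list - 1) / per_list;
}
```

For instance, assuming 1G pages with a per-type limit of 32768 segments or 131072 MB and a per-list limit of 8192 segments or 32768 MB (assumed values, chosen to match the 128G and 32G figures above), example_limit_bytes() gives 128G per type and 32G per list, hence four lists per socket.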

Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
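
A minimal sketch of that address-to-index lookup, using a hypothetical, simplified list descriptor rather than the real struct rte_memseg_list:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one memseg list. */
struct example_memseg_list {
	uintptr_t base_va; /* start of the reserved address range */
	size_t page_sz;    /* page size shared by every memseg in the list */
	size_t n_segs;     /* number of memseg slots in the list */
};

/* Return the memseg index that contains addr, or -1 if it is out of range. */
static long
example_memseg_index(const struct example_memseg_list *msl, uintptr_t addr)
{
	size_t idx;

	if (msl->page_sz == 0 || addr < msl->base_va)
		return -1;
	/* Offset from the list base divided by page size gives the slot. */
	idx = (addr - msl->base_va) / msl->page_sz;
	if (idx >= msl->n_segs)
		return -1;
	return (long)idx;
}
```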

This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.

On 32-bit targets, due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on the socket where the master
lcore is. To override this behavior, --socket-mem must be used.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.

Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:39 +02:00
Anatoly Burakov
718e35999c net/mlx5: use virt2memseg instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:02 +02:00
Anatoly Burakov
8594a2026b net/mlx5: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:48:12 +02:00
Shahaf Shuler
5feecc57d9 align SPDX Mellanox copyrights
Align Mellanox SPDX copyrights to a single format.
In addition, convert to SPDX the license headers of files which were missed.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-04-11 01:47:47 +02:00
Bruce Richardson
c022cb400e convert snprintf to strlcpy
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
2018-04-04 17:33:08 +02:00
Shahaf Shuler
e7041f5529 net/mlx5: fix RSS key length query
The RSS key length returned by the rte_eth_dev_info_get command was taken
from the PMD private structure. This structure's initialization was done
only after the port configuration.

Since Mellanox devices support only a 40B long RSS key, report that
fixed number instead.

Fixes: 29c1d8bb3e ("net/mlx5: handle a single RSS hash key for all protocols")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
2018-03-30 14:08:44 +02:00