340 Commits

Author SHA1 Message Date
Dekel Peled
0adf23adcb net/mlx5: fix flow engine choice
Commit in fixes line sets the DV (Direct Verbs) flow engine as default.
Newer versions of DV flow engine use the DR (Direct Rules) features.
DR is supported from RDMA Core library version rdma-core-24.0.
This causes a failure to start the port when using an older rdma-core
version without DR support.

This patch selects DV flow engine if rdma-core version is v24.0 or
higher. Verbs flow engine is selected otherwise.

Fixes: cd4569d2bf ("net/mlx5: change default flow engine to DV")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-26 18:05:15 +01:00
Matan Azrad
1ef4cdef26 net/mlx5: fix flow tag hash list conversion
When DR is not supported and DV is supported, tag action still can be
used by the metadata feature.

The tag hash list was wrongly not created, which caused metadata
action creation to fail.

Create the tag hash list for each DV case.

Fixes: 860897d289 ("net/mlx5: reorganize flow tables with hash list")

Signed-off-by: Matan Azrad <matan@mellanox.com>
2019-11-26 18:05:15 +01:00
Viacheslav Ovsiienko
f078ceb6ae net/mlx5: fix Tx doorbell write memory barrier
Testing showed that on some hosts the write memory barrier required
after the doorbell write imposes a performance penalty. Before the
19.08 release, a heuristic decided whether the write memory barrier
should be performed. For bursts of the recommended size (or a multiple
of it), it was assumed that more packets would follow in the next
burst, so the write memory barrier could be skipped (it would be
performed in the next burst, at least after descriptor writing).

This patch restores that behaviour. The devarg tx_db_nc=2
must be specified to engage this performance tuning feature.
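
The heuristic described above can be sketched as follows. This is only an illustration of the burst-size rule, not the driver code; the function name and the recommended burst size of 64 used in the usage lines are assumptions.

```python
def barrier_can_be_skipped(burst_len, recommended_burst):
    """Return True if the write memory barrier may be skipped for this burst.

    A burst whose size is a non-zero multiple of the recommended size is
    taken as a hint that more packets follow, so the barrier is left to
    the next burst. Any other burst size keeps the barrier.
    """
    if recommended_burst <= 0:
        raise ValueError("recommended_burst must be positive")
    return burst_len > 0 and burst_len % recommended_burst == 0


# Hypothetical recommended burst size, chosen for illustration only.
RECOMMENDED = 64
print(barrier_can_be_skipped(128, RECOMMENDED))  # full bursts, barrier skipped
print(barrier_can_be_skipped(17, RECOMMENDED))   # partial burst, barrier kept
```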

Fixes: 8409a28573 ("net/mlx5: control transmit doorbell register mapping")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-20 17:36:06 +01:00
Dekel Peled
cd4569d2bf net/mlx5: change default flow engine to DV
The default flow engine is Verbs flow engine, for legacy reasons.
This patch changes the default to DV flow engine (dv_flow_en = 1).
Documentation is updated accordingly.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-20 17:36:06 +01:00
Viacheslav Ovsiienko
85c4bcbcc5 net/mlx5: fix vport index in port action
The rdma_core routine mlx5dv_dr_create_flow_action_dest_vport()
requires the vport id parameter to create a port action.
The register c[0] value was used to deduce the port id value,
which fails in a bonding configuration. The correct way is
to apply the vport_num value queried from the rdma_core library.

Fixes: f07341e7ae ("net/mlx5: update source and destination vport translations")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-20 17:36:06 +01:00
Matan Azrad
54534725d2 net/mlx5: fix flow table hash list conversion
For the case when DR is not supported and DV is supported:
	multi-tables feature is off.
	In this case, only table 0 is supported.
	The table 0 structure was wrongly not created, which prevented any
	matcher object from being created and even caused crashes.

Create the table hash list in DV case too.
Create table zero empty structure for each domain when DR is not
supported.
Allow NULL DR internal table object to be used.

Fixes: 860897d289 ("net/mlx5: reorganize flow tables with hash list")

Signed-off-by: Matan Azrad <matan@mellanox.com>
2019-11-20 17:36:06 +01:00
Viacheslav Ovsiienko
06f78b5ebc net/mlx5: fix environment variable recovery
The state of the environment variable MLX5_SHUT_UP_BF was not
recovered correctly if no tx_db_nc devarg was specified.

Fixes: 8409a28573 ("net/mlx5: control transmit doorbell register mapping")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-20 17:36:05 +01:00
Bing Zhao
e484e40323 net/mlx5: optimize tag traversal with hash list
A tag action for flow mark/flag can be reused by different flows.
When creating a new flow with a mark, the existing tag resources are
traversed to check whether the action has already been created.
If only one linked list is used, the search rate drops
significantly as the number of tag actions increases.
Using a table of hash lists speeds up the search, and memory
consumption stays small when only a few tag action resources
are created (compared to other hash table implementations).
The size of the list heads array could be optimized with an
extendable hash table in the future.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-11 14:23:02 +01:00
Bing Zhao
860897d289 net/mlx5: reorganize flow tables with hash list
In the current flow table organization, arrays are used. This is
fast for searching and for creating the related objects used in
flow creation, but it limits the table index.
The flow table information can be reorganized with a hash list
instead. With a hash list, there is no need to maintain three arrays
for the NIC TX, RX and FDB table object information.
The table type (NIC TX, RX or FDB) can be used together with the
table ID to generate a 64-bit key that is unique for hash list
insertion, lookup and deletion.
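
One possible way to build such a key is sketched below. The bit layout (domain code in the upper 32 bits, table ID in the lower 32 bits) and the domain codes are assumptions made for illustration, not the driver's actual encoding.

```python
# Hypothetical domain codes for the three table types.
DOMAIN_NIC_RX, DOMAIN_NIC_TX, DOMAIN_FDB = 0, 1, 2


def table_key(table_id, domain):
    """Pack a 32-bit table ID and a domain code into one 64-bit key."""
    if not 0 <= table_id < 2**32:
        raise ValueError("table_id must fit in 32 bits")
    if not 0 <= domain < 2**32:
        raise ValueError("domain must fit in 32 bits")
    return (domain << 32) | table_id


# One dictionary replaces three per-domain arrays in this sketch.
tables = {}
tables[table_key(1, DOMAIN_NIC_RX)] = "rx table 1"
tables[table_key(1, DOMAIN_FDB)] = "fdb table 1"
```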

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
8409a28573 net/mlx5: control transmit doorbell register mapping
The rdma-core library can map the doorbell register in two ways,
depending on the environment variable "MLX5_SHUT_UP_BF":

  - as regular cached memory, if the variable is either missing or
    set to zero. This type of mapping may cause significant
    doorbell register write latency and requires an explicit
    write memory barrier to mitigate this issue and prevent
    write combining.

  - as non-cached memory, if the variable is present and set to
    a value other than "0". This type of mapping may impact
    performance under heavy load, but the explicit write
    memory barrier is not required, which may improve core
    performance.

A new devarg "tx_db_nc" is introduced. If this parameter is
set to zero, the doorbell register is forced to be mapped to
cached memory and requires an explicit memory barrier after
each write. If "tx_db_nc" is set to a non-zero value, the doorbell
is mapped as non-cached memory, which does not require the memory
barrier. If "tx_db_nc" is missing, the behaviour is defined
by the presence of "MLX5_SHUT_UP_BF" in the environment. If the
variable is missing as well, the default value is zero for ARM64
hosts and one for others.

At run time the code checks the mapping type and issues the
memory barrier after writing to the Tx doorbell register if it is
needed. The mapping type is extracted directly from the
uar_mmap_offset field in the queue properties.
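
The precedence between the devarg, the environment variable and the per-architecture default can be sketched as below. The function name and its parameters are hypothetical; only the decision order follows the description above.

```python
def doorbell_is_non_cached(tx_db_nc, environ, is_arm64):
    """Decide the doorbell mapping: True for non-cached, False for cached.

    tx_db_nc: devarg value as int, or None if the devarg is missing.
    environ:  mapping holding the process environment.
    is_arm64: True when running on an ARM64 host.
    """
    if tx_db_nc is not None:
        # The devarg, when present, overrides everything else.
        return tx_db_nc != 0
    value = environ.get("MLX5_SHUT_UP_BF")
    if value is not None:
        # Present and not "0" selects the non-cached mapping.
        return value != "0"
    # Neither is given: default is zero on ARM64 and one elsewhere.
    return not is_arm64


def needs_write_barrier(non_cached):
    # Only the cached mapping needs the explicit barrier after the write.
    return not non_cached
```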

Fixes: 18a1c20044 ("net/mlx5: implement Tx burst template")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
02e7646818 net/mlx5: clean meter resources
When the port is closed or the program exits ungracefully, the meter
rules should be flushed after the flows are destroyed.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
3f373f3523 net/mlx5: support basic meter operations
This commit adds the basic meter operations for meter create and destroy.

New internal functions in rte_mtr_ops callback:
1. create()
2. destroy()

The create() callback will create the corresponding flow rules on the
meter table.
The destroy() callback destroys the flow rules on the meter table.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
3bd26b23ce net/mlx5: support meter profile operations
This commit adds support for the meter profile add and delete operations.

New internal functions in rte_mtr_ops callback:
1. meter_profile_add()
2. meter_profile_delete()

Only the RTE_MTR_SRTCM_RFC2697 algorithm is supported and can be added.
Adding any other algorithm reports an error.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
27efd5dead net/mlx5: allocate flow meter registers
The meter needs a metadata REG_C register to match the color between
the prefix flow and the meter flow.

As the user-defined tag and the metadata feature both use REG_C in the
suffix flow, the color match register used by the meter must not affect
the register use in the later sub flow.

Another case is when a tag is added before the meter flow. In this
case, the meter should not touch the register the tag action is using.
To avoid that, the meter should reserve the REG_Cs used by the
user-defined MLX5_APP_TAG.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
6bc327b94f net/mlx5: fill meter capabilities using DevX
This commit adds support for filling and getting the meter capabilities
from DevX.

Supported items:
1. The srTCM color bind mode.
2. Meter share with multiple flows.
3. Action drop.

The color aware mode and multiple meter chaining in a flow are not
supported.

New internal function in rte_mtr_ops callback:
1. capabilities_get()

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Suanming Mou
d740eb5018 net/mlx5: add meter operation callback
Add the new mlx5_flow_meter.c file for metering support.

Signed-off-by: Suanming Mou <suanmingm@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
dd3c774f6f net/mlx5: add metadata register copy table
While reg_c[meta] can be copied to reg_b simply by modify-header
action (it is supported by hardware), it is not possible to copy
reg_c[mark] to the STE flow_tag as flow_tag is not a metadata
register and this is not supported by hardware. Instead, it
should be manually set by a flow for each unique MARK ID. For
this purpose, there should be a dedicated flow table,
RX_CP_TBL, and all Rx flows should pass through the table
to properly copy values from the register to the flow tag field.

And for each MARK action, a copy flow should be added
to RX_CP_TBL according to the MARK ID like:
  (if reg_c[mark] == mark_id),
    flow_tag := mark_id / reg_b := reg_c[meta] / jump to RX_ACT_TBL

For SET_META action, there can be only one default flow like:
  reg_b := reg_c[meta] / jump to RX_ACT_TBL

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
71e254bc02 net/mlx5: split Rx flows to provide metadata copy
Values set by MARK and SET_META actions should be carried over
to the VF representor in case of flow miss on Tx path. However,
as not all metadata registers are preserved across the different
domains (NIC Rx/Tx and E-Switch FDB), as a workaround, those
values should be carried by reg_c's which are preserved across
domains and copied to STE flow_tag (MARK) and reg_b (META) fields
in the last stage of flow steering, in order to scatter those
values to flow_tag and flow_table_metadata of CQE.

While reg_c[meta] can be copied to reg_b simply by modify-header
action (it is supported by hardware), it is not possible to copy
reg_c[mark] to the STE flow_tag as flow_tag is not a metadata
register and this is not supported by hardware. Instead, it should
be manually set by a flow per MARK ID. For this purpose, there
should be a dedicated flow table, RX_CP_TBL, and all Rx flows
should pass through the table to properly copy values.

As the last action of Rx flow steering must be a terminal action
such as QUEUE, RSS or DROP, if a user flow has a Q/RSS action, the
flow must be split in order to pass through RX_CP_TBL. The
remaining Q/RSS action is then performed by another dedicated
action table, RX_ACT_TBL.

For example, for an ingress flow:
    pattern,
    actions_having_QRSS
it must be split into two flows. The first one is,
    pattern,
    actions_except_QRSS / copy (reg_c[2] := flow_id) / jump to RX_CP_TBL
and the second one, in RX_ACT_TBL, is,
    (if reg_c[2] == flow_id),
    action_QRSS
where flow_id is a uniquely allocated and managed identifier.
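
The split can be modelled with a small sketch. Flows are represented as plain dictionaries and actions as tuples; these representations, the function name and the "user" table label are assumptions for illustration and do not mirror the driver's data structures.

```python
TERMINAL_QRSS = {"QUEUE", "RSS"}


def split_rx_flow(pattern, actions, flow_id):
    """Split an ingress flow that carries a QUEUE or RSS action.

    Returns a list of flow descriptions. A flow without Q/RSS is
    returned unchanged as a single entry.
    """
    qrss = [a for a in actions if a[0] in TERMINAL_QRSS]
    if not qrss:
        return [{"table": "user", "pattern": pattern, "actions": list(actions)}]
    rest = [a for a in actions if a[0] not in TERMINAL_QRSS]
    first = {
        "table": "user",
        "pattern": pattern,
        # reg_c[2] := flow_id, then jump to the copy table.
        "actions": rest + [("COPY_REG_C2", flow_id), ("JUMP", "RX_CP_TBL")],
    }
    second = {
        "table": "RX_ACT_TBL",
        # Match on the identifier stored by the first flow.
        "pattern": [("REG_C2", flow_id)],
        "actions": qrss,
    }
    return [first, second]


flows = split_rx_flow([("ETH",)], [("MARK", 7), ("QUEUE", 3)], flow_id=42)
```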

This patch implements the Rx flow splitting and builds the RX_ACT_TBL.
Also, for each egress flow on NIC Tx, a copy action (reg_c[]= reg_a)
should be added in order to transfer metadata from WQE.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
3913937151 net/mlx5: adjust shared register according to mask
The metadata register reg_c[0] might be used by kernel or
firmware for their internal purposes. The actual used mask
can be queried from the kernel. The remaining bits can be
used by the PMD to provide the META or MARK feature. The code queries
the mask of reg_c[0] and adjusts the resource usage dynamically.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
2d241515eb net/mlx5: add devarg for extensive metadata support
The PMD parameter dv_xmeta_en is added to control extensive
metadata support. A nonzero value enables extensive flow
metadata support if device is capable and driver supports it.
This can enable extensive support of MARK and META item of
rte_flow. The newly introduced SET_TAG and SET_META actions
do not depend on dv_xmeta_en parameter, because there is
no compatibility issue for new entities. The dv_xmeta_en is
disabled by default.

There are some possible configurations, depending on parameter
value:

- 0, this is default value, defines the legacy mode, the MARK
  and META related actions and items operate only within NIC Tx
  and NIC Rx steering domains, no MARK and META information
  crosses the domain boundaries. The MARK item is 24 bits wide,
  the META item is 32 bits wide.

- 1, this engages extensive metadata mode, the MARK and META
  related actions and items operate within all supported steering
  domains, including FDB, MARK and META information may cross
  the domain boundaries. The MARK item is 24 bits wide, the
  META item width depends on kernel and firmware configurations
  and might be 0, 16 or 32 bits. Within NIC Tx domain META data
  width is 32 bits for compatibility, the actual width of data
  transferred to the FDB domain depends on kernel configuration
  and may vary. The actual supported width can be retrieved
  at runtime by a series of rte_flow_validate() trials.

- 2, this engages extensive metadata mode, the MARK and META
  related actions and items operate within all supported steering
  domains, including FDB, MARK and META information may cross
  the domain boundaries. The META item is 32 bits wide, the MARK
  item width depends on kernel and firmware configurations and
  might be 0, 16 or 24 bits. The actual supported width can be
  retrieved at runtime by a series of rte_flow_validate() trials.

If there is no E-Switch configuration, the dv_xmeta_en parameter is
ignored and the device is configured to operate in legacy mode (0).
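
The three modes can be summarised in a short sketch that returns the effective mode and the item widths listed above. The function name and the return layout are hypothetical. Where a width depends on kernel and firmware configuration, all possible values are returned.

```python
def metadata_widths(dv_xmeta_en, has_eswitch):
    """Return (effective mode, possible MARK widths, possible META widths).

    Widths are in bits. Without an E-Switch configuration the parameter
    is ignored and legacy mode (0) applies.
    """
    if dv_xmeta_en not in (0, 1, 2):
        raise ValueError("dv_xmeta_en must be 0, 1 or 2")
    mode = dv_xmeta_en if has_eswitch else 0
    if mode == 0:
        return 0, (24,), (32,)
    if mode == 1:
        # MARK keeps 24 bits, META width depends on configuration.
        return 1, (24,), (0, 16, 32)
    # Mode 2: META keeps 32 bits, MARK width depends on configuration.
    return 2, (0, 16, 24), (32,)
```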

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:02 +01:00
Viacheslav Ovsiienko
5e61bcdd24 net/mlx5: check metadata registers availability
The metadata registers reg_c provide support for the TAG and
SET_TAG features. Although 8 registers are available
on current mlx5 devices, some of them can be reserved.
Availability should be queried by the iterative trial-and-error
procedure implemented by the mlx5_flow_discover_mreg_c() routine.

If reg_c is available, it can be inferred that extensive
metadata support is possible, e.g. the metadata register
copy action, support for 16 modify header actions
(instead of 8 by default), preserving registers across
different domains (FDB and NIC) and so on.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-11-11 14:23:01 +01:00
Raslan Darawsheh
5fc66630be net/mlx5: add ConnectX6-DX device ID
This adds new device IDs to the list of Mellanox devices
that run the mlx5 PMD.
	- ConnectX-6DX device ID
	- ConnectX-6DX SRIOV device ID

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:05 +01:00
Dekel Peled
06fa6988d8 net/mlx5: remove redundant new line in logs
DRV_LOG macro is used to print log messages, one per line.
In several locations this macro is used with redundant '\n' character
at the end of the log message, causing blank lines between log lines.

This patch removes the '\n' character where it is redundant.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
d85c7b5ea5 net/mlx5: split hairpin flows
Since the encap action is not supported in RX, we need to split the
hairpin flow into RX and TX.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
830d209161 net/mlx5: add ID generation
When splitting flows, for example for hairpin or metering, there is a
need to combine the flows. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its release and allocation are O(n),
while in the chosen approach allocation and release are O(1).
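
One common way to get O(1) allocation and release is a counter for never-used IDs plus a stack of released IDs. The sketch below assumes that design; the class and method names are hypothetical and the commit's actual data structure may differ.

```python
class IdPool:
    """Allocate and release integer IDs in (amortized) constant time."""

    def __init__(self, max_id):
        self.max_id = max_id
        self.next_id = 1  # next ID that has never been handed out
        self.free = []    # stack of released IDs, reused first

    def alloc(self):
        if self.free:
            # Reuse the most recently released ID.
            return self.free.pop()
        if self.next_id > self.max_id:
            raise RuntimeError("ID pool exhausted")
        value = self.next_id
        self.next_id += 1
        return value

    def release(self, value):
        # No scan is needed, unlike searching a bitmap for a free bit.
        self.free.append(value)
```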

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
b6b3bf86bd net/mlx5: get hairpin capabilities
This commit adds the hairpin get capabilities function.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
ae18a1ae96 net/mlx5: support Tx hairpin queues
This commit adds support for creating Tx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only
by the HW.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
894c4a8e5a net/mlx5: prepare Tx queues to have different types
Currently all Tx queues are created using Verbs.
This commit modifies the naming so that it does not include verbs,
since the next commit introduces a new type (hairpin).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Ori Kam
e79c9be915 net/mlx5: support Rx hairpin queues
This commit adds support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only
by the HW. As a result, the data part of the RQ is not used.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Dekel Peled
2eb5dce8c0 net/mlx5: fix LRO dependency to include DV flow
Rx queue for LRO is created using DevX. Flows created on this queue
must use the DV flow engine.

This patch adds a check of dv_flow_en=1 when configuring LRO support
on device spawn.
Documentation is updated accordingly.

Fixes: 175f1c21d0 ("net/mlx5: check conditions to enable LRO")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-11-08 23:15:04 +01:00
Matan Azrad
2324206337 net/mlx5: fix DevX event registration timing
The DevX counter management triggers an asynchronous event to get
the new counter values back from the HW.

The counter management doesn't trigger 2 parallel events for the same
pool; hence, the pool cannot be updated again while an event is pending.

When the port was stopped, the DevX event mechanism was wrongly
destroyed, which left all the waiting pools in the waiting state forever.

As a result, the counters of the stuck pools were never updated again.

Separate the DevX interrupt installation from the dev installation and
remove the DevX interrupt unregistration/registration from the
stop/start operations.

Now, the DevX interrupt is installed in probe and uninstalled in
close.

Cc: stable@dpdk.org
Fixes: f15db67df0 ("net/mlx5: accelerate DV flow counter query")

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-10-23 16:43:10 +02:00
Viacheslav Ovsiienko
cc8627bc6d net/mlx5: fix direct call to rdma-core library
The routine mlx5dv_query_devx_port() was called directly
instead of using the mlx5 glue thunk.

Fixes: d5c06b1b10 ("net/mlx5: query vport index match mode and parameters")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-10-08 12:14:32 +02:00
Viacheslav Ovsiienko
fbc8341218 net/mlx5: fix device scan within switch domain
In a LAG configuration the devices in the same switch domain
might be spawned on different PCI devices, so we should check
whether any device backed by the mlx5 PMD belongs to the
specified switch domain. While new devices are being created,
it is not possible to detect whether the sibling devices
created in the current probe() loop belong to the driver,
because the driver field is not filled yet (it is filled when
the current probe() returns success). This patch updates the
device scanning, allowing an extra match on the current
backing PCI device that is being used to create siblings.

Fixes: f7e95215ac ("net/mlx5: extend switch domain searching range")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-10-08 12:14:32 +02:00
Viacheslav Ovsiienko
92d5dd4834 net/mlx5: check sibling device configurations mismatch
The devices backed by the mlx5 PMD might share the same multiport
Infiniband device context. This applies to representors and slaves
of a bonding device. These ports are spawned with devargs.
This patch checks whether the configuration deduced from these
devargs is compatible with the configurations of the devices
sharing the same context. It prevents incorrect
whitelists, like:

-w 82:00.0,representor=0,dv_flow_en=1
-w 82:00.0,representor=1,dv_flow_en=0

The representors with indices [0-1] are supposed to be spawned
over the same PCI device, but there is a dv_flow_en parameter
mismatch.
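
The kind of check described here can be sketched over whitelist strings. For simplicity the sketch groups entries by PCI address, whereas the patch compares devices sharing one Infiniband context. The function names and the set of compared keys are assumptions.

```python
def parse_whitelist(entry):
    """Split 'PCI,key=value,...' into (pci_address, {key: value})."""
    pci, *pairs = entry.split(",")
    return pci, dict(pair.split("=", 1) for pair in pairs)


def find_mismatches(entries, shared_keys=("dv_flow_en",)):
    """Report devargs that differ between entries on the same PCI device."""
    seen = {}
    mismatches = []
    for entry in entries:
        pci, args = parse_whitelist(entry)
        for key in shared_keys:
            value = args.get(key)
            # The first entry for a device fixes the expected value.
            previous = seen.setdefault((pci, key), value)
            if previous != value:
                mismatches.append((pci, key, previous, value))
    return mismatches


bad = find_mismatches([
    "82:00.0,representor=0,dv_flow_en=1",
    "82:00.0,representor=1,dv_flow_en=0",
])
```

Here `bad` holds one entry describing the dv_flow_en conflict on 82:00.0. Per-port parameters such as representor are deliberately left out of the comparison.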

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko
bee57a0a35 net/mlx5: update switch port id in bonding configuration
In a bonding configuration multiple PFs may represent a
single switching device with multiple ports as representors.
To distinguish representors belonging to different PFs we
should generate unique port IDs. It is proposed to use
the PF index in the bonding configuration to generate these
unique port IDs.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko
f7e95215ac net/mlx5: extend switch domain searching range
In bonding configurations the switch domain may be shared
between multiple PCI devices, so we should search for the switch
sibling devices within the entire set of present Ethernet
devices backed by the mlx5 PMD.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-08 12:14:29 +02:00
Viacheslav Ovsiienko
d5c06b1b10 net/mlx5: query vport index match mode and parameters
The new kernel/rdma_core [1] supports matching on a metadata
register instead of the vport field to provide operations over
VF LAG bonding configurations. The patch retrieves the parameters
and information about the way vport matching is done on the E-Switch.

[1] http://patchwork.ozlabs.org/cover/1122170/
    "Mellanox, mlx5 vport metadata matching"

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
790164ce1d net/mlx5: check kernel support for VF LAG bonding
If a bonding Infiniband device is found, a unified E-Switch
is assumed and extra rdma-core/kernel support is needed
to retrieve vport indices. The patch introduces the defines for
this feature and adds a bonding support check to the probe routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
10dadfcb8a net/mlx5: generate bonding device name
If the device is a VF LAG bonding one, the port name includes
the bonding Infiniband device name and looks like:

  82:00.0_mlx5_bond_0 - for master device port PF0
  82:00.1_mlx5_bond_0_representor_5 - for representor
                                           VF5 over PF1

where the bonding Infiniband device mlx5_bond_0 controls
the 82:00.0 as PF0 and 82:00.1 as PF1 PCI functions.
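
The naming pattern shown in the two examples can be reproduced with a small helper. The function name and its parameters are hypothetical. Only the output format is taken from the examples above.

```python
def bonding_port_name(pci_addr, ibdev, representor=None):
    """Build a port name for a device on a VF LAG bonding IB device.

    pci_addr:    PCI address of the backing PF, e.g. "82:00.0".
    ibdev:       bonding Infiniband device name, e.g. "mlx5_bond_0".
    representor: VF representor index, or None for the master port.
    """
    name = f"{pci_addr}_{ibdev}"
    if representor is not None:
        name += f"_representor_{representor}"
    return name


print(bonding_port_name("82:00.0", "mlx5_bond_0"))
print(bonding_port_name("82:00.1", "mlx5_bond_0", representor=5))
```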

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
2e569a3703 net/mlx5: add VF LAG mode bonding device recognition
Mellanox NICs starting from ConnectX-5 support LAG over
NIC ports internally, implemented by the NIC firmware and hardware.
A multiport NIC presents multiple physical PCI functions (PFs);
with SR-IOV, multiple virtual PCI functions (VFs) might be presented.
In switchdev mode the VF representors are engaged, and PFs and their
VFs are connected by the internal E-Switch feature. Each PF and its
related VFs have a dedicated E-Switch and belong to a dedicated
switch domain.

If the NIC ports are combined into a LAG, the kernel drivers introduce
a single unified multiport Infiniband device, and only one
unified E-Switch with a single switch domain combines the master PF
and all VFs. No extra DPDK bonding device is needed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
a62ec99161 net/mlx5: allocate device list explicitly
At device probing the device list to spawn was allocated
as a dynamically sized local variable. Due to compiler warnings,
it was not possible to have one unified exit point from the routine.
This patch allocates the spawn device list directly with
rte_zmalloc(), which makes it possible to jump to a unified exit
label from anywhere in the routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
5cf5f710b0 net/mlx5: update PCI address retrieving routine
The routine mlx5_ibv_device_to_pci_addr() takes an Infiniband
device list object, takes the device sysfs path from there
and retrieves the PCI address. The routine can be implemented
in a more generic way by taking the sysfs path directly as a
parameter, so it can also be used to get the PCI address of netdevs.

The generic routine is renamed to mlx5_dev_to_pci_addr().

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
46e10a4c1b net/mlx5: move backing PCI device to private context
Currently all devices created over the same multiport IB device
have a shared context containing the backing PCI device field.
In VF LAG configurations the representors might be connected
to VFs created over different PFs. In this case the
representors have different backing PCI devices, so the
mentioned field should be moved to the device private area.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:58 +02:00
Viacheslav Ovsiienko
c930f02c74 net/mlx5: fix ConnectX-6 VF type recognition
The PCI virtual function type was not recognized correctly
for ConnectX-6 VF.

Fixes: f0354d8423 ("net/mlx5: add ConnectX-6 device IDs")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:57 +02:00
Viacheslav Ovsiienko
a40b734b5e net/mlx5: fix BlueField VF type recognition
The PCI virtual function type was not recognized correctly
for BlueField VF.

Fixes: f38c54571d ("net/mlx5: split PCI from generic probing")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-10-07 15:00:57 +02:00
Dekel Peled
8a6a09f853 net/mlx5: support reading module EEPROM data
This patch implements ethdev operations get_module_info and
get_module_eeprom, to support ethtool commands ETHTOOL_GMODULEINFO
and ETHTOOL_GMODULEEEPROM.

New functions mlx5_get_module_info() and mlx5_get_module_eeprom()
are added in mlx5_ethdev.c.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-09-20 10:19:41 +02:00
Moti Haimovsky
b41e47da25 net/mlx5: support pop flow action on VLAN header
This commit adds support for RTE_FLOW_ACTION_TYPE_OF_POP_VLAN via
direct verbs flow rules.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-09-20 10:19:41 +02:00
David Marchand
8ac3591694 remove useless include of EAL memory config header
Restrict this header inclusion to its real users.

Fixes: 028669bc9f ("eal: hide shared memory config")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-09 10:22:24 +02:00
Raslan Darawsheh
c9ba7523c4 net/mlx5: support UDP tunnel adding
This adds support for adding a new UDP tunnel port
for specific VXLAN types.

Currently only VXLAN and VXLAN-GPE are supported, on ports
4789 and 4790 respectively, without having to configure
anything in the NIC.
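
The restriction above amounts to accepting only two fixed (type, port) pairs. A minimal sketch, with a hypothetical function name and string type labels chosen for illustration:

```python
# Tunnel types and UDP ports named in the message above.
SUPPORTED_TUNNEL_PORTS = {"VXLAN": 4789, "VXLAN_GPE": 4790}


def udp_tunnel_port_is_supported(tunnel_type, udp_port):
    """Return True only for the supported (type, port) combinations."""
    return SUPPORTED_TUNNEL_PORTS.get(tunnel_type) == udp_port
```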

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-09-06 17:15:14 +02:00
Viacheslav Ovsiienko
0e3d0525b2 net/mlx5: fix memory event callback list
The shared Infiniband device context should be added
to the memory event callback list only once, on context creation,
and removed from the list only once, on context destruction.
Multiple insertions of the same object caused an infinite
loop during list processing.

Fixes: ccb3815346 ("net/mlx5: update memory event callback for shared context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-08-06 17:42:12 +02:00