Commit Graph

340 Commits

Author SHA1 Message Date
Viacheslav Ovsiienko
614de6c898 net/mlx5: fix default minimal data inline
The patch [Fixes] sets the default value of required minimal
inline data to 0 bytes. On some configurations (depends
on switchdev/legacy settings and FW version/settings)
the ConnectX-4LX NIC requires minimal 18 bytes of
Tx descriptor inline data to operate correctly.

Wrongly set to 0 default value may prevent NIC from operating
with out-of-the-box settings, this patch reverts default
value for ConnectX-4LX back to 18 bytes (inline L2).

Fixes: 9f350504bb ("net/mlx5: fix ConnectX-4LX minimal inline data limit")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-08-06 17:42:12 +02:00
David Christensen
20215627cd net/mlx5: fix Tx inline minimum for ConnectX-5
The function mlx5_set_min_inline() includes a switch() that checks
various PCI device IDs in order to set the txq_inline_min value.  No
value is set when the PCI device ID matches the ConnectX-5 adapters,
resulting in an assert() failure later in the function
mlx5_set_txlimit_params().

This error was encountered on an IBM Power 9 system running RHEL 7.6
w/o Mellanox OFED installed.

Fixes: 38b4b397a5 ("net/mlx5: add Tx configuration and setup")

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-08-06 17:42:12 +02:00
Viacheslav Ovsiienko
dfedf3e3f9 net/mlx5: add workaround for VLAN in virtual machine
On some virtual setups (particularly on ESXi) when we have SR-IOV and
E-Switch enabled there is the problem to receive VLAN traffic on VF
interfaces. The NIC driver in ESXi hypervisor does not setup E-Switch
vport setting correctly and VLAN traffic targeted to VF is dropped.

The patch provides the temporary workaround - if the rule
containing the VLAN pattern is being installed for VF the VLAN
network interface over VF is created, like the command does:

  ip link add link vf.if name mlx5.wa.1.100 type vlan id 100

The PMD in DPDK maintains the database of created VLAN interfaces
for each existing VF and requested VLAN tags. When all of the RTE
Flows using the given VLAN tag are removed the created VLAN interface
with this VLAN tag is deleted.

The name of created VLAN interface follows the format:

  evmlx.d1.d2, where d1 is VF interface ifindex, d2 - VLAN ifindex

Implementation limitations:

- mask in rules is ignored, rule must specify VLAN tags exactly,
  no wildcards (which are implemented by the masks) are allowed

- virtual environment is detected via rte_hypervisor() call,
  and the type of hypervisor is checked. Currently we engage
  the workaround for ESXi and unrecognized hypervisors (which
  always happen on platforms other than x86 - it means workaround
  applied for the Flow over PCI VF). There are no confirmed data
  the other hypervisors (HyperV, Qemu) need this workaround,
  we are trying to reduce the list of configurations on those
  workaround should be applied.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-08-06 17:42:12 +02:00
Viacheslav Ovsiienko
9f350504bb net/mlx5: fix ConnectX-4LX minimal inline data limit
Mellanox ConnectX-4LX NIC in configurations with disabled
E-Switch can operate without minimal required inline data
into Tx descriptor. There was the hardcoded limit set to
18B in PMD, fixed to be no limit (0B).

Fixes: 38b4b397a5 ("net/mlx5: add Tx configuration and setup")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2019-07-29 18:05:10 +02:00
Matan Azrad
bd41389e35 net/mlx5: allow LRO in regular Rx queue
LRO support was only for MPRQ, hence mprq Rx burst was selected when
LRO was configured in the port.

The current support for MPRQ is suffering from bad memory utilization
since an external mempool is allocated by the PMD for the packets data
in addition to the user mempool, besides that, the user may get packet
data addresses which were not configured by him.

Even though MPRQ has the best performance for packet receiving in the
most cases and because of the above facts it is better to remove the
automatic MPRQ select when LRO is configured.

Move MPRQ to be selected only when the user force it by the PMD
arguments including LRO case.

Allow LRO offload using the regular RQ with the regular Rx burst
function.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-29 16:54:27 +02:00
Dekel Peled
a88209b04c net/mlx5: fix doorbell release on Rx queue release
Function mlx5_rxq_release() calls mlx5_release_dbr() to release the
doorbell allocated for this Rx queue.
This call is relevant only for Rx queue objects created using
DevX API.

This patch adds the required check, to call mlx5_release_dbr()
only when relevant.
It also updates mlx5_release_dbr() to use the input offset correctly.

Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-24 18:05:07 +02:00
Dekel Peled
b9d86122bf net/mlx5: store protection domain number on create
Function mlx5_alloc_shared_ibctx() allocates Protection Domain using
verbs API, as part of shared IB device context.
This patch adds reading and storing of pdn value from the created PD
object, using DV API.
The pdn value is required when creating WQ using DevX API.

This patch also updates function flow_dv_create_counter_stat_mem_mng()
which uses the pdn value as well.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
23820a7989 net/mlx5: rename hash RxQ verbs to general
Prepare for introducing use of DevX TIR object.
Hash Rx queue is currently created using verbs QP only.
The next patches will add the option to create it with a TIR object
using DevX.
This patch renames hrxq_ibv to hrxq wherever relevant, and adds
the DevX items to relevant structs.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
15c80a126d net/mlx5: rename verbs indirection table to obj
Prepare for introducing of DevX RQT object.
Rx indirection table object is currently created using verbs only.
The next patches will add the option to create an RQT object using
DevX.
This patch renames ind_table_ibv to ind_table_obj wherever relevant,
and adds the DevX items to relevant structs.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
93403560ba net/mlx5: rename RxQ verbs to general RxQ object
Prepare for introducing of DevX RxQ object.
RxQ object is currently created using verbs only.
The next patches will add the option to create RxQ object using DevX.
This patch renames rxq_ibv to rxq_obj wherever relevant, and adds the
DevX items to relevant structs.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
21cae8580f net/mlx5: allocate door-bells via DevX
When using DevX API, memory for door-bell records should be allocated
by PMD and registered using DevX API.

This patch implements the utility functions to support it:
- Add struct mlx5_devx_dbr_page, containing door-bells page data.
- Add list of struct mlx5_devx_dbr_page door-bell pages to device
  private data.
- Implement function mlx5_alloc_dbr_page() to allocate page for
  door-bell records, and register it using DevX API.
- Implement function mlx5_get_dbr(). to acquire a door-bell record
  from the door-bells page, allocating a new page if needed.
- Implement function mlx5_release_dbr() to release a door-bell
  record that is no longer needed, freeing the containing page if
  it becomes empty.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
175f1c21d0 net/mlx5: check conditions to enable LRO
Use DevX API to read device LRO capabilities.
Check if LRO is supported and can be enabled.
Check if MPRQ is supported and can be used.
Enable MPRQ for LRO use if not enabled by user.
Added note for mlx5_mprq_enabled(), to emphasize that LRO
enables MPRQ.
Disable CQE compression and CRC stripping if LRO is enabled.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
3075bd23e3 net/mlx5: add glue for create action via DevX
Add compile option HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR, and matching
dest_tir flag in device configuration structure.
Add glue function pointer dv_create_flow_action_dest_devx_tir, and
function mlx5_glue_dv_create_flow_action_dest_devx_tir(),
to invoke API mlx5dv_dr_action_create_dest_devx_tir();

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Dekel Peled
21bb6c7e62 net/mlx5: introduce LRO
Add command-line argument to set LRO session timeout.
Add LRO settings struct in PMD configuration struct.
Add support of LRO offload in port configuration.
Add macros and function to check if LRO is supported and enabled.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-23 14:31:36 +02:00
Viacheslav Ovsiienko
fa2e14d492 net/mlx5: cache associated network device index
The associated device index is retrieved via Netlink request to
underlying Infiniband device driver. This network device index
is permanent throughout the lifetime of device. We do not
spawn the rte_eth_dev ports without associated network device, and
if network device is being unbound we get the remove notification
message and rte_eth_dev port is also detached. So, we may store
the ifindex in mlx5_device_spawn() routine at rte_eth_dev port
creation and initialization time and use the cached value further
instead of doing actual Netlink request.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23 14:31:36 +02:00
Viacheslav Ovsiienko
38b4b397a5 net/mlx5: add Tx configuration and setup
This patch updates the Tx datapath control and configuration
structures and code for managing Tx datapath settings.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23 14:31:36 +02:00
Viacheslav Ovsiienko
505f1fe426 net/mlx5: add Tx devargs
This patch introduces new mlx5 PMD devarg options:

- txq_inline_min - specifies minimal amount of data to be inlined into
  WQE during Tx operations. NICs may require this minimal data amount
  to operate correctly. The exact value may depend on NIC operation
  mode, requested offloads, etc.

- txq_inline_max - specifies the maximal packet length to be completely
  inlined into WQE Ethernet Segment for ordinary SEND method. If packet
  is larger the specified value, the packet data won't be copied by the
  driver at all, data buffer is addressed with a pointer. If packet
  length is less or equal all packet data will be copied into WQE.

- txq_inline_mpw - specifies the maximal packet length to be completely
  inlined into WQE for Enhanced MPW method.

Driver documentation is also updated.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23 14:31:36 +02:00
Viacheslav Ovsiienko
a6bd4911ad net/mlx5: remove Tx implementation
This patch removes the existing Tx datapath code
as preparation step before introducing the new
implementation. The following entities are being
removed:

- deprecated devargs support
- tx_burst() routines
- related PRM definitions
- SQ configuration code
- Tx routine selection code
- incompatible Tx completion code

The following devargs are deprecated and ignored:
- "txq_inline" is going to be converted to "txq_inline_max"
  for compatibility issue
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
  to "txq_inline_mpw" for compatibility issue

The deprecated devarg keys are recognized by PMD
and ignored/converted to the new ones in order not
to block device probing.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-07-23 14:31:36 +02:00
Matan Azrad
31538ef62c net/mlx5: allow basic counter management fallback
In case the asynchronous devx commands are not supported in RDMA core
fallback to use a basic counter management.

Here, the PMD counters cashe is redundant and the host thread doesn't
update it. hence, each counter operation will go to the FW and the
acceleration reduces.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-07-23 14:31:35 +02:00
Matan Azrad
f15db67df0 net/mlx5: accelerate DV flow counter query
All the DV counters are cashed in the PMD memory and are contained in
pools which are contained in containers according to the counters
allocation type - batch or single.

Currently, the flow counter query is done synchronously in pool
resolution means that on the user request a FW command is triggered to
read all the counters in the pool.

A new feature of devX to asynchronously read batch of flow counters
allows to accelerate the user query operation.

Using the DPDK host thread, the PMD periodically triggers asynchronous
query in pool resolution for all the counter pools and an interrupt is
triggered by the FW when the values are updated.
In the interrupt handler the pool counter values raw data is replaced
using a double buffer algorithm (very fast).
In the user query, the PMD just returns the last query values from the
PMD cache - no system-calls and FW commands are triggered from the user
control thread on query operation!

More synchronization is added with the host thread:
        Container resize uses double buffer algorithm.
        Pools growing in container uses atomic operation.
        Pool query buffer replace uses a spinlock.
        Pool minimum devX counter ID uses atomic operation.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-07-23 14:31:35 +02:00
Matan Azrad
5382d28c21 net/mlx5: accelerate DV flow counter transactions
The DevX interface exposes a new feature to the PMD that can allocate a
batch of counters by one FW command. It can improve the flow
transaction rate (with count action).

Add a new counter pools mechanism to manage HW counters in the PMD.
So, for each flow with counter creation the PMD will try to find a free
counter in the PMD pools container and only if there is no a free
counter, it will allocate a new DevX batch counters.

Currently we cannot support batch counter for a group 0 flow, so
create a 2 container types, one which allocates counters one by
one and one which allocates X counters by the batch feature.

The allocated counters objects are never released back to the HW
assuming the flows maximum number will be close to the actual value of
the flows number.
Later, it can be updated, and dynamic release mechanism can be added.

The counters are contained in pools, each pool with 512 counters.
The pools are contained in counter containers according to the
allocation resolution type - single or batch.
The cache memory of the counters statistics is saved as raw data per
pool.
All the raw data memory is allocated for all the container in one
memory allocation and is managed by counter_stats_mem_mng structure
which registers all the raw memory to the HW.
Each pool points to one raw data structure.

The query operation is in pool resolution which updates all the pool
counter raw data by one operation.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-07-23 14:31:35 +02:00
David Marchand
b76fafb174 eal: fix IOVA mode selection as VA for PCI drivers
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
dpdk processes to run as non root (which do not have access to physical
addresses on recent kernels).

The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.

We can check that a driver requirement wrt to IOVA mode is fulfilled
before trying to probe a device.

Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.

Fixes: 703458e19c ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-22 17:45:52 +02:00
Moti Haimovsky
5291c601bb net/mlx5: remove TCF support
This commit removes the support of configuring the device E-switch
using TCF since it is now possible to configure it via DR (direct
verbs rules), and by that to also remove the PMD dependency in libmnl.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2019-07-05 01:52:02 +02:00
Matan Azrad
15b0ea0053 net/mlx5: fix device arguments error detection
When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.

This behavior doesn't make sense because the user intention is to force
some device parameters and expects to get an error in case of
problematic issues with the arguments.

Stop probing and report an error in case of problematic command line
arguments.

Fixes: e72dd09b61 ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-06-14 00:01:06 +09:00
Matan Azrad
066cfecdc9 net/mlx5: add log file procedure for debug data
Add a global function in the PMD which dumps debug information to
specific file.

The data can be printed in hexadecimal format or as regular string.

The number of debug files per PMD entity should be limited by a new PMD
probe parameter called max_dump_files_num.

The files will be created in the /var/log directory or in the current
directory.

Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-06-14 00:01:06 +09:00
Viacheslav Ovsiienko
5897ac1393 net/mlx5: fix event handler uninstall
When device is being closed and tries to unregister interrupt callback,
there is a chance the handler is still active (called in context of
eal_intr_thread_main thread). If so the rte_intr_callback_unregister
returns -EAGAIN and keeps the handler registered, causing crash when
underlaying resourse is gone away.

This race condition may happen if event handling in application takes
a long time. We should check the return code of unregistering routine
and try again to unregister the handler. The diagnostic messages are
shown once a second, while trying to unregister.

Fixes: 028b2a28c3 ("net/mlx5: update event handler for multiport IB devices")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-06-06 20:21:20 +09:00
Tom Barbette
e571ad5541 net/mlx5: support reading clock
Implements support for read_clock for the mlx5 driver. mlx5 supports
hardware timestamp offload, setting packets timestamp field to the
device clock. rte_eth_read_clock allows to read the device's current
clock value and therefore compare values on similar time base.

See rxtx_callbacks for an example.

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-06-06 20:21:20 +09:00
Anatoly Burakov
edf73dd330 ipc: handle unsupported IPC in action register
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.

For secondary processes, IPC will be crucial, so leave all
of the error handling as is.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:36 +02:00
Yongseok Koh
69c06d0e35 net/mlx: support IOVA VA mode
Set RTE_PCI_DRV_IOVA_AS_VA to driver's drv_flags as device's IOMMU takes
virtual address.

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-06-04 00:34:54 +02:00
Olivier Matz
35b2d13fd6 net: add rte prefix to ether defines
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.

Do not update the command line library to avoid adding a dependency to
librte_net.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:45 +02:00
Olivier Matz
6d13ea8e8e net: add rte prefix to ether structures
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.

Do not update the command line library to avoid adding a dependency to
librte_net.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-05-24 13:34:45 +02:00
Ori Kam
d1e64fbf63 net/mlx5: fix Direct Rules API
The RDMA-CORE Direct Rules API was changed in latest upstream code

This commit update the API accordingly.

Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-03 18:45:23 +02:00
Viacheslav Ovsiienko
ccb3815346 net/mlx5: update memory event callback for shared context
Mellanox mlx5 PMD implements the list of devices to process the memory
free events to reflect the actual memory state to Memory Regions.
Because this list contains the devices and devices may share the
same context the callback routine may be called multiple times
with the same parameter, that is not optimal. This patch modifies
the list to contain the device contexts instead of device objects
and shared context is included in the list only once.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-03 18:45:23 +02:00
Viacheslav Ovsiienko
ab3cffcfc2 net/mlx5: share Memory Regions for multiport device
The multiport Infiniband device support was introduced [1].
All active ports, belonging to the same Infiniband device use the single
shared Infiniband context of that device and share the resources:
  - QPs are created within shared context
  - Verbs flows are also created with specifying port index
  - DV/DR resources
  - Protection Domain
  - Event Handlers

This patchset adds support for Memory Regions sharing between
ports, created on the base of multiport Infiniband device.
The datapath of mlx5 uses the layered cache subsystem for
allocating/releasing Memory Regions, only the lowest layer L3
is subject to share due to performance issues.

[1] http://patches.dpdk.org/cover/51800/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-05-03 18:45:23 +02:00
Viacheslav Ovsiienko
f4a9349da8 net/mlx5: fix probing if DevX disabled
If there is the support of DevX is exposed by rdma-core but
DevX is not supported by or disabled for the specific interface
the mlx5_devx_cmd_query_hca_attr() routine returns an error
preventing the device from successful probing. The routine
should be invoked only in case of enabled DevX.

Fixes: e2b4925ef7 ("net/mlx5: support Direct Rules E-Switch")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-03 18:45:22 +02:00
Yongseok Koh
fc0aebe187 net/mlx5: fix Direct Rules build
All the library calls must be called via the glue layer.

Fixes: b2177648b8 ("net/mlx5: add Direct Rules flow data alloc/free routines")
Fixes: 79e35d0d59 ("net/mlx5: share Direct Rules/Verbs flow related structures")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-05-03 18:45:22 +02:00
Ori Kam
34fa7c0268 net/mlx5: add drop action to Direct Verbs E-Switch
This commit adds support for drop action when creating E-Switch flow
using DV.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-04-19 14:51:55 +02:00
Ori Kam
e2b4925ef7 net/mlx5: support Direct Rules E-Switch
This commit checks the for DR E-Switch support.
The support is based on both Device and Kernel.
This commit also enables the user to manually disable this this feature.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-04-19 14:51:55 +02:00
Thomas Monjalon
5294b80029 ethdev: avoid explicit check of valid port state
Some port iterations are manually checking against RTE_ETH_DEV_UNUSED
instead of using the iterators based on rte_eth_find_next().

A new macro RTE_ETH_FOREACH_VALID_DEV() is introduced, but kept private
because there should be no need of iterating over all devices in the
API. The public iterators have additional filters for ownership, parent
device or sibling ports.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2019-04-19 14:51:55 +02:00
Yongseok Koh
120dc4a7dc net/mlx5: remove device register remap
UAR (User Access Region) register does not need to be remapped for
primary process but it should be remapped only for secondary process.
UAR register table is in the process private structure in
rte_eth_devices[],
(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private

The actual UAR table follows the data structure and the table is used
for both Tx and Rx.

For Tx, BlueFlame in UAR is used to ring the doorbell.
MLX5_TX_BFREG(txq) is defined to get a register for the txq. Processes
access its own private data to acquire the register from the UAR table.

For Rx, the doorbell in UAR is required in arming CQ event. However, it
is a known issue that the register isn't remapped for secondary process.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2019-04-12 11:02:02 +02:00
Viacheslav Ovsiienko
942d13e6e7 net/mlx5: fix sharing context destroy order
At the mlx5 device closing the shared IB context was destroyed
before cleanup routines completion. As it was found on some
setups (Netlink fails with old kernel drivers and we have to use
sysfs to retrieve interface index, this requires IB device name,
which is stored in shared context) the mlx5_nl_mac_addr_flush()
requires IB device name, and if shared context is removed it
causes the segmentation fault.

Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-12 11:02:02 +02:00
Viacheslav Ovsiienko
9c2bbd0488 net/mlx5: fix device probing for old kernel drivers
Retrieving network interface index via Netlink fails in
case of old ib_core kernel driver installed - mlx5_nl_ifindex()
routine fails due to RDMA_NLDEV_ATTR_NDEV_INDEX attribute is not
supported by the old driver.

The patch allowing to retrieve the network interface index and
name via Netlink [1]. So, the problem depends on ib_core module
version - 4.16 supports getting ifindex via Netlink, 4.15 does not.

This error was ignored in previous versions of MLX5 PMD probing
routine. For single device ifindex was retrieved via sysfs
and link control was not lost, so problem just was not noticed.
In order to support MLX5 PMD functioning over old kernel driver
this patch adds ifindex retrieving via sysfs into probing routine.
It is worth to note this method works for master/standalone
device only.

[1] https://www.spinics.net/lists/linux-rdma/msg62948.html
    Linux tree: 5b2cc79d (Leon Romanovsky 2018-03-27 20:40:49 +0300 270)

Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-12 11:02:02 +02:00
Viacheslav Ovsiienko
ae4eb7dc79 net/mlx5: fix typos in comments
Fixes: 299d7dc28c ("net/mlx5: add representor recognition on Linux 5.x")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-12 11:02:02 +02:00
Viacheslav Ovsiienko
79e35d0d59 net/mlx5: share Direct Rules/Verbs flow related structures
Direct Rules/Verbs related structures are moved to
the shared context:
  - rx/tx namespaces, shared by master and representors
  - rx/tx flow tables
  - matchers
  - encap/decap action resources
  - flow tags (MARK actions)
  - modify action resources
  - jump tables

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Viacheslav Ovsiienko
b2177648b8 net/mlx5: add Direct Rules flow data alloc/free routines
We are going to share the Direct Rules and Direct Verbs flow
device data structures between master and representors in the
E-Switch configurations over multiport IB device.

The code of initializing and destroying these data is
moved to dedicated routines, this is just a preparation
step for actual data sharing.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Ori Kam
4f84a19779 net/mlx5: add Direct Rules API
Adds calls to the Direct Rules API inside the glue functions.
Due to difference in parameters between the Direct Rules and Direct
Verbs some of the glue functions API was updated.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Thomas Monjalon
d874a4eed5 net/mlx5: use port sibling iterators
Iterating over siblings was done with RTE_ETH_FOREACH_DEV()
which skips the owned ports.
The new iterators RTE_ETH_FOREACH_DEV_SIBLING()
and RTE_ETH_FOREACH_DEV_OF() are more appropriate and more correct.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
dceb502942 net/mlx5: add control of excessive memory pinning by kernel
A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
2aac5b5d11 net/mlx5: sync stop/start with secondary process
Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
7be600c8d8 net/mlx5: rework PMD global data init
There's more need to have PMD global data structure. This should be
initialized once per a process regardless of how many PMD instances are
probed. mlx5_init_once() is called during probing and make sure all the
init functions are called once per a process. Currently, such global
data and its initialization functions are even scattered. Rather than
'extern'-ing such variables and calling such functions one by one making
sure it is called only once by checking the validity of such variables, it
will be better to have a global storage to hold such data and a
consolidated function having all the initializations. The existing shared
memory gets more extensively used for this purpose. As there could be
multiple secondary processes, a static storage (local to process) is also
added.

As the reserved virtual address for UAR remap is a PMD global resource,
this doesn't need to be stored in the device priv structure, but in the
PMD global data.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
9a8ab29b84 net/mlx5: replace IPC socket with EAL API
Socket API is used for IPC in order for secondary process to acquire
Verb command file descriptor. The FD is used to remap UAR address.
The multi-process APIs (rte_mp) in EAL are newly introduced.
mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs.

As it is PMD global infrastructure, only one IPC channel is established.
All the IPC message types may have port_id in the message if there is
need to reference a specific device.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
3ebe658059 net/mlx5: fix memory event on secondary process
As the memory event is propagated to secondary processes, the event is
processed redundantly. This should be processed once because the data
structure used for MR and the event is global across the processes.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Viacheslav Ovsiienko
53e5a82fd1 net/mlx5: update install/uninstall event handlers
We are implementing the support for multiport Infiniband device
with representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
f048f3d479 net/mlx5: switch to the shared IB device context
The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created within this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
d485cdca01 net/mlx5: switch to the shared context IB attributes
The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multiple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
1b782252cb net/mlx5: switch to the shared protection domain
The PMD code is updated to use Protected Domain from the
shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
17e19bc4dd net/mlx5: add IB shared context alloc/free functions
The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
ad74bc6195 net/mlx5: support multiport IB device during probing
mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the single shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
e505508a38 net/mlx5: modify get ifindex routine for multiport IB
There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
299d7dc28c net/mlx5: add representor recognition on Linux 5.x
The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on querying the VF number,
has been created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Shahaf Shuler
989e999d93 net/mlx5: support PCI device DMA map and unmap
The implementation reuses the external memory registration work done by
commit[1].

Note about representors:

The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices share the same PCI function, so mapping will
happen only on one of the representors and not all of them.

While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.

[1]
commit 7e43a32ee0
("net/mlx5: support externally allocated static memory")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-30 16:48:58 +01:00
Thomas Monjalon
09c9c4d23d net/mlx5: call generic strlcpy
The call to strlcpy uses either libc, libbsd or internal rte_strlcpy.
No need to call the DPDK flavor explicitly.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Viacheslav Ovsiienko
4fb27c1dfe net/mlx5: fix flow priorities probing error path
The mlx5 PMD probes the Verbs flow priorities supported with
ibv_create_flow() function. If rdma-core or kernel fails for
some reason, the returned error causes the drop queue is not
destroyed, and pd is locked by not freed resource.

Also the mlx5_flow_discover_priorities() returned negative value
as error, and this code was reported "as is", without sign
changing (eventually causing assert(err > 0)).

Fixes: 2815702bae ("net/mlx5: replace verbs priorities by flow")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Thomas Monjalon
dbeba4cf18 net/mlx: prefix private structure
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-03-01 18:17:35 +01:00
Thomas Monjalon
714bf46ebb net/mlx: support firmware version query
The API function rte_eth_dev_fw_version_get() is querying drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions are copying the same ibverbs field fw_ver
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.

It is tested with command "drvinfo" of examples/ethtool/.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-02-13 12:55:38 +01:00
Yongseok Koh
2014a7fbae net/mlx5: fix deprecated library API for Rx padding
In rdma-core library IBV_WQ_FLAG_RX_END_PADDING is renamed to
IBV_WQ_FLAGS_PCI_WRITE_END_PADDING. Way to query the capability is also
changed.

Fixes: 43e9d9794c ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-18 09:47:26 +01:00
Yongseok Koh
78c7a16daa net/mlx5: fix Rx packet padding
Rx packet padding is supposed to be set by an environment variable -
MLX5_PMD_ENABLE_PADDING, but it has been missing for some time by mistake.
Rather than using such a variable, a PMD parameter (rxq_pkt_pad_en) is
added instead.

Fixes: a1366b1a2b ("net/mlx5: add reference counter on DPDK Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-18 09:47:26 +01:00
Thomas Monjalon
72b934adce config: gather options for dlopen mlx dependency
Rename options CONFIG_RTE_LIBRTE_MLX4_DLOPEN_DEPS and
CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS to a single option
CONFIG_RTE_IBVERBS_LINK_DLOPEN.
Rename meson option enable_driver_mlx_glue to ibverbs_link.

There was no good reason for setting a different link option
for mlx4 and mlx5. Having a single common option makes it
easier to understand and unify make and meson systems.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:29 +01:00
Moti Haimovsky
f5bf91de73 net/mlx5: support flow counters using devx
This commit adds counters support when creating flows via direct
verbs. The implementation uses devx interface in order to create
query and delete the counters.
This support requires MLNX_OFED_LINUX-4.5-0.1.0.1 installation.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:29 +01:00
Wisam Jaddo
f0354d8423 net/mlx5: add ConnectX-6 device IDs
This commit includes the add of:
- ConnectX-6 device ID
- ConnectX-6 SRIOV device ID

Signed-off-by: Wisam Jaddo <wisamm@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-03 13:07:06 +01:00
Thomas Monjalon
15febafdd4 drivers/net: set close behaviour flag at probing
The ethdev flag RTE_ETH_DEV_CLOSE_REMOVE is set for drivers
having migrated to the new behaviour of rte_eth_dev_close().

As any other flag, it can be useful to know about its value
as soon as the port is probed.
Unfortunately, it was set inside the close operation,
just before being erased by memset() in rte_eth_dev_release_port().
The flag assignment is moved to the probing stage, so it can
be checked by the application in order to anticipate the behaviour.

Fixes: 42603bbdb5 ("net/mlx5: release port on close")
Fixes: 6c99085d97 ("net/vmxnet3: fix hot-unplug")
Fixes: 4d7877fde2 ("net/ena: remove resources when port is being closed")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-14 00:35:53 +01:00
Tom Barbette
26f0488344 net/mlx5: support Rx queue count API
This patch adds support for the rx_queue_count API in mlx5 driver

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Yongseok Koh
09d8b41699 net/mlx5: make vectorized Tx threshold configurable
Add txqs_max_vec parameter to configure the maximum number of Tx queues to
enable vectorized Tx. And its default value is set according to the
architecture and device type.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Yongseok Koh
f87bfa8eae net/mlx5: move device spawn configuration to probing
When a device is spawned, it does make more sense that the configuration
parameters are passed by callee. Furthermore, setting default value for
some configuration would need PCIe device ID which can be found in the
probe function.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Yongseok Koh
bc91e8db12 net/mlx5: add 128B padding of Rx completion entry
A PMD parameter (rxq_cqe_pad_en) is added to enable 128B padding of CQE on
RX side. The size of CQE is aligned with the size of a cacheline of the
core. If cacheline size is 128B, the CQE size is configured to be 128B even
though the device writes only 64B data on the cacheline. This is to avoid
unnecessary cache invalidation by device's two consecutive writes on to one
cacheline. However in some architecture, it is more beneficial to update
entire cacheline with padding the rest 64B rather than striding because
read-modify-write could drop performance a lot. On the other hand, writing
extra data will consume more PCIe bandwidth and could also drop the maximum
throughput. It is recommended to empirically set this parameter. Disabled
by default.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-11-05 15:01:25 +01:00
Ophir Munk
3a8207423a net/mlx5: close all ports on remove
With the introduction of representors several eth devices are using
the same rte device (e.g. a PCI bus). When calling port detach on one
eth device it is required that all eth devices belonging to the
same rte device have been closed in advance, then the rte device
itself can be removed/detached.
This commit implements this requirement implicitly by adding a
remove callback to struct rte_pci_driver.
The new behavior can be demonstrated in testpmd.
First we attach a representor 0 using PCI address 0000:08:00.0
testpmd> port attach  0000:08:00.0,representor=[0]
Attaching a new port...
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1013 net_mlx5
Port 0 is attached.
Done
Port 1 is attached.
Done

Port 0 is the master device (PF) - an ethdev of the PCI address.
Port 1 is representor 0 - another ethdev (representing a VF) using the
same PCI address. Next we detach port 1
testpmd> port detach 1
Removing a device...
Port 0 is closed
Port 1 is closed
Now total ports is 0
Done

Since port 0 has been implicitly closed we cannot act on it anymore.
testpmd> port stop 0
Invalid port 0

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-10-26 22:14:06 +02:00
Ophir Munk
42603bbdb5 net/mlx5: release port on close
With the introduction of representors several eth devices are using
the same rte device (e.g. a PCI bus). It is therefore required to
release the eth device resources during an eth device close operation
rather than during an rte device removal (detach) operation.
In current version many PMDs are still releasing the eth device as
part of the rte device removal. In order to allow a smooth transition
for all PMDs to behave correctly an ethdev flag RTE_ETH_DEV_CLOSE_REMOVE
is used. When this flag is set it indicates to rte_eth_dev_close() to
call rte_eth_dev_release_port(), so the port is freed during the close
operation.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-10-26 22:14:06 +02:00
Ophir Munk
206254b7dc net/mlx5: allow multiple probing for representor
Implement probing of a rte device multiple times, see [1].
Set PCI driver RTE_PCI_DRV_PROBE_AGAIN flag to enable multiple probing
of the PCI device by the PCI common driver.
Consecutive probing requests with a devargs string may contain
repetitive master and representors devices for which eth device should
be created only once. In case an eth device already exists - silently
ignore it.

[1]
commit e9d159c3d5 ("eal: allow probing a device again")

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-10-26 22:14:06 +02:00
Yongseok Koh
58b1312e9d net/mlx5: add warning message for Direct Verbs flow
In case that the library doesn't support DV flow, if enabled by
'dv_flow_en=1', print out a warning message and disable it.

Fixes: 51e72d386c ("net/mlx5: add runtime parameter to enable Direct Verbs")

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
2018-10-26 22:14:06 +02:00
Viacheslav Ovsiienko
2dd8b72167 net/mlx5: simplify flow counters support check
The redundant check of Flow counters support in runtime is removed.
The flag flow_counter_en is eliminated from the code. The Verbs
create counter function just returns an error if no counter
support presented in the system.

If there is no any of Flow counters configuration macro defined
the log message is emited, indicating the missing counter support.

mlx5_flow_validate_action_count() fuctnion is also updated due to
flow_counter_en flag removal.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2018-10-26 22:14:06 +02:00
Viacheslav Ovsiienko
0d8c6e6ba0 net/mlx5: rename flow counter configuration macro
The HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT is replaced with
HAVE_IBV_DEVICE_COUNTERS_SET_V42. At this stage it is just
macro renaming. This macro is defined if system supports
the "old" Flow counters functionality, MLNX_OFED version
from 4.2 to 4.4 is required.

We need to do this preparation before introducing the new
configuration macro (HAVE_IBV_DEVICE_COUNTERS_SET_V45) for
the "new" Flow counters support.

Both makefile and meson.build are changed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2018-10-26 22:14:06 +02:00
Thomas Monjalon
a7d3c6271d ethdev: support representor id as iterator filter
The representor id is added in rte_eth_dev_data in order to be able
to match a port with its representor id in devargs.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-26 22:14:06 +02:00
Moti Haimovsky
d53180afe3 net/mlx5: refactor TC-flow infrastructure
This commit refactors tc_flow as a preparation to coming commits
that sends different type of messages and expect differ type of replies
while still using the same underlying routines.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
2018-10-26 22:14:05 +02:00
Thomas Monjalon
e16adf08e5 ethdev: free all common data when releasing port
This is a clean-up of common ethdev data freeing.
All data freeing are moved to rte_eth_dev_release_port()
and done only in case of primary process.

It is probably fixing some memory leaks for PMDs which were
not freeing all data.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-26 22:14:05 +02:00
Thomas Monjalon
391797f042 drivers/bus: move driver assignment to end of probing
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")

However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
2018-10-17 10:26:59 +02:00
Yongseok Koh
57123c00c1 net/mlx5: add Linux TC flower driver for E-Switch flow
Flows having 'transfer' attribute have to be inserted to E-Switch on the
NIC and the control path uses Linux TC flower interface via Netlink
socket.
This patch adds the flow driver on top of the new flow engine.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Yongseok Koh
40c9ccf9e9 net/mlx5: remove Netlink flow driver
Netlink based E-Switch flow engine will be migrated to the new flow
engine.
nl_flow will be renamed to flow_tcf as it goes through Linux TC flower
interface.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Yongseok Koh
0c76d1c9a1 net/mlx5: add abstraction for multiple flow drivers
Flow engine has to support multiple driver paths. Verbs/DV for NIC flow
steering and Linux TC flower for E-Switch flow steering. In the future,
another flow driver could be added (devX).

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
51e72d386c net/mlx5: add runtime parameter to enable Direct Verbs
DV flow API is based on new kernel API and is
missing some functionality like counter but add other functionality
like encap.

In order not to affect current users even if the kernel supports
the new DV API it should be enabled only manually.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Ori Kam
84c406e745 net/mlx5: add flow translate function
This commit modify the conversion of the input parameters into Verbs
spec, in order to support all previous changes.

Some of those changes are:
removing the use of the parser,
storing each flow in its own flow structure.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 18:53:49 +02:00
Anatoly Burakov
5282bb1c36 mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 10:24:29 +02:00
Ori Kam
c322c0e558 net/mlx5: add bluefield VF support
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-09-28 01:41:01 +02:00
Shahaf Shuler
f9de87187b net/mlx5: disable ConnectX-4 Lx Multi Packet Send by default
On ConnectX-4 Lx the Multi Packet Send (MPW) feature is considered
un-secure, as on some cases were the application provides incorrect mbufs
on the Tx burst the host or NIC can get stuck.

Hence, disabling the feature by default for this specific NIC.
Users can still enable this feature and enjoy the performance gain
(mostly for low number of cores) by using the txq_mpw_en devarg.

This patch will impact the out of the box performance of some application
using ConnectX-4 Lx for the sack of security and robustness.

Since we need different defaults based on the underlying device the mpw
field in the configuration struct was extended to contain also the
MLX5_ARG_UNSET option.

Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-28 15:27:39 +02:00
Yongseok Koh
2547ee7458 net/mlx5: preserve allmulticast flag for flow isolation mode
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling allmulti
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as
rte_eth_allmulticast_enable/disable() fail to set the flag
(dev->data->all_multicast). The flag is used when starting traffic by
mlx5_traffic_enable(). When switching out of flow isolation mode, allmulti
mode will not be set even though it has been enabled.

Fixes: 0887aa7f27 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-05 08:47:41 +02:00
Yongseok Koh
24b068ad71 net/mlx5: preserve promiscuous flag for flow isolation mode
mlx5_dev_ops_isolate doesn't have APIs for enabling/disabling promiscuous
mode as it can't be enabled in flow isolation mode. If the function
pointers are null, librte APIs such as rte_eth_promiscuous_enable/disable()
fail to set the flag (dev->data->promiscuous). The flag is used when
starting traffic by mlx5_traffic_enable(). When switching out of flow
isolation mode, promiscuous mode will not be set even though it has been
enabled.

Fixes: 0887aa7f27 ("net/mlx5: add new operations for isolated mode")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2018-08-05 08:47:40 +02:00
Nelio Laranjeiro
5366074b01 net/mlx5: fix route Netlink message overflow
Route Netlink message socket is wrongly initialized by registering to
the route link group.  This causes the socket to receive all link
message related to routes whereas the PMD do not expect to receive such
information.  In some situation it ends by filling the socket at a point
that any new message cannot be exchanged.
As the PMD is not expected to process such broadcast messages, the
parameter in the nl_group in the function is also remove.

Fixes: ccdcba53a3 ("net/mlx5: use Netlink to add/remove MAC addresses")
Cc: stable@dpdk.org

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
2018-07-26 14:05:52 +02:00
Nelio Laranjeiro
f872b4b99d net/mlx5: fix representors detection
On systems where the required Netlink commands are not supported but
Mellanox OFED is installed, representors information must be retrieved
through sysfs.

Fixes: 26c08b979d ("net/mlx5: add port representor awareness")

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2018-07-26 14:05:52 +02:00
Adrien Mazarguil
20b71e92ef net/mlx5: lay groundwork for switch offloads
With mlx5, unlike normal flow rules implemented through Verbs for traffic
emitted and received by the application, those targeting different logical
ports of the device (VF representors for instance) are offloaded at the
switch level and must be configured through Netlink (TC interface).

This patch adds preliminary support to manage such flow rules through the
flow API (rte_flow).

Instead of rewriting tons of Netlink helpers and as previously suggested by
Stephen [1], this patch introduces a new dependency to libmnl [2]
(LGPL-2.1) when compiling mlx5.

[1] https://mails.dpdk.org/archives/dev/2018-March/092676.html
[2] https://netfilter.org/projects/libmnl/

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-26 14:05:52 +02:00
Moti Haimovsky
6bf10ab69b net/mlx5: support 32-bit systems
This patch adds support for building and running mlx5 PMD on
32bit systems such as i686.

The main issue to tackle was handling the 32bit access to the UAR
as quoted from the mlx5 PRM:
QP and CQ DoorBells require 64-bit writes. For best performance, it
is recommended to execute the QP/CQ DoorBell as a single 64-bit write
operation. For platforms that do not support 64 bit writes, it is
possible to issue the 64 bits DoorBells through two consecutive
writes,
each write 32 bits, as described below:
* The order of writing each of the Dwords is from lower to upper
  addresses.
* No other DoorBell can be rung (or even start ringing) in the midst
 of an on-going write of a DoorBell over a given UAR page.

The last rule implies that in a multi-threaded environment, the access
to a UAR page (which can be accessible by all threads in the process)
must be synchronized (for example, using a semaphore) unless an atomic
write of 64 bits in a single bus operation is guaranteed. Such a
synchronization is not required for when ringing DoorBells on different
UAR pages.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 14:34:59 +02:00
Nelio Laranjeiro
af689f1f04 net/mlx5: support flow Ethernet item along with drop action
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-07-12 12:10:01 +02:00