This patch implements ethdev operations get_module_info and
get_module_eeprom, to support ethtool commands ETHTOOL_GMODULEINFO
and ETHTOOL_GMODULEEEPROM.
New functions mlx5_get_module_info() and mlx5_get_module_eeprom()
added in mlx5_ethdev.c.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
This commit adds support for RTE_FLOW_ACTION_TYPE_OF_POP_VLAN via
direct verbs flow rules.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Restrict this header inclusion to its real users.
Fixes: 028669bc9f ("eal: hide shared memory config")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This adds support for adding a new UDP tunnel port
on a specific VXLAN types.
Currently we only support VXLAN, VXLAN-GPE on ports
4789, 4790 respectively. Without having to configure
anything in the NIC.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The shared Infiniband device context should be included
into memory event callback list only once on context creation,
and removed from the list only once on context destroying.
Multiple insertions of the same object caused the infinite
loop on the list processing.
Fixes: ccb3815346 ("net/mlx5: update memory event callback for shared context")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The patch [Fixes] sets the default value of required minimal
inline data to 0 bytes. On some configurations (depends
on switchdev/legacy settings and FW version/settings)
the ConnectX-4LX NIC requires minimal 18 bytes of
Tx descriptor inline data to operate correctly.
Wrongly set to 0 default value may prevent NIC from operating
with out-of-the-box settings, this patch reverts default
value for ConnectX-4LX back to 18 bytes (inline L2).
Fixes: 9f350504bb ("net/mlx5: fix ConnectX-4LX minimal inline data limit")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
The function mlx5_set_min_inline() includes a switch() that checks
various PCI device IDs in order to set the txq_inline_min value. No
value is set when the PCI device ID matches the ConnectX-5 adapters,
resulting in an assert() failure later in the function
mlx5_set_txlimit_params().
This error was encountered on an IBM Power 9 system running RHEL 7.6
w/o Mellanox OFED installed.
Fixes: 38b4b397a5 ("net/mlx5: add Tx configuration and setup")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
On some virtual setups (particularly on ESXi) when we have SR-IOV and
E-Switch enabled there is the problem to receive VLAN traffic on VF
interfaces. The NIC driver in ESXi hypervisor does not setup E-Switch
vport setting correctly and VLAN traffic targeted to VF is dropped.
The patch provides the temporary workaround - if the rule
containing the VLAN pattern is being installed for VF the VLAN
network interface over VF is created, like the command does:
ip link add link vf.if name mlx5.wa.1.100 type vlan id 100
The PMD in DPDK maintains the database of created VLAN interfaces
for each existing VF and requested VLAN tags. When all of the RTE
Flows using the given VLAN tag are removed the created VLAN interface
with this VLAN tag is deleted.
The name of created VLAN interface follows the format:
evmlx.d1.d2, where d1 is VF interface ifindex, d2 - VLAN ifindex
Implementation limitations:
- mask in rules is ignored, rule must specify VLAN tags exactly,
no wildcards (which are implemented by the masks) are allowed
- virtual environment is detected via rte_hypervisor() call,
and the type of hypervisor is checked. Currently we engage
the workaround for ESXi and unrecognized hypervisors (which
always happen on platforms other than x86 - it means workaround
applied for the Flow over PCI VF). There are no confirmed data
the other hypervisors (HyperV, Qemu) need this workaround,
we are trying to reduce the list of configurations on those
workaround should be applied.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Mellanox ConnectX-4LX NIC in configurations with disabled
E-Switch can operate without minimal required inline data
into Tx descriptor. There was the hardcoded limit set to
18B in PMD, fixed to be no limit (0B).
Fixes: 38b4b397a5 ("net/mlx5: add Tx configuration and setup")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
LRO support was only for MPRQ, hence mprq Rx burst was selected when
LRO was configured in the port.
The current support for MPRQ is suffering from bad memory utilization
since an external mempool is allocated by the PMD for the packets data
in addition to the user mempool, besides that, the user may get packet
data addresses which were not configured by him.
Even though MPRQ has the best performance for packet receiving in the
most cases and because of the above facts it is better to remove the
automatic MPRQ select when LRO is configured.
Move MPRQ to be selected only when the user force it by the PMD
arguments including LRO case.
Allow LRO offload using the regular RQ with the regular Rx burst
function.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Function mlx5_rxq_release() calls mlx5_release_dbr() to release the
doorbell allocated for this Rx queue.
This call is relevant only for Rx queue objects created using
DevX API.
This patch adds the required check, to call mlx5_release_dbr()
only when relevant.
It also updates mlx5_release_dbr() to use the input offset correctly.
Fixes: dc9ceff73c ("net/mlx5: create advanced RxQ via DevX")
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Function mlx5_alloc_shared_ibctx() allocates Protection Domain using
verbs API, as part of shared IB device context.
This patch adds reading and storing of pdn value from the created PD
object, using DV API.
The pdn value is required when creating WQ using DevX API.
This patch also updates function flow_dv_create_counter_stat_mem_mng()
which uses the pdn value as well.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Prepare for introducing use of DevX TIR object.
Hash Rx queue is currently created using verbs QP only.
The next patches will add the option to create it with a TIR object
using DevX.
This patch renames hrxq_ibv to hrxq wherever relevant, and adds
the DevX items to relevant structs.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Prepare for introducing of DevX RQT object.
Rx indirection table object is currently created using verbs only.
The next patches will add the option to create an RQT object using
DevX.
This patch renames ind_table_ibv to ind_table_obj wherever relevant,
and adds the DevX items to relevant structs.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Prepare for introducing of DevX RxQ object.
RxQ object is currently created using verbs only.
The next patches will add the option to create RxQ object using DevX.
This patch renames rxq_ibv to rxq_obj wherever relevant, and adds the
DevX items to relevant structs.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When using DevX API, memory for door-bell records should be allocated
by PMD and registered using DevX API.
This patch implements the utility functions to support it:
- Add struct mlx5_devx_dbr_page, containing door-bells page data.
- Add list of struct mlx5_devx_dbr_page door-bell pages to device
private data.
- Implement function mlx5_alloc_dbr_page() to allocate page for
door-bell records, and register it using DevX API.
- Implement function mlx5_get_dbr(). to acquire a door-bell record
from the door-bells page, allocating a new page if needed.
- Implement function mlx5_release_dbr() to release a door-bell
record that is no longer needed, freeing the containing page if
it becomes empty.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Use DevX API to read device LRO capabilities.
Check if LRO is supported and can be enabled.
Check if MPRQ is supported and can be used.
Enable MPRQ for LRO use if not enabled by user.
Added note for mlx5_mprq_enabled(), to emphasize that LRO
enables MPRQ.
Disable CQE compression and CRC stripping if LRO is enabled.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Add compile option HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR, and matching
dest_tir flag in device configuration structure.
Add glue function pointer dv_create_flow_action_dest_devx_tir, and
function mlx5_glue_dv_create_flow_action_dest_devx_tir(),
to invoke API mlx5dv_dr_action_create_dest_devx_tir();
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Add command-line argument to set LRO session timeout.
Add LRO settings struct in PMD configuration struct.
Add support of LRO offload in port configuration.
Add macros and function to check if LRO is supported and enabled.
Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
The associated device index is retrieved via Netlink request to
underlying Infiniband device driver. This network device index
is permanent throughout the lifetime of device. We do not
spawn the rte_eth_dev ports without associated network device, and
if network device is being unbound we get the remove notification
message and rte_eth_dev port is also detached. So, we may store
the ifindex in mlx5_device_spawn() routine at rte_eth_dev port
creation and initialization time and use the cached value further
instead of doing actual Netlink request.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch updates the Tx datapath control and configuration
structures and code for managing Tx datapath settings.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch introduces new mlx5 PMD devarg options:
- txq_inline_min - specifies minimal amount of data to be inlined into
WQE during Tx operations. NICs may require this minimal data amount
to operate correctly. The exact value may depend on NIC operation
mode, requested offloads, etc.
- txq_inline_max - specifies the maximal packet length to be completely
inlined into WQE Ethernet Segment for ordinary SEND method. If packet
is larger the specified value, the packet data won't be copied by the
driver at all, data buffer is addressed with a pointer. If packet
length is less or equal all packet data will be copied into WQE.
- txq_inline_mpw - specifies the maximal packet length to be completely
inlined into WQE for Enhanced MPW method.
Driver documentation is also updated.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes the existing Tx datapath code
as preparation step before introducing the new
implementation. The following entities are being
removed:
- deprecated devargs support
- tx_burst() routines
- related PRM definitions
- SQ configuration code
- Tx routine selection code
- incompatible Tx completion code
The following devargs are deprecated and ignored:
- "txq_inline" is going to be converted to "txq_inline_max"
for compatibility issue
- "tx_vec_en"
- "txqs_max_vec"
- "txq_mpw_hdr_dseg_en"
- "txq_max_inline_len" is going to be converted
to "txq_inline_mpw" for compatibility issue
The deprecated devarg keys are recognized by PMD
and ignored/converted to the new ones in order not
to block device probing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
In case the asynchronous devx commands are not supported in RDMA core
fallback to use a basic counter management.
Here, the PMD counters cashe is redundant and the host thread doesn't
update it. hence, each counter operation will go to the FW and the
acceleration reduces.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
All the DV counters are cashed in the PMD memory and are contained in
pools which are contained in containers according to the counters
allocation type - batch or single.
Currently, the flow counter query is done synchronously in pool
resolution means that on the user request a FW command is triggered to
read all the counters in the pool.
A new feature of devX to asynchronously read batch of flow counters
allows to accelerate the user query operation.
Using the DPDK host thread, the PMD periodically triggers asynchronous
query in pool resolution for all the counter pools and an interrupt is
triggered by the FW when the values are updated.
In the interrupt handler the pool counter values raw data is replaced
using a double buffer algorithm (very fast).
In the user query, the PMD just returns the last query values from the
PMD cache - no system-calls and FW commands are triggered from the user
control thread on query operation!
More synchronization is added with the host thread:
Container resize uses double buffer algorithm.
Pools growing in container uses atomic operation.
Pool query buffer replace uses a spinlock.
Pool minimum devX counter ID uses atomic operation.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The DevX interface exposes a new feature to the PMD that can allocate a
batch of counters by one FW command. It can improve the flow
transaction rate (with count action).
Add a new counter pools mechanism to manage HW counters in the PMD.
So, for each flow with counter creation the PMD will try to find a free
counter in the PMD pools container and only if there is no a free
counter, it will allocate a new DevX batch counters.
Currently we cannot support batch counter for a group 0 flow, so
create a 2 container types, one which allocates counters one by
one and one which allocates X counters by the batch feature.
The allocated counters objects are never released back to the HW
assuming the flows maximum number will be close to the actual value of
the flows number.
Later, it can be updated, and dynamic release mechanism can be added.
The counters are contained in pools, each pool with 512 counters.
The pools are contained in counter containers according to the
allocation resolution type - single or batch.
The cache memory of the counters statistics is saved as raw data per
pool.
All the raw data memory is allocated for all the container in one
memory allocation and is managed by counter_stats_mem_mng structure
which registers all the raw memory to the HW.
Each pool points to one raw data structure.
The query operation is in pool resolution which updates all the pool
counter raw data by one operation.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
dpdk processes to run as non root (which do not have access to physical
addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.
We can check that a driver requirement wrt to IOVA mode is fulfilled
before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This commit removes the support of configuring the device E-switch
using TCF since it is now possible to configure it via DR (direct
verbs rules), and by that to also remove the PMD dependency in libmnl.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.
This behavior doesn't make sense because the user intention is to force
some device parameters and expects to get an error in case of
problematic issues with the arguments.
Stop probing and report an error in case of problematic command line
arguments.
Fixes: e72dd09b61 ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Add a global function in the PMD which dumps debug information to
specific file.
The data can be printed in hexadecimal format or as regular string.
The number of debug files per PMD entity should be limited by a new PMD
probe parameter called max_dump_files_num.
The files will be created in the /var/log directory or in the current
directory.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When device is being closed and tries to unregister interrupt callback,
there is a chance the handler is still active (called in context of
eal_intr_thread_main thread). If so the rte_intr_callback_unregister
returns -EAGAIN and keeps the handler registered, causing crash when
underlaying resourse is gone away.
This race condition may happen if event handling in application takes
a long time. We should check the return code of unregistering routine
and try again to unregister the handler. The diagnostic messages are
shown once a second, while trying to unregister.
Fixes: 028b2a28c3 ("net/mlx5: update event handler for multiport IB devices")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Implements support for read_clock for the mlx5 driver. mlx5 supports
hardware timestamp offload, setting packets timestamp field to the
device clock. rte_eth_read_clock allows to read the device's current
clock value and therefore compare values on similar time base.
See rxtx_callbacks for an example.
Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.
For secondary processes, IPC will be crucial, so leave all
of the error handling as is.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add 'RTE_' prefix to defines:
- rename ETHER_ADDR_LEN as RTE_ETHER_ADDR_LEN.
- rename ETHER_TYPE_LEN as RTE_ETHER_TYPE_LEN.
- rename ETHER_CRC_LEN as RTE_ETHER_CRC_LEN.
- rename ETHER_HDR_LEN as RTE_ETHER_HDR_LEN.
- rename ETHER_MIN_LEN as RTE_ETHER_MIN_LEN.
- rename ETHER_MAX_LEN as RTE_ETHER_MAX_LEN.
- rename ETHER_MTU as RTE_ETHER_MTU.
- rename ETHER_MAX_VLAN_FRAME_LEN as RTE_ETHER_MAX_VLAN_FRAME_LEN.
- rename ETHER_MAX_VLAN_ID as RTE_ETHER_MAX_VLAN_ID.
- rename ETHER_MAX_JUMBO_FRAME_LEN as RTE_ETHER_MAX_JUMBO_FRAME_LEN.
- rename ETHER_MIN_MTU as RTE_ETHER_MIN_MTU.
- rename ETHER_LOCAL_ADMIN_ADDR as RTE_ETHER_LOCAL_ADMIN_ADDR.
- rename ETHER_GROUP_ADDR as RTE_ETHER_GROUP_ADDR.
- rename ETHER_TYPE_IPv4 as RTE_ETHER_TYPE_IPv4.
- rename ETHER_TYPE_IPv6 as RTE_ETHER_TYPE_IPv6.
- rename ETHER_TYPE_ARP as RTE_ETHER_TYPE_ARP.
- rename ETHER_TYPE_VLAN as RTE_ETHER_TYPE_VLAN.
- rename ETHER_TYPE_RARP as RTE_ETHER_TYPE_RARP.
- rename ETHER_TYPE_QINQ as RTE_ETHER_TYPE_QINQ.
- rename ETHER_TYPE_ETAG as RTE_ETHER_TYPE_ETAG.
- rename ETHER_TYPE_1588 as RTE_ETHER_TYPE_1588.
- rename ETHER_TYPE_SLOW as RTE_ETHER_TYPE_SLOW.
- rename ETHER_TYPE_TEB as RTE_ETHER_TYPE_TEB.
- rename ETHER_TYPE_LLDP as RTE_ETHER_TYPE_LLDP.
- rename ETHER_TYPE_MPLS as RTE_ETHER_TYPE_MPLS.
- rename ETHER_TYPE_MPLSM as RTE_ETHER_TYPE_MPLSM.
- rename ETHER_VXLAN_HLEN as RTE_ETHER_VXLAN_HLEN.
- rename ETHER_ADDR_FMT_SIZE as RTE_ETHER_ADDR_FMT_SIZE.
- rename VXLAN_GPE_TYPE_IPV4 as RTE_VXLAN_GPE_TYPE_IPV4.
- rename VXLAN_GPE_TYPE_IPV6 as RTE_VXLAN_GPE_TYPE_IPV6.
- rename VXLAN_GPE_TYPE_ETH as RTE_VXLAN_GPE_TYPE_ETH.
- rename VXLAN_GPE_TYPE_NSH as RTE_VXLAN_GPE_TYPE_NSH.
- rename VXLAN_GPE_TYPE_MPLS as RTE_VXLAN_GPE_TYPE_MPLS.
- rename VXLAN_GPE_TYPE_GBP as RTE_VXLAN_GPE_TYPE_GBP.
- rename VXLAN_GPE_TYPE_VBNG as RTE_VXLAN_GPE_TYPE_VBNG.
- rename ETHER_VXLAN_GPE_HLEN as RTE_ETHER_VXLAN_GPE_HLEN.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add 'rte_' prefix to structures:
- rename struct ether_addr as struct rte_ether_addr.
- rename struct ether_hdr as struct rte_ether_hdr.
- rename struct vlan_hdr as struct rte_vlan_hdr.
- rename struct vxlan_hdr as struct rte_vxlan_hdr.
- rename struct vxlan_gpe_hdr as struct rte_vxlan_gpe_hdr.
Do not update the command line library to avoid adding a dependency to
librte_net.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The RDMA-CORE Direct Rules API was changed in latest upstream code
This commit update the API accordingly.
Fixes: 4f84a19779 ("net/mlx5: add Direct Rules API")
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Mellanox mlx5 PMD implements the list of devices to process the memory
free events to reflect the actual memory state to Memory Regions.
Because this list contains the devices and devices may share the
same context the callback routine may be called multiple times
with the same parameter, that is not optimal. This patch modifies
the list to contain the device contexts instead of device objects
and shared context is included in the list only once.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The multiport Infiniband device support was introduced [1].
All active ports, belonging to the same Infiniband device use the single
shared Infiniband context of that device and share the resources:
- QPs are created within shared context
- Verbs flows are also created with specifying port index
- DV/DR resources
- Protection Domain
- Event Handlers
This patchset adds support for Memory Regions sharing between
ports, created on the base of multiport Infiniband device.
The datapath of mlx5 uses the layered cache subsystem for
allocating/releasing Memory Regions, only the lowest layer L3
is subject to share due to performance issues.
[1] http://patches.dpdk.org/cover/51800/
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
If there is the support of DevX is exposed by rdma-core but
DevX is not supported by or disabled for the specific interface
the mlx5_devx_cmd_query_hca_attr() routine returns an error
preventing the device from successful probing. The routine
should be invoked only in case of enabled DevX.
Fixes: e2b4925ef7 ("net/mlx5: support Direct Rules E-Switch")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
All the library calls must be called via the glue layer.
Fixes: b2177648b8 ("net/mlx5: add Direct Rules flow data alloc/free routines")
Fixes: 79e35d0d59 ("net/mlx5: share Direct Rules/Verbs flow related structures")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This commit adds support for drop action when creating E-Switch flow
using DV.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
This commit checks the for DR E-Switch support.
The support is based on both Device and Kernel.
This commit also enables the user to manually disable this this feature.
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Some port iterations are manually checking against RTE_ETH_DEV_UNUSED
instead of using the iterators based on rte_eth_find_next().
A new macro RTE_ETH_FOREACH_VALID_DEV() is introduced, but kept private
because there should be no need of iterating over all devices in the
API. The public iterators have additional filters for ownership, parent
device or sibling ports.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
UAR (User Access Region) register does not need to be remapped for
primary process but it should be remapped only for secondary process.
UAR register table is in the process private structure in
rte_eth_devices[],
(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private
The actual UAR table follows the data structure and the table is used
for both Tx and Rx.
For Tx, BlueFlame in UAR is used to ring the doorbell.
MLX5_TX_BFREG(txq) is defined to get a register for the txq. Processes
access its own private data to acquire the register from the UAR table.
For Rx, the doorbell in UAR is required in arming CQ event. However, it
is a known issue that the register isn't remapped for secondary process.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
At the mlx5 device closing the shared IB context was destroyed
before cleanup routines completion. As it was found on some
setups (Netlink fails with old kernel drivers and we have to use
sysfs to retrieve interface index, this requires IB device name,
which is stored in shared context) the mlx5_nl_mac_addr_flush()
requires IB device name, and if shared context is removed it
causes the segmentation fault.
Fixes: 17e19bc4dd ("net/mlx5: add IB shared context alloc/free functions")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Retrieving network interface index via Netlink fails in
case of old ib_core kernel driver installed - mlx5_nl_ifindex()
routine fails due to RDMA_NLDEV_ATTR_NDEV_INDEX attribute is not
supported by the old driver.
The patch allowing to retrieve the network interface index and
name via Netlink [1]. So, the problem depends on ib_core module
version - 4.16 supports getting ifindex via Netlink, 4.15 does not.
This error was ignored in previous versions of MLX5 PMD probing
routine. For single device ifindex was retrieved via sysfs
and link control was not lost, so problem just was not noticed.
In order to support MLX5 PMD functioning over old kernel driver
this patch adds ifindex retrieving via sysfs into probing routine.
It is worth to note this method works for master/standalone
device only.
[1] https://www.spinics.net/lists/linux-rdma/msg62948.html
Linux tree: 5b2cc79d (Leon Romanovsky 2018-03-27 20:40:49 +0300 270)
Fixes: ad74bc6195 ("net/mlx5: support multiport IB device during probing")
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
We are going to share the Direct Rules and Direct Verbs flow
device data structures between master and representors in the
E-Switch configurations over multiport IB device.
The code of initializing and destroying these data is
moved to dedicated routines, this is just a preparation
step for actual data sharing.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>