While at it only output driver version to dmesg(8) when hardware is present.
Differential Revision: https://reviews.freebsd.org/D29100
MFC after: 1 week
Reviewed by: kib and markj
Sponsored by: NVIDIA Networking
Overview:
This is the first stage of a RDMA stack upgrade introducing kernel
changes only based on Linux 5.7-rc1.
This patch is based on about four main areas of work:
- Update of the IB uobjects system:
- The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
is now managed by ibcore. This also require some changes in the
kernel verbs API. The updated verbs changes are typically about
initialize and deinitialize objects, and remove allocation and
free of memory.
- Update of the uverbs IOCTL framework:
- The parsing and handling of user-space commands has been
completely refactored to integrate with the updated IB uobjects
system.
- Various changes and updates to the generic uverbs interfaces in
device drivers including the new uAPI surface.
- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.
Dependencies:
- The mlx4ib driver code has been updated with the minimum changes
needed.
- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.
Compatibility:
- All user-space facing APIs are backwards compatible after this
change.
- All kernel-space facing RDMA APIs are backwards compatible after
this change, with exception of ib_create_ah() and ib_destroy_ah()
which takes a new flag.
- The "ib_device_ops" structure exist, but only contains the driver ID
and some structure sizes.
Differences from Linux:
- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
the sizes needed for allocating various IB objects, when adding
IB device instances.
Security:
- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.
Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429
MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31149
Sponsored by: NVIDIA Networking
Properly allocate all mlx5en(4) structures from correct numa domain.
While at it cleanup unused numa domain integers deriving from the
Linux version of mlx5en(4).
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make sure the "uid" field gets properly set when destroying DCT and QP
objects by making a copy of the field when creating such objects.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
To avoid congestion on the same PCI memory register space when
traffic consists mostly of small packets.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The driver expects all TLS tags to be returned to the driver before
it can free the UMA zone where the TLS tags reside.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16() validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.
Linux commit:
65389322b28f81cc137b60a41044c2d958a7b950
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
RoCE is short for Remote direct memory access over Converged Ethernet.
ECN is short for Explicit Congestion Notification.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Querying the PCI config space for offline for every firmware command blocks
the PCI bus and affects performance. Especially for packet pacing and TLS
when objects are frequently created and destroyed.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
These fields declare which timestamp mode is supported
by the device per RQ/SQ/QP.
In addition add the ts_format field to the select the mode
for RQ/SQ/QP.
Linux commit:
a6a217dddcd544f6b75f0e2a60b6e84c1d494b7e
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state makes the
checks of out-of-scope redundant. Let's remove them together with updating
function signature to return boolean result.
While at it remove unused "ll" parameter from ib_modify_qp_is_ok().
Linux commit:
19b1f54099b6ee334acbfbcfbdffd1d1f057216d
d31131bba5a1630304c55ea775c48cc84912ab59
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.
Linux commit:
72a7720fca37fec0daf295923f17ac5d88a613e1
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Extended atomics are supported with RC and XRC QP types, but Linux commit
a60109dc9a95 added an unneeded check to to_mlx5_access_flags().
This broke XRC QPs.
The following ib_atomic_bw invocation over XRC reproduces the issue:
ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD
It is safe to remove such checks because the QP type was already checked
in ib_modify_qp_is_ok(), which was previously called from
mlx5_ib_modify_qp().
Linux commit:
13f8d9c16693afb908ead3d2a758adbe6a79eccd
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The maximum page size in the mkey context is 2GB.
Until today, we didn't enforce this requirement in the code, and therefore,
if we got a page size larger than 2GB, we have passed zeros in the
log_page_shift instead of the actual value and the registration failed.
This patch limits the driver to use compound pages of 2GB for mkeys.
Linux commit:
762f899ae7875554284af92b821be8c083227092
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:
First issues is related to alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.
This wasn't actually a bug because it was supported by all the
existing mlx5 compatible device, but we don't want to require
this support in all future devices.
Another issue is because the inner loop didn't advance PFN so
the test "if (base + p != pfn)" always failed for SGE with
len > (1<<page_shift).
Linux commit:
d67bc5d4e3e100d762c0f57ea67f28bc219698a6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
- Upon error more completion events than requested may be generated,
particularly when using the completion event factor feature.
- Count number of event errors in the transmit path.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.
Linux commit:
ca5b91d63192ceaa41a6145f8c923debb64c71fa
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make the mlx5e_mode_table[] array one dimensional, because there is only
one entry, 10G ER/LR, which share the same protocol bit.
This patch only adds support for basic sub-type distinguishing for the
extended protocol bits. Use verbose ifconfig eeprom output to get actual
media type.
Remove write only "connector_type" variable while at it.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
This code practically has not sleeping points, so Giant is locked for very
long time.
Noted and reviewed by: hselasky
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Import Linux commit 534b1204ca4694db1093b15cf3e79a99fcb6a6da
Add reserved mapping to cover all the register in order to avoid setting
arbitrary values to newer FW which implements the reserved fields.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
Import Linux commit ce28f0fd670ddffcd564ce7119bdefbaf08f02d3:
Add reserved mapping to cover all the register in order to avoid
setting arbitrary values to newer FW which implements the reserved
fields.
Taken from: https://patches.linaro.org/patch/417255/
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
In particular, avoid creating TIR or installing flow rules for VXLAN
if the capability is disabled.
Reported and reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Handlers maintain flow rules and inform hardware about non-standard VxLAN
port in use. The database of the vxlan end points is maintained.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week