When the requested QP type is not supported for a {device, port}, return
the error right away, before validating the remaining parameters during
MAD agent registration.
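A rough illustration of the ordering change: the sketch below rejects an unsupported QP type before any other argument is inspected. The function `sketch_register_mad_agent()`, the `sketch_port_caps` structure, and the choice of `-EPROTONOSUPPORT` are hypothetical stand-ins, not the actual ib_mad API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical QP types relevant to MAD agents. */
enum sketch_qp_type { SKETCH_QPT_SMI = 0, SKETCH_QPT_GSI = 1 };

/* Hypothetical per-{device, port} capability record. */
struct sketch_port_caps {
	bool smi_supported;	/* e.g. false on a RoCE port */
	bool gsi_supported;
};

static bool
sketch_qp_type_supported(const struct sketch_port_caps *caps,
    enum sketch_qp_type t)
{
	return (t == SKETCH_QPT_SMI ? caps->smi_supported : caps->gsi_supported);
}

/*
 * Fail fast on an unsupported QP type; only afterwards spend effort
 * validating the rest of the registration request.
 */
static int
sketch_register_mad_agent(const struct sketch_port_caps *caps,
    enum sketch_qp_type qp_type, const void *reg_req, int mgmt_class)
{
	if (!sketch_qp_type_supported(caps, qp_type))
		return (-EPROTONOSUPPORT);

	/* Remaining parameter validation (heavily simplified). */
	if (reg_req != NULL && (mgmt_class < 0 || mgmt_class > 0xff))
		return (-EINVAL);

	return (0);
}
```

With this ordering, a caller asking for an SMI agent on a port without SMI support gets the "not supported" error even if its other arguments are also bad.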
Linux commit:
798bba01b44b0ddf8cd6e542635b37cc9a9b739c
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Before calling the driver's function, make sure the port is valid.
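A minimal sketch of a port-range check placed in front of a driver callback. The structure and helper names are hypothetical; the first-port rule (0 for switches, 1 otherwise) is modelled loosely on Linux's rdma_start_port()/rdma_end_port() and is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device: valid ports are first_port .. first_port + cnt - 1. */
struct sketch_device {
	unsigned int first_port;	/* assumed: 0 for switches, 1 for HCAs */
	unsigned int phys_port_cnt;
	int (*query_port)(struct sketch_device *dev, unsigned int port);
};

static bool
sketch_port_valid(const struct sketch_device *dev, unsigned int port)
{
	return (port >= dev->first_port &&
	    port < dev->first_port + dev->phys_port_cnt);
}

/* Reject an out-of-range port before the driver ever sees it. */
static int
sketch_query_port(struct sketch_device *dev, unsigned int port)
{
	if (!sketch_port_valid(dev, port))
		return (-EINVAL);
	return (dev->query_port(dev, port));
}

/* Stand-in driver callback: echoes the port number it was called with. */
static int
sketch_dummy_query_port(struct sketch_device *dev, unsigned int port)
{
	(void)dev;
	return ((int)port);
}
```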
Linux commit:
9af3f5cf9d64a056eca53bc643f6288ad28bbbb5
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Currently, access to the hardware stats buffer isn't protected, which can
result in concurrent reads and writes to the same memory location. This
can lead to providing an incorrect value to the user. Add a mutex to
protect against it.
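The idea can be sketched in userspace C with a POSIX mutex standing in for the kernel mutex: the refresh of the shared buffer and the copy-out of one counter happen under the same lock. All names and the buffer layout are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_COUNTERS 4

/* Hypothetical stats buffer shared by all readers of one port. */
struct sketch_hw_stats {
	pthread_mutex_t lock;	/* serializes refresh and read-out */
	uint64_t value[SKETCH_NUM_COUNTERS];
};

/* Stand-in for the driver callback that rewrites the whole buffer. */
static void
sketch_driver_get_stats(struct sketch_hw_stats *s, uint64_t base)
{
	for (int i = 0; i < SKETCH_NUM_COUNTERS; i++)
		s->value[i] = base + (uint64_t)i;
}

/*
 * Refresh and read while holding the lock, so a concurrent refresh cannot
 * overwrite the buffer between the two steps.
 */
static uint64_t
sketch_read_counter(struct sketch_hw_stats *s, int index, uint64_t base)
{
	uint64_t v;

	pthread_mutex_lock(&s->lock);
	sketch_driver_get_stats(s, base);
	v = s->value[index];
	pthread_mutex_unlock(&s->lock);
	return (v);
}
```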
Linux commit:
e945130b52bea65d15f9bdf54949d4cb7a88db7f
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
If the provider driver (such as rdma_rxe) doesn't support PMA counters,
avoid exposing its directory, similar to the optional hw_counters
directory. If the core fails to read a PMA counter, return an error so
that the user can retry later if needed.
Linux commit:
0f6ef65d1c6ec8deb5d0f11f86631ec4cfe8f22e
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
To improve readability, add the ib_port_phys_state enum to replace the
use of magic numbers.
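A sketch of what such an enum looks like and how it replaces a bare comparison like `phys_state == 5`. The enumerator names and values follow the InfiniBand PortPhysicalState encoding as recalled here; verify them against the actual header. The strings returned by `sketch_phys_state_str()` are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Physical port states (PortPhysicalState encoding; verify against header). */
enum ib_port_phys_state {
	IB_PORT_PHYS_STATE_SLEEP = 1,
	IB_PORT_PHYS_STATE_POLLING = 2,
	IB_PORT_PHYS_STATE_DISABLED = 3,
	IB_PORT_PHYS_STATE_PORT_CONFIGURATION_TRAINING = 4,
	IB_PORT_PHYS_STATE_LINK_UP = 5,
	IB_PORT_PHYS_STATE_LINK_ERROR_RECOVERY = 6,
	IB_PORT_PHYS_STATE_PHY_TEST = 7,
};

/* Before: "phys_state == 5"; after: the comparison names the state. */
static int
sketch_link_is_up(unsigned char phys_state)
{
	return (phys_state == IB_PORT_PHYS_STATE_LINK_UP);
}

/* Hypothetical pretty-printer; the strings are illustrative. */
static const char *
sketch_phys_state_str(unsigned char phys_state)
{
	switch (phys_state) {
	case IB_PORT_PHYS_STATE_SLEEP:
		return ("Sleep");
	case IB_PORT_PHYS_STATE_POLLING:
		return ("Polling");
	case IB_PORT_PHYS_STATE_DISABLED:
		return ("Disabled");
	case IB_PORT_PHYS_STATE_PORT_CONFIGURATION_TRAINING:
		return ("PortConfigurationTraining");
	case IB_PORT_PHYS_STATE_LINK_UP:
		return ("LinkUp");
	case IB_PORT_PHYS_STATE_LINK_ERROR_RECOVERY:
		return ("LinkErrorRecovery");
	case IB_PORT_PHYS_STATE_PHY_TEST:
		return ("PhyTest");
	default:
		return ("Unknown");
	}
}
```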
Linux commit:
72a7720fca37fec0daf295923f17ac5d88a613e1
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
This patch fixes the case where the 'lifespan' entry of the hw_counters
is not writable. Currently, no write callback is exposed for the
hw_counters sysfs operations. Due to this, modifying the lifespan value
results in a permission denied error, as in the example below.
echo 10 > /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan
-bash: /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan:
Permission denied
This patch adds the hook to modify any attribute which implements
store() operation.
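A userspace sketch of a generic store hook that forwards a write to the attribute's own store() callback when one exists. The structure, the `lifespan_store()` parser, and the `-EIO` result for attributes without store() are assumptions made for illustration, not the exact kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical attribute: store() is optional. */
struct sketch_attr {
	const char *name;
	ssize_t (*store)(struct sketch_attr *a, const char *buf, size_t count);
};

/* Generic hook: dispatch to the attribute's store(), if implemented. */
static ssize_t
sketch_attr_store(struct sketch_attr *a, const char *buf, size_t count)
{
	if (a->store == NULL)
		return (-EIO);	/* assumed error for read-only attributes */
	return (a->store(a, buf, count));
}

static long sketch_lifespan = 10;

/* Example store(): parse a non-negative decimal lifespan value. */
static ssize_t
sketch_lifespan_store(struct sketch_attr *a, const char *buf, size_t count)
{
	char *end;
	long v;

	(void)a;
	v = strtol(buf, &end, 10);
	if (end == buf || v < 0)
		return (-EINVAL);
	sketch_lifespan = v;
	return ((ssize_t)count);
}
```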
Linux commit:
79c4d80b43b8e43684894574a508a871f0c196bf
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
From "InfiniBand Architecture Specification Volume 1":
A QP is said to have a stale connection when only one side has
connection information. A stale connection may result if the remote CM
had dropped the connection and sent a DREQ but the DREQ was never
received by the local CM. Alternatively the remote CM may have lost
all record of past connections because its node crashed and rebooted,
while the local CM did not become aware of the remote node's reboot
and therefore did not clean up stale connections.
And:
A local CM may receive a REQ/REP for a stale connection. It shall
abort the connection issuing REJ to the REQ/REP. It shall then issue
DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
This patch solves a problem with the reuse of QPNs. The current codebase,
that is IPoIB, relies on a REAP mechanism to clean up the structures in
CM. A problem with this is the time constants governing this mechanism;
they are up to 768 seconds, and the interface may look unresponsive in
that period. Issuing a DREQ (and receiving a DREP) does the necessary
cleanup and the interface comes up.
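The spec rule quoted above can be sketched as a lookup on an incoming REQ: if the local CM still holds a connection with the same remote QPN, the connection is stale, so the REQ is rejected and a DREQ carrying that remote QPN is sent. This is a simplified, hypothetical model (a real CM would match on more than the QPN alone).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record the local CM keeps per connection. */
struct sketch_conn {
	bool in_use;
	uint32_t remote_qpn;
};

enum sketch_req_action {
	SKETCH_PROCESS_REQ,	/* no stale state: handle the REQ normally */
	SKETCH_REJ_THEN_DREQ,	/* stale: send REJ, then DREQ(remote QPN) */
};

/*
 * A REQ whose remote QPN matches a connection we still hold means the
 * peer has forgotten it; ask the caller to REJ and then DREQ that QPN.
 */
static enum sketch_req_action
sketch_handle_req(const struct sketch_conn *tbl, size_t n,
    uint32_t remote_qpn, uint32_t *dreq_remote_qpn)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_use && tbl[i].remote_qpn == remote_qpn) {
			*dreq_remote_qpn = remote_qpn;
			return (SKETCH_REJ_THEN_DREQ);
		}
	}
	return (SKETCH_PROCESS_REQ);
}
```

The DREP answering that DREQ is what lets the stale entry be cleaned up promptly instead of waiting for the REAP timers.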
Linux commit:
9315bc9a133011fdb084f2626b86db3ebb64661f
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The sysfs layout created by CM incorrectly presented RDMA devices with an
InfiniBand link layer: the layout of such devices represented a device
tree of connections. Moving the CM statistics under the relevant port of
the IB device fixes the following issues:
* Symlink name - It used the device name instead of a specific identifier.
* Target location - It was supposed to point to PCI-ID/infiniband_cm/
instead of PCI-ID/infiniband/
* Target name - It created extra device file under already existing
device folder, e.g. mlx5_0/mlx5_0
* Crash during boot with RDMA persistent naming patches.
sysfs: cannot create duplicate filename '/class/infiniband_cm/mlx5_0'
CPU: 29 PID: 433 Comm: modprobe Not tainted 5.0.0-rc5+ #178
Call Trace:
dump_stack+0xcc/0x180
sysfs_warn_dup.cold.3+0x17/0x2d
sysfs_do_create_link_sd.isra.2+0xd0/0xf0
device_add+0x7cb/0x1450
device_create_groups_vargs+0x1ae/0x220
device_create+0x93/0xc0
cm_add_one+0x38f/0xf60 [ib_cm]
add_client_context+0x167/0x210 [ib_core]
enable_device_and_get+0x230/0x3f0 [ib_core]
ib_register_device+0x823/0xbf0 [ib_core]
__mlx5_ib_add+0x45/0x150 [mlx5_ib]
mlx5_ib_add+0x1b3/0x5e0 [mlx5_ib]
mlx5_add_device+0x130/0x3a0 [mlx5_core]
mlx5_register_interface+0x1a9/0x270 [mlx5_core]
do_one_initcall+0x14f/0x5de
do_init_module+0x247/0x7c0
load_module+0x4c2f/0x60d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
After this change:
[leonro@server ~]$ ls -al /sys/class/infiniband/ibp0s12f0/ports/1/
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_duplicates
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_msgs
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_msgs
drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_retries
Linux commit:
c87e65cfb97c7f325132a68288ed76ba7bdcd2c6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
In the process of moving the debug counters sysfs entries, the commit
mentioned below eliminated the cm_infiniband sysfs directory.
This sysfs directory was tied to the cm_port object allocated in procedure
cm_add_one().
Before the commit below, this cm_port object was freed via a call to
kobject_put(port->kobj) in procedure cm_remove_port_fs().
Since the port no longer uses its kobj, kobject_put(port->kobj) was eliminated.
This, however, meant that kfree was never called for the cm_port buffers.
Fix this by adding explicit kfree(port) calls to functions cm_add_one()
and cm_remove_one().
Note that the kfree call in the first chunk below, in the cm_add_one error
flow, fixes an old, undetected memory leak.
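A userspace sketch of the pattern this fix restores: every port object allocated in the add path is freed explicitly, both when a later setup step fails (including the port currently being set up) and in the remove path. The names are hypothetical, and the live-allocation counter exists only to make leaks visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_port { int port_num; };

static int sketch_live_ports;	/* outstanding allocations, for leak checks */

static struct sketch_port *
sketch_port_alloc(int port_num)
{
	struct sketch_port *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->port_num = port_num;
		sketch_live_ports++;
	}
	return (p);
}

static void
sketch_port_free(struct sketch_port *p)
{
	if (p != NULL) {
		free(p);
		sketch_live_ports--;
	}
}

/* fail_at simulates a later per-port setup step failing on port index i. */
static int
sketch_add_one(struct sketch_port **ports, int nports, int fail_at)
{
	struct sketch_port *port;
	int i;

	for (i = 0; i < nports; i++) {
		port = sketch_port_alloc(i + 1);
		if (port == NULL)
			goto error;
		if (i == fail_at) {
			sketch_port_free(port);	/* free the current port too */
			goto error;
		}
		ports[i] = port;
	}
	return (0);
error:
	/* Unwind the ports that were fully set up before the failure. */
	while (--i >= 0) {
		sketch_port_free(ports[i]);
		ports[i] = NULL;
	}
	return (-ENOMEM);	/* error code simplified for the sketch */
}

static void
sketch_remove_one(struct sketch_port **ports, int nports)
{
	for (int i = 0; i < nports; i++) {
		sketch_port_free(ports[i]);	/* explicit free, no kobject */
		ports[i] = NULL;
	}
}
```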
Linux commit:
94635c36f3854934a46d9e812e028d4721bbb0e6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
For the reasons below, it is better not to support alternate path receive
messages for RoCE in the near term.
1. Alternate path for RoCE is not supported at the rdmacm layer.
2. It is not supported in the uverbs/core layer for RoCE.
3. An alternate path for an IPv6 link local address cannot resolve the
route deterministically without a valid incoming interface ID, whose use
case makes sense only in dual port mode.
4. Calling init_av_from_path while processing LAP messages for IB and RoCE
can add a duplicate AV entry to the port list, which leads to list
corruption.
5. rdma-core, a well known userspace implementation, has removed support
for libucm, which uses the ucm.ko module, the only module that can
trigger alternate path related messages.
6. Removal of the ucm kernel module from the IB core has been requested in
the following patch: https://patchwork.kernel.org/patch/10268503/
Linux commit:
97c45c2c28cd291e06778d9d36a0f60ee74726bc
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
During CM LAP processing, ah_attr is reinitialized on receiving a LAP
request, even though it was most likely already initialized during CM
request processing. ah_attr might get zeroed out if LAP processing fails.
Therefore, try to create a new ah_attr for the LAP message.
If the initialization fails, continue with the older ah_attr.
If the initialization succeeds, use the new ah_attr by
overwriting the older one.
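The "build into a temporary, commit only on success" approach described above, as a minimal sketch. The reduced `sketch_ah_attr` structure and the init helper (which fails on a zero DLID purely for demonstration) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, much-reduced address handle attributes. */
struct sketch_ah_attr {
	uint16_t dlid;
	uint8_t sl;
	uint8_t port_num;
};

/* Stand-in for building an AV from the LAP's alternate path. */
static int
sketch_init_ah_from_path(uint16_t dlid, uint8_t sl, uint8_t port,
    struct sketch_ah_attr *out)
{
	if (dlid == 0)
		return (-EINVAL);	/* simulated failure */
	out->dlid = dlid;
	out->sl = sl;
	out->port_num = port;
	return (0);
}

/*
 * Initialize a scratch copy; overwrite the current attributes only if
 * that succeeded, so a failed LAP never clobbers a working ah_attr.
 */
static void
sketch_process_lap(struct sketch_ah_attr *cur, uint16_t dlid, uint8_t sl,
    uint8_t port)
{
	struct sketch_ah_attr new_attr = { 0, 0, 0 };

	if (sketch_init_ah_from_path(dlid, sl, port, &new_attr) == 0)
		*cur = new_attr;
}
```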
Linux commit:
0e225dcb7681c0a8e52fb9dc68bd8ab973de4ca2
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
rdma_reject_msg() returns a pointer to a string message associated with
the transport reject reason codes.
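A sketch of a reason-code-to-string lookup of this kind, using a sparse table with designated initializers and a fallback for unknown codes. The enumerator names, values, strings, and the "unrecognized reason" fallback are assumptions for illustration; only a small subset of codes is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of IB CM reject reason codes. */
enum sketch_rej_reason {
	SKETCH_REJ_NO_QP = 1,
	SKETCH_REJ_NO_EEC = 2,
	SKETCH_REJ_NO_RESOURCES = 3,
	SKETCH_REJ_TIMEOUT = 4,
	SKETCH_REJ_UNSUPPORTED = 5,
};

/* Sparse table indexed by reason code; gaps stay NULL. */
static const char *const sketch_rej_strs[] = {
	[SKETCH_REJ_NO_QP] = "no QP",
	[SKETCH_REJ_NO_EEC] = "no EEC",
	[SKETCH_REJ_NO_RESOURCES] = "no resources",
	[SKETCH_REJ_TIMEOUT] = "timeout",
	[SKETCH_REJ_UNSUPPORTED] = "unsupported",
};

/* Return a message for the reason code, or a fallback if unknown. */
static const char *
sketch_reject_msg(int reason)
{
	size_t n = sizeof(sketch_rej_strs) / sizeof(sketch_rej_strs[0]);

	if (reason < 0 || (size_t)reason >= n || sketch_rej_strs[reason] == NULL)
		return ("unrecognized reason");
	return (sketch_rej_strs[reason]);
}
```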
Linux commit:
77a5db13153906a7e00740b10b2730e53385c5a8
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Extended atomics are supported with RC and XRC QP types, but Linux commit
a60109dc9a95 added an unneeded check to to_mlx5_access_flags().
This broke XRC QPs.
The following ib_atomic_bw invocation over XRC reproduces the issue:
ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD
It is safe to remove such checks because the QP type was already checked
in ib_modify_qp_is_ok(), which was previously called from
mlx5_ib_modify_qp().
Linux commit:
13f8d9c16693afb908ead3d2a758adbe6a79eccd
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
In cm_form_tid(), a two bit message sequence number is OR'ed into bits
31-30 of the lower TID value.
After Linux commit f06d26537559 ("IB/cm: Randomize starting comm ID"), the
local_id is XOR'ed with a 32-bit random value. Hence, bits 31-30 in the
lower TID now have an arbitrary value and it makes no sense to OR in
the message sequence number.
Moreover, the IDR routines used in cm_alloc_id() have, throughout their
evolution, always been able to return a value with bit 30 set.
In addition, said bits are never checked.
Hence, remove the encoding and the corresponding enum.
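A small sketch of why the encoding is meaningless once local_id is randomized: if bits 31-30 of local_id are already set, OR-ing in any sequence number leaves the value unchanged, so the sequence cannot be recovered. The helper names are hypothetical; only the bit arithmetic mirrors the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme (removed): OR a 2-bit sequence number into bits 31-30. */
static uint32_t
sketch_old_lower_tid(uint32_t local_id, unsigned int msg_seq)
{
	return (local_id | ((uint32_t)(msg_seq & 0x3) << 30));
}

/* New scheme: the lower TID is just the (randomized) local ID. */
static uint32_t
sketch_new_lower_tid(uint32_t local_id)
{
	return (local_id);
}

/* What the old scheme intended bits 31-30 to carry. */
static unsigned int
sketch_bits_31_30(uint32_t v)
{
	return ((v >> 30) & 0x3);
}
```

For example, with a randomized local_id of 0xC0000001, sequence numbers 1 and 2 both produce 0xC0000001, and bits 31-30 read back as 3.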
Linux commit:
87a37ce9e400e40daee537ff95343e3c94743c6d
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Remove unneeded error handling when destroying a CQ. The function in
question will later be updated to return "void".
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The maximum page size in the mkey context is 2GB.
Until now, this requirement was not enforced in the code. Therefore, if
the page size was larger than 2GB, zeros were passed in log_page_shift
instead of the actual value and the registration failed.
This patch limits the driver to use compound pages of 2GB for mkeys.
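A sketch of the clamp. To show how an oversized shift could turn into zeros, it assumes, purely for illustration, that log_page_shift is a 5-bit field; that width is an assumption, not something stated in this commit.

```c
#include <assert.h>

/* 2GB = 1 << 31, the largest page size an mkey may describe. */
#define SKETCH_MAX_LOG_PAGE_SHIFT 31

/* ASSUMPTION for illustration only: log_page_shift is a 5-bit field. */
#define SKETCH_LOG_PAGE_SHIFT_MASK 0x1f

/* Truncation that would happen when writing into a 5-bit field. */
static unsigned int
sketch_encode_log_page_shift(unsigned int page_shift)
{
	return (page_shift & SKETCH_LOG_PAGE_SHIFT_MASK);
}

/* Never describe pages larger than 2GB. */
static unsigned int
sketch_limit_page_shift(unsigned int page_shift)
{
	return (page_shift > SKETCH_MAX_LOG_PAGE_SHIFT ?
	    SKETCH_MAX_LOG_PAGE_SHIFT : page_shift);
}
```

Under that assumption, a 4GB page (shift 32) encodes as 0, while clamping first yields 31.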
Linux commit:
762f899ae7875554284af92b821be8c083227092
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:
The first issue is related to the alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.
This wasn't actually a bug because it was supported by all the
existing mlx5 compatible devices, but we don't want to require
this support in all future devices.
Another issue is that the inner loop didn't advance the PFN, so
the test "if (base + p != pfn)" always failed for SGEs with
len > (1<<page_shift).
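A simplified model of the alignment rule, not the actual mlx5_ib_cont_pages(): it treats the input as a flat array of base-page PFNs (ignoring SGE boundaries) and returns the largest page shift for which every block of 2^k entries starts at a 2^k-aligned PFN and is contiguous. The function name and the flat-array input are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Largest base_shift + k such that each block of 2^k consecutive entries
 * starts at a 2^k-aligned PFN and contains consecutive PFNs.
 */
static unsigned int
sketch_best_page_shift(const uint64_t *pfn, size_t n, unsigned int base_shift)
{
	unsigned int k = 0;

	if (n == 0)
		return (base_shift);
	/* Start from the largest k with 2^k <= n. */
	while (((size_t)2 << k) <= n)
		k++;

	for (;; k--) {
		size_t block = (size_t)1 << k;
		int ok = 1;

		for (size_t i = 0; i < n && ok; i += block) {
			if (pfn[i] % block != 0)	/* block start aligned */
				ok = 0;
			for (size_t j = 1; j < block && i + j < n && ok; j++)
				if (pfn[i + j] != pfn[i] + j)	/* contiguous */
					ok = 0;
		}
		if (ok || k == 0)
			return (base_shift + k);
	}
}
```

With 4KB base pages (base_shift 12), the sequence 0, 1, 1, 2 from the text yields 12 rather than 13, because the block starting at the 3rd PFN (1) is not 8KB aligned.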
Linux commit:
d67bc5d4e3e100d762c0f57ea67f28bc219698a6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
- Upon error, more completion events than requested may be generated,
particularly when using the completion event factor feature.
- Count the number of event errors in the transmit path.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
In some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver does not
open an IB device on such a port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. To that end, enhance the relevant
driver flows so that a device instance is created in that case.
Linux commit:
ca5b91d63192ceaa41a6145f8c923debb64c71fa
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make the mlx5e_mode_table[] array one dimensional, because there is only
one entry, 10G ER/LR, which shares the same protocol bit.
This patch only adds support for basic sub-type distinction for the
extended protocol bits. Use the verbose ifconfig eeprom output to get the
actual media type.
Remove the write-only "connector_type" variable while at it.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
This code has practically no sleeping points, so Giant is held for a very
long time.
Noted and reviewed by: hselasky
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Import Linux commit 534b1204ca4694db1093b15cf3e79a99fcb6a6da
Add a reserved mapping to cover the whole register in order to avoid
setting arbitrary values in newer FW which implements the reserved fields.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
Import Linux commit ce28f0fd670ddffcd564ce7119bdefbaf08f02d3:
Add a reserved mapping to cover the whole register in order to avoid
setting arbitrary values in newer FW which implements the reserved
fields.
Taken from: https://patches.linaro.org/patch/417255/
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
In particular, avoid creating the TIR or installing flow rules for VXLAN
if the capability is disabled.
Reported and reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
The handlers maintain flow rules and inform the hardware about the
non-standard VxLAN port in use. A database of the VxLAN end points is
maintained.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
sys/dev/sound/pci/hda/hdaa_patches.c:
match_pin_patches: Use HDA_DEV_MATCH instead of regular ==
sys/dev/sound/pci/hda/pin_patch_realtek.h:
Add quirk for Lenovo laptops when ALC298 is used.