Commit Graph

503 Commits

Author SHA1 Message Date
Bartosz Sobczak
c7f73a1588
ofed: mask seq_num identifier to occupy only 3 bytes
The seq_num among other things is used to assign rq_psn value, which is
a 24-bit identifier.  When the seq_num is full 4-byte value, we are
usually receiving: '_ib_modify_qp rq_psn overflow, masking to 24 bits'
warning.

This is burdensome for running rdma traffic with large number of
connections, because the number of logs is growing fast.

Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com>
Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	kib@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D41531
2023-08-22 16:09:13 -07:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Bartosz Sobczak
8e6654a6a5
ofed: fix roce gid insertion for vlan interfaces
When attempting to use vlan interface the correct GID
wasn't created due to incorrect ifp validation.

The problem was introduced in
3e142e0767 ('ofed: Mechanically convert to IfAPI)

Signed-off-by: Bartosz Sobczak bartosz.sobczak@intel.com
Signed-off-by: Eric Joyner <erj@FreeBSD.org>

PR:		273043
Sponsored by:	Intel Corporation
Reviewed by:	jhibbits@, erj@
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D41426
2023-08-14 08:53:43 -07:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Justin Hibbits
12e99b63d2 ofed: Fix a logic inversion from IfAPI conversion
Reported by:	bartosz.sobczak_intel.com
Fixes:		3e142e0767 ("ofed: Mechanically convert to IfAPI")
Sponsored by:	Juniper Networks, Inc.
2023-04-19 11:56:25 -04:00
Gleb Smirnoff
a6b55ee6be net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.

This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch.  A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.

Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH.  But the tcp_lro_flush_all() asserts the presence
of network epoch.  Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.

Reviewed by:		zlei, gallatin, erj
Differential Revision:	https://reviews.freebsd.org/D39510
2023-04-17 09:08:35 -07:00
Zhenlei Huang
fc6c93b6a5 infiniband: Opt-in for net epoch
This is counterpart to e87c494015, which did the same for ethernet.

Suggested by:	hselasky
Reviewed by:	hselasky, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39405
2023-04-06 00:08:23 +08:00
Justin Hibbits
3e142e0767 ofed: Mechanically convert to IfAPI
Summary:
Because of the intricacies of this code it wasn't purely scripted, but
instead hand-mechanical.

Reviewed by:	hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38560
2023-03-24 10:04:33 -04:00
Justin Hibbits
adf62e8363 infiniband: Convert BPF handling for IfAPI
Summary:
All callers of infiniband_bpf_mtap() call it through the wrapper macro,
which checks the if_bpf member explicitly.  Since this is getting
hidden, move this check into the internal function and remove the
wrapper macro.

Reviewed by:	hselasky
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39024
2023-03-14 15:51:32 -04:00
Hans Petter Selasky
f50274674e ibcore: The use of IN_LOOPBACK() now requires a valid VNET context.
Make sure the VNET is set before using this macro.

Fixes:		efe58855f3
PR:		266054
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-09-23 13:42:03 +02:00
Hans Petter Selasky
57af517ac4 ibcore: Add support for RDMA/RoCE using VLAN(4) devices.
Classify VLAN devices as ethernet in rdma_copy_addr().
This fixes the following error message:

rdma_bind_addr: No such file or directory

Submitted by:	bartosz.sobczak@intel.com (Bartosz Sobczak)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36120
Sponsored by:	NVIDIA Networking
2022-08-22 10:04:26 +02:00
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
Gleb Smirnoff
01f11ee1c5 sdp: garbage collect sdp_ctlinput
The pr_ctlinput method was a feature of IPv4/IPv6 with exception of
pfctlinput(), which broadcasted a call to pr_ctlinput on all protocols
ever registered statically or with pf_proto_register().  Now that
this broadcast call is gone, the only protocols that get their
pr_ctlinput ever called are those that have registered itselves with
ipproto_register() or ip6proto_register().

It is entirely possible that code deleted now was dead code from very
beginning. Just a copy-paste from TCP.

Reviewed by:		rstone
Differential revision:	https://reviews.freebsd.org/D36208
2022-08-16 12:38:15 -07:00
Gleb Smirnoff
f277746e13 protosw: change prototype for pr_control
For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C.  Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.

Fixes:	886fc1e804
2022-08-12 12:08:18 -07:00
Mike Karels
a11f080e86 ofed/infiniband: fix ifdefs for new INET changes, fixing LINT-NOIP
Some of the ofed/infiniband code has INET and INET6 address handling
code without using ifdefs.  This failed with a recent change to INET,
in which IN_LOOPBACK() started using a VNET variable, and which is not
present if INET is not configured.  Add #ifdef INET, and INET6 for good
measure, in cma_loopback_addr(), along with inclusion of the options
headers in ib_cma.c.

Reviewed by:	hselasky rgrimes bz
Differential Revision: https://reviews.freebsd.org/D35835

(cherry picked from commit 752b7632776237f9c071783acdd1136ebf5f287d)
2022-07-18 08:02:01 -05:00
Gleb Smirnoff
d8596171c5 sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already
  existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private.  The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().

Note on listening queues.  Until now the sockets on a queue had zero
reference count.  And the reference were given only upon accept(2).  The
assumption was that there is no way to see the queued socket from anywhere
except its head.  This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists.  With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.

Differential revision:	https://reviews.freebsd.org/D35679
2022-07-04 12:40:51 -07:00
Hans Petter Selasky
9fc6a63522 ibcore: Fix a race with disassociate and exit_mmap()
If uverbs_user_mmap_disassociate() is called while the mmap is
concurrently doing exit_mmap then the ordering of the
rdma_user_mmap_entry_put() is not reliable.

The put must be done before uvers_user_mmap_disassociate() returns,
otherwise there can be a use after free on the ucontext, and a left over
entry in the xarray. If the put is not done here then it is done during
rdma_umap_close() later.

Add the missing put to the error exit path.

Linux commit:
39c011a538272589b9eb02ff1228af528522a22c

PR:		264473
MFC after:	3 days
Sponsored by:	NVIDIA Networking
2022-06-21 11:33:27 +02:00
Hans Petter Selasky
55d1833671 ibcore: Fix sysfs registration error flow
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.

As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).

However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.

The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.

The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.

Linux commit:
b312be3d87e4c80872cbea869e569175c5eb0f9a

PR:		264472
MFC after:	3 days
Sponsored by:	NVIDIA Networking
2022-06-21 11:33:27 +02:00
Hans Petter Selasky
66a0bc2105 ibcore: Fix use-after-free access in ucma_close()
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.

Linux commit:
ed65a4dc22083e73bac599ded6a262318cad7baf

PR:		264650
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-13 17:00:16 +02:00
Hans Petter Selasky
c4a4155053 ibcore: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
The algorithm pre-allocates a cm_id since allocation cannot be done while
holding the cm.lock spinlock, however it doesn't free it on one error
path, leading to a memory leak.

Linux commit:
c14dfddbd869bf0c2bafb7ef260c41d9cebbcfec

PR:		264248
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-05-30 20:22:18 +02:00
Hans Petter Selasky
ad7741ff69 ibcore: Fix possible memory leak in ib_mad_post_receive_mads()
If ib_dma_mapping_error() returns non-zero value,
ib_mad_post_receive_mads() will jump out of loops and return -ENOMEM
without freeing mad_priv. Fix this memory-leak problem by freeing
mad_priv in this case.

Linux commit:
a17f4bed811c60712d8131883cdba11a105d0161

PR:		264057
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-05-19 10:13:06 +02:00
Gleb Smirnoff
4328318445 sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b5 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35152
2022-05-12 13:22:12 -07:00
Hans Petter Selasky
9f580526e4 ibcore: Remove set, but not used variable.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-05-05 12:32:30 +02:00
Hans Petter Selasky
6244b53e16 ibcore: Allow passing NULL-pointers to ib_umem_release()
FreeBSD commit b633e08c70 removed the
NULL-checks from the mlx4ib-driver.

This fixes the following NULL-pointer panic when unloading mlx4ib:
ib_umem_release()
mlx4_ib_destroy_qp()
ib_destroy_qp_user()
ipoib_transport_dev_cleanup()
ipoib_dev_cleanup()
ipoib_remove_one()
ib_unregister_client()
ipoib_cleanup_module()
linker_file_sysuninit()
linker_file_unload()
kern_kldunload()
amd64_syscall()

Linux commit:
836a0fbb3e76f704ad65ddfb57f00725245e509b

MFC after:	1 week
Submitted by:	dandan@lysator.liu.se
Sponsored by:	Lysator ACS
Sponsored by:	NVIDIA Networking
2022-05-02 13:11:06 +02:00
Gordon Bergling
d048e8c619 ofed: Fix a typo in a source code comment
- s/it it/it to/

MFC after:	3 days
2022-04-09 14:39:36 +02:00
Gordon Bergling
6de6c95691 ofed: Remove a double word in a source code comment
- s/for for/for/

MFC after:	3 days
2022-04-09 11:00:48 +02:00
Gordon Bergling
7f409fe7d8 ofed: Remove a double word in a source code comment
- s/is is/is/

MFC after: 3 days
2022-04-09 09:13:22 +02:00
Hans Petter Selasky
fc99316ef9 ibcore: Fix multiple includes of same header file.
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-03-03 12:51:20 +01:00
Hans Petter Selasky
1aa593b90c ibcore: Add support for NDR link speed.
Add new IBTA speed NDR, supporting signaling rate of 100Gb.

Linux commit:
c7adf7717301558e8852949d8e3dc3748d1a4a97

MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-02-21 09:35:19 +01:00
Hans Petter Selasky
a30f71704e mlx5ib: Add support for parsing udata in mlx5_ib_create_flow().
Backport from Linux 5.17 (drivers/infiniband/hw/mlx5/fs.c)

This fixes creating flow rules from user-space after the
kernel space update based on Linux 5.7-rc1 .

Sponsored by:	NVIDIA Networking
2022-02-10 11:17:42 +01:00
Gleb Smirnoff
24e1c6ae7d domains: init with standard SYSINIT(9) or VNET_SYSINIT()
There left only three modules that used dom_init().  And netipsec
was the last one to use dom_destroy().

Differential revision:	https://reviews.freebsd.org/D33540
2022-01-03 10:15:22 -08:00
John Baldwin
c768021bda sys/ofed: Use C99 fixed-width integer types.
No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D33639
2021-12-28 09:43:09 -08:00
Mark Johnston
fa0463c384 socket: De-duplicate SBLOCKWAIT() definitions
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-09-14 09:01:32 -04:00
Mark Johnston
f94acf52a4 socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31657
2021-09-07 15:06:48 -04:00
Zhenlei Huang
62e1a437f3 routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
 for IPv4 routes. This behavior is controlled by the
 net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to exract packet family inside if_output() routines.
 Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
 It leverages recent lltable changes committed in c541bd368f.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
  route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398
MFC after:	2 weeks
2021-08-22 22:56:08 +00:00
Alexander V. Chernikov
c541bd368f lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.
Currently we use pre-calculated headers inside LLE entries as prepend data
 for `if_output` functions. Using these headers allows saving some
 CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
 nexthops more complex, as it is not possible to store multiple
 pre-calculated headers inside lle. Additionally, the solution space is
 limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
 custom family encaps and store mbufs pending resolution. To simplify handling
 of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
 Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
 to the "parent" LLE - it is not possible to delete "child" when parent is alive.
 Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
 machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
 allowing to return such child LLEs. This change uses `LLE_SF()` macro which
 packs family and flags in a single int field. This is done to simplify merging
 back to stable/. Once this code lands, most of the cases will be converted to
 use a dedicated `family` parameter.

Differential Revision: https://reviews.freebsd.org/D31379
MFC after:	2 weeks
2021-08-21 17:34:35 +00:00
Hans Petter Selasky
b633e08c70 ibcore: Kernel space update based on Linux 5.7-rc1.
Overview:

This is the first stage of a RDMA stack upgrade introducing kernel
changes only based on Linux 5.7-rc1.

This patch is based on about four main areas of work:
- Update of the IB uobjects system:
  - The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
    is now managed by ibcore. This also require some changes in the
    kernel verbs API. The updated verbs changes are typically about
    initialize and deinitialize objects, and remove allocation and
    free of memory.

- Update of the uverbs IOCTL framework:
  - The parsing and handling of user-space commands has been
    completely refactored to integrate with the updated IB uobjects
    system.

- Various changes and updates to the generic uverbs interfaces in
  device drivers including the new uAPI surface.

- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.

Dependencies:

- The mlx4ib driver code has been updated with the minimum changes
needed.

- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.

Compatibility:

- All user-space facing APIs are backwards compatible after this
  change.

- All kernel-space facing RDMA APIs are backwards compatible after
  this change, with exception of ib_create_ah() and ib_destroy_ah()
  which takes a new flag.

- The "ib_device_ops" structure exist, but only contains the driver ID
  and some structure sizes.

Differences from Linux:

- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
  the sizes needed for allocating various IB objects, when adding
  IB device instances.

Security:

- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.

Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429

MFC after:	1 week
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31149
Sponsored by:	NVIDIA Networking
2021-07-28 13:28:29 +02:00
Hans Petter Selasky
693ddf4dc4 Fix LINT kernel build issues after c3987b8ea7 .
Fixes the IPOIB_CM and SDP kernel options.

Reported by:	lwhsu @
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-07-12 18:00:30 +02:00
Hans Petter Selasky
cd2c05d323 ipoib: Fix for accessing uninitialized pointers and freed memory during attach and detach.
Call infiniband_ifdetach() early to stop ifioctl(9) calls from user-space
during device removal. Also make sure that ifioctl(9) calls are blocked from
executing until the device is fully initialized. Ideally we would delay the
infiniband_ifattach() call, but because part of the initialization is to update
the link level address, that is not possible without more significant changes.

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 15:01:19 +02:00
Hans Petter Selasky
f60da09dbb ibcore: Add some functions and definitions for selecting and querying retryable ucontext cleanup.
Linux commit:
1c77483e4c50339b0306572167ccbff6b55d051b

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:34 +02:00
Hans Petter Selasky
c3987b8ea7 ibcore: Declare ib_post_send() and ib_post_recv() arguments const
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:33 +02:00
Hans Petter Selasky
79b817084c ibcore: Implement ib_uverbs_get_ucontext_file().
Expose ib_ucontext from a given ib_uverbs_file. Drivers that use the ioctl(9)
API may have the ib_uverbs_file and need a way to get the related ib_ucontext
from it, this is enabled by this patch.

Downstream patches from this series will use it.

Linux commit:
7dc08dcfc8c86cb4457e383734ff6844ddaff876

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:33 +02:00
Hans Petter Selasky
05f4691919 ibcore: Clean up INIT_UDATA() and INIT_UDATA_BUF_OR_NULL() macro usage.
We get a harmless warning about the fact that we use the result of a
multiplication as a condition in INIT_UDATA_BUF_OR_NULL():

uverbs_main.c: In function 'ib_uverbs_write':
error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]

This avoids the problem by using an inline function in place of
the macro.

After changing INIT_UDATA_BUF_OR_NULL() to an inline function,
do the same change to INIT_UDATA() for consistency.

Using an inline function gives us better type safety here among other
issues with macros. I'm using u64_to_user_ptr() to convert the user
pointer to simplify the logic rather than adding lots of new type casts.

Linux commit:
12f727721eee61b3d19dedb95cb893b2baa9fe41
40a203396cc1c239f2e71c47c66ed03097123d2c

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
d92a9e5604 ibcore: Simplify ib_modify_qp_is_ok().
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state makes the
checks of out-of-scope redundant. Let's remove them together with updating
function signature to return boolean result.

While at it remove unused "ll" parameter from ib_modify_qp_is_ok().

Linux commit:
19b1f54099b6ee334acbfbcfbdffd1d1f057216d
d31131bba5a1630304c55ea775c48cc84912ab59

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
0c13880ccc ibcore: Support rate limit for packet pacing
Add new member rate_limit to ib_qp_attr which holds the packet pacing
rate in kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.

Linux commit:
528e5a1bd3f0e9b760cb3a1062fce7513712a15d

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
3d2fb36a9c ibcore: Add new IB rates.
Add the new rates that were added to Infiniband spec as part of
HDR and 2x support.

Linux commit:
a5a5d1993696419e7d5357fc3128e53d219d382e

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
721b795b72 ibcore: Don't allocate method table, if already present.
This commit aligns the code in question with upstream Linux.

Linux commit:
2468b82d69e3a53d024f28d79ba0fdb8bf43dfbf

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
c6ccb08686 ibcore: Fix a use-after-free in ucma_resolve_ip().
There is a race condition between ucma_close() and ucma_resolve_ip():

CPU0                            CPU1
ucma_resolve_ip():              ucma_close():

ctx = ucma_get_ctx(file, cmd.id);

        list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                mutex_lock(&mut);
                idr_remove(&ctx_idr, ctx->id);
                mutex_unlock(&mut);
                ...
                mutex_lock(&mut);
                if (!ctx->closing) {
                        mutex_unlock(&mut);
                        rdma_destroy_id(ctx->cm_id);
                ...
                ucma_free_ctx(ctx);
        }

ret = rdma_resolve_addr();
ucma_put_ctx(ctx);

Before idr_remove(), ucma_get_ctx() could still find the ctx
and after rdma_destroy_id(), rdma_resolve_addr() may still
access id_priv pointer. Also, ucma_put_ctx() may use ctx after
ucma_free_ctx() too.

ucma_close() should call ucma_put_ctx() too which tests the
refcnt and waits for the last one releasing it. The similar
pattern is already used by ucma_destroy_id().

Linux commit:
5fe23f262e0548ca7f19fb79f89059a60d087d22

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
Hans Petter Selasky
20fea7ac64 ibcore: Define option to set ack timeout.
Define new option in 'rdma_set_option' to override calculated QP timeout
when requested to provide QP attributes to modify a QP.

At the same time, pack tos_set to be bitfield.

Linux commit:
2c1619edef61a03cb516efaa81750784c3071d10

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00