Overview:
This is the first stage of a RDMA stack upgrade introducing kernel
changes only based on Linux 5.7-rc1.
This patch is based on about four main areas of work:
- Update of the IB uobjects system:
- The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
is now managed by ibcore. This also require some changes in the
kernel verbs API. The updated verbs changes are typically about
initialize and deinitialize objects, and remove allocation and
free of memory.
- Update of the uverbs IOCTL framework:
- The parsing and handling of user-space commands has been
completely refactored to integrate with the updated IB uobjects
system.
- Various changes and updates to the generic uverbs interfaces in
device drivers including the new uAPI surface.
- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.
Dependencies:
- The mlx4ib driver code has been updated with the minimum changes
needed.
- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.
Compatibility:
- All user-space facing APIs are backwards compatible after this
change.
- All kernel-space facing RDMA APIs are backwards compatible after
this change, with exception of ib_create_ah() and ib_destroy_ah()
which takes a new flag.
- The "ib_device_ops" structure exist, but only contains the driver ID
and some structure sizes.
Differences from Linux:
- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
the sizes needed for allocating various IB objects, when adding
IB device instances.
Security:
- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.
Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429
MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31149
Sponsored by: NVIDIA Networking
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Use the correct SGL limit within iw_cxgbe, firmwares >= 1.25.6.0 support
upto 512 entries per MR.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
Remove unused #includes of LinuxKPI headers noticed while trying to
solve LinuxKPI struct net_device and related functions.
Neither netdevice.h nor inetdevice.h nor notifier.h seem to be needed.
This takes cxgbe(4) out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: np
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29432
fibX_lookup_nh_ext().
fibX_lookup_nh_ represents pre-epoch generation of fib kpi,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D24975
as the dma_device during RDMA registration.
cxgbe's struct device cannot be used as-is because it's a native FreeBSD
driver and ibcore is LinuxKPI based.
MFC after: 1 week
MFC after: r360196
This fixes a panic that would occur when the timer tried to close a
stale socket.
Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
than 128MB, which is the maximum supported by the hardware in RDMA mode.
Obtained from: Chelsio Communications
MFC after: 3 days
Sponsored by: Chelsio Communications
Remove now-redundant items from toepcb and synq_entry and the code to
support them.
Let the driver calculate tx_align, rx_coalesce, and sndbuf by default.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21387
properly in a couple of places in the driver.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Approved by: re@ (rgrimes@)
Sponsored by: Chelsio Communications
There used to be one control queue per adapter (the mgmtq) that was
initialized during adapter init and one per port that was initialized
later during port init. This change moves all the control queues (one
per port/channel) to the adapter so that they are initialized during
adapter init and are available before any port is up. This allows the
driver to issue ctrlq work requests over any channel without having to
bring up any port.
MFH: 2 weeks
Sponsored by: Chelsio Communications
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore
Reviewed by: np @
Differential Revision: https://reviews.freebsd.org/D12563
Sponsored by: Mellanox Technologies
MFC after: 3 days
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
These firmwares come from a pre-release snapshot. The final firmwares
in this Chelsio release cycle will likely be .61.0 or later and those
will be the next "long lived" firmwares in FreeBSD head and stable
branches. .59 is being provided in head (only) for wider test exposure.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications