The current implementation assumes a static mapping between
the TOS bits and the priority code point (PCP) bits.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a loopback address is detected, use the network interface that
has the loopback flag set to trigger the loopback logic in address resolve.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The ib_uverbs_create_ah() and ib_uverbs_modify_qp() calls receive
the port number from user input as part of their attributes and assume
it is valid. Further down the stack, that parameter is used to access
kernel data structures. If the value is invalid, the kernel accesses
memory it should not. To prevent this, verify the port number before
using it.
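The check described above can be sketched roughly as follows. This is a
minimal illustration only: struct sketch_device and sketch_port_is_valid()
are hypothetical names, and the sketch assumes ports are numbered from 1 up
to the device's physical port count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; only the port count matters. */
struct sketch_device {
	uint8_t phys_port_cnt;	/* assumption: ports are numbered 1..phys_port_cnt */
};

/* Return true when a user-supplied port number is safe to use as an index. */
static bool
sketch_port_is_valid(const struct sketch_device *dev, uint32_t port_num)
{
	return (port_num >= 1 && port_num <= dev->phys_port_cnt);
}
```

A caller would reject the request with an error before touching any
per-port data when the helper returns false.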
Linux commit:
5ecce4c9b17bed4dc9cb58bfb10447307569b77b
a62ab66b13a0f9bcb17b7b761f6670941ed5cd62
5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3
MFC after: 1 week
Sponsored by: Mellanox Technologies
RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.
Linux commit:
56d0a7d9a0f045ee27a001762deac28c7d28e2e4
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch fixes the kernel crash that occurs in ib_dealloc_device()
when it is called because the provider driver failed with an error after
ib_alloc_device() but before it could register using ib_register_device().
The crash was seen in the lab and can occur with any IB device
that fails to perform its device initialization before invoking
ib_register_device().
This patch avoids touching cache and port immutable structures if device
is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device() state.
Linux commit:
4be3a4fa51f432ef045546d16f25c68a1ab525b9
MFC after: 1 week
Sponsored by: Mellanox Technologies
Garbage supplied by the user can cause the UCMA module to pass a zero
memory size to memcpy(). Because the size was not checked, this
produces unpredictable results in rdma_resolve_addr().
There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.
Fix this by introducing new variants
int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.
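The idea behind the type-safe variants can be sketched in userspace C as
below. This is not the kernel code: the sketch_ names, the SKETCH_AF_IB
value and the assumed 48-byte size of struct sockaddr_ib are placeholders
chosen for illustration.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumptions for illustration only: a made-up family value and size. */
#define SKETCH_AF_IB		200
#define SKETCH_SOCKADDR_IB_SIZE	48

/* Size implied by the address family, or 0 if the family is unknown. */
static size_t
sketch_addr_size(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return (sizeof(struct sockaddr_in));
	case AF_INET6:
		return (sizeof(struct sockaddr_in6));
	case SKETCH_AF_IB:
		return (SKETCH_SOCKADDR_IB_SIZE);
	default:
		return (0);
	}
}

/* Type-safe variant: return 0 when the size exceeds the buffer type. */
static size_t
sketch_addr_size_in6(const struct sockaddr_in6 *addr)
{
	size_t ret = sketch_addr_size((const struct sockaddr *)addr);

	return (ret <= sizeof(*addr) ? ret : 0);
}
```

A caller treats a return value of 0 as an error and copies nothing, so an
AF_IB family stored in a sockaddr_in6-sized buffer can no longer cause a
copy past the end of that buffer.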
Linux commit:
2975d5de6428ff6d9317e9948f0968f7d42e5d74
09abfe7b5b2f442a85f4c4d59ecf582ad76088d7
84652aefb347297aa08e91e283adf7b18f77c2d5
MFC after: 1 week
Sponsored by: Mellanox Technologies
This was done by auditing all callers of ucma_get_ctx and switching the
ones that unconditionally touch ->device to ucma_get_ctx_dev. This covers
a little less than half of the call sites.
The 11 remaining call sites to ucma_get_ctx() were manually audited.
Linux commit:
4b658d1bbc16605330694bb3ef2570c465ef383d
8b77586bd8fe600d97f922c79f7222c46f37c118
MFC after: 1 week
Sponsored by: Mellanox Technologies
An attempt to modify an XRC_TGT QP from user space (ibv_xsrq_pingpong
invocation) triggers a kernel panic. It is caused by the
fact that such QPs missed uobject initialization.
Linux commit:
f45765872e7aae7b81feb3044aaf9886b21885ef
MFC after: 1 week
Sponsored by: Mellanox Technologies
As part of ib_uverbs_remove_one which might be triggered upon
reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace
application.
If the device was removed after the uverbs fd was opened but before
ib_uverbs_get_context was called, the event file will be accessed
before it was allocated, resulting in a NULL pointer dereference.
Linux commit:
870201f95fcbd19538aef630393fe9d583eff82e
MFC after: 1 week
Sponsored by: Mellanox Technologies
An attempt to join a multicast group without ensuring that the CMA device
exists leads to a crash reported by syzkaller.
Linux commit:
7688f2c3bbf55e52388e37ac5d63ca471a7712e1
MFC after: 1 week
Sponsored by: Mellanox Technologies
Before UCMA commands are accessed, the context should be initialized
and connected to a CM_ID with ucma_create_id(). If the user skips
this step, an invalid ctx without a CM_ID can be provided, causing
multiple NULL dereferences.
Also, there are situations where create_id can race with
other user accesses. Ensure that the context is only shared with
other threads once it is fully initialized to avoid the races.
Linux commit:
e8980d67d6017c8eee8f9c35f782c4bd68e004c9
MFC after: 1 week
Sponsored by: Mellanox Technologies
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.
The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.
Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.
The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().
MFC after: 1 week
Sponsored by: Mellanox Technologies
cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.
Linux commit:
5c181bda77f409d89ad513528eccac5f3a416474
MFC after: 1 week
Sponsored by: Mellanox Technologies
RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding
IPv4 address is encoded into the GID according to the following rule:
GID = ::ffff:<IPv4 address>
Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it
zeroed and change rdma_is_multicast_addr() to consider the new logic.
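A rough userspace sketch of the adjusted multicast test follows. The
function name sketch_is_multicast_gid() is hypothetical, and the sketch
assumes the GID is laid out like a struct in6_addr; it is not the
rdma_is_multicast_addr() implementation itself.

```c
#include <stdbool.h>
#include <netinet/in.h>

/*
 * Decide whether a GID, viewed as an IPv6 address, is multicast.
 * IPv4-mapped GIDs (::ffff:a.b.c.d) are judged by the embedded IPv4
 * address; everything else by the IPv6 multicast prefix ff00::/8.
 */
static bool
sketch_is_multicast_gid(const struct in6_addr *gid)
{
	if (IN6_IS_ADDR_V4MAPPED(gid)) {
		/* IPv4 multicast is 224.0.0.0/4: top nibble of first octet is 0xe */
		return ((gid->s6_addr[12] & 0xf0) == 0xe0);
	}
	return (gid->s6_addr[0] == 0xff);
}
```

With the 0xff0e prefix gone, the first byte of an IPv4-based GID is zero,
so the embedded IPv4 address is the only place left to look.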
Linux commit:
be1d325a335840a86c133a56c6a911c368bac0fd
1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec
MFC after: 1 week
Sponsored by: Mellanox Technologies
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).
Add check to verify that the MLID value is in the correct address
range.
RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a
multicast group, if the QP is associated with a RoCE port, the
multicast group MLID is unused and is ignored.
During attach or detach multicast, when the QP is associated with a
port, it is enough to check the port's link layer and validate the
LID only if it is Infiniband. Otherwise, avoid validating the
multicast LID.
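The link-layer dependent check can be sketched as below. The enum and
function names are hypothetical. The range used is the InfiniBand
multicast LID range, 0xC000 through 0xFFFE, with 0xFFFF being the
permissive LID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-layer tag for the port the QP is associated with. */
enum sketch_link_layer {
	SKETCH_LL_INFINIBAND,
	SKETCH_LL_ETHERNET
};

/* Validate the MLID only on InfiniBand ports; RoCE ports ignore it. */
static bool
sketch_mlid_is_valid(enum sketch_link_layer ll, uint16_t lid)
{
	if (ll != SKETCH_LL_INFINIBAND)
		return (true);
	/* multicast LIDs start at 0xC000; 0xFFFF is the permissive LID */
	return (lid >= 0xC000 && lid != 0xFFFF);
}
```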
Linux commit:
8561eae60ff9417a50fa1fb2b83ae950dc5c1e21
5236333592244557a19694a51337df6ac018f0a7
MFC after: 1 week
Sponsored by: Mellanox Technologies
Implement a more generic solution for detecting loopback.
The problem was that the default netdevice was resolved
for loopback also when VLAN was used. Use real network
device instead of loopback device for bound device
interface.
How to test:
ucmatose -b 127.0.0.1 -p 20090
ucmatose -s 5.6.5.1 -p 20090
Note that RDMA treats the IPv4 and IPv6 loopback
addresses like any address.
MFC after: 1 week
Sponsored by: Mellanox Technologies
A list of MGID/MLID pairs is built when doing a multicast attach. When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.
If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned. Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.
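The corrected detach flow can be sketched as follows. All names are
hypothetical, the list is a plain singly linked list rather than the
kernel's list type, and the choice of -EINVAL as the error is an
assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* One attached MGID/MLID pair; nodes are owned by the caller here. */
struct sketch_mcast {
	struct sketch_mcast *next;
	uint8_t gid[16];
	uint16_t lid;
};

/* Hypothetical driver hook; counts calls so the effect is observable. */
static int sketch_driver_calls;

static int
sketch_driver_detach(const uint8_t gid[16], uint16_t lid)
{
	(void)gid;
	(void)lid;
	sketch_driver_calls++;
	return (0);
}

/* Call the driver only if the pair was found and unlinked from the list. */
static int
sketch_detach(struct sketch_mcast **head, const uint8_t gid[16], uint16_t lid)
{
	struct sketch_mcast **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		struct sketch_mcast *m = *pp;

		if (m->lid == lid && memcmp(m->gid, gid, sizeof(m->gid)) == 0) {
			*pp = m->next;	/* unlink first, then tell the driver */
			m->next = NULL;
			return (sketch_driver_detach(gid, lid));
		}
	}
	return (-EINVAL);	/* not on the list: driver is left untouched */
}
```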
Linux commit:
20c7840a77ddcb2ed2fbd66e8197db2868495751
MFC after: 1 week
Sponsored by: Mellanox Technologies
When two handlers used the same object in the old schema, we blocked
the process in the kernel. The new schema just returns -EBUSY. This
could lead to different behaviour in applications between the old
schema and the new schema. In most cases, using such handlers
concurrently could lead to crashing the process. For example, if
thread A destroys a QP and thread B modifies it, the destruction
could happen before the modification. In this case, we would be
accessing freed memory, which could lead to crashing the process.
This is true for most cases. However, attaching and detaching
a multicast address from QP concurrently is safe. Therefore, we
preserve the original behaviour by adding a lock there.
Linux commit:
f48b726920d96dcd1860df06143bdea7d6d7dcc3
MFC after: 1 week
Sponsored by: Mellanox Technologies
When resolving an IP address in ibcore, only update the source address
upon normal completion. The ibcore address resolve function does not
care about the scope ID value of the IPv6 link-local addresses and expects
this information to have already been extracted into the bound_dev_if field.
Because the same IPv6 link-local address can exist on multiple interfaces,
the ibcore address resolver gets confused and returns ENETUNREACH.
Instead of updating both source address and bound_dev_if just keep the
address set to any address until resolving completes. For the sake of code
symmetry a similar change has been applied to the IPv4 address resolve path.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When setting a large address resolve timeout it was observed that the
address resolving would succeed at the timeout and not when the address
was available. Make sure the address resolving requests are processed
at least once every second.
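One way to picture the change is a clamp on the delay before the next
processing pass. This is only a sketch: sketch_next_delay() is a
hypothetical helper, hz stands for the number of ticks per second, and
the lower bound of one tick is an added assumption.

```c
/*
 * Return the number of ticks to wait before the request queue is
 * processed again: never longer than one second (hz ticks), and at
 * least one tick so the work is always rescheduled in the future.
 */
static int
sketch_next_delay(int remaining_ticks, int hz)
{
	if (remaining_ticks < 1)
		return (1);
	return (remaining_ticks > hz ? hz : remaining_ticks);
}
```

With a 5 second timeout and hz of 1000, the queue is revisited after 1000
ticks instead of 5000, so an address that becomes available early is
noticed within about a second.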
While at it use "int" for jiffies instead of "unsigned long" to match
FreeBSD ticks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The original implementation used a reference to ifr_data and a cast to
do the equivalent of accessing ifr_addr. This was copied multiple
times since 1996.
Approved by: kib
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14873
When exchanging CM messages the IPv6 scope ID should be ignored
for link local addresses when doing comparisons. Make sure the
scope ID is always set to zero for link local addresses.
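A minimal userspace sketch of the normalization, assuming the address is
held in a struct sockaddr_in6; sketch_clear_scope_id() is a hypothetical
name, not a function from the change itself.

```c
#include <netinet/in.h>

/* Zero the scope ID of link-local (fe80::/10) addresses before comparing. */
static void
sketch_clear_scope_id(struct sockaddr_in6 *sin6)
{
	if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		sin6->sin6_scope_id = 0;
}
```

After this step two link-local addresses that differ only in scope ID
compare equal, which is the behaviour the CM message exchange expects.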
MFC after: 1 week
Sponsored by: Mellanox Technologies
It is not enough to cancel delayed work structures before freeing.
Always cancel delayed work synchronously before freeing!
MFC after: 1 week
Sponsored by: Mellanox Technologies
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into an L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example, information like the VLAN ID and scope ID is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.
Background:
On a per infiniband port basis, the GID is not a unique identifier!
The assumption that it is falls apart when the VLAN ID, IPv6 scope ID and
RoCE type, as supported by RoCE v2, are taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.
Different VLANs are allowed to define the same IPv4 addresses. This is
especially true for the default IPv6 link-local addresses and when using
so-called containers or jails.
The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.
Consequently, old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid(), are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Otherwise a
NULL pointer access may happen.
MFC after: 1 week
Sponsored by: Mellanox Technologies
VLANs in ibcore.
IPv6 link local addresses are usually derived from the netdev MAC
address. This applies to a VLAN device and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.
Now that GIDs based on link local IPv6 addresses are supported, allow
adding such GID entries to the GID table.
MFC after: 1 week
Sponsored by: Mellanox Technologies
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infiniband upgrade.
Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826
Sponsored by: Mellanox Technologies
Mainly focus on files that use the BSD 2-Clause license. However, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
No functional change intended.