The mistake came about like this: the first attempt to commit was blocked by
a pre-commit hook due to missing SVN tags. svn revert doesn't delete new
files, I guess. While reapplying the fixed diff, the non-empty target file
was just concatenated with the new contents? Ugh. :-(
This regression was introduced in the r326169 Linux v4.9 Infiniband upgrade.
Restore the functionality.
Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21298
This allows use of the shared _net_inet_sdp in more than one compilation
unit. (Nothing in-tree uses this today, but some of Isilon's out-of-tree
SDP enhancements add sysctls below the node.)
Sponsored by: Dell EMC Isilon
In current RDMACM implementation RDMACM server will not find a GID
index when the request was prio-tagged and the sever is non
prio-tagged and vise-versa.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should
be considered as untagged. Treat RDMACM request the same.
Reviewed by: hselasky, kib
MFC after: 3 Days
Sponsored by: Mellanox Technologies
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility. Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.
(These CUs did not show up as missing eventhandler.h in tinderbox.)
X-MFC-With: r347984
Add the new rates that were added to the Infiniband specification as part of
HDR and 2x support.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
ib_req_notify_cq may return negative value which will indicate a
failure. In the case of uncorrectable error, we will end up in an
endless loop. Fix that, by going to another loop with poll_more
only if there is anything left to poll.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
As it does for recv*(2), MSG_DONTWAIT indicates that the call should
not block, returning EAGAIN instead. Linux and OpenBSD both implement
this, so the change makes porting easier, especially since we do not
return EINVAL or so when unrecognized flags are specified.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: tuexen
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18728
Modify QP can fail and it can be acceptable, like when moving from RST to
ERR state, all the rest are not acceptable and a message to the log
should be printed.
The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.
Linux commit:
5dc78ad1904db597bdb4427f3ead437aae86f54c
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a packet needs fragmentation, it might generate more than 3 fragments.
With the queue length 3, all fragments are generated faster than the
queue is drained, which effectively drops fourth and later fragments on
the floor.
Submitted by: kib@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
When changing the MTU of ibX network interfaces, check that the MTU was really
changed before requesting an update of the multicast rules. Else we might go
into an infinite loop joining and leaving ibX multicast groups towards the
opensm master interface.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
It is not enough to set ifnet->if_mtu to change the interface MTU.
System saves the MTU for route in the radix tree, and route cache keeps
the interface MTU as well. Since addition of the multicast group causes
recalculation of MTU, even bringing the interface up changes MTU from
4042 to 1500, which makes the system configuration inconsistent. Worse,
ip_output() prefers route MTU over interface MTU, so large packets are
not fragmented and dropped on floor.
Fix it for ipoib(4) using the same approach (or hack) as was applied
for it_tun/if_tap in r339012. Thanks to bz@ for giving the hint.
Submitted by: kib@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally, else link local addresses won't work.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
A couple of places in the CM do
spin_lock_irq(&cm_id_priv->lock);
...
if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is
cm_alloc_response_msg() ->
ib_create_ah_from_wc() ->
ib_init_ah_from_wc() ->
rdma_addr_find_l2_eth_by_grh() ->
rdma_resolve_ip()
and rdma_resolve_ip() starts out by doing
req = kzalloc(sizeof *req, GFP_KERNEL);
not to mention rdma_addr_find_l2_eth_by_grh() doing
wait_for_completion(&ctx.comp);
to wait for the task that rdma_resolve_ip() queues up.
Fix this by moving the AH creation out of the lock.
Linux commit:
c76161181193985087cd716fdf69b5cb6cf9ee85
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Trying to validate loopback fails because rtalloc1() resolves system
local addresses to the loopback network interface, lo0. Fix this by
explicitly checking for loopback during validation of the source
and destination network address. If the source address belongs to
a local network interface and is equal to the destination address,
there is no need to run the destination address through rtalloc1().
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
The master network interface and the VLANs may reside in different VNETs.
Make sure that all VNETs are searched when scanning for GID entries.
Submitted by: netapp
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
This prevents code from accepting RoCEv1 connections when
only ROCEv2 is enabled and vice versa.
Linux commit:
0c4386ec77cfcd0ccbdbe8c2e67dd3a49b2a4c7f
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:
Overrunning array class->method_table of 80 8-byte elements at element index 127
(byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class)
(which evaluates to 127).
Linux commit:
2fe2f378dd45847d2643638c07a7658822087836
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
The port number in the listen_id_priv has been observed to be zero which
means no port has been selected. The current code lacks a check for invalid
port number.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland. Fix a number of such handlers.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: tuexen
MFC after: 3 days
Security: kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18301
For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.
Now that we have the GID index available, introduce SGID index in
incoming CM requests and refer to the netdevice of it.
While at it fix some incorrect uses of init_net and make sure
the rdma_create_id() function stores the VNET it is passed.
Based on linux commit:
cee104334c98dd04e9dd4d9a4fa4784f7f6aada9
MFC after: 3 days
Approved by: re (gjb)
Sponsored by: Mellanox Technologies
Also fix the validate_ipv4_net_dev() and validate_ipv6_net_dev() functions
which had source and destination addresses swapped, and didn't set the
scope ID for IPv6 link-local addresses.
This allows applications like krping to work using IPoIB devices.
MFC after: 3 days
Approved by: re (gjb)
Sponsored by: Mellanox Technologies
In r337943 ifnet's if_pcp was set to the PCP value in use
instead of IFNET_PCP_NONE.
Current ibcore code assumes that if_pcp is IFNET_PCP_NONE with
VLAN interfaces so it can identify prio-tagged traffic.
Fix that by explicitly verifying that that the if_type is IFT_ETHER
and not IFT_L2VLAN.
MFC after: 3 days
Approved by: re (Marius), hselasky (mentor), kib (mentor)
Sponsored by: Mellanox Technologies
Else a NULL VNET pointer should be ignored. This fixes address resolving
when VIMAGE is disabled.
MFC after: 3 days
Sponsored by: Mellanox Technologies
When creating address handle from multicast GID, set MAC according to
the appropriate formula instead of searching for it in the GID table:
- For IPv4 multicast GID use ip_eth_mc_map().
- For IPv6 multicast GID use ipv6_eth_mc_map().
Linux commit:
9636a56fa864464896bf7d1272c701f2b9a57737
MFC after: 1 week
Sponsored by: Mellanox Technologies
The return status of ib_init_ah_from_mcmember() is ignored by
cma_ib_mc_handler(). Honor it and return error event if ah attribute
initialization failed.
Linux commit:
6d337179f28cc50ddd7e224f677b4cda70b275fc
MFC after: 1 week
Sponsored by: Mellanox Technologies
ah_attr contains the port number to which cm_id is bound. However, while
searching for GID table for matching GID entry, the port number is
ignored.
This could cause the wrong GID to be used when the ah_attr is converted to
an AH.
Linux commit:
563c4ba3bd2b8b0b21c65669ec2226b1cfa1138b
MFC after: 1 week
Sponsored by: Mellanox Technologies
The current implementation assumes a static mapping between
the TOS bits and the priority code point, PCP bits.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When a loopback address is detected use the network interface which
has the loopback flag set to trigger loopback logic in address resolve.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The ib_uverbs_create_ah() ind ib_uverbs_modify_qp() calls receive
the port number from user input as part of its attributes and assumes
it is valid. Down on the stack, that parameter is used to access kernel
data structures. If the value is invalid, the kernel accesses memory
it should not. To prevent this, verify the port number before using it.
Linux commit:
5ecce4c9b17bed4dc9cb58bfb10447307569b77b
a62ab66b13a0f9bcb17b7b761f6670941ed5cd62
5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3
MFC after: 1 week
Sponsored by: Mellanox Technologies
RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.
Linux commit:
56d0a7d9a0f045ee27a001762deac28c7d28e2e4
MFC after: 1 week
Sponsored by: Mellanox Technologies
This patch fixes the kernel crash that occurs during ib_dealloc_device()
called due to provider driver fails with an error after
ib_alloc_device() and before it can register using ib_register_device().
This crashed seen in tha lab as below which can occur with any IB device
which fails to perform its device initialization before invoking
ib_register_device().
This patch avoids touching cache and port immutable structures if device
is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device() state.
Linux commit:
4be3a4fa51f432ef045546d16f25c68a1ab525b9
MFC after: 1 week
Sponsored by: Mellanox Technologies