Commit Graph

414 Commits

Author SHA1 Message Date
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Hans Petter Selasky
ae5b45c86e Make sure the VNET is properly set when reaping mbufs in ipoib.
Else the following panic may happen:

panic()
icmp_error()
ipoib_cm_mb_reap()
linux_work_fn()
taskqueue_run_locked()
taskqueue_thread_loop()
fork_exit()
fork_trampoline()

Submitted by:	Andreas Kempe <kempe@lysator.liu.se>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-11 12:02:16 +00:00
Hans Petter Selasky
9220357857 Prevent potential underflow in ibcore.
Linux commit:
a9018adfde809d44e71189b984fa61cc89682b5e

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-15 11:46:53 +00:00
Hans Petter Selasky
ae9a8ec99f Correct MR length field to be 64-bit in ibcore.
Linux commit:
edd31551148c09608feee6b8756ad148d550ee3b

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-15 11:45:14 +00:00
Hans Petter Selasky
758a35d0bc VLAN_TRUNKDEV() requires epochification in ibcore after r353292.
Sponsored by:	Mellanox Technologies
2019-10-16 08:56:07 +00:00
Hans Petter Selasky
052bc06c25 Replace rdma_is_upper_dev_rcu() with rdma_vlan_dev_real_dev() in ibcore.
This reduces the number of references to VLAN_TRUNKDEV() in ibcore.
Currently only VLAN is supported as a child interface in FreeBSD.
Remove superfluous RCU locking.

Sponsored by:	Mellanox Technologies
2019-10-16 08:55:29 +00:00
Hans Petter Selasky
8232fd4dcd VLAN_DEVAT() requires epochification in ipoib after r353292.
Sponsored by:	Mellanox Technologies
2019-10-16 08:40:58 +00:00
Hans Petter Selasky
06656d75b1 Fix missing epochification of the ibcore code after r353292.
Sponsored by:	Mellanox Technologies
2019-10-15 11:12:31 +00:00
Hans Petter Selasky
f570a1bd09 Fix missing epochification of the ipoib code after r353292.
Sponsored by:	Mellanox Technologies
2019-10-15 11:11:21 +00:00
Gleb Smirnoff
f95dbf26d3 Convert to if_foreach_llmaddr() KPI.
Reviewed by:	hselasky
2019-10-14 20:22:25 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Hans Petter Selasky
6fe20cefa3 Make sure the transmit loop doesn't get starved in ipoib.
When the software send queue gets filled up, callbacks to
if_transmit will stop. Make sure the transmit callback
routine checks the send queue and outputs any remaining
mbufs. Else the remaining mbufs may simply sit in the
output queue blocking the transmit path.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-10-02 09:06:13 +00:00
Conrad Meyer
f49e79b56b OFED: Fix accidental double-copy of rdma_sdp.h in r351176
The mistake came about like this: the first attempt to commit was blocked by
a pre-commit hook due to missing SVN tags.  svn revert doesn't delete new
files, I guess.  While reapplying the fixed diff, the non-empty target file
was just concatenated with the new contents?  Ugh. :-(
2019-08-18 04:19:41 +00:00
Conrad Meyer
6b2f017186 OFED: Unbreak SDP support in ibcore
This regression was introduced in the r326169 Linux v4.9 Infiniband upgrade.
Restore the functionality.

Reviewed by:	hselasky
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21298
2019-08-17 18:54:07 +00:00
Conrad Meyer
1c334042f9 SDP: Fix brain-o from r351162
Lost in translation between different SDP stacks.

Reported by:	hselasky
2019-08-17 10:11:34 +00:00
Conrad Meyer
14f19c6b73 OFED: Fix ib_mad.h ib_user_mad.h include to match new uapi path
Sponsored by:	Dell EMC Isilon
2019-08-17 03:09:03 +00:00
Conrad Meyer
bab54619e9 SDP: Add a dbg() on QP events
Sponsored by:	Dell EMC Isilon
2019-08-17 03:07:41 +00:00
Conrad Meyer
1ac512e80b SDP: Also log a nice status string in RX WC error dbg()
Sponsored by:	Dell EMC Isilon
2019-08-17 03:06:46 +00:00
Conrad Meyer
92d90f6142 SDP: Include nice string names for raw event numbers in a dbg()
Sponsored by:	Dell EMC Isilon
2019-08-17 03:05:09 +00:00
Conrad Meyer
6669d5459b SDP: SYSCTL_DECL SDP-wide sysctl node in header
This allows use of the shared _net_inet_sdp in more than one compilation
unit.  (Nothing in-tree uses this today, but some of Isilon's out-of-tree
SDP enhancements add sysctls below the node.)

Sponsored by:	Dell EMC Isilon
2019-08-17 03:03:26 +00:00
Slava Shwartsman
bb43866c38 Fix prio vs. nonprio tagged traffic in RDMACM
In current RDMACM implementation RDMACM server will not find a GID
index when the request was prio-tagged and the sever is non
prio-tagged and vise-versa.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should
be considered as untagged. Treat RDMACM request the same.

Reviewed by:    hselasky, kib
MFC after:      3 Days
Sponsored by:   Mellanox Technologies
2019-06-04 06:21:31 +00:00
Conrad Meyer
e12be3218a Include eventhandler.h in more compilation units
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility.  Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.

(These CUs did not show up as missing eventhandler.h in tinderbox.)

X-MFC-With:	r347984
2019-05-21 01:18:43 +00:00
Hans Petter Selasky
013f1e1435 Add new rates to ibcore.
Add the new rates that were added to the Infiniband specification as part of
HDR and 2x support.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:55:47 +00:00
Hans Petter Selasky
afbe61616f Handle IB_EVENT_DEVICE_FATAL event in ipoib.
Perform flush if IB_EVENT_DEVICE_FATAL was received.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:51:49 +00:00
Hans Petter Selasky
cb678cb911 Fix endless loop in ipoib_poll().
ib_req_notify_cq may return negative value which will indicate a
failure. In the case of uncorrectable error, we will end up in an
endless loop. Fix that, by going to another loop with poll_more
only if there is anything left to poll.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:42:05 +00:00
Hans Petter Selasky
38f38e9fda Make sure to error out when arming the CQ fails in ibcore.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:45 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Mark Johnston
2f2ddd68a5 Support MSG_DONTWAIT in send*(2).
As it does for recv*(2), MSG_DONTWAIT indicates that the call should
not block, returning EAGAIN instead.  Linux and OpenBSD both implement
this, so the change makes porting easier, especially since we do not
return EINVAL or so when unrecognized flags are specified.

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	tuexen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18728
2019-01-04 17:31:50 +00:00
Slava Shwartsman
186058782a ipoib: Notify on modify QP failure only when relevant
Modify QP can fail and it can be acceptable, like when moving from RST to
ERR state, all the rest are not acceptable and a message to the log
should be printed.

The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.

Linux commit:
5dc78ad1904db597bdb4427f3ead437aae86f54c

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:27:17 +00:00
Slava Shwartsman
a061f0eb65 ipoib: increase the non-cm queue length
When a packet needs fragmentation, it might generate more than 3 fragments.
With the queue length 3, all fragments are generated faster than the
queue is drained, which effectively drops fourth and later fragments on
the floor.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:26:47 +00:00
Slava Shwartsman
d705eff259 ipoib: Don't do a light flush when MTU is unchanged.
When changing the MTU of ibX network interfaces, check that the MTU was really
changed before requesting an update of the multicast rules. Else we might go
into an infinite loop joining and leaving ibX multicast groups towards the
opensm master interface.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:26:17 +00:00
Slava Shwartsman
099ad46e81 ipoib: correct setting MTU from inside ipoib(4).
It is not enough to set ifnet->if_mtu to change the interface MTU.
System saves the MTU for route in the radix tree, and route cache keeps
the interface MTU as well. Since addition of the multicast group causes
recalculation of MTU, even bringing the interface up changes MTU from
4042 to 1500, which makes the system configuration inconsistent. Worse,
ip_output() prefers route MTU over interface MTU, so large packets are
not fragmented and dropped on floor.

Fix it for ipoib(4) using the same approach (or hack) as was applied
for it_tun/if_tap in r339012.  Thanks to bz@ for giving the hint.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:25:47 +00:00
Slava Shwartsman
e13619b68b ibcore: Fix clearing of bound device interface.
Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally, else link local addresses won't work.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:25:13 +00:00
Slava Shwartsman
a9c20af23d ibcore: ip6_dev_find() needs to know the scope ID.
Else the wrong network device can be returned for link-local addresses.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:43 +00:00
Slava Shwartsman
5ace00dffe ibcore: Fix sleeping in atomic when RoCE is used
A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Linux commit:
c76161181193985087cd716fdf69b5cb6cf9ee85

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:12 +00:00
Slava Shwartsman
0231669450 ibcore: Add missing unref of netdevice.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:23:44 +00:00
Slava Shwartsman
ae8534ca28 ibcore: Fix loopback with rdma-cm.
Trying to validate loopback fails because rtalloc1() resolves system
local addresses to the loopback network interface, lo0. Fix this by
explicitly checking for loopback during validation of the source
and destination network address. If the source address belongs to
a local network interface and is equal to the destination address,
there is no need to run the destination address through rtalloc1().

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:23:14 +00:00
Slava Shwartsman
f3cf3b7e84 ibcore: Make sure all VNETs are scanned for VLAN interfaces.
The master network interface and the VLANs may reside in different VNETs.
Make sure that all VNETs are searched when scanning for GID entries.

Submitted by:   netapp
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:22:43 +00:00
Slava Shwartsman
af60974508 ibcore: Always check return value from ib_init_ah_from_wc().
This prevents code from accepting RoCEv1 connections when
only ROCEv2 is enabled and vice versa.

Linux commit:
0c4386ec77cfcd0ccbdbe8c2e67dd3a49b2a4c7f

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:22:07 +00:00
Slava Shwartsman
9fc9810098 ibcore: Add missing check for failure.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:21:20 +00:00
Slava Shwartsman
33d7f9b8fb ibcore: Fix an array index check
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127
(byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class)
(which evaluates to 127).

Linux commit:
2fe2f378dd45847d2643638c07a7658822087836

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:20:51 +00:00
Slava Shwartsman
fe521b4443 ibcore: Check ib_find_pkey() return value.
Linux commit:
d3a2418ee36a59bc02e9d454723f3175dcf4bfd9

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:20:22 +00:00
Slava Shwartsman
4aa5230dc5 ibcore: Add support for IB_SPEED_HDR in sysfs rate printout.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:19:52 +00:00
Slava Shwartsman
475c8de7bf ibcore: Don't access invalid port.
The port number in the listen_id_priv has been observed to be zero which
means no port has been selected. The current code lacks a check for invalid
port number.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:19:21 +00:00
Slava Shwartsman
4b9b52a1bd ibcore: Discard unused error codes.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:18:50 +00:00
Slava Shwartsman
f628150b6c ibcore: Make sure GID index variable gets initialized.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:18:20 +00:00
Mark Johnston
79db6fe7aa Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
Hans Petter Selasky
7877f59352 Introduce and use sgid_index in CM requests in ibcore.
For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.

Now that we have the GID index available, introduce SGID index in
incoming CM requests and refer to the netdevice of it.

While at it fix some incorrect uses of init_net and make sure
the rdma_create_id() function stores the VNET it is passed.

Based on linux commit:
cee104334c98dd04e9dd4d9a4fa4784f7f6aada9

MFC after:		3 days
Approved by:		re (gjb)
Sponsored by:		Mellanox Technologies
2018-09-09 07:20:15 +00:00
Hans Petter Selasky
86156495f4 Implement get network interface by params function in ipoib.
Also fix the validate_ipv4_net_dev() and validate_ipv6_net_dev() functions
which had source and destination addresses swapped, and didn't set the
scope ID for IPv6 link-local addresses.

This allows applications like krping to work using IPoIB devices.

MFC after:		3 days
Approved by:		re (gjb)
Sponsored by:		Mellanox Technologies
2018-09-07 18:05:09 +00:00
Slava Shwartsman
b4df6efb4e ibcore: Fix endless loop in searching for matching VLAN device
In r337943 ifnet's if_pcp was set to the PCP value in use
instead of IFNET_PCP_NONE.
Current ibcore code assumes that if_pcp is IFNET_PCP_NONE with
VLAN interfaces so it can identify prio-tagged traffic.
Fix that by explicitly verifying that that the if_type is IFT_ETHER
and not IFT_L2VLAN.

MFC after:      3 days
Approved by:    re (Marius), hselasky (mentor), kib (mentor)
Sponsored by:   Mellanox Technologies
2018-09-06 12:26:57 +00:00