Commit Graph

287 Commits

Author SHA1 Message Date
hselasky
e949bc6053 Make sure the ib_wr_opcode enum is signed by adding a negative dummy element.
Different compilers may optimise the enum type in different ways. This ensures
coherency when range checking the value of enums in ibcore.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:51:37 +00:00
hselasky
e78622921e Make sure a valid VNET is set before trying to access the V_ip6_v6only
variable. Access the variable directly instead of going through the sysctl()
interface in the kernel.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:43:35 +00:00
hselasky
4226738767 Mark ipoib device as initialized on device open.
Set the IPOIB_FLAG_INITIALIZED on dev_open and clear it on dev_stop to
avoid a race between ipoib load and the underlying device driver.

The device module must dispatch the IB_EVENT_PORT_ACTIVE event before ipoib
module is loaded. Otherwise, the flush will fail since no one set the
IPOIB_FLAG_INITIALIZED.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 08:58:42 +00:00
hselasky
fd6624a0c7 Make sure sin_zero is zero in ibcore. Else socket address maching using
bcmp() might fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:30:10 +00:00
hselasky
7c33ef4c3c Make sure the IPv6 scope ID gets zeroed when exchanging CMA messages in ibcore.
Else the IPv6 address matching might fail. This change adds support for both
embedded and non-embedded IPv6 scope IDs when passing a IPv6 link-local socket
address to RDMA. Prior to this change only global IPv6 addresses would work
with RDMA.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:27:29 +00:00
hselasky
e56611297b Multiple fixes for using IPv6 link-local addresses with RDMA in ibcore.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback
device, lo0, as the gateway interface. Currently RDMA loopback is
not supported.

2) Use ip_dev_find() and ip6_dev_find() to lookup network interfaces
with matching IPv4 and IPv6 addresses, respectivly.

3) In addr_resolve() make sure the "ifa" pointer is always set, also when
the "ifp" is NULL. Else a NULL pointer access might happen trying to
read from the "ifa" pointer later on.

4) In rdma_addr_find_dmac_by_grh() make sure the "bound_dev_if" field
gets set properly instead of passing the scope ID through the IPv6
socket address structure. This is more in line with upstream OFED
in Linux.

5) In rdma_addr_find_smac_by_sgid() there is no need to pass the
scope ID for IPv6. Either it is stored in the "bound_dev_if" field
or ip6_dev_find() will find the correct network device regardless
of the scope ID.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:22:43 +00:00
hselasky
9f6549f2c3 The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore

Reviewed by:		np @
Differential Revision:	https://reviews.freebsd.org/D12563
Sponsored by:		Mellanox Technologies
MFC after:		3 days
2017-10-20 08:20:15 +00:00
hselasky
15d7b05109 Make sure the IPv6 scope ID gets zeroed inside the GID. Else searching for a
valid GID entry based on IPv6 addresses can fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-10-10 12:36:41 +00:00
hselasky
3ef9976c11 Remove unsafe access to the LinuxKPI file structure from ibcore.
selwakeup() is now done by the wake_up() family of functions.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 06:34:20 +00:00
markj
04598ed48c Fix indentation.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-09-07 19:15:31 +00:00
hselasky
8b14081d49 Change reject message type when destroying cm_id in ibore.
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.

Obtained from:	Linux
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:31:10 +00:00
hselasky
67b8c14d85 Ticks are 32-bit in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:18:25 +00:00
markj
c8d732dce1 Avoid including list.h in LinuxKPI headers.
list.h includes a number of FreeBSD headers as a workaround for the
LIST_HEAD name collision. To reduce pollution, avoid including list.h
in commonly used headers when it is not explicitly needed.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11249
2017-06-18 16:43:57 +00:00
markj
bacd054b8a Fix indentation.
MFC after:	1 week
2017-06-14 16:55:23 +00:00
glebius
e35d543ec1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
glebius
5763443023 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
hselasky
316a5c4d7f Add full VNET support to the inet_get_local_port_range() function in
the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-22 15:46:31 +00:00
glebius
5aec73cf18 Make sdp compilable after r315662. 2017-03-21 09:07:05 +00:00
hselasky
f9c6d3f8a5 Add basic support for VIMAGE to the LinuxKPI and ibcore.
Support is implemented by mapping Linux's "struct net" into FreeBSD's
"struct vnet". Currently only vnet0 is supported by ibcore.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-16 09:59:35 +00:00
imp
7e6cabd06e Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
np
30189d73e6 cxgbe/iw_cxgbe: fix various double-close panics with iWARP sockets.
Sockets representing the TCP endpoints for iWARP connections are
allocated by the ibcore module.  Before this revision they were closed
either by the ibcore module or the iw_cxgbe hardware driver depending on
the state transitions during connection teardown.  This is error prone
and there were cases where both iw_cxgbe and ibcore closed the socket
leading to double-free panics.  The fix is to let ibcore close the
sockets it creates and never do it in the driver.

- Use sodisconnect instead of soclose (preceded by solinger = 0) in the
  driver to tear down an RDMA connection abruptly.  This does what's
  intended without releasing the socket's fd reference.

- Close the socket in ibcore when the iWARP iw_cm_id is destroyed.  This
  works for all kinds of sockets: clients that initiate connections,
  listeners, and sockets accepted off of listeners.

Reviewed by:	Steve Wise @ Open Grid Computing, hselasky@
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D9796
2017-02-28 19:27:41 +00:00
np
4d1144ae43 Avoid NULL dereference in a couple of sysctl handlers in ibcore.
iw_cxgbe sets ib_device->dma_device to NULL (since r311880).

Reviewed by:	hselasky@
Sponsored by:	Chelsio Communications
2017-02-23 07:48:58 +00:00
hselasky
5e41da7ccd Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4
like other PCI network drivers. The sys/ofed directory is now mainly
reserved for generic infiniband code, with exception of the mthca driver.

- Add new manual page, mlx4en(4), describing how to configure and load
mlx4en.

- All relevant driver C-files are now prefixed mlx4, mlx4_en and
mlx4_ib respectivly to avoid object filename collisions when compiling
the kernel. This also fixes an issue with proper dependency file
generation for the C-files in question.

- Device mlxen is now device mlx4en and depends on device mlx4, see
mlx4en(4). Only the network device name remains unchanged.

- The mlx4 and mlx4en modules are now built by default on i386 and
amd64 targets. Only building the mlx4ib module depends on
WITH_OFED=YES .

Sponsored by:	Mellanox Technologies
2016-09-30 08:23:06 +00:00
hselasky
c7bc609204 Set hardware stats flag to avoid double counting the number of incoming bytes.
Found by:	Ben RUBSON <ben.rubson@gmail.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-09-29 16:36:32 +00:00
np
c31c1ab37f Do not free an uninitialized pointer on soaccept failure in the iWARP
connection manager.

Sponsored by:	Chelsio Communications
2016-08-26 08:25:28 +00:00
hselasky
814240b454 Add support for setting blocking and non-blocking mode on /dev/rdma_cm
by returning success on FIONBIO and FIOASYNC IOCTLs. The actual flags
handling is done by the kern_ioctl() function.

Reported by:	Alex Bowden <alex.bowden@outlook.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-08-18 08:49:02 +00:00
markj
583afc921b mthca: Add a wrapper for the firmware's DIAG_RPRT command.
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-08-05 21:34:09 +00:00
markj
c7299da72a ipoib: Bound the number of egress mbufs buffered during pathrec lookups.
In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.

This addresses the same issue as commit 1e85b806f9 in Linux.

Reviewed by:	cem, ngie
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 22:22:11 +00:00
markj
562f522390 MFV be9130cc9: "IB/cma: Check for GID on listening devices first"
This is an optimization that improves IB connection setup times.

Discussed with:	hselasky
Obtained from:	Linux
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 20:29:09 +00:00
markj
ad041049e3 MFV 29f27e847: "IB/cma: Use cached gids"
This addresses a regression from an earlier upstream change which caused
cma_acquire_dev() to bypass the port GID cache and instead query the HCA
for each entry in its GID table. These queries can become extremely slow on
multiport devices, which has a negative impact on connection setup times.

Discussed with:	hselasky
Obtained from:	Linux
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 20:27:11 +00:00
markj
e4eaaa52a3 sdp: Destroy the RDMA ID after destroying the connection's queue pair.
This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 21:03:02 +00:00
markj
0206d7e0fa sdp: Use malloc(9) instead of the Linux compat layer.
SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 21:01:04 +00:00
markj
f3daddeba9 sdp: Use the correct socket buffer in sdp_post_recvs_needed().
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:54:43 +00:00
markj
adcdba2810 sdp: Always free received control packets after they're handled.
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:51:52 +00:00
markj
8cab2b8026 Fix the KASSERT format string arguments after r303507. 2016-07-29 20:48:42 +00:00
markj
36e6c3bcd9 sdp: Use the PCB as the rx completion handler argument.
The generic socket may be detached from the PCB before the completion
queue is drained and destroyed, so this change closes a race condition
in connection teardown.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:39:32 +00:00
markj
7ccd054937 sdp: Destroy the PCB lock before freeing to the zone.
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:36:01 +00:00
markj
f9f7369f3f sdp: Use an mbufq for received control packets.
This is simpler than the hand-rolled queue, and fixes a use-after-free.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:35:04 +00:00
markj
62a56fa648 sdp: Remove Linux build files.
They aren't useful here, and Linux seems to have largely abandoned SDP
anyway.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:33:43 +00:00
np
1610db9e87 Fix bug in iwcm that caused a panic in iw_cm_wq when krping is run
repeatedly in a tight loop.

Approved by:	re (gjb@)
Obtained from:	hselasky@ (part of larger changes in D5791)
2016-06-14 20:58:05 +00:00
sephe
7acd138965 net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties
Reviewed by:	hps, erj, tuexen
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6688
2016-06-07 04:51:50 +00:00
gnn
ada12d916d Fix up the Infiniband code to handle the new arpresolve. 2016-06-02 20:53:43 +00:00
hselasky
0d8f1c25a2 Prepare for activation of LinuxKPI module parameters as read-only
tunable SYSCTL's. Linux module parameters are associated with the
module they belong to. FreeBSD does not share this concept of a parent
module. Instead add macros which define the prefix to use for the
module parameters in the LinuxKPI consumers.

While at it convert all "bool" LinuxKPI module parameters to "byte"
type, because we don't have a "bool" type of SYSCTL in FreeBSD.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-05-25 12:03:21 +00:00
eadler
156fd4834a Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
pfg
6d50fbd9f2 sys/ofed: minor spelling fix.
No functional change.

Reviewed by:	hselasky
2016-05-06 15:37:06 +00:00
pfg
4d8fd4b25e ofed/drivers: minor spelling fixes.
No functional change.

Reviewed by:	hselasky
2016-05-06 15:16:13 +00:00
bz
7e82be83a8 Fix NOIP kernels to compile. 2016-04-24 15:56:05 +00:00
hselasky
63bfa65a47 More fixes for using IPv6 addresses with RDMA:
- Added check that the SCOPE ID is only restored for IPv6 linklocal
  addresses.

- Changes made by r237263 in the "cma_bind_addr()" function did not
  check if the socket address was of type IPv6 and used the IPv4
  socket address for IPv6 addresses. This caused the function to
  fail. Fixed this.

- In the "rdma_gid2ip()" function and some other places the "sin6_len"
  and "sin6_scope_id" fields were not set for IPv6 socket
  addresses. Fixed this.

- The scope ID is not stored as part of the GID entries and must be
  passed as an argument to "rdma_gid2ip()".

- Added new method to "struct ib_device" which returns a pointer to
  the network interface which belongs to the given infiniband
  device. This is needed to be able to get the scope ID for IPv6
  addresses via the associated ethernet interface.

- Added convenience function, "rdma_get_ipv6_scope_id()", to get the
  scope ID for IPv6 addresses.

- Implemented new "get_netdev" method for mlx4ib. Other IB controller
  drivers which want to support IPv6 addresses needs to implement this
  aswell.

- Bumped the FreeBSD version due to changing "struct ib_device".

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-22 18:16:12 +00:00
hselasky
e05cd97f08 Add KASSERT() and set error code in dead code case to help static code
analysis tools.

Suggested by:	ngie@
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-22 06:39:07 +00:00
hselasky
8e21c4207a Add missing set of the current VNET when inputting IP packets in IPoIB.
This fixes a kernel panic when using IPoIB with VIMAGE and infiniband.

PR:		208957
Sponsored by:	Mellanox Technologies
Tested by:	Justin Clift <justin@postgresql.org>
MFC after:	1 week
2016-04-22 06:33:06 +00:00