Commit Graph

331 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Brooks Davis
38d958a647 Improve copy-and-pasted versions of SIOCGIFADDR.
The original implementation used a reference to ifr_data and a cast to
do the equivalent of accessing ifr_addr. This was copied multiple
times since 1996.

Approved by:	kib
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14873
2018-03-27 20:51:49 +00:00
Hans Petter Selasky
5cd5781c75 Remove redundant integer cast in ibcore. The "ref_count" field already
has integer type.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-19 13:51:33 +00:00
Hans Petter Selasky
d4eeed42ba Make sure VNET is set when calling sa6_recoverscope() in ibcore.
Else panic will occur when VIMAGE is enabled.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:32:52 +00:00
Hans Petter Selasky
aa5962f908 Define values instead of using hardcoding.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:30:38 +00:00
Hans Petter Selasky
c131a22379 Recover IPv6 scope ID for multicast link-local addresses as well as
unicast link-local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:28:12 +00:00
Hans Petter Selasky
cc79d31d6f Embed the IPv6 scope ID before calling rtalloc1() in ibcore.
Else rtalloc1() will resolve to the loopback interface.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:25:40 +00:00
Hans Petter Selasky
d0a82c2414 Add IB_SPEED_HDR definition in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:01:00 +00:00
Hans Petter Selasky
d0a9dbc779 Make sure the IPv6 scope ID gets properly masked in ibcore.
When exchanging CM messages the IPv6 scope ID should be ignored
for link local addresses when doing comparisons. Make sure the
scope ID is always set to zero for link local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:58:51 +00:00
Hans Petter Selasky
03ae76a693 Fix for use-after-free when using delayed work structures in ibcore.
It is not enough to cancel delayed work structures before freeing.
Always cancel delayed work synchronously before freeing!

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:56:04 +00:00
Hans Petter Selasky
1456d97c01 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
Hans Petter Selasky
bf8641fedb Get correct network device when accepting incoming RDMA connections in ibcore.
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.

Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.

Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.

The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.

Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:24:30 +00:00
Hans Petter Selasky
676833dadb Pass valid if_index to rdma_addr_find_l2_eth_by_grh() in ibcore when possible.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:22:36 +00:00
Hans Petter Selasky
197461919d Add support for loopback in ibcore.
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 13:57:37 +00:00
Hans Petter Selasky
57b6d9f6ef Make sure to register the VLAN GIDs using the VLAN network interface
and not the parent one in ibcore. Else looking up the VLAN GIDs will
fail for VLAN IPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:39:34 +00:00
Hans Petter Selasky
891538abb5 Need to check for IPv6 linklocal address inside rdma_resolve_addr() in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:04:34 +00:00
Hans Petter Selasky
6d36a2c769 Map type of service, TOS, to IB or VLAN service level 1:1 in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:59:54 +00:00
Hans Petter Selasky
5b94bd8a69 Select RoCEv2 by default in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:58:37 +00:00
Hans Petter Selasky
703ea406d5 Make deletion of RoCE GID entries synchronous in ibcore.
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:57:26 +00:00
Hans Petter Selasky
42fa341d9c Add support for IPv6 link local GIDs equal to the default GID for
VLANs in ibcore.

IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.

Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:55:29 +00:00
Hans Petter Selasky
be5c019762 Do not add RoCEv2 default GID in ibcore when IPv6 is disabled to honor the
networking stack's IPv6 disabled setting. Else the offload HCA can start using
IPv6 packets for QPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:52:39 +00:00
Hans Petter Selasky
09938b2185 Add missing FreeBSD tags and SVN properties to ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:49:45 +00:00
Hans Petter Selasky
33ec1ccbae Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by:	Mellanox Technologies
2018-02-13 17:04:34 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Hans Petter Selasky
e5806bf466 Compile fix for LINT-NOIP kernel target.
Sponsored by:	Mellanox Technologies
2017-11-24 12:05:49 +00:00
Hans Petter Selasky
fd9e423c88 Make sure all tasks are cancelled synchronously in ipoib to avoid
use after free.

Sponsored by:	Mellanox Technologies
2017-11-24 09:55:20 +00:00
Hans Petter Selasky
68732efcd9 Build fix for ipoib when CONFIG_INFINIBAND_IPOIB_CM is defined.
Sponsored by:	Mellanox Technologies
2017-11-24 09:52:56 +00:00
Hans Petter Selasky
95ef56abc2 Build fix for kernel LINT target.
Sponsored by:	Mellanox Technologies
2017-11-24 09:12:13 +00:00
Hans Petter Selasky
82725ba9bf Merge ^/head r325999 through r326131. 2017-11-23 14:28:14 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Hans Petter Selasky
dd00abf2d7 Make sure the ib_wr_opcode enum is signed by adding a negative dummy element.
Different compilers may optimise the enum type in different ways. This ensures
coherency when range checking the value of enums in ibcore.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:51:37 +00:00
Hans Petter Selasky
3b8c4b3619 Make sure a valid VNET is set before trying to access the V_ip6_v6only
variable. Access the variable directly instead of going through the sysctl()
interface in the kernel.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:43:35 +00:00
Hans Petter Selasky
8dee9a7a44 Remove no longer supported mthca driver.
Sponsored by:	Mellanox Technologies
2017-11-13 10:59:38 +00:00
Hans Petter Selasky
f819030092 Merge ^/head r325505 through r325662. 2017-11-10 14:46:50 +00:00
Hans Petter Selasky
1529133ab3 Mark ipoib device as initialized on device open.
Set the IPOIB_FLAG_INITIALIZED on dev_open and clear it on dev_stop to
avoid a race between ipoib load and the underlying device driver.

The device module must dispatch the IB_EVENT_PORT_ACTIVE event before ipoib
module is loaded. Otherwise, the flush will fail since no one set the
IPOIB_FLAG_INITIALIZED.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 08:58:42 +00:00
Hans Petter Selasky
902a165f67 Make sure sin_zero is zero in ibcore. Else socket address maching using
bcmp() might fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:30:10 +00:00
Hans Petter Selasky
cee98bee7d Make sure the IPv6 scope ID gets zeroed when exchanging CMA messages in ibcore.
Else the IPv6 address matching might fail. This change adds support for both
embedded and non-embedded IPv6 scope IDs when passing a IPv6 link-local socket
address to RDMA. Prior to this change only global IPv6 addresses would work
with RDMA.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:27:29 +00:00
Hans Petter Selasky
860bbba0bb Multiple fixes for using IPv6 link-local addresses with RDMA in ibcore.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback
device, lo0, as the gateway interface. Currently RDMA loopback is
not supported.

2) Use ip_dev_find() and ip6_dev_find() to lookup network interfaces
with matching IPv4 and IPv6 addresses, respectivly.

3) In addr_resolve() make sure the "ifa" pointer is always set, also when
the "ifp" is NULL. Else a NULL pointer access might happen trying to
read from the "ifa" pointer later on.

4) In rdma_addr_find_dmac_by_grh() make sure the "bound_dev_if" field
gets set properly instead of passing the scope ID through the IPv6
socket address structure. This is more in line with upstream OFED
in Linux.

5) In rdma_addr_find_smac_by_sgid() there is no need to pass the
scope ID for IPv6. Either it is stored in the "bound_dev_if" field
or ip6_dev_find() will find the correct network device regardless
of the scope ID.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:22:43 +00:00
Hans Petter Selasky
c2c014f24c Merge ^/head r323559 through r325504. 2017-11-07 08:39:14 +00:00
Hans Petter Selasky
d05554bb99 The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore

Reviewed by:		np @
Differential Revision:	https://reviews.freebsd.org/D12563
Sponsored by:		Mellanox Technologies
MFC after:		3 days
2017-10-20 08:20:15 +00:00
Hans Petter Selasky
aacb037742 Make sure the IPv6 scope ID gets zeroed inside the GID. Else searching for a
valid GID entry based on IPv6 addresses can fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-10-10 12:36:41 +00:00
Hans Petter Selasky
4051f0c8ed Temporary fix to avoid hitting kernel assert.
Don't dirty VM pages at this point.

Sponsored by:	Mellanox Technologies
2017-09-16 16:32:36 +00:00
Hans Petter Selasky
c4b28ce0b9 Remove no longer needed linux_poll_wakeup() calls. This is now handled by
"wake_up()" in the LinuxKPI. Accessing the file pointer directly might cause
use after free issues.

Sponsored by:	Mellanox Technologies
2017-09-16 16:31:30 +00:00
Hans Petter Selasky
526f596179 Embedding the scope ID is no longer needed for IPv6.
Sponsored by:	Mellanox Technologies
2017-09-16 16:28:48 +00:00
Hans Petter Selasky
e02ecc60b8 Set length field of socket address.
Sponsored by:	Mellanox Technologies
2017-09-16 16:28:19 +00:00
Hans Petter Selasky
6af166cc7b Fix for refcount leak.
Sponsored by:	Mellanox Technologies
2017-09-16 16:27:24 +00:00
Hans Petter Selasky
d18b4113ed Make sure the socket address length field gets set.
Sponsored by:	Mellanox Technologies
2017-09-16 16:26:46 +00:00
Hans Petter Selasky
bf00eaa12c Improve ibcore address resolving:
- Add more sanity checks.
- Preserve source port number when resolving address.
- Remove no longer needed scope ID hacks for IPv6.

Sponsored by:	Mellanox Technologies
2017-09-16 16:24:38 +00:00
Hans Petter Selasky
c69c74b892 Adapt the existing SDP ULP code to the new ibcore APIs.
Requested by:	Sobczak, Bartosz <bartosz.sobczak@intel.com>
Sponsored by:	Mellanox Technologies
2017-09-16 16:16:00 +00:00
Hans Petter Selasky
65c5a7a879 Remove unsafe access to the LinuxKPI file structure from ibcore.
selwakeup() is now done by the wake_up() family of functions.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 06:34:20 +00:00