Commit Graph

498 Commits

Author SHA1 Message Date
Hans Petter Selasky
765bd83cca Fix NULL pointer dereference during device removal in ibcore.
As part of ib_uverbs_remove_one which might be triggered upon
reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace
application.
If device was removed after uverbs fd was opened but before
ib_uverbs_get_context was called, the event file will be accessed
before it was allocated, result in NULL pointer dereference:

Linux commit:
870201f95fcbd19538aef630393fe9d583eff82e

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:16:54 +00:00
Hans Petter Selasky
ed222171a9 Fix access to non-initialized CM_ID object in ibcore.
The attempt to join multicast group without ensuring that CMA device
exists will lead to the following crash reported by syzkaller.

Linux commit:
7688f2c3bbf55e52388e37ac5d63ca471a7712e1

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:15:50 +00:00
Hans Petter Selasky
91f0e6e1d8 Avoid that ib_drain_qp() triggers an out-of-bounds stack access in ibcore.
Linux commit:
a1ae7d0345edd593d6725d3218434d903a0af95d

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:14:20 +00:00
Hans Petter Selasky
7477a89ae2 Ensure that CM_ID exists prior to access it in ibcore.
Prior to access UCMA commands, the context should be initialized
and connected to CM_ID with ucma_create_id(). In case user skips
this step, he can provide non-valid ctx without CM_ID and cause
to multiple NULL dereferences.

Also there are situations where the create_id can be raced with
other user access, ensure that the context is only shared to
other threads once it is fully initialized to avoid the races.

Linux commit:
e8980d67d6017c8eee8f9c35f782c4bd68e004c9

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:13:11 +00:00
Hans Petter Selasky
f4546fa376 Add support for prio-tagged traffic for RDMA in ibcore.
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:11:53 +00:00
Hans Petter Selasky
682bd3fe12 Set default GID type as RoCE when resolving RoCE route in ibcore.
cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.

Linux commit:
5c181bda77f409d89ad513528eccac5f3a416474

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:09:17 +00:00
Hans Petter Selasky
b70db3278f Set RoCEv2 MGID according to spec in ibcore.
RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding
IPv4 address is encoded into the GID according to the following rule:
GID= :ffff:<IPv4 address>

Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it
zeroed and change rdma_is_multicast_addr() to consider the new logic.

Linux commit:
be1d325a335840a86c133a56c6a911c368bac0fd
1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:07:36 +00:00
Hans Petter Selasky
71698382e3 For multicast functions in ibcore, verify that LIDs are multicast LIDs.
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).

Add check to verify that the MLID value is in the correct address
range.

RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a
multicast group, if the QP is associated with a RoCE port, the
multicast group MLID is unused and is ignored.

During attach or detach multicast, when the QP is associated with a
port, it is enough to check the port's link layer and validate the
LID only if it is Infiniband. Otherwise, avoid validating the
multicast LID.

Linux commit:
8561eae60ff9417a50fa1fb2b83ae950dc5c1e21
5236333592244557a19694a51337df6ac018f0a7

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:04:36 +00:00
Hans Petter Selasky
fed17c5852 Fix for RDMA loopback over VLAN in ibcore.
Implement a more generic solution for detecting loopback.
The problem was that the default netdevice was resolved
for loopback also when VLAN was used. Use real network
device instead of loopback device for bound device
interface.

How to test:
ucmatose -b 127.0.0.1 -p 20090
ucmatose -s 5.6.5.1 -p 20090

Note that RDMA treats the IPv4 and IPv6 loopback
addresses like any address.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:02:29 +00:00
Hans Petter Selasky
f9899e4567 Add native FreeBSD support for multicast in ibcore.
This change adds support for registering multicast addresses,
both IPv4 and IPv6.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 08:59:34 +00:00
Hans Petter Selasky
7a658d6545 If the MGID/MLID pair is not on the list return an error in ibcore.
A list of MGID/MLID pairs is built when doing a multicast attach.  When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.

If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned.  Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.

Linux commit:
20c7840a77ddcb2ed2fbd66e8197db2868495751

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 08:54:40 +00:00
Hans Petter Selasky
ef24f05897 Add lock to multicast handlers in ibcore.
When two handlers used the same object in the old schema, we blocked
the process in the kernel. The new schema just returns -EBUSY. This
could lead to different behaviour in applications between the old
schema and the new schema. In most cases, using such handlers
concurrently could lead to crashing the process. For example, if
thread A destroys a QP and thread B modifies it, we could have the
destruction happens before the modification. In this case, we are
accessing freed memory which could lead to crashing the process.
This is true for most cases. However, attaching and detaching
a multicast address from QP concurrently is safe. Therefore, we
preserve the original behaviour by adding a lock there.

Linux commit:
f48b726920d96dcd1860df06143bdea7d6d7dcc3

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 08:52:29 +00:00
Hans Petter Selasky
8b767bd7d7 Only update source address when resolving is successful in ibcore.
When resolving an IP address in ibcore, only update the source address
upon normal completion. The ibcore address resolve function does not
care about the scope ID value of the IPv6 link-local addresses and expects
this information has already been extracted into the bound_dev_if field.
Because the same IPv6 link-local address can exist on multiple interfaces
the ibcore address resolver gets confused and returns ENETUNREACH.

Instead of updating both source address and bound_dev_if just keep the
address set to any address until resolving completes. For the sake of code
symmetry a similar change has been applied to the IPv4 address resolve path.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 08:48:30 +00:00
Hans Petter Selasky
e0cba2d201 Process address resolve requests at least one time per second in ibcore.
When setting a large address resolve timeout it was observed that the
address resolving would succeed at the timeout and not when the address
was available. Make sure the address resolving requests are processed no
slower than one time every second.

While at it use "int" for jiffies instead of "unsigned long" to match
FreeBSD ticks.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 08:34:49 +00:00
Hans Petter Selasky
3ea947d615 Revert r335094 and properly fix OFED build after r335053.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-06-14 07:55:10 +00:00
Matt Macy
67b3b4d245 fix OFED build after r335053 2018-06-13 23:30:54 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Brooks Davis
38d958a647 Improve copy-and-pasted versions of SIOCGIFADDR.
The original implementation used a reference to ifr_data and a cast to
do the equivalent of accessing ifr_addr. This was copied multiple
times since 1996.

Approved by:	kib
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14873
2018-03-27 20:51:49 +00:00
Hans Petter Selasky
5cd5781c75 Remove redundant integer cast in ibcore. The "ref_count" field already
has integer type.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-19 13:51:33 +00:00
Hans Petter Selasky
d4eeed42ba Make sure VNET is set when calling sa6_recoverscope() in ibcore.
Else panic will occur when VIMAGE is enabled.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:32:52 +00:00
Hans Petter Selasky
aa5962f908 Define values instead of using hardcoding.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:30:38 +00:00
Hans Petter Selasky
c131a22379 Recover IPv6 scope ID for multicast link-local addresses as well as
unicast link-local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:28:12 +00:00
Hans Petter Selasky
cc79d31d6f Embed the IPv6 scope ID before calling rtalloc1() in ibcore.
Else rtalloc1() will resolve to the loopback interface.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:25:40 +00:00
Hans Petter Selasky
d0a82c2414 Add IB_SPEED_HDR definition in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:01:00 +00:00
Hans Petter Selasky
d0a9dbc779 Make sure the IPv6 scope ID gets properly masked in ibcore.
When exchanging CM messages the IPv6 scope ID should be ignored
for link local addresses when doing comparisons. Make sure the
scope ID is always set to zero for link local addresses.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:58:51 +00:00
Hans Petter Selasky
03ae76a693 Fix for use-after-free when using delayed work structures in ibcore.
It is not enough to cancel delayed work structures before freeing.
Always cancel delayed work synchronously before freeing!

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 12:56:04 +00:00
Hans Petter Selasky
1456d97c01 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
Hans Petter Selasky
bf8641fedb Get correct network device when accepting incoming RDMA connections in ibcore.
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.

Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.

Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.

The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.

Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:24:30 +00:00
Hans Petter Selasky
676833dadb Pass valid if_index to rdma_addr_find_l2_eth_by_grh() in ibcore when possible.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:22:36 +00:00
Hans Petter Selasky
197461919d Add support for loopback in ibcore.
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 13:57:37 +00:00
Hans Petter Selasky
57b6d9f6ef Make sure to register the VLAN GIDs using the VLAN network interface
and not the parent one in ibcore. Else looking up the VLAN GIDs will
fail for VLAN IPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:39:34 +00:00
Hans Petter Selasky
891538abb5 Need to check for IPv6 linklocal address inside rdma_resolve_addr() in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 12:04:34 +00:00
Hans Petter Selasky
6d36a2c769 Map type of service, TOS, to IB or VLAN service level 1:1 in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:59:54 +00:00
Hans Petter Selasky
5b94bd8a69 Select RoCEv2 by default in ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:58:37 +00:00
Hans Petter Selasky
703ea406d5 Make deletion of RoCE GID entries synchronous in ibcore.
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:57:26 +00:00
Hans Petter Selasky
42fa341d9c Add support for IPv6 link local GIDs equal to the default GID for
VLANs in ibcore.

IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.

Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:55:29 +00:00
Hans Petter Selasky
be5c019762 Do not add RoCEv2 default GID in ibcore when IPv6 is disabled to honor the
networking stack's IPv6 disabled setting. Else the offload HCA can start using
IPv6 packets for QPs.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:52:39 +00:00
Hans Petter Selasky
09938b2185 Add missing FreeBSD tags and SVN properties to ibcore.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 11:49:45 +00:00
Hans Petter Selasky
33ec1ccbae Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by:	Mellanox Technologies
2018-02-13 17:04:34 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Hans Petter Selasky
e5806bf466 Compile fix for LINT-NOIP kernel target.
Sponsored by:	Mellanox Technologies
2017-11-24 12:05:49 +00:00
Hans Petter Selasky
fd9e423c88 Make sure all tasks are cancelled synchronously in ipoib to avoid
use after free.

Sponsored by:	Mellanox Technologies
2017-11-24 09:55:20 +00:00
Hans Petter Selasky
68732efcd9 Build fix for ipoib when CONFIG_INFINIBAND_IPOIB_CM is defined.
Sponsored by:	Mellanox Technologies
2017-11-24 09:52:56 +00:00
Hans Petter Selasky
95ef56abc2 Build fix for kernel LINT target.
Sponsored by:	Mellanox Technologies
2017-11-24 09:12:13 +00:00
Hans Petter Selasky
82725ba9bf Merge ^/head r325999 through r326131. 2017-11-23 14:28:14 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Hans Petter Selasky
dd00abf2d7 Make sure the ib_wr_opcode enum is signed by adding a negative dummy element.
Different compilers may optimise the enum type in different ways. This ensures
coherency when range checking the value of enums in ibcore.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:51:37 +00:00
Hans Petter Selasky
3b8c4b3619 Make sure a valid VNET is set before trying to access the V_ip6_v6only
variable. Access the variable directly instead of going through the sysctl()
interface in the kernel.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:43:35 +00:00
Hans Petter Selasky
8dee9a7a44 Remove no longer supported mthca driver.
Sponsored by:	Mellanox Technologies
2017-11-13 10:59:38 +00:00
Hans Petter Selasky
f819030092 Merge ^/head r325505 through r325662. 2017-11-10 14:46:50 +00:00
Hans Petter Selasky
1529133ab3 Mark ipoib device as initialized on device open.
Set the IPOIB_FLAG_INITIALIZED on dev_open and clear it on dev_stop to
avoid a race between ipoib load and the underlying device driver.

The device module must dispatch the IB_EVENT_PORT_ACTIVE event before ipoib
module is loaded. Otherwise, the flush will fail since no one set the
IPOIB_FLAG_INITIALIZED.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 08:58:42 +00:00
Hans Petter Selasky
902a165f67 Make sure sin_zero is zero in ibcore. Else socket address maching using
bcmp() might fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:30:10 +00:00
Hans Petter Selasky
cee98bee7d Make sure the IPv6 scope ID gets zeroed when exchanging CMA messages in ibcore.
Else the IPv6 address matching might fail. This change adds support for both
embedded and non-embedded IPv6 scope IDs when passing a IPv6 link-local socket
address to RDMA. Prior to this change only global IPv6 addresses would work
with RDMA.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:27:29 +00:00
Hans Petter Selasky
860bbba0bb Multiple fixes for using IPv6 link-local addresses with RDMA in ibcore.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback
device, lo0, as the gateway interface. Currently RDMA loopback is
not supported.

2) Use ip_dev_find() and ip6_dev_find() to lookup network interfaces
with matching IPv4 and IPv6 addresses, respectivly.

3) In addr_resolve() make sure the "ifa" pointer is always set, also when
the "ifp" is NULL. Else a NULL pointer access might happen trying to
read from the "ifa" pointer later on.

4) In rdma_addr_find_dmac_by_grh() make sure the "bound_dev_if" field
gets set properly instead of passing the scope ID through the IPv6
socket address structure. This is more in line with upstream OFED
in Linux.

5) In rdma_addr_find_smac_by_sgid() there is no need to pass the
scope ID for IPv6. Either it is stored in the "bound_dev_if" field
or ip6_dev_find() will find the correct network device regardless
of the scope ID.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-09 19:22:43 +00:00
Hans Petter Selasky
c2c014f24c Merge ^/head r323559 through r325504. 2017-11-07 08:39:14 +00:00
Hans Petter Selasky
d05554bb99 The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore

Reviewed by:		np @
Differential Revision:	https://reviews.freebsd.org/D12563
Sponsored by:		Mellanox Technologies
MFC after:		3 days
2017-10-20 08:20:15 +00:00
Hans Petter Selasky
aacb037742 Make sure the IPv6 scope ID gets zeroed inside the GID. Else searching for a
valid GID entry based on IPv6 addresses can fail.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-10-10 12:36:41 +00:00
Hans Petter Selasky
4051f0c8ed Temporary fix to avoid hitting kernel assert.
Don't dirty VM pages at this point.

Sponsored by:	Mellanox Technologies
2017-09-16 16:32:36 +00:00
Hans Petter Selasky
c4b28ce0b9 Remove no longer needed linux_poll_wakeup() calls. This is now handled by
"wake_up()" in the LinuxKPI. Accessing the file pointer directly might cause
use after free issues.

Sponsored by:	Mellanox Technologies
2017-09-16 16:31:30 +00:00
Hans Petter Selasky
526f596179 Embedding the scope ID is no longer needed for IPv6.
Sponsored by:	Mellanox Technologies
2017-09-16 16:28:48 +00:00
Hans Petter Selasky
e02ecc60b8 Set length field of socket address.
Sponsored by:	Mellanox Technologies
2017-09-16 16:28:19 +00:00
Hans Petter Selasky
6af166cc7b Fix for refcount leak.
Sponsored by:	Mellanox Technologies
2017-09-16 16:27:24 +00:00
Hans Petter Selasky
d18b4113ed Make sure the socket address length field gets set.
Sponsored by:	Mellanox Technologies
2017-09-16 16:26:46 +00:00
Hans Petter Selasky
bf00eaa12c Improve ibcore address resolving:
- Add more sanity checks.
- Preserve source port number when resolving address.
- Remove no longer needed scope ID hacks for IPv6.

Sponsored by:	Mellanox Technologies
2017-09-16 16:24:38 +00:00
Hans Petter Selasky
c69c74b892 Adapt the existing SDP ULP code to the new ibcore APIs.
Requested by:	Sobczak, Bartosz <bartosz.sobczak@intel.com>
Sponsored by:	Mellanox Technologies
2017-09-16 16:16:00 +00:00
Hans Petter Selasky
65c5a7a879 Remove unsafe access to the LinuxKPI file structure from ibcore.
selwakeup() is now done by the wake_up() family of functions.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-09-09 06:34:20 +00:00
Mark Johnston
97310754e5 Fix indentation.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-09-07 19:15:31 +00:00
Hans Petter Selasky
b40951b8cd Change reject message type when destroying cm_id in ibore.
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.

Obtained from:	Linux
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:31:10 +00:00
Hans Petter Selasky
44d8a0fc60 Ticks are 32-bit in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:18:25 +00:00
Hans Petter Selasky
bca9d05fdb Merge ^/head r319973 through 321382. 2017-07-23 15:22:06 +00:00
Hans Petter Selasky
9f715dc162 ibcore: Delete old files and add new ones missed in the initial commit
for this projects branch.

Sponsored by:	Mellanox Technologies
2017-07-03 08:32:52 +00:00
Hans Petter Selasky
c19d65049e ibcore: Fix for compiling header files from user-space.
Sponsored by:	Mellanox Technologies
2017-07-03 08:30:43 +00:00
Hans Petter Selasky
0faccd643c ibcore: Fix for accessing invalid network device.
Sponsored by:	Mellanox Technologies
2017-07-03 08:29:49 +00:00
Hans Petter Selasky
ddc6ee3792 ibcore: Make sure all GID entries are deleted upon a network device
departure.

Sponsored by:	Mellanox Technologies
2017-07-03 08:28:35 +00:00
Mark Johnston
4eb18346d1 Avoid including list.h in LinuxKPI headers.
list.h includes a number of FreeBSD headers as a workaround for the
LIST_HEAD name collision. To reduce pollution, avoid including list.h
in commonly used headers when it is not explicitly needed.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11249
2017-06-18 16:43:57 +00:00
Hans Petter Selasky
478d300572 Initial RoCE/infiniband kernel update to Linux v4.9.
This patch currently supports:
- ibcore as a kernel module only
- krping as a kernel module only
- ipoib as a kernel module only

Sponsored by:	Mellanox Technologies
2017-06-15 12:47:48 +00:00
Mark Johnston
e804572d2b Fix indentation.
MFC after:	1 week
2017-06-14 16:55:23 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Hans Petter Selasky
82d0140707 Add full VNET support to the inet_get_local_port_range() function in
the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-22 15:46:31 +00:00
Gleb Smirnoff
817e1ad9d0 Make sdp compilable after r315662. 2017-03-21 09:07:05 +00:00
Hans Petter Selasky
404027276b Add basic support for VIMAGE to the LinuxKPI and ibcore.
Support is implemented by mapping Linux's "struct net" into FreeBSD's
"struct vnet". Currently only vnet0 is supported by ibcore.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-16 09:59:35 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Navdeep Parhar
017296dbb6 cxgbe/iw_cxgbe: fix various double-close panics with iWARP sockets.
Sockets representing the TCP endpoints for iWARP connections are
allocated by the ibcore module.  Before this revision they were closed
either by the ibcore module or the iw_cxgbe hardware driver depending on
the state transitions during connection teardown.  This is error prone
and there were cases where both iw_cxgbe and ibcore closed the socket
leading to double-free panics.  The fix is to let ibcore close the
sockets it creates and never do it in the driver.

- Use sodisconnect instead of soclose (preceded by solinger = 0) in the
  driver to tear down an RDMA connection abruptly.  This does what's
  intended without releasing the socket's fd reference.

- Close the socket in ibcore when the iWARP iw_cm_id is destroyed.  This
  works for all kinds of sockets: clients that initiate connections,
  listeners, and sockets accepted off of listeners.

Reviewed by:	Steve Wise @ Open Grid Computing, hselasky@
MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D9796
2017-02-28 19:27:41 +00:00
Navdeep Parhar
3d4f452402 Avoid NULL dereference in a couple of sysctl handlers in ibcore.
iw_cxgbe sets ib_device->dma_device to NULL (since r311880).

Reviewed by:	hselasky@
Sponsored by:	Chelsio Communications
2017-02-23 07:48:58 +00:00
Hans Petter Selasky
97549c34ec Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4
like other PCI network drivers. The sys/ofed directory is now mainly
reserved for generic infiniband code, with exception of the mthca driver.

- Add new manual page, mlx4en(4), describing how to configure and load
mlx4en.

- All relevant driver C-files are now prefixed mlx4, mlx4_en and
mlx4_ib respectivly to avoid object filename collisions when compiling
the kernel. This also fixes an issue with proper dependency file
generation for the C-files in question.

- Device mlxen is now device mlx4en and depends on device mlx4, see
mlx4en(4). Only the network device name remains unchanged.

- The mlx4 and mlx4en modules are now built by default on i386 and
amd64 targets. Only building the mlx4ib module depends on
WITH_OFED=YES .

Sponsored by:	Mellanox Technologies
2016-09-30 08:23:06 +00:00
Hans Petter Selasky
499b089076 Set hardware stats flag to avoid double counting the number of incoming bytes.
Found by:	Ben RUBSON <ben.rubson@gmail.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-09-29 16:36:32 +00:00
Navdeep Parhar
a5234e8ccb Do not free an uninitialized pointer on soaccept failure in the iWARP
connection manager.

Sponsored by:	Chelsio Communications
2016-08-26 08:25:28 +00:00
Hans Petter Selasky
7dc445f8d3 Add support for setting blocking and non-blocking mode on /dev/rdma_cm
by returning success on FIONBIO and FIOASYNC IOCTLs. The actual flags
handling is done by the kern_ioctl() function.

Reported by:	Alex Bowden <alex.bowden@outlook.com>
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-08-18 08:49:02 +00:00
Mark Johnston
f1c1f188f7 mthca: Add a wrapper for the firmware's DIAG_RPRT command.
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-08-05 21:34:09 +00:00
Mark Johnston
49fe7b5029 ipoib: Bound the number of egress mbufs buffered during pathrec lookups.
In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.

This addresses the same issue as commit 1e85b806f9 in Linux.

Reviewed by:	cem, ngie
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 22:22:11 +00:00
Mark Johnston
4e071758a7 MFV be9130cc9: "IB/cma: Check for GID on listening devices first"
This is an optimization that improves IB connection setup times.

Discussed with:	hselasky
Obtained from:	Linux
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 20:29:09 +00:00
Mark Johnston
82f1d3ea2f MFV 29f27e847: "IB/cma: Use cached gids"
This addresses a regression from an earlier upstream change which caused
cma_acquire_dev() to bypass the port GID cache and instead query the HCA
for each entry in its GID table. These queries can become extremely slow on
multiport devices, which has a negative impact on connection setup times.

Discussed with:	hselasky
Obtained from:	Linux
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-08-01 20:27:11 +00:00
Mark Johnston
adaf3c4906 sdp: Destroy the RDMA ID after destroying the connection's queue pair.
This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 21:03:02 +00:00
Mark Johnston
d3461164e0 sdp: Use malloc(9) instead of the Linux compat layer.
SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 21:01:04 +00:00
Mark Johnston
f66ec15275 sdp: Use the correct socket buffer in sdp_post_recvs_needed().
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:54:43 +00:00
Mark Johnston
7e968dab6d sdp: Always free received control packets after they're handled.
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:51:52 +00:00
Mark Johnston
3b3e6d882f Fix the KASSERT format string arguments after r303507. 2016-07-29 20:48:42 +00:00
Mark Johnston
a3c0b052b7 sdp: Use the PCB as the rx completion handler argument.
The generic socket may be detached from the PCB before the completion
queue is drained and destroyed, so this change closes a race condition
in connection teardown.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:39:32 +00:00
Mark Johnston
fa46ade837 sdp: Destroy the PCB lock before freeing to the zone.
Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:36:01 +00:00
Mark Johnston
2cefa87b0b sdp: Use an mbufq for received control packets.
This is simpler than the hand-rolled queue, and fixes a use-after-free.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:35:04 +00:00
Mark Johnston
30a71b3c30 sdp: Remove Linux build files.
They aren't useful here, and Linux seems to have largely abandoned SDP
anyway.

Sponsored by:	EMC / Isilon Storage Division
2016-07-29 20:33:43 +00:00
Navdeep Parhar
9e2d05841e Fix bug in iwcm that caused a panic in iw_cm_wq when krping is run
repeatedly in a tight loop.

Approved by:	re (gjb@)
Obtained from:	hselasky@ (part of larger changes in D5791)
2016-06-14 20:58:05 +00:00
Sepherosa Ziehau
36ad8372d4 net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties
Reviewed by:	hps, erj, tuexen
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6688
2016-06-07 04:51:50 +00:00
George V. Neville-Neil
fd9e88e0f9 Fix up the Infiniband code to handle the new arpresolve. 2016-06-02 20:53:43 +00:00
Hans Petter Selasky
fa201e28fc Prepare for activation of LinuxKPI module parameters as read-only
tunable SYSCTL's. Linux module parameters are associated with the
module they belong to. FreeBSD does not share this concept of a parent
module. Instead add macros which define the prefix to use for the
module parameters in the LinuxKPI consumers.

While at it convert all "bool" LinuxKPI module parameters to "byte"
type, because we don't have a "bool" type of SYSCTL in FreeBSD.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-05-25 12:03:21 +00:00
Eitan Adler
cef367e6a1 Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
Pedro F. Giffuni
a5a274d0d4 sys/ofed: minor spelling fix.
No functional change.

Reviewed by:	hselasky
2016-05-06 15:37:06 +00:00
Pedro F. Giffuni
bf5cba36db ofed/drivers: minor spelling fixes.
No functional change.

Reviewed by:	hselasky
2016-05-06 15:16:13 +00:00
Bjoern A. Zeeb
77eab110a9 Fix NOIP kernels to compile. 2016-04-24 15:56:05 +00:00
Hans Petter Selasky
0bab509b94 More fixes for using IPv6 addresses with RDMA:
- Added check that the SCOPE ID is only restored for IPv6 linklocal
  addresses.

- Changes made by r237263 in the "cma_bind_addr()" function did not
  check if the socket address was of type IPv6 and used the IPv4
  socket address for IPv6 addresses. This caused the function to
  fail. Fixed this.

- In the "rdma_gid2ip()" function and some other places the "sin6_len"
  and "sin6_scope_id" fields were not set for IPv6 socket
  addresses. Fixed this.

- The scope ID is not stored as part of the GID entries and must be
  passed as an argument to "rdma_gid2ip()".

- Added new method to "struct ib_device" which returns a pointer to
  the network interface which belongs to the given infiniband
  device. This is needed to be able to get the scope ID for IPv6
  addresses via the associated ethernet interface.

- Added convenience function, "rdma_get_ipv6_scope_id()", to get the
  scope ID for IPv6 addresses.

- Implemented new "get_netdev" method for mlx4ib. Other IB controller
  drivers which want to support IPv6 addresses needs to implement this
  aswell.

- Bumped the FreeBSD version due to changing "struct ib_device".

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-22 18:16:12 +00:00
Hans Petter Selasky
c3a74bf6d7 Add KASSERT() and set error code in dead code case to help static code
analysis tools.

Suggested by:	ngie@
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-22 06:39:07 +00:00
Hans Petter Selasky
10a7a4bf44 Add missing set of the current VNET when inputting IP packets in IPoIB.
This fixes a kernel panic when using IPoIB with VIMAGE and infiniband.

PR:		208957
Sponsored by:	Mellanox Technologies
Tested by:	Justin Clift <justin@postgresql.org>
MFC after:	1 week
2016-04-22 06:33:06 +00:00
Hans Petter Selasky
150c88d471 Fix for using IPv6 addresses with RDMA:
IPv6 addresses has a scope ID which sometimes is stored in the
"sin6_scope_id" field of "struct sockaddr_in6" and sometimes as part
of the IPv6 address itself depending on the context. If the scope ID
is not in the expected location, the IPv6 address lookups in the
so-called GID table will fail. Some code factoring has been made to
achieve a clean exit of the "addr_resolve" function via a common
"done" label.

Sponsored by:	Mellanox Technologies
Submitted by:	Shani Michaeli <shanim@mellanox.com>
MFC after:	1 week
2016-04-21 16:33:42 +00:00
Hans Petter Selasky
a296502fe0 Fix for resolving mac address when the destination address is a gateway.
Remove some dead code while at it.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-21 16:04:58 +00:00
Hans Petter Selasky
53219aa88a Properly setup arguments for if_resolvemulti() callback.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-21 11:32:22 +00:00
Hans Petter Selasky
03815ec1db Fix inverted priv check calls. Priv check returns zero on success and
an error code on failure. Refer to man 9 priv_check .

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-20 07:44:50 +00:00
Pedro F. Giffuni
d4a2eaef39 ofed: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.
Reviewed by:	hselasky
2016-04-15 12:16:15 +00:00
Hans Petter Selasky
ebcd5b5b31 Remove some unused fields.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-14 14:11:32 +00:00
Hans Petter Selasky
b03aadaffc Ensure the received IP header gets 32-bits aligned.
The FreeBSD's TCP/IP stack assumes that the IP-header is 32-bits aligned
when decoding it. Else unaligned 32-bit memory access can happen, which
not all processor architectures support.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-14 14:10:40 +00:00
Hans Petter Selasky
617e1ff301 Add missing port_up checks.
When downing a mlxen network adapter we need to check the port_up variable
to ensure we don't continue to transmit data or restart timers which can
reside in freed memory.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-04-14 14:01:28 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Sepherosa Ziehau
6dd38b8716 tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c

Reviewed by:	gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5725
2016-04-01 06:28:33 +00:00
Navdeep Parhar
0ff41bb737 Remove unnecessary dequeue_mutex (added in r294610) from the iWARP
connection manager.  Examining so_comp without synchronization with
iw_so_event_handler is a harmless race.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	Steve Wise @ Open Grid Computing
Sponsored by:	Chelsio Communications
2016-03-30 01:08:08 +00:00
Hans Petter Selasky
9795b2c6ce Add missing curly brackets in for loop.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-03-17 12:30:21 +00:00
Hans Petter Selasky
1f20c68e07 Use hardware computed Toeplitz hash for incoming flowids
Use the Toeplitz hash value as source for the flowid. This makes the
hash value more suitable for so-called hash bucket algorithms which
are used in the FreeBSD's TCP/IP stack when RSS is enabled.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-03-15 15:53:37 +00:00
Hans Petter Selasky
9319562b86 Fix witness panic in the ipoib_ioctl() function when unloading the
ipoib module.

The bpfdetach() function is trying to turn off promiscious mode on the
network interface it is attached to while holding a mutex. The fix
consists of ignoring any further calls to the ipoib_ioctl() function
when the network interface is going to be detached. The ipoib_ioctl()
function might sleep.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-03-15 15:47:26 +00:00
John Baldwin
47cedcbd72 Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem.
Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5515
2016-03-11 23:18:06 +00:00
Hans Petter Selasky
96608f1ff4 Whitespace fixes.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-04 09:07:30 +00:00
Hans Petter Selasky
c7c96d1093 LinuxKPI list updates:
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
  iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
  to RCU.
- Safe list macro arguments by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
  to reflect they are LinuxKPI internal functions.

Obtained from:	Linux
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 15:12:31 +00:00
Navdeep Parhar
097f289f25 Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.

Submitted by:	Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D4801
2016-01-22 23:33:34 +00:00
Alexander V. Chernikov
36402a681f Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
  to check if this particular mbuf needs to be dropped (and what
  error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
  regardless of RTF_HOST flag existence. This is due to upcoming routing
  changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
  to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
  error value. In fact, the only place where this result is propagated
  is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
  RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
  rte flags to ro_flags. Call this function in ip_output() after looking up/
  verifying rte.

Reviewed by:	ae
2016-01-09 16:34:37 +00:00
Gleb Smirnoff
829fae9063 Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM
sockets, due to sendfile(2) supporting only the latter, there is a corner
case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake
of control data, albeit being stream socket.

Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
2016-01-08 19:03:20 +00:00
Hans Petter Selasky
b9320e2a6a Remove unused file. 2016-01-07 09:40:19 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Hans Petter Selasky
c5e5a55f8f Fix i386 build WITH_OFED=YES. Remove some redundant KASSERTs.
Suggested by:	kib, ian
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-12-04 18:20:55 +00:00
Enji Cooper
4722f6ef28 Fix scope of bridge_header and bridge_pcix_cap in mthca_reset(..)
They're only used in the __linux__ case

Differential Revision: https://reviews.freebsd.org/D4332
MFC after: 1 week
Reported by: cppcheck
Reviewed by: hselasky
Sponsored by: EMC / Isilon Storage Division
2015-12-04 09:01:58 +00:00
Hans Petter Selasky
3d23c0a436 Convert the mlxen driver to use the BUSDMA(9) APIs instead of
vtophys() when loading mbufs for transmission and reception. While at
it all pointer arithmetic and cast qualifier issues were fixed, mostly
related to transmission and reception.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4284
2015-12-03 14:56:17 +00:00
Hans Petter Selasky
6111807106 Updated the mlx4 and mlxen drivers to the latest version, v2.1.6:
- Added support for dumping the SFP EEPROM content to dmesg.
- Fixed handling of network interface capability IOCTLs.
- Fixed race when loading and unloading the mlxen driver by applying
  appropriate locking.
- Removed two unused C-files.

MFC after:	1 week
Submitted by:	Mark Bloch <markb@mellanox.com>
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4283
2015-12-03 13:29:20 +00:00
Hans Petter Selasky
3884ff1831 Add some defines needed by the coming mlx5 infiniband support.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-11-24 12:11:56 +00:00
Enji Cooper
ae9356f143 Don't leak work if __mlx4_register_vlan(..) fails in
mlx4_master_immediate_activate_vlan_qos(..)

MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D4203
Submitted by: Miles Olrich <miles.olrich@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-11-19 01:08:16 +00:00
Hans Petter Selasky
0f5150a757 Fix integer to pointer of different size conversion warnings when
using GCC for 32-bit platforms. The integer size in this case is
hardcoded 64-bit while the pointer size is 32-bit.

Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 10:12:20 +00:00
Hans Petter Selasky
3143f07779 Fix print formatting compile warnings for Sparc64 and PowerPC platforms.
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 09:56:25 +00:00
Hans Petter Selasky
8d59ecb214 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
Hans Petter Selasky
2c8d721186 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-27 12:21:15 +00:00
Hans Petter Selasky
aac7caaf47 Add support for binding IRQs to CPUs in the LinuxKPI. The new function
added is for BSD only and does not exist in Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-10-26 13:28:34 +00:00
Hans Petter Selasky
dfcc270f25 Build fix for MIPS.
Sponsored by:	Mellanox Technologies
2015-10-26 09:34:43 +00:00
Hans Petter Selasky
63ec90e212 Build fix for non-i386 and non-amd64 platforms.
Sponsored by:	Mellanox Technologies
2015-10-23 14:52:05 +00:00
Hans Petter Selasky
2da3897d01 Rename linuxapi[.ko] into linuxkpi[.ko], to reflect that it is a
kernel programming interface module, KPI, to avoid confusion with the
existing Linux userspace binary compatibility shims. Bump the
FreeBSD_version number.

Reviewed by:	np @
Suggested by:	dumbbell @
Sponsored by:	Mellanox Technologies
2015-10-22 09:50:45 +00:00
Hans Petter Selasky
03ae38081c Remove all comments deriving from Linux.
Minor rework of ilog2() function.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 09:37:34 +00:00
Hans Petter Selasky
6f2fc610dd Remove all comments deriving from Linux. Style file for FreeBSD.
Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 08:51:49 +00:00
Hans Petter Selasky
29a2e474c0 Reimplement header file, remove all comments deriving from Linux and
update copyright to 2-clause BSD.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 07:59:46 +00:00
Hans Petter Selasky
382d6bebd3 Move location of RCS keyword according to style.
Suggested by:	jhb @
Sponsored by:	Mellanox Technologies
2015-10-20 19:08:26 +00:00
Hans Petter Selasky
f89453cfb9 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 16:02:11 +00:00
Hans Petter Selasky
65324421da Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 15:28:02 +00:00
Hans Petter Selasky
c0a8182919 Add missing dash to copyright clause.
Sponsored by:	Mellanox Technologies
2015-10-20 11:42:00 +00:00
Hans Petter Selasky
77320fe897 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 11:40:04 +00:00
Hans Petter Selasky
3f862b56a1 Merge LinuxKPI changes from DragonflyBSD:
- Remove redundant NBLONG macro and use BIT_WORD()
  and BIT_MASK() instead.
- Correctly define BIT_MASK() according to Linux and
  update all users of this macro.
- Add missing GENMASK() macro.
- Remove all comments deriving from Linux.

Sponsored by:	Mellanox Technologies
2015-10-20 09:13:35 +00:00
Hans Petter Selasky
a4b5fa85df The returned value from vm_fault_disable_pagefaults() must be stored
and passed to vm_fault_enable_pagefaults(). Else possible recursion on
the state can be lost.

Sponsored by:	Mellanox Technologies
Suggested by:	kib @
2015-10-19 16:03:08 +00:00
Hans Petter Selasky
2ca3cc5132 Merge LinuxKPI changes from DragonflyBSD:
- Redefine DIV_ROUND_UP as a function macro taking two arguments
  instead of none.
- Implement more Linux kernel functions related to various forms
  of DELAY() and basic mathematical operations.

Sponsored by:	Mellanox Technologies
2015-10-19 12:44:41 +00:00
Hans Petter Selasky
e490164bee Merge LinuxKPI changes from DragonflyBSD:
- Implement more Linux kernel functions.

Sponsored by:	Mellanox Technologies
2015-10-19 12:33:09 +00:00
Hans Petter Selasky
f556cede8a Merge LinuxKPI changes from DragonflyBSD:
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.

Reviewed by:	np @
Sponsored by:	Mellanox Technologies
2015-10-19 12:26:38 +00:00
Hans Petter Selasky
35d974cd0c Merge LinuxKPI changes from DragonflyBSD:
- Map more Linux compiler related defines to FreeBSD ones.

Sponsored by:	Mellanox Technologies
2015-10-19 12:08:06 +00:00
Hans Petter Selasky
f940cc8ffc Map two more Linux error return codes to FreeBSD ones.
Sponsored by:	Mellanox Technologies
2015-10-19 12:04:20 +00:00
Hans Petter Selasky
af5648c465 Implement IS_ERR_OR_NULL() function.
Sponsored by:	Mellanox Technologies
2015-10-19 12:00:52 +00:00
Hans Petter Selasky
2404bdddf1 Merge LinuxKPI changes from DragonflyBSD:
- Add more list related functions and macros.
- Update the hlist_for_each_entry() macro to take one less argument.

Sponsored by:	Mellanox Technologies
2015-10-19 11:57:33 +00:00
Hans Petter Selasky
64bda586e1 Merge LinuxKPI changes from DragonflyBSD:
- Reimplement ktime header file to distinguish more from Linux.
- Add new time header file to handle time related Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 11:46:48 +00:00
Hans Petter Selasky
ecfc226c7d Fix compile warning.
Sponsored by:	Mellanox Technologies
2015-10-19 11:29:50 +00:00
Hans Petter Selasky
1610bf8edf Merge LinuxKPI changes from DragonflyBSD:
- Reimplement math64 header file to distinguish more from Linux.

Sponsored by:	Mellanox Technologies
2015-10-19 11:16:38 +00:00
Hans Petter Selasky
96e8192d3c Merge LinuxKPI changes from DragonflyBSD:
- Whitespace fixes.

Sponsored by:	Mellanox Technologies
2015-10-19 11:11:15 +00:00
Hans Petter Selasky
b526833859 Merge LinuxKPI changes from DragonflyBSD:
- Avoid using PAGE_MASK, because Linux defines it differently.
  Use (PAGE_SIZE - 1) instead.
- Add support for for_each_sg_page() and sg_page_iter_dma_address().

Sponsored by:	Mellanox Technologies
2015-10-19 11:09:51 +00:00
Hans Petter Selasky
e6d1c6e382 Merge LinuxKPI changes from DragonflyBSD:
- Implement schedule_timeout().

Sponsored by:	Mellanox Technologies
2015-10-19 10:57:56 +00:00
Hans Petter Selasky
da7c18e051 Merge LinuxKPI changes from DragonflyBSD:
- Implement pagefault_disable() and pagefault_enable().

Sponsored by:	Mellanox Technologies
2015-10-19 10:56:32 +00:00
Hans Petter Selasky
dad154ab93 Merge LinuxKPI changes from DragonflyBSD:
- Added support for multiple new Linux functions.
- Properly implement DEFINE_WAIT() and init_waitqueue_head() macros.
- Removed FreeBSD specific __wait_queue_head structure definition.

Sponsored by:	Mellanox Technologies
2015-10-19 10:54:24 +00:00
Hans Petter Selasky
7c50bc1cf6 Merge LinuxKPI changes from DragonflyBSD:
- Some minor whitespace fixes.
- Added support for two new Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 10:49:15 +00:00
Alexander V. Chernikov
fb373bc2b1 Fix build broken by r287861.
Spotted by:	zb
2015-09-16 15:40:08 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Mark Johnston
4af587d062 Ensure that the MAD agent's delayed taskqueue is completely stopped
before proceeding. Otherwise, nothing prevents it from running after the
MAD agent struct has been been freed, and this results in a use-after-free
when the task's ta_pending count is incremented in the callout handler.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-09-15 23:56:31 +00:00
John Baldwin
188458ea7c Currently the Linux character device mmap handling only supports mmap
operations that map a single page that has an associated vm_page_t.
This does not permit mapping larger regions (such as a PCI memory
BAR) and it does not permit mapping addresses beyond the top of RAM
(such as a 64-bit BAR located above the top of RAM).

Instead of using a single OBJT_DEVICE object and passing the physaddr via
the offset as a hack, create a new sglist and OBJT_SG object for each
mmap request. The requested memory attribute is applied to the object
thus affecting all pages mapped by the request.

Reviewed by:	hselasky, np
MFC after:	1 week
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D3386
2015-09-03 18:27:39 +00:00
Navdeep Parhar
6b5c8394f1 Reinstate unify_tcp_port_space and associated code that was lost during
the last OFED update (r278886).

iWARP on FreeBSD is properly integrated with the network stack and the
iWARP drivers _never_ operate out of any private TCP port-space that is
invisible to the kernel.  Instead, an iWARP connection shows up as a TCP
socket (which is what it is) fully visible to the kernel and standard
tools like netstat, sockstat, etc.
2015-08-12 22:09:58 +00:00
Mark Johnston
8811063172 ipv4_is_zeronet() and ipv4_is_loopback() expect an address in network
order, but IN_ZERONET and IN_LOOPBACK expect it in host order.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-08-07 18:30:11 +00:00
Hans Petter Selasky
5884383f19 Avoid calling into the random subsystem before it is initialized.
Sponsored by:	Mellanox Technologies
2015-08-04 09:45:10 +00:00
Mark Johnston
e2e45da0e8 ib mad: fix an incorrect use of list_for_each_entry
In tf_dequeue(), if we reach the end of the list without finding a
non-cancelled element, "tmp" will be a pointer into the list head, so the
tmp->canceled check is bogus. Use a flag instead.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3244
2015-07-30 18:28:37 +00:00
Hans Petter Selasky
49557d2481 Fix broken implementation of "kvasprintf()" function by adding missing
kmalloc() call. Make function global instead of static inline to fix
compiler warnings about passing variable argument lists to inline
functions.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-07-03 11:16:20 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Gleb Smirnoff
dfd828e931 Add SIOCGI2C ioctl support to the driver. Would work only on ConnectX-3
with fresh firmware. The low level code is based on code provided by
Mellanox.

Thanks to Mellanox and their distributor Must (http://mustcompany.ru)
for providing hardware.

In collaboration with:	Andre Melkoumian <andre mellanox.com>
Reviewed by:		hselasky
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2015-05-27 13:42:28 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Hans Petter Selasky
d27c74649b Apply proper locking when iterating the multicast addresses and add a
missing check for NULL from a non-blocking "kzalloc()" function call.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Found by:	glebius @
2015-05-12 11:52:34 +00:00
Mark Johnston
760a181bb2 msecs_to_jiffies() is implemented using tvtohz(9), which always returns a
positive value since it adds the current tick to its result. This differs
from the behaviour in Linux, whose implementation does not add the extra
tick, so subtract the extra tick in the OFED compat layer implementation.
This addresses some incorrect handling of IB MAD timeouts, since some IB
code depends on msecs_to_jiffies(0) returning 0.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-10 22:21:00 +00:00
Mark Johnston
979b8eaf4b find_next_bit() and find_next_zero_bit(): if the caller-specified offset
lies within the last block of the bit set and no bits are set beyond the
offset, terminate the search immediately instead of continuing as though
there are further blocks in the set and subsequently returning an incorrect
result.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-10 22:04:42 +00:00
Mark Johnston
86d4dbf1cb Don't drop the idr lock before verifying that the newly-inserted element
is present in the tree. Otherwise there exists a window during which the
element could be removed by another thread, triggering an incorrect
assertion failure.

Reviewed by:	jeff
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-02 00:26:38 +00:00
Mateusz Guzik
90f54cbfeb fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
Hans Petter Selasky
8de7453501 Fix variable casting:
- Jiffies or ticks in FreeBSD have integer type and are not long.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-03-27 19:08:11 +00:00
Hans Petter Selasky
932493ff29 Fixes for the LinuxAPI completion wrappers:
- make sure the timeout computations are always above zero by using
the existing "linux_timer_jiffies_until()" function. Negative timeouts
can result in undefined behaviour.
- declare all completion functions like external symbols and move the
code to the LinuxAPI kernel module.
- add a proper prefix to all LinuxAPI kernel functions to avoid
namespace collision with other parts of the FreeBSD kernel.
- clean up header file inclusions in the linux/completion.h, linux/in.h
and linux/fs.h header files.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-03-27 16:16:23 +00:00
Hans Petter Selasky
2162d4f09e Add missing void pointer argument to SYSINIT() functions.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-03-18 10:50:10 +00:00
Hans Petter Selasky
54fe6f6bdf Fix problems about 32-bit ticks wraparound and unsigned long
conversion:
- The linux compat API layer casts the ticks to unsigned long which
might cause problems when the ticks value is negative.
- Guard against already expired ticks values, by checking if the
passed expiry tick is already elapsed.
- While at it avoid referring the address of an inlined function.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-03-18 10:49:17 +00:00
Hans Petter Selasky
805b1f609d Declare missing symbol and inline macro which is only used once.
MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
Submitted by:	glebius@
2015-03-18 08:46:08 +00:00
Hans Petter Selasky
b7ba031ff7 Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00