Commit Graph

68 Commits

Author SHA1 Message Date
Hans Petter Selasky
1866c98e64 Infiniband clients must be attached and detached in a specific order in ibcore.
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.

Fix this by using module_xxx_order() instead of module_xxx().

Differential Revision:	https://reviews.freebsd.org/D23973
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-06 08:50:11 +00:00
Hans Petter Selasky
a26df270c9 Allow multicast packets to be received in promiscious mode, in mlx4en(4).
Make sure we disable the multicast filter in promiscious mode aswell as when
the all multicast flag is set.

MFC after:	1 week
Found by:	Tycho Nightingale <tychon@freebsd.org>
Sponsored by:	Mellanox Technologies
2020-06-17 11:12:10 +00:00
Ryan Moeller
cbb9ccf735 Avoid trying to toggle TSO twice
Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in
various NIC drivers.

Reviewed by:	hselasky, np, gallatin, jpaetzel
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25120
2020-06-15 16:35:27 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Hans Petter Selasky
f8806d0327 Widen EPOCH(9) usage in mlx4en(4).
Make sure all receive completion callbacks are covered by the network
EPOCH(9), because this is required when calling if_input() and
ether_input() after r357012.

Convert some spaces to tabs while at it.

Sponsored by:	Mellanox Technologies
2020-01-31 10:41:47 +00:00
Gleb Smirnoff
d35738c38d Enter the network epoch in RX processing taskqueue. 2020-01-25 00:06:18 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Gleb Smirnoff
20b26072b8 Convert to if_foreach_llmaddr() KPI.
Reviewed by:	hselasky
2019-10-14 20:23:16 +00:00
Conrad Meyer
e12be3218a Include eventhandler.h in more compilation units
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility.  Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.

(These CUs did not show up as missing eventhandler.h in tinderbox.)

X-MFC-With:	r347984
2019-05-21 01:18:43 +00:00
Hans Petter Selasky
cf59f7e108 Bump the Mellanox driver version numbers and the FreeBSD version number.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:15:07 +00:00
Hans Petter Selasky
6428c27faf Make sure to error out when arming the CQ fails in mlx4ib and mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:09 +00:00
Hans Petter Selasky
c29a65e6ec Eliminate useless warning message when reading sysctl node in mlx4core.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-11 14:34:25 +00:00
Hans Petter Selasky
6f490688f5 Improve support for switching to and from command polling mode in mlx4core.
Make sure the enter and leave polling routines can be called multiple times
with same setting. Ignore setting polling or event mode twice. This fixes a
deadlock during shutdown if polling mode was already selected.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-11 14:29:50 +00:00
Hans Petter Selasky
6b94b89bdb Teardown ifnet after stopping port in the mlx4en(4) driver.
mlx4_en_stop_port() calls mlx4_en_put_qp() which can refer the link level
address of the network interface, which in turn will be freed by the
network interface detach function. Make sure the port is stopped
before detaching the network interface.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-08 09:18:29 +00:00
Hans Petter Selasky
e0ba1be6d7 Don't hold state lock while detaching network device instance in mlx4en(4).
It can happen during shutdown that the lock will recurse when the mlx4en(4)
instance is part of a lagg interface. Call ether_ifdetach() unlocked.

Backtrace:
panic(): _sx_xlock_hard: recursed on non-recursive sx &mdev->state_lock
_sx_xlock_hard()
_sx_xlock()
mlx4_en_ioctl()
if_setlladdr()
lagg_port_destroy()
lagg_port_ifdetach()
if_detach()
mlx4_en_destroy_netdev()
mlx4_en_remove()
mlx4_remove_device()
mlx4_unregister_device()
mlx4_unload_one()
mlx4_shutdown()
linux_pci_shutdown()
bus_generic_shutdown()

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-08 09:16:29 +00:00
Slava Shwartsman
0f3b263d83 mlx4/mlx5: Updated driver version to 3.5.0
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:34 +00:00
Slava Shwartsman
8567696305 mlx4en: Optimise reception of small packets.
Copy small packets like TCP ACKs into a new mbuf
reusing the existing mbuf to receive a new ethernet
frame. This avoids wasting buffer space for
small sized packets.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:39:35 +00:00
Slava Shwartsman
6217a33f85 mlx4: Make sure default VNET is set when adding a new interface.
Adding an interface might be done outside the device_attach() routine
and will then cause a panic, due to the VNET not being defined.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:39:05 +00:00
Slava Shwartsman
ec673cf60b mlx4en: Remove duplicate statistics variable assignment.
The "priv->pkstats.rx_dropped" is written twice in a row.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:38:35 +00:00
Slava Shwartsman
93bf821652 mlx4en: Add support for receiving all data using one or more MCLBYTES sized mbufs.
Also when the MTU is greater than MCLBYTES.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:32:46 +00:00
Slava Shwartsman
63d7a8d9a8 mlx4en: Add support for netdump.
Implement the needed callback functions and support for polling the driver.

Differential Revision: https://reviews.freebsd.org/D15259
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:32:15 +00:00
Slava Shwartsman
b7d573c5a4 mlx4en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. The is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:31:45 +00:00
Slava Shwartsman
5dc2eaac65 mlx4en: Add driver version to sysctl desc
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:31:14 +00:00
Slava Shwartsman
c8aa689960 mlx4: Add board identifier and firmware version to sysctl
In last mlx4 update (r325841) we lost the sysctl to show the
firmware version for mlx4 devices.
Add both board identifier and firmware version under:
sys.device.mlx4_core0.hw sysctl node.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:30:48 +00:00
Slava Shwartsman
9024b80885 mlx4core: Add checks for invalid port numbers.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:30:16 +00:00
Slava Shwartsman
65ad766f36 mlx4: Zero initialize device capabilities to avoid use of uninitialized fields.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:29:46 +00:00
Slava Shwartsman
601e19f00a mlx4core: Avoid multiplication overflow by casting multiplication.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:29:16 +00:00
Hans Petter Selasky
2df98d5eec Add missing steering rules for virtual function, VF, in mlx4en(4) driver.
When acting as a VF it is required to add steering rules for all unicast
addresses. Even if promiscious mode is selected. Else incoming data packets
will be dropped.

MFC after:		3 days
Approved by:		re (gjb)
Sponsored by:		Mellanox Technologies
2018-10-08 14:52:21 +00:00
Hans Petter Selasky
f4546fa376 Add support for prio-tagged traffic for RDMA in ibcore.
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:11:53 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Hans Petter Selasky
87e303056f Bump version information in mlx4ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:59:46 +00:00
Hans Petter Selasky
c9a80d0289 The mlx4ib(4) should not be loaded before the ibcore is initialized.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:58:58 +00:00
Hans Petter Selasky
54b55cbdd9 Disable unsupported disassociate ucontext functionality in mlx4ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 13:57:32 +00:00
Andrey V. Elsukov
c2a5dc6cd7 Add mapping for several ethernet types used by Linux to FreeBSD
ethernet types.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14594
2018-03-06 12:58:00 +00:00
Hans Petter Selasky
1456d97c01 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
Eitan Adler
02ca39cff2 sys/dev/mlx[45]: fix uses of 1 << 31
Reviewed by:		kib (D13858)
2018-01-12 06:36:44 +00:00
Conrad Meyer
556127b650 mlx4: Remove redundant declarations to fix GCC build
These were made redundant in r325841.

Reviewed by:	hselasky
MFC after:	1 week (hselasky will MFC)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13401
2017-12-07 19:57:51 +00:00
Hans Petter Selasky
55b1c6e7e4 Merge ^/head r325663 through r325841. 2017-11-15 11:28:11 +00:00
Hans Petter Selasky
c3191c2e2b Update the mlx4 core and mlx4en(4) modules towards Linux v4.9.
Background:
The coming ibcore update forces an update of mlx4ib(4) which in turn requires
an updated mlx4 core module. This also affects the mlx4en(4) module because
commonly used APIs are updated. This commit is a middle step updating the
mlx4 modules towards the new ibcore.

This change contains no major new features.

Changes in mlx4:
  a) Improved error handling when mlx4 PCI devices are
  detached inside VMs.
  b) Major update of codebase towards Linux 4.9.

Changes in mlx4ib(4):
  a) Minimal changes needed in order to compile using the
  updated mlx4 core APIs.

Changes in mlx4en(4):
  a) Update flow steering code in mlx4en to use new APIs for
  registering MAC addresses and IP addresses.
  b) Update all statistics counters to be 64-bit.
  c) Minimal changes needed in order to compile using the
  updated mlx4 core APIs.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-15 11:14:39 +00:00
Hans Petter Selasky
8cc487045e Update mlx4ib(4) to Linux 4.9.
Sponsored by:	Mellanox Technologies
2017-11-13 10:49:18 +00:00
Hans Petter Selasky
d05554bb99 The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore

Reviewed by:		np @
Differential Revision:	https://reviews.freebsd.org/D12563
Sponsored by:		Mellanox Technologies
MFC after:		3 days
2017-10-20 08:20:15 +00:00
Ryan Libby
70da35b745 mlx4: use enum constants instead of const vars for case exprs
Follow up from r324201 to fix compilation with gcc, which complains
about non-ICE case expressions.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D12675
2017-10-14 23:25:44 +00:00
Hans Petter Selasky
62afbce910 Setup mbuf hash type properly when receiving IP packets in the mlx4en(4) driver.
Submitted by:		sephe@
Differential Revision:	https://reviews.freebsd.org/D12229
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-10-02 12:11:43 +00:00
Hans Petter Selasky
29d6b8abb4 Implement SIOCGIFRSS{KEY,HASH} for the mlx4en(4) driver.
Differential Revision:	https://reviews.freebsd.org/D12176
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-10-02 12:05:38 +00:00
Hans Petter Selasky
768a720e95 Print maximum MTU when trying to set invalid MTU in the mlx4en(4) driver.
Useful for debugging.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:32:51 +00:00
Hans Petter Selasky
a3d0173d98 Increment queue drops in the network statistics when transmitted packets
are dropped by the mlx4en(4) driver.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:30:55 +00:00
Hans Petter Selasky
f7833544f1 Add support for RX and TX statistics when the mlx4en(4) PCI device
is in VF or SRIOV mode typically in a virtual machine environment.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:27:21 +00:00
Hans Petter Selasky
1a59bf5f7a Fix for mlx4en(4) to properly call m_defrag().
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.

This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.

Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.

The counters are per-ring to avoid excessive cache misses in the
TX path.

Submitted by:		mjoras@
Differential Revision:	https://reviews.freebsd.org/D11683
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:35:02 +00:00
Hans Petter Selasky
3f38293d29 Remove some dead statistics related code and a structure field from the
mlx4en driver which is used by its Linux counterpart, but not under
FreeBSD.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-07-31 12:09:24 +00:00