Commit Graph

95 Commits

Author SHA1 Message Date
Slava Shwartsman
8718eb63f8 mlx5: Add software tx_jumbo_packets counter
This counter will represent transmitted packets which has more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:37 +00:00
Slava Shwartsman
8f7f07368d mlx5: Move hw.mlx5 node definition to mlx5_core.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:07 +00:00
Slava Shwartsman
07b624ed71 mlx5: Discard unused return values.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:06 +00:00
Hans Petter Selasky
16ae32f927 Add support for receive side scaling stride, RSSS, in mlx5en(4).
The receive side scaling stride parameter is a value which define the interval
between active receive side queues. The traffic for the inactive queues is
redirected to the nearest active queue by use of modulus. The default value
of this parameter is one, which means all receive side queues are used.

The point of this feature is to redirect more traffic to fewer receive side
queues in order to take more advantage of sorted large receive offload,
sorted LRO. The sorted LRO works better when more packets are accumulated
per service interval.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:28:06 +00:00
Hans Petter Selasky
0be5034007 Don't stall transmit queue on drops in mlx5en(4).
When a transmitted packet is dropped don't stall the transmit queue.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:19:36 +00:00
Hans Petter Selasky
2d32b0a304 Maximum number of mbuf frags is off-by-one for worst case scenario in mlx5en(4).
Inspecting the PRM no more than 0x3F data segments, DS, of size 16 bytes is
allowed.

Worst case scenario summary of DS usage:
Header is fixed:	2 DS
Maximum inlining:	98 => (98 - 2) / 16 = 6 DS
Remainder:		0x3F - 2 - 6 = 55 DS (mbuf frags)

Previously a value of 56 DS was used and this would work in the
normal case because not all inline data area was used up.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 11:06:07 +00:00
Hans Petter Selasky
7b9b93a8dd Update version information for the mlx5 and mlx5en(4) modules.
While at it bump some copyright dates.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:12:53 +00:00
Hans Petter Selasky
0539900214 Do not inline transmit headers and use HW VLAN tagging if supported by mlx5en(4).
Query the minimal inline mode supported by the card.
When creating a send queue, cache the queried mode and optimize the transmit
if no inlining is required.  In this case, we can avoid touching the headers
cache line and avoid dirtying several more lines by copying headers into
the send WQEs.  Also, if no inline headers are used, hardware assists in
the VLAN tag framing.

Submitted by:		kib@, slavash@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:03:30 +00:00
Hans Petter Selasky
90c8e44125 Use a mbuf header instead of a mbuf cluster for debugging interrupts in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:53:37 +00:00
Hans Petter Selasky
aa9f073c9b Add missing newline.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:43:43 +00:00
Hans Petter Selasky
2f17f76aa4 Handle jumbo frames without requiring big clusters in mlx5en(4).
The scatter list is formed by the chunks of MCLBYTES each, and larger
than default packets are returned to the stack as the mbuf chain.

Submitted by:		kib@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:42:05 +00:00
Hans Petter Selasky
f8c3349737 Enable both receive and transmit pauseframes by default in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:21:02 +00:00
Hans Petter Selasky
f2b4782c81 Add context numbers for HW elements in mlx5en(4).
To access the data, set sysctl dev.mce.N.conf.debug_stats to 1.
This enables the sysctl node dev.mce.N.hw_ctx_debug.  Its content is
the mapping of each channel' number to used receive queue and associated
completion queue, set of the transmit queues numbers and corresponding
completion queues.

Trimmed example output:
channel 30 rq 188 cq 1085
channel 30 tc 0 sq 187 cq 1084
channel 31 rq 191 cq 1087
channel 31 tc 0 sq 190 cq 1086

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:18:01 +00:00
Hans Petter Selasky
a880c1ff6a Do not hint about 'trust both' mode when the mlx5en(4) hardware does not support it.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:11:30 +00:00
Hans Petter Selasky
f0474ab919 Correctly write atomic variable in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:08:40 +00:00
Hans Petter Selasky
ed0cee0bf4 Implement support for Differentiated Service Code Point, DSCP, in mlx5en(4).
The DSCP feature is controlled using a set of sysctl(8) fields under
the qos sysctl directory entry for mlx5en(4).

For Routable RoCE QPs, the DSCP should be set in the QP's address path.
The DSCP's value is derived from the traffic class.

Linux commit:
ed88451e1f2d400fd6a743d0a481631cf9f97550

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:56:40 +00:00
Hans Petter Selasky
38535d6cab Add support for hardware rate limiting to mlx5en(4).
The hardware rate limiting feature is enabled by the RATELIMIT kernel
option. Please refer to ifconfig(8) and the txrtlmt option and the
SO_MAX_PACING_RATE set socket option for more information. This
feature is compatible with hardware transmit send offload, TSO.

A set of sysctl(8) knobs under dev.mce.<N>.rate_limit are provided to
setup the ratelimit table and also to fine tune various rate limit
related parameters.

Sponsored by:	Mellanox Technologies
2018-05-29 14:04:57 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Konstantin Belousov
952e75c763 mlx5en: Always allow VLAN id 0.
According to the 802.1Q-2014 9.6 VLAN Tag Control Information, VID value 0
means that there is no VLAN tag assigned to the packet, and only PCP and
DEI values from the tag are meaningful.  Current flow table programming
filter out such packets.

When programming VLAN filter for flow table, unconditionally add rule which
accept packets with VLAN id 0.  The packets are already handled correctly
by the network stack.

Reviewed by:	hselasky, slavash
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-05-02 20:22:03 +00:00
Hans Petter Selasky
28cfdee769 Bump driver version number in mlx5en(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-04-04 10:45:06 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Hans Petter Selasky
c0cea51b46 Don't wait for completions when a mlx5en(4) device is in internal
error state.

If the device is in internal error state the hardware will not
generate completions. Just move on to destroy the resources.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:38:12 +00:00
Hans Petter Selasky
177781564f Create designated workqueue for each mlx5en(4) device instance.
The mlx5e_destroy_ifp() function may be called from the system workqueue and
in this case trying to flush all works will cause a dead lock.
Instead of using the system workqueue, create a designated workqueue
for each mlx5en(4) device instance.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 16:59:51 +00:00
Hans Petter Selasky
9cb83c4689 Avoid more LFENCE/SFENCe on x86 in mlx5en(4),
by using the FreeBSD native fences.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:58:30 +00:00
Hans Petter Selasky
7d69d339ad Fix mlx5en(4) driver to properly call m_defrag().
When the mlx5en(4) driver was converted to using BUSDMA(9) the call to
m_defrag() was moved after the part of the TX routine that strips the
header from the mbuf chain. Before it called m_defrag it first trimmed
off the now-empty mbufs from the start of the chain. This has the side
effect of also removing the head of the chain that has M_PKTHDR set.
m_defrag() will not defrag a chain that does not have M_PKTHDR set,
thus it was effectively never defragging the mbuf chains.

As it turns out, trimming the mbufs in this fashion is unnecessary since
the call to bus_dmamap_load_mbuf_sg doesn't map empty mbufs anyway, so
remove it.

Differential Revision:	https://reviews.freebsd.org/D12050
Submitted by:	mjoras@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:53:04 +00:00
Hans Petter Selasky
1eb09ad434 Use vport rather than physical-port MTU in mlx5en(4).
Set and report vport MTU rather than physical MTU,
The driver will set both vport and physical port mtu
and will rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Based on Linux upstream commit:
cd255efff9baadd654d6160e52d17ae7c568c9d3

Submitted by:	Meny Yossefi <menyy@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:47:17 +00:00
Hans Petter Selasky
7c22ae80d2 Use the device unit number for naming the ifnet interface in mlx5en(4).
Currently the ifnet interface is named mceX, where X is a monotonically
incremented value. If the device is reset due to a fatal error, then the
interface name will change.  Using the device unit number will keep the
naming consistent across the reset logic.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:43:41 +00:00
Hans Petter Selasky
10b0804509 Add support for per priority flow control, PFC, to mlx5en(4).
Add support for PFC and implement reading the per priority statistics
using the sysctl(8) interface. PFC is used together with VLAN priority
and can be enabled and disabled on a per priority basis.

Global pause frames and PFC are incompatible features and surrounding
logic has been added to warn the user about misconfiguration.

Update relevant mlx5core APIs for PFC configuration.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:40:39 +00:00
Hans Petter Selasky
788333d9a6 Use the autogenerated interface file for all commands in mlx5core.
This patch accumulates the following Linux commits:
- 90b3e38d048f09b22fb50bcd460cea65fd00b2d7
  mlx5_core: Modify CQ moderation parameters
- 09a7d9eca1a6cf5eb4f9abfdf8914db9dbd96f08
  mlx5_core: QP/XRCD commands via mlx5 ifc
- 1a412fb1caa2c1b77719ccb5ed8b0c3c2bc65da7
  mlx5_core: Modify QP commands via mlx5 ifc
- ec22eb53106be1472ba6573dc900943f52f8fd1e
  mlx5_core: MKey/PSV commands via mlx5 ifc
- 73b626c182dff06867ceba996a819e8372c9b2ce
  mlx5_core: EQ commands via mlx5 ifc
- 20ed51c643b6296789a48adc3bc2cc875a1612cf
  mlx5_core: Access register and MAD IFC commands via mlx5 ifc
- a533ed5e179cd15512d40282617909d3482a771c
  mlx5_core: Pages management commands via mlx5 ifc
- b8a4ddb2e8f44f872fb93bbda2d541b27079fd2b
  mlx5_core: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
- af1ba291c5e498973cc325c501dd8da80b234571
  mlx5_core: Refactor internal SRQ API
- b06e7de8a9d8d1d540ec122bbdf2face2a211634
  mlx5_core: Refactor device capability function
- c4f287c4a6ac489c18afc4acc4353141a8c53070
  mlx5_core: Unify and improve command interface

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 10:43:42 +00:00
Hans Petter Selasky
2e9c3a4f99 Implement priority to traffic class mapping in mlx5core.
Add support for mapping priority to traffic class via sysctl

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:23:07 +00:00
Hans Petter Selasky
cfc9c386eb Implement rate limit per traffic class in mlx5core.
Add support for rate limiting traffic class via sysctl.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:17:36 +00:00
Hans Petter Selasky
d91421515a Implement missing query for current port rate in mlx5ib(4).
- Factor out port speed definitions into new port.h header file,
  similarly as done in Linux upstream.
- Correct two existing port speed definitions in mlx5en according to
  Linux upstream.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:03:11 +00:00
Hans Petter Selasky
ecb4fcc48e Add log message for unsupported QSFPs in mlx5core.
Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:51:50 +00:00
Hans Petter Selasky
1cbc85fd04 Move the mlx5 core device pointer first in the mlx5en priv. This help simplify
checks to recognize own network devices when using mlx5ib. This patch fixes
an issues where mlx5ib fails to recognize mceX network devices for use with
RoCE.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-30 12:38:06 +00:00
Konstantin Belousov
e44f4f3547 mlx5en: Avoid SFENCe on x86
The IA32 memory model guarantees that all writes are seen in the program
order.  Also, any access to the uncacheable memory flushes the store
buffers.  As the consequence, SFENCE instruction is (almost) never needed,
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.

Use atomic_thread_fence_rel() instead of wb() to only emit compiler barriers
on x86 there.  Other architectures get the right barrier instruction as
well.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-12-19 14:11:41 +00:00
Konstantin Belousov
ef23f141bc Implement hardware mlx5(4) rx timestamps.
Driver support is only provided for ConnectX4/5.

System-time timestamp is calculated based on the free-running counter
timestamp provided by hardware.  Driver periodically samples the
counter to calibrate it against the system clock and uses linear
interpolation to convert.  Stability of the crystal which drives the
clock is +-50 ppm at the operational temperature, which makes the
algorithm good enough.

The calculation is somewhat delicate because all values are 64bit and
overflow the naive formula for linear interpolation.  The calculation
drops the least significant bits in advance, see the PREC shift in
mlx5_mbuf_tstmp().

Hardware stamps can be turned off by 'ifconfig mceN -hwrxtsmp'.  Buggy
firmware might result in small but visible errors in the reported
timestamps, detectable e.g. by nonsensical (negative) RTT values for
LAN pings.

Reviewed by:	gallatin, hselasky
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D12638
2017-11-29 10:04:11 +00:00
Hans Petter Selasky
53d7bb46d5 Expose the current hardware MTU in mlx5en(4) as a separate entry
in the sysctl tree.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 14:19:22 +00:00
Hans Petter Selasky
61fd7ac087 Add support for configuring local multicast and unicast data traffic loopback
in mlx5en(4) driver via the sysctl interface.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 14:14:54 +00:00
Hans Petter Selasky
bb3616ab20 Add support for disabling and enabling RX and TX DMA rings in mlx5en(4).
This is useful for supporting setups similar to Netmap.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 14:10:41 +00:00
Hans Petter Selasky
5a93b4cd52 Refactor the flowsteering APIs used by mlx5en(4). This change is needed by
the coming ibcore and mlx5ib updates in order to support traffic redirection
to so-called raw ethernet QPs.

Remove unused E-switch related routines and files while at it.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-10 09:49:08 +00:00
Hans Petter Selasky
e5d6b589ce Make sure the doorbell lock is valid for the i386 version
of the mlx5en(4) driver.

Tested by:		gallatin @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-10-02 12:20:55 +00:00
Hans Petter Selasky
8508e4d730 Make sure the received IP header gets 32-bit aligned for short packets
in the mlx5en(4) driver.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:49:36 +00:00
Hans Petter Selasky
869dd4b498 Count drop events due to lack of PCI bandwidth as queue drops and not as
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.

Submitted by:		gallatin@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:36:57 +00:00
Hans Petter Selasky
8b48354659 Improve sysadmin visibility of physical port error counters in the
mlx5en driver.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-04-28 19:38:57 +00:00
Hans Petter Selasky
66d53750b9 Add support for reading advanced diagnostic counters.
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.

To enable reading of diagnostic counters set one or more of the
following sysctls to one:

dev.mce.X.conf.diag_general_enable=1
dev.mce.X.conf.diag_pci_enable=1

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-27 10:03:50 +00:00
Hans Petter Selasky
5e6a76be8a Enforce reading the consumer and producer counters once to ensure
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-27 08:32:50 +00:00
Hans Petter Selasky
e16c241deb Remove superfluous return statement.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 15:47:29 +00:00
Hans Petter Selasky
b98ba64027 Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.

- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.

- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
Suggested by:		gallatin @
2017-01-20 15:45:21 +00:00
Hans Petter Selasky
3dfa7645c5 Make draining a sendqueue more robust.
Add own state variable to track if a sendqueue is stopped or not.
This will prevent traffic from entering the sendqueue while it is
being destroyed.

Update drain function to wait for traffic to be transmitted before
returning when the link state is active.

Add extra checks in transmit path for stopped SQ's.

While at it:
- Use likely() for a mbuf pointer check.
- Remove redundant IFF_DRV_RUNNING check.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 12:02:40 +00:00
Hans Petter Selasky
d2bf00a918 Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 11:11:49 +00:00