Commit Graph

349 Commits

Author SHA1 Message Date
Hans Petter Selasky
2647c1e4c5 Make sure the running variable is properly set for ratelimited SQs in mlx5en(4).
Else the SQs won't be properly released when closing rate-limited connections
leading to wrong state transitions on the SQ.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:37:31 +00:00
Hans Petter Selasky
ba11bcecd8 Implement get and set nic state as global functions in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:37:03 +00:00
Hans Petter Selasky
c2a1e80706 Ticks are integer type in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:36:32 +00:00
Hans Petter Selasky
a005c157e4 Configure firmware to use RX hash format in mini CQE in mlx5en(4).
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.

While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:55 +00:00
Hans Petter Selasky
d52ffcb71c Disable CQE zipping by default in mlx5en(4).
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:35 +00:00
Hans Petter Selasky
f0dcb8dff5 Split mlx5e_update_stats_work() in mlx5en(4).
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:14 +00:00
Hans Petter Selasky
91f13f8368 Implement fast close of RX channel in mlx5en(4).
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:42 +00:00
Hans Petter Selasky
243853215d Correct number of elements for priority to traffic class mappings in mlx5en(4).
The number of priorities is always 8, while the number of traffic classes
supported can vary. While at it convert the sysctl node into an array.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:14 +00:00
Hans Petter Selasky
ffadb62f20 Remove unused module parameter in mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:29 +00:00
Hans Petter Selasky
6428c27faf Make sure to error out when arming the CQ fails in mlx4ib and mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:09 +00:00
Hans Petter Selasky
069963d772 Destroy port stats debug context in correct order in mlx5en(4).
Destroy children nodes before parent nodes.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:22 +00:00
Hans Petter Selasky
c66537d7b2 Fix tx_jumbo_packets counter in mlx5en(4).
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.

TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:03 +00:00
Hans Petter Selasky
bcfad02593 Update Ethernet extended counters in mlx5en(4).
Expose all Ethernet extended counters those counters via debug_stats
sysctl:
dev.mce.X.debug_stats

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:31:32 +00:00
Hans Petter Selasky
5169fb81ca Protect from infinite sw-reset loop in mlx5core.
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:47 +00:00
Hans Petter Selasky
192fc18d49 Disable all MSIX interrupts before shutdown in mlx5.
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:18 +00:00
Hans Petter Selasky
a3a31fde6d Import Linux code to implement mlx5_ib_disassociate_ucontext() in mlx5ib.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:29:45 +00:00
Hans Petter Selasky
983026ea83 Add temperature warning event to log in mlx5core.
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.

Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:28:18 +00:00
Hans Petter Selasky
7646dc2347 Correctly define the interface state bits in mlx5en(4).
While at it remove unused interface state bits. This also fixes and issue
during shutdown:

There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.

The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.

Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:27:29 +00:00
Hans Petter Selasky
e5eae1dc7d Enable FPGA and FPGA QP errors for EQ and call the handler in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:26:33 +00:00
Hans Petter Selasky
c322dbafd5 Add MLX5_FPGA_RELOAD IOCTL(2) to mlx5fpga.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:25:14 +00:00
Hans Petter Selasky
423530be04 Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:23:33 +00:00
Andrew Gallatin
50575ce11c Track TCP connection's NUMA domain in the inpcb
Drivers can now pass up numa domain information via the
mbuf numa domain field.  This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by:	markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20028
2019-04-25 15:37:28 +00:00
Andrew Gallatin
7687707dd4 Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets.  When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node.  Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by:	glebius, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19930
2019-04-22 19:24:21 +00:00
Andrew Gallatin
538ff57b75 mlx5en: Enable new pfil(9) KPI ethernet filtering hooks
This allows efficient filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before*
the driver has committed to (re)allocating an mbuf for the
packet. Dropped packets are treated essentially the same as an
error. Nothing is allocated, and the existing buffer is recycled. This
allows us to drop malicious packets at close to line rate with very
little CPU use.

Reviewed by:	hselasky, slavash, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19063
2019-04-15 17:14:50 +00:00
Konstantin Belousov
a748d99a17 Fix build with option RSS, removing unused variables.
Reported by:	np
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-06 21:52:40 +00:00
Konstantin Belousov
e206dc6479 Appease gcc build, remove duplicated declaration.
Reported by:	np
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-06 19:20:00 +00:00
Sean Bruno
5fc3b4acab Change u32 to uint32_t to allow the native-xtools target to build
libsysdecode.

Submitted by:	kib
2018-12-06 18:59:33 +00:00
Slava Shwartsman
0f3b263d83 mlx4/mlx5: Updated driver version to 3.5.0
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:34 +00:00
Slava Shwartsman
cc971b2261 mlx5en: Implement backpressure indication.
The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. When the upper layers typically the socket layer has obtained such
a tag, it can then query the destination driver queue for the current
amount of space available in the send queue.

A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx4en(4) driver is detached. The channels structure is persistant
during the lifetime of the mlx5en(4) driver it belongs to and can so be accessed
without any need of synchronization.

The mlx5e_snd_tag structure was extended to contain a type field, because there are now
two different tag types which end up in the driver which need to be distinguished.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:03 +00:00
Slava Shwartsman
71defeda26 mlx5en: Improve configuration of HW LRO.
In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space must be set, and the ifconfig(8) LRO capability must be set. Any other
settings will disable HW LRO.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:33 +00:00
Slava Shwartsman
01f02abfec mlx5en: Count all transmitted and received bytes.
Add counter for all transmitted and received bytes. Currently only all
transmitted and received packets were counted. Fix description of RX LRO
counters while at it.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:02 +00:00
Slava Shwartsman
3230c29d72 mlx5en: Statically allocate and free the channel structure(s).
By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path. And also reduce the chance for use-after-free
issues in the transmit fast path.

This change is also a requirement for implementing
backpressure support.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:31 +00:00
Slava Shwartsman
b3cf149325 mlx5en: Fix race in mlx5e_ethtool_debug_stats().
Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.

Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:01 +00:00
Slava Shwartsman
42390bb884 mlx5en: Add support for IFM_10G_LR and IFM_40G_ER4 media types.
Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:22:30 +00:00
Slava Shwartsman
f292a94d6b mlx5en: Don't set rate on SQs when the SQ is already stopped.
This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:59 +00:00
Slava Shwartsman
3e581cabf0 mlx5en: Fix for inlining issues in transmit path
1) Don't exceed the drivers own hardcoded TX inline limit.

The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the drivers own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.

2) Make sure the mlx5_query_min_inline() function returns an error code.

3) Header inlining is required when using TSO.

4) Catch failure to compute inline header size for TSO.

5) Add support for UDP when computing inline header size.

6) Fix for inlining issues with regards to DSCP.

Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to workaround a hardware bug extracting
the DSCP field from the IPv4/v6 header.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:28 +00:00
Slava Shwartsman
d51ced5fae mlx5en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. The is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:57 +00:00
Slava Shwartsman
e870c0ab61 mlx5en: Implement support for bandwidth limiting in by ratio, ETS.
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer number between 1 and 100 inclusivly.

Implement the needed firmware commands and SYSCTLs through mlx5en(4).

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:26 +00:00
Slava Shwartsman
d82f1c13ad mlx5fpga: Add set and query connect/disconnect FPGA
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:55 +00:00
Slava Shwartsman
085b35bb69 mlx5fpga: IOCTL for FPGA temperature measurement
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:23 +00:00
Slava Shwartsman
b5f5751275 mlx5fpga: Support MorseQ board
Added and supported new enum "morseQ = 4" for fpga_id field

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:18:52 +00:00
Slava Shwartsman
d50c55f17d mlx5fpga_tools initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:17:22 +00:00
Slava Shwartsman
e9dcd83155 mlx5fpga: Initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:11:20 +00:00
Slava Shwartsman
67db687393 mlx5ib: Set default active width and speed when querying port.
Make sure the active width and speed is set in case the
translate_eth_proto_oper() function doesn't recognize the
current port operation mask.

Linux commit:
7672ed33c4c15dbe9d56880683baaba4227cf940

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:49:11 +00:00
Slava Shwartsman
d3300d4aed mlx5ib: Make sure the congestion work timer does not escape the drain procedure.
If the mlx5_ib_read_cong_stats() function was running when mlx5ib was unloaded,
because this function unconditionally restarts the timer, the timer can still
be pending after the delayed work has been cancelled. To fix this simply loop
on the delayed work cancel procedure as long as it returns non-zero.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:39 +00:00
Slava Shwartsman
ac2fdeb4e7 mlx5ib: Fix null pointer dereference in mlx5_ib_create_srq
Although "create_srq_user" does overwrite "in.pas" on some paths, it
also contains at least one feasible path which does not overwrite it.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:10 +00:00
Slava Shwartsman
c9e9b5c104 mlx5ib: Fix sign extension in mlx5_ib_query_device
"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:41 +00:00
Slava Shwartsman
31c3f64819 mlx5: Fix driver version location
Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:10 +00:00
Slava Shwartsman
721a1a6a69 mlx5: Fixes to allow command polling mode to exist alongside event mode.
A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handle
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:39 +00:00
Slava Shwartsman
b084af6cdc mlx5: Fix wrong size allocation for QoS ETC TC register
The driver allocates wrong size (due to wrong struct name) when issuing
a query/set request to NIC's register.

Linux commit:
d14fcb8d877caf1b8d6bd65d444bf62b21f2070c

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:09 +00:00
Slava Shwartsman
8718eb63f8 mlx5: Add software tx_jumbo_packets counter
This counter will represent transmitted packets which has more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:37 +00:00
Slava Shwartsman
feb5f357ea mlx5: Implement support for configuring PCIe packet write ordering via a sysctl.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:08 +00:00
Slava Shwartsman
70b417cf90 mlx5: Extend vector argument to u64.
Else the MLX5_TRIGGERED_CMD_COMP flag will be masked away.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:38 +00:00
Slava Shwartsman
29e544513e mlx5: Add global control to disable firmware reset, for all mlx5 devices.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:08 +00:00
Slava Shwartsman
2119f825d1 mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:37 +00:00
Slava Shwartsman
8f7f07368d mlx5: Move hw.mlx5 node definition to mlx5_core.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:07 +00:00
Slava Shwartsman
63cc6d1bc2 mlx5: Convert some spaces into tabs and use device_printf() instead of printf().
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:36 +00:00
Slava Shwartsman
abb28d287b mlx5: Add SRQ fixes from Linux
Combine multiple fixes from Linux to SRQ.
Linux commits:
c73b791 IB/mlx5: Assign SRQ type earlier
0fd27a8 IB/mlx5: Fix out-of-bound access
c2b37f7 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
d63c467 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:06 +00:00
Slava Shwartsman
3b21d18587 mlx5: Fix for potential memory leaks.
Make sure allocated data gets freed in error cases.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:37 +00:00
Slava Shwartsman
07b624ed71 mlx5: Discard unused return values.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:06 +00:00
Slava Shwartsman
843a89d37e mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Linux commit:
aba462134634b502d720e15b23154f21cfa277e5

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:36 +00:00
Slava Shwartsman
2bf40c3608 mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow.

Linux commit:
28e9091e3119933c38933cb8fc48d5618eb784c8

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:05 +00:00
Slava Shwartsman
0c79f82cf0 mlx5: Notify user that the ConnectX-6 shutdown its port due to power limitation
If power exceed the slot limit, or slot limit is unknown the ConnectX-6
firmware will shutdown its port.
Inform the user via debug message.

MFC after:      3 days
Approved by:    hselasky (mentor), kib (mentor)
Sponsored by:   Mellanox Technologies
2018-10-22 10:38:38 +00:00
Hans Petter Selasky
6ed134c41b Make the MSIX module parameter limit per device, in mlx5en(4).
MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:41:09 +00:00
Hans Petter Selasky
16ae32f927 Add support for receive side scaling stride, RSSS, in mlx5en(4).
The receive side scaling stride parameter is a value which define the interval
between active receive side queues. The traffic for the inactive queues is
redirected to the nearest active queue by use of modulus. The default value
of this parameter is one, which means all receive side queues are used.

The point of this feature is to redirect more traffic to fewer receive side
queues in order to take more advantage of sorted large receive offload,
sorted LRO. The sorted LRO works better when more packets are accumulated
per service interval.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:28:06 +00:00
Hans Petter Selasky
0be5034007 Don't stall transmit queue on drops in mlx5en(4).
When a transmitted packet is dropped don't stall the transmit queue.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:19:36 +00:00
Hans Petter Selasky
2d32b0a304 Maximum number of mbuf frags is off-by-one for worst case scenario in mlx5en(4).
Inspecting the PRM no more than 0x3F data segments, DS, of size 16 bytes is
allowed.

Worst case scenario summary of DS usage:
Header is fixed:	2 DS
Maximum inlining:	98 => (98 - 2) / 16 = 6 DS
Remainder:		0x3F - 2 - 6 = 55 DS (mbuf frags)

Previously a value of 56 DS was used and this would work in the
normal case because not all inline data area was used up.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 11:06:07 +00:00
Hans Petter Selasky
7b9b93a8dd Update version information for the mlx5 and mlx5en(4) modules.
While at it bump some copyright dates.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:12:53 +00:00
Hans Petter Selasky
0539900214 Do not inline transmit headers and use HW VLAN tagging if supported by mlx5en(4).
Query the minimal inline mode supported by the card.
When creating a send queue, cache the queried mode and optimize the transmit
if no inlining is required.  In this case, we can avoid touching the headers
cache line and avoid dirtying several more lines by copying headers into
the send WQEs.  Also, if no inline headers are used, hardware assists in
the VLAN tag framing.

Submitted by:		kib@, slavash@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:03:30 +00:00
Hans Petter Selasky
90c8e44125 Use a mbuf header instead of a mbuf cluster for debugging interrupts in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:53:37 +00:00
Hans Petter Selasky
a6b2d28d05 Add module parameter to limit number of MSIX EQ vectors in mlx5en(4).
For setups having a large amount of PCI devices, it makes sense to limit the
number of MSIX vectors per PCI device, in order to avoid running out of IRQ
vectors.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:47:56 +00:00
Hans Petter Selasky
aa9f073c9b Add missing newline.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:43:43 +00:00
Hans Petter Selasky
2f17f76aa4 Handle jumbo frames without requiring big clusters in mlx5en(4).
The scatter list is formed by the chunks of MCLBYTES each, and larger
than default packets are returned to the stack as the mbuf chain.

Submitted by:		kib@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:42:05 +00:00
Hans Petter Selasky
f8c3349737 Enable both receive and transmit pauseframes by default in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:21:02 +00:00
Hans Petter Selasky
f2b4782c81 Add context numbers for HW elements in mlx5en(4).
To access the data, set sysctl dev.mce.N.conf.debug_stats to 1.
This enables the sysctl node dev.mce.N.hw_ctx_debug.  Its content is
the mapping of each channel' number to used receive queue and associated
completion queue, set of the transmit queues numbers and corresponding
completion queues.

Trimmed example output:
channel 30 rq 188 cq 1085
channel 30 tc 0 sq 187 cq 1084
channel 31 rq 191 cq 1087
channel 31 tc 0 sq 190 cq 1086

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:18:01 +00:00
Hans Petter Selasky
a880c1ff6a Do not hint about 'trust both' mode when the mlx5en(4) hardware does not support it.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:11:30 +00:00
Hans Petter Selasky
f0474ab919 Correctly write atomic variable in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:08:40 +00:00
Hans Petter Selasky
52ff436841 Remove redundant call to mlx5_vsc_find_cap() in mlx5core.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:27:46 +00:00
Hans Petter Selasky
6d54b22db7 Make sure the state variable is set atomically instead of using a mutex in mlx5core.
Device detach and setting error state may deadlock over the interface mutex
like this:

a) Detach code in mlx5en waits until error state is set while the interface
mutex is locked.
b) The set error handler needs to lock the interface mutex before it can
set the error state.

The solution is to use atomics to set the error state.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:20:01 +00:00
Hans Petter Selasky
b575d8c850 Refactor access to CR-space into using VSC APIs in mlx5core.
Remove no longer used files and APIs.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:16:32 +00:00
Hans Petter Selasky
9fc929d2e2 Remove redundant newline character in mlx5core.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:11:00 +00:00
Hans Petter Selasky
18450a3b10 Update version information for the mlx5ib module.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:07:40 +00:00
Hans Petter Selasky
62bfa774ae Don't pass unsupported events to ibcore from mlx5ib.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:59:55 +00:00
Hans Petter Selasky
14a1b9bd3a Use static device naming instead of dynamic one in mlx5ib.
When resetting mlx5core instances it can happen that the order of attach and
detach for mlx5ib instances is changed. Take the unit number for mlx5_%d from
the parent PCI device, similarly to what is done in mlx5en(4), so that there
is a direct relationship between mce<N> and mlx5_<N>.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:58:11 +00:00
Hans Petter Selasky
ed0cee0bf4 Implement support for Differentiated Service Code Point, DSCP, in mlx5en(4).
The DSCP feature is controlled using a set of sysctl(8) fields under
the qos sysctl directory entry for mlx5en(4).

For Routable RoCE QPs, the DSCP should be set in the QP's address path.
The DSCP's value is derived from the traffic class.

Linux commit:
ed88451e1f2d400fd6a743d0a481631cf9f97550

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:56:40 +00:00
Hans Petter Selasky
f4546fa376 Add support for prio-tagged traffic for RDMA in ibcore.
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:11:53 +00:00
Hans Petter Selasky
38535d6cab Add support for hardware rate limiting to mlx5en(4).
The hardware rate limiting feature is enabled by the RATELIMIT kernel
option. Please refer to ifconfig(8) and the txrtlmt option and the
SO_MAX_PACING_RATE set socket option for more information. This
feature is compatible with hardware transmit send offload, TSO.

A set of sysctl(8) knobs under dev.mce.<N>.rate_limit are provided to
setup the ratelimit table and also to fine tune various rate limit
related parameters.

Sponsored by:	Mellanox Technologies
2018-05-29 14:04:57 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Konstantin Belousov
952e75c763 mlx5en: Always allow VLAN id 0.
According to the 802.1Q-2014 9.6 VLAN Tag Control Information, VID value 0
means that there is no VLAN tag assigned to the packet, and only PCP and
DEI values from the tag are meaningful.  Current flow table programming
filter out such packets.

When programming VLAN filter for flow table, unconditionally add rule which
accept packets with VLAN id 0.  The packets are already handled correctly
by the network stack.

Reviewed by:	hselasky, slavash
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-05-02 20:22:03 +00:00
Hans Petter Selasky
28cfdee769 Bump driver version number in mlx5en(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-04-04 10:45:06 +00:00
Hans Petter Selasky
d77004ab47 Remove unused structure field in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:58:58 +00:00
Hans Petter Selasky
76ee71dcd3 Bump mlx5core driver version.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:55:31 +00:00
Hans Petter Selasky
4d5fdbe9b8 Fix for use after free in mlx5core.
Make sure the command completion handler is not called when the device is
in internal error state. This can easily trigger use after free situations.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:50:45 +00:00
Hans Petter Selasky
ca2345a05d Make sure Giant is locked when allocating bus resources in mlx5core.
During health care IRQ resources will be reallocated.
Newbus requires that Giant is locked before accessing
these resources.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:49:35 +00:00
Hans Petter Selasky
92d23c82cd Collect firmware dump when mlx5core is in device error state.
Firmware dump collecting should be triggered in case firmware syndrome
with request for reset bit is set.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:48:25 +00:00
Hans Petter Selasky
d28b6b55ba Reorganize health recovery in mlx5core.
- Move the semaphore locking and unlocking to the same function.
- Flags are no longer needed if the reset and crdump will be done in the
  same function.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:45:48 +00:00
Hans Petter Selasky
3c1274bd64 Prepare for FW dump in error state in mlx5core.
- Move firmware dump prep and cleanup to init_one() and remove_one() so that
the init and cleanup will happen only upon driver reload.
- Add some prints to indicate firmware dump.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:43:15 +00:00
Hans Petter Selasky
0a752b05a8 Properly check if crspace is supported in mlx5core.
The old code checked for MLX5_CR_SPACE_DOMAIN which is irrelevant here.
However, if dev->vsec_addr would be 0, an access to wrong offset would
happen.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:39:27 +00:00
Hans Petter Selasky
4950c6ec72 Add missing newline character in print in mlx5core.
MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:35:31 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Hans Petter Selasky
9a3d0cf097 Remove redundant prototype to fix compilation with GCC.
Reported by:	jeff@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-25 08:55:53 +00:00
Hans Petter Selasky
c0cea51b46 Don't wait for completions when a mlx5en(4) device is in internal
error state.

If the device is in internal error state the hardware will not
generate completions. Just move on to destroy the resources.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:38:12 +00:00
Hans Petter Selasky
9cd6fc88be Fix incorrect page count when mlx5core is in internal error.
Change page cleanup flow when in internal error to properly decrement
the page counts when reclaiming pages. That prevents timing out
waiting for extra pages that were actually cleaned up previously.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:35:59 +00:00
Hans Petter Selasky
94790180f3 Don't save PCI state when PCI error is detected in mlx5core.
When a PCI error is detected the PCI state could be corrupt, don't
save it in that flow. Save the state after initialization. After
restoring the PCI state during slot reset save it again, restoring
the state destroys the previously saved state info.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:34:35 +00:00
Hans Petter Selasky
f20b553d75 Add mutual exclusion mechanism for software reset of firmware in mlx5core.
Since the FW can be shared between PCI functions it is common that
more than one health poll will detected a failure, this can lead to
multiple resets.

The solution is to use a FW locking mechanism using semaphore space to
provide a way to synchronize between functions. The FW semaphore is
acquired via config cycle access. First the VSEC gateway must be
acquired, then the semaphore can be locked by writing a value to it
and confirmed it's locked by reading the same value back. The process
in the same to free the semaphore, except the value written should be
zero.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:32:03 +00:00
Hans Petter Selasky
fe242ba7c1 Issue a software reset on firmware assert in mlx5core.
If a FW assert is considered fatal, indicated by a new bit in the
health buffer, reset the FW. After the reset, follow the normal
recovery flow.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:24:09 +00:00
Hans Petter Selasky
1900b6f887 Handle software reset of firmware in error flow in mlx5core.
Some mlx5 adapter firmware allows the driver to reset the firmware in
the event of an error. When a software reset is issued on any physical
function all PFs enter reset state. This is a recoverable condition.
The existing recovery flow was designed to allow the recovery of a
VF after a PF driver reload. This patch expands the scope of that
flow to recover PFs or VFs after a SW reset has been issued.
When a software reset is issued the following occurs:

1. The NIC interface mode is set to SW_RESET (7) while the reset is in
   progress.
2. Once the reset completes the NIC interface mode is set to NIC
   disabled (1).

After the reset has been issued (added in a subsequent patch) the
health poll for other functions will detect that the NIC interface
state has been set to disabled. This will cause it to enter the
existing recovery flow.  If the PCI is still working (meaning it
doesn't return 0xff on all reads) it means recovery can proceed
immediately instead of waiting 60 seconds.

The error detetion has also been refactored to avoid incorrect or
misleading log messages.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:20:42 +00:00
Hans Petter Selasky
1fb6089c3b Hide verbose proclamation of error when forced in mlx5core.
When mlx5_enter_error_state() operation is forced by shutdown, the
messages surrounding setting the error state are not informational
and confuse users.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:11:06 +00:00
Hans Petter Selasky
519774ea5a Cancel delayed recovery work when unloading the mlx5core driver.
linux commit 2a0165a034ac024b60cca49c61e46f4afa2e4d98

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:09:09 +00:00
Hans Petter Selasky
c09025693b Add support for fast unload in shutdown flow in mlx5core.
This patch accumulates the following Linux commits:

- 8812c24d28f4972c4f2b9998bf30b1f2a1b62adf
  net/mlx5: Add fast unload support in shutdown flow
- 59211bd3b6329c3e5f4a90ac3d7f87ffa7867073
  net/mlx5: Split the load/unload flow into hardware and software flows
- 4525abeaae54560254a1bb8970b3d4c225d32ef4
  net/mlx5: Expose command polling interface

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:02:20 +00:00
Hans Petter Selasky
4bb7662b09 Improve support for health recovery in mlx5core.
This patch accumulates the following Linux commits:

- 04c0c1ab38e95105d950db5b84e727637e149ce7
  net/mlx5: PCI error recovery health care simulation
- 0179720d6be2096b8d0a4d143254ff9e77747daa
  net/mlx5: Introduce trigger_health_work function
- 3fece5d676939f42f434c63dfe1bd42d7d94e6f0
  net/mlx5: Continue health polling until it is explicitly stopped

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 17:33:14 +00:00
Hans Petter Selasky
177781564f Create designated workqueue for each mlx5en(4) device instance.
The mlx5e_destroy_ifp() function may be called from the system workqueue and
in this case trying to flush all works will cause a dead lock.
Instead of using the system workqueue, create a designated workqueue
for each mlx5en(4) device instance.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 16:59:51 +00:00
Conrad Meyer
918622dbc0 mlx5(4): Remove redundant declaration of mlx5_enter_error_state
Broken in r330644.

Sponsored by:	Dell EMC Isilon
2018-03-10 00:59:48 +00:00
Konstantin Belousov
2cec152827 Make mlx5 compilable on ILP32 arches.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-03-08 22:03:43 +00:00
Hans Petter Selasky
50e3ba10e8 Set correct SL in completion for RoCE in mlx5ib(4).
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.

While that, this patch also fills the vlan_id field in the completion if
present.

linux commit 12f8fedef2ec94c783f929126b20440a01512c14

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 16:27:31 +00:00
Hans Petter Selasky
0285276c92 Add call to setup firmware data dump structure during device load in
mlx5core.

Do not consider the inability to create a firmware dump fatal, but
inform about the situation and allow the driver to attach. The device
might not implement the needed VSC, or we might not know the layout of
the registers map. In either case, only firmware dump functionality is
limited, the network operations should be fine.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 16:19:01 +00:00
Hans Petter Selasky
9cb83c4689 Avoid more LFENCE/SFENCe on x86 in mlx5en(4),
by using the FreeBSD native fences.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:58:30 +00:00
Hans Petter Selasky
7d69d339ad Fix mlx5en(4) driver to properly call m_defrag().
When the mlx5en(4) driver was converted to using BUSDMA(9) the call to
m_defrag() was moved after the part of the TX routine that strips the
header from the mbuf chain. Before it called m_defrag it first trimmed
off the now-empty mbufs from the start of the chain. This has the side
effect of also removing the head of the chain that has M_PKTHDR set.
m_defrag() will not defrag a chain that does not have M_PKTHDR set,
thus it was effectively never defragging the mbuf chains.

As it turns out, trimming the mbufs in this fashion is unnecessary since
the call to bus_dmamap_load_mbuf_sg doesn't map empty mbufs anyway, so
remove it.

Differential Revision:	https://reviews.freebsd.org/D12050
Submitted by:	mjoras@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:53:04 +00:00
Hans Petter Selasky
1eb09ad434 Use vport rather than physical-port MTU in mlx5en(4).
Set and report vport MTU rather than physical MTU,
The driver will set both vport and physical port mtu
and will rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Based on Linux upstream commit:
cd255efff9baadd654d6160e52d17ae7c568c9d3

Submitted by:	Meny Yossefi <menyy@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:47:17 +00:00
Hans Petter Selasky
7c22ae80d2 Use the device unit number for naming the ifnet interface in mlx5en(4).
Currently the ifnet interface is named mceX, where X is a monotonically
incremented value. If the device is reset due to a fatal error, then the
interface name will change.  Using the device unit number will keep the
naming consistent across the reset logic.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:43:41 +00:00
Hans Petter Selasky
1a872967ad Remove duplicate prototypes.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:37:09 +00:00
Hans Petter Selasky
e808190a59 Add kernel and userspace code to dump the firmware state of supported
ConnectX-4/5 devices in mlx5core.

The dump is obtained by reading a predefined register map from the
non-destructive crspace, accessible by the vendor-specific PCIe
capability (VSC). The dump is stored in preallocated kernel memory and
managed by the mlx5tool(8), which communicates with the driver using a
character device node.

The utility allows to store the dump in format
    <address> <value>
into a file, to reset the dump content, and to manually initiate the
dump.

A call to mlx5_fwdump() should be added at the places where a dump
must be fetched automatically. The most likely place is right before a
firmware reset request.

Submitted by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 15:21:56 +00:00
Hans Petter Selasky
4b95c6659a Add vendor specific capability interface support in mlx5core.
Add the ability to access the vendor specific space gateway in order
to support reading and writing data into the different configuration
domains.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:59:47 +00:00
Hans Petter Selasky
fba294620c Use device_printf() instead of printf() when printing warnings and errors
to dmesg(8) in mlx5core.

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:58:27 +00:00
Hans Petter Selasky
10b0804509 Add support for per priority flow control, PFC, to mlx5en(4).
Add support for PFC and implement reading the per priority statistics
using the sysctl(8) interface. PFC is used together with VLAN priority
and can be enabled and disabled on a per priority basis.

Global pause frames and PFC are incompatible features and surrounding
logic has been added to warn the user about misconfiguration.

Update relevant mlx5core APIs for PFC configuration.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:40:39 +00:00
Hans Petter Selasky
118063fb70 Add support for explicit congestion notification, ECN, to mlx5ib(4).
ECN configuration and statistics is available through a set of sysctl(8)
nodes under sys.class.infiniband.mlx5_X.cong . The ECN configuration
nodes can also be used as loader tunables.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 11:23:14 +00:00
Hans Petter Selasky
788333d9a6 Use the autogenerated interface file for all commands in mlx5core.
This patch accumulates the following Linux commits:
- 90b3e38d048f09b22fb50bcd460cea65fd00b2d7
  mlx5_core: Modify CQ moderation parameters
- 09a7d9eca1a6cf5eb4f9abfdf8914db9dbd96f08
  mlx5_core: QP/XRCD commands via mlx5 ifc
- 1a412fb1caa2c1b77719ccb5ed8b0c3c2bc65da7
  mlx5_core: Modify QP commands via mlx5 ifc
- ec22eb53106be1472ba6573dc900943f52f8fd1e
  mlx5_core: MKey/PSV commands via mlx5 ifc
- 73b626c182dff06867ceba996a819e8372c9b2ce
  mlx5_core: EQ commands via mlx5 ifc
- 20ed51c643b6296789a48adc3bc2cc875a1612cf
  mlx5_core: Access register and MAD IFC commands via mlx5 ifc
- a533ed5e179cd15512d40282617909d3482a771c
  mlx5_core: Pages management commands via mlx5 ifc
- b8a4ddb2e8f44f872fb93bbda2d541b27079fd2b
  mlx5_core: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON
- af1ba291c5e498973cc325c501dd8da80b234571
  mlx5_core: Refactor internal SRQ API
- b06e7de8a9d8d1d540ec122bbdf2face2a211634
  mlx5_core: Refactor device capability function
- c4f287c4a6ac489c18afc4acc4353141a8c53070
  mlx5_core: Unify and improve command interface

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 10:43:42 +00:00
Hans Petter Selasky
ca55159467 Fix race between PCI error handlers and health work in mlx5core.
linux commit 05ac2c0b7438ea08c5d54b48797acf9b22cb2f6f

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:58:41 +00:00
Hans Petter Selasky
7053deeb3d Avoid calling sleeping function from the health poll thread in mlx5core.
linux commit c1d4d2e92ad670168a17a57dfa182a5a5baa72d4

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:51:33 +00:00
Hans Petter Selasky
a2485fe5a6 Updates for PCI and health monitor recovery in mlx5core.
This patch accumulates the following Linux commits:

mlx5_health.c
- 78ccb25861d76a8fc5c678d762180e6918834200
  mlx5_core: Fix wrong name in struct
- 171bb2c560f45c0427ca3776a4c8f4e26e559400
  mlx5_core: Update health syndromes
- 0144a95e2ad53a40c62148f44fb0c1f9d2a0d1e9
  mlx5_core: Use accessor functions to read from device memory
- ac6ea6e81a80172612e0c9ef93720f371b198918
  mlx5_core: Use private health thread for each device
- fd76ee4da55abb21babfc69310d321b9cb9a32e0
  mlx5_core: Fix internal error detection conditions
- 2241007b3d783cbdbaa78c30bdb1994278b6f9b9
  mlx5: Clear health sick bit when starting health poll
- 712bfef60912d91033cb25739f7444d5b8d8c59f
  mlx5: Fix version printout in case of health issue
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
  mlx5_core: Add pci error handlers to mlx5_core driver

mlx5_cmd.c
- be87544de8df2b1eb34bcb5e32691287d96f9ec4
  mlx5_core: Fix async commands return code
- a31208b1e11df334d443ec8cace7636150bb8ce2
  mlx5_core: New init and exit flow for mlx5_core
- 020446e01eebc9dbe7eda038e570ab9c7ab13586
  mlx5_core: Prepare cmd interface to system errors handling
- 89d44f0a6c732db23b219be708e2fe1e03ee4842
  mlx5_core: Add pci error handlers to mlx5_core driver
- 0d834442cc247c7b3f3bd6019512ae03e96dd99a
  mlx5: Fix teardown errors that happen in pci error handler

mlx5_main.c
- 5fc7197d3a256d9c5de3134870304b24892a4908
  mlx5: Add pci shutdown callback

Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-08 09:47:09 +00:00
Hans Petter Selasky
2e9c3a4f99 Implement priority to traffic class mapping in mlx5core.
Add support for mapping priority to traffic class via sysctl

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:23:07 +00:00
Hans Petter Selasky
cfc9c386eb Implement rate limit per traffic class in mlx5core.
Add support for rate limiting traffic class via sysctl.

Submitted by:	Slava Shwartsman <slavash@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:17:36 +00:00
Hans Petter Selasky
d91421515a Implement missing query for current port rate in mlx5ib(4).
- Factor out port speed definitions into new port.h header file,
  similarly as done in Linux upstream.
- Correct two existing port speed definitions in mlx5en according to
  Linux upstream.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 15:03:11 +00:00
Hans Petter Selasky
ecb4fcc48e Add log message for unsupported QSFPs in mlx5core.
Submitted by:	Matthew Finlay <matt@mellanox.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:51:50 +00:00
Hans Petter Selasky
2289559114 Make sure default VNET is set when adding a new interface in mlx5core.
Adding an interface might be done outside the device_attach() routine
and will then cause a panic, due to the VNET not being set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:49:27 +00:00
Hans Petter Selasky
11546d068d Add timeout handle to commands with callback in mlx5core.
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get firmware response. Add delayed callback timeout work
before posting the command to firmware. In case of real firmware
command completion we will cancel the delayed work. In case of
firmware command timeout the callback timeout handler will be called
and it will simulate firmware completion with timeout error.

linux commit 65ee67084589c1783a74b4a4a5db38d7264ec8b5

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:41:29 +00:00
Hans Petter Selasky
2327a753b9 Fix potential deadlock in command mode change in mlx5core.
Call command completion handler in case of timeout when working in
interrupts mode. Avoid flushing the commands workqueue after acquiring
the semaphores to prevent a potential deadlock.

linux commit commit 9cba4ebcf374c3772f6eb61f2d065294b2451b49

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:35:28 +00:00
Hans Petter Selasky
ee1b5c5811 Use a macro in mlx5_command_str() instead of copying OP name.
linux commit 42ca502e179d0654ef441333a9d0f35c948734f3

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:29:30 +00:00
Hans Petter Selasky
2f2d3c0cf3 Disable unsupported disassociate ucontext functionality in mlx5ib(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-07 14:03:31 +00:00
Hans Petter Selasky
1456d97c01 Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-05 14:34:52 +00:00
Hans Petter Selasky
1cbc85fd04 Move the mlx5 core device pointer first in the mlx5en priv. This help simplify
checks to recognize own network devices when using mlx5ib. This patch fixes
an issues where mlx5ib fails to recognize mceX network devices for use with
RoCE.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-30 12:38:06 +00:00
Eitan Adler
02ca39cff2 sys/dev/mlx[45]: fix uses of 1 << 31
Reviewed by:		kib (D13858)
2018-01-12 06:36:44 +00:00
Konstantin Belousov
e44f4f3547 mlx5en: Avoid SFENCe on x86
The IA32 memory model guarantees that all writes are seen in the program
order.  Also, any access to the uncacheable memory flushes the store
buffers.  As the consequence, SFENCE instruction is (almost) never needed,
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.

Use atomic_thread_fence_rel() instead of wb() to only emit compiler barriers
on x86 there.  Other architectures get the right barrier instruction as
well.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-12-19 14:11:41 +00:00
Konstantin Belousov
ef23f141bc Implement hardware mlx5(4) rx timestamps.
Driver support is only provided for ConnectX4/5.

System-time timestamp is calculated based on the free-running counter
timestamp provided by hardware.  Driver periodically samples the
counter to calibrate it against the system clock and uses linear
interpolation to convert.  Stability of the crystal which drives the
clock is +-50 ppm at the operational temperature, which makes the
algorithm good enough.

The calculation is somewhat delicate because all values are 64bit and
overflow the naive formula for linear interpolation.  The calculation
drops the least significant bits in advance, see the PREC shift in
mlx5_mbuf_tstmp().

Hardware stamps can be turned off by 'ifconfig mceN -hwrxtsmp'.  Buggy
firmware might result in small but visible errors in the reported
timestamps, detectable e.g. by nonsensical (negative) RTT values for
LAN pings.

Reviewed by:	gallatin, hselasky
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D12638
2017-11-29 10:04:11 +00:00
Hans Petter Selasky
c3125bc5bf Compile fixes for 32-bit architectures.
Sponsored by:	Mellanox Technologies
2017-11-24 12:08:50 +00:00
Hans Petter Selasky
937d37fc6c Merge ^/head r325842 through r325998. 2017-11-19 12:36:03 +00:00
Hans Petter Selasky
b108f35740 Remove duplicate static function prototype to fix compilation of
mlx5_fs_tree.c after r325638 when using GCC.

Found by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-11-18 20:32:09 +00:00
Hans Petter Selasky
55b1c6e7e4 Merge ^/head r325663 through r325841. 2017-11-15 11:28:11 +00:00
Hans Petter Selasky
dd00abf2d7 Make sure the ib_wr_opcode enum is signed by adding a negative dummy element.
Different compilers may optimise the enum type in different ways. This ensures
coherency when range checking the value of enums in ibcore.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-11-14 14:51:37 +00:00