Commit Graph

419 Commits

Author SHA1 Message Date
Hans Petter Selasky
caf4397197 Remove erradic assert after SVN r367149 in mlx5en(4).
The ratelimit tags may be shared, especially for unlimited TLS
traffic, and then the refcount is allowed to be greater than one
when freeing the send tag.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-24 13:07:59 +00:00
Hans Petter Selasky
7eefcb5eea Make mlx5_cmd_exec_cb() a safe API in mlx5core.
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.

The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.

Linux commit:
e355477ed9e4f401e3931043df97325d38552d54

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:15:03 +00:00
Hans Petter Selasky
f34f0a65b2 Report EQE data upon CQ completion in mlx5core.
Report EQE data upon CQ completion to let upper layers use this data.

Linux commit:
4e0e2ea1886afe8c001971ff767f6670312a9b04

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:10:53 +00:00
Hans Petter Selasky
ffdb195f31 Enhance the mlx5_core_create_cq() function in mlx5core.
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.

Linux commit:
38164b771947be9baf06e78ffdfb650f8f3e908e

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:06:10 +00:00
Hans Petter Selasky
4a64b690f1 Use mlx5core to create/destroy all Dynamically Connected Targets, DCTs.
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.

Linux commit:
c5ae1954c47d3fd8815bd5a592aba18702c93f33

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:03:18 +00:00
Hans Petter Selasky
8114aeea44 Fix error handling order in create_kernel_qp in mlx5ib.
Make sure order of cleanup is exactly the opposite of initialization.

Linux commit:
f4044dac63e952ac1137b6df02b233d37696e2f5

MFC after:	1 week
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-11-16 10:00:21 +00:00
Konstantin Belousov
0b8e170d95 mlx5en: Set ifmr_current same as ifmr_active.
This both:
- makes ifconfig media line similar to that of other drivers.
- fixes ENXIO in case when paradoxical current media word is not registered.

Now e.g.
      ifconfig mce0 -mediaopt txpause,rxpause
works by disabling pauses if enabled.

Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:25:10 +00:00
Konstantin Belousov
bab0c4b1a0 mlx5en: stop ignoring pauses and flow in the media reqs.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:23:27 +00:00
Konstantin Belousov
559dbeac47 mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types.
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:22:16 +00:00
Konstantin Belousov
4ead80241a mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add().
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2020-11-12 02:21:14 +00:00
John Baldwin
b7d92a6683 Remove IF_SND_TAG_TYPE_TLS_RATE_LIMIT conditionals.
Support for TLS rate limit tags is now in the tree, so this macro is
always defined.

Reviewed by:	hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27020
2020-10-30 21:05:50 +00:00
John Baldwin
418b5444f8 Fix a couple of silly bugs in r367149.
- Assign the TLS rate limit value to the correct member of the
  rl_params for the nested rate limit tag.

- Remove a dead condition.

Pointy hat to:	jhb
2020-10-30 00:06:36 +00:00
John Baldwin
36e0a362ac Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc().
This gives a more uniform API for send tag life cycle management.

Reviewed by:	gallatin, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27000
2020-10-29 23:28:39 +00:00
John Baldwin
638000c0b6 Use public interfaces to manage the nested rate limit send tag.
Each TLS send tag in mlx5 contains a nested rate limit send tag.
Previously, the driver was calling internal functions to manage the
nested tag.  Calling free methods directly instead of m_snd_tag_rele()
leaked send tag references and references on the ifp.  Changes to use
the ifp methods for the nested tag for other methods are more cosmetic
but do simplify the code.

Reviewed by:	gallatin, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26996
2020-10-29 22:22:27 +00:00
John Baldwin
521eac97f3 Support hardware rate limiting (pacing) with TLS offload.
- Add a new send tag type for a send tag that supports both rate
  limiting (packet pacing) and TLS offload (mostly similar to D22669
  but adds a separate structure when allocating the new tag type).

- When allocating a send tag for TLS offload, check to see if the
  connection already has a pacing rate.  If so, allocate a tag that
  supports both rate limiting and TLS offload rather than a plain TLS
  offload tag.

- When setting an initial rate on an existing ifnet KTLS connection,
  set the rate in the TCP control block inp and then reset the TLS
  send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit
  send tag.  This allocates the TLS send tag asynchronously from a
  task queue, so the TLS rate limit tag alloc is always sleepable.

- When modifying a rate on a connection using KTLS, look for a TLS
  send tag.  If the send tag is only a plain TLS send tag, assume we
  failed to allocate a TLS ratelimit tag (either during the
  TCP_TXTLS_ENABLE socket option, or during the send tag reset
  triggered by ktls_output_eagain) and ignore the new rate.  If the
  send tag is a ratelimit TLS send tag, change the rate on the TLS tag
  and leave the inp tag alone.

- Lock the inp lock when setting sb_tls_info for a socket send buffer
  so that the routines in tcp_ratelimit can safely dereference the
  pointer without needing to grab the socket buffer lock.

- Add an IFCAP_TXTLS_RTLMT capability flag and associated
  administrative controls in ifconfig(8).  TLS rate limit tags are
  only allocated if this capability is enabled.  Note that TLS offload
  (whether unlimited or rate limited) always requires IFCAP_TXTLS[46].

Reviewed by:	gallatin, hselasky
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26691
2020-10-29 00:23:16 +00:00
Hans Petter Selasky
194ddc011a Properly cleanup driver during remove_one() in mlx5core.
Cleanup all host resources, SYSCTLs, MSIX vectors and memory used
by the host and only leave the device allocated memory behind, if any,
because it may still be in use, when the PCI remove function is called.
Else future probe calls may fail due to SYSCTLs already existing.

MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-07 17:46:49 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Hans Petter Selasky
39b0f9c389 Poll statistics more frequently in mlx5en(4).
This makes traffic steering algorithms more accurate.

MFC after:	1 week
Submitted by:	gallatin @
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-09-14 14:24:54 +00:00
Hans Petter Selasky
78ae1e6e15 Make hardware TLS send tag allocation synchronous in mlx5en(4).
Previously the send tag was setup in the background, and all packets for
the given send tag were dropped until ready. Change this to be blocking
behaviour so that once the setsocketopt() for enabling TLS completes,
the socket is ready to send packets. Do this by simply flushing the
work request which does the needed firmware programming during send
tag allocation.

MFC after:	1 week
Sponsored by:	Mellanox Technologies // Nvidia
2020-09-01 12:21:17 +00:00
Konstantin Belousov
596b98ba16 mlx5 sriov: Add controls for VFs to set port/node GUIDs.
Setting GUIDs make RoCE offloads functional on VFs.

Reported and tested by:	chuck
Sponsored by:	Mellanox Technologies - Nvidia
MFC after:	1 week
2020-08-31 16:32:17 +00:00
Konstantin Belousov
cca1f7a12f mlx5 sriov: add error message for failed MAC programming on VF.
Sponsored by:	Mellanox Technologies - Nvidia
MFC after:	1 week
2020-08-31 16:30:52 +00:00
Konstantin Belousov
2ea114b34e mlx5en: Implement SIOCGIFDOWNREASON.
Sponsored by:	Mellanox Technologies - Nvidia
MFC after:	1 week
2020-08-31 16:27:03 +00:00
Konstantin Belousov
62daa4b6e8 mlx5_core: add mlx5_query_pddr().
And use it in mlx5_query_pddr_range_info() instead of direct register
access.

Sponsored by:	Mellanox Technologies - Nvidia
MFC after:	1 week
2020-08-31 16:25:55 +00:00
Konstantin Belousov
e088db5eae mlx5_core: Import PDDR register definitions
PDDR (Port Diagnostics Database Register) is used to read the physical
layer debug database, which contains helpful troubleshooting information
regarding the state of the link.

PDDR register can only be queried when PCAM register reports it as
supported in its register mask. A new helper macro was added to
the MLX5_CAP_* infrastructure in order to access this mask.

Sponsored by:	Mellanox Technologies - Nvidia
MFC after:	1 week
2020-08-31 16:23:51 +00:00
Hans Petter Selasky
1866c98e64 Infiniband clients must be attached and detached in a specific order in ibcore.
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.

Fix this by using module_xxx_order() instead of module_xxx().

Differential Revision:	https://reviews.freebsd.org/D23973
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-06 08:50:11 +00:00
Konstantin Belousov
92d8df2f37 mlx5_core: remove unneccessary LFENCE instruction.
Use fence instead of barrier, which is optimized to take advantage of
the x86 TSO memory model.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2020-07-02 10:44:45 +00:00
Hans Petter Selasky
11304ef50e Fix HW TLS offload regression issue after r359919, in mlx5en(4).
Changes in the mbuf layout regarding HW TLS, resulted in wrong detection
of starting mbuf. Use a boolean variable to handle this and pass m_adj()
the top mbuf, so that the packet header is adjusted correctly.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-17 11:14:54 +00:00
Ryan Moeller
cbb9ccf735 Avoid trying to toggle TSO twice
Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in
various NIC drivers.

Reviewed by:	hselasky, np, gallatin, jpaetzel
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25120
2020-06-15 16:35:27 +00:00
Hans Petter Selasky
6fe9e470bb Make sure packets generated by raw IP code is let through by mlx5en(4).
Allow the TCP header to reside in the mbuf following the IP header.
Else such packets will get dropped.

Backtrace:
mlx5e_sq_xmit()
mlx5e_xmit()
ether_output_frame()
ether_output()
ip_output_send()
ip_output()
rip_output()
sosend_generic()
sosend()
kern_sendit()
sendit()
sys_sendto()
amd64_syscall()
fast_syscall_common()

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-11 09:41:54 +00:00
Hans Petter Selasky
b63b61cc75 Extend use of unlikely() in the fast path, in mlx5en(4).
Typically the TCP/IP headers fit within the first mbuf and should not
trigger any of the error cases. Use unlikely() for these cases.

No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-11 09:38:51 +00:00
Hans Petter Selasky
9eb1e4aa21 Use const keyword when parsing the TCP/IP header in the fast path in mlx5en(4).
When parsing the TCP/IP header in the fast path, make it clear by using
the const keyword, no fields are to be modified inside the transmitted
packet.

No functional change.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-11 09:36:37 +00:00
Hans Petter Selasky
bf43f9812c Sync with Linux packet pacing enhancements in mlx5en(4).
Linux commit:
05d3ac978ed25b753bfe34fe76c50c31ee506a82

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-05-26 07:41:46 +00:00
Hans Petter Selasky
ce69b84204 Improve set progress parameters, SET PSV for HW TLS in mlx5en(4).
There is no need for a fence and there is no need to provide
the TCP sequence number.

Sponsored by:	Mellanox Technologies
2020-05-25 12:37:45 +00:00
Hans Petter Selasky
233a6665b6 Correctly set the initial vector for TLS v1.3 for mlx5en(4).
For TLS v1.3 the 12 bytes of the initial vector, IV, should just be copied
as-is from the kernel to the gcm_iv field, which hold the first 4 bytes,
and the remaining 8 bytes go to the subsequent implicit_iv field.
There is no need to consider the byte order on the 12 bytes of IV like
initially done.

Sponsored by:	Mellanox Technologies
2020-05-25 12:34:15 +00:00
Hans Petter Selasky
9550e3403e Update the TLS capability bit after recent PRM changes in mlx5en(4).
A CX6-DX firmware version equal to or newer than 12.27.0372 is
now required.

Sponsored by:	Mellanox Technologies
2020-05-25 12:31:48 +00:00
Konstantin Belousov
d0a4068359 mlx5_core: add more port module event types to decode.
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	3 days
2020-05-20 11:20:45 +00:00
Konstantin Belousov
6418350cf4 mlx5_core: add "PMD type not enabled" port module event type.
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	3 days
2020-05-20 11:10:10 +00:00
Gleb Smirnoff
6edfd179c8 Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:21:11 +00:00
Gleb Smirnoff
7b6c99d08d Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
Gleb Smirnoff
6fbcdeb6f1 Step 2.4: Stop using 'struct mbuf_ext_pgs' in drivers.
Reviewed by:	gallatin, hselasky
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:58:20 +00:00
Hans Petter Selasky
decb087cc2 Add support for reading temperature in mlx5en(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-27 14:35:39 +00:00
Andrew Gallatin
23feb56348 KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
Hans Petter Selasky
bd88e5f28f Account out of buffer as dropped packets in mlx5en(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 08:56:27 +00:00
Hans Petter Selasky
d182de8661 Remove obsolete bufring stats in mlx5en(4).
Leftover from when DRBR was removed.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-08 08:53:31 +00:00
Hans Petter Selasky
cd1442c0ff Don't drop packets having too many TCP option headers in mlx5en(4).
When using SACK it can happen there are multiple option headers.
Don't drop these packets, but instead limit the amount of inlining
to the maximum supported.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-06 09:50:20 +00:00
Hans Petter Selasky
9c9b73403c Ensure a minimum inline size of 16 bytes in mlx5en(4).
This includes 14 bytes of ethernet header and 2 bytes of VLAN header.

This allows for making assumptions about the inline size limit
in the fast transmit path later on.

Use a signed integer variable to catch underflow.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-06 09:45:49 +00:00
Hans Petter Selasky
f504949065 Count number of times transmit ring is out of buffers in mlx5en(4).
Differential Revision:	https://reviews.freebsd.org/D24273
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-04-06 09:41:22 +00:00
Konstantin Belousov
4ad58ea85b mlx5_core: lower the severity of message noting that no SR-IOV cap is present.
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:47:14 +00:00
Konstantin Belousov
bbcb656af2 mlx5: Route NIC_VPORT_CHANGE events to eswitch code.
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:44:48 +00:00
Konstantin Belousov
90959e7e37 mlx5: Read number of VF ports from the SR-IOV cap.
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2020-03-18 22:43:39 +00:00