freebsd-dev

Author	SHA1	Message	Date
Hans Petter Selasky	431980466f	Don't offset the UAR map twice in mlx5en(4). The new UAR API already offsets the UAR map pointer the mlx5en(4) is using. While at it remove some no longer needed variables for keeping track of the current BF offset. This fixes a regression issue after the new UAR allocation APIs were introduced. MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 18:35:49 +01:00
Hans Petter Selasky	f8f5b459d2	Update user access region, UAR, APIs in the core in mlx5core. This change includes several changes, listed below, all related to UAR. UAR is a special PCI memory area where the so-called doorbell register and blue flame register live. Blue flame is a feature for sending small packets more efficiently via a PCI memory page, instead of using PCI DMA. - All structures and functions named xxx_uuars were renamed into xxx_bfreg. - Remove partially implemented Blueflame support from mlx5en(4) and mlx5ib. - Implement blue flame register allocator. - Use blue flame register allocator in mlx5ib. - A common UAR page is now allocated by the core to support doorbell register writes for all of mlx5en and mlx5ib, instead of allocating one UAR per sendqueue. - Add support for DEVX query UAR. - Add support for 4K UAR for libmlx5. Linux commits: 7c043e908a74ae0a935037cdd984d0cb89b2b970 2f5ff26478adaff5ed9b7ad4079d6a710b5f27e7 0b80c14f009758cefeed0edff4f9141957964211 30aa60b3bd12bd79b5324b7b595bd3446ab24b52 5fe9dec0d045437e48f112b8fa705197bd7bc3c0 0118717583cda6f4f36092853ad0345e8150b286 a6d51b68611e98f05042ada662aed5dbe3279c1e MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 13:33:46 +01:00
Hans Petter Selasky	3764792007	Fix whitespace in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 13:33:46 +01:00
Hans Petter Selasky	9a47ae044b	Bump driver versions for mlx5en(4) and mlx4en(4). MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 12:35:55 +01:00
Hans Petter Selasky	89c0b4fa11	Bump some copyrights in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 12:35:55 +01:00
Hans Petter Selasky	a00718e1df	Implement SIOCGIFRSSKEY and SIOCGIFRSSHASH in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2021-01-08 12:35:55 +01:00
Hans Petter Selasky	9e7fa1e66c	Collect statistics from all rate-limit queues in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-12-28 14:39:51 +01:00
Hans Petter Selasky	caf4397197	Remove erratic assert after SVN r367149 in mlx5en(4). The ratelimit tags may be shared, especially for unlimited TLS traffic, and then the refcount is allowed to be greater than one when freeing the send tag. MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-24 13:07:59 +00:00
Hans Petter Selasky	f34f0a65b2	Report EQE data upon CQ completion in mlx5core. Report EQE data upon CQ completion to let upper layers use this data. Linux commit: 4e0e2ea1886afe8c001971ff767f6670312a9b04 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-16 10:10:53 +00:00
Hans Petter Selasky	ffdb195f31	Enhance the mlx5_core_create_cq() function in mlx5core. Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Linux commit: 38164b771947be9baf06e78ffdfb650f8f3e908e MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-16 10:06:10 +00:00
Konstantin Belousov	0b8e170d95	mlx5en: Set ifmr_current same as ifmr_active. This both: - makes ifconfig media line similar to that of other drivers. - fixes ENXIO in case when paradoxical current media word is not registered. Now e.g. ifconfig mce0 -mediaopt txpause,rxpause works by disabling pauses if enabled. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:25:10 +00:00
Konstantin Belousov	bab0c4b1a0	mlx5en: stop ignoring pauses and flow in the media reqs. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:23:27 +00:00
Konstantin Belousov	559dbeac47	mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:22:16 +00:00
Konstantin Belousov	4ead80241a	mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add(). Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:21:14 +00:00
John Baldwin	b7d92a6683	Remove IF_SND_TAG_TYPE_TLS_RATE_LIMIT conditionals. Support for TLS rate limit tags is now in the tree, so this macro is always defined. Reviewed by: hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27020	2020-10-30 21:05:50 +00:00
John Baldwin	418b5444f8	Fix a couple of silly bugs in r367149. - Assign the TLS rate limit value to the correct member of the rl_params for the nested rate limit tag. - Remove a dead condition. Pointy hat to: jhb	2020-10-30 00:06:36 +00:00
John Baldwin	36e0a362ac	Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc(). This gives a more uniform API for send tag life cycle management. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000	2020-10-29 23:28:39 +00:00
John Baldwin	638000c0b6	Use public interfaces to manage the nested rate limit send tag. Each TLS send tag in mlx5 contains a nested rate limit send tag. Previously, the driver was calling internal functions to manage the nested tag. Calling free methods directly instead of m_snd_tag_rele() leaked send tag references and references on the ifp. Changes to use the ifp methods for the nested tag for other methods are more cosmetic but do simplify the code. Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26996	2020-10-29 22:22:27 +00:00
John Baldwin	521eac97f3	Support hardware rate limiting (pacing) with TLS offload. - Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type). - When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag. - When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable. - When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone. - Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock. - Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46]. Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691	2020-10-29 00:23:16 +00:00
John Baldwin	56fb710f1b	Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689	2020-10-06 17:58:56 +00:00
Hans Petter Selasky	39b0f9c389	Poll statistics more frequently in mlx5en(4). This makes traffic steering algorithms more accurate. MFC after: 1 week Submitted by: gallatin @ Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-09-14 14:24:54 +00:00
Hans Petter Selasky	78ae1e6e15	Make hardware TLS send tag allocation synchronous in mlx5en(4). Previously the send tag was set up in the background, and all packets for the given send tag were dropped until ready. Change this to be blocking behaviour so that once the setsockopt() for enabling TLS completes, the socket is ready to send packets. Do this by simply flushing the work request which does the needed firmware programming during send tag allocation. MFC after: 1 week Sponsored by: Mellanox Technologies // Nvidia	2020-09-01 12:21:17 +00:00
Konstantin Belousov	2ea114b34e	mlx5en: Implement SIOCGIFDOWNREASON. Sponsored by: Mellanox Technologies - Nvidia MFC after: 1 week	2020-08-31 16:27:03 +00:00
Hans Petter Selasky	1866c98e64	Infiniband clients must be attached and detached in a specific order in ibcore. Currently the linking order of the infiniband, IB, modules decides in which order the clients are attached and detached. For example one IB client may use resources from another IB client. This can lead to a potential deadlock at shutdown. For example if the ipoib is unregistered after the ib_multicast client is detached, then if ipoib is using multicast addresses a deadlock may happen, because ib_multicast will wait for all its resources to be freed before returning from the remove method. Fix this by using module_xxx_order() instead of module_xxx(). Differential Revision: https://reviews.freebsd.org/D23973 MFC after: 1 week Sponsored by: Mellanox Technologies	2020-07-06 08:50:11 +00:00
Hans Petter Selasky	11304ef50e	Fix HW TLS offload regression issue after r359919, in mlx5en(4). Changes in the mbuf layout regarding HW TLS, resulted in wrong detection of starting mbuf. Use a boolean variable to handle this and pass m_adj() the top mbuf, so that the packet header is adjusted correctly. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-06-17 11:14:54 +00:00
Ryan Moeller	cbb9ccf735	Avoid trying to toggle TSO twice. Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in various NIC drivers. Reviewed by: hselasky, np, gallatin, jpaetzel Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25120	2020-06-15 16:35:27 +00:00
Hans Petter Selasky	6fe9e470bb	Make sure packets generated by raw IP code are let through by mlx5en(4). Allow the TCP header to reside in the mbuf following the IP header. Else such packets will get dropped. Backtrace: mlx5e_sq_xmit() mlx5e_xmit() ether_output_frame() ether_output() ip_output_send() ip_output() rip_output() sosend_generic() sosend() kern_sendit() sendit() sys_sendto() amd64_syscall() fast_syscall_common() MFC after: 1 week Sponsored by: Mellanox Technologies	2020-06-11 09:41:54 +00:00
Hans Petter Selasky	b63b61cc75	Extend use of unlikely() in the fast path, in mlx5en(4). Typically the TCP/IP headers fit within the first mbuf and should not trigger any of the error cases. Use unlikely() for these cases. No functional change. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-06-11 09:38:51 +00:00
Hans Petter Selasky	9eb1e4aa21	Use const keyword when parsing the TCP/IP header in the fast path in mlx5en(4). When parsing the TCP/IP header in the fast path, make it clear, by using the const keyword, that no fields are to be modified inside the transmitted packet. No functional change. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-06-11 09:36:37 +00:00
Hans Petter Selasky	ce69b84204	Improve set progress parameters, SET PSV for HW TLS in mlx5en(4). There is no need for a fence and there is no need to provide the TCP sequence number. Sponsored by: Mellanox Technologies	2020-05-25 12:37:45 +00:00
Hans Petter Selasky	233a6665b6	Correctly set the initial vector for TLS v1.3 for mlx5en(4). For TLS v1.3 the 12 bytes of the initial vector, IV, should just be copied as-is from the kernel to the gcm_iv field, which holds the first 4 bytes, and the remaining 8 bytes go to the subsequent implicit_iv field. There is no need to consider the byte order on the 12 bytes of IV like initially done. Sponsored by: Mellanox Technologies	2020-05-25 12:34:15 +00:00
Hans Petter Selasky	9550e3403e	Update the TLS capability bit after recent PRM changes in mlx5en(4). A CX6-DX firmware version equal to or newer than 12.27.0372 is now required. Sponsored by: Mellanox Technologies	2020-05-25 12:31:48 +00:00
Gleb Smirnoff	6edfd179c8	Step 4.1: mechanically rename M_NOMAP to M_EXTPG. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:21:11 +00:00
Gleb Smirnoff	7b6c99d08d	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:12:56 +00:00
Gleb Smirnoff	6fbcdeb6f1	Step 2.4: Stop using 'struct mbuf_ext_pgs' in drivers. Reviewed by: gallatin, hselasky Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:58:20 +00:00
Hans Petter Selasky	decb087cc2	Add support for reading temperature in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-27 14:35:39 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone. This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid accessing ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Hans Petter Selasky	bd88e5f28f	Account out of buffer as dropped packets in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-08 08:56:27 +00:00
Hans Petter Selasky	d182de8661	Remove obsolete bufring stats in mlx5en(4). Leftover from when DRBR was removed. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-08 08:53:31 +00:00
Hans Petter Selasky	cd1442c0ff	Don't drop packets having too many TCP option headers in mlx5en(4). When using SACK it can happen there are multiple option headers. Don't drop these packets, but instead limit the amount of inlining to the maximum supported. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-06 09:50:20 +00:00
Hans Petter Selasky	9c9b73403c	Ensure a minimum inline size of 16 bytes in mlx5en(4). This includes 14 bytes of ethernet header and 2 bytes of VLAN header. This allows for making assumptions about the inline size limit in the fast transmit path later on. Use a signed integer variable to catch underflow. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-06 09:45:49 +00:00
Hans Petter Selasky	f504949065	Count number of times transmit ring is out of buffers in mlx5en(4). Differential Revision: https://reviews.freebsd.org/D24273 MFC after: 1 week Sponsored by: Mellanox Technologies	2020-04-06 09:41:22 +00:00
Konstantin Belousov	f6ca0b216a	mlx5: Integrate eswitch and mpfs management code. Reviewed by: hselasky Sponsored by: Mellanox Technologies MFC after: 2 weeks	2020-03-18 22:33:39 +00:00
Konstantin Belousov	96dad2b720	mlx5en: Support 50GBase-KR4 media type in mlx5en driver. Submitted by: Adam Peace <adam.e.peace@gmail.com> Reviewed by: hselasky Sponsored by: Mellanox Technologies MFC after: 1 week	2020-03-04 17:13:35 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many). r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Gleb Smirnoff	e87c494015	Although most of the NIC drivers are epoch ready, due to peer pressure switch over to opt-in instead of opt-out for epoch. Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch when processing its packets. Now this will create recursive entrance in epoch in >90% network drivers, but will guarantee safeness of the transition. Mark several tested drivers as IFF_KNOWSEPOCH. Reviewed by: hselasky, jeff, bz, gallatin Differential Revision: https://reviews.freebsd.org/D23674	2020-02-24 21:07:30 +00:00
John Baldwin	ff8c6681c8	Don't check the auth algorithm for GCM. The upstream OpenSSL changes only set the cipher for GCM since the authentication is redundant, and changes to OCF will soon remove the GCM authentication algorithm constants entirely for the same reason. In addition, ktls_create_session() already validates these fields and wouldn't pass down an invalid auth_algorithm value to any drivers or ktls backends. Reviewed by: hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23671	2020-02-13 23:04:11 +00:00
Hans Petter Selasky	01651e9615	Add support for debugnet in mlx5en(4). MFC after: 1 week Sponsored by: Mellanox Technologies	2020-02-12 10:03:25 +00:00
Hans Petter Selasky	e48813009c	Widen EPOCH(9) usage in mlx5en(4). Make completion event path mostly lockless using EPOCH(9). Implement a mechanism using EPOCH(9) which allows us to make the callback path for completion events mostly lockless. Simplify draining callback events using epoch_wait(). While at it make sure all receive completion callbacks are covered by the network EPOCH(9), because this is required when calling if_input() and ether_input() after r357012. Sponsored by: Mellanox Technologies	2020-01-30 12:35:13 +00:00
Hans Petter Selasky	7272f9cd77	Implement hardware TLS via send tags for mlx5en(4), which is supported by ConnectX-6 DX. Currently TLS v1.2 and v1.3 with AES 128/256 crypto over TCP/IP (v4 and v6) are supported. A per PCI device UMA zone is used to manage the memory of the send tags. To optimize performance some crypto contexts may be cached by the UMA zone, until the UMA zone finishes the memory of the given send tag. An asynchronous task is used to manage setup of the send tags towards the firmware, most importantly setting the AES 128/256 bit pre-shared keys for the crypto context. Updating the state of the AES crypto engine and encrypting data is all done in the fast path. Each send tag tracks the TCP sequence number in order to detect non-contiguous blocks of data, which may require a dump of prior unencrypted data, to restore the crypto state prior to wire transmission. Statistics counters have been added to count the amount of TLS data transmitted in total, and the amount of TLS data which has been dumped prior to transmission. When non-contiguous TCP sequence numbers are detected, the software needs to dump the beginning of the current TLS record up until the point of retransmission. All TLS counters utilize the counter(9) API. In order to enable hardware TLS offload the following sysctls must be set: kern.ipc.mb_use_ext_pgs=1 kern.ipc.tls.ifnet.permitted=1 kern.ipc.tls.enable=1 Sponsored by: Mellanox Technologies	2019-12-06 15:36:32 +00:00
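
The three sysctls named in commit 7272f9cd77 could be kept across reboots with a configuration fragment along these lines. This is a minimal sketch: the knob names and values are copied from that commit message, while the use of /etc/sysctl.conf, and the assumption that all three knobs are still present and writable at runtime on the release in use, are not stated anywhere in this log.

```
# /etc/sysctl.conf fragment (file location is an assumption, not from the log)
# Knob names and values are taken verbatim from commit 7272f9cd77.
kern.ipc.mb_use_ext_pgs=1
kern.ipc.tls.ifnet.permitted=1
kern.ipc.tls.enable=1
```

Later entries in this log add further requirements for hardware TLS offload, such as the minimum ConnectX-6 DX firmware version noted in commit 9550e3403e, so the fragment alone may not be sufficient.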

221 Commits