freebsd-nq

Author	SHA1	Message	Date
Navdeep Parhar	dbc5c85c66	cxgbe(4): two new debug sysctls. dev.<nexus>.<instance>.misc.tid_stats dev.<nexus>.<instance>.misc.tnl_stats MFC after: 3 days Sponsored by: Chelsio Communications	2020-12-03 22:00:41 +00:00
John Baldwin	a42f096821	Clear TLS offload mode for unsupported cipher suites and versions. If TOE TLS is requested for an unsupported cipher suite or TLS version, disable TLS processing and fall back to plain TOE. In addition, if an error occurs when saving the decryption keys in the card's memory, disable TLS processing and fall back to plain TOE. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27468	2020-12-03 21:59:47 +00:00
John Baldwin	05d5675520	Fix downgrading of TOE TLS sockets to plain TOE. If a TOE TLS socket ends up using an unsupported TLS version or ciphersuite, it must be downgraded to a "plain" TOE socket with TLS encryption/decryption performed on the host. The previous implementation of this fallback was incomplete and resulted in hung connections. Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27467	2020-12-03 21:49:20 +00:00
Navdeep Parhar	8eba75ed68	cxgbe(4): Stop but don't free netmap queues when netmap is switched off. It is common for freelists to be starving when a netmap application stops. Mailbox commands to free queues can hang in such a situation. Avoid that by not freeing the queues when netmap is switched off. Instead, use an alternate method to stop the queues without releasing the context ids. If netmap is enabled again later then the same queue is reinitialized for use. Move alloc_nm_rxq and txq to t4_netmap.c while here. MFC after: 1 week Sponsored by: Chelsio Communications	2020-12-03 08:30:29 +00:00
Navdeep Parhar	f42f3b2955	cxgbe(4): Revert r367917. r367917 fixed the backpressure on the netmap rxq being stopped but that doesn't help if some other netmap rxq is starved (because it is stopping too although the driver doesn't know this yet) and blocks the pipeline. An alternate fix that works in all cases will be checked in instead. Sponsored by: Chelsio Communications	2020-12-02 20:54:03 +00:00
Navdeep Parhar	b3718e2d7e	cxgbe(4): Catch up with in-flight netmap rx before destroying queues. The netmap application using the driver is responsible for replenishing the receive freelists and they may be totally depleted when the application exits. Packets in flight, if any, might block the pipeline in case there aren't enough buffers left in the freelist. Avoid this by filling up the freelists with a driver allocated buffer. MFC after: 1 week Sponsored by: Chelsio Communications	2020-11-21 03:27:32 +00:00
Navdeep Parhar	bdabd00d65	cxgbe/t4_tom: Handle VXLAN-encapsulated SYNs correctly. TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE rx parsing is enabled in the chip. t4_tom should pass on these SYNs to the kernel and let it deal with them as if they arrived on the non-TOE path. Reported by: Sony at Chelsio MFC after: 1 week Sponsored by: Chelsio Communications	2020-11-12 20:02:48 +00:00
Navdeep Parhar	f14d7c9516	cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs. This should have been part of r366929. MFC after: 3 days Sponsored by: Chelsio Communications	2020-11-12 01:18:05 +00:00
John Baldwin	b3ceca0c80	Clear tp->tod in t4_pcb_detach(). Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear. In particular, if a newly accepted socket falls back to non-TOE due to an active open failure, the non-TOE socket will still have tp->tod set even though TF_TOE is clear. Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D27028	2020-11-10 19:54:39 +00:00
Navdeep Parhar	de0a3472d8	cxgbe(4): Allow the PF driver to set a VF's MAC address. The MAC address can be set with the optional mac-addr property in the VF section of the iovctl.conf(5) used to instantiate the VFs. MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-11-09 00:08:35 +00:00
Navdeep Parhar	dc0800a9ad	cxgbev(4): Use the MAC address set by the PF if there is one. Query the firmware for the MAC address set by the PF for the VF and use it instead of the firmware generated MAC if it's available. MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-11-09 00:01:13 +00:00
Navdeep Parhar	76b976ad98	cxgbe(4): Add the firmware binaries missing in r367428. Obtained from: Chelsio Communications MFC after: 5 days Sponsored by: Chelsio Communications	2020-11-08 22:30:13 +00:00
Navdeep Parhar	890efa1ab9	cxgbe(4): Update firmwares to 1.25.0.40. This fixes a potential crash in firmware 1.25.0.0 on the passive open side during TOE operation. Obtained from: Chelsio Communications MFC after: 1 week Sponsored by: Chelsio Communications	2020-11-06 19:04:20 +00:00
Mark Johnston	f7db0c9532	vmspace: Convert to refcount(9) This is mostly mechanical except for vmspace_exit(). There, use the new refcount_release_if_last() to avoid switching to vmspace0 unless other processes are sharing the vmspace. In that case, upon switching to vmspace0 we can unconditionally release the reference. Remove the volatile qualifier from vm_refcnt now that accesses are protected using refcount(9) KPIs. Reviewed by: alc, kib, mmel MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27057	2020-11-04 16:30:56 +00:00
Navdeep Parhar	e2e43aafd7	cxgbe(4): Fix min/max typo in r366958.	2020-10-23 02:24:43 +00:00
Navdeep Parhar	b8b01d9be8	cxgbe(4): refine the values reported in if_ratelimit_query. - Get the number of classes from chip_params. - Get the number of ethofld tids from the firmware. - Do not let tcp_ratelimit allocate all traffic classes. Sponsored by: Chelsio Communications	2020-10-23 01:36:54 +00:00
John Baldwin	8a82be5044	Handle CPL_RX_DATA on active TLS sockets. In certain edge cases, the NIC might have only received a partial TLS record which it needs to return to the driver. For example, if the local socket was closed while data was still in flight, a partial TLS record might be pending when the connection is closed. Receiving a RST in the middle of a TLS record is another example. When this happens, the firmware returns the partial TLS record as plain TCP data via CPL_RX_DATA. Handle these requests by returning an error to OpenSSL (via so_error for KTLS or via an error TLS record header for the older Chelsio OpenSSL interface). Reported by: Sony Arpita Das @ Chelsio Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D26800	2020-10-23 00:23:54 +00:00
Navdeep Parhar	b20b25e744	cxgbe(4): fix the size of the iq/eq maps. The firmware can allocate ingress and egress context ids anywhere from its configured range. Size the iq/eq maps to match the entire range instead of assuming that the firmware always allocates the first available context id. Reported by: Baptiste Wicht @ Verisign MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-22 08:40:25 +00:00
Navdeep Parhar	37d411338e	cxgbe(4): display correct tid range for T6 based -SO cards. Reported by: Chelsio QA MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-21 20:42:29 +00:00
Navdeep Parhar	ae5da4e14d	cxgbe(4): Updates to the drop features from r366532. MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-19 21:11:49 +00:00
John Baldwin	6b7ecdcd9d	Re-enable receive flow control for TOE TLS sockets. Flow control was disabled during initial TOE TLS development to work around a hang (and to match the Linux TOE TLS support for T6). The rest of the TOE TLS code maintained credits as if flow control was enabled, which was inherited from before the workaround was added, with the exception that the receive window was allowed to go negative. This negative receive window handling (rcv_over) was because I hadn't realized the full implications of disabling flow control. To clean this up, re-enable flow control on TOE TLS sockets. The existing TPF_FORCE_CREDITS workaround is sufficient for the original hang. Now that flow control is enabled, remove the rcv_over workaround and instead assert that the receive window never goes negative, matching plain TCP TOE sockets. Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D26799	2020-10-19 20:08:50 +00:00
Navdeep Parhar	3f3e04a062	cxgbe(4): Fix page fault in t4_get_lb_stats with 2 port T5 cards. PR: 250449 Reported by: freqlabs@ MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-19 20:08:47 +00:00
Navdeep Parhar	472d183268	cxgbe(4): Do not request FEC when requesting speeds that don't have FEC. MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-14 10:12:39 +00:00
Navdeep Parhar	6cc4520b0a	cxgbe(4): unimplemented cudbg routines should return the correct internal error code and not an errno. Submitted by: Krishnamraju Eraparaju @ Chelsio MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-14 08:04:39 +00:00
Navdeep Parhar	31deb3cc76	cxgbe(4): More fixes for the T6 FCS error counter. r365732 was the first attempt to get an accurate count but it was writing to some read-only registers to clear them and that obviously didn't work. Instead, note the counter's value when it is supposed to be cleared and subtract it from future readings. dev.<port>.stats.rx_fcs_error should not be serviced from the MPS register for T6. The stats.* sysctls should all use T5_PORT_REG for T5 and above. This must have been missed in the initial T5 support years ago. Fix it while here. MFC after: 3 days Sponsored by: Chelsio Communications	2020-10-09 22:23:39 +00:00
Navdeep Parhar	77af2b2c85	cxgbe(4): knobs to drop various kinds of undesirable frames on ingress. These kinds of drops come for free in the sense that they do not use the filter TCAM or any other resource that wouldn't normally be used during rx. Frames dropped by the hardware get counted in the MAC's rx stats but are not delivered to the driver. hw.cxgbe.attack_filter Set to 1 to enable the "attack filter". Default is 0. The attack filter will drop an incoming frame if any of these conditions is true: src ip/ip6 == dst ip/ip6; tcp and src/dst ip is not unicast; src/dst ip is loopback (127.x.y.z); src ip6 is not unicast; src/dst ip6 is loopback (::1/128) or unspecified (::/128); tcp and src/dst ip6 is mcast (ff00::/8). hw.cxgbe.drop_ip_fragments Set to 1 to drop all incoming IP fragments. Default is 0. Note that this drops valid frames. hw.cxgbe.drop_pkts_with_l2_errors Set to 1 to drop incoming frames with Layer 2 length or checksum errors. Default is 1. hw.cxgbe.drop_pkts_with_l3_errors Set to 1 to drop incoming frames with IP version, length, or checksum errors. Default is 0. hw.cxgbe.drop_pkts_with_l4_errors Set to 1 to drop incoming frames with Layer 4 length, checksum, or other errors. Default is 0. MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-10-08 10:00:13 +00:00
John Baldwin	56fb710f1b	Store the send tag type in the common send tag header. Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with their own identical headers that stored the type that the type-specific tag structures inherited from, so in practice it seems drivers need this in the tag anyway. This permits removing these extra header indirections (struct cxgbe_snd_tag and struct mlx5e_snd_tag). In addition, this permits driver-independent code to query the type of a tag, e.g. to know what type of tag is being queried via if_snd_query. Reviewed by: gallatin, hselasky, np, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26689	2020-10-06 17:58:56 +00:00
Navdeep Parhar	8741306b3b	cxgbe(4) sysctls do not need Giant. Sponsored by: Chelsio Communications	2020-10-05 22:18:04 +00:00
Navdeep Parhar	73f6606b47	cxgbe(4): set up the firmware flowc for the tid before send_abort_rpl. MFC after: 3 days Sponsored by: Chelsio Communications	2020-10-02 23:48:57 +00:00
Navdeep Parhar	7676c62aa3	cxgbe(4): validate largest_rx_cluster and safest_rx_cluster. These tunables can only be set to a valid cluster size (2K, 4K, 9K, or 16K) as documented in the man page. Anything else could lead to a panic on interface up. Reported by: mav@ MFC after: 1 week Sponsored by: Chelsio Communications	2020-10-02 05:59:55 +00:00
John Baldwin	0e99339684	Fallback to software for more GCM and CCM requests. ccr(4) uses software to handle GCM and CCM requests not supported by the crypto engine (e.g. with only AAD and no payload). This change adds a fallback for a few more requests such as those with more SGL entries than can fit in a work request (this can happen for GCM when decrypting a TLS record split across 15 or more packets). Reported by: Chelsio QA Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D26582	2020-09-29 21:51:32 +00:00
Navdeep Parhar	822967e7e5	cxgbe(4): Avoid unnecessary work in the firmware during netmap tx. Bind the netmap tx queues to a special '0xff' scheduling class which makes the firmware skip some processing related to rate limiting on the outgoing traffic. Future firmwares will do this automatically. MFC after: 1 week Sponsored by: Chelsio Communications	2020-09-29 09:25:52 +00:00
Navdeep Parhar	7efe256233	Remove duplicate line.	2020-09-29 09:11:51 +00:00
Navdeep Parhar	15ca0766ed	cxgbe(4): adjust the doorbell threshold for netmap freelists to match the maximum burst size used when fetching descriptors from the list. MFC after: 1 week Sponsored by: Chelsio Communications	2020-09-29 07:51:06 +00:00
Navdeep Parhar	f7b8615af5	cxgbe(4): display an error message when netmap cannot be enabled because the interface is down. MFC after: 1 week	2020-09-29 07:36:21 +00:00
Navdeep Parhar	a9f476580e	cxgbe(4): fixes for netmap operation with only some queues active. - Only active netmap receive queues should be in the RSS lookup table. - The RSS table should be restored for NIC operation when the last active netmap queue is switched off, not the first one. - Support repeated netmap ON/OFF on a subset of the queues. This works whether the queues being enabled and disabled are the only ones active or not. Some kring indexes have to be reset in the driver for the second case. MFC after: 1 week Sponsored by: Chelsio Communications	2020-09-29 05:08:45 +00:00
Navdeep Parhar	30e3f2b4ea	cxgbe(4): let the PF driver use VM work requests for transmit. This allows the PF interfaces to communicate with the VF interfaces over the internal switch in the ASIC. Fix the GL limits for VM work requests while here. MFC after: 3 days Sponsored by: Chelsio Communications	2020-09-22 04:16:40 +00:00
Navdeep Parhar	7054f6ec97	cxgbe(4): add counters for mbuf pullups and defrags. MFC after: 3 days Sponsored by: Chelsio Communications	2020-09-22 03:06:36 +00:00
Navdeep Parhar	3b8506ae30	cxgbe(4): add the firmware binaries instead of the empty files that were added in r365861. Obtained from: Chelsio Communications MFC after: 3 days Sponsored by: Chelsio Communications	2020-09-18 03:11:47 +00:00
Navdeep Parhar	a4a4ad2dd9	cxgbe(4): add support for stateless offloads for VXLAN traffic. Hardware assistance includes checksumming (tx and rx), TSO, and RSS on the inner traffic in a VXLAN tunnel. Relnotes: Yes Sponsored by: Chelsio Communications	2020-09-18 03:01:47 +00:00
Navdeep Parhar	88c9c3f4dd	cxgbe(4): Update T4/5/6 firmwares to 1.25.0.0. Obtained from: Chelsio Communications MFC after: 3 days Sponsored by: Chelsio Communications	2020-09-17 22:14:11 +00:00
Navdeep Parhar	bb60ba7e22	cxgbe(4): Get the count of FCS errors from the MAC and not MPS for T6 ports. The MPS register on the T6 counts something other than FCS errors despite its name. MFC after: 3 days Sponsored by: Chelsio Communications	2020-09-14 22:15:54 +00:00
Navdeep Parhar	565b8fce23	cxgbe(4): Check for descriptors before writing a TLS or raw work request. This fixes a regression in r362905. Submitted by: jhb@ Sponsored by: Chelsio Communications	2020-08-31 22:44:59 +00:00
Alan Somers	e6f6d0c9bc	crypto(9): add CRYPTO_BUF_VMPAGE crypto(9) functions can now be used on buffers composed of an array of vm_page_t structures, such as those stored in an unmapped struct bio. It requires the running kernel to support the direct memory map, so not all architectures can use it. Reviewed by: markj, kib, jhb, mjg, mat, bcr (manpages) MFC after: 1 week Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D25671	2020-08-26 02:37:42 +00:00
Navdeep Parhar	6a59b9940e	cxgbe(4): Use large clusters for TOE rx queues when TOE+TLS is enabled. Rx is more efficient within the chip when the receive buffer size matches the TLS PDU size. MFC after: 3 days Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D26127	2020-08-23 04:16:20 +00:00
Navdeep Parhar	11a82cd688	cxgbei: destroy the worker threads' CV and mutex in stop_worker_threads. Reported by: bz@ MFC after: 3 days	2020-08-21 00:34:33 +00:00
Mark Johnston	5822a14c43	cxgbe(4): Stop checking for failures from malloc(M_WAITOK). PR: 240545 Submitted by: Andrew Reiter <arr@watson.org> Reviewed by: np MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25767	2020-07-27 19:05:53 +00:00
Navdeep Parhar	a2e160c5af	cxgbe(4): Some updates to the common code. Obtained from: Chelsio Communications MFC after: 1 week Sponsored by: Chelsio Communications	2020-07-24 23:15:42 +00:00
Navdeep Parhar	800535c2ca	cxgbev(4): Compare at most 16 bytes of the Ethernet header when trying to coalesce tx work requests. Note that Coverity will still treat this as an out-of-bounds access. We do want to compare 16B starting from ethmacdst but cmp_l2hdr was going beyond that by 2B. cmp_l2hdr was introduced in r362905. Reported by: Coverity (CID 1430284) Sponsored by: Chelsio Communications	2020-07-13 19:15:29 +00:00
Navdeep Parhar	3bbb68f0e3	cxgbe(4): Fix a bug (introduced in r362905) where some tx traffic wasn't being reported to BPF.	2020-07-05 05:14:33 +00:00
Navdeep Parhar	d735920d33	cxgbe(4): changes in the Tx path to help increase tx coalescing. - Ask the firmware for the number of frames that can be stuffed in one work request. - Modify mp_ring to increase the likelihood of tx coalescing when there are just one or two threads that are doing most of the tx. Add teeth to the abdication mechanism by pushing the consumer lock into mp_ring. This reduces the likelihood that a consumer will get stuck with all the work even though it is above its budget. - Add support for coalesced tx WR to the VF driver. This, with the changes above, results in a 7x improvement in the tx pps of the VF driver for some common cases. The firmware vets the L2 headers submitted by the VF driver and it's a big win if the checks are performed for a batch of packets and not each one individually. Reviewed by: jhb@ MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25454	2020-07-03 04:44:23 +00:00
John Baldwin	94578db218	Reduce contention on per-adapter lock. - Move temporary sglists into the session structure and protect them with a per-session lock instead of a per-adapter lock. - Retire an unused session field, and move a debugging field under INVARIANTS to avoid using the session lock for completion handling when INVARIANTS isn't enabled. - Use counter_u64 for per-adapter statistics. Note that this helps for cases where multiple sessions are used (e.g. multiple IPsec SAs or multiple KTLS connections). It does not help for workloads that use a single session (e.g. a single GELI volume). Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25457	2020-06-26 00:01:31 +00:00
John Baldwin	4a711b8d04	Use zfree() instead of explicit_bzero() and free(). In addition to reducing lines of code, this also ensures that the full allocation is always zeroed avoiding possible bugs with incorrect lengths passed to explicit_bzero(). Suggested by: cem Reviewed by: cem, delphij Approved by: csprng (cem) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25435	2020-06-25 20:17:34 +00:00
Navdeep Parhar	7c228be30b	cxgbe(4): Add a pointer to the adapter softc in vi_info. There were quite a few places where port_info was being accessed only to get to the adapter. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25432	2020-06-25 17:04:22 +00:00
Navdeep Parhar	0cadedfc46	cxgbe(4): Add a tx_len16_to_desc helper. No functional change. MFC after: 1 week Sponsored by: Chelsio Communications	2020-06-23 07:33:29 +00:00
John Baldwin	6deb4131b8	Add support for requests with separate AAD to ccr(4). Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25290	2020-06-22 23:41:33 +00:00
Alexander V. Chernikov	b158cfb3fc	Switch cxgbe interface lookup to use fibX_lookup() from older fibX_lookup_nh_ext(). fibX_lookup_nh_ represents the pre-epoch generation of the fib KPI, providing fewer guarantees about pointer validity and requiring on-stack data copying. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D24975	2020-06-22 07:35:23 +00:00
Ryan Moeller	cbb9ccf735	Avoid trying to toggle TSO twice Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in various NIC drivers. Reviewed by: hselasky, np, gallatin, jpaetzel Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25120	2020-06-15 16:35:27 +00:00
John Baldwin	1a4a7e98eb	Explicitly zero IVs on the stack. Reviewed by: delphij Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25057	2020-06-03 22:19:52 +00:00
John Baldwin	0065d9a47f	Explicitly zero AES key schedules on the stack. Reviewed by: delphij MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25057	2020-06-03 22:18:21 +00:00
John Baldwin	20c128da91	Add explicit bzero's of sensitive data in software crypto consumers. Explicitly zero IVs, block buffers, and hashes/digests. Reviewed by: delphij Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25057	2020-06-03 22:11:05 +00:00
John Baldwin	2adc3c9417	Support separate output buffers in ccr(4). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24545	2020-05-25 22:23:13 +00:00
John Baldwin	9c0e3d3a53	Add support for optional separate output buffers to in-kernel crypto. Some crypto consumers such as GELI and KTLS for file-backed sendfile need to store their output in a separate buffer from the input. Currently these consumers copy the contents of the input buffer into the output buffer and queue an in-place crypto operation on the output buffer. Using a separate output buffer avoids this copy. - Create a new 'struct crypto_buffer' describing a crypto buffer containing a type and type-specific fields. crp_ilen is gone, instead buffers that use a flat kernel buffer have a cb_buf_len field for their length. The length of other buffer types is inferred from the backing store (e.g. uio_resid for a uio). Requests now have two such structures: crp_buf for the input buffer, and crp_obuf for the output buffer. - Consumers now use helper functions (crypto_use_, e.g. crypto_use_mbuf()) to configure the input buffer. If an output buffer is not configured, the request still modifies the input buffer in-place. A consumer uses a second set of helper functions (crypto_use_output_) to configure an output buffer. - Consumers must request support for separate output buffers when creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are only permitted to queue a request with a separate output buffer on sessions with this flag set. Existing drivers already reject sessions with unknown flags, so this permits drivers to be modified to support this extension without requiring all drivers to change. - Several data-related functions now have matching versions that operate on an explicit buffer (e.g. crypto_apply_buf, crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf). - Most of the existing data-related functions operate on the input buffer. However crypto_copyback always writes to the output buffer if a request uses a separate output buffer. 
- For the regions in input/output buffers, the following conventions are followed: - AAD and IV are always present in input only and their fields are offsets into the input buffer. - payload is always present in both buffers. If a request uses a separate output buffer, it must set a new crp_payload_start_output field to the offset of the payload in the output buffer. - digest is in the input buffer for verify operations, and in the output buffer for compute operations. crp_digest_start is relative to the appropriate buffer. - Add a crypto buffer cursor abstraction. This is a more general form of some bits in the cryptosoft driver that tried to always use uio's. However, compared to the original code, this avoids rewalking the uio iovec array for requests with multiple vectors. It also avoids allocating an iovec array for mbufs and populating it by instead walking the mbuf chain directly. - Update the cryptosoft(4) driver to support separate output buffers making use of the cursor abstraction. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24545	2020-05-25 22:12:04 +00:00
John Baldwin	3e9470482a	Various cleanups to the software encryption transform interface. - Consistently use 'void *' for key schedules / key contexts instead of a mix of 'caddr_t', 'uint8_t *', and 'void *'. - Add a ctxsize member to enc_xform similar to what auth transforms use and require callers to malloc/zfree the context. The setkey callback now supplies the caller-allocated context pointer and the zerokey callback is removed. Callers now always use zfree() to ensure key contexts are zeroed. - Consistently use C99 initializers for all statically-initialized instances of 'struct enc_xform'. - Change the encrypt and decrypt functions to accept separate in and out buffer pointers. Almost all of the backend crypto functions already supported separate input and output buffers and this makes it simpler to support separate buffers in OCF. - Remove xform_userland.h shim to permit transforms to be compiled in userland. Transforms no longer call malloc/free directly. Reviewed by: cem (earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24855	2020-05-20 21:21:01 +00:00
Navdeep Parhar	b0dede77b1	cxgbe/iw_cxgbe: Add an async callback to notify iw_cxgbe in case of a fatal error. Submitted by: Krishnamraju Eraparaju @ Chelsio MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-05-19 16:28:20 +00:00
Gleb Smirnoff	365e8da44a	Mechanically rename MBUF_EXT_PGS_ASSERT() to M_ASSERTEXTPG() to match classical M_ASSERTPKTHDR. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:27:41 +00:00
Gleb Smirnoff	6edfd179c8	Step 4.1: mechanically rename M_NOMAP to M_EXTPG Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:21:11 +00:00
Gleb Smirnoff	7b6c99d08d	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:12:56 +00:00
Gleb Smirnoff	6fbcdeb6f1	Step 2.4: Stop using 'struct mbuf_ext_pgs' in drivers. Reviewed by: gallatin, hselasky Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:58:20 +00:00
Gleb Smirnoff	c4ee38f8e8	Step 2.3: Rename mbuf_ext_pg_len() to m_epg_pagelen() that uses mbuf argument. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:52:35 +00:00
Gleb Smirnoff	49b6b60e22	Step 2.2: o Shrink sglist(9) functions to work with multipage mbufs down from four functions to two. o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf. o Rename to something matching _epg. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:46:29 +00:00
Gleb Smirnoff	0c1032665c	Continuation of multi page mbuf redesign from r359919. The following series of patches addresses three things: Now that array of pages is embedded into mbuf, we no longer need separate structure to pass around, so struct mbuf_ext_pgs is an artifact of the first implementation. And struct mbuf_ext_pgs_data is a crutch to accommodate the main idea r359919 with minimal churn. Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP. The namespace for the new feature is somewhat inconsistent and sometimes has lengthy prefixes. In these patches we will gradually bring the namespace to "m_epg" prefix for all mbuf fields and most functions. Step 1 of 4: o Anonymize mbuf_ext_pgs_data, embed in m_ext o Embed mbuf_ext_pgs o Start documenting all this entanglement Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 22:39:26 +00:00
John Baldwin	8cce4145fa	Add support for KTLS RX over TOE to T6. This largely reuses the TLS TOE support added in r330884. However, this uses the KTLS framework in upstream OpenSSL rather than requiring Chelsio-specific patches to OpenSSL. As with the existing TLS TOE support, use of RX offload requires setting the tls_rx_ports sysctl. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24453	2020-04-27 23:59:42 +00:00
John Baldwin	f1f9347546	Initial support for kernel offload of TLS receive. - Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number. - When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message. - Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions. - When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic(). - Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.	2020-04-27 23:17:19 +00:00
Navdeep Parhar	55eae197fc	cxgbe/crypto: Fix the key size in a couple of places to catch up with the recent OCF refactor. Sponsored by: Chelsio Communications	2020-04-23 23:54:23 +00:00
Navdeep Parhar	a3372bd833	cxgbe/iw_cxgbe: Create a LinuxKPI pci device for an adapter and use it as the dma_device during RDMA registration. cxgbe's struct device cannot be used as-is because it's a native FreeBSD driver and ibcore is LinuxKPI based. MFC after: 1 week MFC after: r360196	2020-04-22 21:54:21 +00:00
Alexander V. Chernikov	8d6708ba80	Convert TOE routing lookups to the new routing KPI. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D24388	2020-04-22 07:53:43 +00:00
John Baldwin	29fe41ddd7	Retire the CRYPTO_F_IV_GENERATE flag. The sole in-tree user of this flag has been retired, so remove this complexity from all drivers. While here, add a helper routine drivers can use to read the current request's IV into a local buffer. Use this routine to replace duplicated code in nearly all drivers. Reviewed by: cem Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24450	2020-04-20 22:24:49 +00:00
John Baldwin	708652acc4	Set inp_flowid's for TOE connections. KTLS uses the flowid to distribute software encryption tasks among its pool of worker threads. Without this change, all software KTLS requests for TOE sockets ended up on the first worker thread. Note that the flowid for TOE sockets created via connect() is not a hash of the 4-tuple, but is instead the id of the TOE pcb (tid). The flowid of TOE sockets created from TOE listen sockets do use the 4-tuple RSS hash as the flowid since the firmware provides the hash in the message containing the original SYN. Reviewed by: np (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24348	2020-04-15 19:28:51 +00:00
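To see why a missing flowid funnels all work to one thread, consider a minimal sketch in which a worker is chosen by reducing the flowid modulo the pool size. The modulo mapping and the function name are assumptions made for illustration; the commit only states that the flowid is used to distribute tasks among the workers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a connection's flow id onto one of nworkers
 * worker threads. A flow id that is always 0 always selects worker 0.
 */
static unsigned int
example_pick_worker(uint32_t flowid, unsigned int nworkers)
{
	assert(nworkers > 0);
	return (flowid % nworkers);
}
```

With this mapping, any per-connection value that varies (a tid or an RSS hash) spreads connections across the pool, while an unset flowid of 0 does not.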
John Baldwin	f3b6d8ad2e	Clear CPL_GET_TCB_RPL handler on module unload. This fixes a panic when unloading and reloading t4_tom.ko since the old pointer is still stored when t4_tom_load tries to set it. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24358	2020-04-15 19:23:53 +00:00
Navdeep Parhar	ddde90ac81	cxgbe/iw_cxgbe: Do not start the EP timer if soaccept fails. This fixes a panic that would occur when the timer tried to close a stale socket. Submitted by: Krishnamraju Eraparaju @ Chelsio MFC after: 1 week Sponsored by: Chelsio Communications	2020-04-15 03:40:33 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone. This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid accessing ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Navdeep Parhar	843b264a85	cxgbe(4): Make sure 'flags' is at the same offset in structs toepcb and synq_entry. TAILQ_ENTRY isn't always the same size as two pointers. Reported by: rmacklem@ MFC after: 3 days Sponsored by: Chelsio Communications	2020-04-13 20:12:47 +00:00
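The kind of layout invariant described in the commit above can be checked at compile time. The following is a minimal sketch using two made-up structures (not the driver's toepcb or synq_entry) that share a leading list entry followed by a flags word; the static assertion fails the build if the two flags members ever drift apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a list linkage; its size should not be assumed elsewhere. */
struct example_link {
	void *next;
	void **prev;
};

/* Two hypothetical structures that must agree on where 'flags' lives. */
struct example_a {
	struct example_link link;
	uint32_t flags;
	int a_only;
};

struct example_b {
	struct example_link link;
	uint32_t flags;
	char b_only[16];
};

/* Compile-time check: both structures keep 'flags' at the same offset. */
static_assert(offsetof(struct example_a, flags) ==
    offsetof(struct example_b, flags),
    "flags must be at the same offset in both structures");
```

Placing the shared members in the same order at the start of both structures, rather than padding by hand with an assumed size, keeps the check true regardless of how large the list entry is.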
John Baldwin	94fad5ffc6	Use both crypto engines on a T6. A T6 adapter contains two crypto engines on separate channels. This commit distributes sessions between the two engines. Previously, only the first engine was used. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24347	2020-04-10 22:27:45 +00:00
John Baldwin	c034143269	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This structure includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simple encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmetric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. 
Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), its starting location is marked by crp_digest_start. 
Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_session directly. - Asymmetric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. 
The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the existing crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
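The driver selection described in the commit above (a probe method that returns a negative value on success, with the framework picking the "best" one) can be sketched roughly as follows. The constant names and numeric values are hypothetical placeholders chosen only to preserve the stated preference order of hardware, then accelerated software, then plain software; treating non-negative returns as refusals is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe priorities; values closer to zero are preferred. */
enum {
	EX_PROBE_HARDWARE = -100,
	EX_PROBE_ACCEL_SOFTWARE = -200,
	EX_PROBE_SOFTWARE = -500
};

/*
 * Return the index of the preferred driver, or -1 if no driver accepted
 * the session. probe[i] < 0 means driver i accepts the session; any other
 * value is treated here as a refusal.
 */
static int
example_pick_driver(const int *probe, size_t n)
{
	int best = -1;

	assert(probe != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (probe[i] >= 0)
			continue;	/* driver declined this session */
		if (best == -1 || probe[i] > probe[best])
			best = (int)i;	/* closer to zero wins */
	}
	return (best);
}
```

Because each driver is asked about the complete set of session parameters, a driver that supports a cipher and a digest separately but not their ETA combination can simply decline, which a static list of supported algorithms could not express.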
Navdeep Parhar	aa301e5ffe	cxgbe(4): Split sge_nm_rxq into three cachelines. This reduces the lines bouncing around between the driver rx ithread and the netmap rxsync thread. There is no net change in the size of the struct (it continues to waste a lot of space). This kind of split was originally proposed in D17869 by Marc De La Gueronniere @ Verisign, Inc. MFC after: 1 week Sponsored by: Chelsio Communications	2020-03-20 05:12:16 +00:00
Navdeep Parhar	7a25fb9963	cxgbe(4): Do not display error messages related to the CLIP table if it's not in use by TOE or KTLS. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24046	2020-03-13 00:12:15 +00:00
Navdeep Parhar	87d228f935	cxgbe/t4_tom: The MSS in a FLOWC work request must not be 0. Submitted by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications	2020-03-10 21:49:56 +00:00
Navdeep Parhar	2b9010f070	cxgbe(4): Do not try to use 0 as an rx buffer address when the driver is already allocating from the safe zone and the allocation fails. This bug was introduced in r357481. MFC after: 3 days Sponsored by: Chelsio Communications	2020-03-10 21:44:20 +00:00
Navdeep Parhar	7ba6f5493d	cxgbe/t4_tom: Do not uninitialize a toepcb that has not been initialized. This fixes the following panic: --- trap 0xc, rip = 0xffffffff80c00411, rsp = 0xfffffe0025192840, rbp = 0xfffffe0025192860 --- vmem_xfree() at vmem_xfree+0xd1/frame 0xfffffe0025192860 tls_uninit_toep() at tls_uninit_toep+0x78/frame 0xfffffe0025192880 free_toepcb() at free_toepcb+0x32/frame 0xfffffe00251928a0 t4_connect() at t4_connect+0x3be/frame 0xfffffe0025192950 tcp_offload_connect() at tcp_offload_connect+0xa4/frame 0xfffffe0025192990 tcp_usr_connect() at tcp_usr_connect+0xec/frame 0xfffffe00251929f0 soconnect() at soconnect+0xae/frame 0xfffffe0025192a30 kern_connectat() at kern_connectat+0xe2/frame 0xfffffe0025192a90 sys_connect() at sys_connect+0x75/frame 0xfffffe0025192ad0 amd64_syscall() at amd64_syscall+0x137/frame 0xfffffe0025192bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0025192bf0 --- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x8008e9d8a, rsp = 0x7fffffffc0f8, rbp = 0x7fffffffc130 --- Reviewed by: jhb@ MFC after: 3 days Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23989	2020-03-06 19:56:12 +00:00
John Baldwin	6d44e8e6b5	Rename TOE TLS stats from [rt]x_tls_* to [rt]x_toe_tls_*. This more clearly differentiates TLS records encrypted and decrypted in TOE connections from those encrypted via NIC TLS. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-28 00:42:27 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Navdeep Parhar	02cd773916	cxgbe(4): Congestion drops are maintained per E-channel and not per buffer group. This fixes a bug where congestion drops on port 1 of a T6 card would incorrectly be counted as drops on port 0. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-19 00:48:58 +00:00
Navdeep Parhar	9a4a1be02c	cxgbe/iw_cxgbe: correctly enforce the max reg_mr depth. Reported by: Andrew Zhu @ Netapp Obtained from: Chelsio Communications MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-18 20:43:10 +00:00
John Baldwin	ca3b3c573e	Remove the per-TXQ tls_wrs stat. It duplicated the kern_tls_records stat and was not conditional on NIC TLS being enabled. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23670	2020-02-13 22:55:45 +00:00
Navdeep Parhar	77ad00bf36	cxgbe(4): Update T4/5/6 firmwares to 1.24.12.0. Obtained from: Chelsio Communications MFC after: 1 month Sponsored by: Chelsio Communications	2020-02-12 02:55:06 +00:00
Navdeep Parhar	21935a41fd	cxgbe(4): Add native netmap support to the main interface. This means that extra virtual interfaces (VIs) created with hw.cxgbe.num_vis are no longer required to use netmap. Use this tunable to enable native netmap support on the main interface: hw.cxgbe.native_netmap="3" There is no change in default behavior. Suggested by: jch@ MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-02-05 22:29:01 +00:00
Navdeep Parhar	f4220a703d	cxgbe(4): Add a knob to allow netmap tx traffic to be checksummed by the hardware. hw.cxgbe.nm_txcsum=1 MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-02-05 00:13:15 +00:00
Navdeep Parhar	ba8b75ae01	cxgbe(4): Allow nm_black_hole and nm_cong_drop to be set at any time. The cong_drop setting will apply to queues created after the setting is changed and not to existing queues. MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-02-05 00:08:58 +00:00
Navdeep Parhar	3479fe20e2	cxgbe(4): Report accurate rx_buf_maxsize to netmap. MFC after: 2 weeks Sponsored by: Chelsio Communications	2020-02-04 23:55:21 +00:00
Navdeep Parhar	87bbb3338e	cxgbe(4): Add pfil(9) hooks to the driver's rx. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-04 01:09:02 +00:00
Navdeep Parhar	1486d2de9e	cxgbe(4): Treat NIC rx as special and run its handler directly and not via the t4_cpl_handler dispatch table. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-04 01:01:35 +00:00
Navdeep Parhar	46e1e307ed	cxgbe(4): Retire the allow_mbufs_in_cluster optimization. This simplifies the driver's rx fast path as well as the bookkeeping code that tracks various rx buffer sizes and layouts. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-04 00:51:10 +00:00
Navdeep Parhar	d6f79b2710	cxgbe(4): Avoid ext_arg2 in rxb_free. ext_arg2 is the only item in the third cacheline in an mbuf and could be cold by the time rxb_free runs. Put the information needed by rxb_free in the same line as the refcount, which is very likely to be hot given that rxb_free runs when the refcount is decremented and reaches 0. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-03 23:50:29 +00:00
Navdeep Parhar	44c6fea82b	cxgbe(4): Do not use pack boundary > 512B unless it is explicitly requested. This is a tradeoff between PCIe efficiency during large packet rx and packing efficiency during small packet rx. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-03 23:30:39 +00:00
Navdeep Parhar	a9c4062a9a	cxgbe(4): Initialize the rx buffer's metadata on first-use and not on allocation. refill_fl doesn't touch any part of a freshly allocated cluster after this change. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-03 23:25:12 +00:00
Navdeep Parhar	9087a3df60	cxgbe(4): Only checksummed TCP should be considered for LRO. This avoids the per-packet nanouptime in tcp_lro_rx for traffic that's not even TCP. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-03 23:06:42 +00:00
Navdeep Parhar	46d29cab25	cxgbe/iw_cxgbe: Do not allow memory registrations with page size greater than 128MB, which is the maximum supported by the hardware in RDMA mode. Obtained from: Chelsio Communications MFC after: 3 days Sponsored by: Chelsio Communications	2020-01-14 01:43:04 +00:00
Bjoern A. Zeeb	334fc5822b	vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointless at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks	2020-01-08 23:30:26 +00:00
Gleb Smirnoff	e9edde4110	Fix a typo - passing wrong mbuf pointer to needs_udp_csum(). Will trigger panic only on a kernel with RATELIMIT. Submitted by: rrs	2020-01-07 21:29:42 +00:00
Navdeep Parhar	93065a5afd	cxgbe(4): check if the firmware supports FW_RI_FR_NSMR_TPTE_WR work request. This is used by iw_cxgbe to figure out how best to register memory. MFC after: 1 month Sponsored by: Chelsio Communications	2019-12-18 19:10:30 +00:00
John Baldwin	93dafad57a	Expand net epoch in the cxgbe TOE driver to satisfy assertions. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22483	2019-12-13 23:33:54 +00:00
Navdeep Parhar	c0236bd93d	cxgbe(4): Use the _XT variant of the CPL used to transmit NIC traffic. CPL_TX_PKT_XT disables the internal parser on the chip and instead relies on the driver to provide the exact length of the L2 and L3 headers. This allows hw checksumming and TSO to be used with L2 and L3 encapsulations that the chip doesn't understand directly. Note that netmap tx still uses the old CPL as it never uses the hw to generate the checksum on tx. Reviewed by: jhb@ MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22788	2019-12-13 20:38:58 +00:00
Navdeep Parhar	82694ec0c0	cxgbe(4): Never use hardware checksumming in netmap tx. MFC after: 1 week Sponsored by: Chelsio Communications	2019-12-12 21:33:00 +00:00
Navdeep Parhar	c08c2d42cf	cxgbe(4): Simplify the firmware version checks a bit. No functional change. MFC after: 1 week	2019-12-10 20:12:21 +00:00
Navdeep Parhar	aa7bdbc00c	cxgbe(4): Use TX_PKTS2 work requests in netmap Tx if it's available. TX_PKTS2 is more efficient within the firmware and this improves netmap Tx by a few Mpps in some common scenarios. MFC after: 1 week Sponsored by: Chelsio Communications	2019-12-10 08:16:19 +00:00
Navdeep Parhar	6f012c14bc	cxgbe(4): Update T4/5/6 firmwares to 1.24.11.0. These were obtained from the Chelsio Unified Wire v3.12.0.1 beta release. Note that the firmwares are not uuencoded any more. MFH: 1 month Sponsored by: Chelsio Communications	2019-12-10 07:45:10 +00:00
Navdeep Parhar	168bde45c2	cxgbe/iw_cxgbe: Support 64b length in the memory registration routines. Submitted by: bharat @ chelsio MFC after: 1 week Sponsored by: Chelsio Communications	2019-12-09 19:10:42 +00:00
Michael Tuexen	fa49a96419	In order for the TCP Handshake to support ECN++, and further ECN-related improvements, the ECN bits need to be exposed to the TCP SYNcache. This change is a minimal modification to the function headers, without any functional change intended. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, rrs@, tuexen@ Differential Revision: https://reviews.freebsd.org/D22436	2019-12-01 18:05:02 +00:00
Navdeep Parhar	e3338dee08	cxgbe(4): Allow the driver to specify multiple FECs that the firmware should try in order to link up with the peer. Various FEC variables within the driver can now have multiple bits set instead of being powers of 2. 0 and -1 in the user knobs still mean no FEC and auto (driver decides) respectively for backward compatibility, but no-FEC and auto now have their own bits in the internal representation. There is a new bit that can be set to request the FEC recommended by the cable/transceiver module. Add sysctls to display link related capabilities of the local side as well as the link partner. Note that all this needs a new firmware and the documentation for the driver FEC knobs will be updated after that firmware is added to the driver. MFC after: 1 week Sponsored by: Chelsio Communications	2019-11-26 05:54:25 +00:00
Navdeep Parhar	515a40d5d9	cxgbe(4): sysctl to reset the temperature/voltage sensor. # sysctl dev.<nexus>.<inst>.reset_sensor=1 # sysctl dev.t6nex.0.reset_sensor=1 MFC after: 1 week Sponsored by: Chelsio Communications	2019-11-24 16:40:54 +00:00
Navdeep Parhar	e56d731b7d	cxgbe(4): Update the firmware interface header. This allows the driver to be updated for the next firmware without waiting for it to be released. MFC after: 2 weeks Sponsored by: Chelsio Communications	2019-11-24 05:37:28 +00:00
John Baldwin	bddf73433e	NIC KTLS for Chelsio T6 adapters. This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections. NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment the encrypted TLS frames output by the crypto engine. Instead, the TOE is placed into a special setup to permit "dummy" connections to be associated with regular sockets using KTLS. This permits using the TOE to segment the encrypted TLS records. However, this approach does have some limitations: 1) Regular TOE sockets cannot be used when the TOE is in this special mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but not both at the same time. 2) In NIC KTLS mode, the TOE is only able to accept a per-connection timestamp offset that varies in the upper 4 bits. Put another way, only connections whose timestamp offset has the 28 lower bits cleared can use NIC KTLS and generate correct timestamps. The driver will refuse to enable NIC KTLS on connections with a timestamp offset with any of the lower 28 bits set. To use NIC KTLS, users can either disable TCP timestamps by setting the net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the tcp_new_ts_offset() function to clear the lower 28 bits of the generated offset. 3) Because the TCP segmentation relies on fields mirrored in a TCB in the TOE, not all fields in a TCP packet can be sent in the TCP segments generated from a TLS record. Specifically, for packets containing TCP options other than timestamps, the driver will inject an "empty" TCP packet holding the requested options (e.g. a SACK scoreboard) along with the segments from the TLS record. These empty TCP packets are counted by the dev.cc.N.txq.M.kern_tls_options sysctls. 
Unlike TOE TLS which is able to buffer encrypted TLS records in on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS records for retransmit requests as well as non-retransmit requests that do not include the start of a TLS record but do include the trailer. The T6 NIC KTLS code tries to optimize some of the cases for requests to transmit partial TLS records. In particular it attempts to minimize sending "waste" bytes that have to be given as input to the crypto engine but are not needed on the wire to satisfy mbufs sent from the TCP stack down to the driver. TCP packets for TLS requests are broken down into the following classes (with associated counters): - Mbufs that send an entire TLS record in full do not have any waste bytes (dev.cc.N.txq.M.kern_tls_full). - Mbufs that send a short TLS record that ends before the end of the trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC, the encryption must always start at the beginning, so if the mbuf starts at an offset into the TLS record, the offset bytes will be "waste" bytes. For sockets using AES-GCM, the encryption can start at the 16 byte block before the starting offset capping the waste at 15 bytes. - Mbufs that send a partial TLS record that has a non-zero starting offset but ends at the end of the trailer (dev.cc.N.txq.M.kern_tls_partial). In order to compute the authentication hash stored in the trailer, the entire TLS record must be sent as input to the crypto engine, so the bytes before the offset are always "waste" bytes. In addition, other per-txq sysctls are provided: - dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq using AES-CBC. - dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq using AES-GCM. - dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to compensate for the TOE engine not being able to set FIN on the last segment of a TLS record if the TLS record mbuf had FIN set. 
- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this txq including full, short, and partial records. - dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header and payload) sent for TLS record requests. - dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS record requests. To enable NIC KTLS with T6, set the following tunables prior to loading the cxgbe(4) driver: hw.cxgbe.config_file=kern_tls hw.cxgbe.kern_tls=1 Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21962	2019-11-21 19:30:31 +00:00
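Two of the rules stated in the NIC KTLS commit above lend themselves to a small worked sketch: the requirement that the lower 28 bits of the timestamp offset be clear, and the "waste" byte count for a short request that starts partway into a TLS record. The function and enum names are hypothetical, and the waste calculation assumes the starting offset is measured from the first encrypted byte, which simplifies away the record header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if only the upper 4 bits of the 32-bit timestamp offset are in use. */
static bool
example_ts_offset_ok(uint32_t ts_offset)
{
	return ((ts_offset & 0x0fffffffu) == 0);
}

enum example_cipher { EX_AES_CBC, EX_AES_GCM };

/*
 * Waste bytes for a "short" request starting 'offset' bytes into the
 * encrypted data. CBC must restart from the beginning, so everything
 * before the offset is waste. GCM can restart at the preceding 16-byte
 * block boundary, so at most 15 bytes are waste.
 */
static uint32_t
example_short_waste(enum example_cipher cipher, uint32_t offset)
{
	assert(cipher == EX_AES_CBC || cipher == EX_AES_GCM);
	if (cipher == EX_AES_CBC)
		return (offset);
	return (offset % 16);
}
```

For example, a request starting 100 bytes in costs 100 waste bytes with AES-CBC but only 4 with AES-GCM, since the nearest preceding block boundary is at byte 96.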
Navdeep Parhar	5877e649f0	cxgbev(4): Catch up with the pciids in the PF driver. MFC after: 3 days Sponsored by: Chelsio Communications	2019-11-15 18:48:14 +00:00
Gleb Smirnoff	782b97cb80	Fix regression from r353841: ctx.rc needs to be initialized, otherwise driver might silently fail to initialize. Pointy hat to: glebius	2019-11-15 18:02:37 +00:00
John Baldwin	a1b2b6e184	Create a file to hold shared routines for dealing with T6 key contexts. ccr(4) and TLS support in cxgbe(4) construct key contexts used by the crypto engine in the T6. This consolidates some duplicated code for helper functions used to build key contexts. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22156	2019-11-13 00:53:45 +00:00
Navdeep Parhar	43b5712444	cxgbe(4): Query Vdd from the firmware if its last known value is 0. TVSENSE may not be ready by the time t4_fw_initialize returns and the firmware returns 0 if the driver asks for the Vdd before the sensor is ready. MFC after: 1 week Sponsored by: Chelsio Communications	2019-11-08 01:13:12 +00:00
Gleb Smirnoff	1a49612526	Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). Remove a few outdated comments and extraneous assertions. No functional change here.	2019-11-07 00:08:34 +00:00
Navdeep Parhar	2c4c3f83e7	cxgbe(4): Use correct size while converting lpacaps32 to native endianness.	2019-10-31 00:35:26 +00:00
Navdeep Parhar	adb0cd8408	cxgbe(4): Use correct FetchBurstMin values for T6. MFC after: 1 week Sponsored by: Chelsio Communications	2019-10-25 21:53:05 +00:00
John Baldwin	e38a50e8b6	Split Chelsio send tags into a generic base tag and a ratelimit tag. NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a distinct tag from a ratelimit tag. To support this, refactor cxgbe_snd_tag to be a simple send tag with a type and convert the existing ratelimit tag to a new cxgbe_rate_tag structure. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22072	2019-10-22 20:41:54 +00:00
John Baldwin	866a7f286f	Always allocate the atid table during attach. Previously the table was allocated on first use by TOE and the ratelimit code. The forthcoming NIC KTLS code also uses this table. Allocate it unconditionally during attach to simplify consumers. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D22028	2019-10-22 20:01:47 +00:00
Gleb Smirnoff	02cc07d105	Convert to if_foreach_llmaddr() KPI.	2019-10-21 18:11:11 +00:00
Navdeep Parhar	693a9dfce2	cxgbe(4): An EQ update can be requested in a TX_PKTS2 work request. MFC after: 1 week Sponsored by: Chelsio Communications	2019-10-15 17:35:39 +00:00
John Baldwin	aeb63511bd	Remove an unused parameter from get_new_keyid().	2019-10-14 18:02:56 +00:00
John Baldwin	b60229e2f1	Remove adapters from t4_list earlier during detach. This ensures the clip task won't race with t4_destroy_clip_table. While here, make some mutex destroys unconditional since attach always initializes them. Reviewed by: np MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21952	2019-10-09 21:08:51 +00:00
John Baldwin	4f13842f75	Add support for KTLS in the Chelsio TOE module. This adds a TOE hook to allocate a KTLS session. It also recognizes TLS mbufs in the socket buffer and sends those to the NIC using a TLS work request to encrypt the record before segmenting it. TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in addition to enabling KTLS. Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891	2019-10-08 21:40:42 +00:00
John Baldwin	c59050aab5	Set the FID field in lookaside crypto requests to the rx queue ID. The PCI block in the adapter requires this field to be set to a valid queue ID. It is not clear why it did not fail on all machines, but the effect was that crypto operations reading input data via DMA failed with an internal PCI read error on machines with 128G or more of RAM. Reported by: gallatin Reviewed by: np MFC after: 3 days Sponsored by: Chelsio Communications	2019-10-08 20:22:05 +00:00
Mark Johnston	fee2a2fa39	Change synchronization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). 
In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Navdeep Parhar	5fc7854e69	cxgbe/t4_tom: Use the correct value of sndbuf in AIO Tx. This should have been part of r351540. Sponsored by: Chelsio Communications	2019-08-28 23:31:58 +00:00
Navdeep Parhar	c537e887ac	cxgbe/t4_tom: Initialize all TOE connection parameters in one place. Remove now-redundant items from toepcb and synq_entry and the code to support them. Let the driver calculate tx_align, rx_coalesce, and sndbuf by default. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21387	2019-08-27 04:19:40 +00:00
Navdeep Parhar	241c83909c	cxgbe/t4_tom: Limit work requests with immediate payload to a single descriptor. The per-tid tx credits are in demand during active Tx and it's best not to use too many just for payload. Sponsored by: Chelsio Communications	2019-08-27 01:16:02 +00:00
Navdeep Parhar	c5560a884d	cxgbe/t4_tom: Any invalid scaling factor in the hardware's wsf field implies that window scaling is not in use. MFC after: 3 days Sponsored by: Chelsio Communications	2019-08-23 22:41:16 +00:00
Navdeep Parhar	4e4469cf3c	whitespace nit.	2019-08-23 22:34:14 +00:00
Navdeep Parhar	8bf3090312	cxgbe(4): Use the same buffer size for TOE rx queues as the NIC rx queues. This is a minor simplification. MFC after: 1 week Sponsored by: Chelsio Communications	2019-08-23 22:22:34 +00:00
Randall Stewart	20abea6663	This adds the third step in getting BBR into the tree. BBR and an updated rack depend on having access to the new ratelimit api in this commit. Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D20953	2019-08-01 14:17:31 +00:00
Navdeep Parhar	6620004df5	cxgbe(4): Completely ignore all top level interrupts that are not enabled. The driver used to log any non-zero cause and when running with a single line interrupt it would spam the console/logs with reports of interrupts that are of no interest to anyone. MFC after: 1 week Sponsored by: Chelsio Communications	2019-07-12 20:59:10 +00:00
Navdeep Parhar	f8f1b9674e	cxgbe(4): Clear the freelist statistics in the clearstats ioctl. Move all clearstats code into its own function while here. MFC after: 1 week Sponsored by: Chelsio Communications	2019-07-09 22:24:22 +00:00
Navdeep Parhar	a920680df5	cxgbe(4): Use the simplest configuration possible when falling back from the default configuration. MFC after: 1 week Sponsored by: Chelsio Communications	2019-07-09 19:32:31 +00:00
Li-Wen Hsu	57f0337a57	Fix gcc build for cxgbe(4) Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20879	2019-07-08 19:59:15 +00:00

1128 Commits