freebsd-nq

Author	SHA1	Message	Date
John Baldwin	8a82be5044	Handle CPL_RX_DATA on active TLS sockets. In certain edge cases, the NIC might have only received a partial TLS record which it needs to return to the driver. For example, if the local socket was closed while data was still in flight, a partial TLS record might be pending when the connection is closed. Receiving a RST in the middle of a TLS record is another example. When this happens, the firmware returns the the partial TLS record as plain TCP data via CPL_RX_DATA. Handle these requests by returning an error to OpenSSL (via so_error for KTLS or via an error TLS record header for the older Chelsio OpenSSL interface). Reported by: Sony Arpita Das @ Chelsio Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: Revision: https://reviews.freebsd.org/D26800	2020-10-23 00:23:54 +00:00
John Baldwin	6b7ecdcd9d	Re-enable receive flow control for TOE TLS sockets. Flow control was disabled during initial TOE TLS development to workaround a hang (and to match the Linux TOE TLS support for T6). The rest of the TOE TLS code maintained credits as if flow control was enabled which was inherited from before the workaround was added with the exception that the receive window was allowed to go negative. This negative receive window handling (rcv_over) was because I hadn't realized the full implications of disabling flow control. To clean this up, re-enable flow control on TOE TLS sockets. The existing TPF_FORCE_CREDITS workaround is sufficient for the original hang. Now that flow control is enabled, remove the rcv_over workaround and instead assert that the receive window never goes negative matching plain TCP TOE sockets. Reviewed by: np MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D26799	2020-10-19 20:08:50 +00:00
Navdeep Parhar	73f6606b47	cxgbe(4): set up the firmware flowc for the tid before send_abort_rpl. MFC after: 3 days Sponsored by: Chelsio Communications	2020-10-02 23:48:57 +00:00
Navdeep Parhar	7c228be30b	cxgbe(4): Add a pointer to the adapter softc in vi_info. There were quite a few places where port_info was being accessed only to get to the adapter. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25432	2020-06-25 17:04:22 +00:00
Alexander V. Chernikov	b158cfb3fc	Switch cxgbe interface lookup to use fibX_lookup() from older fibX_lookup_nh_ext(). fibX_lookup_nh_ represents pre-epoch generation of fib kpi, providing less guarantees over pointer validness and requiring on-stack data copying. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D24975	2020-06-22 07:35:23 +00:00
Gleb Smirnoff	365e8da44a	Mechanically rename MBUF_EXT_PGS_ASSERT() to M_ASSERTEXTPG() to match classical M_ASSERTPKTHDR. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:27:41 +00:00
Gleb Smirnoff	6edfd179c8	Step 4.1: mechanically rename M_NOMAP to M_EXTPG Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:21:11 +00:00
Gleb Smirnoff	7b6c99d08d	Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed: s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-03 00:12:56 +00:00
Gleb Smirnoff	6fbcdeb6f1	Step 2.4: Stop using 'struct mbuf_ext_pgs' in drivers. Reviewed by: gallatin, hselasky Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:58:20 +00:00
Gleb Smirnoff	c4ee38f8e8	Step 2.3: Rename mbuf_ext_pg_len() to m_epg_pagelen() that uses mbuf argument. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:52:35 +00:00
Gleb Smirnoff	49b6b60e22	Step 2.2: o Shrink sglist(9) functions to work with multipage mbufs down from four functions to two. o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf. o Rename to something matching _epg. Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 23:46:29 +00:00
Gleb Smirnoff	0c1032665c	Continuation of multi page mbuf redesign from r359919. The following series of patches addresses three things: Now that array of pages is embedded into mbuf, we no longer need separate structure to pass around, so struct mbuf_ext_pgs is an artifact of the first implementation. And struct mbuf_ext_pgs_data is a crutch to accomodate the main idea r359919 with minimal churn. Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP. The namespace for the newfeature is somewhat inconsistent and sometimes has a lengthy prefixes. In these patches we will gradually bring the namespace to "m_epg" prefix for all mbuf fields and most functions. Step 1 of 4: o Anonymize mbuf_ext_pgs_data, embed in m_ext o Embed mbuf_ext_pgs o Start documenting all this entanglement Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598	2020-05-02 22:39:26 +00:00
John Baldwin	8cce4145fa	Add support for KTLS RX over TOE to T6. This largely reuses the TLS TOE support added in r330884. However, this uses the KTLS framework in upstream OpenSSL rather than requiring Chelsio-specific patches to OpenSSL. As with the existing TLS TOE support, use of RX offload requires setting the tls_rx_ports sysctl. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24453	2020-04-27 23:59:42 +00:00
John Baldwin	f1f9347546	Initial support for kernel offload of TLS receive. - Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number. - When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message. - Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions. - When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic(). - Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.	2020-04-27 23:17:19 +00:00
Alexander V. Chernikov	8d6708ba80	Convert TOE routing lookups to the new routing KPI. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D24388	2020-04-22 07:53:43 +00:00
John Baldwin	708652acc4	Set inp_flowid's for TOE connections. KTLS uses the flowid to distribute software encryption tasks among its pool of worker threads. Without this change, all software KTLS requests for TOE sockets ended up on the first worker thread. Note that the flowid for TOE sockets created via connect() is not a hash of the 4-tuple, but is instead the id of the TOE pcb (tid). The flowid of TOE sockets created from TOE listen sockets do use the 4-tuple RSS hash as the flowid since the firmware provides the hash in the message containing the original SYN. Reviewed by: np (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24348	2020-04-15 19:28:51 +00:00
John Baldwin	f3b6d8ad2e	Clear CPL_GET_TCB_RPL handler on module unload. This fixes a panic when unloading and reloading t4_tom.ko since the old pointer is still stored when t4_tom_load tries to set it. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D24358	2020-04-15 19:23:53 +00:00
Andrew Gallatin	23feb56348	KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache. This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext. In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs. This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone, This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons. In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache. Many thanks to glebius for helping to make this better in the Netflix tree. Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213	2020-04-14 14:46:06 +00:00
Navdeep Parhar	843b264a85	cxgbe(4): Make sure 'flags' is at the same offset in structs toepcb and synq_entry. TAILQ_ENTRY isn't always the same size as two pointers. Reported by: rmacklem@ MFC after: 3 days Sponsored by: Chelsio Communications	2020-04-13 20:12:47 +00:00
John Baldwin	c034143269	Refactor driver and consumer interfaces for OCF (in-kernel crypto). - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This session includes a new mode to define how the other fields should be interpreted. Available modes include: - COMPRESS (for compression/decompression) - CIPHER (for simply encryption/decryption) - DIGEST (computing and verifying digests) - AEAD (combined auth and encryption such as AES-GCM and AES-CCM) - ETA (combined auth and encryption using encrypt-then-authenticate) Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.) The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs encryption key. The 'csp_auth_' fields are always used for auth keys and settings and 'csp_cipher_' for cipher. (Compression algorithms are stored in csp_cipher_alg.) - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmteric crypto drivers. This method returns a negative value on success (similar to how device_probe works) and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software. Once a driver is chosen, 'crypto_newsession' is invoked as before. - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed. A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this). Try to make the flags related to IV handling less insane: - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset. - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE. If a consumer wants to deal with explicit vs implicit IVs, etc. it can always generate the IV however it needs and store partial IVs in the buffer and the full IV/nonce in crp_iv and set CRYPTO_F_IV_SEPARATE. The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent). crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty. If a digest is present (or should be generated), it's starting location is marked by crp_digest_start. Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers. To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop() and clients should no longer set crp_sesssion directly. - Assymteric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight. - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use. - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types. - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers. - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session. - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used. - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect) though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs CCM which I think is more readable even if there is some duplication. - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it. - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag. - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM. - I've split up the exising crypto.9 manpage into several pages of which many are written from scratch. - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers: - cryptosoft - aesni (AES only) - blake2 - ccr and the following consumers: - cryptodev - IPsec - ktls_ocf - GELI (lightly) I have not tested the following: - ccp - aesni with sha - hifn - kgssapi_krb5 - ubsec - padlock - safe - armv8_crypto (aarch64) - glxsb (i386) - sec (ppc) - cesa (armv7) - cryptocteon (mips64) - nlmsec (mips64) Discussed with: cem Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23677	2020-03-27 18:25:23 +00:00
Navdeep Parhar	87d228f935	cxgbe/t4_tom: The MSS in a FLOWC work request must not be 0. Submitted by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications	2020-03-10 21:49:56 +00:00
Navdeep Parhar	7ba6f5493d	cxgbe/t4_tom: Do not uninitialize a toepcb that has not been initialized. This fixes the following panic: --- trap 0xc, rip = 0xffffffff80c00411, rsp = 0xfffffe0025192840, rbp = 0xfffffe0025192860 --- vmem_xfree() at vmem_xfree+0xd1/frame 0xfffffe0025192860 tls_uninit_toep() at tls_uninit_toep+0x78/frame 0xfffffe0025192880 free_toepcb() at free_toepcb+0x32/frame 0xfffffe00251928a0 t4_connect() at t4_connect+0x3be/frame 0xfffffe0025192950 tcp_offload_connect() at tcp_offload_connect+0xa4/frame 0xfffffe0025192990 tcp_usr_connect() at tcp_usr_connect+0xec/frame 0xfffffe00251929f0 soconnect() at soconnect+0xae/frame 0xfffffe0025192a30 kern_connectat() at kern_connectat+0xe2/frame 0xfffffe0025192a90 sys_connect() at sys_connect+0x75/frame 0xfffffe0025192ad0 amd64_syscall() at amd64_syscall+0x137/frame 0xfffffe0025192bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0025192bf0 --- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x8008e9d8a, rsp = 0x7fffffffc0f8, rbp = 0x7fffffffc130 --- Reviewed by: jhb@ MFC after: 3 days Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D23989	2020-03-06 19:56:12 +00:00
John Baldwin	6d44e8e6b5	Rename TOE TLS stats from [rt]x_tls_* to [rt]x_toe_tls_*. This more clearly differentiates TLS records encrypted and decrypted in TOE connections from those encrypted via NIC TLS. MFC after: 1 week Sponsored by: Chelsio Communications	2020-02-28 00:42:27 +00:00
Bjoern A. Zeeb	334fc5822b	vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointles at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks	2020-01-08 23:30:26 +00:00
John Baldwin	93dafad57a	Expand net epoch in the cxgbe TOE driver to satisfy assertions. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22483	2019-12-13 23:33:54 +00:00
Michael Tuexen	fa49a96419	In order for the TCP Handshake to support ECN++, and further ECN-related improvements, the ECN bits need to be exposed to the TCP SYNcache. This change is a minimal modification to the function headers, without any functional change intended. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, rrs@, tuexen@ Differential Revision: https://reviews.freebsd.org/D22436	2019-12-01 18:05:02 +00:00
John Baldwin	bddf73433e	NIC KTLS for Chelsio T6 adapters. This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections. NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment the encrypted TLS frames output by the crypto engine. Instead, the TOE is placed into a special setup to permit "dummy" connections to be associated with regular sockets using KTLS. This permits using the TOE to segment the encrypted TLS records. However, this approach does have some limitations: 1) Regular TOE sockets cannot be used when the TOE is in this special mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but not both at the same time. 2) In NIC KTLS mode, the TOE is only able to accept a per-connection timestamp offset that varies in the upper 4 bits. Put another way, only connections whose timestamp offset has the 28 lower bits cleared can use NIC KTLS and generate correct timestamps. The driver will refuse to enable NIC KTLS on connections with a timestamp offset with any of the lower 28 bits set. To use NIC KTLS, users can either disable TCP timestamps by setting the net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the tcp_new_ts_offset() function to clear the lower 28 bits of the generated offset. 3) Because the TCP segmentation relies on fields mirrored in a TCB in the TOE, not all fields in a TCP packet can be sent in the TCP segments generated from a TLS record. Specifically, for packets containing TCP options other than timestamps, the driver will inject an "empty" TCP packet holding the requested options (e.g. a SACK scoreboard) along with the segments from the TLS record. These empty TCP packets are counted by the dev.cc.N.txq.M.kern_tls_options sysctls. Unlike TOE TLS which is able to buffer encrypted TLS records in on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS records for retransmit requests as well as non-retransmit requests that do not include the start of a TLS record but do include the trailer. The T6 NIC KTLS code tries to optimize some of the cases for requests to transmit partial TLS records. In particular it attempts to minimize sending "waste" bytes that have to be given as input to the crypto engine but are not needed on the wire to satisfy mbufs sent from the TCP stack down to the driver. TCP packets for TLS requests are broken down into the following classes (with associated counters): - Mbufs that send an entire TLS record in full do not have any waste bytes (dev.cc.N.txq.M.kern_tls_full). - Mbufs that send a short TLS record that ends before the end of the trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC, the encryption must always start at the beginning, so if the mbuf starts at an offset into the TLS record, the offset bytes will be "waste" bytes. For sockets using AES-GCM, the encryption can start at the 16 byte block before the starting offset capping the waste at 15 bytes. - Mbufs that send a partial TLS record that has a non-zero starting offset but ends at the end of the trailer (dev.cc.N.txq.M.kern_tls_partial). In order to compute the authentication hash stored in the trailer, the entire TLS record must be sent as input to the crypto engine, so the bytes before the offset are always "waste" bytes. In addition, other per-txq sysctls are provided: - dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq using AES-CBC. - dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq using AES-GCM. - dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to compensate for the TOE engine not being able to set FIN on the last segment of a TLS record if the TLS record mbuf had FIN set. - dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this txq including full, short, and partial records. - dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header and payload) sent for TLS record requests. - dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS record requests. To enable NIC KTLS with T6, set the following tunables prior to loading the cxgbe(4) driver: hw.cxgbe.config_file=kern_tls hw.cxgbe.kern_tls=1 Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21962	2019-11-21 19:30:31 +00:00
John Baldwin	a1b2b6e184	Create a file to hold shared routines for dealing with T6 key contexts. ccr(4) and TLS support in cxgbe(4) construct key contexts used by the crypto engine in the T6. This consolidates some duplicated code for helper functions used to build key contexts. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22156	2019-11-13 00:53:45 +00:00
Gleb Smirnoff	1a49612526	Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). Remove few outdated comments and extraneous assertions. No functional change here.	2019-11-07 00:08:34 +00:00
John Baldwin	866a7f286f	Always allocate the atid table during attach. Previously the table was allocated on first use by TOE and the ratelimit code. The forthcoming NIC KTLS code also uses this table. Allocate it unconditionally during attach to simplify consumers. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D22028	2019-10-22 20:01:47 +00:00
John Baldwin	aeb63511bd	Remove an unused parameter from get_new_keyid().	2019-10-14 18:02:56 +00:00
John Baldwin	4f13842f75	Add support for KTLS in the Chelsio TOE module. This adds a TOE hook to allocate a KTLS session. It also recognizes TLS mbufs in the socket buffer and sends those to the NIC using a TLS work request to encrypt the record before segmenting it. TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in addition to enabling KTLS. Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891	2019-10-08 21:40:42 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Navdeep Parhar	5fc7854e69	cxgbe/t4_tom: Use the correct value of sndbuf in AIO Tx. This should have been part of r351540. Sponsored by: Chelsio Communications	2019-08-28 23:31:58 +00:00
Navdeep Parhar	c537e887ac	cxgbe/t4_tom: Initialize all TOE connection parameters in one place. Remove now-redundant items from toepcb and synq_entry and the code to support them. Let the driver calculate tx_align, rx_coalesce, and sndbuf by default. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21387	2019-08-27 04:19:40 +00:00
Navdeep Parhar	241c83909c	cxgbe/t4_tom: Limit work requests with immediate payload to a single descriptor. The per-tid tx credits are in demand during active Tx and it's best not to use too many just for payload. Sponsored by: Chelsio Communications	2019-08-27 01:16:02 +00:00
Navdeep Parhar	c5560a884d	cxgbe/t4_tom: Any invalid scaling factor in the hardware's wsf field implies that window scaling is not in use. MFC after: 3 days Sponsored by: Chelsio Communications	2019-08-23 22:41:16 +00:00
Navdeep Parhar	4e4469cf3c	whitespace nit.	2019-08-23 22:34:14 +00:00
Li-Wen Hsu	57f0337a57	Fix gcc build for cxgbe(4) Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20879	2019-07-08 19:59:15 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
John Baldwin	7b17c92129	Use unmapped (M_NOMAP) mbufs for zero-copy AIO writes via TOE. Previously the TOE code used its own custom unmapped mbufs via EXT_FLAG_VENDOR1. The old version always wired the entire AIO request buffer first for the duration of the AIO operation and constructed multiple mbufs which used the wired buffer as an external buffer. The new version determines how much room is available in the socket buffer and only wires the pages needed for the available room building chains of M_NOMAP mbufs. This means that a large AIO write will now limit the amount of wired memory it uses to the size of the socket buffer. Reviewed by: gallatin, np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20839	2019-07-03 16:06:11 +00:00
John Baldwin	d76bbe175a	Add support for IFCAP_NOMAP to cxgbe(4). Since cxgbe(4) uses sglist instead of bus_dma, this required updates to the code that generates scatter/gather lists for packets. Also, unmapped mbufs are always sent via DMA and never as immediate data in the payload of a work request. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: np Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:52:21 +00:00
Navdeep Parhar	8674e626c6	cxgbe/t4_tom: Tweaks to some of the AIO related CTRs. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-28 19:57:42 +00:00
Navdeep Parhar	74a155edb0	cxgbe/t4_tom: the AIO tx job queue must be empty by the time the driver releases the offload resources associated with the tid. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20798	2019-06-28 19:27:45 +00:00
Navdeep Parhar	d49be2a696	cxgbe/t4_tom: Mark the socket's receive as done before calling handle_ddp_close. This eliminates a bad race where an aio_ddp_requeue that happened to run after handle_ddp_close could bump up the active count. Discussed with: jhb@ MFC after: 3 days Sponsored by: Chelsio Communications	2019-06-28 04:02:56 +00:00
Navdeep Parhar	b7acf27c2e	cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom. t_maxseg was changed in r293284 to not have any adjustment for TCP timestamps. t4_tom inadvertently went back to pre-r293284 semantics in r332506. Sponsored by: Chelsio Communications	2019-06-28 02:41:17 +00:00
John Baldwin	7f63b888c7	Hold an explicit reference on the socket for the aiotx task. Previously, the aiotx task relied on the aio jobs in the queue to hold a reference on the socket. However, when the last job is completed, there is nothing left to hold a reference to the socket buffer lock used to check if the queue is empty. In addition, if the last job on the queue is cancelled, the task can run with no queued jobs holding a reference to the socket buffer lock the task uses to notice the queue is empty. Fix these races by holding an explicit reference on the socket when the task is queued and dropping that reference when the task completes. Reviewed by: np MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20539	2019-06-27 19:36:30 +00:00
Navdeep Parhar	17795d8234	cxgbe/t4_tom: DDP_DEAD is a ddp flag and not a toepcb flag. The driver was in effect setting TPF_ABORT_SHUTDOWN on the toepcb instead of what was intended. MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-20 20:06:19 +00:00
John Baldwin	5f37b74d5d	Fix debug trace after removal of pdu_overhead. MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-07 21:30:11 +00:00
Navdeep Parhar	ebb8639822	cxgbe/t4_tom: adjust the hardware receive window to match changes to the receive sockbuf's high water mark. Calculate rx credits on the spot instead of tracking sbused/sb_cc and rx_credits in the toepcb. The previous method worked when the high water mark changed due to SB_AUTOSIZE but not when it was adjusted directly (for example, by the soreserve in nfsrvd_addsock). This fixes a connection hang while running iozone over an NFS mounted share where nfsd's TCP sockets are being handled by t4_tom. MFC after: 3 days Sponsored by: Chelsio Communications	2019-06-01 03:03:48 +00:00

1 2 3 4 5

225 Commits