
103 Commits

Author SHA1 Message Date
Andrew Gallatin
1985585233 ktls: re-work alloc thread
When the ktls_buffer zone needs to expand, it may fail due
to a lack of physically contiguous memory.  We tried to rectify
that by introducing an alloc thread to provide a context where
it is harmless to sleep, and letting that thread repopulate
the ktls_buffer zone.

However, it turns out that M_WAITOK is not enough, and we
must call vm_page_reclaim_contig_domain() to reclaim contig
memory. Worse, M_WAITOK results in the allocation essentially
busy-looping around vm_domain_alloc_fail() returning EAGAIN,
causing vm_page_alloc_noobj_contig_domain() to loop and resulting
in the alloc thread consuming 100% CPU.

To fix this, we change the alloc thread to call
vm_page_reclaim_contig_domain_ext().

In order to prevent the busy loop around vm_domain_alloc_fail(), we
must change the uma_zalloc flags to M_NORECLAIM | M_NOWAIT.  However,
once that is done, these allocations become no different than the
allocations done in the critical path in ktls_buffer_alloc(), so it's
best to just eliminate them.

Since we're no longer doing allocations but just calling
vm_page_reclaim_contig_domain_ext(), the thread has been renamed the ktls
reclaim thread.

Reviewed by: jhb, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D39421
2023-05-09 13:09:34 -04:00
John Baldwin
d2dab20c2a ktls: Drop all the INET and INET6 compile-time guards.
Consistent with 9fd0d9b16e, KERN_TLS is
not supported on kernels without any INET support.

Reviewed by:	gallatin, hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39232
2023-03-23 14:29:07 -07:00
Mark Johnston
b4b33821fa ktls: Fix interlocking between ktls_enable_rx() and listen(2)
The TCP_TXTLS_ENABLE and TCP_RXTLS_ENABLE socket option handlers check
whether the socket is a listening socket and fail if so, but this check is
racy.  Since we have to lock the socket buffer later anyway, defer the
check to that point.

ktls_enable_tx() locks the send buffer's I/O lock, which will fail if
the socket is a listening socket, so no explicit checks are needed.  In
ktls_enable_rx(), which does not acquire the I/O lock (see the review
for some discussion on this), use an explicit SOLISTENING() check after
locking the recv socket buffer.

Otherwise, a concurrent solisten_proto() call can trigger crashes and
memory leaks by wiping out socket buffers as ktls_enable_*() is
modifying them.
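As a minimal userspace sketch of the check-after-lock pattern described above: a pthread mutex stands in for the receive socket buffer lock, and the struct, the functions, and the EINVAL return value are hypothetical rather than the kernel's ktls_enable_rx() or solisten_proto(). Because both paths test and update the shared state only while holding the same lock, neither can act on a stale view of the other.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: rcv_lock stands in for
 * the receive socket buffer lock, 'listening' for SOLISTENING(so).
 */
struct model_sock {
	pthread_mutex_t rcv_lock;
	bool listening;
	bool rx_tls_enabled;
};

/* listen(2)-like path: refuses to convert a socket that already has KTLS. */
int
model_listen(struct model_sock *so)
{
	int error = 0;

	pthread_mutex_lock(&so->rcv_lock);
	if (so->rx_tls_enabled)
		error = EINVAL;
	else
		so->listening = true;
	pthread_mutex_unlock(&so->rcv_lock);
	return (error);
}

/*
 * Enable path: the listening check is made only while the lock is held,
 * so a concurrent model_listen() cannot slip in between check and update.
 */
int
model_enable_rx(struct model_sock *so)
{
	int error = 0;

	pthread_mutex_lock(&so->rcv_lock);
	if (so->listening)
		error = EINVAL;
	else
		so->rx_tls_enabled = true;
	pthread_mutex_unlock(&so->rcv_lock);
	return (error);
}
```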

Also make sure that a KTLS-enabled socket can't be converted to a
listening socket, and use SOCK_(SEND|RECV)BUF_LOCK macros instead of the
old ones while here.

Add some simple regression tests involving listen(2).

Reported by:	syzkaller
MFC after:	2 weeks
Reviewed by:	gallatin, glebius, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38504
2023-03-21 16:04:00 -04:00
Justin Hibbits
084846271a ktls: Use IfAPI accessors to get capabilities
Summary:
Avoid referencing the ifnet struct directly, and use the IfAPI accessors
instead.

Reviewed by:	gallatin
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38932
2023-03-07 09:47:00 -05:00
Andrew Gallatin
d24b032bec ktls: Fix comments & whitespace issues with c0e4090e3d
Address some last minute review feedback on c0e4090e3d
by fixing spacing around comments, and clarifying that the
newly added destroy_task is not related to tls 1.0.
No functional change intended.

Pointed out by: jhb
Sponsored by: Netflix
2023-02-09 14:11:24 -05:00
Andrew Gallatin
c0e4090e3d ktls: Accurately track if ifnet ktls is enabled
This allows us to avoid spurious calls to ktls_disable_ifnet().

When we implemented ifnet kTLS, we set a flag in the tx socket
buffer (SB_TLS_IFNET) to indicate ifnet kTLS.  This flag meant that
now, or in the past, ifnet ktls was active on a socket.  Later,
I added code to switch ifnet ktls sessions to software in the case
of lossy TCP connections that have a high retransmit rate.
Because TCP was using SB_TLS_IFNET to know if it needed to do math
to calculate the retransmit ratio and potentially call into
ktls_disable_ifnet(), it was doing unneeded work long after
a session was moved to software.

This patch carefully tracks whether or not ifnet ktls is still enabled
on a TCP connection.  Because the inp is now embedded in the tcpcb, and
because TCP is the most frequent accessor of this state, it made sense to
move this from the socket buffer flags to the tcpcb. Because we now need
reliable access to the tcpcb, we take a ref on the inp when creating a tx
ktls session.

While here, I noticed that rack/bbr were incorrectly implementing
tfb_hwtls_change(), and applying the change to all pending sends,
when it should apply only to future sends.

This change reduces spurious calls to ktls_disable_ifnet() by 95% or so
in a Netflix CDN environment.

Reviewed by: markj, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D38380
2023-02-09 12:44:44 -05:00
John Baldwin
846e4a206f ktls_disable_ifnet_help: Set curvnet around sorele().
This is required in kernels with VIMAGE such as GENERIC.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-01-18 15:39:04 -08:00
John Baldwin
07be751727 ktls: Post receive errors on partially closed sockets.
If an error such as an invalid record or one whose decryption fails is
detected on a socket that has received a RST then ktls_drop() could
ignore the error since INP_DROPPED could already be set.  In this case
soreceive_generic hangs since it does not return from a KTLS socket
with pending encrypted data unless there is an error (so_error) (this
behavior is to ensure that soreceive_generic doesn't return a
premature EOF when there is pending data still being decrypted).

Note that this was a bug prior to
69542f2682 as tcp_usr_abort would also
have ignored the error in this case.

Reviewed by:	gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37775
2022-12-27 16:00:17 -08:00
John Baldwin
69542f2682 ktls: Close a race with setting so_error when dropping a connection.
pr_abort calls tcp_usr_abort which calls tcp_drop with ECONNABORTED.
After pr_abort returns, the so_error is then set to a more specific
error.  However, a reader can observe and return the ECONNABORTED
error before so_error is set to the desired error value.  This is
resulting in spurious test failures of recently added tests for
invalid conditions such as invalid headers.

To fix, refactor the code to abort a connection to call tcp_drop
directly with the desired error value.  ktls_reset_send_tag already
calls tcp_drop directly when it aborts a connection due to an error.

Reviewed by:	gallatin
Reported by:	CI (jenkins), gallatin, olivier
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37692
2022-12-15 12:06:26 -08:00
John Baldwin
9a673b7158 ktls: Add software support for AES-CBC decryption for TLS 1.1+.
This is mainly intended to provide a fallback for TOE TLS which may
need to use software decryption for an initial record at the start
of a connection.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37370
2022-11-15 12:02:03 -08:00
John Baldwin
5920f99d21 ktls: Inline ktls_cleanup() into ktls_destroy().
Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37353
2022-11-11 16:01:02 -08:00
John Baldwin
d01db2b837 ktls: Don't leak ktls session objects for certain errors.
ktls_cleanup() does not free ktls session objects, it merely
cleans (and frees) members of the object.

Change callers to use ktls_free() instead.

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37352
2022-11-11 16:00:37 -08:00
Gleb Smirnoff
8840ae2288 tcp: don't store VNET in every tcpcb, take it from the inpcbinfo
Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37125
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of these checks and turn them into
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Hans Petter Selasky
0e391a3197 ktls: Add missing NULL pointer check for TLS RX hardware offload.
The send tag pointer may be NULL when the ktls_reset_receive_tag()
function is invoked. Add check for this.

Reviewed by:	gallatin@
Sponsored by:	NVIDIA Networking
2022-09-06 13:49:23 +02:00
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
Hans Petter Selasky
fe8c78f0d2 ktls: Add full support for TLS RX offloading via network interface.
Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet
header to figure out if an incoming mbuf has been fully offloaded or
not. This information follows the packet stream via the LRO engine, IP
stack and finally to the TCP stack. The TCP stack preserves the mbuf
packet header also when re-assembling packets after packet loss. When
the mbuf goes into the socket buffer the packet header is demoted and
the offload information is transferred to "m_flags". Later on a
worker thread will analyze the mbuf flags and decide if the mbufs
making up a TLS record indicate a fully-, partially- or not decrypted
TLS record. Based on these three cases the worker thread will either
pass the packet on as-is or recrypt the decrypted bits, if any, or
decrypt the packet as usual.
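The three-way decision the worker thread makes can be sketched as a classifier over one record's mbufs. This is an illustration under assumptions: a bool array stands in for the per-mbuf offload flag carried in m_flags, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three outcomes described above; names are hypothetical. */
enum rx_action {
	RX_PASS_AS_IS,	/* every mbuf of the record was decrypted by the NIC */
	RX_RECRYPT,	/* mixed: re-encrypt the parts the NIC decrypted */
	RX_DECRYPT,	/* nothing decrypted: decrypt in software as usual */
};

/*
 * 'decrypted[i]' stands in for the offload flag of the i-th mbuf making
 * up one TLS record in the socket buffer.
 */
enum rx_action
classify_record(const bool *decrypted, size_t nmbufs)
{
	size_t ndec = 0;

	for (size_t i = 0; i < nmbufs; i++)
		if (decrypted[i])
			ndec++;
	if (nmbufs > 0 && ndec == nmbufs)
		return (RX_PASS_AS_IS);
	if (ndec == 0)
		return (RX_DECRYPT);
	return (RX_RECRYPT);
}
```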

During packet loss the kernel TLS code will call back into the network
driver using the send tag, informing about the TCP starting sequence
number of every TLS record that is not fully decrypted by the network
interface. The network interface then stores this information in a
compressed table and starts asking the hardware if it has found a
valid TLS header in the TCP data payload. If the hardware has found a
valid TLS header and the referred TLS header is at a valid TCP
sequence number according to the TCP sequence numbers provided by the
kernel TLS code, the network driver then informs the hardware that it
can resume decryption.

Care has been taken to not merge encrypted and decrypted mbuf chains,
in the LRO engine and when appending mbufs to the socket buffer.

The mbuf's leaf network interface pointer is used to figure out from
which network interface the offloading rule should be allocated. Also
this pointer is used to track route changes.

Currently mbuf send tags are used in both transmit and receive
direction, due to convenience, but may get a new name in the future to
better reflect their usage.

Reviewed by:	jhb@ and gallatin@
Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:58:09 +02:00
Hans Petter Selasky
f0fca64618 ktls: Refer send tag pointer once.
So that the asserts and the actual code see the same values.

Differential revision:	https://reviews.freebsd.org/D32356
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-07 12:57:03 +02:00
Gleb Smirnoff
b46667c63e sockbuf: merge two versions of sbcreatecontrol() into one
No functional change.
2022-05-17 10:10:42 -07:00
John Baldwin
a4c5d490f6 KTLS: Move OCF function pointers out of ktls_session.
Instead, create a switch structure private to ktls_ocf.c and store a
pointer to the switch in the ocf_session.  This will permit adding an
additional function pointer needed for NIC TLS RX without further
bloating ktls_session.

Reviewed by:	hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D35011
2022-04-22 15:52:12 -07:00
John Baldwin
cd0525f615 ktls: Write-lock the INP when changing a transmit TLS session.
The TCP rate pacing code relies on being able to read this pointer
safely while holding an INP lock.  The initial TLS session pointer is
set while holding the write lock already.

Reviewed by:	gallatin, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D34086
2022-02-11 15:16:25 -08:00
Mark Johnston
5de79eeddb ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode
There was nothing preventing one from sending an empty fragment on an
arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this
could not happen.  Though the transmit path handles this case for TLS
1.0 with AES-CBC, we should be strict and allow empty fragments only in
modes where it is explicitly allowed.

Modify sosend_generic() to reject writes to a KTLS-enabled socket if the
number of data bytes is zero, so that userspace cannot trigger the
aforementioned assertion.

Add regression tests to exercise this case.

Reported by:	syzkaller
Reviewed by:	gallatin, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34195
2022-02-08 12:40:41 -05:00
John Baldwin
d958bc7963 ktls: Try to enable TOE TLS after marking existing data not ready.
At the moment this is mostly a no-op but in the future there will be
in-flight encrypted data which requires software decryption.  This
same setup is also needed for NIC TLS RX.

Note that this does break TOE TLS RX for AES-CBC ciphers since there
is no software fallback for AES-CBC receive.  This will be resolved
one way or another before 14.0 is released.

Reviewed by:	hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34082
2022-01-31 16:39:21 -08:00
Hans Petter Selasky
9e2cce7e6a Implement a function to get the next TCP- and TLS- receive sequence number.
This function will be used by coming TLS hardware receive offload support.

Differential Revision:	https://reviews.freebsd.org/D32356
Discussed with:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-01-26 12:55:00 +01:00
Mark Johnston
6be8944d96 ktls: Zero out TLS_GET_RECORD control messages
Otherwise we end up copying one uninitialized byte into the socket
buffer.

Reported by:	KMSAN
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33953
2022-01-20 15:42:46 -05:00
John Baldwin
05a1d0f5d7 ktls: Support for TLS 1.3 receive offload.
Note that support for TLS 1.3 receive offload in OpenSSL is still an
open pull request in active development.  However, potential changes
to that pull request should not affect the kernel interface.

Reviewed by:	hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33007
2021-12-14 11:01:05 -08:00
Mateusz Guzik
a90b85dd5a ktls: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-14 14:44:37 +00:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With the introduction of epoch(9) synchronization to the network stack, the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcbs
aren't static in nature: they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

A fairly new feature of uma(9), Safe Memory Reclamation (SMR), allows
memory to be freed safely in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirements on access to the protected memory,
which must happen inside a critical(9) section.  Details:

- The database is already built on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some complexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For an inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
John Baldwin
900a28fe33 ktls: Reject some invalid cipher suites.
- Reject AES-CBC cipher suites for TLS 1.0 and TLS 1.1 using auth
  algorithms other than SHA1-HMAC.

- Reject AES-GCM cipher suites for TLS versions older than 1.2.
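A sketch of the validation rules listed above, using hypothetical enum identifiers in place of the kernel's algorithm and version constants. It also folds in the earlier TLS 1.3 AES-CBC rejection from a63752cce6, and it assumes (beyond what this commit states) that CBC without an HMAC is always invalid.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical identifiers for this sketch; the kernel uses its own
 * algorithm and version constants. Minor versions follow the wire
 * format: 1 = TLS 1.0, 2 = TLS 1.1, 3 = TLS 1.2, 4 = TLS 1.3.
 */
enum cipher { CIPHER_AES_CBC, CIPHER_AES_GCM };
enum auth { AUTH_NONE, AUTH_SHA1_HMAC, AUTH_SHA256_HMAC };

/* Returns 0 if the suite is acceptable, EINVAL otherwise. */
int
check_suite(enum cipher c, enum auth a, int minor)
{
	switch (c) {
	case CIPHER_AES_CBC:
		/* Assumption for this sketch: CBC always needs an HMAC. */
		if (a == AUTH_NONE)
			return (EINVAL);
		/* CBC suites do not exist in TLS 1.3 (see a63752cce6). */
		if (minor >= 4)
			return (EINVAL);
		/* TLS 1.0 and 1.1 CBC suites must use SHA1-HMAC. */
		if (minor <= 2 && a != AUTH_SHA1_HMAC)
			return (EINVAL);
		return (0);
	case CIPHER_AES_GCM:
		/* AES-GCM suites require TLS 1.2 or later. */
		if (minor < 3)
			return (EINVAL);
		return (0);
	}
	return (EINVAL);
}
```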

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32842
2021-11-15 11:30:12 -08:00
John Baldwin
e3ba94d4f3 Don't require the socket lock for sorele().
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference.  Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.

Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.

The sorele() macro now uses refcount_release_if_not_last() to try to drop
the socket reference without locking the socket.  If that shortcut
fails, it locks the socket and calls sorele_locked().
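The fast path described above can be modeled in userspace with C11 atomics. This is a sketch, not the kernel's refcount(9) or sorele() code: release_if_not_last() only imitates the semantics of refcount_release_if_not_last(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a reference-counted socket. */
struct model_sock {
	pthread_mutex_t lock;
	atomic_uint refs;
	bool freed;		/* stands in for sofree() having run */
};

/*
 * Drop one reference unless it is the last one; return false (leaving
 * the count untouched) if it is.
 */
static bool
release_if_not_last(atomic_uint *cnt)
{
	unsigned int old = atomic_load(cnt);

	while (old > 1) {
		/* On failure 'old' is reloaded, so the loop re-checks it. */
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return (true);
	}
	return (false);
}

/* Slow path: entered with the lock held; always drops the lock. */
static void
model_sorele_locked(struct model_sock *so)
{
	if (atomic_fetch_sub(&so->refs, 1) == 1)
		so->freed = true;
	pthread_mutex_unlock(&so->lock);
}

void
model_sorele(struct model_sock *so)
{
	/* Fast path: not the last reference, so no lock is taken. */
	if (release_if_not_last(&so->refs))
		return;
	pthread_mutex_lock(&so->lock);
	model_sorele_locked(so);
}
```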

Reviewed by:	kib, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32741
2021-11-09 10:50:12 -08:00
John Baldwin
96668a81ae ktls: Always create a software backend for receive sessions.
A future change to TOE TLS will require a software fallback for the
first few TLS records received.  Future support for NIC TLS on receive
will also require a software fallback for certain cases.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32566
2021-10-21 09:37:17 -07:00
John Baldwin
c57dbec69a ktls: Add a routine to query information in a receive socket buffer.
In particular, ktls_pending_rx_info() determines which TLS record is
at the end of the current receive socket buffer (including
not-yet-decrypted data) along with how much data in that TLS record is
not yet present in the socket buffer.
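To illustrate how the record at the end of a receive buffer can be found, here is a hypothetical sketch that walks 5-byte TLS record headers over a flat byte buffer. The real ktls_pending_rx_info() works on mbufs in the socket buffer and has a different signature; this version assumes the buffer starts on a record boundary with a known record number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN	5	/* type (1) + version (2) + length (2) */

/*
 * Hypothetical sketch, not the kernel function: 'buf' starts on the
 * boundary of TLS record number 'first_seq'. On success, *seqp is the
 * record that ends the buffer and *residp is how many of its bytes are
 * still missing. Returns -1 for an empty buffer or one that ends inside
 * a header, where the record length is not yet known.
 */
int
pending_rx_info(const uint8_t *buf, size_t len, uint64_t first_seq,
    uint64_t *seqp, size_t *residp)
{
	size_t off = 0;
	uint64_t seq = first_seq;

	while (off < len) {
		size_t avail = len - off, reclen;

		if (avail < TLS_HDR_LEN)
			return (-1);
		/* Payload length is the big-endian 16-bit header field. */
		reclen = TLS_HDR_LEN +
		    ((size_t)buf[off + 3] << 8 | buf[off + 4]);
		if (avail <= reclen) {
			*seqp = seq;
			*residp = reclen - avail;
			return (0);
		}
		off += reclen;
		seq++;
	}
	return (-1);
}
```

Reporting the record number together with the missing byte count is what would let a NIC-side state machine predict where the next TLS header starts in the TCP stream.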

This is useful for future changes to support NIC TLS receive offload
and enhancements to TOE TLS receive offload.  Those use cases need a
way to synchronize a state machine on the NIC with the TLS record
boundaries in the TCP stream.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32564
2021-10-21 09:36:29 -07:00
Mark Johnston
84c3922243 Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32006
2021-10-19 21:22:56 -04:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
John Baldwin
a72ee35564 ktls: Defer creation of threads and zones until first use.
Run ktls_init() when the first KTLS session is created rather than
unconditionally during boot.  This avoids creating unused threads and
allocating unused resources on systems which do not use KTLS.
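Deferring setup to first use is a run-once pattern. As a userspace stand-in (not necessarily the mechanism this commit uses in the kernel), pthread_once(3) shows the idea; the function names are hypothetical, and unlike a kernel version this cannot report an initialization failure back to the caller.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; pthread_once(3) is a userspace stand-in. */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls;

static void
model_ktls_init(void)
{
	/* Would create worker threads and allocate zones. */
	init_calls++;
}

int
model_create_session(void)
{
	/* Only the first caller runs model_ktls_init(); others wait for it. */
	pthread_once(&init_once, model_ktls_init);
	/* Per-session setup would follow here. */
	return (0);
}
```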

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32487
2021-10-14 15:48:34 -07:00
John Baldwin
9f03d2c001 ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record.  As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.

If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order.  For TLS 1.1
and later this is fine, but this can break for TLS 1.0.

To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.

- In ktls_enqueue(), check if a TLS record being queued is the next
  record expected for a TLS 1.0 session.  If not, it is placed in
  sorted order in the pending_records queue in the TLS session.

  If it is the next expected record, queue it for SW encryption like
  normal.  In addition, check if this new record (really a potential
  batch of records) was holding up any previously queued records in
  the pending_records queue.  Any of those records that are now in
  order are also placed on the queue for SW encryption.

- In ktls_destroy(), free any TLS records on the pending_records
  queue.  These mbufs are marked M_NOTREADY, so they were not freed when the
  socket buffer was purged in sbdestroy().  Instead, they must be
  freed explicitly.
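A minimal userspace sketch of the ordering logic above, assuming records are identified by consecutive record numbers and that a fixed-size sorted array can stand in for the per-session pending_records queue. The struct and function names are hypothetical, and dispatch() merely logs the order in which batches would be handed to software encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING	16
#define MAX_LOG		64

/* Hypothetical batch of 'count' TLS records starting at number 'seq'. */
struct batch {
	uint64_t seq;
	unsigned int count;
};

/* Hypothetical per-session state for a TLS 1.0 session. */
struct tls10_session {
	uint64_t next_seq;		/* next record that may be encrypted */
	struct batch pending[MAX_PENDING];	/* kept sorted by seq */
	size_t npending;
	uint64_t log[MAX_LOG];		/* dispatch order, for illustration */
	size_t nlog;
};

/* Stands in for queueing a batch for software encryption. */
static void
dispatch(struct tls10_session *s, struct batch b)
{
	if (s->nlog < MAX_LOG)
		s->log[s->nlog++] = b.seq;
	s->next_seq = b.seq + b.count;
}

/* Returns 0, or -1 if the fixed-size pending array is full. */
int
tls10_enqueue(struct tls10_session *s, struct batch b)
{
	size_t i, drained;

	if (b.seq != s->next_seq) {
		/* Out of order: insert in sorted position and wait. */
		if (s->npending == MAX_PENDING)
			return (-1);
		for (i = s->npending; i > 0 && s->pending[i - 1].seq > b.seq; i--)
			s->pending[i] = s->pending[i - 1];
		s->pending[i] = b;
		s->npending++;
		return (0);
	}

	/* In order: dispatch it, then any held batches it was blocking. */
	dispatch(s, b);
	for (drained = 0; drained < s->npending &&
	    s->pending[drained].seq == s->next_seq; drained++)
		dispatch(s, s->pending[drained]);
	for (i = drained; i < s->npending; i++)
		s->pending[i - drained] = s->pending[i];
	s->npending -= drained;
	return (0);
}
```

For example, if the batches starting at records 4 and 2 arrive before the batch starting at record 0, both are held; enqueueing record 0 then dispatches 0, 2, and 4 in that order.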

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32381
2021-10-13 12:30:15 -07:00
John Baldwin
a63752cce6 ktls: Reject attempts to enable AES-CBC with TLS 1.3.
AES-CBC cipher suites are not supported in TLS 1.3.

Reported by:	syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com
Reviewed by:	tuexen, markj
Differential Revision:	https://reviews.freebsd.org/D32404
2021-10-13 12:12:58 -07:00
Mark Johnston
bf25678226 ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden.  Fix this by using a separate output
parameter.  Convert to the new socket buffer locking macros while here.
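The error/mode confusion and its fix can be illustrated with a hypothetical pair of getters. The mode values, struct, and function names are assumptions for the sketch, not the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mode values and socket model for this sketch. */
enum { MODE_NONE, MODE_SW, MODE_IFNET, MODE_TOE };

struct model_sock {
	int listening;
	int tls_mode;
};

/*
 * Confused shape: one return value carries either an errno or a mode,
 * so the caller cannot tell which it got (EINVAL is just another int).
 */
int
get_mode_ambiguous(const struct model_sock *so)
{
	if (so->listening)
		return (EINVAL);
	return (so->tls_mode);
}

/* Fixed shape: return only an errno; the mode goes out via *modep. */
int
get_mode(const struct model_sock *so, int *modep)
{
	if (so->listening)
		return (EINVAL);
	*modep = so->tls_mode;
	return (0);
}
```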

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31977
2021-09-17 14:19:05 -04:00
John Baldwin
c782ea8bb5 Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'.  A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions.  Now, those type-specific functions are saved
in the switch structure and invoked directly.  In addition, this more
gracefully permits multiple implementations of the same tag within a
driver.  In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.
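The switch-structure idea can be sketched in a few lines of C. The struct layout, the rate-limit tag type, and the two hardware "generations" below are all hypothetical; the real struct if_snd_tag_sw has different members. The point is that callers invoke t->sw->modify() directly, and one driver can offer two implementations of the same tag type.

```c
#include <assert.h>
#include <stdlib.h>

struct snd_tag;

/*
 * Hypothetical switch structure; each implementation supplies one
 * statically allocated table and callers never switch on the type.
 */
struct snd_tag_sw {
	int type;
	int (*modify)(struct snd_tag *, unsigned long rate_bps);
	int (*query)(struct snd_tag *, unsigned long *rate_bps);
	void (*free_tag)(struct snd_tag *);
};

struct snd_tag {
	const struct snd_tag_sw *sw;	/* set once at init */
	unsigned long hw_rate;		/* units are implementation-specific */
};

#define TAG_RATE_LIMIT	1

static void
snd_tag_init(struct snd_tag *t, const struct snd_tag_sw *sw)
{
	t->sw = sw;
	t->hw_rate = 0;
}

/* Hypothetical older hardware: programs the rate in bits per second. */
static int v1_modify(struct snd_tag *t, unsigned long bps) { t->hw_rate = bps; return (0); }
static int v1_query(struct snd_tag *t, unsigned long *bps) { *bps = t->hw_rate; return (0); }

/* Hypothetical newer hardware, same tag type: kilobits per second. */
static int v2_modify(struct snd_tag *t, unsigned long bps) { t->hw_rate = bps / 1000; return (0); }
static int v2_query(struct snd_tag *t, unsigned long *bps) { *bps = t->hw_rate * 1000; return (0); }

static void generic_free(struct snd_tag *t) { free(t); }

const struct snd_tag_sw rl_v1_sw = { TAG_RATE_LIMIT, v1_modify, v1_query, generic_free };
const struct snd_tag_sw rl_v2_sw = { TAG_RATE_LIMIT, v2_modify, v2_query, generic_free };
```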

Reviewed by:	gallatin, hselasky, kib (older version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31572
2021-09-14 11:43:41 -07:00
Mark Johnston
f94acf52a4 socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writers from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31657
2021-09-07 15:06:48 -04:00
John Baldwin
470e851c4b ktls: Support asynchronous dispatch of AEAD ciphers.
KTLS OCF support was originally targeted at software backends that
used host CPU cycles to encrypt TLS records.  As a result, each KTLS
worker thread queued a single TLS record at a time and waited for it
to be encrypted before processing another TLS record.  This works well
for software backends but limits throughput on OCF drivers for
coprocessors that support asynchronous operation such as qat(4) or
ccr(4).  This change uses an alternate function (ktls_encrypt_async)
when encrypting TLS records via a coprocessor.  This function queues TLS
records for encryption and returns.  It defers the work done after a
TLS record has been encrypted (such as marking the mbufs ready) to a
callback invoked asynchronously by the coprocessor driver when a
record has been encrypted.

- Add a struct ktls_ocf_state that holds the per-request state stored
  on the stack for synchronous requests.  Asynchronous requests malloc
  this structure while synchronous requests continue to allocate this
  structure on the stack.

- Add a ktls_encrypt_async() variant of ktls_encrypt() which does not
  perform request completion after dispatching a request to OCF.
  Instead, the ktls_ocf backends invoke ktls_encrypt_cb() when a TLS
  record request completes for an asynchronous request.

- Flag AEAD software TLS sessions as async if the backend driver
  selected by OCF is an async driver.

- Pull code to create and dispatch an OCF request out of
  ktls_encrypt() into a new ktls_encrypt_one() function used by both
  ktls_encrypt() and ktls_encrypt_async().

- Pull code to "finish" the VM page shuffling for a file-backed TLS
  record into a helper function ktls_finish_noanon() used by both
  ktls_encrypt() and ktls_encrypt_cb().

Reviewed by:	markj
Tested on:	ccr(4) (jhb), qat(4) (markj)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D31665
2021-08-30 13:11:52 -07:00
John Baldwin
d16cb228c1 ktls: Fix accounting for TLS 1.0 empty fragments.
TLS 1.0 empty fragment mbufs have no payload and thus m_epg_npgs is
zero.  However, these mbufs need to occupy a "unit" of space for the
purposes of M_NOTREADY tracking similar to regular mbufs.  Previously
this was done for the page count returned from ktls_frame() and passed
to ktls_enqueue() as well as the page count passed to pru_ready().

However, sbready() and mb_free_notready() only use m_epg_nrdy to
determine the number of "units" of space in an M_EXT mbuf, so when a
TLS 1.0 fragment was marked ready it would mark one unit of the next
mbuf in the socket buffer as ready as well.  To fix, set m_epg_nrdy to
1 for empty fragments.  This actually simplifies the code as now only
ktls_frame() has to handle TLS 1.0 fragments explicitly and the rest
of the KTLS functions can just use m_epg_nrdy.
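The accounting problem can be reproduced with a small hypothetical model in which mark_ready() imitates how sbready() consumes not-ready units using only the nrdy counter. The struct and helpers are illustrative, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two page counters on an unmapped mbuf. */
struct model_mbuf {
	int npgs;	/* payload pages (cf. m_epg_npgs) */
	int nrdy;	/* not-ready units (cf. m_epg_nrdy) */
};

/* Framing after the fix: an empty TLS 1.0 fragment occupies one unit. */
static void
model_frame(struct model_mbuf *m, int payload_pages)
{
	m->npgs = payload_pages;
	m->nrdy = payload_pages == 0 ? 1 : payload_pages;
}

/*
 * sbready()-like: mark 'units' units ready from the front of the chain,
 * looking only at nrdy. Returns how many mbufs became fully ready.
 */
static size_t
mark_ready(struct model_mbuf *chain, size_t n, int units)
{
	size_t i = 0;

	while (i < n && units > 0) {
		int take = units < chain[i].nrdy ? units : chain[i].nrdy;

		chain[i].nrdy -= take;
		units -= take;
		if (chain[i].nrdy != 0)
			break;
		i++;
	}
	return (i);
}
```

With the fix, marking one unit ready on a chain of [empty fragment, 3-page mbuf] completes only the empty fragment. If the fragment's nrdy were 0 instead, the same call would also consume one unit of the following mbuf, which is the bug described above.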

Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D31536
2021-08-16 10:42:46 -07:00
Andrew Gallatin
95c51fafa4 ktls: Init reset tag task for cloned sessions
When cloning a ktls session (which is needed when we need to
switch output NICs for a NIC TLS session), we need to also
init the reset task, like we do when creating a new tls session.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-11 14:06:43 -04:00
Andrew Gallatin
09066b9866 ktls: Use the new PNOLOCK flag
Use the new PNOLOCK flag to tsleep() to indicate that
we are managing potential races, and don't need to
sleep with a lock, or have a backstop timeout.

Reviewed by: jhb
Sponsored by: Netflix
2021-08-05 17:19:12 -04:00
Andrew Gallatin
2694c869ff ktls: fix a panic with INVARIANTS
98215005b7 introduced a new
thread that uses tsleep(..., 0) to sleep forever.  This hit
an assert due to sleeping with a 0 timeout.

So spell "forever" using SBT_MAX instead, which does not
trigger the assert.

Pointy hat to: gallatin
Pointed out by: emaste
Sponsored by: Netflix
2021-08-05 13:09:06 -04:00
Andrew Gallatin
98215005b7 ktls: start a thread to keep the 16k ktls buffer zone populated
KTLS recently received an optimization where we allocate 16k
physically contiguous crypto destination buffers. This provides a
large (more than 5%) reduction in CPU use in our
workload. However, after several days of uptime, the performance
benefit disappears because we have frequent allocation failures
from the ktls buffer zone.

It turns out that when load drops off, the ktls buffer zone is
trimmed, and some 16k buffers are freed back to the OS. When load
picks back up again, re-allocating those 16k buffers fails after
some number of days of uptime because physical memory has become
fragmented. This causes allocations to fail, because they are
intentionally done with M_NORECLAIM, so as to avoid pausing
the ktls crypto work thread while the VM system defragments
memory.

To work around this, this change starts one thread per VM domain
to allocate ktls buffers without M_NORECLAIM, as we don't care if
this thread is paused while memory is defragged. The thread then
frees the buffers back into the ktls buffer zone, thus allowing
future allocations to succeed.

Note that waking up the thread is intentionally racy, but neither
of the races really matter. In the worst case, we could have
either spurious wakeups or we could have to wait 1 second until
the next rate-limited allocation failure to wake up the thread.

This patch has been in use at Netflix on a handful of servers,
and seems to fix the issue.

Differential Revision: https://reviews.freebsd.org/D31260
Reviewed by: jhb, markj (jtl, rrs, and dhw reviewed an earlier version)
Sponsored by: Netflix
2021-08-05 10:19:12 -04:00
Andrew Gallatin
4150a5a87e ktls: fix NOINET build
Reported by: mjguzik
Sponsored by: Netflix
2021-07-07 10:40:02 -04:00