Commit Graph

236 Commits

Author SHA1 Message Date
John Baldwin
a9f0cf4838 cxgbe: Fix some merge-o's for the per-rxq iSCSI counters.
I botched a few of the changes when rebasing the changes in
4b6ed0758d across the changes in
43bbae1948.

- Move the counter allocations into alloc_ofld_rxq().

- Free the counters freeing an ofld rxq.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D30267
2021-05-19 15:56:31 -07:00
John Baldwin
4b6ed0758d cxgbe: Make the TOE ISCSI RX stats per-queue instead of per adapter.
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29903
2021-05-14 12:16:33 -07:00
Navdeep Parhar
b9820bca18 cxgbe(4): Do not panic when tx is called with invalid checksum requests.
There is no need to panic in if_transmit if the checksums requested are
inconsistent with the frame being transmitted.  This typically indicates
that the kernel and driver were built with different INET/INET6 options,
or there is some other kernel bug.  The driver should just throw away
the requests that it doesn't understand and move on.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-04-28 14:04:53 -07:00
Navdeep Parhar
43bbae1948 cxgbe(4): Separate the sw- and hw-specific parts of resource allocations
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC.  This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.

This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):

/* Idempotent */
alloc_foo
{
	if (!SW_ALLOCATED)
		init_iq/init_eq/init_fl		no-fail sw init
		alloc_iq_fl/alloc_eq/alloc_wrq	may-fail sw alloc
		add_foo_sysctls, etc.		no-fail post-alloc items
	if (!HW_ALLOCATED)
		alloc_iq_fl_hwq/alloc_eq_hwq	hw resource allocation
}

/* Idempotent */
free_foo
{
	if (!HW_ALLOCATED)
		free_iq_fl_hwq/free_eq_hwq	release hw resources
	if (!SW_ALLOCATED)
		free_iq_fl/free_eq/free_wrq	release sw resources
}

The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent.  The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2021-04-26 14:09:59 -07:00
Navdeep Parhar
d107ee06f3 cxgbe(4): RSS hash for VXLAN traffic is computed from the inner frame.
Sponsored by:	Chelsio Communications
2021-04-13 16:50:12 -07:00
John Baldwin
568e69e4eb cxgbe: Add counters for iSCSI PDUs transmitted via TOE.
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29297
2021-04-12 13:57:45 -07:00
John Baldwin
fe496dc02a cxgbe: Make the TOE TLS stats per-queue instead of per-port.
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29383
2021-03-26 15:19:58 -07:00
John Baldwin
077ba6a845 cxgbe: Add a struct sge_ofld_txq type.
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues.  Currently it only holds a work queue but will
include additional state in future changes.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D29382
2021-03-26 15:19:58 -07:00
Navdeep Parhar
15f3355567 cxgbe(4): Allow a T6 adapter to switch between TOE and NIC TLS mode.
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload.  With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps.  hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.

* Enable nic_ktls_ofld in the default configuration file and use the
  firmware instead of direct register manipulation to apply/rollback
  NIC TLS configuration.  This allows the driver to switch the hardware
  between TOE and NIC TLS mode in a safe manner.  Note that the
  configuration is adapter-wide and not per-port.

* Remove the kern_tls config file as it works with 100G T6 cards only
  and leads to firmware crashes with 25G cards.  The configurations
  included with the driver (with the exception of the FPGA configs) are
  supposed to work with all adapters.

Reported by:	Veeresh U.K. at Chelsio
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Reviewed by:	jhb@
Differential Revision: https://reviews.freebsd.org/D29291
2021-03-25 12:39:41 -07:00
Navdeep Parhar
473f6163e3 cxgbe(4): use standard sysctl routines to deal with 16b values.
These routines to handle 8b and 16b types were added in r289773 5+ years
ago.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2021-03-19 10:56:24 -07:00
Navdeep Parhar
fae028dd97 cxgbe(4): Break up t4_read_chip_settings.
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-02-18 01:22:42 -08:00
Alexander Motin
294e62bebf cxgbe(4): Save proper zone index on low memory in refill_fl().
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones.  But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx.  It caused problems
with the later use of that cluster, including memory and/or data
corruption.

While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Reviewed by:	np, jhb
Differential Revision:	https://reviews.freebsd.org/D28716
2021-02-16 21:15:28 -05:00
Navdeep Parhar
3447df8bc5 cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission
  of packets in some cases, causing performance issues.  Use a different
  heuristic to predict tx requests.

- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
  entirely.  It can be changed at any time.  There is no change in
  default behavior.
2021-02-01 03:00:09 -08:00
Navdeep Parhar
8eba75ed68 cxgbe(4): Stop but don't free netmap queues when netmap is switched off.
It is common for freelists to be starving when a netmap application
stops.  Mailbox commands to free queues can hang in such a situation.
Avoid that by not freeing the queues when netmap is switched off.
Instead, use an alternate method to stop the queues without releasing
the context ids.  If netmap is enabled again later then the same queue
is reinitialized for use.  Move alloc_nm_rxq and txq to t4_netmap.c
while here.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-12-03 08:30:29 +00:00
Navdeep Parhar
b20b25e744 cxgbe(4): fix the size of the iq/eq maps.
The firmware can allocate ingress and egress context ids anywhere from
its configured range.  Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.

Reported by:	Baptiste Wicht @ Verisign
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-22 08:40:25 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Navdeep Parhar
8741306b3b cxgbe(4) sysctls do not need Giant.
Sponsored by:	Chelsio Communications
2020-10-05 22:18:04 +00:00
Navdeep Parhar
7676c62aa3 cxgbe(4): validate largest_rx_cluster and safest_rx_cluster.
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page.  Anything else could lead to a
panic on interface up.

Reported by:	mav@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-02 05:59:55 +00:00
Navdeep Parhar
30e3f2b4ea cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC.  Fix the GL limits for VM work requests
while here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-22 04:16:40 +00:00
Navdeep Parhar
7054f6ec97 cxgbe(4): add counters for mbuf pullups and defrags.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-22 03:06:36 +00:00
Navdeep Parhar
a4a4ad2dd9 cxgbe(4): add support for stateless offloads for VXLAN traffic.
Hardware assistance includes checksumming (tx and rx), TSO, and RSS on
the inner traffic in a VXLAN tunnel.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2020-09-18 03:01:47 +00:00
Navdeep Parhar
565b8fce23 cxgbe(4): Check for descriptors before writing a TLS or raw work request.
This fixes a regression in r362905.

Submitted by:	jhb@
Sponsored by:	Chelsio Communications
2020-08-31 22:44:59 +00:00
Navdeep Parhar
6a59b9940e cxgbe(4): Use large clusters for TOE rx queues when TOE+TLS is enabled.
Rx is more efficient within the chip when the receive buffer size
matches the TLS PDU size.

MFC after:	3 days
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26127
2020-08-23 04:16:20 +00:00
Navdeep Parhar
800535c2ca cxgbev(4): Compare at most 16 bytes of the Ethernet header when trying
to coalesce tx work requests.

Note that Coverity will still treat this as an out-of-bounds access.  We
do want to compare 16B starting from ethmacdst but cmp_l2hdr was was
going beyond that by 2B.

cmp_l2hdr was introduced in r362905.

Reported by:	Coverity (CID 1430284)
Sponsored by:	Chelsio Communications
2020-07-13 19:15:29 +00:00
Navdeep Parhar
3bbb68f0e3 cxgbe(4): Fix a bug (introduced in r362905) where some tx traffic wasn't
being reported to BPF.
2020-07-05 05:14:33 +00:00
Navdeep Parhar
d735920d33 cxgbe(4): changes in the Tx path to help increase tx coalescing.
- Ask the firmware for the number of frames that can be stuffed in one
  work request.

- Modify mp_ring to increase the likelihood of tx coalescing when there
  are just one or two threads that are doing most of the tx.  Add teeth
  to the abdication mechanism by pushing the consumer lock into mp_ring.
  This reduces the likelihood that a consumer will get stuck with all
  the work even though it is above its budget.

- Add support for coalesced tx WR to the VF driver.  This, with the
  changes above, results in a 7x improvement in the tx pps of the VF
  driver for some common cases.  The firmware vets the L2 headers
  submitted by the VF driver and it's a big win if the checks are
  performed for a batch of packets and not each one individually.

Reviewed by:	jhb@
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25454
2020-07-03 04:44:23 +00:00
Navdeep Parhar
7c228be30b cxgbe(4): Add a pointer to the adapter softc in vi_info.
There were quite a few places where port_info was being accessed only to
get to the adapter.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25432
2020-06-25 17:04:22 +00:00
Navdeep Parhar
0cadedfc46 cxgbe(4): Add a tx_len16_to_desc helper.
No functional change.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-06-23 07:33:29 +00:00
Gleb Smirnoff
365e8da44a Mechanically rename MBUF_EXT_PGS_ASSERT() to M_ASSERTEXTPG() to match
classical M_ASSERTPKTHDR.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:27:41 +00:00
Gleb Smirnoff
6edfd179c8 Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:21:11 +00:00
Gleb Smirnoff
7b6c99d08d Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
Gleb Smirnoff
c4ee38f8e8 Step 2.3: Rename mbuf_ext_pg_len() to m_epg_pagelen() that
uses mbuf argument.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:52:35 +00:00
Gleb Smirnoff
49b6b60e22 Step 2.2:
o Shrink sglist(9) functions to work with multipage mbufs down from
  four functions to two.
o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf.
o Rename to something matching _epg.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:46:29 +00:00
Gleb Smirnoff
0c1032665c Continuation of multi page mbuf redesign from r359919.
The following series of patches addresses three things:

Now that array of pages is embedded into mbuf, we no longer need
separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accomodate the main idea r359919 with minimal churn.

Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.

The namespace for the newfeature is somewhat inconsistent and
sometimes has a lengthy prefixes. In these patches we will
gradually bring the namespace to "m_epg" prefix for all mbuf
fields and most functions.

Step 1 of 4:

 o Anonymize mbuf_ext_pgs_data, embed in m_ext
 o Embed mbuf_ext_pgs
 o Start documenting all this entanglement

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:39:26 +00:00
Andrew Gallatin
23feb56348 KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
Navdeep Parhar
aa301e5ffe cxgbe(4): Split sge_nm_rxq into three cachelines.
This reduces the lines bouncing around between the driver rx ithread and
the netmap rxsync thread.  There is no net change in the size of the
struct (it continues to waste a lot of space).

This kind of split was originally proposed in D17869 by Marc De La
Gueronniere @ Verisign, Inc.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-03-20 05:12:16 +00:00
Navdeep Parhar
2b9010f070 cxgbe(4): Do not try to use 0 as an rx buffer address when the driver is
already allocating from the safe zone and the allocation fails.

This bug was introduced in r357481.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-03-10 21:44:20 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
John Baldwin
ca3b3c573e Remove the per-TXQ tls_wrs stat.
It duplicated the kern_tls_records stat and was not conditional on NIC
TLS being enabled.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23670
2020-02-13 22:55:45 +00:00
Navdeep Parhar
87bbb3338e cxgbe(4): Add pfil(9) hooks to the driver's rx.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-04 01:09:02 +00:00
Navdeep Parhar
1486d2de9e cxgbe(4): Treat NIC rx as special and run its handler directly and not
via the t4_cpl_handler dispatch table.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-04 01:01:35 +00:00
Navdeep Parhar
46e1e307ed cxgbe(4): Retire the allow_mbufs_in_cluster optimization.
This simplifies the driver's rx fast path as well as the bookkeeping
code that tracks various rx buffer sizes and layouts.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-04 00:51:10 +00:00
Navdeep Parhar
d6f79b2710 cxgbe(4): Avoid ext_arg2 in rxb_free.
ext_arg2 is the only item in the third cacheline in an mbuf and could be
cold by the time rxb_free runs.  Put the information needed by rxb_free
in the same line as the refcount, which is very likely to be hot given
that rxb_free runs when the refcount is decremented and reaches 0.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:50:29 +00:00
Navdeep Parhar
44c6fea82b cxgbe(4): Do not use pack boundary > 512B unless it is explicitly
requested.

This is a tradeoff between PCIe efficiency during large packet rx and
packing efficiency during small packet rx.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:30:39 +00:00
Navdeep Parhar
a9c4062a9a cxgbe(4): Initialize the rx buffer's metadata on first-use and not on
allocation.

refill_fl doesn't touch any part of a freshly allocated cluster after
this change.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:25:12 +00:00
Navdeep Parhar
9087a3df60 cxgbe(4): Only checksummed TCP should be considered for LRO.
This avoids the per-packet nanouptime in tcp_lro_rx for traffic that's
not even TCP.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-02-03 23:06:42 +00:00
Gleb Smirnoff
e9edde4110 Fix a typo - passing wrong mbuf pointer to needs_udp_csum(). Will
trigger panic only on a kernel with RATELIMIT.

Submitted by:	rrs
2020-01-07 21:29:42 +00:00
Navdeep Parhar
c0236bd93d cxgbe(4): Use the _XT variant of the CPL used to transmit NIC traffic.
CPL_TX_PKT_XT disables the internal parser on the chip and instead
relies on the driver to provide the exact length of the L2 and L3
headers.  This allows hw checksumming and TSO to be used with L2 and
L3 encapsulations that the chip doesn't understand directly.

Note that netmap tx still uses the old CPL as it never uses the hw
to generate the checksum on tx.

Reviewed by:	jhb@
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22788
2019-12-13 20:38:58 +00:00
Navdeep Parhar
aa7bdbc00c cxgbe(4): Use TX_PKTS2 work requests in netmap Tx if it's available.
TX_PKTS2 is more efficient within the firmware and this improves netmap
Tx by a few Mpps in some common scenarios.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-12-10 08:16:19 +00:00
John Baldwin
bddf73433e NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.

NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine.  Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS.  This permits using the
TOE to segment the encrypted TLS records.  However, this approach does
have some limitations:

1) Regular TOE sockets cannot be used when the TOE is in this special
   mode.  One can use either TOE and TOE-based KTLS or NIC KTLS, but
   not both at the same time.

2) In NIC KTLS mode, the TOE is only able to accept a per-connection
   timestamp offset that varies in the upper 4 bits.  Put another way,
   only connections whose timestamp offset has the 28 lower bits
   cleared can use NIC KTLS and generate correct timestamps.  The
   driver will refuse to enable NIC KTLS on connections with a
   timestamp offset with any of the lower 28 bits set.  To use NIC
   KTLS, users can either disable TCP timestamps by setting the
   net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
   tcp_new_ts_offset() function to clear the lower 28 bits of the
   generated offset.

3) Because the TCP segmentation relies on fields mirrored in a TCB in
   the TOE, not all fields in a TCP packet can be sent in the TCP
   segments generated from a TLS record.  Specifically, for packets
   containing TCP options other than timestamps, the driver will
   inject an "empty" TCP packet holding the requested options (e.g. a
   SACK scoreboard) along with the segments from the TLS record.
   These empty TCP packets are counted by the
   dev.cc.N.txq.M.kern_tls_options sysctls.

Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer.  The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records.  In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.

TCP packets for TLS requests are broken down into the following
classes (with associated counters):

- Mbufs that send an entire TLS record in full do not have any waste
  bytes (dev.cc.N.txq.M.kern_tls_full).

- Mbufs that send a short TLS record that ends before the end of the
  trailer (dev.cc.N.txq.M.kern_tls_short).  For sockets using AES-CBC,
  the encryption must always start at the beginning, so if the mbuf
  starts at an offset into the TLS record, the offset bytes will be
  "waste" bytes.  For sockets using AES-GCM, the encryption can start
  at the 16 byte block before the starting offset capping the waste at
  15 bytes.

- Mbufs that send a partial TLS record that has a non-zero starting
  offset but ends at the end of the trailer
  (dev.cc.N.txq.M.kern_tls_partial).  In order to compute the
  authentication hash stored in the trailer, the entire TLS record
  must be sent as input to the crypto engine, so the bytes before the
  offset are always "waste" bytes.

In addition, other per-txq sysctls are provided:

- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
  using AES-CBC.

- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
  using AES-GCM.

- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
  compensate for the TOE engine not being able to set FIN on the last
  segment of a TLS record if the TLS record mbuf had FIN set.

- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
  txq including full, short, and partial records.

- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
  and payload) sent for TLS record requests.

- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
  record requests.

To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:

hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21962
2019-11-21 19:30:31 +00:00