freebsd-nq/sys/dev/cxgbe
Andrew Gallatin 23feb56348 KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, unmapped mbufs are
rather cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and is almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs metadata directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.
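
The overlap described above can be pictured with a small C sketch.
Every name in it (toy_mbuf, toy_pkthdr, toy_pgs_meta, the TOY_M_*
flags) is hypothetical and chosen for illustration only; the real
struct mbuf layout and field names in the tree differ. The sketch
only assumes what the message states: an mbuf is never both a
pkthdr mbuf and an unmapped mbuf, so the two kinds of metadata can
share storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet header; not the real struct pkthdr. */
struct toy_pkthdr {
	void     *rcvif;
	int32_t   len;
	uint32_t  flowid;
};

/* Hypothetical stand-in for the ext_pgs metadata. */
struct toy_pgs_meta {
	uint8_t   npgs;          /* number of pages attached */
	uint8_t   hdr_len;       /* TLS header length */
	uint8_t   trail_len;     /* TLS trailer length */
	uint16_t  first_pg_off;  /* offset into the first page */
	uint16_t  last_pg_len;   /* bytes used in the last page */
};

#define TOY_M_PKTHDR 0x1u    /* union holds a packet header */
#define TOY_M_NOMAP  0x2u    /* union holds ext_pgs metadata */

struct toy_mbuf {
	struct toy_mbuf *m_next;
	uint32_t         m_flags;
	union {                  /* only one member is live at a time */
		struct toy_pkthdr   pkthdr;
		struct toy_pgs_meta pgs;
	} u;
};

/* The two flags are mutually exclusive in this sketch. */
static int
toy_flags_valid(const struct toy_mbuf *m)
{
	uint32_t both = TOY_M_PKTHDR | TOY_M_NOMAP;

	return ((m->m_flags & both) != both);
}

/* Write ext_pgs metadata in place; no separate allocation is touched. */
static void
toy_set_npgs(struct toy_mbuf *m, uint8_t npgs)
{
	assert(m->m_flags & TOY_M_NOMAP);
	assert(toy_flags_valid(m));
	m->u.pgs.npgs = npgs;
}
```

Because the metadata lives inside the mbuf, reading u.pgs touches
the same memory as the mbuf header itself rather than a separately
allocated structure.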

In order to be able to carry 5 pages (the minimum required for a
16K TLS record that is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.
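
The 5-page figure follows from simple arithmetic, sketched below in
C. The sketch assumes 4K pages and a 16384-byte record payload; the
names toy_pages_spanned, TOY_PAGE_SIZE and TOY_TLS_MAX_PAYLOAD are
hypothetical and do not appear in the tree.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE        4096u   /* assumed page size */
#define TOY_TLS_MAX_PAYLOAD 16384u   /* 16K TLS record payload */

/*
 * Number of pages touched by len bytes that begin off bytes into
 * the first page.
 */
static unsigned
toy_pages_spanned(size_t off, size_t len)
{
	assert(off < TOY_PAGE_SIZE);
	if (len == 0)
		return (0);
	return ((unsigned)((off + len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE));
}
```

With these assumptions, a record starting exactly on a page boundary
spans 4 pages, while any nonzero starting offset spills into a fifth
page, which is why room for 5 page entries is needed.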

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone.

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
accessing the ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
common cxgbe(4): Congestion drops are maintained per E-channel and not per 2020-02-19 00:48:58 +00:00
crypto KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. 2020-04-14 14:46:06 +00:00
cudbg
cxgbei Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) 2020-02-26 14:26:36 +00:00
firmware cxgbe(4): Update T4/5/6 firmwares to 1.24.12.0. 2020-02-12 02:55:06 +00:00
iw_cxgbe Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) 2020-02-26 14:26:36 +00:00
tom KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. 2020-04-14 14:46:06 +00:00
adapter.h Use both crypto engines on a T6. 2020-04-10 22:27:45 +00:00
if_cc.c
if_ccv.c
if_cxl.c
if_cxlv.c
offload.h NIC KTLS for Chelsio T6 adapters. 2019-11-21 19:30:31 +00:00
osdep.h cxgbe(4): Add adapter information to messages logged by the OS-agnostic 2019-01-29 00:49:12 +00:00
t4_clip.c cxgbe(4): Do not display error messages related to the CLIP table if 2020-03-13 00:12:15 +00:00
t4_clip.h Move CLIP table handling out of TOM and into the base driver. 2018-11-29 01:15:53 +00:00
t4_filter.c Always allocate the atid table during attach. 2019-10-22 20:01:47 +00:00
t4_if.m
t4_ioctl.h
t4_iov.c cxgbev(4): Catch up with the pciids in the PF driver. 2019-11-15 18:48:14 +00:00
t4_l2t.c NIC KTLS for Chelsio T6 adapters. 2019-11-21 19:30:31 +00:00
t4_l2t.h NIC KTLS for Chelsio T6 adapters. 2019-11-21 19:30:31 +00:00
t4_main.c Rename TOE TLS stats from [rt]x_tls_* to [rt]x_toe_tls_*. 2020-02-28 00:42:27 +00:00
t4_mp_ring.c
t4_mp_ring.h
t4_netmap.c cxgbe(4): Split sge_nm_rxq into three cachelines. 2020-03-20 05:12:16 +00:00
t4_sched.c cxgbe(4): Use the _XT variant of the CPL used to transmit NIC traffic. 2019-12-13 20:38:58 +00:00
t4_sge.c KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself. 2020-04-14 14:46:06 +00:00
t4_smt.c
t4_smt.h
t4_tracer.c
t4_vf.c cxgbev(4): Catch up with the pciids in the PF driver. 2019-11-15 18:48:14 +00:00