Commit Graph

164 Commits

Author SHA1 Message Date
Justin Hibbits
6472761966 IfAPI: use IfAPI in mbuf
Sponsored by:	Juniper Networks, Inc.
2023-02-06 12:32:04 -05:00
Justin Hibbits
1e6131bad6 IfAPI: Add needed APIs for mbuf support
Summary:
Add 2 new APIs for supporting recent mbuf changes:
* 36e0a362ac added the m_snd_tag_alloc() wrapper around
  if_snd_tag_alloc().  Push this down to the ifnet level.
* 4d7a1361ef adds the m_rcvif_serialize()/m_rcvif_restore() KPIs to
  serialize and restore an ifnet pointer.  Add the necessary wrapper to
  get the index generation for this.

Reviewed By:	jhb
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38340
2023-02-06 12:32:04 -05:00
Zhenlei Huang
440217b0af debugnet: Fix parameter order in the calls to m_get()
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36643
2022-09-21 06:55:20 -04:00
Brooks Davis
840327e5dd mbuf: Don't support PAGE_SIZE < 4K
The Vax supported such things, but FreeBSD does not.  This further
implies that MJUMPAGESIZE > MCLBYTES so assert this and remove code
handling them being equal.

Reviewed by:	kp, imp, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D36320
2022-08-24 18:34:07 +01:00
Gleb Smirnoff
81a34d374e protosw: retire pr_drain and use EVENTHANDLER(9) directly
The method was called for two different conditions: 1) the VM layer is
low on pages or 2) one of UMA zones of mbuf allocator exhausted.
This change 2) into a new event handler, but all affected network
subsystems modified to subscribe to both, so this change shall not
bring functional changes under different low memory situations.

There were three subsystems still using pr_drain: TCP, SCTP and frag6.
The latter had its protosw entry for the only reason to register its
pr_drain method.

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36164
2022-08-17 11:50:31 -07:00
Hans Petter Selasky
4d88d81c31 mbuf(9): Implement a leaf network interface field in the mbuf packet header.
When packets are received they may traverse several network interfaces like
vlan(4) and lagg(9). When doing receive side offloads it is important to
know the first network interface entry point, because that is where all
offloading is taking place. This makes it possible to track receive
side route changes for multiport setups, for example when lagg(9) receives
traffic from more than one port. This avoids having to install multiple
offloading rules for the same stream.

This field works similar to the existing "rcvif" mbuf packet header field.

Submitted by:	jhb@
Reviewed by:	gallatin@ and gnn@
Differential revision:	https://reviews.freebsd.org/D35339
Sponsored by:	NVIDIA Networking
Sponsored by:	Netflix
2022-06-07 12:54:42 +02:00
Kristof Provost
613acc6483 mbuf: do not restore dying interfaces
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.

However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.

This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).

Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34076

(cherry picked from commit 703e533da5)
2022-05-05 14:38:08 -04:00
Gleb Smirnoff
4d7a1361ef ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918

(cherry picked from commit e1882428dc)
2022-05-05 14:38:07 -04:00
Marko Zec
6c741ffbfa Revert "mbuf: do not restore dying interfaces"
This reverts commit 703e533da5.

Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"

This reverts commit e1882428dc.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:40 +02:00
Kristof Provost
703e533da5 mbuf: do not restore dying interfaces
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.

However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.

This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).

Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34076
2022-01-28 23:09:08 +01:00
Gleb Smirnoff
e1882428dc ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918
2022-01-26 21:58:50 -08:00
Alexander Motin
c6c52d8e39 kern: Remove CTLFLAG_NEEDGIANT from some more sysctls.
MFC after:	2 weeks
2021-12-26 23:07:33 -05:00
Mateusz Guzik
0a048d4a98 mbuf: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:59:11 +00:00
Kristof Provost
b6cbbcae40 m_get3(): actually use the selected zone
Reported by:	markj
2021-11-17 03:09:20 +01:00
Mark Johnston
32854e528a mbuf: Properly set the default value for mb_use_ext_pgs
Reported by:	Jenkins
Fixes:	fcaa890c44 ("mbuf: Only allow extpg mbufs if the system has a direct map")
Pointy hat:	markj
2021-11-16 16:23:11 -05:00
Mark Johnston
fcaa890c44 mbuf: Only allow extpg mbufs if the system has a direct map
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs.  This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.

I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization.  If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.

So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map.  Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.

Reviewed by:	jhb
Discussed with:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32940
2021-11-16 13:31:04 -05:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
John Baldwin
c782ea8bb5 Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'.  A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions.  Now, those type-specific functions are saved
in the switch structure and invoked directly.  In addition, this more
gracefully permits multiple implementations of the same tag within a
driver.  In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.

Reviewed by:	gallatin, hselasky, kib (older version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31572
2021-09-14 11:43:41 -07:00
Kristof Provost
a051ca72e2 Introduce m_get3()
Introduce m_get3() which is similar to m_get2(), but can allocate up to
MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE).

This simplifies the bpf improvement in f13da24715.

Suggested by:	glebius
Differential Revision:	https://reviews.freebsd.org/D31455
2021-08-18 08:48:27 +02:00
Mark Johnston
100949103a uma: Add KMSAN hooks
For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO).  Some zones are exempted from
this when it would otherwise raise false positives.

Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations.  This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report.  For example:
  panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe

Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:54 -04:00
Mateusz Guzik
0a718a6e6e mbuf: replace all direct uma_zfree(zone_mbuf) calls with m_free_raw
Reviewed by:	donner
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31082
2021-07-07 11:05:46 +00:00
Mateusz Guzik
05462babd4 mbuf: add m_free_raw to be used instead of directly calling uma_zfree
The intent is to remove all direct zone_mbuf consumers so that ctor/dtor
from that zone can be reimplemented as wrappers around uma, avoiding an
indirect function call.

Reviewed by:	kbowling
Discussed with:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30959
2021-07-02 08:30:22 +00:00
Mateusz Guzik
fb32c8dbeb iflib: retire MB_DTOR_SKIP
The flag was added in 2016 but remains unused.

Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30958
2021-07-02 08:30:22 +00:00
Andrew Gallatin
52cd25eb1a mbuf: enable ext_pgs ("unmapped") mbufs by default
Ext_pg mbufs allow carrying multiple pages per mbuf. This
reduces mbuf linked list traversals, especially in socket
buffers, thereby reducing cache misses and CPU use for
applications using sendfile.  Note that ext_pages use
unmapped pages, eliminating KVA mapping costs on 32-bit
platforms.

Ext_pg mbufs are also required for ktls (KERN_TLS), and having
them disabled by default is a stumbling block for those
wishing to enable ktls.

Reviewed-by:	jhb, glebius
Sponsored by:	Netfix
2021-01-08 13:43:30 -05:00
John Baldwin
36e0a362ac Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc().
This gives a more uniform API for send tag life cycle management.

Reviewed by:	gallatin, hselasky
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27000
2020-10-29 23:28:39 +00:00
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Andrew Gallatin
796d4eb89e make m_getm2() resilient to zone_jumbop exhaustion
When the zone_jumbop is exhausted, most things using
using sosend* (like sshd)  will eventually
fail or hang if allocations are limited to the
depleted jumbop zone.  This makes it imossible to
communicate with a box which is under an attach which
exhausts the jumbop zone.

Rather than depending on the page size zone, also try cluster
allocations to satisfy larger requests.  This allows me
to ssh to, and serve 100Gb/s of traffic from a server which
under attack and has had its page-sized zone exhausted.

Reviewed by:	glebius, markj, rmacklem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26150
2020-08-31 13:53:14 +00:00
Andrey V. Elsukov
edde7a538b Add m__getjcl SDT probe.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-08-05 11:39:09 +00:00
Jeff Roberson
03270b59ee Use zone nomenclature that is consistent with UMA. 2020-06-21 04:59:02 +00:00
Rick Macklem
84d746de21 Add two functions that create M_EXTPG mbufs with anonymous pages.
These two functions are needed by nfs-over-tls, but could also be
useful for other purposes.
mb_alloc_ext_plus_pages() - Allocates a M_EXTPG mbuf and enough anonymous
      pages to store "len" data bytes.
mb_mapped_to_unmapped() - Copies the data from a list of mapped (non-M_EXTPG)
      mbufs into a list of M_EXTPG mbufs allocated with anonymous pages.
      This is roughly the inverse of mb_unmapped_to_ext().

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D25182
2020-06-10 02:51:39 +00:00
Gleb Smirnoff
61664ee700 Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so still carry M_EXT bit together with M_EXTPG. However,
prepare some code for explicit check for M_EXTPG.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:37:16 +00:00
Gleb Smirnoff
365e8da44a Mechanically rename MBUF_EXT_PGS_ASSERT() to M_ASSERTEXTPG() to match
classical M_ASSERTPKTHDR.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:27:41 +00:00
Gleb Smirnoff
6edfd179c8 Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:21:11 +00:00
Gleb Smirnoff
7b6c99d08d Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
Gleb Smirnoff
bccf6e26e9 Step 2.5: Stop using 'struct mbuf_ext_pgs' in the kernel itself.
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:08:05 +00:00
Gleb Smirnoff
b363a438b1 Make MBUF_EXT_PGS_ASSERT_SANITY() a macro, so that it prints file:line.
While here, stop using struct mbuf_ext_pgs.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:03:39 +00:00
Gleb Smirnoff
c4ee38f8e8 Step 2.3: Rename mbuf_ext_pg_len() to m_epg_pagelen() that
uses mbuf argument.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:52:35 +00:00
Gleb Smirnoff
d90fe9d0cd Step 2.1: Build TLS workqueue from mbufs, not struct mbuf_ext_pgs.
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 23:38:13 +00:00
Gleb Smirnoff
4c9f0f982f In mb_unmapped_compress() we don't need mbuf structure to keep data,
but we need buffer of MLEN bytes.  This isn't just a simplification,
but important fixup, because previous commit shrinked sizeof(struct
mbuf) down below MSIZE, and instantiating an mbuf on stack no longer
provides enough data.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:44:23 +00:00
Gleb Smirnoff
0c1032665c Continuation of multi page mbuf redesign from r359919.
The following series of patches addresses three things:

Now that array of pages is embedded into mbuf, we no longer need
separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accomodate the main idea r359919 with minimal churn.

Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.

The namespace for the newfeature is somewhat inconsistent and
sometimes has a lengthy prefixes. In these patches we will
gradually bring the namespace to "m_epg" prefix for all mbuf
fields and most functions.

Step 1 of 4:

 o Anonymize mbuf_ext_pgs_data, embed in m_ext
 o Embed mbuf_ext_pgs
 o Start documenting all this entanglement

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-02 22:39:26 +00:00
Andrew Gallatin
23feb56348 KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Ryan Libby
10c8fb47d9 uma: convert mbuf_jumbo_alloc to UMA_ZONE_CONTIG & tag others
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc).  Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23238
2020-02-04 22:40:23 +00:00
Gleb Smirnoff
0017b2adac Couple protocol drain routines (frag6_drain and sctp_drain) may send
packets.  An unexpected behaviour for memory reclamation routine.
Anyway, we need enter the network epoch for doing that.
2020-02-03 20:48:57 +00:00
Jeff Roberson
727c691857 Use a separate lock for the zone and keg. This provides concurrency
between populating buckets from the slab layer and fetching full buckets
from the zone layer.  Eliminate some nonsense locking patterns where
we lock to fetch a single variable.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22828
2020-01-04 03:15:34 +00:00
Ryan Libby
30be9685a3 mbuf zones: take out the trash
The mbuf zones were explicitly specifying the uma trash procedures on
zcreate, conditionally on INVARIANTS, because that used to be necessary
in order to get use-after-free checking for uma zones with non-empty
constructors or destructors.  After r355137 uma automatically invokes
the trash constructor and destructor as long as no init and fini are
specified.  This now allows the mbuf zones to pass their constructors
and destructors without needing to add on the uma trash procedures
conditionally.

Reviewed by:	cem, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22583
2019-12-04 18:21:29 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Andrew Gallatin
b2dba6634b kTLS: Fix a bug where we would not encrypt anon data inplace.
Software Kernel TLS needs to allocate a new destination crypto
buffer when encrypting data from the page cache, so as to avoid
overwriting shared clear-text file data with encrypted data
specific to a single socket. When the data is anonymous, eg, not
tied to a file, then we can encrypt in place and avoid allocating
a new page. This fixes a bug where the existing code always
assumes the data is private, and never encrypts in place. This
results in unneeded page allocations and potentially more memory
bandwidth consumption when doing socket writes.

When the code was written at Netflix, ktls_encrypt() looked at
private sendfile flags to determine if the pages being encrypted
where part of the page cache (coming from sendfile) or
anonymous (coming from sosend). This was broken internally at
Netflix when the sendfile flags were made private, and the
M_WRITABLE() check was added. Unfortunately, M_WRITABLE() will
always be false for M_NOMAP mbufs, since one cannot just mtod()
them.

This change introduces a new flags field to the mbuf_ext_pgs
struct by stealing a byte from the tls hdr. Note that the current
header is still 2 bytes larger than the largest header we
support: AES-CBC with explicit IV. We set MBUF_PEXT_FLAG_ANON
when creating an unmapped mbuf in m_uiotombuf_nomap() (which is
the path that socket writes take), and we check for that flag in
ktls_encrypt() when looking for anon pages.

Reviewed by:	jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21796
2019-09-27 20:08:19 +00:00
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00