freebsd-skq/sys/netinet
jhb 520aafe3ec Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS, such as the TLS header and trailer data; these
fields are currently unused.  To make these mbufs easier to detect, the
M_NOMAP flag is set in m_flags in addition to M_EXT.
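
As a rough illustration of the layout described above, the following
userspace sketch models an array of physical addresses plus the two
fields needed to compute payload length.  Every name prefixed with
model_ or MODEL_ is hypothetical, the flag values and page limit are
assumptions, and the TLS fields are omitted; this is not the kernel's
definition of struct mbuf_ext_pgs.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */
#define MODEL_MAX_PGS   5     /* hypothetical per-mbuf page limit */

/* Hypothetical flag bits; the kernel's M_EXT and M_NOMAP values differ. */
#define MODEL_M_EXT   0x1u
#define MODEL_M_NOMAP 0x2u

/* Simplified stand-in for struct mbuf_ext_pgs (TLS fields omitted). */
struct model_ext_pgs {
	uint8_t  npgs;               /* number of pages in use */
	uint16_t first_pg_off;       /* data offset within the first page */
	uint16_t last_pg_len;        /* valid bytes in the last page */
	uint64_t pa[MODEL_MAX_PGS];  /* physical addresses, not vm_page_t pointers */
};

/* An mbuf is treated as unmapped only when both flags are set. */
static int
model_is_unmapped(uint32_t m_flags)
{
	const uint32_t both = MODEL_M_EXT | MODEL_M_NOMAP;

	return ((m_flags & both) == both);
}

/* Bytes of payload carried by page i. */
static size_t
model_page_len(const struct model_ext_pgs *p, int i)
{
	if (i == p->npgs - 1)
		return (p->last_pg_len);
	if (i == 0)
		return (MODEL_PAGE_SIZE - p->first_pg_off);
	return (MODEL_PAGE_SIZE);
}

/* Total payload across all pages. */
static size_t
model_total_len(const struct model_ext_pgs *p)
{
	size_t len = 0;

	for (int i = 0; i < p->npgs; i++)
		len += model_page_len(p, i);
	return (len);
}
```

With three pages, a first-page offset of 100 and a last-page length of
200, the model reports (4096 - 100) + 4096 + 200 bytes of payload.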

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
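
The capability handling described above might look roughly like the
following userspace model.  The struct, the function names, and the bit
value are hypothetical stand-ins (the real IFCAP_NOMAP constant and
struct ifnet live in the kernel headers), and rejecting the toggle on
an interface that never advertised the capability is an assumption of
this sketch.

```c
#include <stdint.h>

/* Hypothetical bit value; the kernel's IFCAP_NOMAP constant differs. */
#define MODEL_IFCAP_NOMAP 0x1u

/* Simplified stand-in for the two capability words of an interface. */
struct model_ifnet {
	uint32_t if_capabilities; /* what the driver supports */
	uint32_t if_capenable;    /* what is currently enabled */
};

/* Driver attach: advertise and enable unmapped-mbuf transmit. */
static void
model_attach(struct model_ifnet *ifp)
{
	ifp->if_capabilities |= MODEL_IFCAP_NOMAP;
	ifp->if_capenable |= MODEL_IFCAP_NOMAP;
}

/*
 * Models 'nomap' (enable != 0) and '-nomap' (enable == 0).
 * Returns 0 on success, -1 if the capability was never advertised.
 */
static int
model_set_nomap(struct model_ifnet *ifp, int enable)
{
	if ((ifp->if_capabilities & MODEL_IFCAP_NOMAP) == 0)
		return (-1);
	if (enable)
		ifp->if_capenable |= MODEL_IFCAP_NOMAP;
	else
		ifp->if_capenable &= ~MODEL_IFCAP_NOMAP;
	return (0);
}
```

Keeping the supported set and the enabled set in separate words is what
lets the administrator turn the feature off without the driver losing
track of what the hardware can do.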

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
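
The two fallback conditions above can be summarized as a single
predicate.  This is only a sketch of the decision, not the code in
ip_output or ip6_output; model_must_map() and MODEL_IFCAP_NOMAP are
hypothetical names, and the conversion itself (via sf_bufs) is not
modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the kernel's IFCAP_NOMAP constant differs. */
#define MODEL_IFCAP_NOMAP 0x1u

/*
 * Decide whether an outgoing mbuf has to be converted to mapped mbufs:
 * only unmapped mbufs are candidates, and they must be converted when
 * the interface has not enabled the capability or when the checksum
 * has to be computed in software.
 */
static bool
model_must_map(bool unmapped, uint32_t if_capenable, bool needs_sw_csum)
{
	if (!unmapped)
		return (false);
	if ((if_capenable & MODEL_IFCAP_NOMAP) == 0)
		return (true);
	return (needs_sw_csum);
}
```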

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:48:33 +00:00
cc Prevent cwnd from collapsing down to 1 MSS after exiting recovery. 2019-05-09 07:11:08 +00:00
khelp
libalias Separate kernel crc32() implementation to its own header (gsb_crc32.h) and 2019-06-17 19:49:08 +00:00
netdump netdump: Buffer pages to avoid calling netdump_send() on each 4KB write. 2019-05-31 18:29:12 +00:00
tcp_stacks Add the ability to limit how much the code will fragment the RACK send map 2019-06-19 13:55:00 +00:00
accf_data.c
accf_dns.c
accf_http.c
icmp6.h Initial implementation of draft-ietf-6man-ipv6only-flag. 2018-10-30 20:08:48 +00:00
icmp_var.h
if_ether.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_ether.h Retire arpresolve_addr(), which is not used anywhere, from if_ether.c. 2018-11-17 16:08:36 +00:00
igmp_var.h Separate list manipulation locking from state change in multicast 2018-05-02 19:36:29 +00:00
igmp.c Mechanical cleanup of epoch(9) usage in network stack. 2019-01-09 01:11:19 +00:00
igmp.h
in_cksum.c
in_debug.c CK: update consumers to use CK macros across the board 2018-05-24 23:21:23 +00:00
in_fib.c Existence of PCB route caching doesn't allow us to use new fast route 2019-05-08 23:39:24 +00:00
in_fib.h Existence of PCB route caching doesn't allow us to use new fast route 2019-05-08 23:39:24 +00:00
in_gif.c Add the check that current VNET is ready and access to srchash is allowed. 2018-10-23 13:11:45 +00:00
in_jail.c Move most of the contents of opt_compat.h to opt_global.h. 2018-04-06 17:35:35 +00:00
in_kdtrace.c Define sctp probes only when SCTP is configured. 2018-09-06 14:15:03 +00:00
in_kdtrace.h Add support for send, receive and state-change DTrace providers for 2018-08-22 21:23:32 +00:00
in_mcast.c Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
in_pcb.c Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
in_pcb.h Track TCP connection's NUMA domain in the inpcb 2019-04-25 15:37:28 +00:00
in_pcbgroup.c Fix PCBGROUPS build post CK conversion of pcbinfo 2018-06-13 23:19:54 +00:00
in_prot.c Move most of the contents of opt_compat.h to opt_global.h. 2018-04-06 17:35:35 +00:00
in_proto.c Remove empty encap_init() function. 2018-05-29 12:32:08 +00:00
in_rmx.c
in_rss.c
in_rss.h
in_systm.h
in_var.h Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
in.c Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code 2019-04-04 19:01:13 +00:00
in.h Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
ip6.h carp: Set DSCP value CS7 2018-07-01 08:37:07 +00:00
ip_carp.c Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
ip_carp.h
ip_divert.c Make second argument of ip_divert(), that specifies packet direction a bool. 2019-03-14 22:23:09 +00:00
ip_divert.h
ip_dummynet.h
ip_ecn.c
ip_ecn.h
ip_encap.c Include <sys/eventhandler.h> to fix the build. 2018-10-21 18:39:34 +00:00
ip_encap.h Add KPI that can be used by tunneling interfaces to handle IP addresses 2018-10-21 17:55:26 +00:00
ip_fastfwd.c New pfil(9) KPI together with newborn pfil API and control utility. 2019-01-31 23:01:03 +00:00
ip_fw.h Add "tcpmss" opcode to match the TCP MSS value. 2019-06-21 10:54:51 +00:00
ip_gre.c Add GRE-in-UDP encapsulation support as defined in RFC8086. 2019-04-24 09:05:45 +00:00
ip_icmp.c Add CTLFLAG_VNET to the net.inet.icmp.tstamprepl definition. 2019-03-26 22:14:50 +00:00
ip_icmp.h
ip_id.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
ip_input.c Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code 2019-04-04 19:01:13 +00:00
ip_mroute.c Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
ip_mroute.h
ip_options.c Mechanical cleanup of epoch(9) usage in network stack. 2019-01-09 01:11:19 +00:00
ip_options.h
ip_output.c Add an external mbuf buffer type that holds multiple unmapped pages. 2019-06-29 00:48:33 +00:00
ip_reass.c Revert r346530 until further. 2019-04-22 19:36:19 +00:00
ip_var.h Convert all IPv4 and IPv6 multicast memberships into using a STAILQ 2019-06-25 11:54:41 +00:00
ip.h carp: Set DSCP value CS7 2018-07-01 08:37:07 +00:00
pim_var.h Rework IP encapsulation handling code. 2018-06-05 20:51:01 +00:00
pim.h
raw_ip.c When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option, 2019-04-13 10:47:47 +00:00
sctp_asconf.c Plug mbuf leak in the SCTP input path in an error case. 2018-09-30 21:54:02 +00:00
sctp_asconf.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_auth.c Mitigate providing a timing signal if the COOKIE or AUTH 2018-10-01 14:05:31 +00:00
sctp_auth.h Remove unused code. 2018-09-18 10:53:07 +00:00
sctp_bsd_addr.c Mechanical cleanup of epoch(9) usage in network stack. 2019-01-09 01:11:19 +00:00
sctp_bsd_addr.h Revert https://svnweb.freebsd.org/changeset/base/336503 2018-07-19 20:11:14 +00:00
sctp_cc_functions.c Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_constants.h Limit the user-controllable amount of memory the kernel allocates 2019-01-16 11:33:47 +00:00
sctp_crc32.c Separate kernel crc32() implementation to its own header (gsb_crc32.h) and 2019-06-17 19:49:08 +00:00
sctp_crc32.h
sctp_dtrace_declare.h
sctp_dtrace_define.h Add support for send, receive and state-change DTrace providers for 2018-08-22 21:23:32 +00:00
sctp_header.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_indata.c Fix the handling of fragmented unordered messages when using DATA chunks 2019-03-25 09:47:22 +00:00
sctp_indata.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_input.c Mitigate providing a timing signal if the COOKIE or AUTH 2018-10-01 14:05:31 +00:00
sctp_input.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_lock_bsd.h
sctp_os_bsd.h Use arc4rand() instead of read_random() in the SCTP and TCP code. 2018-08-23 19:10:45 +00:00
sctp_os.h
sctp_output.c Fix build issue for the userland stack. 2019-03-24 12:13:05 +00:00
sctp_output.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_pcb.c Improve locking when tearing down an SCTP association. 2019-03-25 15:23:20 +00:00
sctp_pcb.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_peeloff.c Use the stacb instead of the asoc in state macros. 2018-08-13 13:58:45 +00:00
sctp_peeloff.h
sctp_ss_functions.c Initialize scheduler specific data for the FCFS scheduler. 2019-03-25 16:40:54 +00:00
sctp_structs.h Fix build issue for the userland stack. 2019-03-24 12:13:05 +00:00
sctp_syscalls.c netinet silence warnings 2018-05-19 05:56:21 +00:00
sctp_sysctl.c Plug some networking sysctl leaks. 2018-11-22 20:49:41 +00:00
sctp_sysctl.h Add initial descriptions for SCTP related MIB variable. 2018-10-26 21:04:17 +00:00
sctp_timer.c Refactor the SHUTDOWN_PENDING state handling. 2018-08-21 13:25:32 +00:00
sctp_timer.h
sctp_uio.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp_usrreq.c Allow sending on demand SCTP HEARTBEATS only in the ESTABLISHED state. 2019-05-19 17:53:36 +00:00
sctp_var.h Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
sctp.h Limit the size of messages sent on 1-to-many style SCTP sockets with the 2019-03-23 22:56:03 +00:00
sctputil.c Improve input validation for the IPPROTO_SCTP level socket options 2019-05-19 17:28:00 +00:00
sctputil.h Improve input validation for the IPPROTO_SCTP level socket options 2019-05-19 17:28:00 +00:00
siftr.c Repair siftr(4): PFIL_IN and PFIL_OUT are defines of some value, relying 2019-02-01 08:10:26 +00:00
tcp_debug.c
tcp_debug.h
tcp_fastopen.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
tcp_fastopen.h Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option. 2018-02-26 03:03:41 +00:00
tcp_fsm.h Revert r334843, and partially revert r335180. 2018-06-23 06:53:53 +00:00
tcp_hostcache.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
tcp_hostcache.h
tcp_hpts.c Bind TCP HPTS (pacer) threads to NUMA domains 2019-05-10 13:41:19 +00:00
tcp_hpts.h Regularize the Netflix copyright 2019-02-04 21:28:25 +00:00
tcp_input.c Don't use C++ style comments. 2019-05-09 21:00:15 +00:00
tcp_log_buf.c Fix a small bug in the tcp_log_id where the bucket 2019-04-10 18:58:11 +00:00
tcp_log_buf.h Regularize the Netflix copyright 2019-02-04 21:28:25 +00:00
tcp_lro.c Update tcp_lro with tested bugfixes from Netflix and LLNW: 2018-03-09 00:08:43 +00:00
tcp_lro.h
tcp_offload.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
tcp_offload.h Add a hook to allow the toedev handling an offloaded connection to 2018-04-03 01:08:54 +00:00
tcp_output.c Undo my previous erroneous commit changing the tcp_output kassert. 2019-04-03 19:35:07 +00:00
tcp_pcap.c Add an external mbuf buffer type that holds multiple unmapped pages. 2019-06-29 00:48:33 +00:00
tcp_pcap.h
tcp_reass.c This patch addresses an issue brought up by bz@ in D18968: 2019-02-21 09:34:47 +00:00
tcp_sack.c Receiver side DSACK implementation. 2019-05-09 07:34:15 +00:00
tcp_seq.h r330675 introduced an extra window check in the LRO code to ensure it 2018-04-03 13:54:38 +00:00
tcp_subr.c Reject attempts to register a TCP stack being unloaded. 2019-06-27 22:34:05 +00:00
tcp_syncache.c When an ACK segment as the third message of the three way handshake is 2019-05-26 17:18:14 +00:00
tcp_syncache.h The handling of RST segments in the SYN-RCVD state exists in the 2018-10-18 19:21:18 +00:00
tcp_timer.c Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial 2019-03-23 21:36:59 +00:00
tcp_timer.h Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial 2019-03-23 21:36:59 +00:00
tcp_timewait.c Fix a byte ordering issue for the advertised receiver window in ACK 2019-02-15 09:45:17 +00:00
tcp_usrreq.c Add an external mbuf buffer type that holds multiple unmapped pages. 2019-06-29 00:48:33 +00:00
tcp_var.h tcp_autorcvbuf_inc was removed in r344433. 2019-03-29 21:39:47 +00:00
tcp.h This commit brings in a new refactored TCP stack called Rack. 2018-06-07 18:18:13 +00:00
tcpip.h
toecore.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
toecore.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
udp_usrreq.c After parts of the locking fixes in r346595, syzkaller found 2019-06-01 14:57:42 +00:00
udp_var.h
udp.h
udplite.h Add a dtrace provider for UDP-Lite. 2018-07-31 22:56:03 +00:00