freebsd-dev/sys/netinet
Hans Petter Selasky f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware-driven, Receive Side Scaling (RSS) aware,
rate-limited send queues and expose the functionality through the
already established SO_MAX_PACING_RATE setsockopt(). The API supports
rates in the range from 1 to 4Gbytes/s, which are suitable for regular
TCP and UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which indicates whether a network driver supports rate limiting.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate, which is then stored in the
socket structure in the kernel. Later, when packets are transmitted,
the transmit path checks for rate changes. On a rate change, a
non-blocking ifp->if_snd_tag_alloc() call is made to the destination
network interface, which then sets up a custom sendqueue with the given
rate limitation parameter. A "struct m_snd_tag" pointer is returned,
which serves as a "snd_tag" hint in the m_pkthdr for the subsequently
transmitted mbufs.

2) When the network driver sees that "m->m_pkthdr.snd_tag" is not
NULL, it moves the packets into the designated rate limited sendqueue
given by the snd_tag pointer. How the traffic is actually rate limited
is up to the individual driver.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag does not match
that of the network interface. The network adapter frees the mbuf and
returns EAGAIN, which causes ip_output() to release and clear the send
tag. On the next ip_output() call, allocation of a new "snd_tag" is
attempted.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.
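
From userspace, the request described in step 1 could look roughly like
the sketch below. The helper name set_pacing_rate() is made up for
illustration. The sketch assumes the option takes an unsigned 32-bit
value in bytes per second, and it falls back to ENOPROTOOPT on systems
whose headers do not define SO_MAX_PACING_RATE.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: ask the kernel to pace this socket at
 * bytes_per_sec. Returns 0 on success, -1 with errno set on failure.
 * Assumption: the option value is a 32-bit count of bytes per second.
 */
int
set_pacing_rate(int fd, uint32_t bytes_per_sec)
{
#ifdef SO_MAX_PACING_RATE
	return (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
	    &bytes_per_sec, sizeof(bytes_per_sec)));
#else
	/* The headers do not know the option: report it as unsupported. */
	(void)fd;
	(void)bytes_per_sec;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

An application would call this once after connect() or accept()
succeeds. Whether the rate is then enforced in hardware depends on the
RATELIMIT kernel option and on the driver advertising IFCAP_TXRTLMT.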

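The tag life cycle in steps 1 to 4 can be illustrated with a small
userspace model. This is only a sketch: every toy_* name is invented
here and none of these are the real kernel structures or functions.
As a simplification, the model frees and reallocates the tag on a rate
change instead of modifying it in place.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel objects; all names are hypothetical. */
struct toy_ifnet { int live_tags; };            /* allocated send tags */
struct toy_snd_tag { struct toy_ifnet *ifp; uint32_t rate; };
struct toy_mbuf { struct toy_snd_tag *snd_tag; }; /* like m_pkthdr.snd_tag */
struct toy_pcb {
	uint32_t wanted_rate;         /* from setsockopt(), 0 = unlimited */
	struct toy_snd_tag *tag;      /* currently bound tag, or NULL */
};

/* Step 1: set up a rate limited send queue on the interface. */
struct toy_snd_tag *
toy_snd_tag_alloc(struct toy_ifnet *ifp, uint32_t rate)
{
	struct toy_snd_tag *t = malloc(sizeof(*t));

	if (t == NULL)
		return (NULL);    /* non-blocking: the caller sends untagged */
	t->ifp = ifp;
	t->rate = rate;
	ifp->live_tags++;
	return (t);
}

/* Steps 3 and 4: release the send queue behind a tag. */
void
toy_snd_tag_free(struct toy_snd_tag *t)
{
	if (t == NULL)
		return;
	assert(t->ifp->live_tags > 0);
	t->ifp->live_tags--;
	free(t);
}

/* Steps 2 and 3: the driver inspects the tag before queueing. */
int
toy_if_transmit(struct toy_ifnet *ifp, struct toy_mbuf *m)
{
	if (m->snd_tag != NULL && m->snd_tag->ifp != ifp)
		return (EAGAIN);  /* tag belongs to another interface */
	return (0);               /* would enqueue on the selected queue */
}

int
toy_ip_output(struct toy_pcb *pcb, struct toy_ifnet *ifp)
{
	struct toy_mbuf m;
	int error;

	/* Rate changed since the tag was made: drop the old tag. */
	if (pcb->tag != NULL && pcb->tag->rate != pcb->wanted_rate) {
		toy_snd_tag_free(pcb->tag);
		pcb->tag = NULL;
	}
	if (pcb->tag == NULL && pcb->wanted_rate != 0)
		pcb->tag = toy_snd_tag_alloc(ifp, pcb->wanted_rate);

	m.snd_tag = pcb->tag;
	error = toy_if_transmit(ifp, &m);
	if (error == EAGAIN) {    /* route change: clear the stale tag */
		toy_snd_tag_free(pcb->tag);
		pcb->tag = NULL;
	}
	return (error);
}

/* Step 4: PCB teardown releases whatever tag is still bound. */
void
toy_pcb_detach(struct toy_pcb *pcb)
{
	toy_snd_tag_free(pcb->tag);
	pcb->tag = NULL;
}
```

In this model, sending through a second interface after a tag was
created on the first returns EAGAIN once, and the following call
allocates a fresh tag on the new interface.
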
Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
..
cc Fix a variety of cosmetic typos and misspellings 2017-01-15 18:00:45 +00:00
khelp Remove "long" variables from the TCP stack (not including the modular 2016-10-06 16:28:34 +00:00
libalias sys/net*: minor spelling fixes. 2016-05-03 18:05:43 +00:00
tcp_stacks Followup to mtod removal in main stack (r311225). Continued removal 2017-01-04 04:00:28 +00:00
accf_data.c
accf_dns.c
accf_http.c
icmp6.h Add missing constants from RFCs 4443 and 6550 2016-06-06 00:35:45 +00:00
icmp_var.h Use counter_ratecheck() in the ICMP rate limiting. 2016-12-09 17:59:15 +00:00
if_atm.c
if_atm.h
if_ether.c Add GARP retransmit capability 2016-10-02 01:42:45 +00:00
if_ether.h This change re-adds L2 caching for TCP and UDP, as originally added in D4306 2016-06-02 17:51:29 +00:00
igmp_var.h
igmp.c With clang 3.9.0, compiling sys/netinet/igmp.c results in the following 2016-09-04 17:23:10 +00:00
igmp.h
in_cksum.c
in_debug.c
in_fib.c MFP r287070,r287073: split radix implementation and route table structure. 2016-01-25 06:33:15 +00:00
in_fib.h Merge helper fib* functions used for basic lookups. 2015-12-08 10:50:03 +00:00
in_gif.c Merge helper fib* functions used for basic lookups. 2015-12-08 10:50:03 +00:00
in_jail.c Move IPv4-specific jail functions to new file netinet/in_jail.c 2016-08-09 02:16:21 +00:00
in_kdtrace.c Fix style issues around existing SDT probes. 2015-12-16 23:39:27 +00:00
in_kdtrace.h Fix style issues around existing SDT probes. 2015-12-16 23:39:27 +00:00
in_mcast.c sys/net*: minor spelling fixes. 2016-05-03 18:05:43 +00:00
in_pcb.c Implement kernel support for hardware rate limited sockets. 2017-01-18 13:31:17 +00:00
in_pcb.h Implement kernel support for hardware rate limited sockets. 2017-01-18 13:31:17 +00:00
in_pcbgroup.c Unbreak the RSS/PCBGROUP build. 2016-03-31 00:53:23 +00:00
in_prot.c Remove BSD and USL copyright and update license block in in_prot.c, as the 2016-07-28 18:39:30 +00:00
in_proto.c The pr_destroy field does not allow us to run the teardown code in a 2016-06-01 10:14:04 +00:00
in_rmx.c Code duplication but rib_head is special. Not found an easy way to go 2016-02-03 21:56:51 +00:00
in_rss.c
in_rss.h
in_systm.h Prepare for network stack as a module 2016-07-27 20:34:09 +00:00
in_var.h Add GARP retransmit capability 2016-10-02 01:42:45 +00:00
in.c Add GARP retransmit capability 2016-10-02 01:42:45 +00:00
in.h Don't iterate over the ifnet addr list in ip_output() 2016-08-18 22:59:00 +00:00
ip6.h
ip_carp.c Unbreak ip_carp with WITHOUT_INET6 enabled by conditionalizing all IPv6 2016-12-30 21:33:01 +00:00
ip_carp.h
ip_divert.c The pr_destroy field does not allow us to run the teardown code in a 2016-06-01 10:14:04 +00:00
ip_divert.h
ip_dummynet.h Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE). 2016-05-26 21:40:13 +00:00
ip_ecn.c
ip_ecn.h
ip_encap.c Remove sys/eventhandler.h from net/route.h 2016-01-09 09:34:39 +00:00
ip_encap.h
ip_fastfwd.c When we are sending IP fragments, update ip pointers in IP_PROBE() for 2016-12-29 19:57:46 +00:00
ip_fw.h Add stats reset command implementation to NPTv6 module 2016-08-13 16:45:14 +00:00
ip_gre.c
ip_icmp.c Fix build for 32-bit machines. 2016-12-09 20:50:35 +00:00
ip_icmp.h Add support for handling ICMP and ICMP6 messages sent in response 2016-04-29 20:22:01 +00:00
ip_id.c Replace a number of conflations of mp_ncpus and mp_maxid with either 2016-07-06 14:09:49 +00:00
ip_input.c Add a new socket option SO_TS_CLOCK to pick from several different clock 2017-01-16 17:46:38 +00:00
ip_ipsec.c Remove the kernel option for IPSEC_FILTERTUNNEL, which was deprecated 2016-08-21 18:55:30 +00:00
ip_ipsec.h
ip_mroute.c Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. 2016-09-15 07:41:48 +00:00
ip_mroute.h
ip_options.c sys/net*: minor spelling fixes. 2016-05-03 18:05:43 +00:00
ip_options.h
ip_output.c Implement kernel support for hardware rate limited sockets. 2017-01-18 13:31:17 +00:00
ip_reass.c
ip_var.h The pr_destroy field does not allow us to run the teardown code in a 2016-06-01 10:14:04 +00:00
ip.h sys/net*: minor spelling fixes. 2016-05-03 18:05:43 +00:00
pim_var.h
pim.h
raw_ip.c Ensure that the buffer length and the length provided in the IPv4 2017-01-13 10:55:26 +00:00
sctp_asconf.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_asconf.h Whitespace changes. 2016-12-06 10:21:25 +00:00
sctp_auth.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_auth.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_bsd_addr.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_bsd_addr.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_cc_functions.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_constants.h Cleanup the names of SSN, SID, TSN, FSN, PPID and MID. 2016-12-07 19:30:59 +00:00
sctp_crc32.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_crc32.h Whitespace changes. 2016-12-06 10:21:25 +00:00
sctp_dtrace_declare.h
sctp_dtrace_define.h This is work done by Michael Tuexen and myself at the IETF. This 2016-04-07 09:10:34 +00:00
sctp_header.h Cleanup the names of SSN, SID, TSN, FSN, PPID and MID. 2016-12-07 19:30:59 +00:00
sctp_indata.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_indata.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_input.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_input.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_lock_bsd.h netinet/sctp*: minor spelling fixes in comments. 2016-05-02 20:56:11 +00:00
sctp_os_bsd.h Whitespace changes. 2016-12-06 10:21:25 +00:00
sctp_os.h
sctp_output.c Consistent handling of errors reported from the lower layer. 2016-12-27 22:14:41 +00:00
sctp_output.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_pcb.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_pcb.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_peeloff.c
sctp_peeloff.h Whitespace changes. 2016-12-06 10:21:25 +00:00
sctp_ss_functions.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_structs.h Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_syscalls.c Use getsock_cap() instead of deprecated fgetsock(). 2017-01-13 16:54:44 +00:00
sctp_sysctl.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_sysctl.h Retire net.inet.sctp.strict_sacks and net.inet.sctp.strict_data_order 2016-05-12 16:34:59 +00:00
sctp_timer.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_timer.h Code cleanup which will silence a warning in PVS / D5245. 2016-02-17 18:04:22 +00:00
sctp_uio.h Whitespace changes. 2016-12-06 10:21:25 +00:00
sctp_usrreq.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctp_var.h Cleanup the names of SSN, SID, TSN, FSN, PPID and MID. 2016-12-07 19:30:59 +00:00
sctp.h This is work done by Michael Tuexen and myself at the IETF. This 2016-04-07 09:10:34 +00:00
sctputil.c Whitespace changes. 2016-12-26 11:06:41 +00:00
sctputil.h Whitespace changes. 2016-12-26 11:06:41 +00:00
siftr.c Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem. 2016-03-11 23:18:06 +00:00
tcp_debug.c Remove "long" variables from the TCP stack (not including the modular 2016-10-06 16:28:34 +00:00
tcp_debug.h
tcp_fastopen.c Fix kernel build with TCP_RFC7413 option 2016-08-11 23:52:24 +00:00
tcp_fastopen.h Implementation of server-side TCP Fast Open (TFO) [RFC7413]. 2015-12-24 19:09:48 +00:00
tcp_fsm.h Update TCPS_HAVERCVDFIN() macro to correctly include all states a connection 2016-08-26 17:48:54 +00:00
tcp_hostcache.c sysctl net.inet.tcp.hostcache.list in a jail can see connections from other 2017-01-05 17:22:09 +00:00
tcp_hostcache.h Remove "long" variables from the TCP stack (not including the modular 2016-10-06 16:28:34 +00:00
tcp_input.c Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and 2017-01-04 02:19:13 +00:00
tcp_lro.c Pass the number of segments coalesced by LRO up the stack by repurposing the 2016-08-25 13:33:32 +00:00
tcp_lro.h tcp/lro: Implement hash table for LRO entries. 2016-08-02 06:36:47 +00:00
tcp_offload.c Augment struct tcpstat with tcps_states[], which is used for book-keeping 2016-01-27 00:45:46 +00:00
tcp_offload.h
tcp_output.c Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and 2017-01-04 02:19:13 +00:00
tcp_pcap.c The TCPPCAP debugging feature caches recently-used mbufs for use in 2016-07-06 16:17:13 +00:00
tcp_pcap.h The TCPPCAP debugging feature caches recently-used mbufs for use in 2016-07-06 16:17:13 +00:00
tcp_reass.c Remove sys/eventhandler.h from net/route.h 2016-01-09 09:34:39 +00:00
tcp_sack.c Remove a KASSERT which is not always true. 2016-12-25 17:37:18 +00:00
tcp_seq.h Remove "long" variables from the TCP stack (not including the modular 2016-10-06 16:28:34 +00:00
tcp_subr.c Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and 2017-01-04 02:19:13 +00:00
tcp_syncache.c Remove assigned only variable. 2016-12-21 22:47:10 +00:00
tcp_syncache.h Grab a snap amount of TCP connections in syncache from tcpstat. 2016-01-27 00:48:05 +00:00
tcp_timer.c The code currently resets the keepalive timer each time a packet is 2016-10-14 14:57:43 +00:00
tcp_timer.h This cleans up the timer code in TCP and also makes it so we do not 2016-08-16 12:40:56 +00:00
tcp_timewait.c Ensure that TCP state changes to state-closing are reported via dtrace. 2016-11-19 14:45:08 +00:00
tcp_usrreq.c Fix a double-free when an inp transitions to INP_TIMEWAIT state 2016-10-18 07:16:49 +00:00
tcp_var.h Fix slight type mismatch between so_options defined in sys/socketvar.h 2017-01-12 10:14:54 +00:00
tcp.h Provide new socket option TCP_CCALGOOPT, which stands for TCP congestion 2016-01-22 02:07:48 +00:00
tcpip.h
toecore.c This change re-adds L2 caching for TCP and UDP, as originally added in D4306 2016-06-02 17:51:29 +00:00
toecore.h
udp_usrreq.c r297225 broke udp_output() for the case where the "addr" argument 2016-10-01 19:39:09 +00:00
udp_var.h The pr_destroy field does not allow us to run the teardown code in a 2016-06-01 10:14:04 +00:00
udp.h
udplite.h