Commit Graph

6027 Commits

Author SHA1 Message Date
Matt Macy
e5e3e746fe Fix a potential use after free in getsockopt() access to inp_options
Discussed with: jhb
Reviewed by:	sbruno, transport
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-22 20:02:14 +00:00
Matt Macy
2269988749 NULL out cc_data in pluggable TCP {cc}_cb_destroy
When ABE was added (rS331214) to NewReno and leak fixed (rS333699) , it now has
a destructor (newreno_cb_destroy) for per connection state. Other congestion
controls may allocate and free cc_data on entry and exit, but the field is
never explicitly NULLed if moving back to NewReno which only internally
allocates stateful data (no entry contstructor) resulting in a situation where
newreno_cb_destory might be called on a junk pointer.

 -    NULL out cc_data in the framework after calling {cc}_cb_destroy
 -    free(9) checks for NULL so there is no need to perform not NULL checks
     before calling free.
 -    Improve a comment about NewReno in tcp_ccalgounload

This is the result of a debugging session from Jason Wolfe, Jason Eggleston,
and mmacy@ and very helpful insight from lstewart@.

Submitted by: Kevin Bowling
Reviewed by: lstewart
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16282
2018-07-22 05:37:58 +00:00
Michael Tuexen
34fc9072ce Set the IPv4 version in the IP header for UDP and UDPLite. 2018-07-21 02:14:13 +00:00
Michael Tuexen
e1526d5a5b Add missing dtrace probes for received UDP packets.
Fire UDP receive probes when a packet is received and there is no
endpoint consuming it. Fire the probe also if the TTL of the
received packet is smaller than the minimum required by the endpoint.

Clarify also in the man page, when the probe fires.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:32:20 +00:00
Michael Tuexen
0053ed28ff Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
Michael Tuexen
b0471b4b95 Revert https://svnweb.freebsd.org/changeset/base/336503
since I also ran the export script with different parameters.
2018-07-19 20:11:14 +00:00
Michael Tuexen
7679e49dd4 Whitespace changes due to change if ident. 2018-07-19 19:33:42 +00:00
Randall Stewart
8de9ac5eec Bump the ICMP echo limits to match the RFC
Reviewed by:	tuexen
Sponsored by: Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D16333
2018-07-18 22:49:53 +00:00
Andrey V. Elsukov
acf673edf0 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
Sean Bruno
c8b1bdc31c There was quite a bit of feedback on r336282 that has led to the
submitter to want to revert it.
2018-07-14 23:53:51 +00:00
Sean Bruno
179a28b098 Fixup memory management for fetching options in ip_ctloutput()
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-14 16:19:46 +00:00
Mark Johnston
aaf268f9f6 Remove a duplicate check.
PR:		229663
Submitted by:	David Binderman <dcb314@hotmail.com>
MFC after:	3 days
2018-07-11 14:54:56 +00:00
Brooks Davis
3a20f06a1c Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
Michael Tuexen
c9da58534d Add support for printing the TCP FO client-side cookie cache via the
sysctl interface. This is similar to the TCP host cache.

Reviewed by:		pkelsey@, kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D14554
2018-07-10 10:50:43 +00:00
Michael Tuexen
a026a53a76 Use appropriate MSS value when populating the TCP FO client cookie cache
When a client receives a SYN-ACK segment with a TFP fast open cookie,
but without an MSS option, an MSS value from uninitialised stack memory is used.
This patch ensures that in case no MSS option is included in the SYN-ACK,
the appropriate value as given in RFC 7413 is used.

Reviewed by:		kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16175
2018-07-10 10:42:48 +00:00
Steven Hartland
65c3a353e6 Removed pointless NULL check
Removed pointless NULL check after malloc with M_WAITOK which can never
return NULL.

Sponsored by:	Multiplay
2018-07-10 08:05:32 +00:00
Andrey V. Elsukov
f7c4fdee1a Add "record-state", "set-limit" and "defer-action" rule options to ipfw.
"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.

Submitted by:	lev
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D1776
2018-07-09 11:35:18 +00:00
Michael Tuexen
5f1347d7c9 Allow alternate TCP stack to populate the TCP FO client cookie
cache.

Without this patch, TCP FO could be used when using alternate
TCP stack, but only existing entires in the TCP client cookie
cache could be used. This cache was not populated by connections
using alternate TCP stacks.

Sponsored by:		Netflix, Inc.
2018-07-07 12:28:16 +00:00
Michael Tuexen
c556884f8e When initializing the TCP FO client cookie cache, take into account
whether the TCP FO support is enabled or not for the client side.

The code in tcp_fastopen_init() implicitly assumed that the sysctl
variable V_tcp_fastopen_client_enable was initialized to 0. This
was initially true, but was changed in r335610, which unmasked this
bug.

Thanks to Pieter de Goeje for reporting the issue on freebsd-net@
2018-07-07 11:18:26 +00:00
Brooks Davis
5c5e39e3d5 One more 32-bit fix for r335979.
Reported by:	tuexen
2018-07-06 13:34:45 +00:00
Brooks Davis
7524b4c14b Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
Andrew Turner
2bf9501287 Create a new macro for static DPCPU data.
On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by:	bz, emaste, markj
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D16140
2018-07-05 17:13:37 +00:00
Brooks Davis
f38b68ae8a Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
Hiroki Sato
5ba05d3d0e - Fix a double unlock in inp_block_unblock_source() and
lock leakage in inp_leave_group() which caused a panic.
- Make order of CTR1() and IN_MULTI_LIST_LOCK() consistent
  around inm_merge().
2018-07-04 06:47:34 +00:00
Matt Macy
6573d7580b epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
Matt Macy
99208b820f inpcb: don't gratuitously defer frees
Don't defer frees in sysctl handlers. It isn't necessary
and it just confuses things.
revert: r333911, r334104, and r334125

Requested by: jtl
2018-07-02 05:19:44 +00:00
Kristof Provost
0d3d234cd1 carp: Set DSCP value CS7
Update carp to set DSCP value CS7(Network Traffic) in the flowlabel field of
packets by default. Currently carp only sets TOS_LOWDELAY in IPv4 which was
deprecated in 1998. This also implements sysctl that can revert carp back to
it's old behavior if desired.

This will allow implementation of QOS on modern network devices to make sure
carp packets aren't dropped during interface contention.

Submitted by:	Nick Wolff <darkfiberiru AT gmail.com>
Reviewed by:	kp, mav (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14536
2018-07-01 08:37:07 +00:00
Andrey V. Elsukov
6e081509db Add NULL pointer check.
encap_lookup_t method can be invoked by IP encap subsytem even if none
of gif/gre/me interfaces are exist. Hash tables are allocated on demand,
when first interface is created. So, make NULL pointer check before
doing access to hash table.

PR:		229378
2018-06-28 11:39:27 +00:00
Gleb Smirnoff
b8ab659396 Check the inp_flags under inp lock. Looks like the race was hidden
before, the conversion of tcbinfo to CK_LIST have uncovered it.
2018-06-27 22:01:59 +00:00
Sean Bruno
af4da58655 Enable TCP_FASTOPEN by default for FreeBSD 12.
Submitted by:	kbowling
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D15959
2018-06-24 21:46:29 +00:00
Sean Bruno
45fc0718d8 Reap unused variable and assignment that had no effect. Noted by cross
compiling with gcc on mips.

Reviewed by:	mmacy
2018-06-24 21:36:37 +00:00
Gleb Smirnoff
a00f4ac22f Revert r334843, and partially revert r335180.
tcp_outflags[] were defined since 4BSD and are defined nowadays in
all its descendants. Removing them breaks third party application.
2018-06-23 06:53:53 +00:00
Randall Stewart
581a046a8b This adds in an optimization so that we only walk one
time through the mbuf chain during copy and TSO limiting.
It is used by both Rack and now the FreeBSD stack.
Sponsored by:	Netflix Inc
Differential Revision: https://reviews.freebsd.org/D15937
2018-06-21 21:03:58 +00:00
Matt Macy
e93fdbe212 raw_ip: validate inp in both loops
Continuation of r335497. Also move the lock acquisition up to
validate before referencing inp_cred.

Reported by:	pho
2018-06-21 20:18:23 +00:00
Matt Macy
3d348772e7 in_pcblookup_hash: validate inp before return
Post r335356 it is possible to have an inpcb on the hash lists that is
partially torn down. Validate before using. Also as a side effect of this
change the lock ordering issue between hash lock and inpcb no longer exists
allowing some simplification.

Reported by:	pho@
2018-06-21 18:40:15 +00:00
Matt Macy
e5c331cf78 raw_ip: validate inp
Post r335356 it is possible to have an inpcb on the hash lists that is
partially torn down. Validate before using.

Reported by:	pho
2018-06-21 17:24:10 +00:00
Matt Macy
46374cbf54 udp_ctlinput: don't refer to unpcb after we drop the lock
Reported by: pho@
2018-06-21 06:10:52 +00:00
Randall Stewart
c6f76759ca Make sure that the t_peakrate_thr is not compiled in
by default until NF can upstream it.

Reviewed by:	and suggested lstewart
Sponsored by:	Netflix Inc.
2018-06-19 11:20:28 +00:00
Randall Stewart
f923a734b3 Move the tp set back to where it was before
we started playing with the VNET sets. This
way we have verified the INP settings before
we go to the trouble of de-referencing it.

Reviewed by:	and suggested by lstewart
Sponsored by:	Netflix Inc.
2018-06-19 05:28:14 +00:00
Matt Macy
9e58ff6ff9 convert inpcbinfo hash and info rwlocks to epoch + mutex
- Convert inpcbinfo info & hash locks to epoch for read and mutex for write
- Garbage collect code that handled INP_INFO_TRY_RLOCK failures as
  INP_INFO_RLOCK which can no longer fail

When running 64 netperfs sending minimal sized packets on a 2x8x2 reduces
unhalted core cycles samples in rwlock rlock/runlock in udp_send from 51% to
3%.

Overall packet throughput rate limited by CPU affinity and NIC driver design
choices.

On the receiver unhalted core cycles samples in in_pcblookup_hash went from
13% to to 1.6%

Tested by LLNW and pho@

Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15686
2018-06-19 01:54:00 +00:00
Randall Stewart
f994ead330 Move to using the inp->vnet pointer has suggested by lstewart.
This is far better since the hpts system is using the inp
as its basis anyway. Unfortunately his comments came late.

Sponsored by:	Netflix Inc.
2018-06-18 14:10:12 +00:00
Andrey V. Elsukov
20efcfc602 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
Michael Tuexen
43b223f42e When retransmitting TCP SYN-ACK segments with the TCP timestamp option
enabled use an updated timestamp instead of reusing the one used in
the initial TCP SYN-ACK segment.

This patch ensures that an updated timestamp is used when sending the
SYN-ACK from the syncache code. It was already done if the
SYN-ACK was retransmitted from the generic code.

This makes the behaviour consistent and also conformant with
the TCP specification.

Reviewed by:		jtl@, Jason Eggleston
MFC after:		1 month
Sponsored by:		Neflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D15634
2018-06-15 12:28:43 +00:00
Gleb Smirnoff
9293873e83 TCPOUTFLAGS no longer exists since r334843. 2018-06-14 22:25:10 +00:00
Michael Tuexen
33ef123090 Provide the ip6_plen in network byte order when calling ip6_output().
This is not strictly required by ip6_output(), since it overrides it,
but it is needed for upcoming dtrace support.
2018-06-14 21:30:52 +00:00
Michael Tuexen
8d86bd564f Whitespace changes. 2018-06-14 21:22:14 +00:00
Andrey V. Elsukov
eb548a1a5c In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested.
It is better to try allocate a big mbuf, than just silently drop a big
packet. A better solution could be reworking of libalias modules to be
able use m_copydata()/m_copyback() instead of requiring the single
contiguous buffer.

PR:		229006
MFC after:	1 week
2018-06-14 11:15:39 +00:00
Randall Stewart
4aec110f70 This fixes several bugs that Larry Rosenman helped me find in
Rack with respect to its handling of TCP Fast Open. Several
fixes all related to TFO are included in this commit:
1) Handling of non-TFO retransmissions
2) Building the proper send-map when we are doing TFO
3) Dealing with the ack that comes back that includes the
   SYN and data.

It appears that with this commit TFO now works :-)

Thanks Larry for all your help!!

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15758
2018-06-14 03:27:42 +00:00
Matt Macy
feeef8509b Fix PCBGROUPS build post CK conversion of pcbinfo 2018-06-13 23:19:54 +00:00
Andrey V. Elsukov
a5185adeb6 Rework if_gre(4) to use encap_lookup_t method to speedup lookup
of needed interface when many gre interfaces are present.

Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations. Use hash table to
speedup lookup of needed softc.
2018-06-13 11:11:33 +00:00