
5962 Commits

Author SHA1 Message Date
andrew
5b192cd830 icmp_quotelen was accidentally changed in r336676; undo this.
Sponsored by:	DARPA, AFRL
2018-07-24 16:45:01 +00:00
andrew
a6605d2938 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
rrs
9df38bfa1f Delete the example tcp stack "fastpath" which
was only put in as an example.

Sponsored by:	Netflix inc.
Differential Revision:	https://reviews.freebsd.org/D16420
2018-07-24 14:55:47 +00:00
mmacy
813f5d12cc Fix a potential use after free in getsockopt() access to inp_options
Discussed with: jhb
Reviewed by:	sbruno, transport
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-22 20:02:14 +00:00
mmacy
6fcaec6a10 NULL out cc_data in pluggable TCP {cc}_cb_destroy
When ABE was added to NewReno (rS331214) and its leak fixed (rS333699), NewReno
gained a destructor (newreno_cb_destroy) for per-connection state. Other congestion
controls may allocate and free cc_data on entry and exit, but the field is
never explicitly NULLed when moving back to NewReno, which only allocates
stateful data internally (it has no entry constructor). As a result,
newreno_cb_destroy might be called on a junk pointer.

 -    NULL out cc_data in the framework after calling {cc}_cb_destroy
 -    free(9) checks for NULL, so there is no need to check for NULL
     before calling free.
 -    Improve a comment about NewReno in tcp_ccalgounload

This is the result of a debugging session from Jason Wolfe, Jason Eggleston,
and mmacy@ and very helpful insight from lstewart@.

Submitted by: Kevin Bowling
Reviewed by: lstewart
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16282
2018-07-22 05:37:58 +00:00
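The cc_data fix in 6fcaec6a10 can be sketched in userspace C as follows. This is a minimal illustration, not the kernel code: the structs are simplified stand-ins loosely modeled on the congestion-control framework, and cc_switch_out() and example_cb_destroy() are hypothetical names. It shows the two points of the commit: the framework clears the pointer after calling the destructor, and the destructor needs no NULL check because free(NULL) is a no-op (as free(9) is in the kernel).

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the congestion-control types (illustrative only). */
struct cc_var {
    void *cc_data;                          /* per-connection algorithm state */
};

struct cc_algo {
    void (*cb_destroy)(struct cc_var *ccv); /* optional per-connection destructor */
};

/* Hypothetical algorithm destructor: free(NULL) is a no-op, so no NULL check. */
void
example_cb_destroy(struct cc_var *ccv)
{
    free(ccv->cc_data);
}

/* Hypothetical framework helper used when leaving an algorithm: run the
 * destructor, then NULL the pointer so the next algorithm's destructor
 * can never be handed a stale (junk) pointer. */
void
cc_switch_out(struct cc_var *ccv, const struct cc_algo *algo)
{
    assert(ccv != NULL && algo != NULL);
    if (algo->cb_destroy != NULL)
        algo->cb_destroy(ccv);
    ccv->cc_data = NULL;
}
```

Calling cc_switch_out() twice in a row is safe: the second call passes NULL to free().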
tuexen
72da02e61c Set the IPv4 version in the IP header for UDP and UDPLite. 2018-07-21 02:14:13 +00:00
tuexen
ff46e28acc Add missing dtrace probes for received UDP packets.
Fire UDP receive probes when a packet is received and there is no
endpoint consuming it. Also fire the probe if the TTL of the
received packet is smaller than the minimum required by the endpoint.

Also clarify in the man page when the probe fires.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:32:20 +00:00
tuexen
9bf2bb1b21 Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
tuexen
14de4a3d5b Revert https://svnweb.freebsd.org/changeset/base/336503
since I also ran the export script with different parameters.
2018-07-19 20:11:14 +00:00
tuexen
5810243631 Whitespace changes due to a change in ident. 2018-07-19 19:33:42 +00:00
rrs
4b9f4bff13 Bump the ICMP echo limits to match the RFC
Reviewed by:	tuexen
Sponsored by: Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D16333
2018-07-18 22:49:53 +00:00
ae
d94c744a40 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
sbruno
d142ab3470 There was quite a bit of feedback on r336282 that led the
submitter to request that it be reverted.
2018-07-14 23:53:51 +00:00
sbruno
388f09b02b Fixup memory management for fetching options in ip_ctloutput()
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-14 16:19:46 +00:00
markj
0f3f6a3bb8 Remove a duplicate check.
PR:		229663
Submitted by:	David Binderman <dcb314@hotmail.com>
MFC after:	3 days
2018-07-11 14:54:56 +00:00
brooks
39f527e7ee Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
tuexen
c7a5475854 Add support for printing the TCP FO client-side cookie cache via the
sysctl interface. This is similar to the TCP host cache.

Reviewed by:		pkelsey@, kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D14554
2018-07-10 10:50:43 +00:00
tuexen
6123188c8a Use appropriate MSS value when populating the TCP FO client cookie cache
When a client receives a SYN-ACK segment with a TCP Fast Open cookie,
but without an MSS option, an MSS value from uninitialised stack memory is used.
This patch ensures that in case no MSS option is included in the SYN-ACK,
the appropriate value as given in RFC 7413 is used.

Reviewed by:		kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16175
2018-07-10 10:42:48 +00:00
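The MSS fallback described in 6123188c8a can be sketched as below. This is a minimal sketch assuming the defaults of 536 bytes for IPv4 and 1220 bytes for IPv6 (the values RFC 7413 points to, and the values of FreeBSD's TCP_MSS and TCP6_MSS); tfo_cache_mss() and the TFO_DEFAULT_* macros are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults when the SYN-ACK carries no MSS option. */
#define TFO_DEFAULT_MSS_V4 536
#define TFO_DEFAULT_MSS_V6 1220

/* Hypothetical helper: choose the MSS to store in the client cookie cache.
 * Use the peer's MSS option if present, otherwise a well-defined default,
 * never an uninitialised stack value. */
uint16_t
tfo_cache_mss(bool is_ipv6, bool have_mss_opt, uint16_t mss_opt)
{
    assert(!have_mss_opt || mss_opt > 0);   /* an MSS option of 0 is invalid */
    if (have_mss_opt)
        return mss_opt;
    return is_ipv6 ? TFO_DEFAULT_MSS_V6 : TFO_DEFAULT_MSS_V4;
}
```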
smh
c6ce3f9dca Removed pointless NULL check
Removed pointless NULL check after malloc with M_WAITOK which can never
return NULL.

Sponsored by:	Multiplay
2018-07-10 08:05:32 +00:00
ae
544b51e5e3 Add "record-state", "set-limit" and "defer-action" rule options to ipfw.
"record-state" is similar to "keep-state", but it doesn't produce an implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but, like
"record-state", it is a single opcode without an implicit O_PROBE_STATE
opcode. "defer-action" is intended to be used with dynamic states. When a
rule with this opcode is matched, the rule's action is not executed;
instead, a dynamic state is created. When this state is later matched by
"check-state", the rule's action is executed.
This allows creating more complex rulesets.

Submitted by:	lev
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D1776
2018-07-09 11:35:18 +00:00
tuexen
68c3da5c03 Allow alternate TCP stack to populate the TCP FO client cookie
cache.

Without this patch, TCP FO could be used with alternate
TCP stacks, but only existing entries in the TCP client cookie
cache could be used. This cache was not populated by connections
using alternate TCP stacks.

Sponsored by:		Netflix, Inc.
2018-07-07 12:28:16 +00:00
tuexen
ab8567c6ff When initializing the TCP FO client cookie cache, take into account
whether the TCP FO support is enabled or not for the client side.

The code in tcp_fastopen_init() implicitly assumed that the sysctl
variable V_tcp_fastopen_client_enable was initialized to 0. This
was initially true, but was changed in r335610, which unmasked this
bug.

Thanks to Pieter de Goeje for reporting the issue on freebsd-net@
2018-07-07 11:18:26 +00:00
brooks
c4d0432c6f One more 32-bit fix for r335979.
Reported by:	tuexen
2018-07-06 13:34:45 +00:00
brooks
8baf738e84 Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
andrew
ae591a440e Create a new macro for static DPCPU data.
On arm64 (and possibly other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses; however, the runtime linker expects to be able to
relocate these.

In preparation for fixing this, create two macros depending on whether
the data is global or static.

Reviewed by:	bz, emaste, markj
Sponsored by:	ABT Systems Ltd
Differential Revision:	https://reviews.freebsd.org/D16140
2018-07-05 17:13:37 +00:00
brooks
6615ed4c61 Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
identifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
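The idea behind 6615ed4c61 (and the follow-up 39f527e7ee) can be illustrated with a small sketch. struct xexample is a hypothetical export record, not the real struct xinpcb layout; ksize_t and kvaddr_t are defined here as uint64_t, as the commit message describes. Because every member is a fixed 64-bit type, the struct has the same size on ILP32 and LP64, and a kernel pointer becomes an opaque identifier by casting through uintptr_t alone and letting the implicit, value-preserving widening to kvaddr_t do the rest.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width types as described in the commit message. */
typedef uint64_t ksize_t;
typedef uint64_t kvaddr_t;

/* Hypothetical export record (not the real xinpcb). */
struct xexample {
    ksize_t  xe_len;   /* previously size_t: 4 or 8 bytes depending on ABI */
    kvaddr_t xe_pcb;   /* previously a pointer: now a 64-bit unique identifier */
};

/* Layout no longer depends on the word size of the consumer. */
static_assert(sizeof(struct xexample) == 16, "layout must not depend on word size");

/* Cast through uintptr_t only; conversion to kvaddr_t then widens implicitly. */
kvaddr_t
ptr_to_kvaddr(const void *p)
{
    return (uintptr_t)p;
}
```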
hrs
44b953fc1a - Fix a double unlock in inp_block_unblock_source() and
lock leakage in inp_leave_group() which caused a panic.
- Make order of CTR1() and IN_MULTI_LIST_LOCK() consistent
  around inm_merge().
2018-07-04 06:47:34 +00:00
mmacy
14de8a2820 epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
mmacy
c7b15ce781 inpcb: don't gratuitously defer frees
Don't defer frees in sysctl handlers. It isn't necessary
and it just confuses things.
revert: r333911, r334104, and r334125

Requested by: jtl
2018-07-02 05:19:44 +00:00
kp
3f4da6d3e7 carp: Set DSCP value CS7
Update carp to set DSCP value CS7 (Network Traffic) in the flowlabel field of
packets by default. Currently carp only sets TOS_LOWDELAY in IPv4, which was
deprecated in 1998. This also implements a sysctl that can revert carp to
its old behavior if desired.

This will allow implementation of QoS on modern network devices to make sure
carp packets aren't dropped during interface contention.

Submitted by:	Nick Wolff <darkfiberiru AT gmail.com>
Reviewed by:	kp, mav (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14536
2018-07-01 08:37:07 +00:00
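The bit layout behind 3f4da6d3e7 can be sketched as follows. DSCP CS7 is 56 (0b111000) and occupies the upper six bits of the IPv4 TOS byte or the IPv6 traffic class, giving 0xe0 (the old IPTOS_LOWDELAY value was 0x10). In IPv6 the traffic class sits in the same 32-bit word as the flow label (version 4 bits, traffic class 8 bits, flow label 20 bits), which is why the commit speaks of the flowlabel field. The helper names are hypothetical, and the word is built in host byte order for clarity.

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_CS7 0x38   /* class selector 7 = 56 */

/* DSCP occupies the upper six bits of the TOS / traffic class byte. */
uint8_t
dscp_to_tclass(uint8_t dscp)
{
    assert(dscp <= 0x3f);   /* DSCP is a 6-bit value */
    return (uint8_t)(dscp << 2);
}

/* Hypothetical helper: first 32-bit word of an IPv6 header (host order),
 * version | traffic class | flow label. */
uint32_t
ip6_first_word(uint8_t tclass, uint32_t flowlabel)
{
    return (6u << 28) | ((uint32_t)tclass << 20) | (flowlabel & 0xfffffu);
}
```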
ae
fd52110019 Add NULL pointer check.
The encap_lookup_t method can be invoked by the IP encap subsystem even if
no gif/gre/me interfaces exist. Hash tables are allocated on demand, when
the first interface is created. So, check for a NULL pointer before
accessing the hash table.

PR:		229378
2018-06-28 11:39:27 +00:00
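The pattern fixed in fd52110019 (a lookup that can run before a lazily allocated table exists) is sketched below. struct entry, encap_example_table and encap_example_lookup() are hypothetical names for illustration; the real encap code and its hash layout differ. The point is that the lookup must treat an unallocated table as "no match" rather than dereference it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_HASH_SIZE 64

struct entry {
    uint32_t      key;
    struct entry *next;
};

/* Hypothetical table, allocated only when the first tunnel interface is
 * created; until then it is NULL. */
static struct entry **encap_example_table;

struct entry *
encap_example_lookup(uint32_t key)
{
    if (encap_example_table == NULL)   /* no gif/gre/me interface yet */
        return NULL;
    for (struct entry *e = encap_example_table[key % EXAMPLE_HASH_SIZE];
        e != NULL; e = e->next)
        if (e->key == key)
            return e;
    return NULL;
}
```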
glebius
d63e928d5b Check the inp_flags under inp lock. It looks like the race was
hidden before; the conversion of tcbinfo to CK_LIST has uncovered it.
2018-06-27 22:01:59 +00:00
sbruno
6adde06b36 Enable TCP_FASTOPEN by default for FreeBSD 12.
Submitted by:	kbowling
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D15959
2018-06-24 21:46:29 +00:00
sbruno
1dc47ad154 Reap unused variable and assignment that had no effect. Noted by cross
compiling with gcc on mips.

Reviewed by:	mmacy
2018-06-24 21:36:37 +00:00
glebius
0157c8d39f Revert r334843, and partially revert r335180.
tcp_outflags[] have been defined since 4BSD and are still defined in
all its descendants. Removing them breaks third-party applications.
2018-06-23 06:53:53 +00:00
rrs
b788102191 This adds an optimization so that we walk the mbuf chain
only once during copy and TSO limiting.
It is used by both Rack and now the FreeBSD stack.
Sponsored by:	Netflix Inc
Differential Revision: https://reviews.freebsd.org/D15937
2018-06-21 21:03:58 +00:00
mmacy
232eed4f26 raw_ip: validate inp in both loops
Continuation of r335497. Also move the lock acquisition up to
validate before referencing inp_cred.

Reported by:	pho
2018-06-21 20:18:23 +00:00
mmacy
41c8895b78 in_pcblookup_hash: validate inp before return
Post r335356 it is possible to have an inpcb on the hash lists that is
partially torn down. Validate before using. Also as a side effect of this
change the lock ordering issue between hash lock and inpcb no longer exists
allowing some simplification.

Reported by:	pho@
2018-06-21 18:40:15 +00:00
mmacy
d9ccda194c raw_ip: validate inp
Post r335356 it is possible to have an inpcb on the hash lists that is
partially torn down. Validate before using.

Reported by:	pho
2018-06-21 17:24:10 +00:00
mmacy
778cdcd6a1 udp_ctlinput: don't refer to unpcb after we drop the lock
Reported by: pho@
2018-06-21 06:10:52 +00:00
rrs
1b6c300c4e Make sure that the t_peakrate_thr is not compiled in
by default until NF can upstream it.

Reviewed by:	and suggested by lstewart
Sponsored by:	Netflix Inc.
2018-06-19 11:20:28 +00:00
rrs
a9e128dc64 Move the tp set back to where it was before
we started playing with the VNET sets. This
way we have verified the INP settings before
we go to the trouble of de-referencing it.

Reviewed by:	and suggested by lstewart
Sponsored by:	Netflix Inc.
2018-06-19 05:28:14 +00:00
mmacy
79793784f7 convert inpcbinfo hash and info rwlocks to epoch + mutex
- Convert inpcbinfo info & hash locks to epoch for read and mutex for write
- Garbage collect code that handled INP_INFO_TRY_RLOCK failures as
  INP_INFO_RLOCK which can no longer fail

When running 64 netperfs sending minimal-sized packets on a 2x8x2 system, this
reduces unhalted core cycle samples in rwlock rlock/runlock in udp_send from
51% to 3%.

The overall packet throughput rate is limited by CPU affinity and NIC driver
design choices.

On the receiver, unhalted core cycle samples in in_pcblookup_hash went from
13% to 1.6%.

Tested by LLNW and pho@

Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15686
2018-06-19 01:54:00 +00:00
rrs
3309c975db Move to using the inp->vnet pointer as suggested by lstewart.
This is far better since the hpts system is using the inp
as its basis anyway. Unfortunately his comments came late.

Sponsored by:	Netflix Inc.
2018-06-18 14:10:12 +00:00
ae
a58623ba71 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using rwlock with multiqueue NICs for IP forwarding at high pps
produces high lock contention and is inefficient. rmlock fits such
workloads better.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
tuexen
f343969480 When retransmitting TCP SYN-ACK segments with the TCP timestamp option
enabled, use an updated timestamp instead of reusing the one used in
the initial TCP SYN-ACK segment.

This patch ensures that an updated timestamp is used when sending the
SYN-ACK from the syncache code. It was already done if the
SYN-ACK was retransmitted from the generic code.

This makes the behaviour consistent and also conformant with
the TCP specification.

Reviewed by:		jtl@, Jason Eggleston
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D15634
2018-06-15 12:28:43 +00:00
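The behaviour adopted in f343969480 amounts to computing the TSval when each segment is sent rather than reusing the value from the first SYN-ACK. The sketch below assumes a millisecond clock plus a per-connection offset; struct ts_state and tcp_tsval() are hypothetical names, not the syncache's actual fields or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection timestamp state: an offset chosen once. */
struct ts_state {
    uint32_t ts_offset;
};

/* Compute TSval at transmit time from the current clock, so every
 * (re)transmission carries a fresh value. Unsigned arithmetic wraps
 * modulo 2^32, matching the 32-bit timestamp field. */
uint32_t
tcp_tsval(const struct ts_state *ts, uint32_t now_ms)
{
    return ts->ts_offset + now_ms;
}
```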
glebius
4dac513075 TCPOUTFLAGS no longer exists since r334843. 2018-06-14 22:25:10 +00:00
tuexen
712feec090 Provide the ip6_plen in network byte order when calling ip6_output().
This is not strictly required by ip6_output(), since it overrides it,
but it is needed for upcoming dtrace support.
2018-06-14 21:30:52 +00:00
tuexen
b9f357b787 Whitespace changes. 2018-06-14 21:22:14 +00:00
ae
76167af160 In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested.
It is better to try to allocate a big mbuf than to silently drop a big
packet. A better solution could be reworking libalias modules to be
able to use m_copydata()/m_copyback() instead of requiring a single
contiguous buffer.

PR:		229006
MFC after:	1 week
2018-06-14 11:15:39 +00:00
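The size decision in 76167af160 can be sketched as picking the smallest cluster that holds the whole packet contiguously. The constants below are assumptions redefined locally (2k regular clusters, a page-sized jumbo taken as 4096 bytes, then 9k and 16k jumbo clusters), and pick_cluster_size() is a hypothetical helper; the real m_megapullup() passes its chosen size to m_getjcl().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cluster sizes; MJUMPAGESIZE is PAGE_SIZE, taken as 4096 here. */
#define MCLBYTES      2048
#define MJUMPAGESIZE  4096
#define MJUM9BYTES    (9 * 1024)
#define MJUM16BYTES   (16 * 1024)

/* Return the smallest cluster size that holds len bytes contiguously,
 * or 0 if even a 16k cluster is too small (the packet must be dropped). */
size_t
pick_cluster_size(size_t len)
{
    if (len <= MCLBYTES)
        return MCLBYTES;
    if (len <= MJUMPAGESIZE)
        return MJUMPAGESIZE;
    if (len <= MJUM9BYTES)
        return MJUM9BYTES;
    if (len <= MJUM16BYTES)
        return MJUM16BYTES;
    return 0;
}
```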