6123 Commits

Author SHA1 Message Date
ae
93a7173b74 Add NAT64 CLAT implementation as defined in RFC6877.
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.

  # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
  # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
  # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in

Obtained from:	Yandex LLC
Submitted by:	Boris N. Lytochkin
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
2019-03-18 11:44:53 +00:00
glebius
f2e5fcd17c Remove 'dir' argument from dummynet_io(). This makes it possible to make
dn_dir flags private to dummynet. There is still some room for improvement.
2019-03-14 22:32:50 +00:00
glebius
8a5caee402 Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info.
While here make 'tee' boolean.
2019-03-14 22:30:05 +00:00
glebius
25a76c3618 Make second argument of ip_divert(), that specifies packet direction a bool.
This allows pf(4) to avoid including ipfw(4) private files.
2019-03-14 22:23:09 +00:00
bz
886b55fe42 Improve ARP logging.
r344504 added an extra ARP_LOG() call in case of an if_output() failure.
It turns out IPv4 can be noisy. In order to not spam the console by default:
(a) add a counter for these events so people can keep better track of how
    often it happens, and
(b) add a sysctl to select the default ARP_LOG log level and set it to
    INFO avoiding the one (the new) DEBUG level by default.

Claim a spare (1st one after 10 years since the stats were added) in order
to not break netstat from FreeBSD 12->13 updates in the future.

Reviewed by:		karels
Differential Revision:	https://reviews.freebsd.org/D19490
2019-03-09 01:12:59 +00:00
tuexen
1e1af27f31 Fix locking bug.
MFC after:		3 days
2019-03-08 18:17:57 +00:00
tuexen
272f72e22d Some cleanup and consistency improvements.
MFC after:		3 days
2019-03-08 18:16:19 +00:00
tuexen
8abf87e138 After removing an entry from the stream scheduler list, set the pointers
to NULL, since we are checking for it in case the element gets inserted
again.

This issue was found by running syzkaller.

MFC after:		3 days
2019-03-07 08:43:20 +00:00
tuexen
a301c92873 Allocate an assocition id and register the stcb with holding the lock.
This avoids a race where stcbs can be found, which are not completely
initialized.

This was found by running syzkaller.

MFC after:		3 days
2019-03-03 19:55:06 +00:00
tuexen
848287bd3d Remove debug output.
MFC after:		3 days
2019-03-02 16:10:11 +00:00
tuexen
bde12a4960 Allow SCTP stream reconfiguration operations only in ESTABLISHED
state.

This issue was found by running syzkaller.

MFC after:		3 days
2019-03-02 14:30:27 +00:00
tuexen
48fbfe3342 Handle the case when calling the IPPROTO_SCTP level socket option
SCTP_STATUS on an association with no primary path (early state).

This issue was found by running syzkaller.

MFC after:		3 days
2019-03-02 14:15:33 +00:00
tuexen
5b706bbb5a Report the correct length when using the IPPROTO_SCTP level
socket options SCTP_GET_PEER_ADDRESSES and SCTP_GET_LOCAL_ADDRESSES.
2019-03-02 13:12:37 +00:00
tuexen
9143b7c0de Honor the memory limits provided when processing the IPPROTO_SCTP
level socket option SCTP_GET_LOCAL_ADDRESSES in a getsockopt() call.

Thanks to Thomas Barabosch for reporting the issue which was found by
running syzkaller.

MFC after:		3 days
2019-03-01 18:47:41 +00:00
tuexen
5ec78c1aa0 Improve consistency, not functional change.
MFC after:		3 days
2019-03-01 15:57:55 +00:00
jhb
1498304f92 Various cleanups to the management of multiple TCP stacks.
- Use strlcpy() with sizeof() instead of strncpy().

- Simplify initialization of TCP functions structures.

  init_tcp_functions() was already called before the first call to
  register a stack.  Just inline the work in the SYSINIT and remove
  the racy helper variable.  Instead, KASSERT that the rw lock is
  initialized when registering a stack.

- Protect the default stack via a direct pointer comparison.

  The default stack uses the name "freebsd" instead of "default" so
  this protection wasn't working for the default stack anyway.

Reviewed by:	rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19152
2019-02-27 20:24:23 +00:00
bz
0d49e52bdf Make arp code return (more) errors.
arprequest() is a void function and in case of error we simply
return without any feedback. In case of any local operation
or *if_output() failing no feedback is send up the stack for the
packet which triggered the arp request to be sent.
arpresolve_full() has three pre-canned possible errors returned
(if we have not yet sent enough arp requests or if we tried
often enough without success) otherwise "no error" is returned.

Make arprequest() an "internal" function arprequest_internal() which
does return a possible error to the caller. Preserve arprequest()
as a void wrapper function for external consumers.
In arpresolve_full() add an extra error checking. Use the
arprequest_internal() function and only return an error if non
of the three ones (mentioend above) are already set.

This will return possible errors all the way up the stack and
allows functions and programs to react on the send errors rather
than leaving them in the dark. Also they might get more detailed
feedback of why packets cannot be sent and they will receive it
quicker.

Reviewed by:		karels, hselasky
Differential Revision:	https://reviews.freebsd.org/D18904
2019-02-24 22:49:56 +00:00
glebius
13a1011c10 Support struct ip_mreqn as argument for IP_ADD_MEMBERSHIP. Legacy support
for struct ip_mreq remains in place.

The struct ip_mreqn is Linux extension to classic BSD multicast API. It
has extra field allowing to specify the interface index explicitly. In
Linux it used as argument for IP_MULTICAST_IF and IP_ADD_MEMBERSHIP.
FreeBSD kernel also declares this structure and supports it as argument
to IP_MULTICAST_IF since r170613. So, we have structure declared but
not fully supported, this confused third party application configure
scripts.

Code handling IP_ADD_MEMBERSHIP was mixed together with code for
IP_ADD_SOURCE_MEMBERSHIP.  Bringing legacy and new structure support
into the mess would made the "argument switcharoo" intolerable, so
code was separated into its own switch case clause.

MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D19276
2019-02-23 06:03:18 +00:00
tuexen
e763f31429 The receive buffer autoscaling for TCP is based on a linear growth, which
is acceptable in the congestion avoidance phase, but not during slow start.
The MTU is is also not taken into account.
Use a method instead, which is based on exponential growth working also in
slow start and being independent from the MTU.

This is joint work with rrs@.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18375
2019-02-21 10:35:32 +00:00
tuexen
192c95e996 This patch addresses an issue brought up by bz@ in D18968:
When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen,
if user data was received during the TCP handshake and BB logging is used.

A KASSERT is also added to detect tcp_reass() calls with illegal parameter
combinations.

Reported by:		bz@
Reviewed by:		rrs@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19254
2019-02-21 09:34:47 +00:00
tuexen
87f2a8bca4 Reduce the TCP initial retransmission timeout from 3 seconds to
1 second as allowed by RFC 6298.

Reviewed by:		kbowling@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18941
2019-02-20 18:03:43 +00:00
tuexen
796133921a Use exponential backoff for retransmitting SYN segments as specified
in the TCP RFCs.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18974
2019-02-20 17:56:38 +00:00
tuexen
0c4eb6ecc6 Fix a byte ordering issue for the advertised receiver window in ACK
segments sent in TIMEWAIT state, which I introduced in r336937.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2019-02-15 09:45:17 +00:00
ae
50a8601b2e In r335015 PCB destroing was made deferred using epoch_call().
But ipsec_delete_pcbpolicy() uses some VNET-virtualized variables,
and thus it needs VNET context, that is missing during gtaskqueue
executing. Use inp_vnet context to set curvnet in in_pcbfree_deferred().

PR:		235684
MFC after:	1 week
2019-02-13 15:46:05 +00:00
kp
8e2074c2e1 garp: Fix vnet related panic for gratuitous arp
Gratuitous ARP packets are sent from a timer, which means we don't have a vnet
context set. As a result we panic trying to send the packet.

Set the vnet context based on the interface associated with the interface
address.

To reproduce:
  sysctl net.link.ether.inet.garp_rexmit_count=2
  ifconfig vtnet1 10.0.0.1/24 up

PR:		235699
Reviewed by:	vangyzen@
MFC after:	1 week
2019-02-12 21:22:57 +00:00
tuexen
6903958c2a Improve input validation for raw IPv4 socket using the IP_HDRINCL
option.

This issue was found by running syzkaller on OpenBSD.
Greg Steuck made me aware that the problem might also exist on FreeBSD.

Reported by:		Greg Steuck
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D18834
2019-02-12 10:17:21 +00:00
tuexen
b80fcf68dd Fix a locking issue when reporing outbount messages.
MFC after:		3 days
2019-02-10 14:02:14 +00:00
tuexen
eee3d66791 Fix a locking issue in the IPPROTO_SCTP level SCTP_PEER_ADDR_THLDS socket
option. The problem affects only setsockopt with invalid parameters.

This issue was found by syzkaller.

MFC after:		3 days
2019-02-10 13:55:32 +00:00
tuexen
fb17e65b4c Fix a locking bug in the IPPROTO_SCTP level SCTP_EVENT socket option.
This occurs when call setsockopt() with invalid parameters.

This issue was found by syzkaller.

MFC after:		3 days
2019-02-10 10:42:16 +00:00
tuexen
b30530d5f6 Fix locking for IPPROTO_SCTP level SCTP_DEFAULT_PRINFO socket option.
This problem occurred when calling setsockopt() will invalid parameters.

This issue was found by running syzkaller.

MFC after:		3 days
2019-02-10 08:28:56 +00:00
tuexen
57ec47006e Ensure that when using the TCP CDG congestion control and setting the
sysctl variable net.inet.tcp.cc.cdg.smoothing_factor to 0, the smoothing
is disabled. Without this patch, a division by zero orrurs.

PR:			193762
Reviewed by:		lstewart@, rrs@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19071
2019-02-08 20:42:49 +00:00
tuexen
c6fb952de9 Only reduce the PMTU after the send call. The only way to increase it, is
via PMTUD.

This fixes an MTU issue reported by Timo Voelker.

MFC after:		3 days
2019-02-05 10:29:31 +00:00
tuexen
78654493eb Fix an off-by-one error in the input validation of the SCTP_RESET_STREAMS
socketoption.

This was found by running syzkaller.

MFC after:		3 days
2019-02-05 10:13:51 +00:00
imp
82650adfef Regularize the Netflix copyright
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
   piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.

Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
2019-02-04 21:28:25 +00:00
tuexen
46bca47606 When handling SYN-ACK segments in the SYN-RCVD state, set tp->snd_wnd
consistently.

This inconsistency was observed when working on the bug reported in
PR 235256, although it does not fix the reported issue. The fix for
the PR will be a separate commit.

PR:			235256
Reviewed by:		rrs@, Richard Scheffenegger
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19033
2019-02-01 12:33:00 +00:00
glebius
711fa71dfe Repair siftr(4): PFIL_IN and PFIL_OUT are defines of some value, relying
on them having particular values can break things.
2019-02-01 08:10:26 +00:00
glebius
9978a7d924 New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented.  The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision:	https://reviews.freebsd.org/D18951
2019-01-31 23:01:03 +00:00
brooks
b686fa9ec4 Add a simple port filter to SIFTR.
SIFTR does not allow any kind of filtering, but captures every packet
processed by the TCP stack.
Often, only a specific session or service is of interest, and doing the
filtering in post-processing of the log adds to the overhead of SIFTR.

This adds a new sysctl net.inet.siftr.port_filter. When set to zero, all
packets get captured as previously. If set to any other value, only
packets where either the source or the destination ports match, are
captured in the log file.

Submitted by:	Richard Scheffenegger
Reviewed by:	Cheng Cui
Differential Revision:	https://reviews.freebsd.org/D18897
2019-01-30 17:44:30 +00:00
tuexen
7c894e3728 Fix the detection of ECN-setup SYN-ACK packets.
RFC 3168 defines an ECN-setup SYN-ACK packet as on with the ECE flags
set and the CWR flags not set. The code was only checking if ECE flag
is set. This patch adds the check to verify that the CWR flags is not
set.

Submitted by:		Richard Scheffenegger
Reviewed by:		tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D18996
2019-01-28 12:45:31 +00:00
tuexen
98afe6eb13 Don't include two header files when not needed.
This allows the part of the rewrite of TCP reassembly in this
files to be MFCed to stable/11 with manual change.

MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-01-25 17:08:28 +00:00
tuexen
8e4ad3b84a Fix a bug in the restart window computation of TCP New Reno
When implementing support for IW10, an update in the computation
of the restart window used after an idle phase was missed. To
minimize code duplication, implement the logic in tcp_compute_initwnd()
and call it. This fixes a bug in NewReno, which was not aware of
IW10.

Submitted by:		Richard Scheffenegger
Reviewed by:		tuexen@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D18940
2019-01-25 13:57:09 +00:00
tuexen
cbb1f242e8 Get the arithmetic right...
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-01-24 16:47:18 +00:00
tuexen
347de7d146 Kill a trailing whitespace character...
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-01-24 16:43:13 +00:00
tuexen
7521f0c094 Update a comment to reflect the current reality.
SYN-cache entries live for abaut 12 seconds, not 45, when default
setting are used.

MFC after:		1 week
Sponsored by:		Netflix, Inc.
2019-01-24 16:40:14 +00:00
markj
acda75d443 Style.
Reviewed by:	bz
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-01-23 22:19:49 +00:00
markj
af69719726 Fix an LLE lookup race.
After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE.  The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer.  By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.

Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock.  If it's not set, the lookup fails.

PR:		234296
Reviewed by:	bz
Tested by:	sbruno, Victor <chernov_victor@list.ru>,
		Mike Andrews <mandrews@bit0.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18906
2019-01-23 22:18:23 +00:00
brooks
8ba0807050 Make SIFTR work again after r342125 (D18443).
Correct a logic error.

Only disable when already enabled or enable when disabled.

Submitted by:	Richard Scheffenegger
Reviewed by:	Cheng Cui
Obtained from:	Cheng Cui
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D18885
2019-01-18 21:46:38 +00:00
tuexen
eb82fa876e Limit the user-controllable amount of memory the kernel allocates
via IPPROTO_SCTP level socket options.

This issue was found by running syzkaller.

MFC after:	1 week
2019-01-16 11:33:47 +00:00
shurd
1b9893d7d7 Fix window update issue when scaling disabled
When the TCP window scale option is not used, and the window
opens up enough in one soreceive, a window update will not be sent.

For example, if recwin == 65535, so->so_rcv.sb_hiwat >= 262144, and
so->so_rcv.sb_hiwat <= 524272, the window update will never be sent.
This is because recwin and adv are clamped to TCP_MAXWIN << tp->rcv_scale,
and so will never be >= so->so_rcv.sb_hiwat / 4
or <= so->so_rcv.sb_hiwat / 8.

This patch ensures a window update is sent if the window opens by
TCP_MAXWIN << tp->rcv_scale, which should only happen when the window
size goes from zero to the max expressible.

This issue looks like it was introduced in r306769 when recwin was clamped
to TCP_MAXWIN << tp->rcv_scale.

MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D18821
2019-01-15 17:40:19 +00:00
tuexen
87ee738236 Fix getsockopt() for IP_OPTIONS/IP_RETOPTS.
r336616 copies inp->inp_options using the m_dup() function.
However, this function expects an mbuf packet header at the beginning,
which is not true in this case.
Therefore, use m_copym() instead of m_dup().

This issue was found by syzkaller.
Reviewed by:		mmacy@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18753
2019-01-09 06:36:57 +00:00