Commit Graph

291 Commits

Author SHA1 Message Date
Michael Tuexen
cf8a49ab6e Fix the following issues related to the TCP SYN-cache:
* Let the accepted TCP/IPv4 socket inherit the configured TTL and
  TOS value.
* Let the accepted TCP/IPv6 socket inherit the configured Hop Limit.
* Use the configured Hop Limit and Traffic Class when sending
  IPv6 packets.

Reviewed by:		rrs, lutz_donnerhacke.de
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25909
2020-08-10 20:24:48 +00:00
Michael Tuexen
1bea15e601 Improve the ECN negotiation when the TCP SYN-cache is used by making
sure that
* ECN is disabled if the client sends an non-ECN-setup SYN segment.
* ECN is disabled is the ECN-setup SYN-ACK segment is retransmitted more
  than net.inet.tcp.ecn.maxretries times.

Reviewed by:		rscheff
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D26008
2020-08-08 19:39:38 +00:00
Michael Tuexen
9c04fdfd34 When using automatically generated flow labels and using TCP SYN
cookies, use the same flow label for the segments sent during the
handshake and after the handshake.
This fixes a bug by making sure that sc_flowlabel is always stored in
network byte order.

Reviewed by:		bz@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23957
2020-03-04 16:41:25 +00:00
Michael Tuexen
6605e5791f Don't send an uninitilised traffic class in the IPv6 header, when
sending a TCP segment from the TCP SYN cache (like a SYN-ACK).
This fix initialises it to zero. This is correct for the ECN bits,
but is does not honor the DSCP what an application might have set via
the IPPROTO_IPV6 level socket options IPV6_TCLASS. That will be
fixed separately.

Reviewed by:		Richard Scheffenegger
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D23900
2020-03-04 12:22:53 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Randall Stewart
481be5de9d White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by:	Netflix Inc.
2020-02-12 13:31:36 +00:00
Randall Stewart
596ae436ef This small fix makes it so we properly follow
the RFC and only enable ECN when both the
CWR and ECT bits our set within the SYN packet.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D23645
2020-02-12 13:04:19 +00:00
Gleb Smirnoff
b955545386 Make ip6_output() and ip_output() require network epoch.
All callers that before may called into these functions
without network epoch now must enter it.
2020-01-22 05:51:22 +00:00
Gleb Smirnoff
bab98355f9 Add some documenting NET_EPOCH_ASSERTs. 2020-01-22 02:37:47 +00:00
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
Michael Tuexen
3cf38784e2 Move all ECN related flags from the flags to the flags2 field.
This allows adding more ECN related flags in the future.
No functional change intended.

Submitted by:		Richard Scheffenegger
Reviewed by:		rrs@, tuexen@
Differential Revision:	https://reviews.freebsd.org/D22497
2019-12-01 21:01:33 +00:00
Michael Tuexen
fa49a96419 In order for the TCP Handshake to support ECN++, and further ECN-related
improvements, the ECN bits need to be exposed to the TCP SYNcache.
This change is a minimal modification to the function headers, without any
functional change intended.

Submitted by:		Richard Scheffenegger
Reviewed by:		rgrimes@, rrs@, tuexen@
Differential Revision:	https://reviews.freebsd.org/D22436
2019-12-01 18:05:02 +00:00
Gleb Smirnoff
032677ceb5 Now that there is no R/W lock on PCB list the pcblist sysctls
handlers can be greatly simplified.  All the previous double
cycling and complex locking was added to avoid these functions
holding global PCB locks for extended period of time, preventing
addition of new entries.
2019-11-07 21:27:32 +00:00
Gleb Smirnoff
1a49612526 Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER().
Remove few outdated comments and extraneous assertions.  No
functional change here.
2019-11-07 00:08:34 +00:00
Jonathan T. Looney
0b18fb0798 Add new functionality to switch to using cookies exclusively when we the
syn cache overflows. Whether this is due to an attack or due to the system
having more legitimate connections than the syn cache can hold, this
situation can quickly impact performance.

To make the system perform better during these periods, the code will now
switch to exclusively using cookies until the syn cache stops overflowing.
In order for this to occur, the system must be configured to use the syn
cache with syn cookie fallback. If syn cookies are completely disabled,
this change should have no functional impact.

When the system is exclusively using syn cookies (either due to
configuration or the overflow detection enabled by this change), the
code will now skip acquiring a lock on the syn cache bucket. Additionally,
the code will now skip lookups in several places (such as when the system
receives a RST in response to a SYN|ACK frame).

Reviewed by:	rrs, gallatin (previous version)
Discussed with:	tuexen
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21644
2019-09-26 15:18:57 +00:00
Jonathan T. Looney
0bee4d631a Access the syncache secret directly from the V_tcp_syncache variable,
rather than indirectly through the backpointer to the tcp_syncache
structure stored in the hashtable bucket.

This also allows us to remove the requirement in syncookie_generate()
and syncookie_lookup() that the syncache hashtable bucket must be
locked.

Reviewed by:	gallatin, rrs
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21644
2019-09-26 15:06:46 +00:00
Jonathan T. Looney
867e98f8ee Remove the unused sch parameter to the syncache_respond() function. The
use of this parameter was removed in r313330. This commit now removes
passing this now-unused parameter.

Reviewed by:	gallatin, rrs
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D21644
2019-09-26 15:02:34 +00:00
Andrew Gallatin
d2e6258258 Avoid unneeded call to arc4random() in syncache_add()
Don't call arc4random() unconditionally to initialize sc_iss, and
then when syncookies are enabled, just overwrite it with the
return value from from syncookie_generate(). Instead, only call
arc4random() to initialize sc_iss when syncookies are not
enabled.

Note that on a system under a syn flood attack, arc4random()
becomes quite expensive, and the chacha_poly crypto that it calls
is one of the more expensive things happening on the
system. Removing this unneeded arc4random() call reduces CPU from
about 40% to about 35% in my test scenario (Broadwell Xeon, 6Mpps
syn flood attack).

Reviewed by:	rrs, tuxen, bz
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21591
2019-09-11 18:48:26 +00:00
Michael Tuexen
bc35229fad When an ACK segment as the third message of the three way handshake is
received and support for time stamps was negotiated in the SYN/SYNACK
exchange, perform the PAWS check and only expand the syn cache entry if
the check is passed.
Without this check, endpoints may get stuck on the incomplete queue.

Reviewed by:		jtl@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20374
2019-05-26 17:18:14 +00:00
Andrew Gallatin
50575ce11c Track TCP connection's NUMA domain in the inpcb
Drivers can now pass up numa domain information via the
mbuf numa domain field.  This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by:	markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20028
2019-04-25 15:37:28 +00:00
Michael Tuexen
0999766ddf Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial
used by TCP.

Reviewed by:		rrs@, 0mp@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19355
2019-03-23 21:36:59 +00:00
Michael Tuexen
3b853844d7 Reduce the TCP initial retransmission timeout from 3 seconds to
1 second as allowed by RFC 6298.

Reviewed by:		kbowling@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18941
2019-02-20 18:03:43 +00:00
Michael Tuexen
c6dcb64b18 Use exponential backoff for retransmitting SYN segments as specified
in the TCP RFCs.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18974
2019-02-20 17:56:38 +00:00
Michael Tuexen
989321df11 Get the arithmetic right...
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-01-24 16:47:18 +00:00
Michael Tuexen
42395cbe31 Kill a trailing whitespace character...
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-01-24 16:43:13 +00:00
Michael Tuexen
34bb795ba1 Update a comment to reflect the current reality.
SYN-cache entries live for abaut 12 seconds, not 45, when default
setting are used.

MFC after:		1 week
Sponsored by:		Netflix, Inc.
2019-01-24 16:40:14 +00:00
Michael Tuexen
6999f6975c Remove debug code which slipped in accidently.
MFC after:		4 weeks
X-MFC with:		r339989
Sponsored by:		Netflix, Inc.
2018-11-01 11:41:40 +00:00
Michael Tuexen
099ab39f44 Improve a comment to refer to the actual sections in the TCP
specification for the comparisons made.
Thanks to lstewart@ for the suggestion.

MFC after:		4 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17595
2018-11-01 11:35:28 +00:00
Michael Tuexen
93899d10b4 The handling of RST segments in the SYN-RCVD state exists in the
code paths. Both are not consistent and the one on the syn cache code
does not conform to the relevant specifications (Page 69 of RFC 793
and Section 4.2 of RFC 5961).

This patch fixes this:
* The sequence numbers checks are fixed as specified on
  page Page 69 RFC 793.
* The sysctl variable net.inet.tcp.insecure_rst is now honoured
  and the behaviour as specified in Section 4.2 of RFC 5961.

Approved by:		re (gjb@)
Reviewed by:		bz@, glebius@, rrs@,
Differential Revision:	https://reviews.freebsd.org/D17595
Sponsored by:		Netflix, Inc.
2018-10-18 19:21:18 +00:00
Michael Tuexen
078a49a077 Remove the unused parameter 'locked' from the function
syncache_respond(). There is no functional change. The
parameter became unused in r313330, but wasn't removed.

Approved by:		re (kib@)
MFC after:		1 month
Sponsored by:		Netflix, Inc.
2018-09-23 16:37:32 +00:00
Michael Tuexen
7d4dcc36a8 Fix the inheritance of IPv6 level socket options on TCP sockets.
This was broken for IPv6 listening socket, which are not IPV6_ONLY,
and the accepted TCP connection was using IPv4.

Reviewed by:		bz@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16792
2018-08-21 14:07:36 +00:00
Michael Tuexen
8e02b4e00c Don't expose the uptime via the TCP timestamps.
The TCP client side or the TCP server side when not using SYN-cookies
used the uptime as the TCP timestamp value. This patch uses in all
cases an offset, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same a used for the initial TSN.

Reviewed by:		rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16636
2018-08-19 14:56:10 +00:00
Michael Tuexen
6138da62a9 Add missing send/recv dtrace probes for TCP.
These missing probe are mostly in the syncache and timewait code.

Reviewed by:		markj@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16369
2018-07-30 20:13:38 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Michael Tuexen
43b223f42e When retransmitting TCP SYN-ACK segments with the TCP timestamp option
enabled use an updated timestamp instead of reusing the one used in
the initial TCP SYN-ACK segment.

This patch ensures that an updated timestamp is used when sending the
SYN-ACK from the syncache code. It was already done if the
SYN-ACK was retransmitted from the generic code.

This makes the behaviour consistent and also conformant with
the TCP specification.

Reviewed by:		jtl@, Jason Eggleston
MFC after:		1 month
Sponsored by:		Neflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D15634
2018-06-15 12:28:43 +00:00
Michael Tuexen
c14f9fe5ef Limit the retransmission timer for SYN-ACKs by TCPTV_REXMTMAX.
Use the same logic to handle the SYN-ACK retransmission when sent from
the syn cache code as when sent from the main code.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 21:24:27 +00:00
Michael Tuexen
badef00d58 Ensure net.inet.tcp.syncache.rexmtlimit is limited by TCP_MAXRXTSHIFT.
If the sysctl variable is set to a value larger than TCP_MAXRXTSHIFT+1,
the array tcp_syn_backoff[] is accessed out of bounds.

Discussed with: jtl@
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 19:58:19 +00:00
Randall Stewart
3ee9c3c4eb This commit brings in the TCP high precision timer system (tcp_hpts).
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.

MFC after:	Never
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15020
2018-04-19 13:37:59 +00:00
Michael Tuexen
1574b1e41e Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.

Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.

PR:			226421
Reviewed by:		bz, hiren, kib
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D13514
2018-03-16 15:26:07 +00:00
Patrick Kelsey
18a7530938 Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option.
The conditional compilation support is now centralized in
tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum
theoretical code/data footprint when TCP_RFC7413 is disabled, but
nearly all the TFO code should wind up being removed by the optimizer,
the additional footprint in the syncache entries is a single pointer,
and the additional overhead in the tcpcb is at the end of the
structure.

This enables the TCP_RFC7413 kernel option by default in amd64 and
arm64 GENERIC.

Reviewed by:	hiren
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14048
2018-02-26 03:03:41 +00:00
Patrick Kelsey
c560df6f12 This is an implementation of the client side of TCP Fast Open (TFO)
[RFC7413]. It also includes a pre-shared key mode of operation in
which the server requires the client to be in possession of a shared
secret in order to successfully open TFO connections with that server.

The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14047
2018-02-26 02:53:22 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Patrick Kelsey
3f43239f21 The soisconnected() call removed from syncache_socket() in r307966 was
not extraneous in the TCP Fast Open (TFO) passive-open case.  In the
TFO passive-open case, syncache_socket() is being called during
processing of a TFO SYN bearing a valid cookie, and a call to
soisconnected() is required in order to allow the application to
immediately consume any data delivered in the SYN and to have a chance
to generate response data to accompany the SYN-ACK.  The removal of
this call to soisconnected() effectively converted all TFO passive
opens to having the same RTT cost as a standard 3WHS.

This commit adds a call to soisconnected() to syncache_tfo_expand() so
that it is only in the TFO passive-open path, thereby restoring TFO
passve-open RTT performance and preserving the non-TFO connection-rate
performance gains realized by r307966.

MFC after:	1 week
Sponsored by:	Limelight Networks
2017-10-01 23:37:17 +00:00
Sepherosa Ziehau
fc572e261f tcp: Don't "negotiate" MSS.
_NO_ OSes actually "negotiate" MSS.

RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."

This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:

    In syncache_respond() do not reply with a MSS that is larger than what
    the peer announced to us but make it at least tcp_minmss in size.

    Sponsored by:   TCP/IP Optimization Fundraise 2005

The tcp_minmss behaviour is still kept.

Syncookie fix was prodded by tuexen, who also helped to test this
patch w/ packetdrill.

Reviewed by:	tuexen, karels, bz (previous version)
MFC after:	2 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12430
2017-09-27 05:52:37 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Michael Tuexen
8cb5a8e90a Fix the ICMP6 handling for TCP.
The ICMP6 packets might not be contained in a single mbuf. So don't
assume this. Keep the IPv4 and IPv6 code in sync and make explicit
that the syncache code only need the TCP sequence number, not the
complete TCP header.

MFC after:		3 days
Sponsored by:		Netflix, Inc.
2017-06-03 21:53:58 +00:00
Michael Tuexen
75e7a91649 Represent "a syncache overflow hasn't happend yet" by using
-(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is
good enough and works when time_t is int32 or int64.
This fixes the issue reported by cy@ on i386.

Reported by:	cy
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2017-04-21 06:05:34 +00:00
Michael Tuexen
190d9abce7 Syncoockies can be used in combination with the syncache. If the cache
overflows, syncookies are used.
This patch restricts the usage of syncookies in this case: accept
syncookies only if there was an overflow of the syncache recently.
This mitigates a problem reported in PR217637, where is syncookie was
accepted without any recent drops.
Thanks to glebius@ for suggesting an improvement.

PR:			217637
Reviewed by:		gnn, glebius
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10272
2017-04-20 19:19:33 +00:00
Gleb Smirnoff
cc65eb4e79 Hide struct inpcb, struct tcpcb from the userland.
This is a painful change, but it is needed.  On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD.  We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.

Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
  kernel structures inpcb and tcpcb inside.  Export into these structures
  the fields from inpcb and tcpcb that are known to be used, and put there
  a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.

Reviewed by:	rrs, gnn
Differential Revision:	D10018
2017-03-21 06:39:49 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00