Commit Graph

265 Commits

Author SHA1 Message Date
Michael Tuexen
6999f6975c Remove debug code which slipped in accidently.
MFC after:		4 weeks
X-MFC with:		r339989
Sponsored by:		Netflix, Inc.
2018-11-01 11:41:40 +00:00
Michael Tuexen
099ab39f44 Improve a comment to refer to the actual sections in the TCP
specification for the comparisons made.
Thanks to lstewart@ for the suggestion.

MFC after:		4 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17595
2018-11-01 11:35:28 +00:00
Michael Tuexen
93899d10b4 The handling of RST segments in the SYN-RCVD state exists in the
code paths. Both are not consistent and the one on the syn cache code
does not conform to the relevant specifications (Page 69 of RFC 793
and Section 4.2 of RFC 5961).

This patch fixes this:
* The sequence numbers checks are fixed as specified on
  page Page 69 RFC 793.
* The sysctl variable net.inet.tcp.insecure_rst is now honoured
  and the behaviour as specified in Section 4.2 of RFC 5961.

Approved by:		re (gjb@)
Reviewed by:		bz@, glebius@, rrs@,
Differential Revision:	https://reviews.freebsd.org/D17595
Sponsored by:		Netflix, Inc.
2018-10-18 19:21:18 +00:00
Michael Tuexen
078a49a077 Remove the unused parameter 'locked' from the function
syncache_respond(). There is no functional change. The
parameter became unused in r313330, but wasn't removed.

Approved by:		re (kib@)
MFC after:		1 month
Sponsored by:		Netflix, Inc.
2018-09-23 16:37:32 +00:00
Michael Tuexen
7d4dcc36a8 Fix the inheritance of IPv6 level socket options on TCP sockets.
This was broken for IPv6 listening socket, which are not IPV6_ONLY,
and the accepted TCP connection was using IPv4.

Reviewed by:		bz@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16792
2018-08-21 14:07:36 +00:00
Michael Tuexen
8e02b4e00c Don't expose the uptime via the TCP timestamps.
The TCP client side or the TCP server side when not using SYN-cookies
used the uptime as the TCP timestamp value. This patch uses in all
cases an offset, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same a used for the initial TSN.

Reviewed by:		rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16636
2018-08-19 14:56:10 +00:00
Michael Tuexen
6138da62a9 Add missing send/recv dtrace probes for TCP.
These missing probe are mostly in the syncache and timewait code.

Reviewed by:		markj@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16369
2018-07-30 20:13:38 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Michael Tuexen
43b223f42e When retransmitting TCP SYN-ACK segments with the TCP timestamp option
enabled use an updated timestamp instead of reusing the one used in
the initial TCP SYN-ACK segment.

This patch ensures that an updated timestamp is used when sending the
SYN-ACK from the syncache code. It was already done if the
SYN-ACK was retransmitted from the generic code.

This makes the behaviour consistent and also conformant with
the TCP specification.

Reviewed by:		jtl@, Jason Eggleston
MFC after:		1 month
Sponsored by:		Neflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D15634
2018-06-15 12:28:43 +00:00
Michael Tuexen
c14f9fe5ef Limit the retransmission timer for SYN-ACKs by TCPTV_REXMTMAX.
Use the same logic to handle the SYN-ACK retransmission when sent from
the syn cache code as when sent from the main code.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 21:24:27 +00:00
Michael Tuexen
badef00d58 Ensure net.inet.tcp.syncache.rexmtlimit is limited by TCP_MAXRXTSHIFT.
If the sysctl variable is set to a value larger than TCP_MAXRXTSHIFT+1,
the array tcp_syn_backoff[] is accessed out of bounds.

Discussed with: jtl@
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 19:58:19 +00:00
Randall Stewart
3ee9c3c4eb This commit brings in the TCP high precision timer system (tcp_hpts).
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.

MFC after:	Never
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D15020
2018-04-19 13:37:59 +00:00
Michael Tuexen
1574b1e41e Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.

Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.

PR:			226421
Reviewed by:		bz, hiren, kib
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D13514
2018-03-16 15:26:07 +00:00
Patrick Kelsey
18a7530938 Greatly reduce the number of #ifdefs supporting the TCP_RFC7413 kernel option.
The conditional compilation support is now centralized in
tcp_fastopen.h and tcp_var.h. This doesn't provide the minimum
theoretical code/data footprint when TCP_RFC7413 is disabled, but
nearly all the TFO code should wind up being removed by the optimizer,
the additional footprint in the syncache entries is a single pointer,
and the additional overhead in the tcpcb is at the end of the
structure.

This enables the TCP_RFC7413 kernel option by default in amd64 and
arm64 GENERIC.

Reviewed by:	hiren
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14048
2018-02-26 03:03:41 +00:00
Patrick Kelsey
c560df6f12 This is an implementation of the client side of TCP Fast Open (TFO)
[RFC7413]. It also includes a pre-shared key mode of operation in
which the server requires the client to be in possession of a shared
secret in order to successfully open TFO connections with that server.

The names of some existing fastopen sysctls have changed (e.g.,
net.inet.tcp.fastopen.enabled -> net.inet.tcp.fastopen.server_enable).

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14047
2018-02-26 02:53:22 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Patrick Kelsey
3f43239f21 The soisconnected() call removed from syncache_socket() in r307966 was
not extraneous in the TCP Fast Open (TFO) passive-open case.  In the
TFO passive-open case, syncache_socket() is being called during
processing of a TFO SYN bearing a valid cookie, and a call to
soisconnected() is required in order to allow the application to
immediately consume any data delivered in the SYN and to have a chance
to generate response data to accompany the SYN-ACK.  The removal of
this call to soisconnected() effectively converted all TFO passive
opens to having the same RTT cost as a standard 3WHS.

This commit adds a call to soisconnected() to syncache_tfo_expand() so
that it is only in the TFO passive-open path, thereby restoring TFO
passve-open RTT performance and preserving the non-TFO connection-rate
performance gains realized by r307966.

MFC after:	1 week
Sponsored by:	Limelight Networks
2017-10-01 23:37:17 +00:00
Sepherosa Ziehau
fc572e261f tcp: Don't "negotiate" MSS.
_NO_ OSes actually "negotiate" MSS.

RFC 879:
"... This Maximum Segment Size (MSS) announcement (often mistakenly
called a negotiation) ..."

This negotiation behaviour was introduced 11 years ago by r159955
without any explaination about why FreeBSD had to "negotiate" MSS:

    In syncache_respond() do not reply with a MSS that is larger than what
    the peer announced to us but make it at least tcp_minmss in size.

    Sponsored by:   TCP/IP Optimization Fundraise 2005

The tcp_minmss behaviour is still kept.

Syncookie fix was prodded by tuexen, who also helped to test this
patch w/ packetdrill.

Reviewed by:	tuexen, karels, bz (previous version)
MFC after:	2 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12430
2017-09-27 05:52:37 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Michael Tuexen
8cb5a8e90a Fix the ICMP6 handling for TCP.
The ICMP6 packets might not be contained in a single mbuf. So don't
assume this. Keep the IPv4 and IPv6 code in sync and make explicit
that the syncache code only need the TCP sequence number, not the
complete TCP header.

MFC after:		3 days
Sponsored by:		Netflix, Inc.
2017-06-03 21:53:58 +00:00
Michael Tuexen
75e7a91649 Represent "a syncache overflow hasn't happend yet" by using
-(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is
good enough and works when time_t is int32 or int64.
This fixes the issue reported by cy@ on i386.

Reported by:	cy
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2017-04-21 06:05:34 +00:00
Michael Tuexen
190d9abce7 Syncoockies can be used in combination with the syncache. If the cache
overflows, syncookies are used.
This patch restricts the usage of syncookies in this case: accept
syncookies only if there was an overflow of the syncache recently.
This mitigates a problem reported in PR217637, where is syncookie was
accepted without any recent drops.
Thanks to glebius@ for suggesting an improvement.

PR:			217637
Reviewed by:		gnn, glebius
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D10272
2017-04-20 19:19:33 +00:00
Gleb Smirnoff
cc65eb4e79 Hide struct inpcb, struct tcpcb from the userland.
This is a painful change, but it is needed.  On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD.  We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.

Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
  kernel structures inpcb and tcpcb inside.  Export into these structures
  the fields from inpcb and tcpcb that are known to be used, and put there
  a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.

Reviewed by:	rrs, gnn
Differential Revision:	D10018
2017-03-21 06:39:49 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Hiren Panchasara
6134aabe38 Add a knob to change default behavior of inheriting listen socket's tcp stack
regardless of what the default stack for the system is set to.

With current/default behavior, after changing the default tcp stack, the
application needs to be restarted to pick up that change. Setting this new knob
net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that
behavior and make any new connection use the newly selected default tcp stack.

Reviewed by:		rrs
MFC after:		2 weeks
Sponsored by:		Limelight Networks
2017-01-27 23:10:46 +00:00
Gleb Smirnoff
030b9c2f69 Remove assigned only variable. 2016-12-21 22:47:10 +00:00
Hiren Panchasara
2806b2933b For RTT calculations mid-session, we explicitly ignore ACKs with tsecr of 0 as
many borken middle-boxes tend to do that. But during 3whs, in syncache_expand(),
we don't do that which causes us to send a RST to such a client. Relax this
constraint by only using tsecr to compare against timestamp that we sent when it
is not 0. As a result, we'd now accept the final ACK of 3whs with tsecr of 0.

Reviewed by:	    jtl, gnn
Sponsored by:	    Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D8552
2016-11-21 20:53:11 +00:00
Julien Charbon
f1ee30ccd6 Remove an extraneous call to soisconnected() in syncache_socket(),
introduced with r261242.  The useful and expected soisconnected()
call is done in tcp_do_segment().

Has been found as part of unrelated PR:212920 investigation.

Improve slightly (~2%) the maximum number of TCP accept per second.

Tested by:		kevin.bowling_kev009.com, jch
Approved by:		gnn, hiren
MFC after:		1 week
Sponsored by:		Verisign, Inc
Differential Revision:	https://reviews.freebsd.org/D8072
2016-10-26 15:19:18 +00:00
Patrick Kelsey
09c305eb65 Fix cases where the TFO pending counter would leak references, and eventually, memory.
Also renamed some tfo labels and added/reworked comments for clarity.

Based on an initial patch from jtl.

PR: 213424
Reviewed by:	jtl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D8235
2016-10-15 01:41:28 +00:00
Jonathan T. Looney
68bd7ed102 The TFO server-side code contains some changes that are not conditioned on
the TCP_RFC7413 kernel option. This change removes those few instructions
from the packet processing path.

While not strictly necessary, for the sake of consistency, I applied the
new IS_FASTOPEN macro to all places in the packet processing path that
used the (t_flags & TF_FASTOPEN) check.

Reviewed by:	hiren
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D8219
2016-10-12 19:06:50 +00:00
Julien Charbon
c1b19923a3 Fix an issue with accept_filter introduced with r261242:
As a side effect of r261242 when using accept_filter the
first call to soisconnected() is done earlier in tcp_input()
instead of tcp_do_segment() context.  Restore the expected behaviour.

Note:  This call to soisconnected() seems to be extraneous in all
cases (with or without accept_filter).  Will be addressed in a
separate commit.

PR:			212920
Reported by:		Alexey
Tested by:              Alexey, jch
Sponsored by:           Verisign, Inc.
MFC after:		1 week
2016-09-29 11:18:48 +00:00
Randall Stewart
587d67c008 Here we update the modular tcp to be able to switch to an
alternate TCP stack in other then the closed state (pre-listen/connect).
The idea is that *if* that is supported by the alternate stack, it
is asked if its ok to switch. If it approves the "handoff" then we
allow the switch to happen. Also the fini() function now gets a flag
to tell if you are switching away *or* the tcb is destroyed. The
init() call into the alternate stack is moved to the end so the
tcb is more fully formed before the init transpires.

Sponsored by:	Netflix Inc.
Differential Revision:	D6790
2016-08-16 15:11:46 +00:00
Sepherosa Ziehau
e6ec45f869 tcp/syncache: Add comment for syncache_respond
Suggested by:	hiren, hps
Reviewed by:	sbruno
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6148
2016-05-10 04:59:04 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Sepherosa Ziehau
9340a8d5b9 tcp/syncache: Set flowid and hash type properly for SYN|ACK
So the underlying drivers can use it to select the sending queue
properly for SYN|ACK instead of rolling their own hash.

Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6120
2016-04-29 07:23:08 +00:00
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Bjoern A. Zeeb
dc95d65555 Mfp: r296309
While there is no dependency interaction, stopping the timer before
freeing the rest of the resources seems more natural and avoids it
being scheduled an extra time when it is no longer needed.

Reviewed by:	gnn, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5733
2016-04-09 10:51:07 +00:00
Gleb Smirnoff
bf840a1707 Redo r294869. The array of counters for TCP states doesn't belong to
struct tcpstat, because the structure can be zeroed out by netstat(1) -z,
and of course running connection counts shouldn't be touched.

Place running connection counts into separate array, and provide
separate read-only sysctl oid for it.
2016-03-15 00:15:10 +00:00
Gleb Smirnoff
75dd79d937 Grab a snap amount of TCP connections in syncache from tcpstat. 2016-01-27 00:48:05 +00:00
Gleb Smirnoff
57a78e3bae Augment struct tcpstat with tcps_states[], which is used for book-keeping
the amount of TCP connections by state.  Provides a cheap way to get
connection count without traversing the whole pcb list.

Sponsored by:	Netflix
2016-01-27 00:45:46 +00:00
Patrick Kelsey
281a0fd4f9 Implementation of server-side TCP Fast Open (TFO) [RFC7413].
TFO is disabled by default in the kernel build.  See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.

Reviewed by:	gnn, jch, stas
MFC after:	3 days
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D4350
2015-12-24 19:09:48 +00:00
Randall Stewart
55bceb1e2b First cut of the modularization of our TCP stack. Still
to do is to clean up the timer handling using the async-drain.
Other optimizations may be coming to go with this. Whats here
will allow differnet tcp implementations (one included).
Reviewed by:	jtl, hiren, transports
Sponsored by:	Netflix Inc.
Differential Revision:	D4055
2015-12-16 00:56:45 +00:00
Gleb Smirnoff
388909a12a Use Jenkins hash for TCP syncache.
o Unlike xor, in Jenkins hash every bit of input affects virtually
  every bit of output, thus salting the hash actually works. With
  xor salting only provides a false sense of security, since if
  hash(x) collides with hash(y), then of course, hash(x) ^ salt
  would also collide with hash(y) ^ salt. [1]
o Jenkins provides much better distribution than xor, very close to
  ideal.

TCP connection setup/teardown benchmark has shown a 10% increase
with default hash size, and with bigger hashes that still provide
possibility for collisions. With enormous hash size, when dataset is
by an order of magnitude smaller than hash size, the benchmark has
shown 4% decrease in performance decrease, which is expected and
acceptable.

Noticed by:	Jeffrey Knockel <jeffk cs.unm.edu> [1]
Benchmarks by:	jch
Reviewed by:	jch, pkelsey, delphij
Security:	strengthens protection against hash collision DoS
Sponsored by:	Nginx, Inc.
2015-09-05 10:15:19 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Hiren Panchasara
ec446b1375 Make syncookie_mac() use 'tcp_seq irs' in computing hash.
This fixes what seems like a simple oversight when the function was added in
r253210.

Reported by:            Daniel Borkmann <dborkman@redhat.com>
                        Florian Westphal <fw@strlen.de>
Differential Revision:  https://reviews.freebsd.org/D1628
Reviewed by:            gnn
MFC after:              1 month
Sponsored by:           Limelight Networks
2015-01-30 17:29:07 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Andrey V. Elsukov
7e4217558c Fix typo. 2014-10-31 11:40:49 +00:00
Alexander V. Chernikov
29c47f18da * Split tcp_signature_compute() into 2 pieces:
- tcp_get_sav() - SADB key lookup
 - tcp_signature_do_compute() - actual computation
* Fix TCP signature case for listening socket:
  do not assume EVERY connection coming to socket
  with TCP_SIGNATURE set to be md5 signed regardless
  of SADB key existance for particular address. This
  fixes the case for routing software having _some_
  BGP sessions secured by md5.
* Simplify TCP_SIGNATURE handling in tcp_input()

MFC after:	2 weeks
2014-09-27 07:04:12 +00:00