Commit Graph

5601 Commits

Author SHA1 Message Date
Conrad Meyer
bac5bedf44 tcp_usrreq: Free allocated buffer in relock case
The disgusting macro INP_WLOCK_RECHECK may early-return.  In
tcp_default_ctloutput() the TCP_CCALGOOPT case allocates memory before invoking
this macro, which may leak memory.

Add a _CLEANUP variant that takes a code argument to perform variable cleanup
in the early return path.  Use it to free the 'pbuf' allocated in
tcp_default_ctloutput().

I am not especially happy with this macro, but I reckon it's not any worse than
INP_WLOCK_RECHECK already was.

Reported by:	Coverity
CID:		1350286
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 23:02:18 +00:00
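Note (illustrative sketch, not the committed code): the idea of a _CLEANUP variant can be shown with a macro that runs caller-supplied code before its early return. All names here (`struct conn`, `RECHECK_CLEANUP`, `set_option`, `live_buffers`) are hypothetical stand-ins, and locking is omitted.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
struct conn {
	int dropped;		/* set when the connection went away */
};

static int live_buffers;	/* outstanding allocations, to make leaks visible */

/* Early-return check that runs caller-supplied cleanup code first. */
#define	RECHECK_CLEANUP(c, cleanup) do {				\
	if ((c)->dropped) {						\
		cleanup;						\
		return (-1);						\
	}								\
} while (0)

static int
set_option(struct conn *c, size_t len)
{
	char *pbuf;

	pbuf = malloc(len);
	if (pbuf == NULL)
		return (-2);
	live_buffers++;

	/* Without the cleanup argument, pbuf would leak on early return. */
	RECHECK_CLEANUP(c, { free(pbuf); live_buffers--; });

	/* ... use pbuf while the connection is still valid ... */
	free(pbuf);
	live_buffers--;
	return (0);
}
```

The cleanup argument is pasted verbatim into the early-return path, so it must not contain a top-level comma.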
Michael Tuexen
7e372b1a40 Remove a function, which is not used anymore. 2016-04-23 09:15:58 +00:00
Jonathan T. Looney
b8c2cd15e9 Prevent underflows in tp->snd_wnd if the remote side ACKs more than
tp->snd_wnd. This can happen, for example, when the remote side responds to
a window probe by ACKing the one byte it contains.

Differential Revision:	https://reviews.freebsd.org/D5625
Reviewed by:	hiren
Obtained from:	Juniper Networks (earlier version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2016-04-21 15:06:53 +00:00
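Note (illustrative sketch, not the committed code): with an unsigned window, subtracting more than the window holds wraps around to a huge value. A clamped subtraction avoids that. The helper name `wnd_after_ack` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: shrink an unsigned send window by the number of
 * bytes acknowledged, clamping at zero instead of wrapping around.
 */
static uint32_t
wnd_after_ack(uint32_t snd_wnd, uint32_t acked)
{
	if (acked > snd_wnd)
		return (0);
	return (snd_wnd - acked);
}
```

For a zero window and a one-byte window probe that gets ACKed, a plain `snd_wnd - acked` would yield UINT32_MAX, while the clamped form yields 0.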
Pedro F. Giffuni
63b6b7a74a Indentation issues.
Contract some lines leftover from r298310.

Mea culpa.
2016-04-20 16:19:44 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change; only trivial cases are done in this sweep.

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
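Note (illustrative sketch): nitems() counts the elements of a true array at compile time. The definition below is believed to match the usual one in sys/param.h and is guarded so it only applies when the header has not provided it. The array `weights` and the function `sum_weights` are hypothetical.

```c
#include <stddef.h>

#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

static const int weights[] = { 3, 5, 8 };

static int
sum_weights(void)
{
	int total = 0;

	/* The bound follows the array definition; no magic number needed. */
	for (size_t i = 0; i < nitems(weights); i++)
		total += weights[i];
	return (total);
}
```

The macro only works on arrays, not on pointers that an array has decayed into.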
Michael Tuexen
b1deed45e6 Address issues found by the XCode code analyzer. 2016-04-18 20:16:41 +00:00
Michael Tuexen
f8ee69bf81 Fix signed/unsigned warnings. 2016-04-18 11:39:41 +00:00
Michael Tuexen
a39ddef038 Fix a warning about an unused variable. 2016-04-18 09:39:46 +00:00
Michael Tuexen
98d5fd976b Put panic() calls under INVARIANTS. 2016-04-18 09:29:14 +00:00
Michael Tuexen
f2ea2a2d5f Cleanup debug output. 2016-04-18 06:58:07 +00:00
Michael Tuexen
e187bac213 Don't use anonymous unions. 2016-04-18 06:38:53 +00:00
Michael Tuexen
24a9e1b53b Remove a left-over debug printf(). 2016-04-18 06:32:24 +00:00
Michael Tuexen
b9dd6a90b6 Fix the ICMP6 handling for SCTP.
Keep the IPv4 code in sync.

MFC after:	1 week
2016-04-16 21:34:49 +00:00
Pedro F. Giffuni
99d628d577 netinet: for pointers replace 0 with NULL.
These are mostly cosmetic, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae, tuexen
2016-04-15 15:46:41 +00:00
Andrey V. Elsukov
2acdf79f53 Add External Actions KPI to ipfw(9).
It allows implementing loadable kernel modules with new actions
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and a keyword string that will be used
as the action name. Using the generic syntax, a user can add rules with
this action. Also, ipfw(8) can be easily modified to extend the basic
syntax for external actions that become part of the base system.
Sample modules will be coming soon.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 22:51:23 +00:00
Michael Tuexen
4d6b853ad6 Allow the handling of ICMP messages sent in response to SCTP packets
containing an INIT chunk. These need to be handled in case the peer
does not support SCTP and returns an ICMP message indicating destination
unreachable, protocol unreachable.

MFC after:	1 week
2016-04-14 19:59:21 +00:00
Michael Tuexen
f77b842746 When delivering an ICMP packet to the ctlinput function, ensure that
the outer IP header, the ICMP header, the inner IP header and the
first n bytes are stored in contiguous memory. The ctlinput functions
currently rely on this for n = 8. This fixes a bug in case the inner IP
header had options.
While there, remove the options from the outer header and provide a
way to increase n to allow improved ICMP handling for SCTP. This will
be added in another commit.

MFC after:	1 week
2016-04-14 19:51:29 +00:00
Luiz Otavio O Souza
de89d74b70 Do not overwrite the dchg variable.
It does not cause any real issues because the variable is overwritten
only when the packet is forwarded (and the variable is not used anymore).

Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2016-04-14 18:57:30 +00:00
Michael Tuexen
08b9595770 Refactor the handling of ICMP/IPv4 packets for SCTP/IPv4.
This cleans up the code and prepares upcoming handling of ICMP/IPv4 packets
for SCTP/UDP/IPv4 packets. IPv6 changes will follow...

MFC after:	3 days
2016-04-12 21:40:54 +00:00
Michael Tuexen
cf4476eb39 When processing an ICMP packet containing an SCTP packet, it
is required to check the verification tag. However, this
requires the verification tag to be not 0. Enforce this.
For packets with a verification tag of 0, we need to
check if it contains an INIT chunk and use the initiate
tag for the validation. This will be a separate commit,
since it touches also other code.

MFC after: 1 week
2016-04-12 11:48:54 +00:00
Bjoern A. Zeeb
806929d514 Mfp: r296310,r296343
It looks as if, with the safety belt of DELAY() fastened (*), we can
completely tear down and free all memory for TCP (after r281599).

(*) in theory a few ticks should be good enough to make sure the timers
are all really gone. Could we use a better metric here and check a
tcbcb count as an optimization?

PR:		164763
Reviewed by:	gnn, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5734
2016-04-09 12:05:23 +00:00
Bjoern A. Zeeb
8586a9635f Mfp: r296260
The tcp_inpcb (pcbinfo) zone should be safe to destroy.

PR:		164763
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5732
2016-04-09 11:27:47 +00:00
Bjoern A. Zeeb
f254aeda60 Mfp: r296259
We attach the "counter" to the tcpcbs. Thus don't free the
TCP Fastopen zone before the tcpcbs are gone, as otherwise
the zone won't be empty.
With that it should be safe to destroy the "tfo" zone without
leaking the memory.

PR:		164763
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5731
2016-04-09 10:58:08 +00:00
Bjoern A. Zeeb
dc95d65555 Mfp: r296309
While there is no dependency interaction, stopping the timer before
freeing the rest of the resources seems more natural and avoids it
being scheduled an extra time when it is no longer needed.

Reviewed by:	gnn, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5733
2016-04-09 10:51:07 +00:00
Bjoern A. Zeeb
e18b26d377 Mfp: r296345
No need to keep type stability on raw sockets zone.
We've also been running with a KASSERT since r222488 to make sure the
ipi_count is 0 on destroy.

PR:		164763
Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5735
2016-04-09 10:44:57 +00:00
Bjoern A. Zeeb
4c86b2bc13 Mfp: r296346
No reason identified to keep UMA_ZONE_NOFREE here.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5736
2016-04-09 10:39:54 +00:00
Randall Stewart
9d18771f69 A couple of minor changes that I missed that Michael had done, most noted
in these is the change to non-strict ordering for incoming data (this will
make pkt-drill test 14 fail, but that is expected).
2016-04-07 09:34:41 +00:00
Randall Stewart
44249214d3 This is work done by Michael Tuexen and myself at the IETF. This
adds the new I-Data (Interleaved Data) message. This allows a user
to have complete freedom from the Head Of Line blocking that
was previously there due to the inability to send multiple large
messages without the TSNs being in sequence. The code has been
tested with Michael's various packet drill scripts as well as
inter-networking between the IETF's location in Argentina and Germany.
2016-04-07 09:10:34 +00:00
Michael Tuexen
e2823e8570 Set the chunk id for ERROR chunks.
This is work with rrs@.
MFC after:	1 week
2016-04-01 20:38:15 +00:00
Sepherosa Ziehau
1ea448225c tcp/lro: Change SLIST to LIST, so that removing an entry is O(1)
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.

Reviewed by:	rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5765
2016-04-01 06:43:05 +00:00
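Note (illustrative sketch, not the committed code): removal from a singly-linked list has to walk from the head to find the predecessor, while a doubly-linked entry can unlink itself directly. The hand-written list below mirrors the idea behind the queue(3) LIST macros without using them. All names are hypothetical.

```c
#include <stddef.h>

struct entry {
	struct entry *next;
	struct entry **prevp;	/* address of the pointer that points at us */
	int id;
};

struct entry_list {
	struct entry *first;
};

static void
list_insert_head(struct entry_list *l, struct entry *e)
{
	e->next = l->first;
	if (e->next != NULL)
		e->next->prevp = &e->next;
	l->first = e;
	e->prevp = &l->first;
}

/* O(1): no walk from the head is needed to find the predecessor. */
static void
list_remove(struct entry *e)
{
	if (e->next != NULL)
		e->next->prevp = e->prevp;
	*e->prevp = e->next;
}

static int
list_length(const struct entry_list *l)
{
	int n = 0;

	for (const struct entry *e = l->first; e != NULL; e = e->next)
		n++;
	return (n);
}
```

The cost is one extra pointer per entry, which is usually a fair trade when entries are removed often.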
Sepherosa Ziehau
6dd38b8716 tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c

Reviewed by:	gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5725
2016-04-01 06:28:33 +00:00
George V. Neville-Neil
ce223fb715 Unbreak the RSS/PCBGROUP build. 2016-03-31 00:53:23 +00:00
Edward Tomasz Napierala
35030a5dd4 Remove some NULL checks for M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-03-29 13:56:59 +00:00
Michael Tuexen
a08b29253d Don't allow the user to set a peer primary which is restricted
and not pending.

MFC after: 1 week
2016-03-28 19:32:13 +00:00
Michael Tuexen
76f8482a93 Restrict local addresses until they are acked by the peer.
MFC after: 1 week
2016-03-28 19:31:10 +00:00
Michael Tuexen
5114dccbd4 Trigger sending of queued ASCONF chunks if outstanding ones are ACKED.
MFC after:	1 week
2016-03-28 11:32:20 +00:00
Michael Tuexen
9a8e308861 Improve compilation on windows 64-bit (for the userland stack).
MFC after:	1 week
2016-03-27 10:04:25 +00:00
Sepherosa Ziehau
489f0c3c17 tcp/lro: Return TCP_LRO_NO_ENTRIES if we are short of LRO entries.
So that callers could react accordingly.

Reviewed by:	gallatin (no objection)
MFC after:	1 week
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5695
2016-03-25 02:54:13 +00:00
Bjoern A. Zeeb
4f321dbd1c Fix compile errors after r297225:
- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architectures
  using gcc happy [-Wreturn-type]
  "type qualifiers ignored on function return type"
  I am not entirely happy with this solution putting the u_int there
  but it will do for now.
2016-03-24 11:40:10 +00:00
George V. Neville-Neil
84cc0778d0 FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
Michael Tuexen
ed65436366 Add const to several constants. Thanks to Nicholas Nethercote for
providing the patch via
https://bugzilla.mozilla.org/show_bug.cgi?id=1255655

MFC after:	1 week
2016-03-23 13:28:04 +00:00
Jonathan T. Looney
5d20f97461 to_flags is currently a 64-bit integer; however, we only use 7 bits.
Furthermore, there is no reason this needs to be a 64-bit integer
for the foreseeable future.

Also, there is an inconsistency between to_flags and the mask in
tcp_addoptions(). Before r195654, to_flags was a u_long and the mask in
tcp_addoptions() was a u_int. r195654 changed to_flags to be a u_int64_t
but left the mask in tcp_addoptions() as a u_int, meaning that these
variables will only be the same width on platforms with 64-bit integers.

Convert both to_flags and the mask in tcp_addoptions() to be explicitly
32-bit variables. This may save a few cycles on 32-bit platforms, and
avoids unnecessarily mixing types.

Differential Revision:	https://reviews.freebsd.org/D5584
Reviewed by:	hiren
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2016-03-22 15:55:17 +00:00
Hans Petter Selasky
d4d32b9fec Fix kernel build after adding new sysctl asserts in r296933. 2016-03-16 10:42:24 +00:00
Gleb Smirnoff
bf840a1707 Redo r294869. The array of counters for TCP states doesn't belong to
struct tcpstat, because the structure can be zeroed out by netstat(1) -z,
and of course running connection counts shouldn't be touched.

Place running connection counts into separate array, and provide
separate read-only sysctl oid for it.
2016-03-15 00:15:10 +00:00
Gleb Smirnoff
2f06d2ab91 Comment fix: statistics are not read-only. 2016-03-14 18:06:59 +00:00
Bjoern A. Zeeb
19edab1711 Remove duplicate external declaration of tcprexmtthresh making
gcc compiles barf.
2016-03-13 21:26:18 +00:00
John Baldwin
47cedcbd72 Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem.
Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5515
2016-03-11 23:18:06 +00:00
Michael Tuexen
1fabc43e9f Actually send an ASCONF chunk, not only queue one.
MFC after: 3 days
2016-03-10 00:27:10 +00:00
Randall Stewart
ec64c84ddc Fix a sneaky bug where we were missing an extern
to get the rxt threshold.. and thus created our own defaulted to 0 :-(

Sponsored by:	Netflix Inc
2016-03-08 00:16:34 +00:00
Jonathan T. Looney
737d4f6c93 As reported on the transport@ and current@ mailing lists, the FreeBSD TCP
stack is not compliant with RFC 7323, which requires that TCP stacks send
a timestamp option on all packets (except, optionally, RSTs) after the
session is established.

This patch adds that support. It also adds a TCP signature option to the
packet, if appropriate.

PR:		206047
Differential Revision:	https://reviews.freebsd.org/D4808
Reviewed by:	hiren
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2016-03-07 15:00:34 +00:00
Jonathan T. Looney
9cbade8feb Some cleanup in tcp_respond() in preparation for another change:
- Reorder variables by size
- Move initializer closer to where it is used
- Remove unneeded variable

Differential Revision:	https://reviews.freebsd.org/D4808
Reviewed by:	hiren
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2016-03-07 14:59:49 +00:00
George V. Neville-Neil
e79cb051d5 Fix dtrace probes (introduced in 287759): debug__input was used
for output and drop; connect didn't always fire a user probe;
some probes were missing in fastpath.

Submitted by:	Hannes Mehnert
Sponsored by:	REMS, EPSRC
Differential Revision:	https://reviews.freebsd.org/D5525
2016-03-03 17:46:38 +00:00
Bryan Drewery
6971a63795 Fix build after r29592. 2016-02-23 21:21:47 +00:00
Randall Stewart
6e0efc6a39 This fixes the fastpath code to have a better module initialization sequence when
included in loader.conf. It also ensures that even if someone incorrectly
specifies a load order, the lists and such will be initialized on demand at that
time so no one can make that mistake.

Reviewed by:	hiren
Differential Revision:	D5189
2016-02-23 17:53:39 +00:00
Michael Tuexen
64a3a6304e Use the SCTP level pointer, not the interface level.
MFC after:	3 days
2016-02-19 11:25:18 +00:00
Michael Tuexen
861f6d1196 Add protection code.
MFC after:	3 days
CID:		748858
2016-02-18 21:33:10 +00:00
Michael Tuexen
fdc4c9d067 Add some protection code.
CID:		1331893
MFC after:	3 days
2016-02-18 21:21:45 +00:00
Sepherosa Ziehau
7ae3d4bf54 tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit
ACK aggregation limit is append count based, while the TCP data segment
aggregation limit is length based.  Unless the network driver sets these
two limits, it's a NO-OP.

Reviewed by:	adrian, gallatin (previous version), hselasky (previous version)
Approved by:	adrian (mentor)
MFC after:	1 week
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5185
2016-02-18 04:58:34 +00:00
Michael Tuexen
828318e155 Add protection code for issues reported by PVS / D5245.
MFC after:	3 days
2016-02-17 18:12:38 +00:00
Michael Tuexen
815f806b82 Code cleanup which will silence a warning in PVS / D5245. 2016-02-17 18:04:22 +00:00
Michael Tuexen
7b0fd8f2af Address a warning reported by D5245 / PVS.
MFC after:	3 days
2016-02-17 17:52:46 +00:00
Michael Tuexen
467f0d55b4 Whitespace changes. 2016-02-16 20:33:18 +00:00
Michael Tuexen
2b1c7de4d8 Improve the teardown of the SCTP stack.
Obtained from:	bz@
MFC after: 1 week
2016-02-16 19:36:25 +00:00
Michael Tuexen
e51963a7bb Loopback addresses are 127.0.0.0/8, not 127.0.0.1/32.
MFC after: 1 week
2016-02-11 22:29:39 +00:00
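Note (illustrative sketch, not the committed code): testing for loopback means comparing the top eight bits against 127 rather than comparing the whole address against 127.0.0.1. The helper name `is_loopback_v4` is hypothetical and takes the address in host byte order.

```c
#include <stdint.h>

/*
 * Hypothetical helper: true for any address in 127.0.0.0/8.
 * The address is expected in host byte order.
 */
static int
is_loopback_v4(uint32_t addr)
{
	return ((addr >> 24) == 127);
}
```

An equality test against 0x7f000001 would miss addresses such as 127.1.2.3, which are loopback as well.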
Michael Tuexen
b028cf319e Use 4 spaces instead of a tab. 2016-02-11 18:35:46 +00:00
Devin Teske
41c0ec9a16 Merge SVN r295220 (bz) from projects/vnet/
Fix a panic that occurs when a vnet interface is unavailable at the time the
vnet jail referencing said interface is stopped.

Sponsored by:	FIS Global, Inc.
2016-02-11 17:07:19 +00:00
Hans Petter Selasky
3e9470b721 Use a pair of ifs when comparing the 32-bit flowid integers so that
the sign bit doesn't cause an overflow. The overflow manifests itself
as a sorting index wrap around in the middle of the sorted array,
which is not a problem for the LRO code, but might be a problem for
the logic inside qsort().

Reviewed by:		gnn @
Sponsored by:		Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D5239
2016-02-11 10:03:50 +00:00
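Note (illustrative sketch, not the committed code): a comparator that returns the difference of two 32-bit values gives the wrong sign once the values are more than 2^31 apart. Two explicit comparisons avoid that. The names `cmp_flowid` and `sort_flowids` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Compare with a pair of ifs instead of returning (int)(x - y). */
static int
cmp_flowid(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	if (x > y)
		return (1);
	if (x < y)
		return (-1);
	return (0);
}

static void
sort_flowids(uint32_t *ids, size_t n)
{
	qsort(ids, n, sizeof(ids[0]), cmp_flowid);
}
```

qsort() expects a consistent total order from its comparator; a subtraction that wraps breaks that contract.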
Gleb Smirnoff
b4b12e52fb Garbage collect unused arguments of m_init(). 2016-02-10 18:54:18 +00:00
Bjoern A. Zeeb
a5243af262 Code duplication but rib_head is special. Not found an easy way to go
back and harmonize the use cases among RIB, IPFW, PF yet but it's also not
the scope of this work.   Prevents instant panics on teardown and frees
the FIB bits again.

Sponsored by:	The FreeBSD Foundation
2016-02-03 21:56:51 +00:00
Bjoern A. Zeeb
2414e86439 MfH @r295202
Expect to see panics in routing code at least now.
2016-02-03 11:49:51 +00:00
Alfred Perlstein
7325dfbb59 Increase max allowed backlog for listen sockets
from short to int.

PR: 203922
Submitted by: White Knight <white_knight@2ch.net>
MFC After: 4 weeks
2016-02-02 05:57:59 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Michael Tuexen
5322a0968e Add missing parentheses. This was reported by ccaughie via GitHub
for the userland stack.

MFC after: 3 days
2016-01-30 17:32:46 +00:00
Michael Tuexen
3cf729a920 Update the path mtu when turning on/off UDP encapsulation for SCTP.
MFC after: 3 days
2016-01-30 16:56:39 +00:00
Michael Tuexen
ca83f93c09 Don't allow a remote encapsulation port change during the
SCTP restart procedure.

MFC after: 3 days
2016-01-30 12:58:38 +00:00
Michael Tuexen
4edd31fc71 Don't change the remote UDP encapsulation port for SCTP packets
containing an INIT chunk.

MFC after: 3 days
2016-01-30 11:10:22 +00:00
Michael Tuexen
843d04a89e Ignore peer addresses in a consistent way also when checking for
new addresses during restart. If this is not done, restart doesn't
work when the local socket is IPv4 only and the peer uses
IPv4 and IPv6 addresses.

MFC after: 3 days.
2016-01-30 10:39:05 +00:00
Michael Tuexen
a4cab32319 Remove debug output which was committed by accident.
Thanks to Oliver Pinter for reporting.

MFC after: 3 days
X-MFC with: r294995
2016-01-28 23:12:12 +00:00
Michael Tuexen
79b67faaf6 Always look in the TCP pool.
This fixes issues with a restarting peer when the listening
1-to-1 style socket is closed.

MFC after: 3 days
2016-01-28 16:05:46 +00:00
Gleb Smirnoff
4644fda3f7 Rename netinet/tcp_cc.h to netinet/cc/cc.h.
Discussed with:	lstewart
2016-01-27 17:59:39 +00:00
Gleb Smirnoff
af6fef3abb Fix issues with TCP_CONGESTION handling after r294540:
o Return back the buf[TCP_CA_NAME_MAX] for TCP_CONGESTION,
  for TCP_CCALGOOPT use dynamically allocated *pbuf.
o For SOPT_SET TCP_CONGESTION, NUL-terminate the string
  taken from userland.
o For SOPT_SET TCP_CONGESTION do the search for the algorithm
  keeping the inpcb lock.
o For SOPT_GET TCP_CONGESTION first strlcpy() the name
  holding the inpcb lock into temporary buffer, then copyout.

Together with:	lstewart
2016-01-27 07:34:00 +00:00
Gleb Smirnoff
75dd79d937 Grab a snap amount of TCP connections in syncache from tcpstat. 2016-01-27 00:48:05 +00:00
Gleb Smirnoff
57a78e3bae Augment struct tcpstat with tcps_states[], which is used for book-keeping
the amount of TCP connections by state.  Provides a cheap way to get
connection count without traversing the whole pcb list.

Sponsored by:	Netflix
2016-01-27 00:45:46 +00:00
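Note (illustrative sketch, not the committed code): keeping one counter per state and adjusting it on every transition makes the per-state connection count available without walking a connection list. The enum values and the names `state_count`, `tconn_init` and `tconn_set_state` are hypothetical.

```c
#include <stdint.h>

enum tstate { ST_CLOSED, ST_LISTEN, ST_ESTABLISHED, ST_NSTATES };

/* One running counter per state. */
static uint64_t state_count[ST_NSTATES];

struct tconn {
	enum tstate state;
};

static void
tconn_init(struct tconn *c)
{
	c->state = ST_CLOSED;
	state_count[ST_CLOSED]++;
}

/* Move the connection between buckets on every state change. */
static void
tconn_set_state(struct tconn *c, enum tstate next)
{
	state_count[c->state]--;
	state_count[next]++;
	c->state = next;
}
```

Reading `state_count[ST_ESTABLISHED]` is then a single array access, whatever the number of connections.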
Gleb Smirnoff
d17d4c6b2a Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros. 2016-01-27 00:20:07 +00:00
Hiren Panchasara
0645c6049d Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.

Submitted by:		Jason Wolfe (j at nitrology dot com)
Reviewed by:		gnn, rrs, hiren, bz
MFC after:		1 week
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D5024
2016-01-26 16:33:38 +00:00
Alexander V. Chernikov
0d6a516eb8 Convert TCP mtu checks to the new routing KPI. 2016-01-25 10:06:49 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are a number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, the first 3 don't have _any_ requirements
  and the first 2 do not use radix locking. On the other hand, the routing
  structure does have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not know anything about its consumers' internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still use the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Bjoern A. Zeeb
70a0984741 sctp_asconf_iterator_end() has an unused second argument; compiles
better if you add it.

Sponsored by:	The FreeBSD Foundation
2016-01-23 12:56:28 +00:00
Bjoern A. Zeeb
d30c4f99ed Noisy comments (not sure if the static would be valid for all SCTP
implementations).

Reorder some cleanup just to match the general order we normally use.

Sponsored by:	The FreeBSD Foundation
2016-01-23 12:52:08 +00:00
Bjoern A. Zeeb
765cf0b825 Try to prevent an address (assoc) leak in one way or another when
sctp_initiate_iterator() fails.

Sponsored by:	The FreeBSD Foundation
2016-01-23 12:51:12 +00:00
Bjoern A. Zeeb
ce1d6b0efa Use sctp_asconf_iterator_end() rather than doing the cleanup manually.
Sponsored by:	The FreeBSD Foundation
2016-01-23 12:50:02 +00:00
Bjoern A. Zeeb
27a01c6c0c Try to catch a couple of SCTP teardown race conditions.
Saw all the printfs already.

Note: not sure the atomics are needed but without them, the condition
would never trigger, and we'd still see panics (which could have been
due to the insert race).  Will work my way backwards in case this stays
stable.

Sponsored by:	The FreeBSD Foundation
2016-01-23 11:05:13 +00:00
Bjoern A. Zeeb
eef5775f02 Fix build and avoid a double-free in the VIMAGE case.
Sponsored by:	The FreeBSD Foundation
2016-01-22 19:43:26 +00:00
Bjoern A. Zeeb
bb84e3d77d Correct function arguments for SYSUNINITs.
Sponsored by:	The FreeBSD Foundation
2016-01-22 18:39:23 +00:00
Bjoern A. Zeeb
1bbe967cc4 Correct function arguments for SYSUNINITs.
Obtained from:	p4 @180834
Sponsored by:	The FreeBSD Foundation
2016-01-22 18:37:17 +00:00
Bjoern A. Zeeb
4ce8702050 Correct function arguments for SYSUNINITs.
Add #ifdef VIMAGE, as in other cases it's dead code.

Obtained from:	p4 @180832
Sponsored by:	The FreeBSD Foundation
2016-01-22 18:35:11 +00:00
Bjoern A. Zeeb
8bdb5261e6 Correct function arguments for SYSUNINITs.
Obtained from:	p4 @180885
Sponsored by:	The FreeBSD Foundation
2016-01-22 18:29:02 +00:00
Bjoern A. Zeeb
9ff1c4634f Correct function arguments for SYSUNINITs.
Obtained from:	p4 @180886
Sponsored by:	The FreeBSD Foundation
2016-01-22 18:26:58 +00:00
Bjoern A. Zeeb
f2cf0121ca MFp4 @180887:
With pr_destroy being gone, call ip_destroy from an ordered
  VNET_SYSUNINT.  Make ip_destroy() static.

Sponsored by:	The FreeBSD Foundation
2016-01-22 18:22:03 +00:00
Bjoern A. Zeeb
009e81b164 MFH @r294567 2016-01-22 15:11:40 +00:00
Bjoern A. Zeeb
1f12da0e82 Just checkpoint the WIP in order to be able to make the tree update
easier.  Note:  this is currently not in a usable state as certain
teardown parts are not called and the DOMAIN rework is missing.
More to come soon and find its way to head.

Obtained from:	P4 //depot/user/bz/vimage/...
Sponsored by:	The FreeBSD Foundation
2016-01-22 15:00:01 +00:00
Gleb Smirnoff
d519cedbad Provide new socket option TCP_CCALGOOPT, which stands for TCP congestion
control algorithm options.  The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.

Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.

The new API doesn't yet have any in tree consumers.

The original code written by lstewart.
Reviewed by:	rrs, emax
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D711
2016-01-22 02:07:48 +00:00
Gleb Smirnoff
73e263b182 Refactor TCP_CONGESTION setsockopt handling:
- Use M_TEMP instead of stack variable.
- Unroll error handling, removing several levels of indentation.
2016-01-21 22:53:12 +00:00
Gleb Smirnoff
2de3e790f5 - Rename cc.h to more meaningful tcp_cc.h.
- Declare it a kernel only include, which it already is.
- Don't include tcp.h implicitly from tcp_cc.h
2016-01-21 22:34:51 +00:00
Gleb Smirnoff
b66d74c138 Cleanup TCP files from unnecessary interface related includes. 2016-01-21 22:24:20 +00:00
Bjoern A. Zeeb
df56caeeb1 The variable is write once only and not used.
Recover the vertical space.

Sponsored by:		The FreeBSD Foundation
MFC After:		3 days
Obtained from:		p4 CH=180830
Reviewed by:		gnn, hiren
Differential Revision:	https://reviews.freebsd.org/D4898
2016-01-21 17:25:41 +00:00
Hans Petter Selasky
e936121d31 Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets
  according to the hash type and flowid. This prevents exhaustion of
  the LRO entries due to too many connections at the same time.
  Testing using a larger number of higher bandwidth TCP connections
  showed that the incoming ACK packet aggregation rate increased from
  ~1.3:1 to almost 3:1. Another test showed that for a number of TCP
  connections greater than 16 per hardware receive ring, where 8 TCP
  connections was the LRO active entry limit, there was a significant
  improvement in throughput due to being able to fully aggregate more
  than 8 TCP streams. For very few very high bandwidth TCP streams, the
  optimizing LRO wrapper will add CPU usage instead of reducing CPU
  usage. This is expected. Network drivers which want to use the
  optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead
  of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
  "tcp_lro_flush()". Further the LRO control structure must be
  initialized using "tcp_lro_init_args()" passing a non-zero number
  into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
  statistics which can be prone to wrap-around. Fix this while at it
  and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying an LRO control structure,
  especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
  control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
  change of the LRO control structure size.

Sponsored by:	Mellanox Technologies
Reviewed by:	gallatin, sbruno, rrs, gnn, transport
Tested by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D4914
2016-01-19 15:33:28 +00:00
Michael Tuexen
c7e732ae61 Fix a bug in INIT handling on accepted 1-to-1 style sockets when the
listener is closed.
This fix allows the following packetdrill test to pass:
// Setup a connected, blocking 1-to-1 style socket
+0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3
// Check the handshake with an empty(!) cookie
+0.0 bind(3, ..., ...) = 0
+0.0 listen(3, 1) = 0
+0.0 < sctp: INIT[flgs=0, tag=1, a_rwnd=1500, os=1, is=1, tsn=1]
+0.0 > sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=..., os=..., is=..., tsn=1, ...]
+0.0 < sctp: COOKIE_ECHO[flgs=0, len=..., val=...]
+0.0 > sctp: COOKIE_ACK[flgs=0]
+0.0 accept(3, ..., ...) = 4
+0.0 close(3) = 0
// Inject an INIT chunk and expect an INIT-ACK
+0.0 < sctp: INIT[flgs=0, tag=3, a_rwnd=1500, os=1, is=1, tsn=1]
+0.0 > sctp: INIT_ACK[flgs=0, tag=..., a_rwnd=..., os=..., is=..., tsn=..., ...]

MFC after:	3 days
2016-01-15 00:26:15 +00:00
Michael Tuexen
ebee3dc229 Fail the SCTP_GET_ASSOC_NUMBER and SCTP_GET_ASSOC_ID_LIST
socket options for 1-to-1 style sockets as specified in RFC 6458.

MFC after:	3 days
2016-01-14 11:25:28 +00:00
Gleb Smirnoff
f73d9fd2f1 There is a bug in tcp_output()'s implementation of the TCP_SIGNATURE
(RFC 2385/TCP-MD5) kernel option.

If a tcpcb has TF_NOOPT flag, then tcp_addoptions() is not called,
and to.to_signature is an uninitialized stack variable. The value
is later used as write offset, which leads to writing to random
address.

Submitted by:	rstone, jtl
Security:	SA-16:05.tcp
2016-01-14 10:22:45 +00:00
Alexander V. Chernikov
10e0e23528 Remove now-unused wrappers for various routing functions. 2016-01-14 08:54:44 +00:00
Michael Tuexen
fa89f69240 Store the timer type for logging, because the timer can be freed
while processing the timeout.

MFC after:	3 days
2016-01-13 14:28:12 +00:00
Alexander V. Chernikov
59747033cd Bring RADIX_MPATH support to new routing KPI to ease migration.
Move actual rte selection process from rtalloc_mpath_fib()
  to the rt_path_selectrte() function. Add public
  rt_mpath_select() to use in fibX_lookup_ functions.
2016-01-11 08:45:28 +00:00
Alexander V. Chernikov
36402a681f Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
  to check if this particular mbuf needs to be dropped (and what
  error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
  regardless of RTF_HOST flag existence. This is due to upcoming routing
  changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
  to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
Most of the callers (forwarding, bpf, arp) do not care about the exact
  error value. In fact, the only place where this result is propagated
  is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
  RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
  rte flags to ro_flags. Call this function in ip_output() after looking up/
  verifying rte.

Reviewed by:	ae
2016-01-09 16:34:37 +00:00
Alexander V. Chernikov
ea8d14925c Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
Alexander V. Chernikov
f2b2e77a41 (Temporarily) remove route_redirect_event eventhandler.
Such handler should pass different set of variables, instead
  of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
  current code.
Will re-add it after finishing most major routing-related changes.

Discussed with:	np
2016-01-09 06:26:40 +00:00
Jonathan T. Looney
49b375e74b Apply the changes from r293284 to one additional file.
Discussed with:	glebius
2016-01-07 11:54:20 +00:00
Gleb Smirnoff
0c39d38d21 Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
  options length. The function gives a better estimate, since it takes
  into account SACK state as well.
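
As a rough illustration of "MSS reduced by options length", the sketch
below subtracts the padded timestamp and SACK option sizes from an
unadjusted MSS. It is a simplified model, not the tcp_maxseg() that was
committed: all SK_/sk_ names are hypothetical, and the padding of each
option to a 4-byte boundary and the clamping of SACK blocks to the 40-byte
option space are assumptions of the example.

```c
#include <stdbool.h>

#define	SK_OPT_TSTAMP_LEN	10u	/* timestamp option, unpadded */
#define	SK_OPT_SACK_HDR_LEN	2u	/* SACK option kind + length */
#define	SK_OPT_SACK_BLOCK_LEN	8u	/* one SACK block (two edges) */
#define	SK_OPT_SPACE_MAX	40u	/* maximum TCP option space */

/* Round an option length up to a multiple of 4 bytes. */
static unsigned
sk_pad4(unsigned len)
{
	return ((len + 3u) & ~3u);
}

/* Estimate the option bytes carried by a data segment. */
static unsigned
sk_options_len(bool timestamps, unsigned sack_blocks)
{
	unsigned len = 0, sack_len;

	if (timestamps)
		len += sk_pad4(SK_OPT_TSTAMP_LEN);
	/* Drop SACK blocks until the option fits in the remaining space. */
	for (; sack_blocks > 0; sack_blocks--) {
		sack_len = sk_pad4(SK_OPT_SACK_HDR_LEN +
		    sack_blocks * SK_OPT_SACK_BLOCK_LEN);
		if (len + sack_len <= SK_OPT_SPACE_MAX) {
			len += sack_len;
			break;
		}
	}
	return (len);
}

/* MSS left for payload once the options are accounted for. */
static unsigned
sk_effective_mss(unsigned maxseg, bool timestamps, unsigned sack_blocks)
{
	unsigned optlen = sk_options_len(timestamps, sack_blocks);

	return (maxseg > optlen ? maxseg - optlen : 0);
}
```

Callers that only need a coarse value (cwnd or ssthresh arithmetic, for
instance) could keep using the unadjusted MSS, while code that sizes real
segments would use the reduced value.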

Reviewed by:	jtl
Differential Revision:	https://reviews.freebsd.org/D3593
2016-01-07 00:14:42 +00:00
Michael Tuexen
79cadff48d Get struct sctp_net_route in sync with struct route again. 2016-01-04 20:34:40 +00:00
Alexander V. Chernikov
45a8de88c6 Maintain consistent behavior: make fib4_lookup_nh_ext() return
rt_ifp pointer by default, as done by other fib lookup functions.
2016-01-04 17:23:10 +00:00
Alexander V. Chernikov
9a1b64d5a0 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ their pointers saved in the passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor()).
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
Alexander V. Chernikov
65d2872948 Fix fib4_lookup_nh_ext() flags/flowid order messed up while merging. 2016-01-03 16:13:03 +00:00
Alexander V. Chernikov
6cdb18544d Remove second EVENTHANDLER_REGISTER slipped in r292978.
Describe the reason of doing unconditional M_PREPEND in ether_output().
2016-01-01 10:15:06 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with its length)
  to prepend to mbuf.

These two changes permit routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address changes are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which either
  returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of them have their header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Jonathan T. Looney
2d8868dbb7 When checking the inp_ip_minttl restriction for IPv6 packets, don't check
the IPv4 header.

CID:	1017920
Differential Revision:	https://reviews.freebsd.org/D4727
Reviewed by:	bz
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2015-12-29 19:20:39 +00:00
Allan Jude
7a3f5d11fb Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c
cperciva's libmd implementation is 5-30% faster

The same was done for SHA256 previously in r263218

cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation

Extend sbin/md5 to create sha384(1)

Chase dependencies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}

Reviewed by:	cperciva, des, delphij
Approved by:	secteam, bapt (mentor)
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3929
2015-12-27 17:33:59 +00:00
Michael Tuexen
1672adc7b1 Don't implicitly terminate a user message when moving it to the
send_queue and the socket is closed. This results in strange
race conditions for the application.
While there, remove a stray character.

MFC after: 3 days
2015-12-25 18:11:40 +00:00
Kevin Lo
ddb1359877 Fix typo (s/harware/hardware/) 2015-12-25 14:51:36 +00:00
Patrick Kelsey
281a0fd4f9 Implementation of server-side TCP Fast Open (TFO) [RFC7413].
TFO is disabled by default in the kernel build.  See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.

Reviewed by:	gnn, jch, stas
MFC after:	3 days
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D4350
2015-12-24 19:09:48 +00:00
Sergey Kandaurov
e62b9bca9a Fixed comment placement.
Before r12296, this comment described the udp_recvspace default value.

Spotted by:	ru
Sponsored by:	Nginx, Inc.
2015-12-24 13:57:43 +00:00
Bjoern A. Zeeb
616bc4f476 If bootverbose is enabled every vnet startup and virtual interface
creation will print extra lines on the console. We are generally not
interested in this (repeated) information for each VNET. Thus only
print it for the default VNET. Virtual interfaces on the base system
will remain printing information, but e.g. each loopback in each vnet
will no longer cause a "bpf attached" line.

Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4531
2015-12-22 15:00:04 +00:00
Bjoern A. Zeeb
0a03cf8ca6 Since r256624 we've been leaking routing table allocations
on vnet enabled jail shutdown. Call the provided cleanup
routines for IP versions 4 and 6 to plug these leaks.

Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4530
2015-12-22 14:53:19 +00:00
Jonathan T. Looney
c54a41def7 Fix a panic when launching VNETs after the commit of r292309.
Differential Revision:	https://reviews.freebsd.org/D4645
Reviewed by:	rrs
Reported by:	kp
Tested by:	kp
Sponsored by:	Juniper Networks
2015-12-22 13:41:50 +00:00
Michael Tuexen
fe4a59b30a Stop processing of a SACK when the association has been aborted.
MFC after: 3 days
2015-12-21 18:52:02 +00:00
Steven Hartland
d6e82913c1 Revert r292275 & r292379
glebius has concerns about these changes so reverting so those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Steven Hartland
3a909afe8e Fix issues introduced by r292275
* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by:	kp & melifaro
X-MFC-With:	r292275
MFC after:	1 month
Sponsored by:	Multiplay
2015-12-16 22:26:28 +00:00
Randall Stewart
f4e476c893 Remove redundant extern's that make the ppc compile fail.
Thanks Ed Maste for the heads up.
2015-12-16 15:16:44 +00:00
Alexander V. Chernikov
942e4b4b79 Fix ARP reply handling changed in r286955.
If source of ARP request didn't pass the routing check
(e.g. not in directly connected network), be polite and
still answer the request instead of dropping frame.

Reported by:	quadro at irc@rusnet
2015-12-16 09:16:06 +00:00
Randall Stewart
55bceb1e2b First cut of the modularization of our TCP stack. Still
to do is to clean up the timer handling using the async-drain.
Other optimizations may be coming to go with this. What's here
will allow different TCP implementations (one included).
Reviewed by:	jtl, hiren, transports
Sponsored by:	Netflix Inc.
Differential Revision:	D4055
2015-12-16 00:56:45 +00:00
Steven Hartland
52e53e2de0 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results in slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire an
ifnet_link_event, which is now monitored by rip and nd6.

Upon receiving these events each protocol triggers the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behaviour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
Hiren Panchasara
4d16338223 Clean up unused bandwidth entry in the TCP hostcache.
Submitted by:		Jason Wolfe (j at nitrology dot com)
Reviewed by:		rrs, hiren
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D4154
2015-12-11 06:22:58 +00:00
Michael Tuexen
9ee7a93696 Retire sctp_validate_no_locks().
This routine checks that there are no locks held for an inp,
without having any lock on the inp. This breaks if the inp
goes away when it is called. This happens on stress tests
on a RPi B+.

MFC after:	3 days
2015-12-10 11:49:32 +00:00
Hiren Panchasara
b87170f210 r290122 added 4 bytes and removed 8 in struct sackhint. Add a pad entry of 4
bytes to restore the size.

Spotted by:	rrs
Reviewed by:	rrs
X-MFC with:	r290122
Sponsored by:	Limelight Networks
2015-12-10 03:20:10 +00:00
Alexander V. Chernikov
9977be4a64 Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
  in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost during merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
  return IPv6 destination address with embedded scope. Currently
  rw_gateway has its scope embedded, do the same for non-gatewayed
  destinations.

Sponsored by:	Yandex LLC
2015-12-09 11:14:27 +00:00
Hiren Panchasara
a934d06194 Add an option to use rfc6675 based pipe/inflight bytes calculation in newreno.
MFC after:	    3 weeks
Sponsored by:	    Limelight Networks
2015-12-09 08:53:41 +00:00
Hiren Panchasara
f81bc34eac Add an option to use rfc6675 based pipe/inflight bytes calculation in cubic.
Reviewed by:		gnn
MFC after:		3 weeks
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D4205
2015-12-09 07:56:40 +00:00
Hiren Panchasara
021eaf7996 One of the ways to detect loss is to count duplicate acks coming back from the
other end till it reaches predetermined threshold which is 3 for us right now.
Once that happens, we trigger fast-retransmit to do loss recovery.

Main problem with the current implementation is that we don't honor SACK
information well to detect whether an incoming ack is a dupack or not. RFC6675
has latest recommendations for that. According to it, dupack is a segment that
arrives carrying a SACK block that identifies previously unknown information
between snd_una and snd_max even if it carries new data, changes the advertised
window, or moves the cumulative acknowledgment point.

With the prevalence of Selective ACK (SACK) these days, improper handling can
lead to delayed loss recovery.

With the fix, the new behavior looks like the following:

0) th_ack < snd_una --> ignore
Old acks are ignored.
1) th_ack == snd_una, !sack_changed --> ignore
Acks with SACK enabled but without any new SACK info in them are ignored.
2) th_ack == snd_una, window == old_window --> increment
Increment on a good dupack.
3) th_ack == snd_una, window != old_window, sack_changed --> increment
When SACK enabled, it's okay to have advertized window changed if the ack has
new SACK info.
4) th_ack > snd_una --> reset to 0
Reset to 0 when left edge moves.
5) th_ack > snd_una, sack_changed --> increment
Increment if left edge moves but there is new SACK info.

Here, sack_changed is the indicator that incoming ack has previously unknown
SACK info in it.

Note: This fix is not fully compliant to RFC6675. That may require a few
changes to current implementation in order to keep per-sackhole dupack counter
and change to the way we mark/handle sack holes.
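
The numbered cases above can be read as a small decision function. The
sketch below is one possible reading, not the code that was committed:
the sk_/SK_ names are hypothetical, sack_enabled is assumed to be a
separate input, and the branch for "no SACK, window changed" is an
assumption because the list does not cover that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence comparison, in the style of SEQ_LT(). */
#define	SK_SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

enum sk_dupack_action {
	SK_DUPACK_IGNORE,	/* leave the dupack counter untouched */
	SK_DUPACK_INCREMENT,	/* count this ACK as a duplicate */
	SK_DUPACK_RESET		/* set the dupack counter back to 0 */
};

static enum sk_dupack_action
sk_classify_ack(uint32_t th_ack, uint32_t snd_una, bool sack_enabled,
    bool sack_changed, uint32_t window, uint32_t old_window)
{
	/* Case 0: old ACKs are ignored. */
	if (SK_SEQ_LT(th_ack, snd_una))
		return (SK_DUPACK_IGNORE);

	if (th_ack == snd_una) {
		/* Cases 1 and 3: with SACK, only new SACK info counts. */
		if (sack_enabled)
			return (sack_changed ? SK_DUPACK_INCREMENT :
			    SK_DUPACK_IGNORE);
		/* Case 2: classic dupack, window unchanged. */
		if (window == old_window)
			return (SK_DUPACK_INCREMENT);
		/* Assumption: a plain window update is not a dupack. */
		return (SK_DUPACK_RESET);
	}

	/* Cases 4 and 5: left edge moved; new SACK info still counts. */
	return (sack_changed ? SK_DUPACK_INCREMENT : SK_DUPACK_RESET);
}
```

Once the counter driven by this classification reaches the threshold
(3 per the text above), fast retransmit would be triggered.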

PR:			203663
Reviewed by:		jtl
MFC after:		3 weeks
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D4225
2015-12-08 21:21:48 +00:00
Alexander V. Chernikov
65ff3638df Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionally, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happen under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
Michael Tuexen
c979034b18 Fix the allocation of outgoing streams:
* When processing a cookie, use the number of
  streams announced in the INIT-ACK.
* When sending an INIT-ACK for an existing
  association, use the value from the association,
  not from the end-point.

MFC after:	1 week
2015-12-06 16:17:57 +00:00