Commit Graph

4080 Commits

Author SHA1 Message Date
Michael Tuexen
936fc35bb3 Change the name of an internal structure, since the name
is used by a structure of the (new) SCTP API.

MFC after: 1 week.
2011-05-06 20:40:33 +00:00
Andrey V. Elsukov
318b735cc3 Convert delay parameter back to ms when reporting to user.
PR:		156838
MFC after:	1 week
2011-05-06 07:13:34 +00:00
Michael Tuexen
c3d72c80d3 Implement Resource Pooling V2 and an MPTCP like congestion
control.
Based on a patch received from Martin Becke.

MFC after: 2 weeks.
2011-05-04 21:27:05 +00:00
Michael Tuexen
274b0bd51d Remove code with any effect. 2011-05-03 20:34:02 +00:00
Michael Tuexen
1d663b4658 Add a missing break. This bug was introduced in r221249.
MFC after: 1 week
2011-05-03 20:32:21 +00:00
John Baldwin
f701e30d7f Handle a rare edge case with nearly full TCP receive buffers. If a TCP
buffer fills up causing the remote sender to enter into persist mode, but
there is still room available in the receive buffer when a window probe
arrives (either due to window scaling, or due to the local application
very slowing draining data from the receive buffer), then the single byte
of data in the window probe is accepted.  However, this can cause rcv_nxt
to be greater than rcv_adv.  This condition will only last until the next
ACK packet is pushed out via tcp_output(), and since the previous ACK
advertised a zero window, the ACK should be pushed out while the TCP
pcb is write-locked.

During the window while rcv_nxt is greather than rcv_adv, a few places
would compute the remaining receive window via rcv_adv - rcv_nxt.
However, this value was then (uint32_t)-1.  On a 64 bit machine this
could expand to a positive 2^32 - 1 when cast to a long.  In particular,
when calculating the receive window in tcp_output(), the result would be
that the receive window was computed as 2^32 - 1 resulting in advertising
a far larger window to the remote peer than actually existed.

Fix various places that compute the remaining receive window to either
assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
window as full if rcv_nxt is greather than rcv_adv.

Reviewed by:	bz
MFC after:	1 month
2011-05-02 21:05:52 +00:00
Michael Tuexen
ea5eba1157 Some more cleanups related to an kernel without INET.
MFC after: 1 week
2011-05-02 15:53:00 +00:00
Bjoern A. Zeeb
29bd2010d4 Fix a mismerge from p4 in that in_localaddr() is not available without INET.
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 16:30:18 +00:00
Michael Tuexen
d085528d04 Remove some leftover debug code.
MFC after: 1 week
2011-04-30 11:22:30 +00:00
Bjoern A. Zeeb
b287c6c70c Make the TCP code compile without INET. Sort #includes and add #ifdef INETs.
Add some comments at #endifs given more nestedness.  To make the compiler
happy, some default initializations were added in accordance with the style
on the files.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:21:29 +00:00
Michael Tuexen
e6194c2ed4 Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
  IPv6.
* Checking for additional addresses takes the correct address
  into account and also does not do more comparisons than
  necessary.

This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.

MFC after: 1 week
2011-04-30 11:18:16 +00:00
Bjoern A. Zeeb
79288c112c Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only
as well compiling out most functions adding or extending #ifdef INET
coverage.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:17:00 +00:00
Bjoern A. Zeeb
67107f4594 Make the PCB code compile without INET support by adding #ifdef INETs
and correcting few #includes.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:04:34 +00:00
John Baldwin
672dc4aea2 TCP reuses t_rxtshift to determine the backoff timer used for both the
persist state and the retransmit timer.  However, the code that implements
"bad retransmit recovery" only checks t_rxtshift to see if an ACK has been
received in during the first retransmit timeout window.  As a result, if
ticks has wrapped over to a negative value and a socket is in the persist
state, it can incorrectly treat an ACK from the remote peer as a
"bad retransmit recovery" and restore saved values such as snd_ssthresh and
snd_cwnd.  However, if the socket has never had a retransmit timeout, then
these saved values will be zero, so snd_ssthresh and snd_cwnd will be set
to 0.

If the socket is in fast recovery (this can be caused by excessive
duplicate ACKs such as those fixed by 220794), then each ACK that arrives
triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd
to be no larger than snd_ssthresh.  In effect, the socket's send window
is permamently stuck at 0 even though the remote peer is advertising a
much larger window and pending data is only sent via TCP window probes
(so one byte every few seconds).

Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that
the various snd_*_prev fields in the pcb are valid and only perform
"bad retransmit recovery" if this flag is set in the pcb.  The flag is set
on the first retransmit timeout that occurs and is cleared on subsequent
retransmit timeouts or when entering the persist state.

Reviewed by:	bz
MFC after:	2 weeks
2011-04-29 15:40:12 +00:00
Bjoern A. Zeeb
b8e463e644 MfP4 CH=192029:
Expose ip_icmp.c to INET6 as well and only export badport_bandlim()
along with the two sysctls in the non-INET case.
The bandlim types work for all cases I reviewed in IPv6 as well and
the sysctls are available as we export net.inet.* from in_proto.c.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:36:35 +00:00
Bjoern A. Zeeb
74e9dcf786 MfP4 CH=192004:
Move ip_defttl to raw_ip.c where it is actually used.  In an IPv6
only world we do not want to compile ip_input.c in for that and
it is a shared default with INET6.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:32:27 +00:00
Bjoern A. Zeeb
a0ae8f04e8 Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs.  For module builds the framework needs
adjustments for at least carp.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:30:44 +00:00
Attilio Rao
2903309aca Add the possibility to verify MD5 hash of incoming TCP packets.
As long as this is a costy function, even when compiled in (along with
the option TCP_SIGNATURE), it can be disabled via the
net.inet.tcp.signature_verify_input sysctl.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, bz
MFC after:	2 weeks
2011-04-25 17:13:40 +00:00
Bjoern A. Zeeb
acaeca65b3 Be less strict on includes than in r220746. We need in.h for both
INET or INET6 as it holds all the IPPROTO_* definitions needed
for the SYSCTL_NODE definitions.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	5 days
2011-04-25 16:36:16 +00:00
Gleb Smirnoff
acdef0460e Use size_t for sopt_valsize.
Submitted by:	Brandon Gooch <jamesbrandongooch gmail.com>
2011-04-21 08:18:55 +00:00
Bjoern A. Zeeb
00c081e908 MFp4 CH=191760:
When compiling out INET we still need the initialization routines
as well as the tuning and montoring sysctls shared with IPv6.

Move the two send/recvspace variables up from the middle of the
file to ease compiling out the INET only code.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	3 days
2011-04-20 08:03:22 +00:00
Bjoern A. Zeeb
aae49dd304 MFp4 CH=191470:
Move the ipport_tick_callout and related functions from ip_input.c
to in_pcb.c.  The random source port allocation code has been merged
and is now local to in_pcb.c only.
Use a SYSINIT to get the callout started and no longer depend on
initialization from the inet code, which would not work in an IPv6
only setup.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-20 08:00:29 +00:00
Bjoern A. Zeeb
ec4f97277f MFp4 CH=191466:
Move fw_one_pass to where it belongs: it is a property of ipfw,
not of ip_input.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	3 days
2011-04-20 07:55:33 +00:00
Gleb Smirnoff
9d0a2ddf69 - Rewrite functions that copyin/out NAT configuration, so that they
calculate required memory size dynamically.
- Fix races on chain re-lock.
- Introduce new field to ip_fw_chain - generation count. Now utilized
  only in the NAT configuration, but can be utilized wider in ipfw.
- Get rid of NAT_BUF_LEN in ip_fw.h

PR:		kern/143653
2011-04-19 15:06:33 +00:00
Andrey V. Elsukov
e3665201f5 Add sysctl handlers for net.inet.ip.dummynet.hash_size, .pipe_byte_limit
and .pipe_slot_limit oids to prevent to set incorrect values.

MFC after:	2 weeks
2011-04-19 11:33:39 +00:00
Andrey V. Elsukov
8ad66025f6 ipdn_bound_var() functions is designed to bound a variable between
specified minimum and maximum. In case when specified default value
is out of bounds it does not work as expected and does not limit
variable. Check that default value is in range and limit it if needed.
Also bump max_hash_size value to 65536 to correspond with manual page.

PR:		kern/152887
MFC after:	2 weeks
2011-04-19 11:29:09 +00:00
Andrey V. Elsukov
3ab4af737d Use M_WAITOK instead M_WAIT for malloc. Remove unneded checks.
MFC after:	1 week
2011-04-19 05:59:37 +00:00
Gleb Smirnoff
ca47294ddf LibAliasInit() should allocate memory with M_WAITOK flag. Modify it
and its callers.
2011-04-18 20:07:08 +00:00
Gleb Smirnoff
d0e16e0d1e Pullup up to TCP header length before matching against 'tcpopts'.
PR:		kern/156180
Reviewed by:	luigi
2011-04-18 18:22:10 +00:00
John Baldwin
da84b2e6c5 When checking to see if a window update should be sent to the remote peer,
don't force a window update if the window would not actually grow due to
window scaling.  Specifically, if the window scaling factor is larger than
2 * MSS, then after the local reader has drained 2 * MSS bytes from the
socket, a window update can end up advertising the same window.  If this
happens, the supposed window update actually ends up being a duplicate ACK.
This can result in an excessive number of duplicate ACKs when using a
higher maximum socket buffer size.

Reviewed by:	bz
MFC after:	1 month
2011-04-18 17:43:16 +00:00
Bjoern A. Zeeb
336d023b2e Make in_proto.c dependent on either inet or inet6.
While it does not provide any functionality for IPv6, it provides
the sysctl nodes for net.inet.* that a lot of functionality shared
between IPv4 and IPv6 depends on.  We cannot change these anymore
without breaking a lot of management and tuning.

In case of IPv6 only, we compile out everything but the sysctl node
declarations.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC After:	5 days
2011-04-17 16:35:16 +00:00
Edward Tomasz Napierala
79bb84fb15 Refactor udp_input(), moving calls to u_tun_func() into udp_append().
Obtained from: Wheel Systems Sp. z o.o.
Reviewed by:	bz@
2011-04-14 10:40:57 +00:00
Bjoern A. Zeeb
05b9d121aa The mbuf_frag_size always was and is file local and not queried from base
user space tools via kvm.  Mark it static.

MFC after:	3 days
2011-04-14 09:47:09 +00:00
Sergey Kandaurov
6bed196c35 Staticize malloc types.
Approved by:	lstewart
MFC after:	1 week
2011-04-13 11:28:46 +00:00
Andrey V. Elsukov
9974d151ec Restore previous behaviour - always match rule when we doing tagging,
even when tag is already exists.

Reported by:	Vadim Goncharov
MFC after:	1 week
2011-04-12 15:20:34 +00:00
Lawrence Stewart
891b8ed467 Use the full and proper company name for Swinburne University of Technology
throughout the source tree.

Requested by:	Grenville Armitage, Director of CAIA at Swinburne University of
			Technology
MFC after:	3 days
2011-04-12 08:13:18 +00:00
Jack F Vogel
c31aa19c53 Port of the LRO fix from mxge driver to the generic
LRO code. Thanks to Andrew Gallatin for the change.

MFC after:  7 days
2011-04-07 21:20:26 +00:00
Andrey V. Elsukov
a5620cc6c5 Fill up src_port and dst_port variables for SCTP over IPv4.
PR:		kern/153415
MFC after:	1 week
2011-03-31 16:30:14 +00:00
Andrey V. Elsukov
5600c92750 Fix malloc types.
MFC after:	1 week
2011-03-31 15:11:12 +00:00
Andrey V. Elsukov
3d10d64fd3 Fix a memory leak. Memory that is allocated for schedulers hash table
was not freed.

PR:		kern/156083
MFC after:	1 week
2011-03-31 15:10:41 +00:00
John Baldwin
766282cbe7 Clamp the initial advertised receive window when responding to a SYN/ACK
to the maximum allowed window.  Growing the window too large would cause
an underflow in the calculations in tcp_output() to decide if a window
update should be sent which would prevent the persist timer from being
started if data was pending and the other end of the connection advertised
an initial window size of 0.

PR:		kern/154006
Submitted by:	Stefan `Sec` Zehl  sec 42 org
Reviewed by:	bz
MFC after:	1 week
2011-03-30 12:35:39 +00:00
Weongyo Jeong
c45e1b3cad Covers values if (BYTES_THIS_ACK(tp, th) / tp->t_maxseg) value is from
2.0 to 3.0.

Reviewed by:	lstewart
2011-03-28 19:03:56 +00:00
Sergey Kandaurov
79d514355c Reference ifaddr object before unlocking as it can be freed
from another context at the moment of later access.

PR:		kern/155555
Submitted by:	Andrew Boyer <aboyer att averesystems.com>
Approved by:	avg (mentor)
MFC after:	2 weeks
2011-03-21 14:19:40 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb
4d457387fe Properly check for an IPv4 socket after r219579.
In some cases as udp6_connect() without an earlier bind(2) to an
address, v4-mapped scokets allowed and a non mapped destination
address, we can end up here with both v4 and v6 indicated:
	inp_vflag = (INP_IPV4|INP_IPV6|INP_IPV6PROTO)

In that case however laddrp is NULL as the IPv6 path does not
pass in a copy currently.

Reported by:	Pawel Worach (pawel.worach gmail.com)
Tested by:	Pawel Worach (pawel.worach gmail.com)
MFC after:	6 days
X-MFC with:	r219579
2011-03-19 19:08:54 +00:00
Bjoern A. Zeeb
efc76f729a Merge the two identical implementations for local port selections from
in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport().

MFC after:	2 weeks
2011-03-12 21:46:37 +00:00
Randall Stewart
f79aab1866 Tunes and fixes the new DC-CC to seem to hit the
right mix.  Still may need some tweaks but it
appears to almost not give away too much to an
RFC2581 flow, but can really minimize the amount of
buffers used in the net.

MFC after:	3 months
2011-03-08 11:58:25 +00:00
Randall Stewart
48b6c64938 Adds a new Congestion Control that helps reduce
the RTT that a flow will build up in buffers in
transit. It is a slight modification to RFC2581
but is more friendly i.e. less aggressive.

MFC after:	3 months
2011-03-01 00:37:46 +00:00
Dimitry Andric
cb8750c269 Fix breakage in sys/netinet/sctp_sysctl.c, introduced by r219057. If
SCTP_HAS_RTTC is not defined, this file fails to compile.  Insert the
necessary #ifdefs to make it work.

Pointy hat to:	rrs
2011-02-26 22:45:40 +00:00
Randall Stewart
299108c5a2 Improvements to CC modules:
1) Add four new points that allow you to get more information
   to cc algo's
2) Fix the case where user changes module on a existing TCB, in
   such a case, the initialization module needs to be called on all nets.
3) Move htcp_cc structure to a union that other modules can use.
4) Add 5th point for get/set socket options for cc_module specific options

MFC after:	2 months
2011-02-26 15:23:46 +00:00