4541 Commits

Author SHA1 Message Date
jhb
20f8f82daf Don't drop options from the third retransmitted SYN by default. If the
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not.  This can result in
very slow data transfers in the case of window scaling disagreements.

The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.

Reviewed by:	net@
MFC after:	2 weeks
2013-01-09 20:27:06 +00:00
peter
3f8d5a8f51 Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
2013-01-03 10:21:28 +00:00
tuexen
c79f406296 Some cleanups.
MFC after: 3 days
2012-12-27 08:10:58 +00:00
tuexen
9b6fdf83a8 Minor cleanups of debug messages.
MFC after: 3 days
2012-12-27 08:06:58 +00:00
tuexen
86859ae7b8 Fix a copy and paste error.
MFC after: 3 days
2012-12-27 08:02:58 +00:00
glebius
ddd09f2a26 Garbage collect carp_cksum(). 2012-12-25 14:29:38 +00:00
glebius
7ccc77532c Change net.inet.carp.demotion sysctl to add the supplied value
to the current demotion factor instead of assigning it.

  This allows external scripts to control demotion factor together
with kernel in a raceless manner.
2012-12-25 14:08:13 +00:00
glebius
2b889e1407 Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied,
and arg2 doesn't pass size of arg1.
2012-12-25 13:55:21 +00:00
glebius
9f622a1b38 The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

  However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

  This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
2012-12-25 13:01:58 +00:00
glebius
08c9fb9d97 Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.
2012-12-24 21:35:48 +00:00
glebius
824cae556c Fix !INET6 build after r244365. 2012-12-18 08:14:16 +00:00
glebius
d30d64515c Clear correct flag in INET6 case. 2012-12-18 08:09:44 +00:00
ae
4fee09376a Since we use different flags to detect tcp forwarding, and we share the
same code for IPv4 and IPv6 in tcp_input, we should check both
M_IP_NEXTHOP and M_IP6_NEXTHOP flags.

MFC after:	3 days
2012-12-17 20:55:33 +00:00
glebius
8137816adb Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
glebius
ef158ba461 Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it,
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.

Reported & tested by:	Pawel Tyll <ptyll nitronet.pl>
MFC after:		3 days
2012-12-12 17:41:21 +00:00
tuexen
9194fa5887 Get it compiling without INET and INET6 support (mainly userland stack).
MFC after: 2 weeks
2012-12-08 15:11:09 +00:00
pjd
9d9aaa0e32 More warnings for zones that depend on the kern.ipc.maxsockets limit.
Obtained from:	WHEEL Systems
2012-12-08 12:51:06 +00:00
tuexen
a7d0df2ed1 Use correct padding of the ABORT chunk in case of an user initiated
abort cause is used.

MFC after: 2 weeks
2012-12-08 09:50:38 +00:00
tuexen
fe8921373b Ensure that the padding of the last parameter of an INIT chunk
is not included in the chunk length as required by RFC 4960.
While there, cleanup sctp_send_initiate().

MFC after: 2 weeks
2012-12-08 08:22:33 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
andre
cd840ea089 Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with:	yongari
MFC after:	1 week
2012-11-27 19:31:49 +00:00
andre
53bc7d240b Add DELACK to list of timers.
MFC after:	1 week
2012-11-27 19:07:28 +00:00
np
a6cf189273 Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not). 2012-11-27 06:42:44 +00:00
alfred
4d556cc343 Auto size the tcbhashsize structure based on max sockets.
While here, also make the code that enforces power-of-two more
forgiving, instead of just resetting to 512, graciously round-down
to the next lower power of two.
2012-11-27 03:04:24 +00:00
tuexen
2c40d3e68b Add support for sctp_peeloff() also in the front states of the
association.

MFC after: 3 days
2012-11-26 16:44:03 +00:00
tuexen
926b939df3 Find the endpoint for an incoming packet also if the endpoint
comes from sctp_peeloff().

MFC after: 3 days
2012-11-26 16:43:32 +00:00
tuexen
6c15e1ad42 Allow shutdown() to be used on fds returned from sctp_peeloff().
MFC after: 3 days
2012-11-26 08:50:00 +00:00
tuexen
b8042628ed Remove unused function.
MFC after: 1 week
2012-11-25 14:25:08 +00:00
tuexen
9a8531105a Add support for SCTP/UDP/IPV6.
This completes the support of
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps

MFC after: 1 week
2012-11-17 20:04:04 +00:00
tuexen
17aa08aca9 Get the accounting working. We now have counters how many
chunks for each SCTP outgoing stream are in the send and
sent queue.
While there, improve the naming of NR-SACK related constants
recently introduced.

MFC after: 1 week
2012-11-16 19:39:10 +00:00
rdivacky
ce61d80a96 Initialize hdrlen to 0 to avoid clang warning in NOINET case. 2012-11-10 10:41:00 +00:00
bz
37c975471a Cleanup some whitspace in this file to get it out of an upcoming patch.
MFC after:	10 days
2012-11-08 03:29:55 +00:00
tuexen
2e86daf6ac Add per outgoing stream accounting for chunks in the send
and sent queue. This provides no functional change, but is
a preparation for an upcoming stream reset improvement.
Done with rrs@.

MFC after: 1 week
2012-11-07 22:11:38 +00:00
tuexen
7e9001b55a Add some missing changes missed in the last commit.
MFC after: 1 week
X-MFC with: 242708
2012-11-07 21:25:32 +00:00
tuexen
5e2e6e0753 Improve PR-SCTP if used in combination with NR-SACK.
Based on work done by Mohammad Rajiullah.

MFC after: 1 week
2012-11-07 20:59:00 +00:00
kevlo
25611f9cf9 Fix typo; s/ouput/output 2012-11-07 07:00:59 +00:00
mjg
f5612c5202 Fix possible spurious sbunlock in sctp_sorecvmsg.
Reviewed by:	tuexen
Approved by:	trasz (mentor)
MFC after:	3 days
2012-11-06 23:04:23 +00:00
tuexen
f1eb961773 Move from early SSN assignment to late SSN assignment.
This doesn't change functionality, but makes upcoming change
much easier.
Developed with rrs@ at the IETF 85.

MFC after: 1 week
2012-11-05 20:55:17 +00:00
andre
b51779d7ea Back out r242262. The simplified window change/update logic wasn't
complete and ready for production use.

PR:	kern/173309
2012-11-05 09:13:06 +00:00
ae
4354018055 Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by:	andre
2012-11-02 01:20:55 +00:00
tuexen
139b791e20 Whitespace changes due to upstream integration of SCTP changes in the
FreeBSD code base.
2012-10-29 20:47:32 +00:00
tuexen
bd5ecc606d Add braces (as used elsewhere in the SCTP code). 2012-10-29 20:44:29 +00:00
tuexen
02bbac6d05 Use ntohs() and htons() in correct order. However, this doesn't change
functionality.
2012-10-29 20:42:48 +00:00
andre
844d4d2472 Forced commit to provide the correct commit message to r242251:
Defer sending an independent window update if a delayed ACK is pending
  saving a packet.  The window update then gets piggy-backed on the next
  already scheduled ACK.

Added grammar fixes as well.

MFC after:	2 weeks
2012-10-29 13:16:33 +00:00
andre
abf1521166 Define the delayed ACK timeout value directly as hz/10 instead of
obfuscating it by going through PR_FASTHZ.  No functional change.

MFC after:	2 weeks
2012-10-29 12:17:02 +00:00
andre
07dc51f3cc If the user has closed the socket then drop a persisting connection
after a much reduced timeout.

Typically web servers close their sockets quickly under the assumption
that the TCP connections goes away as well.  That is not entirely true
however.  If the peer closed the window we're going to wait for a long
time with lots of data in the send buffer.

MFC after:	2 weeks
2012-10-28 19:58:20 +00:00
andre
b824892b57 Increase the initial CWND to 10 segments as defined in IETF TCPM
draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
window improves the overall performance of many web services without
risking congestion collapse.

As long as it remains a draft it is placed under a sysctl marking it
as experimental:
 net.inet.tcp.experimental.initcwnd10 = 1
When it becomes an official RFC soon the sysctl will be changed to
the RFC number and moved to net.inet.tcp.

This implementation differs from the RFC draft in that it is a bit
more conservative in the case of packet loss on SYN or SYN|ACK because
we haven't reduced the default RTO to 1 second yet.  Also the restart
window isn't yet increased as allowed.  Both will be adjusted with
upcoming changes.

Is is enabled by default.  In Linux it is enabled since kernel 3.0.

MFC after:	2 weeks
2012-10-28 19:47:46 +00:00
andre
36473a548b Update comment to reflect the change made in r242263.
MFC after:	2 weeks
2012-10-28 19:22:18 +00:00
andre
ab8a697d0a Add SACK_PERMIT to the list of TCP options that are switched off after
retransmitting a SYN three times.

MFC after:	2 weeks
2012-10-28 19:20:23 +00:00
andre
b21f6ebbaa Simplify and enhance the window change/update acceptance logic,
especially in the presence of bi-directional data transfers.

snd_wl1 tracks the right edge, including data in the reassembly
queue, of valid incoming data.  This makes it like rcv_nxt plus
reassembly.  It never goes backwards to prevent older, possibly
reordered segments from updating the window.

snd_wl2 tracks the left edge of sent data.  This makes it a duplicate
of snd_una.  However joining them right now is difficult due to
separate update dependencies in different places in the code flow.

snd_wnd tracks the current advertized send window by the peer.  In
tcp_output() the effective window is calculated by subtracting the
already in-flight data, snd_nxt less snd_una, from it.

ACK's become the main clock of window updates and will always update
the window when the left edge of what we sent is advanced.  The ACK
clock is the primary signaling mechanism in ongoing data transfers.
This works reliably even in the presence of reordering, reassembly
and retransmitted segments.  The ACK clock is most important because
it determines how much data we are allowed to inject into the network.

Zero window updates get us out of persistence mode are crucial.  Here
a segment that neither moves ACK nor SEQ but enlarges WND is accepted.

When the ACK clock is not active (that is we're not or no longer
sending any data) any segment that moves the extended right SEQ edge,
including out-of-order segments, updates the window.  This gives us
updates especially during ping-pong transfers where the peer isn't
done consuming the already acknowledged data from the receive buffer
while responding with data.

The SSH protocol is a prime candidate to benefit from the improved
bi-directional window update logic as it has its own windowing
mechanism on top of TCP and is frequently sending back protocol ACK's.

Tcpdump provided by:	darrenr
Tested by:	darrenr
MFC after:	2 weeks
2012-10-28 19:16:22 +00:00