Commit Graph

214 Commits

Author SHA1 Message Date
andre
f3a38bd433 Restructure a too broad ifdef which was disabling the setting of the
tcp flightsize sysctl value for local networks in the !INET6 case.

Approved by:	re (scottl)
2003-11-25 20:58:59 +00:00
andre
6164d7c280 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
rwatson
9c969b771a Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
andre
6c57f18a59 dropwithreset is not needed in this case as tcp_drop() is already notifying
the other side. Before we were sending two RST packets.
2003-11-12 19:38:01 +00:00
sam
a946bc4854 o correct locking problem: the inpcb must be held across tcp_respond
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond

Supported by:	FreeBSD Foundation
2003-11-08 22:59:22 +00:00
sam
39ba2e1c90 speedup stream socket recv handling by tracking the tail of
the mbuf chain instead of walking the list for each append

Submitted by:	ps/jayanth
Obtained from:	netbsd (jason thorpe)
2003-10-28 05:47:40 +00:00
ume
01f1aaf295 enclose IPv6 part with ifdef INET6.
Obtained from:	KAME
2003-10-20 16:19:01 +00:00
ume
1bfb498609 correct linkmtu handling.
Obtained from:	KAME
2003-10-20 15:27:48 +00:00
ume
babf2c3ec0 - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from:	KAME
2003-10-17 15:46:31 +00:00
harti
88c7bba592 A number of patches in the last years have created new return paths
in tcp_input that leave the function before hitting the tcp_trace
function call for the TCPDEBUG option. This has made TCPDEBUG mostly
useless (and tools like ports/benchmarks/dbs not working). Add
tcp_trace calls to the return paths that could be identified in this
maze.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:46:54 +00:00
hsu
58acc5cc77 Unify the "send high" and "recover" variables as specified in the
lastest rev of the spec.  Use an explicit flag for Fast Recovery. [1]

Fix bug with exiting Fast Recovery on a retransmit timeout
diagnosed by Lu Guohan. [2]

Reviewed by:		Thomas Henderson <thomas.r.henderson@boeing.com>
Reported and tested by:	Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
Approved by:		Thomas Henderson <thomas.r.henderson@boeing.com>,
			Sally Floyd <floyd@acm.org> [1]
2003-07-15 21:49:53 +00:00
phk
9cb1f02d71 Add /* FALLTHROUGH */
Found by:       FlexeLint
2003-05-31 19:07:22 +00:00
rwatson
f332b50228 Correct a bug introduced with reduced TCP state handling; make
sure that the MAC label on TCP responses during TIMEWAIT is
properly set from either the socket (if available), or the mbuf
that it's responding to.

Unfortunately, this is made somewhat difficult by the TCP code,
as tcp_twstart() calls tcp_twrespond() after discarding the socket
but without a reference to the mbuf that causes the "response".
Passing both the socket and the mbuf works arounds this--eventually
it might be good to make sure the mbuf always gets passed in in
"response" scenarios but working through this provided to
complicate things too much.

Approved by:	re (scottl)
Reviewed by:	hsu
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-07 05:26:27 +00:00
obrien
43cb95182e Explicitly declare 'int' parameters. 2003-04-21 16:27:46 +00:00
hsu
f3bcf8791f Observe conservation of packets when entering Fast Recovery while
doing Limited Transmit.  Only artificially inflate the congestion
window by 1 segment instead of the usual 3 to take into account
the 2 already sent by Limited Transmit.

Approved in principle by:	Mark Allman <mallman@grc.nasa.gov>,
Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
2003-04-01 21:16:46 +00:00
hsu
857cde93f5 Greatly simplify the unlocking logic by holding the TCP protocol lock until
after FIN_WAIT_2 processing.

Helped with debugging:	Doug Barton
2003-03-13 11:46:57 +00:00
hsu
896f13eb8e Add support for RFC 3390, which allows for a variable-sized
initial congestion window.
2003-03-13 01:43:45 +00:00
hsu
e89bff4597 Implement the Limited Transmit algorithm (RFC 3042). 2003-03-12 20:27:28 +00:00
jlemon
03b8ace489 Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one.  Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs
2003-03-08 22:06:20 +00:00
jlemon
d57f539d00 In timewait state, if the incoming segment is a pure in-sequence ack
that matches snd_max, then do not respond with an ack, just drop the
segment.  This fixes a problem where a simultaneous close results in
an ack loop between two time-wait states.

Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
Sponsored by: DARPA, NAI Labs
2003-02-26 18:20:41 +00:00
jlemon
cdfe62aafb The TCP protocol lock may still be held if the reassembly queue dropped FIN.
Detect this case and drop the lock accordingly.

Sponsored by: DARPA, NAI Labs
2003-02-26 13:55:13 +00:00
hsu
90305f610e tcp_twstart() need to be called with the TCP protocol lock held to avoid
a race condition with the TCP timer routines.
2003-02-24 00:52:03 +00:00
hsu
8133d4eddb Pass the right function to callout_reset() for a compressed
TIME-WAIT control block.
2003-02-24 00:48:12 +00:00
jlemon
b55b232427 Yesterday just wasn't my day. Remove testing delta that crept into the diff.
Pointy hat provided by: sam
2003-02-23 15:40:36 +00:00
jlemon
e56303ef04 Check to see if the TF_DELACK flag is set before returning from
tcp_input().  This unbreaks delack handling, while still preserving
correct T/TCP behavior

Tested by: maxim
Sponsored by: DARPA, NAI Labs
2003-02-22 21:54:57 +00:00
jlemon
a8bc02dcb2 Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
jlemon
3edfb3aaed Correct comments. 2003-02-19 21:33:46 +00:00
jlemon
8b16df5c37 Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
   - fix bug where a single pass through tcp_input might not delay acks
   - use callout_active() instead of callout_pending()

Sponsored by: DARPA, NAI Labs
2003-02-19 21:18:23 +00:00
hsu
5436697404 The protocol lock is always held in the dropafterack case, so we don't
need to check for it at runtime.
2003-02-13 22:14:22 +00:00
cjc
38d195389f Add the TCP flags to the log message whenever log_in_vain is 1, not
just when set to 2.

PR:		kern/43348
MFC after:	5 days
2003-02-02 22:06:56 +00:00
hsu
ab47952ce0 Fix NewReno.
Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
2003-01-13 11:01:20 +00:00
dillon
35422bf8e1 Remove the PAWS ack-on-ack debugging printf().
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of
order / reverse-time-indexed packet should be acknowledged as specified
in RFC-793 page 69 then dropped.  The original PAWS code in FreeBSD (1994)
simply acknowledged the segment unconditionally, which is incorrect, and
was fixed in 1.183 (2002).  At the moment we do not do checks for SYN or FIN
in addition to (tlen != 0), which may or may not be correct, but the
worst that ought to happen should be a retry by the sender.
2002-12-30 19:31:04 +00:00
hsu
3697a25648 Unravel a nested conditional.
Remove an unneeded local variable.
2002-12-20 11:16:52 +00:00
dillon
5ff92d6b51 Fix syntax in last commit. 2002-12-17 00:24:48 +00:00
dillon
c366943770 Bruce forwarded this tidbit from an analysis Van Jacobson did on an
apparent ack-on-ack problem with FreeBSD.  Prof. Jacobson noticed a
case in our TCP stack which would acknowledge a received ack-only packet,
which is not legal in TCP.

Submitted by:	 Van Jacobson <van@packetdesign.com>,
		bmah@packetdesign.com (Bruce A. Mah)
MFC after:	7 days
2002-12-14 07:31:51 +00:00
sam
1ff96af1ea a better solution to building FAST_IPSEC w/o INET6
Submitted by:	Jeffrey Hsu <hsu@FreeBSD.org>
2002-11-10 17:17:32 +00:00
sam
6019e3c767 fixup FAST_IPSEC build w/o INET6 2002-11-08 23:33:59 +00:00
jeff
b6176d93b1 - Consistently update snd_wl1, snd_wl2, and rcv_up in the header
prediction code.  Previously, 2GB worth of header predicted data
   could leave these variables too far out of sequence which would cause
   problems after receiving a packet that did not match the header
   prediction.

Submitted by:	Bill Baumann <bbaumann@isilon.com>
Sponsored by:	Isilon Systems, Inc.
Reviewed by:	hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
2002-10-31 23:24:13 +00:00
hsu
fa157d4b9a Don't need to check if SO_OOBINLINE is defined.
Don't need to protect isipv6 conditional with INET6.
Fix leading indentation in 2 lines.
2002-10-30 08:32:19 +00:00
sam
0ef6c52bbc Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
sam
2a86be217a Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
dillon
c1c4819744 Guido found another bug. There is a situation with
timestamped TCP packets where FreeBSD will send DATA+FIN and
A W2K box will ack just the DATA portion.  If this occurs
after FreeBSD has done a (NewReno) fast-retransmit and is
recovering it (dupacks > threshold) it triggers a case in
tcp_newreno_partial_ack() (tcp_newreno() in stable) where
tcp_output() is called with the expectation that the retransmit
timer will be reloaded.  But tcp_output() falls through and
returns without doing anything, causing the persist timer to be
loaded instead.  This causes the connection to hang until W2K gives up.
This occurs because in the case where only the FIN must be acked, the
'len' calculation in tcp_output() will be 0, a lot of checks will be
skipped, and the FIN check will also be skipped because it is designed
to handle FIN retransmits, not forced transmits from tcp_newreno().

The solution is to simply set TF_ACKNOW before calling tcp_output()
to absolute guarentee that it will run the send code and reset the
retransmit timer.  TF_ACKNOW is already used for this purpose in other
cases.

For some unknown reason this patch also seems to greatly reduce
the number of duplicate acks received when Guido runs his tests over
a lossy network.  It is quite possible that there are other
tcp_newreno{_partial_ack()} cases which were not generating the expected
output which this patch also fixes.

X-MFC after:	Will be MFC'd after the freeze is over
2002-09-30 18:55:45 +00:00
silby
f004ac1d6f Fix issue where shutdown(socket, SHUT_RD) was effectively
ignored for TCP sockets.

NetBSD PR:	18185
Submitted by:	Sean Boudreau <seanb@qnx.com>
MFC after:	3 days
2002-09-22 02:54:07 +00:00
dillon
494e59ef36 Guido reported an interesting bug where an FTP connection between a
Windows 2000 box and a FreeBSD box could stall.  The problem turned out
to be a timestamp reply bug in the W2K TCP stack.  FreeBSD sends a
timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK
causing FreeBSD to calculate an insane SRTT and RTT, resulting in
a maximal retransmit timeout (60 seconds).  If there is any packet
loss on the connection for the first six or so packets the retransmit
case may be hit (the window will still be too small for fast-retransmit),
causing a 60+ second pause.  The W2K box gives up and closes the
connection.

This commit works around the W2K bug.

15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)

Bug reported by: Guido van Rooij <guido@gvr.org>
2002-09-17 22:21:37 +00:00
charnier
7dd9d47059 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
jmallett
af73db292e Enclose IPv6 addresses in brackets when they are displayed printable with a
TCP/UDP port seperated by a colon.  This is for the log_in_vain facility.

Pointed out by:	Edward J. M. Brocklesby
Reviewed by:	ume
MFC after:	2 weeks
2002-08-19 19:47:13 +00:00
dillon
bb806c49bf Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
hsu
2c79764ced Cosmetic-only changes for readability.
Reviewed by:	(early form passed by) bde
Approved by:	itojun (from core@kame.net)
2002-08-17 02:05:25 +00:00
rwatson
aa8060c29e Rename mac_check_socket_receive() to mac_check_socket_deliver() so that
we can use the names _receive() and _send() for the receive() and send()
checks.  Rename related constants, policy implementations, etc.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 18:51:27 +00:00
hsu
deb0142560 Reset dupack count in header prediction.
Follow-on to rev 1.39.

Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
2002-08-15 17:13:18 +00:00