410 Commits

Author SHA1 Message Date
silby
4c84d1d020 Export the contents of the syncache to netstat.
Approved by: re (kensmith)
MFC after: 2 weeks
2007-07-27 00:57:06 +00:00
andre
90c73e9aec Fix comments in tcp_do_segment().
Approved by:	re (kensmith)
2007-07-25 18:48:24 +00:00
peter
c4233cd978 Fix cast-qualifiers warning when INET6 is not present
Approved by:  re (rwatson)
2007-07-05 05:55:57 +00:00
gnn
aeca69ded5 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
gnn
0cd74db89b Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
andre
445024c7ff Fix a case in tcp_do_segment() where tcp_update_sack_list() would
be called with an incorrect segment end value.  tcp_reass() may
trim segments when they overlap with already existing ones in the
reassembly queue.  Instead of saving the segment end value before
the call to tcp_reass() compute it on the fly based on the effective
segment length afterwards.

This bug was not really problematic as no information got lost and
the eventual SACK information computation was correct nontheless.

MFC after:	1 week
2007-06-10 21:07:21 +00:00
andre
51abe2d2e3 Fix style for comments, be more verbose and add some more. 2007-06-10 20:59:22 +00:00
andre
5a4573d5f0 Remove some bogosity from the SYN_SENT case in tcp_do_segment
and simplify handling of the send/receive window scaling.  No
change in effective behavour.

RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.

Noticed by:	yar
2007-06-09 21:09:49 +00:00
andre
c611006e89 Make log messages more verbose and simpler to understand for non-experts.
Update comments to be more conscious, verbose and fully reflect reality.
2007-05-28 23:27:44 +00:00
andre
799843834b Fix indentation of the syncache_expand() section in tcp_input(). 2007-05-28 11:35:40 +00:00
andre
b4b7eeb094 Refactor and rewrite in parts the SYN handling code on listen sockets
in tcp_input():

 o tighten the checks on allowed TCP flags to be RFC793 and
   tcp-secure conform
 o log check failures to syslog at LOG_DEBUG level
 o rearrange the code flow to be easier to follow
 o add KASSERTs to validate assumptions of the code flow

Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enable
that controls the behavior on socket creation failure for a otherwise
successful 3-way handshake.  The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits.  The
sysctl allows to chose between two options to deal with this.  One is
to send a reset to the other endpoint to notify it about the failure
(default).  The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.

Reviewed by:	rwatson (in general)
2007-05-28 11:03:53 +00:00
andre
696f13b7fd Add tcp_log_addrs() function to generate and standardized TCP log line
for use thoughout the tcp subsystem.

It is IPv4 and IPv6 aware creates a line in the following format:

 "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"

A "\n" is not included at the end.  The caller is supposed to add
further information after the standard tcp log header.

The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use.  All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.

Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr) have to be supplied.

Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be casted and passed
as void * pointers.

tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
    void *ip6hdr)

Usage example:

 struct ip *ip;
 char *tcplog;

 if (tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) {
	log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
	    tcplog, __func__);
	free(s, M_TCPLOG);
 }
2007-05-18 19:58:37 +00:00
andre
db8bfe6922 Move TIME_WAIT related functions and timer handling from files
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:

 tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()

 tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
 tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
 tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

This is a mechanical move with appropriate renames and making
them static if used only locally.

The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
2007-05-16 17:14:25 +00:00
andre
25d7c9695a Complete the (mechanical) move of the TCP reassembly and timewait
functions from their origininal place to their own files.

TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait   from tcp_subr.c  -> tcp_timewait.c
2007-05-13 22:16:13 +00:00
rwatson
a25f94b5ae Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.
2007-05-10 15:58:48 +00:00
maxim
716f8a4756 o Fix style(9) bugs introduced in the last commit.
Pointed out by:	bde
2007-05-09 11:39:46 +00:00
maxim
cc6f7c0d99 o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.
PR:		kern/112517
Submitted by:	vd
2007-05-09 06:09:40 +00:00
andre
3d2c0e7f91 Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of
a decdicated sack_enable int for this bool.  Change all users accordingly.
2007-05-06 15:56:31 +00:00
andre
197f220d34 o Remove redundant tcp reassembly check in header prediction code
o Rearrange code to make intent in TCPS_SYN_SENT case more clear
 o Assorted style cleanup
 o Comment clarification for tcp_dropwithreset()
2007-05-06 15:41:06 +00:00
andre
daeac43899 Reorder the TCP header prediction test to check for the most volatile
values first to spend less time on a fallback to normal processing.
2007-05-06 15:23:51 +00:00
andre
568b21767b Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment
and change it to a void function.

We use a compressed structure for TCPS_TIME_WAIT to save memory.  Any late
late segments arriving for such a connection is handled directly in the TW
code.
2007-05-06 15:16:05 +00:00
rwatson
a717b2a83a Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the
pcbinfo lock will be released as well, not just the pcb lock.
2007-05-04 17:45:52 +00:00
andre
27e460915a o Fix INP lock leak in the minttl case
o Remove indirection in the decision of unlocking inp
o Further annotation of locking in tcp_input()
2007-04-23 19:41:47 +00:00
andre
0af88c9154 o Remove unncessary TOF_SIGLEN flag from struct tcpopt
o Correctly set to->to_signature in tcp_dooptions()
o Update comments
2007-04-20 15:28:01 +00:00
andre
a08a16ab40 Add more KASSERT's. 2007-04-20 15:21:29 +00:00
andre
edec8cf92b Remove bogus check for accept queue length and associated failure handling
from the incoming SYN handling section of tcp_input().

Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed.  It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake.  It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.

Change return value of syncache_add() to void.  No status communication
is required.
2007-04-20 14:34:54 +00:00
andre
707a700619 Simplifly syncache_expand() and clarify its semantics. Zero is returned
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies.  True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.

For both cases an RST is sent back in tcp_input().

A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.

Reported by:	kris (the panic)
2007-04-20 13:51:34 +00:00
rwatson
e781516999 Remove unused variable tcbinfo_mtx. 2007-04-15 21:03:23 +00:00
andre
bd6041301a Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
andre
82cdcabbd1 Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some
further INP_INFO_WLOCK_ASSERT() while there.
2007-04-04 18:30:16 +00:00
andre
32bf13d188 Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
andre
3c03d012a2 Retire unused TCP_SACK_DEBUG. 2007-04-04 14:44:15 +00:00
andre
37b70e01e6 In tcp_dooptions() skip over SACK options if it is a SYN segment. 2007-04-04 14:39:49 +00:00
andre
b1c6fabadd When blackholing do a 'dropunlock' in the new world order to prevent the
INP_INFO_LOCK from leaking.

Reported by:	ache
Found by:	rwatson
2007-03-28 12:58:13 +00:00
maxim
d789919b3a o Use a define for a buffer size.
Prodded by:	db

o Add missed vars for TCPDEBUG in tcp_do_segment().

Prodded by:	tinderbox
2007-03-24 22:15:02 +00:00
andre
acede79cba Split tcp_input() into its two functional parts:
o tcp_input() now handles TCP segment sanity checks and preparations
   including the INPCB lookup and syncache.
 o tcp_do_segment() handles all data and ACK processing and is IPv4/v6
   agnostic.

Change all KASSERT() messages to ("%s: ", __func__).

The changes in this commit are primarily of mechanical nature and no
functional changes besides the function split are made.

Discussed with:	rwatson
2007-03-23 20:16:50 +00:00
andre
04c6de816f Tidy up some code to conform better to surroundings and style(9), 0 = NULL
and space/tab.
2007-03-23 19:11:22 +00:00
andre
8057f89694 Bring SACK option handling in tcp_dooptions() in line with all other
options and ajust users accordingly.
2007-03-23 18:33:21 +00:00
andre
8aefbb6179 ANSIfy function declarations and remove register keywords for variables.
Consistently apply style to all function declarations.
2007-03-21 19:37:55 +00:00
andre
8bf685df51 Tidy up IPFIREWALL_FORWARD sections and comments. 2007-03-21 18:56:03 +00:00
andre
f79b08ac22 Update and clarify comments in first section of tcp_input(). 2007-03-21 18:52:58 +00:00
andre
9b59a4229e Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove
old dead T/TCP code.
2007-03-21 18:49:43 +00:00
andre
b4b4be32dc Tidy up tcp_log_in_vain and blackhole. 2007-03-21 18:36:49 +00:00
andre
878e882d88 Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it
doesn't impede normal operation negatively and is only a few lines of
code.  It's close relatives blackhole and log_in_vain aren't options
either.
2007-03-21 18:25:28 +00:00
andre
77fcda08c7 Remove tcp_minmssoverload DoS detection logic. The problem it tried to
protect us from wasn't really there and it only bloats the code.  Should
the problem surface in the future we can simply resurrect it from cvs
history.
2007-03-21 18:05:54 +00:00
andre
9ddf9fc256 Match up SYSCTL declaration style. 2007-03-19 19:00:51 +00:00
andre
da1f496220 Consolidate insertion of TCP options into a segment from within tcp_output()
and syncache_respond() into its own generic function tcp_addoptions().

tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

In struct tcpopt rename to_requested_s_scale to just to_wscale.

Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."

Reviewed by:	silby, mohans, julian
Sponsored by:	TCP/IP Optimization Fundraise 2005
2007-03-15 15:59:28 +00:00
qingli
fb4a7a64bd This patch is provided to fix a couple of deployment issues observed
in the field. In one situation, one end of the TCP connection sends
a back-to-back RST packet, with delayed ack, the last_ack_sent variable
has not been update yet. When tcp_insecure_rst is turned off, the code
treats the RST as invalid because last_ack_sent instead of rcv_nxt is
compared against th_seq. Apparently there is some kind of firewall that
sits in between the two ends and that RST packet is the only RST
packet received. With short lived HTTP connections, the symptom is
a large accumulation of connections over a short period of time .

The +/-(1) factor is to take care of implementations out there that
generate RST packets with these types of sequence numbers. This
behavior has also been observed in live environments.

Reviewed by:	silby, Mike Karels
MFC after:	1 week
2007-03-07 23:21:59 +00:00
mohans
2010e1d527 In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
2007-02-28 20:48:00 +00:00
mohans
384aeb29f6 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00