176 Commits

Author SHA1 Message Date
Andre Oppermann
cdaf208d09 o Move setting/resetting logic of syncache timer from macro
SYNCACHE_TIMEOUT to new function syncache_timeout().
o Fix inverted timeout callout engagement logic to actually
  enable the timer for the bucket row.  Before SYN|ACK was
  not retransmitted.
o Simplify SYN|ACK retransmit timeout backoff calculation.
o Improve logging of retransmit and timeout events.
o Reset timeout when duplicate SYN arrives.
o Add comments.
o Rearrange SYN cookie statistics counting.

Bug found by:	silby
Submitted by:	silby (different version)
Approved by:	re (rwatson)
2007-07-28 12:02:05 +00:00
Andre Oppermann
19bc77c549 o Move all detailed checks for RST in LISTEN state from tcp_input() to
syncache_rst().
o Fix tests for flag combinations of RST and SYN, ACK, FIN.  Before
  a RST for a connection in syncache did not properly free the entry.
o Add more detailed logging.

Approved by:	re (rwatson)
2007-07-28 11:51:44 +00:00
Mike Silbersack
c325962b47 Export the contents of the syncache to netstat.
Approved by: re (kensmith)
MFC after: 2 weeks
2007-07-27 00:57:06 +00:00
George V. Neville-Neil
b2630c2934 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Andre Oppermann
1f939165ce Correctly print SEQ and IRS in the corresponding log message in
syncache_expand().
2007-06-06 22:10:12 +00:00
Andre Oppermann
8d573cc158 Make log messages more verbose and simpler to understand for non-experts.
Update comments to be more conscious, verbose and fully reflect reality.
2007-05-28 23:27:44 +00:00
Andre Oppermann
a160e6302c Refactor and rewrite in parts the SYN handling code on listen sockets
in tcp_input():

 o tighten the checks on allowed TCP flags to be RFC793 and
   tcp-secure conform
 o log check failures to syslog at LOG_DEBUG level
 o rearrange the code flow to be easier to follow
 o add KASSERTs to validate assumptions of the code flow

Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enable
that controls the behavior on socket creation failure for a otherwise
successful 3-way handshake.  The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits.  The
sysctl allows to chose between two options to deal with this.  One is
to send a reset to the other endpoint to notify it about the failure
(default).  The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.

Reviewed by:	rwatson (in general)
2007-05-28 11:03:53 +00:00
Andre Oppermann
d2ddf5d4b0 Be more restrictive with segment validity checks in syncache_expand()
and log check failures to syslog at LOG_DEBUG level.

Always prefill the sc->sc_ts field to use it in the checks.
2007-05-18 21:42:25 +00:00
Andre Oppermann
5df429a002 o Add syslog logging under LOG_DEBUG to various failures caused by
bogus segments
o Add more KASSERT()s
o Update comments
2007-05-18 21:13:01 +00:00
Andre Oppermann
3529149e9a Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of
a decdicated sack_enable int for this bool.  Change all users accordingly.
2007-05-06 15:56:31 +00:00
Andre Oppermann
0d957bba48 o Remove unused and redundant TCP option definitions
o Replace usage of MAX_TCPOPTLEN with the correctly constructed and
  derived MAX_TCPOPTLEN
2007-04-20 15:08:09 +00:00
Andre Oppermann
4d6e713043 Remove bogus check for accept queue length and associated failure handling
from the incoming SYN handling section of tcp_input().

Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed.  It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake.  It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.

Change return value of syncache_add() to void.  No status communication
is required.
2007-04-20 14:34:54 +00:00
Andre Oppermann
e207f80039 Simplifly syncache_expand() and clarify its semantics. Zero is returned
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies.  True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.

For both cases an RST is sent back in tcp_input().

A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.

Reported by:	kris (the panic)
2007-04-20 13:51:34 +00:00
Andre Oppermann
0a5df51410 Only update TCP timestamp on SYN duplication if it is present on
current SYN in syncache_add().  Otherwise disable timestamps.
2007-04-20 13:36:48 +00:00
Andre Oppermann
c73f70b728 o Plug memory leak in syncache_add() on MAC label allocation failure.
o Simplify code flow with 'done' goto label.
o Remove mbuf argument from syncache_respond().  It doesn't make use
  of it.
2007-04-20 13:30:08 +00:00
Andre Oppermann
9eab54debf When we run into the syncache entry limits syncache_add() tries
to free the oldest entry in the current bucket row.  The global
entry limit may be smaller than the bucket rows and their limit
combined however.  Thus only try to free a syncache entry if we
found one in this bucket row.

Reported by:	kris
2007-04-17 15:25:14 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
Andre Oppermann
0c38fd0a7a Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
Andre Oppermann
9daba64ed5 Unbreak IPv6 after consolidation of TCP options insertion.
Submitted by:	tegge
2007-03-17 11:52:54 +00:00
Kip Macy
9ad2c608c2 Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
  being changed anyway

Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
2007-03-17 06:40:09 +00:00
Andre Oppermann
02a1a64357 Consolidate insertion of TCP options into a segment from within tcp_output()
and syncache_respond() into its own generic function tcp_addoptions().

tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

In struct tcpopt rename to_requested_s_scale to just to_wscale.

Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."

Reviewed by:	silby, mohans, julian
Sponsored by:	TCP/IP Optimization Fundraise 2005
2007-03-15 15:59:28 +00:00
Andre Oppermann
087b55ea59 Change the way the advertized TCP window scaling is computed. Instead of
upper-bounding it to the size of the initial socket buffer lower-bound it
to the smallest MSS we accept.  Ideally we'd use the actual MSS information
here but it is not available yet.

For socket buffer auto sizing to be effective we need room to grow the
receive window.  The window scale shift is determined at connection setup
and can't be changed afterwards.  The previous, original, method effectively
just did a power of two roundup of the socket buffer size at connection
setup severely limiting the headroom for larger socket buffers.

Tested by:	many (as part of the socket buffer auto sizing patch)
MFC after:	1 month
2007-02-01 17:39:18 +00:00
Christian S.J. Peron
826cef3d75 Fix LOR between the syncache and inpcb locks when MAC is present in the
kernel.  This LOR snuck in with some of the recent syncache changes.  To
fix this, the inpcb handling was changed:

- Hang a MAC label off the syncache object
- When the syncache entry is initially created, we pickup the PCB lock
  is held because we extract information from it while initializing the
  syncache entry.  While we do this, copy the MAC label associated with
  the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry
  to the mbuf so it can be processed by security policies which analyze
  mbuf labels.

This change required that the MAC framework be extended to support the
label copy operations from the PCB to the syncache entry, and then from
the syncache entry to the mbuf.

These functions really should be referencing the syncache structure instead
of the label.  However, due to some of the complexities associated with
exposing this syncache structure we operate directly on it's label pointer.
This should be OK since we aren't making any access control decisions within
this code directly, we are merely allocating and copying label storage so
we can properly initialize mbuf labels for any packets the syncache code
might create.

This also has a nice side effect of caching.  Prior to this change, the
PCB would be looked up/locked for each packet transmitted.  Now the label
is cached at the time the syncache entry is initialized.

Submitted by:	andre [1]
Discussed with:	rwatson

[1] andre submitted the tcp_syncache.c changes
2006-12-13 06:00:57 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Andrey A. Chernov
239e71c612 Add missing #ifdef INET6 (can't be compiled) 2006-09-14 10:22:35 +00:00
Andre Oppermann
67d828b162 Remove unessary includes and follow common ordering style. 2006-09-13 13:21:17 +00:00
Andre Oppermann
bf6d304ab2 Rewrite of TCP syncookies to remove locking requirements and to enhance
functionality:

 - Remove a rwlock aquisition/release per generated syncookie.  Locking
   is now integrated with the bucket row locking of syncache itself and
   syncookies no longer add any additional lock overhead.
 - Syncookie secrets are different for and stored per syncache buck row.
   Secrets expire after 16 seconds and are reseeded on-demand.
 - The computational overhead for syncookie generation and verification
   is one MD5 hash computation as before.
 - Syncache can be turned off and run with syncookies only by setting the
   sysctl net.inet.tcp.syncookies_only=1.

This implementation extends the orginal idea and first implementation
of FreeBSD by using not only the initial sequence number field to store
information but also the timestamp field if present.  This way we can
keep track of the entire state we need to know to recreate the session in
its original form.  Almost all TCP speakers implement RFC1323 timestamps
these days.  For those that do not we still have to live with the known
shortcomings of the ISN only SYN cookies.  The use of the timestamp field
causes the timestamps to be randomized if syncookies are enabled.

The idea of SYN cookies is to encode and include all necessary information
about the connection setup state within the SYN-ACK we send back and thus
to get along without keeping any local state until the ACK to the SYN-ACK
arrives (if ever).  Everything we need to know should be available from
the information we encoded in the SYN-ACK.

A detailed description of the inner working of the syncookies mechanism
is included in the comments in tcp_syncache.c.

Reviewed by:	silby (slightly earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-13 13:08:27 +00:00
Andre Oppermann
cc477a6347 In syncache_respond() do not reply with a MSS that is larger than what
the peer announced to us but make it at least tcp_minmss in size.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 17:54:53 +00:00
Andre Oppermann
8bfb19180d Some cleanups and janitorial work to tcp_syncache:
o don't assign remote/local host/port information manually between provided
   struct in_conninfo and struct syncache, bcopy() it instead
 o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture
   the purpose of this field
 o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons
 o fix IPSEC error case printf's to report correct function name
 o in syncache_socket() only transpose enhanced tcp options parameters to
   struct tcpcb when the inpcb doesn't has TF_NOOPT set
 o in syncache_respond() reorder stack variables
 o in syncache_respond() remove bogus KASSERT()

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 16:14:19 +00:00
Andre Oppermann
dfabcc1d29 Reverse the source/destination parameters to in[6]_pcblookup_hash() in
syncache_respond() for the #ifdef MAC case.

Submitted by:	Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
2006-06-26 09:43:55 +00:00
Andre Oppermann
a846263567 Decrement the global syncache counter in syncache_expand() when the entry
is removed from the bucket.  This fixes the syncache statistics.
2006-06-25 11:11:33 +00:00
Andre Oppermann
649ac0ce5f Move the syncookie MD5 context from globals to the stack to make it MP safe. 2006-06-22 15:07:45 +00:00
Andre Oppermann
c9f7b0ad5b Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied
memory location for already existing/initialized mutexes.  With random
data in the memory location this fails (ie. after a soft reboot).

Reported by:	brueffer, YAMAMOTO Shigeru
Submitted by:	YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
2006-06-20 08:11:30 +00:00
Andre Oppermann
2f1a4ccfc1 Do not access syncache entry before it was allocated for the TF_NOOPT case
in syncache_add().

Found by:	Coverity Prevent
CID:		1473
2006-06-18 13:03:42 +00:00
Andre Oppermann
8411d000a1 Move all syncache related structures to tcp_syncache.c. They are only used
there.

This unbreaks userland programs that include tcp_var.h.

Discussed with:	rwatson
2006-06-18 12:26:11 +00:00
Andre Oppermann
bdfbf1e203 Remove double lock acquisition in syncookie_lookup() which came from last
minute conversions to macros.

Pointy hat to:	andre
2006-06-18 11:48:03 +00:00
Andre Oppermann
ee2e4c1d4e Fix the !INET6 compile.
Reported by:	alc
2006-06-17 18:42:07 +00:00
Andre Oppermann
0c529372f0 ANSIfy and tidy up comments.
Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:49:11 +00:00
Andre Oppermann
351630c40d Add locking to TCP syncache and drop the global tcpinfo lock as early
as possible for the syncache_add() case.  The syncache timer no longer
aquires the tcpinfo lock and timeout/retransmit runs can happen in
parallel with bucket granularity.

On a P4 the additional locks cause a slight degression of 0.7% in tcp
connections per second.  When IP and TCP input are deserialized and
can run in parallel this little overhead can be neglected. The syncookie
handling still leaves room for improvement and its random salts may be
moved to the syncache bucket head structures to remove the second lock
operation currently required for it.  However this would be a more
involved change from the way syncookies work at the moment.

Reviewed by:	rwatson
Tested by:	rwatson, ps (earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-17 17:32:38 +00:00
Robert Watson
92c07a345e Change soabort() from returning int to returning void, since all
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
2006-03-16 07:03:14 +00:00
Andre Oppermann
464fcfbc5c Rework TCP window scaling (RFC1323) to properly scale the send window
right from the beginning and partly clean up the differences in handling
between SYN_SENT and SYN_RCVD (syncache).

Further changes to this code to come.  This is a first incremental step
to a general overhaul and streamlining of the TCP code.

PR:		kern/15095
PR:		kern/92690 (partly)
Reviewed by:	qingli (and tested with ANVL)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-02-28 23:05:59 +00:00
Qing Li
eee9df08bd Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry.
Reviewed by:	andre, glebius
MFC after:	3 days
2006-02-09 21:29:02 +00:00
Qing Li
c1fd993af9 Redo the previous fix by setting the UMA_ZONE_ZINIT bit in the syncache
zone, eliminating the need to call bzero() after each syncache entry
allocation.

Suggested by:	glebius
Reviewed by:	andre
MFC after:	3 days
2006-02-08 23:32:57 +00:00
Qing Li
737b12e98f Fixes a crash due to the memory of the newly allocated syncache entry
in syncache_lookup() is not cleared and may lead to an arbitrary and
bogus rtentry pointer which later gets free'd.

Reviewed by: andre
MFC after: 3 days
2006-02-07 19:59:46 +00:00
Andre Oppermann
79eb490467 In syncache_expand() insert a proper syncache_free() to fix a case
that currently can't be triggered.  But better be safe than sorry
later on.  Additionally it properly silences Coverity Prevent for
future tests.

Found by:	Coverity Prevent(tm)
Coverity ID:	CID802
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-01-18 18:25:03 +00:00
Gleb Smirnoff
ecedca7441 UMA can return NULL not only in case when our zone is full, but
also in case of generic memory shortage. In the latter case we may
not find an old entry.

Found with:	Coverity Prevent(tm)
2006-01-14 13:04:08 +00:00
Andre Oppermann
ef39adf007 Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
  ip_dooptions(struct mbuf *m, int pass)
  save_rte(m, option, dst)
  ip_srcroute(m0)
  ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
  ip_insertoptions(m, opt, phlen)
  ip_optcopy(ip, jp)
  ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 20:12:40 +00:00
Andre Oppermann
34333b16cd Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it.  It only adds a layer of confusion.  The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit.  For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-02 13:46:32 +00:00
Andre Oppermann
db1240661f Do not ignore all other TCP options (eg. timestamp, window scaling)
when responding to TCP SYN packets with TCP_MD5 enabled and set.

PR:		kern/82963
Submitted by:	<demizu at dd.iij4u.or.jp>
MFC after:	3 days
2005-09-14 15:06:22 +00:00