Commit Graph

115 Commits

Author SHA1 Message Date
Andre Oppermann
0d957bba48 o Remove unused and redundant TCP option definitions
o Replace usage of MAX_TCPOPTLEN with the correctly constructed and
  derived MAX_TCPOPTLEN
2007-04-20 15:08:09 +00:00
Andre Oppermann
4d6e713043 Remove bogus check for accept queue length and associated failure handling
from the incoming SYN handling section of tcp_input().

Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed.  It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake.  It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.

Change return value of syncache_add() to void.  No status communication
is required.
2007-04-20 14:34:54 +00:00
Andre Oppermann
e207f80039 Simplifly syncache_expand() and clarify its semantics. Zero is returned
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies.  True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.

For both cases an RST is sent back in tcp_input().

A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.

Reported by:	kris (the panic)
2007-04-20 13:51:34 +00:00
Andre Oppermann
0a5df51410 Only update TCP timestamp on SYN duplication if it is present on
current SYN in syncache_add().  Otherwise disable timestamps.
2007-04-20 13:36:48 +00:00
Andre Oppermann
c73f70b728 o Plug memory leak in syncache_add() on MAC label allocation failure.
o Simplify code flow with 'done' goto label.
o Remove mbuf argument from syncache_respond().  It doesn't make use
  of it.
2007-04-20 13:30:08 +00:00
Andre Oppermann
9eab54debf When we run into the syncache entry limits syncache_add() tries
to free the oldest entry in the current bucket row.  The global
entry limit may be smaller than the bucket rows and their limit
combined however.  Thus only try to free a syncache entry if we
found one in this bucket row.

Reported by:	kris
2007-04-17 15:25:14 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
Andre Oppermann
0c38fd0a7a Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
Andre Oppermann
9daba64ed5 Unbreak IPv6 after consolidation of TCP options insertion.
Submitted by:	tegge
2007-03-17 11:52:54 +00:00
Kip Macy
9ad2c608c2 Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
  being changed anyway

Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
2007-03-17 06:40:09 +00:00
Andre Oppermann
02a1a64357 Consolidate insertion of TCP options into a segment from within tcp_output()
and syncache_respond() into its own generic function tcp_addoptions().

tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

In struct tcpopt rename to_requested_s_scale to just to_wscale.

Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."

Reviewed by:	silby, mohans, julian
Sponsored by:	TCP/IP Optimization Fundraise 2005
2007-03-15 15:59:28 +00:00
Andre Oppermann
087b55ea59 Change the way the advertized TCP window scaling is computed. Instead of
upper-bounding it to the size of the initial socket buffer lower-bound it
to the smallest MSS we accept.  Ideally we'd use the actual MSS information
here but it is not available yet.

For socket buffer auto sizing to be effective we need room to grow the
receive window.  The window scale shift is determined at connection setup
and can't be changed afterwards.  The previous, original, method effectively
just did a power of two roundup of the socket buffer size at connection
setup severely limiting the headroom for larger socket buffers.

Tested by:	many (as part of the socket buffer auto sizing patch)
MFC after:	1 month
2007-02-01 17:39:18 +00:00
Christian S.J. Peron
826cef3d75 Fix LOR between the syncache and inpcb locks when MAC is present in the
kernel.  This LOR snuck in with some of the recent syncache changes.  To
fix this, the inpcb handling was changed:

- Hang a MAC label off the syncache object
- When the syncache entry is initially created, we pickup the PCB lock
  is held because we extract information from it while initializing the
  syncache entry.  While we do this, copy the MAC label associated with
  the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry
  to the mbuf so it can be processed by security policies which analyze
  mbuf labels.

This change required that the MAC framework be extended to support the
label copy operations from the PCB to the syncache entry, and then from
the syncache entry to the mbuf.

These functions really should be referencing the syncache structure instead
of the label.  However, due to some of the complexities associated with
exposing this syncache structure we operate directly on it's label pointer.
This should be OK since we aren't making any access control decisions within
this code directly, we are merely allocating and copying label storage so
we can properly initialize mbuf labels for any packets the syncache code
might create.

This also has a nice side effect of caching.  Prior to this change, the
PCB would be looked up/locked for each packet transmitted.  Now the label
is cached at the time the syncache entry is initialized.

Submitted by:	andre [1]
Discussed with:	rwatson

[1] andre submitted the tcp_syncache.c changes
2006-12-13 06:00:57 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Andrey A. Chernov
239e71c612 Add missing #ifdef INET6 (can't be compiled) 2006-09-14 10:22:35 +00:00
Andre Oppermann
67d828b162 Remove unessary includes and follow common ordering style. 2006-09-13 13:21:17 +00:00
Andre Oppermann
bf6d304ab2 Rewrite of TCP syncookies to remove locking requirements and to enhance
functionality:

 - Remove a rwlock aquisition/release per generated syncookie.  Locking
   is now integrated with the bucket row locking of syncache itself and
   syncookies no longer add any additional lock overhead.
 - Syncookie secrets are different for and stored per syncache buck row.
   Secrets expire after 16 seconds and are reseeded on-demand.
 - The computational overhead for syncookie generation and verification
   is one MD5 hash computation as before.
 - Syncache can be turned off and run with syncookies only by setting the
   sysctl net.inet.tcp.syncookies_only=1.

This implementation extends the orginal idea and first implementation
of FreeBSD by using not only the initial sequence number field to store
information but also the timestamp field if present.  This way we can
keep track of the entire state we need to know to recreate the session in
its original form.  Almost all TCP speakers implement RFC1323 timestamps
these days.  For those that do not we still have to live with the known
shortcomings of the ISN only SYN cookies.  The use of the timestamp field
causes the timestamps to be randomized if syncookies are enabled.

The idea of SYN cookies is to encode and include all necessary information
about the connection setup state within the SYN-ACK we send back and thus
to get along without keeping any local state until the ACK to the SYN-ACK
arrives (if ever).  Everything we need to know should be available from
the information we encoded in the SYN-ACK.

A detailed description of the inner working of the syncookies mechanism
is included in the comments in tcp_syncache.c.

Reviewed by:	silby (slightly earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-13 13:08:27 +00:00
Andre Oppermann
cc477a6347 In syncache_respond() do not reply with a MSS that is larger than what
the peer announced to us but make it at least tcp_minmss in size.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 17:54:53 +00:00
Andre Oppermann
8bfb19180d Some cleanups and janitorial work to tcp_syncache:
o don't assign remote/local host/port information manually between provided
   struct in_conninfo and struct syncache, bcopy() it instead
 o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture
   the purpose of this field
 o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons
 o fix IPSEC error case printf's to report correct function name
 o in syncache_socket() only transpose enhanced tcp options parameters to
   struct tcpcb when the inpcb doesn't has TF_NOOPT set
 o in syncache_respond() reorder stack variables
 o in syncache_respond() remove bogus KASSERT()

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 16:14:19 +00:00
Andre Oppermann
dfabcc1d29 Reverse the source/destination parameters to in[6]_pcblookup_hash() in
syncache_respond() for the #ifdef MAC case.

Submitted by:	Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
2006-06-26 09:43:55 +00:00
Andre Oppermann
a846263567 Decrement the global syncache counter in syncache_expand() when the entry
is removed from the bucket.  This fixes the syncache statistics.
2006-06-25 11:11:33 +00:00
Andre Oppermann
649ac0ce5f Move the syncookie MD5 context from globals to the stack to make it MP safe. 2006-06-22 15:07:45 +00:00
Andre Oppermann
c9f7b0ad5b Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied
memory location for already existing/initialized mutexes.  With random
data in the memory location this fails (ie. after a soft reboot).

Reported by:	brueffer, YAMAMOTO Shigeru
Submitted by:	YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
2006-06-20 08:11:30 +00:00
Andre Oppermann
2f1a4ccfc1 Do not access syncache entry before it was allocated for the TF_NOOPT case
in syncache_add().

Found by:	Coverity Prevent
CID:		1473
2006-06-18 13:03:42 +00:00
Andre Oppermann
8411d000a1 Move all syncache related structures to tcp_syncache.c. They are only used
there.

This unbreaks userland programs that include tcp_var.h.

Discussed with:	rwatson
2006-06-18 12:26:11 +00:00
Andre Oppermann
bdfbf1e203 Remove double lock acquisition in syncookie_lookup() which came from last
minute conversions to macros.

Pointy hat to:	andre
2006-06-18 11:48:03 +00:00
Andre Oppermann
ee2e4c1d4e Fix the !INET6 compile.
Reported by:	alc
2006-06-17 18:42:07 +00:00
Andre Oppermann
0c529372f0 ANSIfy and tidy up comments.
Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:49:11 +00:00
Andre Oppermann
351630c40d Add locking to TCP syncache and drop the global tcpinfo lock as early
as possible for the syncache_add() case.  The syncache timer no longer
aquires the tcpinfo lock and timeout/retransmit runs can happen in
parallel with bucket granularity.

On a P4 the additional locks cause a slight degression of 0.7% in tcp
connections per second.  When IP and TCP input are deserialized and
can run in parallel this little overhead can be neglected. The syncookie
handling still leaves room for improvement and its random salts may be
moved to the syncache bucket head structures to remove the second lock
operation currently required for it.  However this would be a more
involved change from the way syncookies work at the moment.

Reviewed by:	rwatson
Tested by:	rwatson, ps (earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-17 17:32:38 +00:00
Robert Watson
92c07a345e Change soabort() from returning int to returning void, since all
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
2006-03-16 07:03:14 +00:00
Andre Oppermann
464fcfbc5c Rework TCP window scaling (RFC1323) to properly scale the send window
right from the beginning and partly clean up the differences in handling
between SYN_SENT and SYN_RCVD (syncache).

Further changes to this code to come.  This is a first incremental step
to a general overhaul and streamlining of the TCP code.

PR:		kern/15095
PR:		kern/92690 (partly)
Reviewed by:	qingli (and tested with ANVL)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-02-28 23:05:59 +00:00
Qing Li
eee9df08bd Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry.
Reviewed by:	andre, glebius
MFC after:	3 days
2006-02-09 21:29:02 +00:00
Qing Li
c1fd993af9 Redo the previous fix by setting the UMA_ZONE_ZINIT bit in the syncache
zone, eliminating the need to call bzero() after each syncache entry
allocation.

Suggested by:	glebius
Reviewed by:	andre
MFC after:	3 days
2006-02-08 23:32:57 +00:00
Qing Li
737b12e98f Fixes a crash due to the memory of the newly allocated syncache entry
in syncache_lookup() is not cleared and may lead to an arbitrary and
bogus rtentry pointer which later gets free'd.

Reviewed by: andre
MFC after: 3 days
2006-02-07 19:59:46 +00:00
Andre Oppermann
79eb490467 In syncache_expand() insert a proper syncache_free() to fix a case
that currently can't be triggered.  But better be safe than sorry
later on.  Additionally it properly silences Coverity Prevent for
future tests.

Found by:	Coverity Prevent(tm)
Coverity ID:	CID802
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-01-18 18:25:03 +00:00
Gleb Smirnoff
ecedca7441 UMA can return NULL not only in case when our zone is full, but
also in case of generic memory shortage. In the latter case we may
not find an old entry.

Found with:	Coverity Prevent(tm)
2006-01-14 13:04:08 +00:00
Andre Oppermann
ef39adf007 Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
  ip_dooptions(struct mbuf *m, int pass)
  save_rte(m, option, dst)
  ip_srcroute(m0)
  ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
  ip_insertoptions(m, opt, phlen)
  ip_optcopy(ip, jp)
  ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 20:12:40 +00:00
Andre Oppermann
34333b16cd Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it.  It only adds a layer of confusion.  The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit.  For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-02 13:46:32 +00:00
Andre Oppermann
db1240661f Do not ignore all other TCP options (eg. timestamp, window scaling)
when responding to TCP SYN packets with TCP_MD5 enabled and set.

PR:		kern/82963
Submitted by:	<demizu at dd.iij4u.or.jp>
MFC after:	3 days
2005-09-14 15:06:22 +00:00
Gleb Smirnoff
360856f60e - Refuse hashsize of 0, since it is invalid.
- Use defined constant instead of 512.
2005-08-25 13:57:00 +00:00
Robert Watson
f59a9ebf10 Remove no-op spl's and most comment references to spls, as TCP locking
is believed to be basically done (modulo any remaining bugs).

MFC after:	3 days
2005-07-19 12:21:26 +00:00
Paul Saab
91232d6ccc Remove some code that snuck in by accident.
Submitted by:	Mohan Srinivasan
2005-04-21 20:29:40 +00:00
Paul Saab
be3f3b5ead Fix for interaction problems between TCP SACK and TCP Signature.
If TCP Signatures are enabled, the maximum allowed sack blocks aren't
going to fit. The fix is to compute how many sack blocks fit and tack
these on last. Also on SYNs, defer padding until after the SACK
PERMITTED option has been added.

Found by:	Mohan Srinivasan.
Submitted by:	Mohan Srinivasan, Noritoshi Demizu.
Reviewed by:	Raja Mukerji.
2005-04-21 20:26:07 +00:00
Paul Saab
97b76190eb Undo rev 1.71 as it is the wrong change. 2005-04-21 20:24:43 +00:00
Paul Saab
a3047bc036 Fix for 2 bugs related to TCP Signatures :
- If the peer sends the Signature option in the SYN, use of Timestamps
  and Window Scaling were disabled (even if the peer supports them).
- The sender must not disable signatures if the option is absent in
  the received SYN. (See comment in syncache_add()).

Found, Submitted by:	Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>.
Reviewed by:		Mohan Srinivasan <mohans at yahoo-inc dot com>.
2005-04-21 20:09:09 +00:00
Gleb Smirnoff
31199c8463 Use NET_CALLOUT_MPSAFE macro. 2005-03-01 12:01:17 +00:00
Robert Watson
77c16eed7c Remove clause three from tcp_syncache.c license per permission of
McAfee.  Update copyright to McAfee from NETA.
2005-01-30 19:28:27 +00:00
Andre Oppermann
c94c54e4df Remove RFC1644 T/TCP support from the TCP side of the network stack.
A complete rationale and discussion is given in this message
and the resulting discussion:

 http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706

Note that this commit removes only the functional part of T/TCP
from the tcp_* related functions in the kernel.  Other features
introduced with RFC1644 are left intact (socket layer changes,
sendmsg(2) on connection oriented protocols)  and are meant to
be reused by a simpler and less intrusive reimplemention of the
previous T/TCP functionality.

Discussed on:	-arch
2004-11-02 22:22:22 +00:00
Andre Oppermann
e098266191 Remove the last two global variables that are used to store packet state while
it travels through the IP stack.  This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.

The IP source route options of a packet are now stored in a mtag instead of the
global variable.
2004-09-15 20:13:26 +00:00
Robert Watson
a4f757cd5d White space cleanup for netinet before branch:
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by:	re (scottl)
Submitted by:	Xin LI <delphij@frontfree.net>
2004-08-16 18:32:07 +00:00