Commit Graph

82 Commits

Author SHA1 Message Date
andre
6164d7c280 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
sam
8a7d4d7c73 replace mtx_assert by INP_LOCK_ASSERT
Supported by:	FreeBSD Foundation
2003-11-08 22:55:52 +00:00
sam
75598276f5 unbreak compilation of FAST_IPSEC
Supported by:	FreeBSD Foundation
2003-11-08 00:34:34 +00:00
ume
373abd9403 - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
harti
dc2bd41739 The tcp_trace call needs the length of the header. Unfortunately the
code has rotten a bit so that the header length is not correct at
the point when tcp_trace is called. Temporarily compute the correct
value before the call and restore the old value after. This makes
ports/benchmarks/dbs to almost work.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:50:42 +00:00
jlemon
79a1ebfa6f Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the
routine does not require a tcpcb to operate.  Since we no longer keep
template mbufs around, move pseudo checksum out of this routine, and
merge it with the length update.

Sponsored by: DARPA, NAI Labs
2003-02-19 22:18:06 +00:00
jlemon
8b16df5c37 Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
   - fix bug where a single pass through tcp_input might not delay acks
   - use callout_active() instead of callout_pending()

Sponsored by: DARPA, NAI Labs
2003-02-19 21:18:23 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
hsu
a18d0c206f Optimize away call to bzero() in the common case by directly checking
if a connection has any cached TAO information.
2003-01-18 19:03:26 +00:00
dillon
12d5a1b9ff Fix oops in my last commit, I was calculating a new length but then not
using it.  (The code is already correct in -stable).

Found by: silby
2002-10-16 19:16:33 +00:00
sam
0ef6c52bbc Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
sam
2a86be217a Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
dillon
2b25357133 Update various comments mainly related to retransmit/FIN that I
documented while working on a previous bug.

Fix a PERSIST bug.  Properly account for a FIN sent during a PERSIST.

MFC after:	7 days
2002-10-10 19:21:50 +00:00
jennifer
e52ed9ae79 Tempary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead
of pcbhead. It is on the way.
2002-09-17 03:19:43 +00:00
dillon
bb806c49bf Implement TCP bandwidth delay product window limiting, similar to (but
not meant to duplicate) TCP/Vegas.  Add four sysctls and default the
implementation to 'off'.

net.inet.tcp.inflight_enable	enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug	debugging (defaults to 1=on)
net.inet.tcp.inflight_min	minimum window limit
net.inet.tcp.inflight_max	maximum window limit

MFC after:	1 week
2002-08-17 18:26:02 +00:00
jennifer
5709e80b7d Assert that the inpcb lock is held when calling tcp_output().
Approved by:	hsu
2002-08-12 03:22:46 +00:00
rwatson
a034d0cd3c Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the TCP socket code for packet generation and delivery:
label outgoing mbufs with the label of the socket, and check socket and
mbuf labels before permitting delivery to a socket.  Assign labels
to newly accepted connections when the syncache/cookie code has done
its business.  Also set peer labels as convenient.  Currently,
MAC policies cannot influence the PCB matching algorithm, so cannot
implement polyinstantiation.  Note that there is at least one case
where a PCB is not available due to the TCP packet not being associated
with any socket, so we don't label in that case, but need to handle
it in a special manner.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 19:06:49 +00:00
luigi
fbbaa9503a Slightly restructure the #ifdef INET6 sections to make the code
more readable.

Remove the six "register" attributes from variables tcp_output(), the
compiler surely knows well how to allocate them.
2002-06-23 21:25:36 +00:00
silby
86950bb1f4 Re-commit w/fix:
Ensure that the syn cache's syn-ack packets contain the same
  ip_tos, ip_ttl, and DF bits as all other tcp packets.

  PR:             39141
  MFC after:      2 weeks

This time, make sure that ipv4 specific code (aka all of the above)
is only run in the ipv4 case.
2002-06-14 03:08:05 +00:00
silby
0bbc7c9dc5 Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :)
Pointy-hat to:	silby
2002-06-14 02:43:20 +00:00
silby
acb0745bfa Ensure that the syn cache's syn-ack packets contain the same
ip_tos, ip_ttl, and DF bits as all other tcp packets.

PR:		39141
MFC after:	2 weeks
2002-06-14 02:36:34 +00:00
tanimura
e6fa9b9e92 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
silby
1b6efabb90 Reduce the local network slowstart flightsize from infinity to 4 packets.
Now that we've increased the size of our send / receive buffers, bursting
an entire window onto the network may cause congestion.  As a result,
we will slow start beginning with a flightsize of 4 packets.

Problem reported by: Thomas Zenker <thz@Lennartz-electronic.de>

MFC after:	3 days
2001-12-14 18:26:52 +00:00
jlemon
776e8594bd Fix up tabs from cut&n&paste. 2001-12-13 04:02:31 +00:00
obrien
7fd9a6a23a Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.
2001-12-10 08:09:49 +00:00
dillon
f97547e246 Fix a bug with transmitter restart after receiving a 0 window. The
receiver was not sending an immediate ack with delayed acks turned on
when the input buffer is drained, preventing the transmitter from
restarting immediately.

Propogate the TCP_NODELAY option to accept()ed sockets.  (Helps tbench and
is a good idea anyway).

Some cleanup.  Identify additonal issues in comments.

MFC after:	1 day
2001-12-02 08:49:29 +00:00
dillon
cbc4eaa756 The transmit burst limit for newreno completely breaks TCP's performance
if the receive side is using delayed acks.  Temporarily remove it.

MFC after:	0 days
2001-11-30 21:33:39 +00:00
jlemon
a3c1c9fdb4 Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby  (in a previous iteration)
Sponsored by: DARPA, NAI Labs
2001-11-22 04:50:44 +00:00
jayanth
3c25260058 Add a flag TF_LASTIDLE, that forces a previously idle connection
to send all its data, especially when the data is less than one MSS.
This fixes an issue where the stack was delaying the sending
of data, eventhough there was enough window to send all the data and
the sending of data was emptying the socket buffer.

Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)

Submitted by: Jayanth Vijayaraghavan
2001-10-05 21:33:38 +00:00
silby
f41767543e Eliminate the allocation of a tcp template structure for each
connection.  The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.

Previously, tcp templates required the allocation of one
mbuf per connection.  On large systems, this change should
free up a large number of mbufs.

Reviewed by:	bmilekic, jlemon, ru
MFC after: 2 weeks
2001-06-23 03:21:46 +00:00
ume
832f8d2249 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
phk
54ca48450c Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
archie
51e519827f Don't do snd_nxt rollback optimization (rev. 1.46) for SYN packets.
It causes a panic when/if snd_una is incremented elsewhere (this
is a conservative change, because originally no rollback occurred
for any packets at all).

Submitted by:	Vivek Sadananda Pai <vivek@imimic.com>
2000-09-11 19:11:33 +00:00
archie
be99417f31 Improve performance in the case where ip_output() returns an error.
When this happens, we know for sure that the packet data was not
received by the peer. Therefore, back out any advancing of the
transmit sequence number so that we send the same data the next
time we transmit a packet, avoiding a guaranteed missed packet and
its resulting TCP transmit slowdown.

In most systems ip_output() probably never returns an error, and
so this problem is never seen. However, it is more likely to occur
with device drivers having short output queues (causing ENOBUFS to
be returned when they are full), not to mention low memory situations.

Moreover, because of this problem writers of slow devices were
required to make an unfortunate choice between (a) having a relatively
short output queue (with low latency but low TCP bandwidth because
of this problem) or (b) a long output queue (with high latency and
high TCP bandwidth). In my particular application (ISDN) it took
an output queue equal to ~5 seconds of transmission to avoid ENOBUFS.
A more reasonable output queue of 0.5 seconds resulted in only about
50% TCP throughput. With this patch full throughput was restored in
the latter case.

Reviewed by:	freebsd-net
2000-08-03 23:23:36 +00:00
jayanth
f7e133ab0b re-enable the tcp newreno code. 2000-07-12 22:00:46 +00:00
itojun
5f4e854de1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00
jlemon
c7145aec0f When attempting to transmit a packet, if the system fails to allocate
a mbuf, it may return without setting any timers.  If no more data is
scheduled to be transmitted (this was a FIN) the system will sit in
LAST_ACK state forever.

Thus, when mbuf allocation fails, set the retransmit timer if neither
the retransmit or persist timer is already pending.

Problem discovered by:  Mike Silbersack (silby@silby.com)
Pushed for a fix by:    Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:            jayanth
2000-06-02 17:38:45 +00:00
jayanth
eff95a482d Temporarily turn off the newreno flag until we can track down the known
data corruption problem.
2000-05-11 22:28:28 +00:00
jlemon
8a3c72bb35 Implement TCP NewReno, as documented in RFC 2582. This allows
better recovery for multiple packet losses in a single window.
The algorithm can be toggled via the sysctl net.inet.tcp.newreno,
which defaults to "on".

Submitted by:  Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
2000-05-06 03:31:09 +00:00
jlemon
0dcc5bc0d1 Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.
2000-03-27 19:14:27 +00:00
shin
fdb3a70644 Avoid kernel panic when tcp rfc1323 and rfc1644 options are enabled
at the same time.

   When rfc1323 and rfc1644 option are enabled by sysctl,
   and tcp over IPv6 is tried, kernel panic happens by the
   following check in tcp_output(), because now hdrlen is bigger
   in such case than before.

/*#ifdef DIAGNOSTIC*/
        if (max_linkhdr + hdrlen > MHLEN)
                panic("tcphdr too big");
/*#endif*/

   So change the above check to compare with MCLBYTES in #ifdef INET6 case.
   Also, allocate a mbuf cluster for the header mbuf, in that case.

Bug reported at KAME environment.
Approved by: jkh

Reviewed by: sumikawa
Obtained from: KAME project
2000-02-09 00:34:40 +00:00
shin
e7b807d1e3 Fixed the problem that IPsec connection hangs when bigger data is sent.
-opt_ipsec.h was missing on some tcp files (sorry for basic mistake)
  -made buildable as above fix
  -also added some missing IPv4 mapped IPv6 addr consideration into
   ipsec4_getpolicybysock
2000-01-15 14:56:38 +00:00
shin
3bdc213839 tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
jlemon
628be0515e Restructure TCP timeout handling:
- eliminate the fast/slow timeout lists for TCP and instead use a
    callout entry for each timer.
  - increase the TCP timer granularity to HZ
  - implement "bad retransmit" recovery, as presented in
    "On Estimating End-to-End Network Path Properties", by Allman and Paxson.

Submitted by:	jlemon, wollmann
1999-08-30 21:17:07 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
dg
111e03a013 Added net.inet.tcp.path_mtu_discovery variable which when set to 0
(default 1) disables PMTUD globally. Although PMTUD can be disabled in
the standard case by locking the MTU on a static route (including the
default route), this method doesn't work in the face of dynamic routing
protocols like gated.
1999-05-27 12:24:21 +00:00
julian
9e60b40925 Two cosmetic changes, one a typo and the other, a clarification. 1999-04-07 22:22:06 +00:00