47 Commits

Author SHA1 Message Date
Julien Charbon
cea40c4888 Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
tcp_tw_2msl_scan().  This race condition causes unplanned timewait
timeout cancellation.  Also simplify implementation by holding inpcb
reference and removing tcptw reference counting.

Differential Revision:	https://reviews.freebsd.org/D826
Submitted by:		Marc De la Gueronniere <mdelagueronniere@verisign.com>
Submitted by:		jch
Reviewed By:		jhb (mentor), adrian, rwatson
Sponsored by:		Verisign, Inc.
MFC after:		2 weeks
X-MFC-With:		r264321
2014-10-30 08:53:56 +00:00
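The commit above replaces tcptw reference counting with a reference held on the inpcb. The general pattern is to take a reference before dropping a lock so that a concurrent path cannot free the object underneath you, and to free only on the last release. A minimal sketch of that pattern with C11 atomics follows; `struct pcb`, `pcb_ref()` and `pcb_rele()` are hypothetical stand-ins, not the kernel's inpcb API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a protocol control block. */
struct pcb {
    atomic_int refcount;    /* starts at 1 for the owner */
    bool closed;            /* set once the connection is torn down */
};

static struct pcb *
pcb_alloc(void)
{
    struct pcb *p = calloc(1, sizeof(*p));

    if (p != NULL)
        atomic_init(&p->refcount, 1);
    return (p);
}

/* Take an extra reference so the pcb survives while a lock is dropped. */
static void
pcb_ref(struct pcb *p)
{
    atomic_fetch_add(&p->refcount, 1);
}

/*
 * Drop a reference; the last one frees the pcb.  Returns true if the
 * pcb was freed, so the caller knows not to touch it again.
 */
static bool
pcb_rele(struct pcb *p)
{
    assert(atomic_load(&p->refcount) > 0);
    if (atomic_fetch_sub(&p->refcount, 1) == 1) {
        free(p);
        return (true);
    }
    return (false);
}
```

In this sketch a scanner would call `pcb_ref()` before releasing the list lock, re-check `closed` after reacquiring the locks it needs, and finish with `pcb_rele()`.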
John Baldwin
66eefb1eae Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.

To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT.  Only acquire the TCP pcbinfo lock when
closing an expired connection.  This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.

Submitted by:	Julien Charbon <jcharbon@verisign.com>
Reviewed by:	rwatson, rrs, adrian
MFC after:	2 months
2014-04-10 18:15:35 +00:00
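A minimal user-space sketch of the lock split described above, assuming POSIX mutexes and the `<sys/queue.h>` TAILQ macros: a dedicated lock protects only the TIME_WAIT list, and the contended global lock is held just long enough to close one connection. `tw_lock`, `pcbinfo_lock`, `tw_insert()`, `tw_scan()` and `tw_close()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical TIME_WAIT entry; expire is an absolute tick count. */
struct tw_entry {
    int expire;
    TAILQ_ENTRY(tw_entry) link;
};

static TAILQ_HEAD(, tw_entry) tw_list = TAILQ_HEAD_INITIALIZER(tw_list);
static pthread_mutex_t tw_lock = PTHREAD_MUTEX_INITIALIZER;      /* list only */
static pthread_mutex_t pcbinfo_lock = PTHREAD_MUTEX_INITIALIZER; /* global */

/*
 * Every entry gets the same 2MSL timeout, so appending at the tail
 * keeps the list sorted by expiry.
 */
static void
tw_insert(struct tw_entry *tw, int expire)
{
    tw->expire = expire;
    pthread_mutex_lock(&tw_lock);
    TAILQ_INSERT_TAIL(&tw_list, tw, link);
    pthread_mutex_unlock(&tw_lock);
}

/* Stand-in for the real teardown, which needs the global lock. */
static void
tw_close(struct tw_entry *tw)
{
    (void)tw;
}

/*
 * Close expired entries one at a time; the global lock is never held
 * across the whole walk.  Returns the number of entries closed.
 */
static int
tw_scan(int now)
{
    int closed = 0;

    for (;;) {
        struct tw_entry *tw;

        pthread_mutex_lock(&tw_lock);
        tw = TAILQ_FIRST(&tw_list);
        if (tw == NULL || tw->expire > now) {
            /* Sorted list: nothing further can have expired. */
            pthread_mutex_unlock(&tw_lock);
            break;
        }
        TAILQ_REMOVE(&tw_list, tw, link);
        pthread_mutex_unlock(&tw_lock);

        pthread_mutex_lock(&pcbinfo_lock);
        tw_close(tw);
        pthread_mutex_unlock(&pcbinfo_lock);
        closed++;
    }
    return (closed);
}
```

Because this simplified version unlinks the entry before taking the global lock, no other thread can find it in between; the kernel code has a harder window to manage, which is the subject of the race fix at the top of this log.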
Andre Oppermann
13feab8286 Add DELACK to list of timers.
MFC after:	1 week
2012-11-27 19:07:28 +00:00
Andre Oppermann
8d045dbdf3 Define the delayed ACK timeout value directly as hz/10 instead of
obfuscating it by going through PR_FASTHZ.  No functional change.

MFC after:	2 weeks
2012-10-29 12:17:02 +00:00
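Writing the delayed ACK timeout as hz/10 makes its wall-clock value explicit: 100 ms at any tick rate divisible by 10. A small sketch, with `hz` passed as a parameter (in the kernel it is a global) and hypothetical helper names:

```c
#include <assert.h>

/* Delayed-ACK timeout in ticks, written directly as hz / 10. */
static int
delack_ticks(int hz)
{
    return (hz / 10);
}

/* Convert a tick count back to milliseconds for a given hz. */
static int
ticks_to_ms(int ticks, int hz)
{
    return (ticks * 1000 / hz);
}
```

For example, `delack_ticks(1000)` is 100 ticks and `delack_ticks(100)` is 10 ticks, both 100 ms.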
Andre Oppermann
024fd5b6bb For retransmits of SYN|ACK from the syncache use the slightly more
aggressive special tcp_syn_backoff[] retransmit schedule instead of
the normal tcp_backoff[] schedule for established connections.

MFC after:	2 weeks
2012-10-28 19:02:07 +00:00
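To see how the two schedules differ, the sketch below multiplies a base RTO by a per-attempt backoff factor and clamps the result. The array contents are believed to match FreeBSD's `tcp_syn_backoff[]` and `tcp_backoff[]` of that era, but treat them as an assumption and check your source tree; `MAXRXTSHIFT` and `rexmt_timeout()` are hypothetical names.

```c
#include <assert.h>

#define MAXRXTSHIFT 12  /* assumed to match TCP_MAXRXTSHIFT */

/* Assumed values: SYN|ACK retransmits stay flat for the first tries. */
static const int syn_backoff[MAXRXTSHIFT + 1] =
    { 1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 64, 64 };
/* Assumed values: established connections double immediately. */
static const int normal_backoff[MAXRXTSHIFT + 1] =
    { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 512, 512, 512 };

/* Timeout for retransmit number `shift`, clamped to rexmt_max. */
static int
rexmt_timeout(const int *backoff, int shift, int base_rto, int rexmt_max)
{
    long t;

    assert(shift >= 0 && shift <= MAXRXTSHIFT);
    t = (long)base_rto * backoff[shift];
    return (t > rexmt_max ? rexmt_max : (int)t);
}
```

With a 3000 ms base, the fourth entry of the SYN schedule still waits 3000 ms, while the normal schedule has already grown to 24000 ms.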
Gleb Smirnoff
9077f38738 Add new socket options: TCP_KEEPINIT, TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT, that allow control of the initial timeout, idle time, idle
re-send interval and idle send count on a per-socket basis.

Reviewed by:	andre, bz, lstewart
2012-02-05 16:53:02 +00:00
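A sketch of how an application might use the per-socket options from the commit above. Values are assumed to be in seconds, as documented in FreeBSD's tcp(4); check the page for your release. TCP_KEEPINIT is FreeBSD-specific, so it is guarded with `#ifdef`. `set_keepalive()` is a hypothetical helper.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalives on fd and tune them; returns 0 or -1 on error. */
static int
set_keepalive(int fd, int init, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return (-1);
#ifdef TCP_KEEPINIT
    /* Timeout for connections that have not reached ESTABLISHED. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINIT, &init, sizeof(init)) != 0)
        return (-1);
#else
    (void)init;
#endif
    /* Idle time before the first keepalive probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
        return (-1);
    /* Interval between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) != 0)
        return (-1);
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) != 0)
        return (-1);
    return (0);
}
```

The settings can be read back with `getsockopt()` on the same level and option name.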
Andre Oppermann
1c18314d17 Remove the TCP inflight bandwidth limiter as announced in r211315
to give way for the pluggable congestion control framework.  It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.

In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.

In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI.  It is always set to 0.

In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI.  It is always set to 0.

These unused variables in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.

No MFC is planned.  The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.
2010-09-16 21:06:45 +00:00
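Renaming a retired field to a spare, as the commit above does for 'struct tcpcb', keeps every offset and the total size unchanged, so consumers compiled against the old layout still work. A minimal illustration with hypothetical structures (not the real tcpcb), using C11 compile-time checks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before": a field used by the removed limiter. */
struct cb_v1 {
    uint32_t snd_cwnd;
    uint32_t snd_bwnd;      /* used by the old inflight limiter */
    uint32_t snd_ssthresh;
};

/* Hypothetical "after": same slot, renamed to a spare. */
struct cb_v2 {
    uint32_t snd_cwnd;
    uint32_t t_spare1;      /* unused; keeps later offsets in place */
    uint32_t snd_ssthresh;
};

/* Fail the build if the ABI seen by userland tools would change. */
_Static_assert(sizeof(struct cb_v1) == sizeof(struct cb_v2),
    "structure size changed");
_Static_assert(offsetof(struct cb_v1, snd_ssthresh) ==
    offsetof(struct cb_v2, snd_ssthresh), "field offset changed");
```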
Mike Silbersack
b8614722ff Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by:	andre, rwatson, Eric Van Gyzen
2009-09-16 05:33:15 +00:00
Mike Silbersack
e2f2059f68 Two changes:
- Reintegrate the ANSI C function declaration change
  from tcp_timer.c rev 1.92

- Reorganize the tcpcb structure so that it has a single
  pointer to the "tcp_timer" structure which contains all
  of the tcp timer callouts.  This change means that when
  the single tcp timer change is reintegrated, tcpcb will
  not change in size, and therefore the ABI between
  netstat and the kernel will not change.

Neither of these changes should have any functional
impact.

Reviewed by: bmah, rrs
Approved by: re (bmah)
2007-09-24 05:26:24 +00:00
Robert Watson
85d9437250 Back out tcp_timer.c:1.93 and associated changes that reimplemented the many
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change.  This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).

In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI.  The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.

This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.

Approved by:	re (kensmith)
Submitted by:	silby
Tested by:	rwatson, kris
Problems reported by:	peter, kris, others
2007-09-07 09:19:22 +00:00
Peter Wemm
c4a184bdc4 Change TCPTV_MIN to be independent of HZ. While it was documented to
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections.  When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.

To demonstrate, boot with kern.hz=100.  ssh to a box on local ethernet
and establish a reliable round-trip time (i.e., type a few commands).
Then unplug the ethernet and press a key.  Time how long it takes to
drop the connection.

The old behavior (with hz=100) caused the connection to typically drop
after 90 to 110 seconds without a response.

Now boot with kern.hz=1000 (default).  The same test causes the ssh session
to drop after just 9-10 seconds.  This is a big deal on a wifi connection.

With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100.  Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.

This commit changes TCPTV_MIN to be scaled with hz.  rexmit_min should
always be about 30.  If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.

This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.

As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf.  This is safe to run from 6.0 onwards.

Approved by:  re (rwatson)
Reviewed by:  andre, silby
2007-07-31 22:11:55 +00:00
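The arithmetic behind the commit above: a fixed 3 ticks is 30 ms at hz=100 but only 3 ms at hz=1000, whereas a value scaled with hz stays near 30 ms. The divisor 33 below is an assumption chosen to give about 30 ms; the clamp to 1 tick plays the role of the safety feature against 0 at very low hz. Function names are hypothetical.

```c
#include <assert.h>

/* Old definition: a fixed tick count, so the minimum shrinks as hz grows. */
static int
tcptv_min_old(int hz)
{
    (void)hz;
    return (3);
}

/* Scaled definition: roughly 30 ms at any hz, never 0 ticks. */
static int
tcptv_min_scaled(int hz)
{
    int t = hz / 33;

    return (t < 1 ? 1 : t);
}

static int
ticks_to_ms(int ticks, int hz)
{
    return (ticks * 1000 / hz);
}
```

At hz=1000 the scaled value is 30 ticks (30 ms); at hz=100 it is 3 ticks (also 30 ms).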
Andre Oppermann
abb91d889a Remove now unused stuff forgotten in the previous commit.
2007-05-16 17:55:22 +00:00
Andre Oppermann
2104448fe7 Move TIME_WAIT related functions and timer handling from files
other than the repo-copied tcp_subr.c into tcp_timewait.c#1.284:

 tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()

 tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
 tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
 tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

This is a mechanical move with appropriate renames and making
them static if used only locally.

The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
2007-05-16 17:14:25 +00:00
Ruslan Ermilov
7480de4305 Make "struct tcp_timer" visible only to the kernel, and unbreak world.
2007-04-11 14:08:42 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
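The merged model in the commit above (later backed out, as the 2007-09-07 entry records) registers one callout for whichever timer fires next. A user-space sketch of that bookkeeping, assuming absolute tick deadlines with 0 meaning "not armed" and ignoring tick wraparound; the enum, struct and `timer_activate()` here are hypothetical, not the kernel's `tcp_timer_activate()`.

```c
#include <assert.h>

/* Hypothetical indices for the five per-connection TCP timers. */
enum { TT_DELACK, TT_REXMT, TT_PERSIST, TT_KEEP, TT_2MSL, TT_NTIMERS };

/* Absolute expiry tick per timer; 0 means the timer is not armed. */
struct tcp_timers_sketch {
    int expire[TT_NTIMERS];
};

/*
 * Arm (delta > 0) or stop (delta == 0) one timer and return the tick
 * at which the single shared callout should next fire, or 0 if no
 * timer is armed.  A kernel would reschedule its callout here.
 */
static int
timer_activate(struct tcp_timers_sketch *t, int which, int now, int delta)
{
    int next = 0;

    t->expire[which] = delta > 0 ? now + delta : 0;
    for (int i = 0; i < TT_NTIMERS; i++) {
        if (t->expire[i] != 0 && (next == 0 || t->expire[i] < next))
            next = t->expire[i];
    }
    return (next);
}
```

When the shared callout fires, a dispatcher would run every timer whose deadline has passed and then re-arm for the new minimum.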
Mohan Srinivasan
7c72af8770 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates
issues where the peer never closes, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
John-Mark Gurney
4dc630cdd2 if min is greater than max, prefer max over min... I managed to get a
retransmit timer that was going to take 19 days to trigger...

Reviewed by:	silby
2006-09-25 07:22:39 +00:00
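Checking the lower bound first and the upper bound last is what makes the maximum win when a misconfiguration sets min above max. The kernel's TCPT_RANGESET is a macro; the function below is a hypothetical equivalent showing only the clamp order.

```c
#include <assert.h>

/* Clamp value into [tvmin, tvmax]; if tvmin > tvmax, tvmax wins. */
static unsigned long
rangeset(unsigned long value, unsigned long tvmin, unsigned long tvmax)
{
    unsigned long tv = value;

    if (tv < tvmin)
        tv = tvmin;
    if (tv > tvmax)     /* applied last, so it overrides tvmin */
        tv = tvmax;
    return (tv);
}
```

With the opposite order, a huge minimum would survive the clamp, which is how a multi-day retransmit timer can appear.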
Ruslan Ermilov
751dea2935 Back when we had T/TCP support, we used to apply different
timeouts for TCP and T/TCP connections in the TIME_WAIT
state, and we had two separate timed wait queues for them.
Now that is has gone, the timeout is always 2*MSL again,
and there is no reason to keep two queues (the first was
unused anyway!).

Also, reimplement the remaining queue using a TAILQ (it
was technically impossible before, with two queues).
2006-09-07 13:06:00 +00:00
Mohan Srinivasan
464469c713 Fix an edge-case bug in timewait handling: when ticks rolls over so that
the timewait expiry is exactly 0, the timewait queues (and that entry) are corrupted.
Reviewed by:	silby
2006-08-11 21:15:23 +00:00
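The hazard behind the commit above: with a free-running 32-bit tick counter, `now + timeout` wraps and can land exactly on 0, which collides with any code that treats 0 as "not queued". The sketch shows the wrap, one hypothetical guard that nudges the expiry off 0 (not necessarily what was committed), and a wraparound-safe comparison that assumes two's-complement conversion to `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Expiry tick that is never 0, so 0 can keep meaning "not queued". */
static uint32_t
tw_expiry(uint32_t now, uint32_t timeout)
{
    uint32_t t = now + timeout;     /* unsigned arithmetic wraps mod 2^32 */

    return (t == 0 ? 1 : t);
}

/* Wraparound-safe "a is at or after b" for tick values. */
static int
ticks_geq(uint32_t a, uint32_t b)
{
    return ((int32_t)(a - b) >= 0);
}
```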
Andre Oppermann
eaf80179e2 Have TCP Inflight disable itself if the RTT is below a certain
threshold.  Inflight doesn't make sense on a LAN as it has
trouble figuring out the maximal bandwidth because of the coarse
tick granularity.

The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold
in milliseconds below which inflight will disengage.  It defaults
to 10ms.

Tested by:	Joao Barros <joao.barros-at-gmail.com>,
		Rich Murphey <rich-at-whiteoaklabs.com>
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-02-16 19:38:07 +00:00
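A sketch of the threshold test described above, with the RTT expressed in kernel ticks to show why tick granularity matters: at hz=100 a single tick is already 10 ms. The parameter `rttthresh_ms` mirrors net.inet.tcp.inflight.rttthresh; `srtt_ticks` is a simplification (the kernel keeps a scaled fixed-point value) and `inflight_engaged()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the inflight limiter should stay engaged for this RTT. */
static bool
inflight_engaged(int srtt_ticks, int hz, int rttthresh_ms)
{
    int srtt_ms = srtt_ticks * 1000 / hz;

    /* Below the threshold (e.g. on a LAN) the limiter disengages. */
    return (srtt_ms >= rttthresh_ms);
}
```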
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes
2005-01-07 01:45:51 +00:00
Robert Watson
a4f757cd5d White space cleanup for netinet before branch:
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by:	re (scottl)
Submitted by:	Xin LI <delphij@frontfree.net>
2004-08-16 18:32:07 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regents'
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Jonathan Lemon
607b0b0cc9 Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one.  Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs
2003-03-08 22:06:20 +00:00
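Instead of panicking when no timewait structure can be allocated, the commit above reuses the oldest one. Since every entry gets the same timeout, the oldest is also the closest to expiring. A sketch with a hypothetical fixed-size ring standing in for the zone allocator:

```c
#include <assert.h>
#include <stddef.h>

#define TW_POOL_SIZE 4      /* hypothetical, tiny for illustration */

/* Hypothetical compact TIME_WAIT record. */
struct tw {
    int id;         /* stands in for the connection identity */
    int expire;     /* tick at which 2MSL elapses */
};

/* Entries kept in allocation order; head is the oldest. */
struct tw_pool {
    struct tw slots[TW_POOL_SIZE];
    size_t head;
    size_t count;
};

/*
 * Allocate a record.  If the pool is full, recycle the oldest entry
 * and report its id through *recycled_id (-1 if nothing was evicted).
 */
static struct tw *
tw_alloc(struct tw_pool *p, int id, int expire, int *recycled_id)
{
    struct tw *tw;

    *recycled_id = -1;
    if (p->count < TW_POOL_SIZE) {
        tw = &p->slots[(p->head + p->count) % TW_POOL_SIZE];
        p->count++;
    } else {
        /* Pool exhausted: evict the oldest; its slot becomes the newest. */
        tw = &p->slots[p->head];
        *recycled_id = tw->id;
        p->head = (p->head + 1) % TW_POOL_SIZE;
    }
    tw->id = id;
    tw->expire = expire;
    return (tw);
}
```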
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a full-blown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Alfred Perlstein
e88894d39a make the strings for tcptimers, tanames and prurequests const to silence
warnings.
2002-08-16 09:07:59 +00:00
Matthew Dillon
701bec5a38 Introduce two new sysctls:
net.inet.tcp.rexmit_min (default 3 ticks equiv)

    This sysctl is the retransmit timer RTO minimum,
    specified in milliseconds.  This value is
    designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

    This sysctl is the retransmit timer RTO slop
    which is added to every retransmit timeout and
    is designed to handle protocol stack overheads
    and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!).  Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debatable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.
2002-07-18 19:06:12 +00:00
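A worked comparison of the two RTO rules described in the commit above, all in milliseconds. The 1000 ms floor, 30 ms minimum (3 ticks at hz=100) and 200 ms slop come from the commit text; the 64 s upper clamp and the function names are assumptions for illustration.

```c
#include <assert.h>

#define RTO_MAX_MS 64000    /* assumed upper clamp */

/* Original rule: a 1 s floor and no slop. */
static int
rto_original(int rto_calc)
{
    int rto = rto_calc < 1000 ? 1000 : rto_calc;

    return (rto > RTO_MAX_MS ? RTO_MAX_MS : rto);
}

/* New rule: always add the slop, then apply the small minimum. */
static int
rto_with_slop(int rto_calc, int rexmit_min, int rexmit_slop)
{
    int rto = rto_calc + rexmit_slop;

    if (rto < rexmit_min)
        rto = rexmit_min;
    return (rto > RTO_MAX_MS ? RTO_MAX_MS : rto);
}
```

Under the original rule any computed RTO below one second became exactly 1000 ms, and anything above one second got no headroom; with slop, a 50 ms estimate yields 250 ms and a 1500 ms estimate yields 1700 ms.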
Matthew Dillon
22fd54d461 I don't know how the minimum retransmit timeout managed to get set to
one second but it badly breaks throughput on networks with minor packet
loss.

Complaints by: at least two people tracked down to this.
MFC after:	3 days
2002-07-17 23:32:03 +00:00
Alfred Perlstein
4d77a549fe Remove __P.
2002-03-19 21:25:46 +00:00
Peter Wemm
664a31e496 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistent with the other
BSDs, which made this change quite some time ago.  More commits to come.
1999-12-29 04:46:21 +00:00
Jonathan Lemon
c0a929b430 Change the delayed ack time from 200ms to 100ms.
This results in closer behavior to earlier versions, where the fixed
200ms timer actually resulted in a delay anywhere from 1..200ms, with
the average delay being 100ms.

Pointed out by:	 dg
1999-12-02 03:25:19 +00:00
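The reasoning in the commit above can be checked numerically: with a periodic 200 ms timer, an ACK that becomes pending at offset o into the period waits 200 - o ms, so over all whole-millisecond offsets the delay ranges from 1 to 200 ms and averages just over 100 ms. A small sketch with a hypothetical helper:

```c
#include <assert.h>

/* Mean wait imposed by a periodic timer, over each whole-ms offset. */
static double
mean_periodic_delay(int period_ms)
{
    long total = 0;

    for (int o = 0; o < period_ms; o++)
        total += period_ms - o;     /* wait until the next timer tick */
    return ((double)total / period_ms);
}
```

For a 200 ms period the mean is 100.5 ms, close to the fixed 100 ms one-shot delay the commit adopts.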
Jonathan Lemon
9987d77844 Remove conversion macros that were used during development.
1999-08-31 16:31:07 +00:00
Jonathan Lemon
9b8b58e033 Restructure TCP timeout handling:
  - eliminate the fast/slow timeout lists for TCP and instead use a
    callout entry for each timer.
  - increase the TCP timer granularity to HZ
  - implement "bad retransmit" recovery, as presented in
    "On Estimating End-to-End Network Path Properties", by Allman and Paxson.

Submitted by:	jlemon, wollmann
1999-08-30 21:17:07 +00:00
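A sketch of the "bad retransmit" idea listed above: on the first retransmit, save the congestion state and open a short detection window; an ACK arriving inside that window came too soon to be for the retransmitted segment, so the retransmit was spurious and the saved state is restored. The half-smoothed-RTT window is an assumption here, and the struct, fields and functions are hypothetical, not tcpcb members.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection state, times in ticks. */
struct conn {
    int srtt;                       /* smoothed RTT */
    int cwnd, ssthresh;
    int cwnd_prev, ssthresh_prev;   /* saved before the retransmit */
    int badrxt_deadline;            /* 0 when no window is open */
};

static void
on_first_retransmit(struct conn *c, int now, int mss)
{
    c->cwnd_prev = c->cwnd;
    c->ssthresh_prev = c->ssthresh;
    c->badrxt_deadline = now + c->srtt / 2;
    c->ssthresh = c->cwnd / 2;      /* simplified window reduction */
    c->cwnd = mss;
}

/* Returns true if this ACK shows the retransmit was unnecessary. */
static bool
on_ack(struct conn *c, int now)
{
    bool bad = c->badrxt_deadline != 0 && now < c->badrxt_deadline;

    if (bad) {
        c->cwnd = c->cwnd_prev;
        c->ssthresh = c->ssthresh_prev;
    }
    c->badrxt_deadline = 0;
    return (bad);
}
```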
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$
1999-08-28 01:08:13 +00:00
Bruce Evans
bea0f0be7b Some staticized variables were still declared to be extern.
1997-09-07 05:27:26 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Paul Traina
7b40aa327d Make the misnamed tcp initial keepalive timer value (which is really the
time, in seconds, that state for non-established TCP sessions stays around)
a sysctl-modifiable variable.

[part 1 of two commits, I just realized I can't play with the indices as
 I was typing this commit message.]
1996-09-13 23:51:44 +00:00
Garrett Wollman
51fb392203 Better selection of initial retransmit timeout when no cached
RTT information is available.

Submitted by: kbracey@art.acorn.co.uk (Kevin Bracey)
(slightly modified by me)
1996-06-14 17:17:32 +00:00
Mike Pritchard
6c5e9bbdf5 Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.
1996-01-30 23:02:38 +00:00
Poul-Henning Kamp
0312fbe97d New style sysctl & staticize a lot of stuff.
1995-11-14 20:34:56 +00:00
Garrett Wollman
2f96f1f446 Get rid of some unneeded #ifdef TTCP lines. Also, get rid of some
bogus commons declared in header files.
1995-02-14 02:35:19 +00:00
Garrett Wollman
eb6ad69646 Merge in T/TCP TCP header file changes.
1995-02-08 20:18:48 +00:00
Paul Richards
707f139edb Made idempotent.
Submitted by:	Paul
1994-08-21 05:27:42 +00:00
David Greenman
3c4dd3568f Added $Id$
1994-08-02 07:55:43 +00:00
Rodney W. Grimes
26f9a76710 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
Rodney W. Grimes
df8bae1de4 BSD 4.4 Lite Kernel Sources
1994-05-24 10:09:53 +00:00