181 Commits

Author SHA1 Message Date
tuexen
f674536274 Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial
used by TCP.

Reviewed by:		rrs@, 0mp@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19355
2019-03-23 21:36:59 +00:00
tuexen
796133921a Use exponential backoff for retransmitting SYN segments as specified
in the TCP RFCs.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18974
2019-02-20 17:56:38 +00:00
mmacy
14de8a2820 epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
  there's no longer any benefit to dropping the pcbinfo lock
  and trying to do so just adds an error prone branchfest to
  these functions
- Remove cases of same function recursion on the epoch as
  recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
  thread as the tracker field is now stack or heap allocated
  as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
2018-07-04 02:47:16 +00:00
rrs
e4ec942fc5 This commit brings in a new refactored TCP stack called Rack.
Rack includes the following features:
 - A different SACK processing scheme (the old sack structures are not used).
 - RACK (Recent acknowledgment) where counting dup-acks is no longer done
        instead time is used to knwo when to retransmit. (see the I-D)
 - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt
        to try not to take a retransmit time-out. (see the I-D)
 - Burst mitigation using TCPHTPS
 - PRR (partial rate reduction) see the RFC.

Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.

Note that any connection that does not support SACK will be kicked
back to the "default" base  FreeBSD stack (currently known as "default").

To build this into your kernel you will need to enable in your
kernel:
   makeoptions WITH_EXTRA_TCP_STACKS=1
   options TCPHPTS

Sponsored by:	Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D15525
2018-06-07 18:18:13 +00:00
mmacy
fe8ba29698 Fix spurious retransmit recovery on low latency networks
TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value.

If an ACK arrives before the calculated badrxtwin (now + SRTT):
tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1));

We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops.

Reported by:	rstone
Reviewed by:	hiren, transport
Approved by:	sbruno
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8556
2018-05-08 02:22:34 +00:00
tuexen
08084bf9c5 SImplify the call to tcp_drop(), since the handling of soft error
is also done in tcp_drop(). No functional change.

Sponsored by:	Netflix, Inc.
2018-05-02 20:04:31 +00:00
jtl
a93bdf6963 Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.

The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.

It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.

You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.

This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.

There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.

Reviewed by:	gnn (previous version)
Obtained from:	Netflix, Inc.
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11085
2018-03-22 09:40:08 +00:00
jhb
3df465f54f Export tcp_always_keepalive for use by the Chelsio TOM module.
This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Reviewed by:	bz, emaste
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14129
2018-01-30 23:01:37 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
glebius
d48dcca311 Declare pmtud_blackhole global variables in tcp_timer.h, so that
alternative TCP stacks can legally use them.
2017-10-06 20:33:40 +00:00
tuexen
2621be48c9 Fix blackhole detection.
There were two bugs related to the blackhole detection:
* The smalles size was tried more than two times.
* The restored MSS was not the original one, but the second
  candidate.

MFC after:	1 week
Sponsored by:	Netflix, Inc.
2017-08-28 11:41:18 +00:00
sbruno
d0208bfad0 Use counter(9) for PLPMTUD counters.
Remove unused PLPMTUD sysctl counters.

Bump UPDATING and FreeBSD Version to indicate a rebuild is required.

Submitted by:	kevin.bowling@kev009.com
Reviewed by:	jtl
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12003
2017-08-25 19:41:38 +00:00
glebius
3a5c9aaf2b Hide struct inpcb, struct tcpcb from the userland.
This is a painful change, but it is needed.  On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD.  We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.

Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
  kernel structures inpcb and tcpcb inside.  Export into these structures
  the fields from inpcb and tcpcb that are known to be used, and put there
  a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.

Reviewed by:	rrs, gnn
Differential Revision:	D10018
2017-03-21 06:39:49 +00:00
imp
7e6cabd06e Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
rstone
cb359893cf Don't zero out srtt after excess retransmits
If the TCP stack has retransmitted more than 1/4 of the total
number of retransmits before a connection drop, it decides that
its current RTT estimate is hopelessly out of date and decides
to recalculate it from scratch starting with the next ACK.

Unfortunately, it implements this by zeroing out the current RTT
estimate.  Drop this hack entirely, as it makes it significantly more
difficult to debug connection issues.  Instead check for excessive
retransmits at the point where srtt is updated from an ACK being
received.  If we've exceeded 1/4 of the maximum retransmits,
discard the previous srtt estimate and replace it with the latest
rtt measurement.

Differential Revision:	https://reviews.freebsd.org/D9519
Reviewed by:	gnn
Sponsored by:	Dell EMC Isilon
2017-02-11 17:05:08 +00:00
jtl
0ba4ad5d78 The code currently resets the keepalive timer each time a packet is
received on a TCP session that has entered the ESTABLISHED state. This
results in a lot of calls to reset the keepalive timer.

This patch changes the behavior so we set the keepalive timer for the
keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will
first check to see if the session has been idle for TP_KEEPIDLE ticks. If
not, it will reschedule the keepalive timer for the time the session will
have been idle for TP_KEEPIDLE ticks.

For a session with regular communication, the keepalive timer should fire
approximately once every TP_KEEPIDLE ticks. For sessions with irregular
communication, the keepalive timer might fire more often. But, the
disruption from a periodic keepalive timer should be less than the regular
cost of resetting the keepalive timer on every packet.

(FWIW, this change saved approximately 1.73% of the busy CPU cycles on a
particular test system with a heavy TCP output load. Of course, the
actual impact is very specific to the particular hardware and workload.)

Reviewed by:	gallatin, rrs
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D8243
2016-10-14 14:57:43 +00:00
rrs
c9af4f2541 A few more wording tweaks as suggested (with some modifications
as well) by Ravi Pokala. Thanks for the comments :-)
Sponsored by: Netflix Inc.
2016-08-16 15:17:36 +00:00
rrs
30758bfe03 Comments describing how to properly use the new lock_add functions
and its respective companion.

Sponsored by:	Netflix Inc.
2016-08-16 13:08:03 +00:00
rrs
fae35efb7f This cleans up the timer code in TCP and also makes it so we do not
take the INFO lock *unless* we are really going to delete the TCB.

Differential Revision:	D7136
2016-08-16 12:40:56 +00:00
rrs
afc75dd440 This small change adopts the excellent suggestion for using named
structures in the add of a new tcp-stack that came in late to me
via email after the last commit. It also makes it so that a new
stack may optionally get a callback during a retransmit
timeout. This allows the new stack to clear specific state (think
sack scoreboards or other such structures).

Sponsored by:	Netflix Inc.
Differential Revision:	http://reviews.freebsd.org/D6303
2016-05-17 09:53:22 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
rrs
02c65aa0a0 This cleans up the timers code in TCP to start using the new
async_drain functionality. This as been tested in NF as well as
by Verisign. Still to do in here is to remove all the old flags. They
are currently left being maintained but probably are no longer needed.

Sponsored by:	Netflix Inc.
Differential Revision:	http://reviews.freebsd.org/D5924
2016-04-28 13:27:12 +00:00
gnn
c3d5404bbe FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
glebius
0769763b2b Rename netinet/tcp_cc.h to netinet/cc/cc.h.
Discussed with:	lstewart
2016-01-27 17:59:39 +00:00
hiren
c782b7ca11 Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.

Submitted by:		Jason Wolfe (j at nitrology dot com)
Reviewed by:		gnn, rrs, hiren, bz
MFC after:		1 week
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D5024
2016-01-26 16:33:38 +00:00
glebius
a1e3038e68 - Rename cc.h to more meaningful tcp_cc.h.
- Declare it a kernel only include, which it already is.
- Don't include tcp.h implicitly from tcp_cc.h
2016-01-21 22:34:51 +00:00
glebius
7e3646578b Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
  options length. The functions gives a better estimate, since it takes
  into account SACK state as well.

Reviewed by:	jtl
Differential Revision:	https://reviews.freebsd.org/D3593
2016-01-07 00:14:42 +00:00
pkelsey
e66e064c45 Implementation of server-side TCP Fast Open (TFO) [RFC7413].
TFO is disabled by default in the kernel build.  See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.

Reviewed by:	gnn, jch, stas
MFC after:	3 days
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D4350
2015-12-24 19:09:48 +00:00
rrs
50f477e182 First cut of the modularization of our TCP stack. Still
to do is to clean up the timer handling using the async-drain.
Other optimizations may be coming to go with this. Whats here
will allow differnet tcp implementations (one included).
Reviewed by:	jtl, hiren, transports
Sponsored by:	Netflix Inc.
Differential Revision:	D4055
2015-12-16 00:56:45 +00:00
rrs
dc494194a2 This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
hiren
c9534bc93d Fix an unnecessarily aggressive behavior where mtu clamping begins on first
retransmission timeout (rto) when blackhole detection is enabled.  Make
sure it only happens when the second attempt to send the same segment also fails
with rto.

Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows
the same pattern and gets 2 chances (rto) before further clamping down.

Note: RFC4821 doesn't specify implementation details on how this situation
should be handled.

Differential Revision:	https://reviews.freebsd.org/D3434
Reviewed by:	sbruno, gnn (previous version)
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2015-10-14 06:57:28 +00:00
gnn
e39dbc6166 dd DTrace probe points, translators and a corresponding script
to provide the TCPDEBUG functionality with pure DTrace.

Reviewed by:	rwatson
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	D3530
2015-09-13 15:50:55 +00:00
jch
c4ab4117ed Put r284245 back in place: If at first this fix was seen as a temporary
workaround for a callout(9) issue, it turns out it is instead the right
way to use callout in mpsafe mode without using callout_drain().

r284245 commit message:

Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision:	https://reviews.freebsd.org/D2763
2015-08-30 13:44:39 +00:00
jch
9661bffa8e Revert r284245: "Fix a callout race condition introduced in TCP
timers callouts with r281599."

r281599 fixed a TCP timer race condition, but due a callout(9) bug
it also introduced another race condition workaround-ed with r284245.
The callout(9) bug being fixed with r286880, we can now revert the
workaround (r284245).

Differential Revision:	https://reviews.freebsd.org/D2079 (Initial change)
Differential Revision:	https://reviews.freebsd.org/D2763 (Workaround)
Differential Revision:	https://reviews.freebsd.org/D3078 (Fix)
Sponsored by:		Verisign, Inc.
MFC after:		2 weeks
2015-08-24 09:30:27 +00:00
jch
6c45383db9 Make clear that TIME_WAIT timeout expiration is managed solely by
tcp_tw_2msl_scan().

Sponsored by:	Verisign, Inc.
2015-08-18 08:27:26 +00:00
jch
67927a7a7c Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
jch
582bfe6ed0 Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision:	https://reviews.freebsd.org/D2763
Reviewed by:		hiren
Approved by:		hiren
MFC after:		1 day
Sponsored by:		Verisign, Inc.
2015-06-10 20:43:07 +00:00
jch
b227cb3d85 Fix an old and well-documented use-after-free race condition in
TCP timers:
 - Add a reference from tcpcb to its inpcb
 - Defer tcpcb deletion until TCP timers have finished

Differential Revision:	https://reviews.freebsd.org/D2079
Submitted by:		jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by:		imp, rrs, adrian, jhb, bz
Approved by:		jhb
Sponsored by:		Verisign, Inc.
2015-04-16 10:00:06 +00:00
jch
990bf230ee Provide better debugging information in tcp_timer_activate() and
tcp_timer_active()

Differential Revision:	https://reviews.freebsd.org/D2179
Suggested by:		bz
Reviewed by:		jhb
Approved by:		jhb
2015-04-02 14:43:07 +00:00
jch
e6365e0ee8 Use appropriate timeout_t* instead of void* in tcp_timer_activate()
Suggested by:		imp
Differential Revision:	https://reviews.freebsd.org/D2154
Reviewed by:		imp, jhb
Approved by:		jhb
2015-03-31 10:17:13 +00:00
adrian
f1c9d3f332 Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
  configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
  that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision:	D1383
Reviewed by:	gnn, jfv (intel driver bits)
2015-01-18 18:06:40 +00:00
jch
5630210a7f Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
tcp_tw_2msl_scan().  This race condition drives unplanned timewait
timeout cancellation.  Also simplify implementation by holding inpcb
reference and removing tcptw reference counting.

Differential Revision:	https://reviews.freebsd.org/D826
Submitted by:		Marc De la Gueronniere <mdelagueronniere@verisign.com>
Submitted by:		jch
Reviewed By:		jhb (mentor), adrian, rwatson
Sponsored by:		Verisign, Inc.
MFC after:		2 weeks
X-MFC-With:		r264321
2014-10-30 08:53:56 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
sbruno
aa538f658e Handle small file case with regards to plpmtud blackhole detection.
Submitted by:	Mikhail <mp@lenta.ru>
MFC after:	2 weeks
Relnotes:	yes
2014-10-13 21:06:21 +00:00
sbruno
79bb3d3455 Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources.  If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us.  Attempt to
downshift the mss once to a preconfigured value.

Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.

Adds the following new sysctl's for control:
net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature
net.inet.tcp.pmtud_blackhole_mss       -- mss to try for ipv4
net.inet.tcp.v6pmtud_blackhole_mss     -- mss to try for ipv6

Adds the following new sysctl's for monitoring:
-- Number of times the code was activated to attempt a mss downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole mss was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the mss
net.inet.tcp.pmtud_blackhole_failed

Phabricator:	https://reviews.freebsd.org/D506
Reviewed by:	rpaulo bz
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Limelight Networks
2014-10-07 21:50:28 +00:00
adrian
14ce8d2190 If we're doing RSS then ensure the TCP timer selection uses the multi-CPU
callwheel setup, rather than just dumping all the timers on swi0.
2014-06-30 04:26:29 +00:00
adrian
7bdec225ef When RSS is enabled and per cpu TCP timers are enabled, do an RSS
lookup for the inp flowid/flowtype to destination CPU.

This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled.  Otherwise the behaviour should be the same
as before.
2014-05-18 22:39:01 +00:00
jhb
7d16e10f89 Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.

To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT.  Only acquire the TCP pcbinfo lock when
closing an expired connection.  This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.

Submitted by:	Julien Charbon <jcharbon@verisign.com>
Reviewed by:	rwatson, rrs, adrian
MFC after:	2 months
2014-04-10 18:15:35 +00:00
davide
431035cf16 - Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with:	mav
Reviewed by:	attilio, bde, luigi, phk
Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo (amd64, sparc64), marius (sparc64), ian (arm),
		markj (amd64), mav, Fabian Keil
2013-03-04 11:09:56 +00:00
jhb
20f8f82daf Don't drop options from the third retransmitted SYN by default. If the
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not.  This can result in
very slow data transfers in the case of window scaling disagreements.

The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.

Reviewed by:	net@
MFC after:	2 weeks
2013-01-09 20:27:06 +00:00