3951 Commits

Author SHA1 Message Date
bz
62b524634f Use correct field to track statistics counting error as bad header length.
This assimilates the code to what ip_input has been doing since r1.1 in
this case.

Submitted by:	Rozhuk Ivan (rozhuk.im gmail.com)
MFC after:	4 days
2010-12-05 01:09:48 +00:00
tuexen
bd095c3b8e Fix a bug where also the number of non-renegable gap reports
was considered to be potentially renegable.

MFC after: 1 day.
2010-12-04 19:29:49 +00:00
lstewart
8300743dab Import a clean-room implementation of the experimental H-TCP congestion control
algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.

H-TCP was designed to provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "H-TCP: A framework for congestion control in high-speed and long-distance
networks" provides additional detail.

In collaboration with:	David Hayes <dahayes at swin edu au> and
			Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	rpaulo (older patch from a few weeks ago)
MFC after:	3 months
2010-12-02 06:40:21 +00:00
lstewart
6e097be88a Import a clean-room implementation of the experimental CUBIC congestion control
algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.

CUBIC was designed for provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional
detail.

In collaboration with:	David Hayes <dahayes at swin edu au> and
			Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	rpaulo (older patch from a few weeks ago)
MFC after:	3 months
2010-12-02 06:05:44 +00:00
lstewart
6d18efb46a General cleanup of the NewReno CC module (no functional changes):
- Remove superfluous includes and unhelpful comments.

- Alphabetically order functions.

- Make functions static.

Sponsored by:	FreeBSD Foundation
MFC after:	9 weeks
X-MFC with:	r215166
2010-12-02 02:32:46 +00:00
lstewart
f6f8b1d549 - Reinstantiate the after_idle hook call in tcp_output(), which got lost
somewhere along the way due to mismerging r211464 in our development tree.

- Capture the essence of r211464 in NewReno's after_idle() hook. We don't
  use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited.

Sponsored by:	FreeBSD Foundation
Submitted by:	David Hayes <dahayes at swin edu au>
MFC after:	9 weeks
X-MFC with:	r215166
2010-12-02 01:36:00 +00:00
lstewart
ae5fd63399 Set ssthresh appropriately on RTO. This change was accidentally not ported from
the pre modular CC stack.

Sponsored by:	FreeBSD Foundation
Submitted by:	David Hayes <dahayes at swin edu au>
MFC after:	9 weeks
X-MFC with:	r215166
2010-12-02 01:01:37 +00:00
lstewart
88a0e7e6dd Pass NULL instead of 0 for the th pointer value. NULL != 0 on all platforms.
Submitted by:	David Hayes <dahayes at swin edu au>
MFC after:	9 weeks
X-MFC with:	r215166
2010-12-02 00:47:55 +00:00
glebius
bdd7d886f9 Use time_uptime instead of non-monotonic time_second to drive ARP
timeouts.

Suggested by:	bde
2010-11-30 15:57:00 +00:00
brucec
cd6001f0b6 Fix more continuous/contiguous typos (cf. r215955) 2010-11-27 21:51:39 +00:00
rrs
5f2060fdd7 Adds new dtrace for cwnd functions and lay's
groundwork for future dtrace points (rwnd flightsize etc).

MFC after:	2 months
2010-11-25 13:39:55 +00:00
glebius
2a04fe02fa Redo r166423. It is important not only skip freeing multicast
entires when underlying interface is detached, but also purge
pointers to them, to avoid double-free in future.
2010-11-24 05:24:36 +00:00
dim
fb307d7d1d After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
zec
cc776ae8bd Remove an apparently redundant CURVNET_SET() / CURVNET_RESTORE() pair.
MFC after:	3 days
2010-11-22 14:16:23 +00:00
lstewart
dc840543ff Fix a minor code redundancy nit.
MFC after:	3 days
2010-11-20 08:40:37 +00:00
lstewart
1f06c5de0d When enabling or disabling SIFTR with a VIMAGE kernel, ensure we add or remove
the SIFTR pfil(9) hook functions to or from all network stacks. This patch
allows packets inbound or outbound from a vnet to be "seen" by SIFTR.

Additional work is required to allow SIFTR to actually generate log messages for
all vnet related packets because the siftr_findinpcb() function does not yet
search for inpcbs across all vnets. This issue will be fixed separately.

Reported and tested by:	David Hayes <dahayes at swin edu au>
MFC after:	3 days
2010-11-20 07:36:43 +00:00
gnn
67b3b6b168 Add new, per connection, statistics for TCP, including:
Retransmitted Packets
Zero Window Advertisements
Out of Order Receives

These statistics are available via the -T argument to
netstat(1).
MFC after:	2 weeks
2010-11-17 18:55:12 +00:00
tuexen
53e7c3f2f5 Add an SCTP socket option to retrieve the number of timeouts
of an association.

MFC after: 3 days.
2010-11-16 22:16:38 +00:00
lstewart
d54d1011fb Make the CC framework more VIMAGE friendly by adding the machinery to allow
vnets to select their own default CC algorithm independent of each other and the
base system. If the base system or a vnet has set a default which gets unloaded,
we reset that netstack's default to NewReno.

Sponsored by:	FreeBSD Foundation
Tested by:	Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by:	bz (briefly)
MFC after:	3 months
2010-11-16 09:34:31 +00:00
lstewart
c63666dcd4 - Querying the default CC algo is more common than setting it and the function
is small, so there is no good reason not to declare the buffer at the top.

- Fix a whitespace nit.

Sponsored by:	FreeBSD Foundation
MFC after:	11 weeks
X-MFC with:	r215166
2010-11-16 08:43:25 +00:00
lstewart
4cb0a4f8c6 Move protocol specific implementation detail out of the core CC framework.
Sponsored by:	FreeBSD Foundation
Tested by:	Mikolaj Golub <to.my.trociny at gmail com>
MFC after:	11 weeks
X-MFC with:	r215166
2010-11-16 08:30:39 +00:00
lstewart
b114c25bf8 On CC algorithm module unload, we walk the list of active TCP control blocks.
Any found to be using the algorithm that is about to go away are switched back
to NewReno to avoid leaving dangling pointers which would trigger a panic. For
VIMAGE kernels, there is a list per vnet to walk, yet the implementation was
only examining one of the vnet lists.

Fix the implementation of the above feature for VIMAGE kernels by looping
through all active TCP control blocks across all vnets.

Sponsored by:	FreeBSD Foundation
Tested by:	Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by:	bz (briefly)
MFC after:	11 weeks
2010-11-16 07:57:56 +00:00
lstewart
c28db57abb cc_init() should only be run once on system boot, but with VIMAGE kernels it
runs on boot and each time a vnet jail is created. Running cc_init() multiple
times results in a panic when attempting to initialise the cc_list lock again,
and so r215166 effectively broke the use of vnet jails.

Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded
on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in
this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is
initialised before module registration is attempted.

Sponsored by:	FreeBSD Foundation
Reported and tested by:	Mikolaj Golub <to.my.trociny at gmail com>
MFC after:	11 weeks
X-MFC with:	r215166
2010-11-16 07:09:05 +00:00
dim
fda4020a88 Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
tuexen
bd09c40347 Take out special code for disable CRC computations on
the loopback interface for IPv6. It will be handled
by the loopback interface.
2010-11-14 16:44:18 +00:00
tuexen
46d16d30f2 Simplify sctp_delayed_cksum() a bit.
MFC after: 3 days.
2010-11-14 14:37:20 +00:00
tuexen
c0d6d04d71 Fix a locking issue reported by brucec@ affecting
1-to-1 style sockets which have not yet been
accepted.

MFC after: 3 days.
2010-11-13 12:52:44 +00:00
gnn
c3225b5eaa Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
tuexen
48df378ef0 Don't print an empty line when printing mapping arrays.
MFC after: 3 days.
2010-11-12 20:46:33 +00:00
tuexen
29f904bd05 Fix more issues with the SACK/NR-SACK generation code.
MFC after: 3 days.
2010-11-12 20:45:21 +00:00
luigi
e7ccc85b8f The first customer of the SO_USER_COOKIE option:
the "sockarg" ipfw option matches packets associated to
a local socket and with a non-zero so_user_cookie value.
The value is made available as tablearg, so it can be used
as a skipto target or pipe number in ipfw/dummynet rules.

Code by Paul Joe, manpage by me.

Submitted by:	Paul Joe
MFC after:	1 week
2010-11-12 13:05:17 +00:00
lstewart
df9f23bf3f This commit marks the first formal contribution of the "Five New TCP Congestion
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/

- Add a KPI and supporting infrastructure to allow modular congestion control
  algorithms to be used in the net stack. Algorithms can maintain per-connection
  state if required, and connections maintain their own algorithm pointer, which
  allows different connections to concurrently use different algorithms. The
  TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
  programmatically query or change the congestion control algorithm respectively
  from within an application at runtime.

- Integrate the framework with the TCP stack in as least intrusive a manner as
  possible. Care was also taken to develop the framework in a way that should
  allow integration with other congestion aware transport protocols (e.g. SCTP)
  in the future. The hope is that we will one day be able to share a single set
  of congestion control algorithm modules between all congestion aware transport
  protocols.

- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
  and use it to decouple the meaning of recovery from a congestion event and
  recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
  congestion control protocols don't generally need to recover from packet loss
  and need a different way to note a congestion recovery episode within the
  stack.

- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
  and ensures the stack always uses the appropriate mechanisms for recovering
  from packet loss during a congestion recovery episode.

- Extract the NewReno congestion control algorithm from the TCP stack and
  massage it into module form. NewReno is always built into the kernel and will
  remain the default algorithm for the forseeable future. Implementations of
  additional different algorithms will become available in the near future.

- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
  that relies on the size of "struct tcpcb" is required.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

In collaboration with:	David Hayes <dahayes at swin edu au> and
			Grenville Armitage <garmitage at swin edu au>
Sponsored by:	Cisco URP, FreeBSD Foundation
Reviewed by:	rpaulo
Tested by:	David Hayes (and many others over the years)
MFC after:	3 months
2010-11-12 06:41:55 +00:00
lstewart
100d9dcd2f Standardise all Swinburne related copyright/licence statements throughout the
tree in preparation for another large code import. Swinburne University is the
legal entity that owns copyright and the 2-clause BSD licence is acceptable.
2010-11-12 00:44:18 +00:00
lstewart
2b9cb75d46 The university does not require that its CRICOS number be included in source
code. Remove all references from the tree.

MFC after:	3 days
2010-11-12 00:19:42 +00:00
tuexen
9596952a63 Fix the SACK/NR-SACK generation code.
MFC after: 3 days.
2010-11-11 18:41:03 +00:00
rrs
8bcde2d437 Fix so that a multicast packet can be sent
even if there is no route out to that mcast address. The code in
in_pcb inadvertantly would error (no route) even though
the user may have specified the address with the
proper socket option (to specify the egress interface).
Thanks bz for reminding me I forgot to commit this ;-)

Reviewed by:	bz
MFC after:	1 week
2010-11-11 05:40:39 +00:00
tuexen
daddd7a01b Improve the scalability by using the local and remote port when
putting inps in the tcpephash.

MFC after: 3 days.
2010-11-09 16:18:32 +00:00
tuexen
66c05b025a Fix a bug which resulted in kevent() reporting an event twice on
1-to-1 style sockets when an ABORT was received.

MFC after: 3 days.
2010-11-09 12:00:39 +00:00
brucec
696c4e1f9b Fix typos.
PR:	bin/148894
Submitted by:	olgeni
2010-11-09 10:59:09 +00:00
tuexen
557f7dc61c Do not have the MTU table twice in the code. Therefore move the
function from the timer code to util, rename it appropriately and
also fix a bug in sctp_get_prev_mtu(), where calling it with a
value existing in the MTU table did not return a smaller one.

MFC after: 3 days.
2010-11-07 18:50:35 +00:00
tuexen
6b30c1f2be Remove two functions which are not used.
MFC after: 3 days.
2010-11-07 17:50:56 +00:00
tuexen
275229d14f * Use exponential backoff for retransmission of SHUTDOWN and
SHUTDOWN-ACK chunks.
* While there, do some cleanups.

MFC after: 3 days.
2010-11-07 17:44:04 +00:00
tuexen
128945a395 Not only stop all timers when entering the SHUTDOWN_SENT state,
but also when entering the SHUTDOWN_ACK_SEND state.

MFC after: 3 days.
2010-11-07 14:39:40 +00:00
tuexen
eab6797fe5 Do not resend DATA chunks without delay when dropped by the peer and
the CRC was correct.

MFC after: 3 days.
2010-11-06 13:43:18 +00:00
tuexen
12a24d255f * Fix an accounting bug regarding SACK/NR-SACK chunks.
* Fix the generation of the SACK/NR-SACK gap lists.

MFC after: 3 days.
2010-11-06 13:30:54 +00:00
n_hibma
06824871e7 Don't spam the console with loaded modules during boot and/or during
startup of ppp.

Note: This cannot be hidden behind bootverbose as this file is included
from lib/libalias as well.
2010-11-03 21:10:12 +00:00
jhb
07732251f2 Don't leak the LLE lock if the arptimer callout is pending or inactive.
Reported by:	David Rhodus
MFC after:	1 month
2010-11-02 13:00:56 +00:00
glebius
192d172d5b Remove meaningless XXXXX, that is a remain of comment, removed in r186200. 2010-10-29 11:13:42 +00:00
glebius
20ab72ce86 Revert a small part of the r198301, that is entirely unrelated to the
r198301 itself. It also broke the logic of not sending more than one
ARP request per second, that consequently lead to a potential problem
of flooding network with broadcast packets.

MFC after:	1 week
2010-10-29 10:57:18 +00:00
bz
520543ca43 Add initial inet DDB support for show in_ifaddr and show sin commands which
proved to be useful while debugging address list problems.

MFC after:	6 days
2010-10-24 22:02:36 +00:00