3740 Commits

Author SHA1 Message Date
bz
d9875d4fd4 Add pcb reference counting to the pcblist sysctl handler functions
to ensure type stability while caching the pcb pointers for the
copyout.

Reviewed by:	rwatson
MFC after:	7 days
2010-03-17 18:28:27 +00:00
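
The idea in d9875d4fd4 above, holding a reference on each pcb while its pointer is cached for a later copy-out, can be illustrated with a small userland sketch. Everything here (`struct pcb`, `pcb_ref()`, `pcb_rele()`, `snapshot_take()`, `snapshot_copyout()`) is hypothetical and single-threaded; it is not the kernel's inpcb code and leaves out all locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a protocol control block. */
struct pcb {
	unsigned refcount;	/* number of holders keeping this object alive */
	int	 id;		/* payload that will be copied out later */
};

/* Allocate a pcb that starts with one reference, owned by the caller. */
struct pcb *
pcb_alloc(int id)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL) {
		p->refcount = 1;
		p->id = id;
	}
	return (p);
}

/* Take an additional reference. */
void
pcb_ref(struct pcb *p)
{
	p->refcount++;
}

/* Drop one reference; free the pcb and return 1 when it was the last. */
int
pcb_rele(struct pcb *p)
{
	assert(p->refcount > 0);
	if (--p->refcount > 0)
		return (0);
	free(p);
	return (1);
}

/* Cache the pointers for a later copy-out, referencing each one. */
void
snapshot_take(struct pcb **cache, struct pcb **list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		cache[i] = list[i];
		pcb_ref(cache[i]);
	}
}

/* Copy the cached entries out, then drop the snapshot's references. */
void
snapshot_copyout(struct pcb **cache, size_t n, int *out)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = cache[i]->id;
		(void)pcb_rele(cache[i]);
	}
}
```

Because the snapshot owns its own references, the original owner can release a pcb between `snapshot_take()` and `snapshot_copyout()` without the cached pointer becoming dangling.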
luigi
3ada53d651 small fixes to estimate the buffer size when requesting all pipes/flows. 2010-03-15 18:09:21 +00:00
luigi
3c242d0b3e + implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
  pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
  enabled with the command

        queue X config mask queue ...

  and makes it possible to support priority-based schedulers, where
  packets should be grouped according to the priority and not some
  fields in the 5-tuple.
  This is implemented as follows:
  - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
    without changing the size or shape of the structure, so there are
    no ABI changes. In passing, also document how other fields are
    used, and remove some useless assignments in ip_fw2.c

  - implement small changes in the userland code to set/read the field;

  - revise the functions in ip_dummynet.c to manipulate masks so they
    also handle the additional field;

There are no ABI changes in this commit.
2010-03-15 17:14:27 +00:00
rwatson
1fdd3bccc0 Abstract out initialization of most aspects of struct inpcbinfo from
their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and
create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy()
to do this work in a central spot.  As inpcbinfo becomes more complex
due to ongoing work to add connection groups, this will reduce code
duplication.

MFC after:      1 month
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-14 18:59:11 +00:00
rrs
5db64758fc The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR:		144529
MFC after:	2 weeks
2010-03-12 22:58:52 +00:00
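
The offset rule mentioned in 5db64758fc above (`ip_hl << 2` for IPv4, `sizeof(ip6_hdr)` for IPv6) can be sketched in userland C. The two helper names are hypothetical and are not the functions changed by the commit. The IPv6 case assumes no extension headers sit between the fixed header and the SCTP header.

```c
#define _DEFAULT_SOURCE		/* glibc: expose struct ip in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip6.h>

/*
 * Hypothetical helper: byte offset of the SCTP header in an IPv4 packet.
 * ip_hl counts 32-bit words, so shifting left by 2 converts it to bytes.
 */
size_t
sctp_hdr_offset_v4(const struct ip *ip)
{
	assert(ip->ip_hl >= 5);	/* a valid IPv4 header is at least 20 bytes */
	return ((size_t)ip->ip_hl << 2);
}

/*
 * Hypothetical helper: byte offset of the SCTP header in an IPv6 packet,
 * assuming the fixed header is followed directly by SCTP.
 */
size_t
sctp_hdr_offset_v6(void)
{
	return (sizeof(struct ip6_hdr));
}
```

A caller that already knows the header length would pass it directly; these helpers only show how a caller without it could derive the value.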
kmacy
128542c758 - restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
  (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after:	7 days
2010-03-12 05:03:26 +00:00
luigi
0d5da117aa implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transferred.
2010-03-11 22:42:33 +00:00
luigi
5bde959c5f fix handling of commands issued by the RELENG_7 version of /sbin/ipfw.
Submitted by:	Riccardo Panicucci
2010-03-10 14:21:05 +00:00
qingli
93013817b0 One of the advantages of enabling ECMP (a.k.a. RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
luigi
91eb56543a cosmetic changes and C++ compatibility 2010-03-08 11:27:39 +00:00
luigi
4cac8d2a86 don't use C++ keywords as variable names 2010-03-08 11:27:08 +00:00
luigi
d13cb4f803 do not report an error unnecessarily 2010-03-08 11:22:47 +00:00
bz
721ece0e76 Destroy TCP UMA zones (empty or not) upon network stack teardown
to not leak them, otherwise making UMA/vmstat unhappy with every stopped vnet.
We will still leak pages (especially for zones marked NOFREE).

Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC after:	5 days
2010-03-07 15:58:44 +00:00
bz
07f7a52d59 Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual network stack, but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00
rwatson
7502c4d558 Locking the tcbinfo structure should not be necessary in tcp_timer_delack(),
so don't.

MFC after:      1 week
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-07 14:23:44 +00:00
rwatson
14fa088a3b Add comment in tcp_discardcb() talking about how we don't, but should,
address TCP races relating to not calling tcp_drain() on stopped callouts.

Discussed with:	bz
2010-03-07 14:13:59 +00:00
rwatson
480b74ed20 Make udp_set_kernel_tunneling() less forgiving when its invariants are
violated: so_pcb can never be NULL for a valid UDP socket, and it is
always SOCK_DGRAM.  Use sotoinpcb() as the rest of the UDP code does.

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-07 10:47:47 +00:00
rwatson
c25f1494fd Remove unnecessary locking of divcbinfo lock from div_output(): this has not
been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was
guaranteed to be stable when a valid socket reference is held (as it is in
the output path).

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-06 22:04:45 +00:00
rwatson
7255ccc6fe Add a comment to tcp_usr_accept() to indicate why it is we acquire the
tcbinfo lock there: r175612, which re-added it, masked a race between
sonewconn(2) and accept(2) that could allow an incompletely initialized
address on a newly-created socket on a listen queue to be exposed.  Full
details can be found in that commit message.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-06 21:38:31 +00:00
bz
f82acabd2e Destroy UDP UMA zones (empty or not) upon network stack teardown
to not leak them making the VM subsystem unhappy with every stopped vnet(*).
We will still leak pages (especially as zones are marked NOFREE).

(*) This will also keep vmstat -z more usable.

Sponsored by:	ISPsystem
MFC after:	5 days
2010-03-06 21:24:32 +00:00
rwatson
72ccf68411 Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE()
to match other pcbinfo locking macros.

MFC after:	1 week
2010-03-06 21:24:11 +00:00
luigi
b84681e7ab plug a memory leak on pipe's reconfiguration 2010-03-05 17:53:28 +00:00
luigi
3aef100f01 fix a memory leak when deleting RED queues 2010-03-05 12:58:19 +00:00
luigi
8399f05e14 portability fixes 2010-03-04 21:52:40 +00:00
luigi
34f9fab9a3 don't use keywords as variable names. 2010-03-04 21:01:59 +00:00
luigi
70c24f778e use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by:	Francesco Magno
2010-03-04 16:53:38 +00:00
luigi
e983b27b49 improve compatibility with RELENG_7.2 2010-03-04 16:52:26 +00:00
luigi
5ceeac4aa8 Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
joel
bb682915c9 The NetBSD Foundation has granted permission to remove clauses 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-01 17:05:46 +00:00
bz
b8a1e8dec8 Upon virtual network stack teardown properly release the TCP syncache
resources.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 21:45:04 +00:00
tuexen
f9cc41e4ee Fix handling of SHUTDOWN-ACK chunk in COOKIE_WAIT and COOKIE_ECHOED.
MFC after: 1 week
2010-02-20 20:30:40 +00:00
bz
29381991cf Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().

Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 19:59:52 +00:00
tuexen
02181ec064 * Fix another u_long -> uint32_t issue.
* Remove an unused global variable.
* Fix an issue reported by Bruce Cran related to reusing SCTP sockets which
  were connected.

MFC after: 1 week
2010-02-19 18:00:38 +00:00
pjd
c527452336 No need to include security/mac/mac_framework.h here. 2010-02-18 22:26:01 +00:00
tuexen
93bada478f Use uint32_t instead of u_long.
MFC after: 1 week
2010-02-18 13:46:54 +00:00
luigi
c2328f70d5 remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone
2010-02-17 13:06:06 +00:00
tuexen
06fc12b77a Add missing SCTP_PACKED. Spotted by Irene Ruengeler.
MFC after: 1 week
2010-02-13 21:38:15 +00:00
bz
0cce20af31 Properly free resources when destroying the TCP hostcache while
tearing down a network stack (in the VIMAGE jail+vnet case).

For that, break out the logic from tcp_hc_purge() into an internal
function we can call from both the sysctl handler and
tcp_hc_destroy().

Sponsored by:	ISPsystem
Reviewed by:	silby, lstewart
MFC After:	8 days
2010-02-09 21:31:53 +00:00
tuexen
78aa3f59ba Restore the checksum received before processing the packet.
MFC after: 1 week
2010-02-04 21:02:29 +00:00
qingli
4d8ba24be3 Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by:	avg
MFC after:	5 days
2010-02-02 20:38:30 +00:00
luigi
d774a108f2 use u_char instead of u_int for short bitfields.
For our compiler the two constructs are completely equivalent, but
some compilers (including MSC and tcc) use the base type for alignment,
which in the cases touched here results in aligning the bitfields
to 32 bits instead of the 8 bits that are meant here.

Note that almost all other headers where small bitfields
are used have u_int8_t instead of u_int.

MFC after:	3 days
2010-02-01 14:13:44 +00:00
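
The alignment concern described in d774a108f2 above can be probed with a small sketch. The two structs are hypothetical and are not the ones touched by the commit. Bit-fields with an `unsigned char` base type are an implementation-defined extension in C99, which GCC and Clang accept. The actual offsets depend on the compiler and ABI, so no specific values are claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout using a 32-bit base type for two 4-bit fields. */
struct hdr_uint {
	unsigned char	c;
	unsigned int	a:4,
			b:4;
	unsigned short	s;
};

/* Same fields, but with an 8-bit base type for the bit-fields. */
struct hdr_uchar {
	unsigned char	c;
	unsigned char	a:4,
			b:4;
	unsigned short	s;
};

/*
 * Extra bytes of padding the wider base type puts in front of `s`.
 * This is typically 0 with compilers that pack bit-fields into the
 * bytes already available, and larger with compilers that start a
 * new storage unit of the base type.
 */
size_t
bitfield_extra_padding(void)
{
	assert(offsetof(struct hdr_uchar, s) <= offsetof(struct hdr_uint, s));
	return (offsetof(struct hdr_uint, s) - offsetof(struct hdr_uchar, s));
}
```

Printing `bitfield_extra_padding()` under different compilers is one way to see whether a given toolchain uses the base type for alignment, as the commit message says some do.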
tuexen
01ee00225c Use [] instead of [0] for flexible arrays.
Obtained from: Bruce Cran
MFC after: 1 week
2010-01-22 07:53:41 +00:00
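
As a brief illustration of the `[]` form adopted in 01ee00225c above: C99 defines `[]` on the last struct member as a flexible array member, whereas `[0]` is a compiler extension. The `struct chunk` and `chunk_alloc()` below are hypothetical and are not taken from the SCTP sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record, for illustration only. */
struct chunk {
	uint16_t length;	/* number of valid bytes in data[] */
	uint8_t	 data[];	/* C99 flexible array member; must be last */
};

/*
 * Allocate a chunk with room for `len` payload bytes and copy them in.
 * sizeof(struct chunk) does not include any storage for data[], so the
 * payload size is added explicitly.
 */
struct chunk *
chunk_alloc(const uint8_t *payload, uint16_t len)
{
	struct chunk *c;

	assert(payload != NULL || len == 0);
	c = malloc(sizeof(*c) + len);
	if (c == NULL)
		return (NULL);
	c->length = len;
	if (len > 0)
		memcpy(c->data, payload, len);
	return (c);
}
```

The caller releases the whole record, header and payload together, with a single `free()`.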
tuexen
5aaf03563a Get rid of a lot of duplicated code for NR-SACK handling.
Generalize the SACK code to also handle NR-SACKs.
2010-01-17 21:00:28 +00:00
rrs
e0b03cdcce Bug fix: If the allocation of a socket failed and we
freed the inpcb, it was possible to not set the
proper flags on the pcb (i.e. the socket is not there).
This is HIGHLY unlikely since no one else should be
able to find the socket.. but for consistency we
do the proper loop thing to make sure that we
mark the socket as gone on the PCB.
2010-01-17 19:47:59 +00:00
rrs
735b231916 Pulls out another leaked windows ifdef that somehow
made its way through the scrubber.
2010-01-17 19:40:21 +00:00
rrs
c85a2af4da This change syncs up the socketAPI stream-reset
values to match those in linux and the I-D
just released to the IETF.
2010-01-17 19:35:38 +00:00
rrs
09211b9ce2 More leaked ifdefs for APPLE and its mobility stuff. 2010-01-17 19:24:30 +00:00
rrs
3a0bea0af0 Remove another set of "leaked" ifdefs that somehow found
their way into FreeBSD.
2010-01-17 19:21:50 +00:00
rrs
317a5adf4b Remove strange APPLE define that leaked
through the scrubber scripts. Scripts are
now fixed so this won't happen again.
2010-01-17 19:17:16 +00:00
bz
5d1c4cb181 Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with:	rwatson
MFC after:	5 days
2010-01-17 13:07:52 +00:00