Commit Graph

3928 Commits

Author SHA1 Message Date
rrs
f0f6266342 This fixes a bug with the one-2-one model socket when a
user sets up a socket to a server, sends data, and closes
the socket before the server has called accept(). It used
to NOT work at all. Now we add a flag to the assoc and
defer assoc cleanup so that the accept will succeed.
2010-05-11 17:02:29 +00:00
bz
0a90ef1728 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
bz
c7fd54ae5a Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by:	(ten 211.ru)
Tested by:	(ten 211.ru)
MFC after:	4 days
2010-04-27 15:07:08 +00:00
bms
6def960c90 Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.

Submitted by:	Matthew Luckie
MFC after:	3 days
2010-04-27 14:14:21 +00:00
tuexen
8156e27dd7 Sending a FWDTSN chunk should not affect the retran count.
MFC after: 3 days.
2010-04-25 19:00:37 +00:00
tuexen
92b6c67524 Undo my latest fix since that wasn't one at all.
MFC after: 3 days.
2010-04-25 15:04:57 +00:00
tuexen
312805d71c * Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code
  which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the
  "number of retransmissions" policy.

MFC after: 3 days.
2010-04-23 08:19:47 +00:00
bz
b883f7a391 Avoid memory access after free. Use the (shortened) copy for the
ipsec mtu lookup as well.

PR:		kern/145736
Submitted by:	Peter Molnar (peter molnar.cc)
MFC after:	3 days
2010-04-21 10:21:34 +00:00
tuexen
df535bd79d Update highest_tsn variables when sliding mapping arrays. 2010-04-20 08:51:21 +00:00
tuexen
be51b44753 Really print the nr_mapping array when it should be printed.
MFC after: 3 days.
2010-04-20 08:50:19 +00:00
luigi
6758ecb23d whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)
2010-04-19 16:17:30 +00:00
ken
fc7b7bb0cb Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was
causing TSO to break for the Xen netfront driver.

Reviewed by:	gibbs, rwatson
MFC after:	7 days
2010-04-19 15:15:36 +00:00
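The flag handling described above can be modelled with a small sketch. This is a Python illustration, not the kernel code: the flag values are made up for the example (the real constants live in sys/mbuf.h), and the "buggy" variant is only one plausible way such a bug can arise.

```python
# Illustrative flag values only; NOT the real constants from sys/mbuf.h.
CSUM_IP = 0x0001
CSUM_TCP = 0x0002
CSUM_TSO = 0x0020


def request_tso_buggy(csum_flags: int) -> int:
    """Plain assignment: every flag that was already set is lost."""
    return CSUM_TSO


def request_tso_fixed(csum_flags: int) -> int:
    """Bitwise OR: CSUM_TSO is added and the existing flags survive."""
    return csum_flags | CSUM_TSO


flags = CSUM_IP | CSUM_TCP
print(hex(request_tso_buggy(flags)), hex(request_tso_fixed(flags)))
```

With the OR form, a driver that also relies on CSUM_TCP still sees it after TSO has been requested.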
tuexen
ea377e0111 Get delayed SACK working again.
MFC after: 3 days.
2010-04-19 14:15:58 +00:00
tuexen
66c04c10a0 Fix a bug where SACKs are not sent when they should.
Move some protection code to INVARIANTS.
Cleanups.

MFC after: 3 days.
2010-04-17 12:22:44 +00:00
bz
d7a91dc6bf Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
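The reference discipline described in this commit (a pending callout owns one reference, and whoever cancels or completes the callout gives that reference back) can be modelled roughly as follows. This is a Python sketch under stated assumptions: the class and method names are hypothetical and do not correspond to the llentry API.

```python
class Entry:
    """Hypothetical stand-in for a link-layer entry with a timer."""

    def __init__(self) -> None:
        self.refs = 1            # reference held by the owning table
        self.timer_armed = False

    def arm_timer(self) -> None:
        # Take the reference for the new timer first ...
        self.refs += 1
        # ... and if re-arming cancelled a pending timer, drop its reference.
        if self.timer_armed:
            self.refs -= 1
        self.timer_armed = True

    def cancel_timer(self) -> None:
        # Cancelling a pending timer must also release its reference.
        if self.timer_armed:
            self.timer_armed = False
            self.refs -= 1

    def timer_fired(self) -> None:
        # The timer handler releases the reference the timer was holding.
        self.timer_armed = False
        self.refs -= 1

    def release(self) -> bool:
        """Drop the table's reference; True means the entry can be freed."""
        self.refs -= 1
        return self.refs == 0


e = Entry()
e.arm_timer()
e.arm_timer()      # re-arm: the count must not grow a second time
e.cancel_timer()
print(e.release())
```

If any of the three paths forgets its decrement, `release()` never reaches zero, which is the kind of leak the commit message describes.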
bz
1f5c413779 Try to help with a virtualized dummynet after r206428.
This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.
2010-04-10 22:11:01 +00:00
rpaulo
d89608b359 Honor the CE bit even when the CWR bit is set.
PR:		145600
Submitted by:	Richard Scheffenegger <rs at netapp.com>
MFC after:	1 week
2010-04-10 12:47:06 +00:00
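One reading of this one-line fix, sketched in Python: a receiver keeps a "must echo congestion" state, a segment with CWR clears it, and a CE mark on the same packet must still set it again. The constant values match the usual ECN definitions (CE is 0x03 in the low two TOS bits, CWR is TCP flag 0x80); the function and its state variable are hypothetical and are not taken from the FreeBSD TCP code.

```python
IPTOS_ECN_MASK = 0x03   # low two bits of the IP TOS / traffic class byte
IPTOS_ECN_CE = 0x03     # "congestion experienced" codepoint
TH_CWR = 0x80           # TCP "congestion window reduced" flag


def update_ece_pending(ece_pending: bool, ip_tos: int, tcp_flags: int) -> bool:
    """Return the new 'echo congestion to the sender' state for one packet."""
    # CWR acknowledges an earlier congestion echo, so the old state is cleared.
    if tcp_flags & TH_CWR:
        ece_pending = False
    # CE on this very packet is new congestion and is honored regardless of CWR.
    if (ip_tos & IPTOS_ECN_MASK) == IPTOS_ECN_CE:
        ece_pending = True
    return ece_pending


print(update_ece_pending(True, IPTOS_ECN_CE, TH_CWR))
```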
bms
a59efb5aef Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.

In inp_join_group():
 If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
 MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
 legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
 not violate POLA for apps using the old API.
 It also stops us falling through to kicking IGMP state transactions
 in what is otherwise a no-op case.
 [This has already been dealt with in HEAD, but make it explicit before
  we MFC the change to 8.]

In inp_leave_group():
 Fix a bogus conditional.
 Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
 where it actually belongs.
 If an interface was specified, by primary IPv4 address, for ioctl
 IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
 then and only then should we look up the ifp from the IPv4 address in
 mreqs.imr_interface.
 If not, we fall through to imo_match_group() as before, but only in
 the IP_DROP_MEMBERSHIP case.

With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.

Found by:	ermal (with pfSense)
MFC after:	3 days
2010-04-10 12:05:31 +00:00
luigi
ed181b3acb This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and i still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables), however i need
first to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.
2010-04-09 18:02:19 +00:00
luigi
0881f9be0f no need to pass an argument to dn_compat_calc_size()
MFC after:	3 days
2010-04-09 16:06:53 +00:00
luigi
e00fa2c8d4 Hopefully fix the recent breakage in rule deletion.
A few  more tests and this will also go into -stable where
the problem is more critical.
2010-04-07 08:23:58 +00:00
tuexen
be2bd893e0 Fix an off-by-one bug in zeroing out the mapping arrays.
Fix sctp_print_mapping_array().

MFC after: 1 week
2010-04-06 18:57:50 +00:00
tuexen
a8e5a68f92 Use also SCTP/IPv6 checksum offloading in special cases.
MFC after: 2 weeks
2010-04-03 23:51:41 +00:00
tuexen
238a37de82 * Fix some race condition in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
  FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
2010-04-03 15:40:14 +00:00
delphij
69ea0c9b4e Add definition of IPv6 mobility header's protocol number, as assigned by
IANA and defined in RFC 3775.

Obtained from:	KAME
2010-03-31 23:02:25 +00:00
luigi
f0058daed2 fix bug in previous commit related to rule deletion
(stable/8 just fixed moments ago)
2010-03-31 02:20:22 +00:00
luigi
8e0cabacd0 remove a leftover debugging message 2010-03-29 12:27:49 +00:00
luigi
564e0558f0 Fix handling of set manipulations.
This patch has two fixes for potential kernel panics (one wrong
index, one access to the wrong lock) and two fixes to wrong logic
in a conditional. The potential panics are also on stable/8,
so I am going to MFC the fix quickly.
2010-03-29 12:19:23 +00:00
rrs
e4906bb78b Adds the option of keeping per-cpu statistics in SCTP. This
may be useful since it gets rid of atomics but I want it to
remain an option until I can do further testing on if it really
speeds things up.
2010-03-24 20:02:40 +00:00
rrs
96102fe418 lagging file I forgot to commit with my nr-sack fixes... oops
Reviewed by:	tuexen@freebsd.org
2010-03-24 20:01:14 +00:00
rrs
4938adaeeb Fix for NR-Sack code. The code was NOT working properly when
enabled. Basically most of the operations were incorrect causing
bad sacks when you enabled nr-sack. The fixes range across
4 files and unify most of the processing so that we only test
nr_sack flags to decide which type of sack to generate.

Optimization left for this is to combine the sack generation
code and make it capable of generating either sack thus shrinking
out a routine.

Reviewed by:	tuexen@freebsd.org
2010-03-24 19:45:36 +00:00
luigi
9cd70e5323 Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when i did the mtag cleanup three months ago.

PR:		145004
2010-03-24 15:16:59 +00:00
rrs
a4998a854d Fixes a bug where SACKs in the face of
mapping_array expansion would break. Basically
once we expanded the array we no longer had both
mapping arrays in sync which the sack processing code depends on.
This would mean we were randomly referring to memory that was probably
not there. This mostly just gave us bad sack results going back to the peer.
If INVARIANTS was on of course we would hit the panic routine in the sack_check
call.

We also add a print routine for the place where one would panic in
INVARIANTS so one can see what the main mapping array holds.

Reviewed by: tuexen@freebsd.org
MFC after:	2 weeks
2010-03-23 01:36:50 +00:00
kmacy
01cb21605b - boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
  when near the flow limit
- stop allocating new flows when within 3% of maxflows; don't start
  allocating again until below 12.5%

MFC after:	7 days
2010-03-22 23:04:12 +00:00
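The stop/resume rule in the last bullet is a hysteresis. A rough Python model follows, assuming both percentages are headroom below the flow limit and approximating 3% as 1/32 of the limit; the class name and the exact thresholds are assumptions made for the sketch, not values quoted from the flowtable code.

```python
class FlowLimiter:
    """Hypothetical two-threshold admission check for new flows."""

    def __init__(self, maxflows: int) -> None:
        self.maxflows = maxflows
        # Stop allocating once usage is within ~3% (1/32) of the limit.
        self.stop_at = maxflows - maxflows // 32
        # Resume only after usage has dropped 12.5% (1/8) below the limit.
        self.resume_at = maxflows - maxflows // 8
        self.full = False

    def may_allocate(self, count: int) -> bool:
        """Update the 'full' state for the current flow count and report it."""
        if self.full and count < self.resume_at:
            self.full = False
        elif not self.full and count > self.stop_at:
            self.full = True
        return not self.full


limiter = FlowLimiter(1000)
print([limiter.may_allocate(n) for n in (900, 980, 900, 800)])
```

The gap between the two thresholds keeps the table from flapping between "full" and "not full" when the count hovers near the limit.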
luigi
5bd32ef7a5 Add a priority-based packet scheduler.
Sponsored by:	The ONELAB2 Project
Submitted by:	Riccardo Panicucci
2010-03-21 16:30:32 +00:00
luigi
2122ae15e7 no need for ipfw_flush_tables(), we just need ipfw_destroy_tables() 2010-03-21 15:54:07 +00:00
luigi
8cf7b4ad59 revise documentation 2010-03-21 15:52:55 +00:00
kmacy
7ef5a84218 - spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
- don't default to acquiring tcbinfo lock exclusively in rexmt

MFC after:	7 days
2010-03-20 19:47:30 +00:00
bz
d9875d4fd4 Add pcb reference counting to the pcblist sysctl handler functions
to ensure type stability while caching the pcb pointers for the
copyout.

Reviewed by:	rwatson
MFC after:	7 days
2010-03-17 18:28:27 +00:00
luigi
3ada53d651 small fixes to estimate the buffer size when requesting all pipes/flows. 2010-03-15 18:09:21 +00:00
luigi
3c242d0b3e + implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
  pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
  enabled with the command

        queue X config mask queue ...

  and makes it possible to support priority-based schedulers, where
  packets should be grouped according to the priority and not some
  fields in the 5-tuple.
  This is implemented as follows:
  - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
    without changing the size or shape of the structure, so there are
    no ABI changes. In passing, also document how other fields are
    used, and remove some useless assignments in ip_fw2.c

  - implement small changes in the userland code to set/read the field;

  - revise the functions in ip_dummynet.c to manipulate masks so they
    also handle the additional field;

There are no ABI changes in this commit.
2010-03-15 17:14:27 +00:00
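The idea of masking a flow id so that packets are grouped by queue number rather than by 5-tuple can be sketched in Python. The field names below are hypothetical and do not mirror struct ipfw_flow_id; the sketch only shows how a mask selects which fields take part in grouping.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowId(NamedTuple):
    """Hypothetical flow id: a 5-tuple plus an optional queue number."""
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int
    proto: int
    queue: int


def apply_mask(fid: FlowId, mask: FlowId) -> FlowId:
    """AND every field with its mask; fields masked with 0 are ignored."""
    return FlowId(*(f & m for f, m in zip(fid, mask)))


def group_packets(packets: list, mask: FlowId) -> dict:
    """Bucket packets by their masked flow id."""
    groups = defaultdict(list)
    for fid in packets:
        groups[apply_mask(fid, mask)].append(fid)
    return dict(groups)


# Mask that keeps only the queue number, as a priority scheduler would want.
queue_only = FlowId(0, 0, 0, 0, 0, 0xFF)
pkts = [FlowId(1, 2, 10, 20, 6, 3), FlowId(9, 8, 30, 40, 17, 3)]
print(len(group_packets(pkts, queue_only)))
```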
rwatson
1fdd3bccc0 Abstract out initialization of most aspects of struct inpcbinfo from
their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and
create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy()
to do this work in a central spot.  As inpcbinfo becomes more complex
due to ongoing work to add connection groups, this will reduce code
duplication.

MFC after:      1 month
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-14 18:59:11 +00:00
rrs
5db64758fc The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR:		144529
MFC after:	2 weeks
2010-03-12 22:58:52 +00:00
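The two offsets mentioned in the message (ip_hl << 2 for IPv4, sizeof(ip6_hdr) for IPv6) can be illustrated with a short Python sketch that works on raw header bytes. The function name is hypothetical, and the IPv6 branch assumes no extension headers sit between the base header and SCTP.

```python
IPV6_HEADER_LEN = 40  # size of the fixed IPv6 base header in bytes


def sctp_header_offset(packet: bytes) -> int:
    """Return the byte offset of the SCTP header inside an IP packet."""
    version = packet[0] >> 4
    if version == 4:
        # The IPv4 header length field counts 32-bit words: ip_hl << 2.
        return (packet[0] & 0x0F) << 2
    if version == 6:
        # Assumption: no IPv6 extension headers precede the SCTP header.
        return IPV6_HEADER_LEN
    raise ValueError(f"unsupported IP version {version}")


print(sctp_header_offset(bytes([0x45]) + bytes(19)))
```

Passing the offset in, rather than having the checksum routine assume an IPv4 header, is what lets a single delayed-checksum function serve both address families.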
kmacy
128542c758 - restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
  (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after:	7 days
2010-03-12 05:03:26 +00:00
luigi
0d5da117aa implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transferred.
2010-03-11 22:42:33 +00:00
luigi
5bde959c5f fix handling of commands issued by the RELENG_7 version of /sbin/ipfw.
Submitted by:	Riccardo Panicucci
2010-03-10 14:21:05 +00:00
qingli
93013817b0 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
luigi
91eb56543a cosmetic changes and C++ compatibility 2010-03-08 11:27:39 +00:00
luigi
4cac8d2a86 don't use C++ keywords as variable names 2010-03-08 11:27:08 +00:00
luigi
d13cb4f803 do not report an error unnecessarily 2010-03-08 11:22:47 +00:00
bz
721ece0e76 Destroy TCP UMA zones (empty or not) upon network stack teardown
to not leak them, otherwise making UMA/vmstat unhappy with every stopped vnet.
We will still leak pages (especially for zones marked NOFREE).

Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC after:	5 days
2010-03-07 15:58:44 +00:00
bz
07f7a52d59 Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual network stack, but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00
rwatson
7502c4d558 Locking the tcbinfo structure should not be necessary in tcp_timer_delack(),
so don't.

MFC after:      1 week
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-07 14:23:44 +00:00
rwatson
14fa088a3b Add comment in tcp_discardcb() talking about how we don't, but should,
address TCP races relating to not calling tcp_drain() on stopped callouts.

Discussed with:	bz
2010-03-07 14:13:59 +00:00
rwatson
480b74ed20 Make udp_set_kernel_tunneling() less forgiving when its invariants are
violated: so_pcb can never be NULL for a valid UDP socket, and it is
always SOCK_DGRAM.  Use sotoinpcb() as the rest of the UDP code does.

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-07 10:47:47 +00:00
rwatson
c25f1494fd Remove unnecessary locking of divcbinfo lock from div_output(): this has not
been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was
guaranteed to be stable when a valid socket reference is held (as it is in
the output path).

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-06 22:04:45 +00:00
rwatson
7255ccc6fe Add a comment to tcp_usr_accept() to indicate why it is we acquire the
tcbinfo lock there: r175612, which re-added it, masked a race between
sonewconn(2) and accept(2) that could allow an incompletely initialized
address on a newly-created socket on a listen queue to be exposed.  Full
details can be found in that commit message.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-06 21:38:31 +00:00
bz
f82acabd2e Destroy UDP UMA zones (empty or not) upon network stack teardown
to not leak them making the VM subsystem unhappy with every stopped vnet(*).
We will still leak pages (especially as zones are marked NOFREE).

(*) This will also keep vmstat -z more usable.

Sponsored by:	ISPsystem
MFC after:	5 days
2010-03-06 21:24:32 +00:00
rwatson
72ccf68411 Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE()
to match other pcbinfo locking macros.

MFC after:	1 week
2010-03-06 21:24:11 +00:00
luigi
b84681e7ab plug a memory leak on pipe's reconfiguration 2010-03-05 17:53:28 +00:00
luigi
3aef100f01 fix a memory leak when deleting RED queues 2010-03-05 12:58:19 +00:00
luigi
8399f05e14 portability fixes 2010-03-04 21:52:40 +00:00
luigi
34f9fab9a3 don't use keywords as variable names. 2010-03-04 21:01:59 +00:00
luigi
70c24f778e use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by:	Francesco Magno
2010-03-04 16:53:38 +00:00
luigi
e983b27b49 improve compatibility with RELENG_7.2 2010-03-04 16:52:26 +00:00
luigi
5ceeac4aa8 Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
joel
bb682915c9 The NetBSD Foundation has granted permission to remove clauses 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-01 17:05:46 +00:00
bz
b8a1e8dec8 Upon virtual network stack teardown properly release the TCP syncache
resources.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 21:45:04 +00:00
tuexen
f9cc41e4ee Fix handling of SHUTDOWN-ACK chunk in COOKIE_WAIT and COOKIE_ECHOED.
MFC after: 1 week
2010-02-20 20:30:40 +00:00
bz
29381991cf Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().

Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 19:59:52 +00:00
tuexen
02181ec064 * Fix another u_long -> uint32_t issue.
* Remove an unused global variable.
* Fix an issue reported by Bruce Cran related to reusing SCTP sockets which
  were connected.

MFC after: 1 week
2010-02-19 18:00:38 +00:00
pjd
c527452336 No need to include security/mac/mac_framework.h here. 2010-02-18 22:26:01 +00:00
tuexen
93bada478f Use uint32_t instead of u_long.
MFC after: 1 week
2010-02-18 13:46:54 +00:00
luigi
c2328f70d5 remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone
2010-02-17 13:06:06 +00:00
tuexen
06fc12b77a Add missing SCTP_PACKED. Spotted by Irene Ruengeler.
MFC after: 1 week
2010-02-13 21:38:15 +00:00
bz
0cce20af31 Properly free resources when destroying the TCP hostcache while
tearing down a network stack (in the VIMAGE jail+vnet case).

For that break out the logic from tcp_hc_purge() into an internal
function we can call from both, the sysctl handler and the
tcp_hc_destroy().

Sponsored by:	ISPsystem
Reviewed by:	silby, lstewart
MFC After:	8 days
2010-02-09 21:31:53 +00:00
tuexen
78aa3f59ba Restore the checksum received before processing the packet.
MFC after: 1 week
2010-02-04 21:02:29 +00:00
qingli
4d8ba24be3 Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by:	avg
MFC after:	5 days
2010-02-02 20:38:30 +00:00
luigi
d774a108f2 use u_char instead of u_int for short bitfields.
For our compiler the two constructs are completely equivalent, but
some compilers (including MSC and tcc) use the base type for alignment,
which in the cases touched here results in aligning the bitfields
to 32 bit instead of the 8 bit that is meant here.

Note that almost all other headers where small bitfields
are used have u_int8_t instead of u_int.

MFC after:	3 days
2010-02-01 14:13:44 +00:00
tuexen
01ee00225c Use [] instead of [0] for flexible arrays.
Obtained from: Bruce Cran
MFC after: 1 week
2010-01-22 07:53:41 +00:00
tuexen
5aaf03563a Get rid of a lot of duplicated code for NR-SACK handling.
Generalize the SACK code to also handle NR-SACKs.
2010-01-17 21:00:28 +00:00
rrs
e0b03cdcce Bug fix: If the allocation of a socket failed and we
freed the inpcb, it was possible to not set the
proper flags on the pcb (i.e. the socket is not there).
This is HIGHLY unlikely since no one else should be
able to find the socket.. but for consistency we
do the proper loop thing to make sure that we
mark the socket as gone on the PCB.
2010-01-17 19:47:59 +00:00
rrs
735b231916 Pulls out another leaked windows ifdef that somehow
made its way through the scrubber.
2010-01-17 19:40:21 +00:00
rrs
c85a2af4da This change syncs up the socketAPI stream-reset
values to match those in linux and the I-D
just released to the IETF.
2010-01-17 19:35:38 +00:00
rrs
09211b9ce2 More leaked ifdefs for APPLE and its mobility stuff. 2010-01-17 19:24:30 +00:00
rrs
3a0bea0af0 Remove another set of "leaked" ifdefs that somehow found
their way into FreeBSD.
2010-01-17 19:21:50 +00:00
rrs
317a5adf4b Remove strange APPLE define that leaked
through the scrubber scripts. Scripts are
now fixed so this won't happen again.
2010-01-17 19:17:16 +00:00
bz
5d1c4cb181 Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with:	rwatson
MFC after:	5 days
2010-01-17 13:07:52 +00:00
bz
d80ba03e3c Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails who do not want to change firewall rules,
application ACLs, ... but want to force their connections (unless
otherwise changed) to the primary jail IP they had been using for
years, as well as for people preferring to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by:	jamie, hrs (ipv6 part)
Pointed out by:	hrs [1]
MFC After:	2 weeks
Asked for by:	Jase Thew (bazerka beardz.net)
2010-01-17 12:57:11 +00:00
ume
185bf1f1d5 Change 'me' to match any IPv6 address configured on an interface in
the system as well as any IPv4 address.

Reviewed by:	David Horn <dhorn2000__at__gmail.com>, luigi, qingli
MFC after:	2 weeks
2010-01-17 08:39:48 +00:00
tuexen
c0a018dc4a Get rid of support of an old version of the SCTP-AUTH draft.
Get rid of unused MD5 code.

MFC after: 1 week
2010-01-16 20:04:17 +00:00
qingli
316634c7ad Ensure an address is removed from the interface address
list when the installation of that address fails.

PR:		139559
2010-01-08 17:49:24 +00:00
ru
ce510bcb3f Complete the swap of carp(4) log levels and document the change.
MFC after:	3 days
2010-01-08 16:14:41 +00:00
mbr
7450f52a57 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
luigi
51e5ccee24 we don't use dummynet_drain! 2010-01-07 13:53:47 +00:00
luigi
057d16827d check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets which i broke when i
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO
2010-01-07 12:00:54 +00:00
luigi
db333db4e6 Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so i don't feel like
doing a gratuitous renaming, at least for the time being.
2010-01-07 10:39:15 +00:00
luigi
6ea737556e some header shuffling to help decoupling ip_divert from ipfw 2010-01-07 10:08:05 +00:00
luigi
6a3745e3ec put ip_len in correct order for ip_output().
This prevents a panic when ipfw generates packets on its own
(such as reject or keepalives for dynamic rules).

Reported by: Chagin Dmitry
2010-01-07 09:28:17 +00:00
luigi
543315e6a4 this file does not require ip_dummynet.h 2010-01-05 11:00:31 +00:00
qingli
281d5caa0e An existing incomplete ARP entry would expire a subsequent
statically configured entry of the same host. This bug occurred
because the expiration timer was not cancelled when installing
the static entry. Since there exists a potential race condition
with respect to timer cancellation, simply check for the
LLE_STATIC bit inside the expiration function instead of
cancelling the active timer.

MFC after:	5 days
2010-01-05 00:35:46 +00:00
luigi
40024ff7c3 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

In passing I also removed a few typos, staticised or localised variables,
removed useless declarations and did other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
tuexen
67e62f9811 Correct usage of parentheses.
PR:	kern/142066
Approved by: rrs (mentor)
Obtained from: Henning Petersen, Bruce Cran.
MFC after: 2 weeks
2010-01-04 18:25:38 +00:00
np
10cde58f33 Avoid NULL dereference in arpresolve. 2010-01-03 06:43:13 +00:00
qingli
0897bcc8ad Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.

MFC after:	5 days
2009-12-30 22:13:01 +00:00
qingli
ed965a92bc The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00
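The loopback-route behaviour described at the end of this message (first instantiation creates the entry, later ones only bump a count, last removal deletes it) is plain reference counting. A minimal Python model follows; the class and method names are hypothetical and stand in for the kernel routing code.

```python
class LoopbackRoutes:
    """Hypothetical table of reference-counted loopback routes."""

    def __init__(self) -> None:
        self._refs = {}

    def add(self, addr: str) -> bool:
        """Register one user of addr; True if the route was just created."""
        created = addr not in self._refs
        self._refs[addr] = self._refs.get(addr, 0) + 1
        return created

    def remove(self, addr: str) -> bool:
        """Drop one user of addr; True if the route was deleted."""
        count = self._refs[addr] - 1
        if count == 0:
            del self._refs[addr]
            return True
        self._refs[addr] = count
        return False


routes = LoopbackRoutes()
# Two PPP links sharing the same local endpoint address.
print(routes.add("10.0.0.1"), routes.add("10.0.0.1"))
print(routes.remove("10.0.0.1"), routes.remove("10.0.0.1"))
```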
syrinx
3c572e438b Make sure the multicast forwarding cache entry's stall queue is properly
initialized before trying to insert an entry into it.

PR:		kern/142052
Reviewed by:	bms
MFC after:	now
2009-12-30 08:52:13 +00:00
luigi
7236f425fc we really need htonl() here, see the comment a few lines above in the code. 2009-12-29 00:02:57 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
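
A minimal sketch of the corrected usage, assuming the <sys/queue.h> list
macros. The names entry, entry_list, entries and entry_push_and_count are
hypothetical and do not come from the commit.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; not taken from the FreeBSD tree. */
struct entry {
	int value;
	LIST_ENTRY(entry) link;
};

/* Declares struct entry_list, the head type. */
LIST_HEAD(entry_list, entry);

/*
 * The initializer is handed the head variable itself, not the element
 * type or the link field.  The macro does not use its argument, which is
 * why the wrong usages still produced the same binaries.
 */
static struct entry_list entries = LIST_HEAD_INITIALIZER(entries);

/* Push one element and report how many are now on the list. */
static int
entry_push_and_count(struct entry *e)
{
	struct entry *it;
	int n = 0;

	LIST_INSERT_HEAD(&entries, e, link);
	LIST_FOREACH(it, &entries, link)
		n++;
	return (n);
}
```

The same rule applies to SLIST_HEAD_INITIALIZER with an SLIST_HEAD.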

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
bz
7eddc3a63a Make the compiler happy after r201125:
- + remove two unnecessary initializations in ip_output;
+ + remove one unnecessary initializations in ip_output;
2009-12-28 21:14:18 +00:00
luigi
1a1b4d40fb introduce a local variable rte acting as a cache of ro->ro_rt
within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
2009-12-28 14:48:32 +00:00
luigi
9c18067568 + remove an unused #define print_ip;
+ remove two unnecessary initializations in ip_output;
+ localize 'len';
+ introduce a temporary variable n to count the number of fragments,
  the compiler seems unable to identify a common subexpression
  (written 3 times, used twice);
+ document some assumptions on ip_len and ip_hl
2009-12-28 14:09:46 +00:00
luigi
b41c473d90 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
luigi
483862a5a2 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)
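
A small sketch of why byte order matters for such keys, assuming a 32-bit
key that is compared byte by byte starting from the first byte. The helper
names make_lookup_key and key_compare are hypothetical; only htonl() is
taken from the text above.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a lookup key from a host-order value.  htonl() stores the most
 * significant byte first, so the first byte of the key is also the most
 * significant one for prefix matching.
 */
static uint32_t
make_lookup_key(uint32_t host_value)
{
	return (htonl(host_value));
}

/*
 * Compare two keys byte by byte.  With big-endian keys the result agrees
 * with the numeric order of the original host-order values; with
 * little-endian keys it would not.
 */
static int
key_compare(uint32_t key_a, uint32_t key_b)
{
	return (memcmp(&key_a, &key_b, sizeof(key_a)));
}
```

For example, 255 and 256 compare in the expected order only after both
have gone through make_lookup_key().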

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
luigi
ffe8fa8dad readability fixes -- add braces on large blocks, remove unnecessary
initializations
2009-12-28 10:19:53 +00:00
luigi
5596409e34 explain details of operation of table lookups, and improve portability 2009-12-28 10:12:35 +00:00
luigi
19c9e43f09 diverted packet must re-enter _after_ the matching rule,
or we create loops.
The divert cookie (that can be set from userland too)
contains the matching rule nr, so we must start from nr+1.

Reported by: Joe Marcus Clarke
2009-12-27 10:19:10 +00:00
luigi
62c83b51a2 fix poor indentation resulting from a merge 2009-12-24 17:35:28 +00:00
luigi
4c57fc7f52 mostly style changes, such as removal of trailing whitespace,
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.

As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.
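
The count-then-allocate pattern can be sketched as follows. This is a
generic illustration, not the code from the commit; copy_nonnegative and
its filter condition are made up for the example. A common way for a
realloc() to leak is assigning its result straight back to the only
pointer to the old buffer, which is lost when realloc() returns NULL.

```c
#include <stdlib.h>

/*
 * Copy the non-negative entries of src into a freshly allocated array.
 * Pass 1 counts, pass 2 fills, so there is exactly one allocation and no
 * realloc() failure path to get wrong.
 * Returns NULL with *out_count set to 0 if nothing matched or if
 * malloc() failed.
 */
static int *
copy_nonnegative(const int *src, size_t n, size_t *out_count)
{
	size_t i, needed = 0, used = 0;
	int *dst;

	*out_count = 0;
	for (i = 0; i < n; i++)
		if (src[i] >= 0)
			needed++;
	if (needed == 0)
		return (NULL);

	dst = malloc(needed * sizeof(*dst));
	if (dst == NULL)
		return (NULL);

	for (i = 0; i < n; i++)
		if (src[i] >= 0)
			dst[used++] = src[i];
	*out_count = used;
	return (dst);
}
```

The caller owns the returned array and releases it with free().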
2009-12-23 18:53:11 +00:00
luigi
d90c98559e fix build with the new fast lookup structure.
Also remove some unnecessary headers
2009-12-23 12:15:21 +00:00
luigi
be2e837cde fix build on 64-bit architectures.
Also fix the indentation on a few lines.
2009-12-23 12:00:50 +00:00
luigi
2043aec456 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

  1. introduce an IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do an O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI
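
Items 2 and 4 above can be sketched as a binary search over a sorted
array plus a cheap validity check on a remembered slot. The struct
layouts and function names below are hypothetical, and the idea that
chain_id changes whenever the map is replaced is an assumption made for
the sketch, not something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule and chain layouts, for illustration only. */
struct rule {
	uint32_t rulenum;
	uint32_t rule_id;
};

struct chain {
	struct rule **map;	/* sorted by (rulenum, rule_id) */
	size_t n_rules;
	uint32_t chain_id;	/* assumed to change when map is replaced */
};

/* Order by rule number first, then by rule id. */
static int
rule_cmp(const struct rule *r, uint32_t rulenum, uint32_t rule_id)
{
	if (r->rulenum != rulenum)
		return (r->rulenum < rulenum ? -1 : 1);
	if (r->rule_id != rule_id)
		return (r->rule_id < rule_id ? -1 : 1);
	return (0);
}

/* O(log N): index of the first rule >= (rulenum, rule_id). */
static size_t
find_slot(const struct chain *ch, uint32_t rulenum, uint32_t rule_id)
{
	size_t lo = 0, hi = ch->n_rules;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (rule_cmp(ch->map[mid], rulenum, rule_id) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/*
 * Resolve a <slot, chain_id, rulenum, rule_id> tuple.  If the chain_id
 * still matches, the remembered slot is used directly in O(1);
 * otherwise fall back to the O(log N) search.
 */
static size_t
resolve_slot(const struct chain *ch, size_t slot, uint32_t chain_id,
    uint32_t rulenum, uint32_t rule_id)
{
	if (chain_id == ch->chain_id && slot < ch->n_rules)
		return (slot);
	return (find_slot(ch, rulenum, rule_id));
}
```

Passing a tuple instead of a pointer means a stale reference degrades to
a slower lookup rather than a dereference of freed memory.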

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
jhb
beb0e14aae - Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove
the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info
  structure.

Reviewed by:	rwatson
MFC after:	2 weeks
2009-12-22 15:47:40 +00:00
luigi
b1be6dab1b some mostly cosmetic changes in preparation for upcoming work:
+ in many places, replace &V_layer3_chain with a local
  variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
  replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures
2009-12-22 13:53:34 +00:00
ru
2b342a7429 Added proper attribution.
Requested by:	luigi
2009-12-18 17:22:21 +00:00
luigi
0ef00561d5 Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.

	ipfw add 5000 count log ....

and run
	tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.
2009-12-17 23:11:16 +00:00
luigi
308d52e697 simplify and document lookup_next_rule() 2009-12-17 17:27:12 +00:00
luigi
268b58e51f simplify the code that finds the next rule after reinjections
MFC after:	1 week
2009-12-17 12:27:54 +00:00
luigi
2544f1c542 remove a duplicate sysctl entry 2009-12-16 18:03:35 +00:00
luigi
0b8651c0f6 bring back a couple of #include that are supplied by nesting,
and explain why they are used.
2009-12-16 13:00:37 +00:00
luigi
3805c8f0d8 Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
  static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
  (the same should be done for variables);
- try to pack variable declarations in a uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
  slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after:	1 month
2009-12-16 10:48:40 +00:00
imp
56caa05e14 Quick fix to make this compile:
Remove redundant extern declarations.
If the maintainer has a better fix, then feel free to back this out.
2009-12-16 03:26:37 +00:00
luigi
0b02c03e54 more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).

Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.

MFC after:	1 month
2009-12-15 21:24:12 +00:00
luigi
c4e6c7a490 Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
	now references the two new source files

netinet/ip_fw.h
	remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
	new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
	ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
	code related to dynamic rules

netinet/ipfw/ip_fw2.c
	removed the pieces that go in the new files

netinet/ipfw/ip_fw_nat.c
	minor rearrangement to remove LOOKUP_NAT from the
	main headers. This requires a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure I caught all of them.

MFC after:	1 month
2009-12-15 16:15:14 +00:00
luigi
84d17b9dde implement a new match option,
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N

which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.

Work supported by the Onelab project.

MFC after:	1 week
2009-12-15 09:46:27 +00:00
bz
932cbdbe4d Throughout the network stack we have a few places of
if (jailed(cred))
left.  If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that,  as it is used
outside the network stack as well.

Discussed with:	julian, zec, jamie, rwatson (back in Sept)
MFC after:	5 days
2009-12-13 13:57:32 +00:00
luigi
c9c12aa332 use div64 when converting back the burst value for userland 2009-12-10 18:37:14 +00:00
luigi
be7a7cc5a2 when draining a flowset free the entire chain, not just one packet. 2009-12-10 18:34:07 +00:00
luigi
1ac82bf52a centralize the code to free a packet (or a chain) while in dummynet.
Remove an old macro and its stale comment.
2009-12-10 15:17:34 +00:00
oleg
9bb92ae41a Fix burst processing for WF2Q pipes - do not increase available burst size
unless pipe is idle. This should fix the following issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.

MFC after:	1 week
2009-12-05 23:27:21 +00:00
luigi
4e7d20f2c3 adjust comment in previous commit after Julian's explanation 2009-12-05 11:51:32 +00:00
luigi
3840fefd31 remove a dead block of code, document how the ipfw clients are
hooked and the difference in handling the 'enable' variable
for layer2 and layer3. The latter needs fixing once I figure out
how it worked pre-vnet.

MFC after:	7 days
2009-12-05 09:13:06 +00:00
luigi
ecd6138bd5 fix build with VNET enabled
Reported by: David Wolfskill
2009-12-05 08:32:12 +00:00
ume
193fb81f20 Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard
coded number.

Spotted by:	bz
2009-12-04 15:39:37 +00:00
luigi
688d739fd8 preparation work to replace the monster switch in ipfw_chk() with
table of functions.

This commit (which is heavily based on work done by Marta Carbone
in this year's GSOC project), removes the goto's and explicit
return from the inner switch(), so we will have an easier time when
putting the blocks into individual functions.

MFC after:	3 weeks
2009-12-03 14:22:15 +00:00
ume
81ccd8ba96 Teach the debug prints about IPv6. 2009-12-03 11:16:53 +00:00
luigi
5ae7151443 - initialize src_ip in the main loop to prevent a compiler warning
(gcc 4.x under linux, not sure how real the complaint is).
- rename a macro argument to prevent name clashes.
- add the macro name on a couple of #endif
- add a blank line for readability.

MFC after:	3 days
2009-12-02 17:50:52 +00:00
luigi
d09026c289 Dispatch sockopt calls to ipfw and dummynet
using the new option numbers, IP_FW3 and IP_DUMMYNET3.
Right now the modules return an error if called with those arguments
so there is no danger of unwanted behaviour.

MFC after:	3 days
2009-12-02 15:50:43 +00:00
luigi
c1f03ab1df small changes for portability and diff reduction wrt/ FreeBSD 7.
No functional differences.

- use the div64() macro to wrap 64 bit divisions
  (which almost always are 64 / 32 bits) so they are easier
  to handle with compilers or OS that do not have native
  support for 64bit divisions;

- use a local variable for p_numbytes even if not strictly
  necessary on HEAD, as it reduces diffs with FreeBSD7

- in dummynet_send() check that a tag is present before
  dereferencing the pointer.

- add a couple of blank lines for readability near the end of a function
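
A sketch of the div64() idea from the first item: every 64-bit division
goes through one macro, so a port only has to replace a single
definition. The macro body shown here is an assumption rather than a
quote from the tree, and ticks_to_send is a hypothetical caller.

```c
#include <stdint.h>

/*
 * Single choke point for 64-bit divisions.  On a platform without
 * native 64-bit division, only this definition needs to change.
 * Assumed definition, for illustration.
 */
#define div64(a, b)	((int64_t)(a) / (int64_t)(b))

/*
 * Hypothetical use: ticks needed to send len_bits at bandwidth bits per
 * second with hz ticks per second, rounded up.
 */
static int64_t
ticks_to_send(int64_t len_bits, int32_t bandwidth, int32_t hz)
{
	return (div64(len_bits * hz + bandwidth - 1, bandwidth));
}
```

The divisor is a 32-bit quantity here, matching the remark that most of
these divisions are 64 / 32 bits.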

MFC after:	3 days
2009-12-02 15:20:31 +00:00
ume
b26098335a Teach send_pkt() and ipfw_tick() about IPv6.
This fixes the issue where keep-alive does not work for IPv6.

PR:		kern/117234
Submitted by:	mlaier, Joost Bekkers <joost__at__jodocus.org>
MFC after:	1 month
2009-12-02 14:32:01 +00:00
glebius
fa173d72f4 Until this moment carp(4) used a strange logging priority. It used debug
priority for such important information as MASTER/BACKUP state change,
and used a normal logging priority for such innocent messages as receiving
a short packet (which is a normal VRRP packet between some other routers) or
receiving a CARP packet on a non-carp interface (someone else running CARP).

This commit shifts message logging priorities to a more sane default.
2009-12-02 13:24:21 +00:00
luigi
0042b1fc70 Add new sockopt names for ipfw and dummynet.
This commit is just grabbing entries for the new names
that will be used in the future, so you don't need to
rebuild anything now.

MFC after:	3 days
2009-12-02 10:36:41 +00:00
luigi
2c01a78647 change the type of the opcode from enum *:8 to u_int8_t
so the size and alignment of the ipfw_insn is not compiler dependent.
No changes in the code generated by gcc.

There was only one instance of this kind in our entire source tree,
so I suspect the old definition was a poor choice (which I made).
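
A sketch of the layout guarantee this change is after. The struct below
is a hypothetical stand-in modelled on the description; the len and arg1
fields are assumptions added to make the example complete.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical instruction header.  With a fixed-width opcode the
 * layout no longer depends on how a given compiler sizes and aligns an
 * enum bit-field.
 */
struct insn {
	uint8_t  opcode;	/* previously an enum bit-field of width 8 */
	uint8_t  len;		/* assumed field */
	uint16_t arg1;		/* assumed field */
};

/* Fail the build if the layout ever drifts. */
_Static_assert(sizeof(struct insn) == 4, "insn must be 4 bytes");
_Static_assert(offsetof(struct insn, len) == 1, "len follows opcode");
_Static_assert(offsetof(struct insn, arg1) == 2, "arg1 at offset 2");

/* Expose the size so callers and tests can check it at run time too. */
static size_t
insn_size(void)
{
	return (sizeof(struct insn));
}
```

Compile-time assertions like these make a compiler-dependent layout
change show up as a build failure rather than as a silent ABI break.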

MFC after:	3 days
2009-12-02 08:52:06 +00:00
tuexen
0f82c8e821 Use the default stack size for the iterator thread.
This fixes a crash reported by Irene Ruengeler.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-27 17:25:19 +00:00
bms
8c86c2baad Correct a comment.
MFC after:	1 day
2009-11-19 13:21:37 +00:00
tuexen
746ccfdea0 Fix a bug where the system panics when a SHUTDOWN is received with an
illegal TSN.

Approved by: rrs (mentor)
MFC after: ASAP
2009-11-18 12:17:06 +00:00
tuexen
da3464bbd6 Get rid of the addr_over field, which is never really used,
only copied around.

Approved by: rrs (mentor)
2009-11-17 23:03:38 +00:00
tuexen
c432accfde Always use LIST_EMPTY instead of sometimes SCTP_LIST_EMPTY,
which is defined as LIST_EMPTY.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 20:56:14 +00:00
tuexen
d0a2f15667 Fix a bug where queued ASCONF messages are not sent out.
Approved by: rrs (mentor)
Obtained from:	Irene Ruengeler
MFC after: 1 month
2009-11-17 13:36:21 +00:00
tuexen
1daec1ba97 Fix a memory leak when destroying an SCTP stack.
Clean up sctp_pcb_finish().
Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:13:58 +00:00
tuexen
ca50a585fc Do not start the iterator when there are no associations.
This fixes a bug found by Irene Ruengeler.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:11:23 +00:00
tuexen
b4f7d47afa Temporarily disable the thread-based iterator. It does not work with vnet.
Approved by: rrs (mentor)
2009-11-17 13:09:50 +00:00
tuexen
d91facc53d Allow the UMA to free data. This resolves the UMA related bug reported
by Julian.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:08:15 +00:00
tuexen
fca7557740 Do not hold the lock longer than necessary.
Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:05:51 +00:00
bms
d1297db1ae Fix a functional regression in multicast.
Userland daemons need to see IGMP traffic regardless of the group;
omit the imo filter check if the proto is IGMP. The kernel part
of IGMP will have already filtered appropriately at this point.

MFC after:      ASAP
Submitted by:   Franz Struwig
Reported by:    Ivor Prebeg, Franz Struwig
2009-11-15 11:07:22 +00:00
attilio
01da2349df Move inet_aton() (the counterpart of inet_ntoa(), already present in libkern)
into libkern in order to make it usable by modules other than alias_proxy.

Obtained from:	Sandvine Incorporated
Sponsored by:	Sandvine Incorporated
MFC:		1 week
2009-11-12 00:46:28 +00:00
trasz
ab44b532fc Remove ifdefed out part of code, which seems to have originated a decade ago
in OpenBSD.  As it is now, there is no way for this to be useful, since IPsec
is free to forward packets via whatever interface it wants, so checking
capabilities of the interface passed from ip_output (fetched from the routing
table) serves no purpose.

Discussed with:	sam@
2009-11-09 19:53:34 +00:00
oleg
12da1cc788 style(9): add missing parentheses 2009-11-09 09:12:45 +00:00
jhb
5eac2449e3 Several years ago a feature was added to TCP that caused soreceive() to
send an ACK right away if data was drained from a TCP socket that had
previously advertised a zero-sized window.  The current code requires the
receive window to be exactly zero for this to kick in.  If window scaling is
enabled and the window is smaller than the scale, then the effective window
that is advertised is zero.  However, in that case the zero-sized window
handling is not enabled because the window is not exactly zero.  The fix
changes the code to check the raw window value against zero.
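
A worked example of the arithmetic, assuming a window scale of 7 and 100
bytes of free receive space: 100 >> 7 is 0, so the peer sees a zero
window even though the space is not exactly zero. The function names are
hypothetical, and reading "raw window value" as the 16-bit field that
goes on the wire is an assumption.

```c
#include <stdint.h>

/*
 * Window field as it would appear in the TCP header: the available
 * receive space shifted right by the negotiated scale, capped at
 * 16 bits.
 */
static uint16_t
advertised_window(uint32_t recv_space, unsigned int rcv_scale)
{
	uint32_t w = recv_space >> rcv_scale;

	return (w > UINT16_MAX ? UINT16_MAX : (uint16_t)w);
}

/* Check that only fires when the receive space is exactly zero. */
static int
zero_window_unscaled(uint32_t recv_space)
{
	return (recv_space == 0);
}

/* Check against the value the peer actually sees. */
static int
zero_window_on_wire(uint32_t recv_space, unsigned int rcv_scale)
{
	return (advertised_window(recv_space, rcv_scale) == 0);
}
```

With 100 bytes free and a scale of 7, the first check misses the zero
window and the second one catches it.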

Reviewed by:	bz
MFC after:	1 week
2009-11-06 16:55:05 +00:00
oleg
f9007b2aae Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC after:	3 days
2009-11-03 08:41:14 +00:00
tuexen
6196cae2a7 Improve round robin stream scheduler and cleanup some code.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-29 17:40:33 +00:00
brueffer
8dff34f349 Close a stream file descriptor leak.
PR:		138130
Submitted by:	Patroklos Argyroudis <argp@census-labs.com>
MFC after:	1 week
2009-10-28 12:10:29 +00:00
tuexen
8c832e3349 Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-27 18:17:07 +00:00
tuexen
78210debc4 Improve the round robin stream scheduler.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-26 19:23:34 +00:00
rwatson
a5668d2042 Correct spelling typo in ip_input comment.
Pointed out by:	N.J. Mann <njm at njm.me.uk>,
		John Nielsen <john at jnielsen.net>, julian (!), lstewart
MFC after:	2 days
2009-10-24 09:18:26 +00:00
qingli
c96d27ad80 Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
MFC after:	3 days
2009-10-23 18:27:34 +00:00
rwatson
552ae71ff3 Improve grammar in ip_input comment while attempting to maintain what
might be its meaning.

MFC after:	3 days
2009-10-23 13:35:00 +00:00
qingli
eeb330ad1e In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
MFC after:	immediately
Verified by:	rnoland, yongari
2009-10-20 17:55:42 +00:00
rwatson
ec5eebfd83 Rewrap ip_input() comment so that it prints more nicely.
MFC after:	3 days
2009-10-18 11:23:56 +00:00
qingli
7d73ff246e This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.

Reviewed by:	kmacy
MFC after:	3 days
2009-10-15 06:12:04 +00:00
bz
58b36bef21 Compare pointer to NULL rather than 0.
MFC after:	1 month
2009-10-13 20:29:14 +00:00
tuexen
6e9ccb9f7a Fix a race condition where a mutex was destroyed while sleeping on it.
Found while analyzing a report from julian. It might fix his bug.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-11 12:23:56 +00:00
julian
79c1f884ef Virtualize the pfil hooks so that different jails may choose different
packet filters. Also allows ipfw to be enabled on one jail and disabled
on another. In 8.0 it's a global setting.

Sitting around in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
tuexen
879da4fe23 Correct include order as indicated by bz.
Approved by: re (mentor)
MFC after: 3 days
2009-10-10 13:59:18 +00:00
tuexen
085d02030a Do not include vnet.h twice.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-09 19:30:23 +00:00
tuexen
b69eca12f6 Use correct arguments when calling SCTP_RTALLOC().
Approved by: rrs (mentor)
MFC after: 0 days
2009-10-08 20:33:12 +00:00
rrs
d29665c3a8 Fix so that round robin stream scheduling works as advertised
MFC after:	0 days
2009-10-08 11:36:06 +00:00
rwatson
ef46e20857 Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.

MFC after:	3 days
2009-10-06 20:35:41 +00:00
rwatson
cff0b225cd In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order.  In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.

MFC after:	3 days
Reported by:	Kamigishi Rei <spambox at haruhiism.net>
Reviewed by:	bz
2009-10-05 22:24:13 +00:00
qingli
a1020b6c31 Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuitous ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after:	immediately
2009-10-02 01:45:11 +00:00
qingli
81ff2327e3 Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR:		kern/139113
MFC after:	3 days
2009-10-02 01:34:55 +00:00
tuexen
990095d301 Fix handling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 2 month
2009-09-20 11:33:39 +00:00
tuexen
0ceae3cfd1 Fix errnos.
Approved by: rrs(mentor)
MFC after: 3 days.
2009-09-20 11:32:22 +00:00
tuexen
064ad1c10a Use appropriate locking when using interface list.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:55:12 +00:00
tuexen
9fa52644a0 Fix the disabling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:18:42 +00:00
tuexen
de3c71bd61 Get SCTP working in combination with VIMAGE.
Contains code from bz.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:02:16 +00:00
bms
178f20a475 Return ENOBUFS consistently if user attempts to exceed
in_mcast_maxsocksrc resource limit.

Submitted by:	syrinx
MFC after:	3 days
2009-09-18 15:12:31 +00:00
rrs
1418771847 Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
tuexen
9695ccf66a Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.

Approved by: rrs (mentor)
MFC after: immediately
2009-09-16 14:23:31 +00:00
silby
ded53b4033 Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by:	andre, rwatson, Eric Van Gyzen
2009-09-16 05:33:15 +00:00