Commit Graph

2742 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
89e7e7e32a Add SCTP as a known upper layer protocol over v6.
We are not yet aware of the protocol internals but this way
SCTP traffic over v6 will not be discarded.

Reported by: Peter Lei via rrs
Tested by:   Peter Lei <peterlei cisco.com>
2006-11-13 19:07:32 +00:00
Randall Stewart
7f34832b95 In a true restart case, the send_lock was
not being aquired. This meant that when we cleanup
the outbound we may have one in transit to be
added with the old sequence number. This is bad
since then we loose a message :(

Also the report_outbound needed to have the right
lock when its called which it did not.. I added
the lock with of course a flag since we want to
have the lock before we call it in the restart
case.

This also fixed the FIX ME case where, in the cookie
collision case, we mark for retransmit any that
were bundled with the cookie that was dropped.
This also means changes to the output routine
so we can assure getting the COOKIE-ACK sent
BEFORE we retransmit the Data.

Approved by:	gnn
2006-11-11 22:44:12 +00:00
Randall Stewart
6a91f103b6 Turns out we would reset the TSN seq counter during
a colliding INIT. This if fine except when we have
data outstanding... we basically reset it to the
previous value it was.. so then we end up assigning
the same TSN to two different data chunks.
This patch:

1) Finds a missing lock for when we change the stream
   numbers during COOKIE and INIT-ACK processing.. we
   were NOT locking the send_buffer.. which COULD cause
   problems (found by inspection looking for <2>)

2) Fixes a case during a colliding INIT where we incorrectly
   reset the sending Sequence thus in some cases duplicately
   assigning a TSN.

3) Additional enhancments to logging so we can see strm/tsn in
   the receiver AND new tracking to watch what the sender
   is doing with TSN and STRM seq's.

Approved by:	gnn
2006-11-11 15:59:01 +00:00
Randall Stewart
de0e935b29 This patch fixes a LOR that happens during INIT-ACK collision.
We were calling select_a_tag() inside sctp_send_initate_ack().
During collision cases we have a stcb and thus a SCTP_LOCK. When
we call select_a_tag it (below it) locks the INFO lock. We now
1) pre-select the nonce-tie-tags in sctputil.c during setup of
   a tcb.
2) In the other case where we have to select tags, we unlock after
   incr the ref cnt (so assoc won't go away0 and then do the
   tag selection followed by a relock and decr the refcnt.
Approved by:	gnn
2006-11-10 13:34:55 +00:00
Randall Stewart
08598d7067 Fixes an issue with handling of stream reset. When a
reset comes in we need to calculate the length and
therefore the number of listed streams (if any) based
on the TLV type. Otherwise if we get a retran we could
in theory panic by sending a notification to a user with
a incorrect list and thus no memory listing the streams.
Found in IOS by devtest :-)
Approved by:	gnn
2006-11-09 21:01:07 +00:00
Randall Stewart
03b0b02163 -Fixes first of all the getcred on IPv6 and V4. The
copy's were incorrect and so was the locking.
-A bug was also found that would create a race and
 panic when an abort arrived on a socket being read
 from.
-Also fix the reader to get MSG_TRUNC when a partial
 delivery is aborted.
-Also addresses a couple of coverity caught error path
 memory leaks and a couple of other valid complaints
Approved by:	gnn
2006-11-08 00:21:13 +00:00
Joe Marcus Clarke
1bc3d4c1d1 Fix TFTP NAT support by making sure the appropriate fingerprinting checks
are done.

Reviewed by:	piso
2006-11-07 21:06:48 +00:00
Robert Watson
b96fbb37da Convert three new suser(9) calls introduced between when the priv(9)
patch was prepared and committed to priv(9) calls.  Add XXX comments
as, in each case, the semantics appear to differ from the TCP/UDP
versions of the calls with respect to jail, and because cr_canseecred()
is not used to validate the query.

Obtained from:	TrustedBSD Project
2006-11-06 14:54:06 +00:00
Randall Stewart
f4ad963c9f This changes tracks down the EEOR->NonEEOR mode failure
to wakeup on close of the sender. It basically moves
the return (when the asoc has a reader/writer) further
down and gets the wakeup and assoc appending (of the
PD-API event) moved up before the return.  It also
moves the flag set right before the return so we can
assure only once adding the PD-API events.

Approved by:	gnn
2006-11-06 14:34:21 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Ruslan Ermilov
9274ba8a1f Revert previous commit, and instead make the expression in rev. 1.2
match the style of this file.

OK'ed by:	rrs
2006-11-05 14:36:59 +00:00
Randall Stewart
50cec91936 Tons of fixes to get all the 64bit issues removed.
This also moves two 16 bit int's to become 32 bit
values so we do not have to use atomic_add_16.
Most of the changes are %p, casts and other various
nasty's that were in the orignal code base. With this
commit my machine will now do a build universe.. however
I as yet have not tested on a 64bit machine .. it may not work :-(
2006-11-05 13:25:18 +00:00
Ruslan Ermilov
11acae799a Fix pointer arithmetic to be 64-bit friendly. 2006-11-04 08:45:50 +00:00
Ruslan Ermilov
e349e6b8a0 Remove bogus casts that Randall for some reason didn't borrow
from my supplied patch.
2006-11-04 08:19:01 +00:00
John Birrell
5051417909 Remove a bogus cast in an attempt to fix the tinderbox builds on
lots of arches.
2006-11-04 05:39:39 +00:00
Randall Stewart
562a89b562 More 64 bit pointer fun.
%p changed in multiple prints
the mtod() was also fixed.
2006-11-03 23:04:34 +00:00
Randall Stewart
249820a7d8 Fix two of the 64bit errors on the printfs. 2006-11-03 21:19:54 +00:00
Randall Stewart
cef8ad061a Somehow I missed this one. The sys/cdef.h was out
of order with respect to the FSBID..
2006-11-03 19:48:56 +00:00
Randall Stewart
73932c69b6 Opps... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I
inserted a few to the new files.. but I falied to
add the #include <sys/cdef.h>

Which causes a compile error.. sorry about that... got it
now :-)

Approved by:gnn
2006-11-03 17:21:53 +00:00
Randall Stewart
f8829a4a40 Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by:	gnn
Approved by:	gnn
2006-11-03 15:23:16 +00:00
Oleg Bulyzhin
35da9180dc - Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70
- Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input()

Reviewed by:	glebius, rwatson

- purge_flow_set():
    - Do not leak memory while purging queues which are not bound to pipe.
    - style(9) cleanup

MFC after:	2 months
2006-10-29 12:09:24 +00:00
Oleg Bulyzhin
c2df509a1d - Convert
net.inet.ip.dummynet.curr_time
	net.inet.ip.dummynet.searches
	net.inet.ip.dummynet.search_steps
  to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs.

- Implement simple mechanics for dummynet(4) internal time correction.
  Under certain circumstances (system high load, dummynet lock contention, etc)
  dummynet's tick counter can be significantly slower than it should be.
  (I've observed up to 25% difference on one of my production servers).
  Since this counter used for packet scheduling, it's accuracy is vital for
  precise bandwidth limitation.

  Introduce new sysctl nodes:
  net.inet.ip.dummynet.
    tick_lost		- number of ticks coalesced by taskqueue thread.
    tick_adjustment	- number of time corrections done.
    tick_diff		- adjusted vs non-adjusted tick counter difference
    tick_delta		- last vs 'standard' tick differnece (usec).
    tick_delta_sum	- accumulated (and not corrected yet) time
  			  difference (usec).

Reviewed by:	glebius
MFC after:	2 month
2006-10-27 13:05:37 +00:00
Oleg Bulyzhin
b2b05096fd Use separate thread for servicing dummynet(4).
Utilize taskqueue(9) API.

Submitted by:	glebius
MFC after:	2 month
2006-10-27 11:16:58 +00:00
Oleg Bulyzhin
c447b19f6e style(9) cleanup.
MFC after:	2 month
2006-10-27 10:52:32 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Julian Elischer
010b65f54a revert last change.. premature.. need to wait until if_ethersubr.c
uses pfil to get to ipfw.
2006-10-21 00:16:31 +00:00
Julian Elischer
3df668cc38 Move some variables to a more likely place
and remove "temporary" stuff that is not needed any more.
2006-10-20 19:32:08 +00:00
Maxim Konovalov
428b67b194 o Do not do args->f_id.addr_type == 6 when there is
IS_IP6_FLOW_ID() exactly for that.
2006-10-11 12:14:28 +00:00
Maxim Konovalov
f16ccf6814 o Kill a nit in the comment. 2006-10-11 12:00:53 +00:00
Maxim Konovalov
5f197ce41e o Extend not very informative ipfw(4) message 'drop session, too many
entries' by src:port and dst:port pairs.  IPv6 part is non-functional
as ``limit'' does not support IPv6 flows.

PR:		kern/103967
Submitted by:	based on Bruce Campbell patch
MFC after:	1 month
2006-10-11 11:52:34 +00:00
Ruslan Ermilov
cc81ddd9db Merge the rest of my changes. 2006-10-11 07:11:56 +00:00
Paolo Pisati
f3d9aab351 Various mdoc and grammar fixes.
Approved by: glebius
Reviewed by: glebius, ru
2006-10-08 13:53:45 +00:00
Bjoern A. Zeeb
7002145d8e Set scope on MC address so IPv6 carp advertisement will not get dropped
in ip6_output. In case this fails  handle the error directly and log it[1].
In addition permit CARP over v6 in ip_fw2.

PR:                     kern/98622
Similar patch by:       suz
Discussed with:         glebius [1]
Tested by:              Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr
MFC after:              3 days
2006-10-07 10:19:58 +00:00
Gleb Smirnoff
f7a679b200 Save space on stack moving token ring stuff to its own hack block. 2006-10-04 11:08:14 +00:00
Gleb Smirnoff
9b9a52b496 Style rev. 1.152. 2006-10-04 10:59:21 +00:00
Andre Oppermann
6a7c943c59 Remove stone-aged and irrelevant "#ifndef notdef". 2006-09-29 16:44:45 +00:00
Bruce M Simpson
910e1364b6 Nits.
Submitted by:	ru
2006-09-29 16:16:41 +00:00
Bruce M Simpson
2d20d32344 Push removal of mrouted down to the rest of the tree. 2006-09-29 15:45:11 +00:00
Maxim Konovalov
acc03ac6bb o Convert w/spaces to tabs in the previous commit. 2006-09-29 06:46:31 +00:00
Mike Silbersack
d4bdcb16cc Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5,
scale it to min(ephemeral port range / 2, maxsockets / 5) so that people
with large gobs of memory and/or large maxsockets settings will not
exhaust their entire ephemeral port range with sockets in the TIME_WAIT
state during periods of heavy load.

Those who wish to tweak the size of the TIME_WAIT zone can still do so with
net.inet.tcp.maxtcptw.

Reviewed by: glebius, ru
2006-09-29 06:24:26 +00:00
Andre Oppermann
2c30ec0a1f When tcp_output() receives an error upon sending a packet it reverts parts
of its internal state to ignore the failed send and try again a bit later.
If the error is EPERM the packet got blocked by the local firewall and the
revert may cause the session to get stuck and retry indefinitely.  This way
we treat it like a packet loss and let the retransmit timer and timeouts
do their work over time.

The correct behavior is to drop a connection that gets an EPERM error.
However this _may_ introduce some POLA problems and a two commit approach
was chosen.

Discussed with:	glebius
PR:		kern/25986
PR:		kern/102653
2006-09-28 18:02:46 +00:00
Andre Oppermann
6a2257d911 When doing TSO correctly do the check to prevent a maximum sized IP packet
from overflowing.
2006-09-28 13:59:26 +00:00
Bruce M Simpson
050596b4a0 Fix the IPv4 multicast routing detach path. On interface detach whilst
the MROUTER is running, the system would panic as described in the PR.

The fix in the PR is a good start, however, the other state associated
with the multicast forwarding cache has to be freed in order to avoid
leaking memory and other possible panics.

More care and attention is needed in this area.

PR:		kern/82882
MFC after:	1 week
2006-09-28 12:21:08 +00:00
Bruce M Simpson
d966841427 The IPv4 code should clean up multicast group state when an interface
goes away. Without this change, it leaks in_multi (and often ether_multi
state) if many clonable interfaces are created and destroyed in quick
succession.

The concept of this fix is borrowed from KAME. Detailed information about
this behaviour, as well as test cases, are available in the PR.

PR:		kern/78227
MFC after:	1 week
2006-09-28 10:04:07 +00:00
Paolo Pisati
7c00cc76f0 Compilation. 2006-09-27 02:08:44 +00:00
Paolo Pisati
be4f3cd0d9 Summer of Code 2005: improve libalias - part 1 of 2
With the first part of my previous Summer of Code work, we get:

-made libalias modular:

 -support for 'particular' protocols (like ftp/irc/etcetc) is no more
  hardcoded inside libalias, but it's available through external
  modules loadable at runtime

 -modules are available both in kernel (/boot/kernel/alias_*.ko) and
  user land (/lib/libalias_*)

 -protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp,
  skinny and smedia

-added logging support for kernel side

-cleanup

After a buildworld, do a 'mergemaster -i' to install the file libalias.conf
in /etc or manually copy it.

During startup (and after every HUP signal) user land applications running
the new libalias will try to read a file in /etc called libalias.conf:
that file contains the list of modules to load.

User land applications affected by this commit are ppp and natd:
if libalias.conf is present in /etc you won't notice any difference.

The only kernel land bit affected by this commit is ng_nat:
if you are using ng_nat, and it doesn't correctly handle
ftp/irc/etcetc sessions anymore, remember to kldload
the correspondent module (i.e. kldload alias_ftp).

General information and details about the inner working are available
in the libalias man page under the section 'MODULAR ARCHITECTURE
(AND ipfw(4) SUPPORT)'.

NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat
support will be part of the next libalias-related commit.

Approved by: glebius
Reviewed by: glebius, ru
2006-09-26 23:26:53 +00:00
John-Mark Gurney
e16fa5ca55 fix calculating to_tsecr... This prevents the rtt calculations from
going all wonky...
2006-09-26 01:21:46 +00:00
Bruce M Simpson
13c8384424 Fix an incompatibility between CARP and IPv4 multicast routing, whereby
the VRRPv2 advertisements will originate from the wrong source address.
This only affects kernels compiled with MROUTING and after the MRT_INIT
ioctl() has been issued.
Set imo_multicast_vif in carp's softc to the invalid value -1 after it is
zeroed by softc allocation, to stop the ip_output() path looking up the
incorrect source address thinking a vif is set.

PR:		kern/100532
Submitted by:	Bohus Plucinsky
MFC after:	1 week
2006-09-25 11:53:54 +00:00
Bruce M Simpson
e2fd806b36 Spleling
Submitted by:	pjd
2006-09-25 11:48:07 +00:00
Bruce M Simpson
07ea6709ea Account for output IP datagrams on the ifaddr where they originated from,
*not* the first ifaddr on the ifp.  This is similar to what NetBSD does.

PR:		kern/72936
Submitted by:	alfred
Reviewed by:	andre
2006-09-25 10:11:16 +00:00
John-Mark Gurney
4dc630cdd2 if min is greater than max, prefer max over min... I managed to get a
retransmit timer that was going to take 19 days to trigger...

Reviewed by:	silby
2006-09-25 07:22:39 +00:00
John-Mark Gurney
402865f637 now that we don't automagicly increase the MTU of host routes, when we copy
the loopback interface, copy it's mtu also..  This means that we again have
large mtu support for local ip addresses...
2006-09-23 19:24:10 +00:00
Bruce M Simpson
f1edc3bde5 Always set the IP version in the TCP input path, to preserve
the header field for possible later IPSEC SPD lookup, even
when the kernel is built without 'options INET6'.

PR:		kern/57760
MFC after:	1 week
Submitted by:	Joachim Schueth
2006-09-23 16:26:31 +00:00
Andre Oppermann
7ff0b850a6 Make tcp_usr_send() free the passed mbufs on error in all cases as the
comment to it claims.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-17 13:39:35 +00:00
John Hay
724e825a16 Handle a list of IPv6 src and dst addresses correctly, eg.
ipfw add allow ip6 from any to 2000::/16,2002::/16

PR:		102422 (part 3)
Submitted by:	Andrey V. Elsukov <bu7cher at yandex dot ru>
MFC after:	5 days
2006-09-16 10:27:05 +00:00
Andre Oppermann
31ecb34a4e When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len
from wrapping when we generate a maximally sized packet for later
segmentation.

Noticed by:	gallatin
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-15 16:08:09 +00:00
Andrey A. Chernov
239e71c612 Add missing #ifdef INET6 (can't be compiled) 2006-09-14 10:22:35 +00:00
Andre Oppermann
67d828b162 Remove unessary includes and follow common ordering style. 2006-09-13 13:21:17 +00:00
Andre Oppermann
bf6d304ab2 Rewrite of TCP syncookies to remove locking requirements and to enhance
functionality:

 - Remove a rwlock aquisition/release per generated syncookie.  Locking
   is now integrated with the bucket row locking of syncache itself and
   syncookies no longer add any additional lock overhead.
 - Syncookie secrets are different for and stored per syncache buck row.
   Secrets expire after 16 seconds and are reseeded on-demand.
 - The computational overhead for syncookie generation and verification
   is one MD5 hash computation as before.
 - Syncache can be turned off and run with syncookies only by setting the
   sysctl net.inet.tcp.syncookies_only=1.

This implementation extends the orginal idea and first implementation
of FreeBSD by using not only the initial sequence number field to store
information but also the timestamp field if present.  This way we can
keep track of the entire state we need to know to recreate the session in
its original form.  Almost all TCP speakers implement RFC1323 timestamps
these days.  For those that do not we still have to live with the known
shortcomings of the ISN only SYN cookies.  The use of the timestamp field
causes the timestamps to be randomized if syncookies are enabled.

The idea of SYN cookies is to encode and include all necessary information
about the connection setup state within the SYN-ACK we send back and thus
to get along without keeping any local state until the ACK to the SYN-ACK
arrives (if ever).  Everything we need to know should be available from
the information we encoded in the SYN-ACK.

A detailed description of the inner working of the syncookies mechanism
is included in the comments in tcp_syncache.c.

Reviewed by:	silby (slightly earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-13 13:08:27 +00:00
Christian S.J. Peron
d94f2a68f8 Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point
exists to allow the mandatory access control policy to properly initialize
mbufs generated by the firewall. An example where this might happen is keep
alive packets, or ICMP error packets in response to other packets.

This takes care of kernel panics associated with un-initialize mbuf labels
when the firewall generates packets.

[1] I modified this patch from it's original version, the initial patch
    introduced a number of entry points which were programmatically
    equivalent. So I introduced only one. Instead, we should leverage
    mac_create_mbuf_netlayer() which is used for similar situations,
    an example being icmp_error()

    This will minimize the impact associated with the MFC

Submitted by:	mlaier [1]
MFC after:	1 week

This is a RELENG_6 candidate
2006-09-12 04:25:13 +00:00
Andre Oppermann
384a05bfd0 Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the
validity of ro->ro_rt first.  This prevents crashing on any non-normally
routed IP packet.

Coverity CID:	162 (incorrectly, it was re-introduced by previous commit)
2006-09-11 19:56:10 +00:00
John-Mark Gurney
3ae2ad088e make use of the host route's mtu for processing. This means we can now
support a network w/ split mtu's by assigning each host route the correct
mtu.  an aspiring programmer could write a daemon to probe hosts and find
out if they support a larger mtu.
2006-09-10 17:49:09 +00:00
Gleb Smirnoff
3e630ef9a9 Add a sysctl net.inet.tcp.nolocaltimewait that allows to suppress
creating a compress TIME WAIT states, if both connection endpoints
are local. Default is off.
2006-09-08 13:09:15 +00:00
Ruslan Ermilov
751dea2935 Back when we had T/TCP support, we used to apply different
timeouts for TCP and T/TCP connections in the TIME_WAIT
state, and we had two separate timed wait queues for them.
Now that is has gone, the timeout is always 2*MSL again,
and there is no reason to keep two queues (the first was
unused anyway!).

Also, reimplement the remaining queue using a TAILQ (it
was technically impossible before, with two queues).
2006-09-07 13:06:00 +00:00
Andre Oppermann
b3c0f300fb Second step of TSO (TCP segmentation offload) support in our network stack.
TSO is only used if we are in a pure bulk sending state.  The presence of
TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent
using TSO.  With TSO the TCP header is the same (except for the sequence number)
for all generated packets.  This makes it impossible to transmit any options
which vary per generated segment or packet.

The length of TSO bursts is limited to TCP_MAXWIN.

The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled.

TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO
flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and
m_pkthdr.tso_segsz set to the segment size (net payload size, not counting
IP+TCP headers or TCP options).

IPv6 currently lacks a pseudo-header checksum function and thus doesn't support
TSO yet.

Tested by:	Jack Vogel <jfvogel-at-gmail.com>
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-07 12:53:01 +00:00
Ruslan Ermilov
3c89486cc7 Remove a microoptimization for i386 that was a micropessimization for amd64. 2006-09-07 09:49:08 +00:00
Andre Oppermann
233dcce118 First step of TSO (TCP segmentation offload) support in our network stack.
o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
 o add CSUM_TSO flag to mbuf pkthdr csum_flags field
 o add tso_segsz field to mbuf pkthdr
 o enhance ip_output() packet length check to allow for large TSO packets
 o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
 o adjust all callers of tcp_maxmtu[46]() accordingly

Discussed on:	-current, -net
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-06 21:51:59 +00:00
Andre Oppermann
6fbfd5825f Check inp_flags instead of inp_vflag for INP_ONESBCAST flag.
PR:		kern/99558
Tested by:	Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-09-06 19:04:36 +00:00
Andre Oppermann
773725a255 Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR:		kern/99558
Tested by:	Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-09-06 17:12:10 +00:00
Gleb Smirnoff
2c857a9be9 o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely
bad under high load. For example with 40k sockets and 25k tcptw
  entries, connect() syscall can run for seconds. Debugging showed
  that it iterates the cycle millions times and purges thousands of
  tcptw entries at a time.
  Besides practical unusability this change is architecturally
  wrong. First, in_pcblookup_local() is used in connect() and bind()
  syscalls. No stale entries purging shouldn't be done here. Second,
  it is a layering violation.
o Return back the tcptw purging cycle to tcp_timer_2msl_tw(),
  that was removed in rev. 1.78 by rwatson. The commit log of this
  revision tells nothing about the reason cycle was removed. Now
  we need this cycle, since major cleaner of stale tcptw structures
  is removed.
o Disable probably necessary, but now unused
  tcp_twrecycleable() function.

Reviewed by:	ru
2006-09-06 13:56:35 +00:00
Gleb Smirnoff
c3e07bf82a Finally fix rev. 1.256
Pointy hat to:	glebius
2006-09-05 14:00:59 +00:00
Gleb Smirnoff
23ebab416c Remove extra parenthesis in last commit.
Nitpicked by:	ru
2006-09-05 12:22:54 +00:00
Gleb Smirnoff
1f1f90c3a7 - Make net.inet.tcp.maxtcptw modifiable at run time.
- If net.inet.tcp.maxtcptw was ever set explicitly, do
  not change it if kern.ipc.maxsockets is changed.
2006-09-05 12:08:47 +00:00
Thomas Quinot
d438d81581 Fix typo in comment. 2006-09-04 08:32:17 +00:00
John Hay
1c31b456b9 Recognise IPv6 PIM packets.
MFC after:	1 week
2006-08-31 16:56:45 +00:00
Mohan Srinivasan
2374501ca4 Fix for a bug that causes the computation of "len" in tcp_output() to
get messed up, resulting in an inconsistency between the TCP state
and so_snd.
2006-08-26 17:53:19 +00:00
Julian Elischer
afad78e259 comply with style police
Submitted by:	ru
MFC after:	1 month
2006-08-18 22:36:05 +00:00
Julian Elischer
c487be961a Allow ipfw to forward to a destination that is specified by a table.
for example:
  fwd tablearg ip from any to table(1)
where table 1 has entries of the form:
1.1.1.0/24 10.2.3.4
208.23.2.0/24 router2

This allows trivial implementation of a secondary routing table implemented
in the firewall layer.

I expect more work (under discussion with Glebius) to follow this to clean
up some of the messy parts of ipfw related to tables.

Reviewed by:	Glebius
MFC after:	1 month
2006-08-17 22:49:50 +00:00
Julian Elischer
b7522c27d2 Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was
in older versions of FreeBSD. This option is pointless as it is needed in just
about every interesting usage of forward that I have ever seen. It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x  or 7.x
Reviewed by:	glebius
MFC after:	1 week
2006-08-17 00:37:03 +00:00
Mohan Srinivasan
464469c713 Fixes an edge case bug in timewait handling where ticks rolling over causing
the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry).
Reviewed by:	silby
2006-08-11 21:15:23 +00:00
Brooks Davis
43bc7a9c62 With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.
2006-08-04 21:27:40 +00:00
Oleg Bulyzhin
0e0b1bb57a Remove useless NULL pointer check: we are using M_WAITOK flag for memory
allocation.

Submitted by:	Andrey Elsukov <bu7cher at yandex dot ru>
Approved by:	glebius (mentor)
MFC after:	1 week
2006-08-04 10:50:51 +00:00
Robert Watson
e850475248 Move soisdisconnected() in tcp_discardcb() to one of its calling contexts,
tcp_twstart(), but not to the other, tcp_detach(), as the socket is
already being torn down and therefore there are no listeners.  This avoids
a panic if kqueue state is registered on the socket at close(), and
eliminates to XXX comments.  There is one case remaining in which
tcp_discardcb() reaches up to the socket layer as part of the TCP host
cache, which would be good to avoid.

Reported by:	Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>
2006-08-02 16:18:05 +00:00
Oleg Bulyzhin
9b1858ca78 Do not leak memory while flushing rules.
Noticed by:	yar
Approved by:	glebius (mentor)
MFC after:	1 week
2006-08-02 14:58:51 +00:00
Robert Watson
a152f8a361 Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
Stephan Uphoff
d915b28015 Fix race conditions on enumerating pcb lists by moving the initialization
( and where appropriate the destruction) of the pcb mutex to the init/finit
functions of the pcb zones.
This allows locking of the pcb entries and race condition free comparison
of the generation count.
Rearrange locking a bit to avoid extra locking operation to update the generation
count in in_pcballoc(). (in_pcballoc now returns the pcb locked)

I am planning to convert pcb list handling from a type safe to a reference count
model soon. ( As this allows really freeing the PCBs)

Reviewed by:	rwatson@, mohans@
MFC after:	1 week
2006-07-18 22:34:27 +00:00
Sam Leffler
6b7330e2d4 Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by:	arch@
2006-07-09 06:04:01 +00:00
Max Laier
05206588f2 Make in-kernel multicast protocols for pfsync and carp work after enabling
dynamic resizing of multicast membership array.

Reported and testing by:	Maxim Konovalov, Scott Ullrich
Reminded by:			thompsa
MFC after:			2 weeks
2006-07-08 00:01:01 +00:00
Robert Watson
be54a5eeb3 Remove unneeded mac.h include.
MFC after:	3 days
2006-07-06 13:25:01 +00:00
Oleg Bulyzhin
6372145725 Complete timebase (time_second -> time_uptime) conversion.
PR:		kern/94249
Reviewed by:	andre (few months ago)
Approved by:	glebius (mentor)
2006-07-05 23:37:21 +00:00
Maxim Konovalov
764a094c3f o Kill BUGS section as it is not valid since rev. 1.4 alias_pptp.c.
Spotted by:	ru.unix.bsd activists
MFC after:	1 week
2006-07-04 20:39:38 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy
ad67537233 Use TAILQ_FOREACH consistently. 2006-06-29 17:09:47 +00:00
Gleb Smirnoff
4d09f5a030 Fix URL to Bellovin's paper.
Submitted by:	Anton Yuzhaninov <citrin rambler-co.ru>
2006-06-29 13:38:36 +00:00
Bjoern A. Zeeb
333ad3bc40 Eliminate the offset argument from send_reject. It's not been
used since FreeBSD-SA-06:04.ipfw.
Adopt send_reject6 to what had been done for legacy IP: no longer
send or permit sending rejects for any but the first fragment.

Discussed with: oleg, csjp (some weeks ago)
2006-06-29 11:17:16 +00:00
Bjoern A. Zeeb
421d8aa603 Use INPLOOKUP_WILDCARD instead of just 1 more consistently.
OKed by: rwatson (some weeks ago)
2006-06-29 10:49:49 +00:00
Pawel Jakub Dawidek
835d4b8924 - Use suser_cred(9) instead of directly checking cr_uid.
- Change the order of conditions to first verify that we actually need
  to check for privileges and then eventually check them.

Reviewed by:	rwatson
2006-06-27 11:35:53 +00:00
Andre Oppermann
cc477a6347 In syncache_respond() do not reply with a MSS that is larger than what
the peer announced to us but make it at least tcp_minmss in size.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 17:54:53 +00:00
Andre Oppermann
8bfb19180d Some cleanups and janitorial work to tcp_syncache:
o don't assign remote/local host/port information manually between provided
   struct in_conninfo and struct syncache, bcopy() it instead
 o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture
   the purpose of this field
 o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons
 o fix IPSEC error case printf's to report correct function name
 o in syncache_socket() only transpose enhanced tcp options parameters to
   struct tcpcb when the inpcb doesn't has TF_NOOPT set
 o in syncache_respond() reorder stack variables
 o in syncache_respond() remove bogus KASSERT()

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 16:14:19 +00:00
Andre Oppermann
f72167f4d1 Some cleanups and janitorial work to tcp_dooptions():
o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its
   usage accordingly
 o update the comments to the tcp_dooptions() invocation in
   tcp_input():after_listen to reflect reality
 o move the logic checking the echoed timestamp out of tcp_dooptions() to the
   only place that uses it next to the invocation described in the previous
   item
 o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others
 o add comments in to struct tcpopt.to_flags #defines

No functional changes.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-26 15:35:25 +00:00
Andre Oppermann
dfabcc1d29 Reverse the source/destination parameters to in[6]_pcblookup_hash() in
syncache_respond() for the #ifdef MAC case.

Submitted by:	Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
2006-06-26 09:43:55 +00:00
Robert Watson
b4470c1639 In tcp6_usr_attach(), return immediately if SS_ISDISCONNECTED, to
avoid dereferencing an uninitialized inp variable.

Submitted by:	Michiel Boland <michiel at boland dot org>
MFC after:	1 month
2006-06-26 09:38:08 +00:00
Andre Oppermann
a846263567 Decrement the global syncache counter in syncache_expand() when the entry
is removed from the bucket.  This fixes the syncache statistics.
2006-06-25 11:11:33 +00:00
Andre Oppermann
649ac0ce5f Move the syncookie MD5 context from globals to the stack to make it MP safe. 2006-06-22 15:07:45 +00:00
Hajimu UMEMOTO
a0a59ae4af - Pullup even when the extention header is unknown, to prevent
infinite loop with net.inet6.ip6.fw.deny_unknown_exthdrs=0.
- Teach ipv6 and ipencap as they appear in an IPv4/IPv6 over IPv6
  tunnel.
- Test the next extention header even when the routing header type
  is unknown with net.inet6.ip6.fw.deny_unknown_exthdrs=0.

Found by:	xcast-fan-club
MFC after:	1 week
2006-06-22 13:22:54 +00:00
Andre Oppermann
c9f7b0ad5b Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied
memory location for already existing/initialized mutexes.  With random
data in the memory location this fails (ie. after a soft reboot).

Reported by:	brueffer, YAMAMOTO Shigeru
Submitted by:	YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
2006-06-20 08:11:30 +00:00
David Malone
5e1aa27995 When we receive an out-of-window SYN for an "ESTABLISHED" connection,
ACK the SYN as required by RFC793, rather than ignoring it. NetBSD
have had a similar change since 1999.

PR:		93236
Submitted by:	Grant Edwards <grante@visi.com>
MFC after:	1 month
2006-06-19 12:33:52 +00:00
Andre Oppermann
6593a94979 Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer
used and needed.

Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-18 14:24:12 +00:00
Andre Oppermann
2f1a4ccfc1 Do not access syncache entry before it was allocated for the TF_NOOPT case
in syncache_add().

Found by:	Coverity Prevent
CID:		1473
2006-06-18 13:03:42 +00:00
Andre Oppermann
8411d000a1 Move all syncache related structures to tcp_syncache.c. They are only used
there.

This unbreaks userland programs that include tcp_var.h.

Discussed with:	rwatson
2006-06-18 12:26:11 +00:00
Andre Oppermann
bdfbf1e203 Remove double lock acquisition in syncookie_lookup() which came from last
minute conversions to macros.

Pointy hat to:	andre
2006-06-18 11:48:03 +00:00
Andre Oppermann
ee2e4c1d4e Fix the !INET6 compile.
Reported by:	alc
2006-06-17 18:42:07 +00:00
Andre Oppermann
93f0d0c5bf Rearrange fields in struct syncache and syncache_head to make them more
cache line friendly.

Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:57:36 +00:00
Andre Oppermann
0c529372f0 ANSIfy and tidy up comments.
Sponsored by:   TCP/IP Optimization Fundraise 2005
2006-06-17 17:49:11 +00:00
Andre Oppermann
351630c40d Add locking to TCP syncache and drop the global tcpinfo lock as early
as possible for the syncache_add() case.  The syncache timer no longer
aquires the tcpinfo lock and timeout/retransmit runs can happen in
parallel with bucket granularity.

On a P4 the additional locks cause a slight degression of 0.7% in tcp
connections per second.  When IP and TCP input are deserialized and
can run in parallel this little overhead can be neglected. The syncookie
handling still leaves room for improvement and its random salts may be
moved to the syncache bucket head structures to remove the second lock
operation currently required for it.  However this would be a more
involved change from the way syncookies work at the moment.

Reviewed by:	rwatson
Tested by:	rwatson, ps (earlier version)
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-06-17 17:32:38 +00:00
Oleg Bulyzhin
254c472561 Add support of 'tablearg' feature for:
- 'tag' & 'untag' action parameters.
- 'tagged' & 'limit' rule options.
Rule examples:
	pipe 1 tag tablearg ip from table(1) to any
	allow ip from any to table(2) tagged tablearg
	allow tcp from table(3) to any 25 setup limit src-addr tablearg

sbin/ipfw/ipfw2.c:
1) new macros
   GET_UINT_ARG - support of 'tablearg' keyword, argument range checking.
   PRINT_UINT_ARG - support of 'tablearg' keyword.
2) strtoport(): do not silently truncate/accept invalid port list expressions
   like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup.

Approved by:	glebius (mentor)
MFC after:	1 month
2006-06-15 09:39:22 +00:00
Oleg Bulyzhin
58a0fab73f install_state(): style(9) cleanup
Approved by:	glebius (mentor)
MFC after:	1 month
2006-06-15 08:54:29 +00:00
Andrew Thompson
5feebeeb53 Enable proxy ARP answers on any of the bridged interfaces if proxy record
belongs to another interface within the bridge group.

PR:		kern/94408
Submitted by:	Eygene A. Ryabinkin
MFC after:	1 month
2006-06-09 00:33:30 +00:00
Oleg Bulyzhin
458009ae93 install_state() should properly initialize 'addr_type' field of newly created
flows for O_LIMIT rules.  Otherwise 'ipfw -d show' is unable to display
PARENT rules properly.
(This bug was exposed by ipfw2.c rev.1.90)

Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-06-08 11:27:45 +00:00
Oleg Bulyzhin
d2dc1907e8 Fix following rules: pipe X (tag|altq) Y ...
Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-06-08 11:13:23 +00:00
Robert Watson
f2de87fec4 Push acquisition of pcbinfo lock out of tcp_usr_attach() into
tcp_attach() after the call to soreserve(), as it doesn't require
the global lock.  Rearrange inpcb locking here also.

MFC after:	1 month
2006-06-04 09:31:34 +00:00
Robert Watson
d8ab0ec661 When entering a timer on a tcpcb, don't continue processing if it has been
dropped.  This prevents a bug introduced during the socket/pcb refcounting
work from occuring, in which occasionally the retransmit timer may fire
after a connection has been reset, resulting in the resulting R|A TCP
packet having a source port of 0, as the port reservation has been
released.

While here, fixing up some RUNLOCK->WUNLOCK bugs.

MFC after:	1 month
2006-06-03 19:37:08 +00:00
Robert Watson
f24618aaf0 Acquire udbinfo lock after call to soreserve() rather than before, as it
is not required.  This simplifies error-handling, and reduces the time
that this lock is held.

MFC after:	1 month
2006-06-03 19:29:26 +00:00
Christian S.J. Peron
16d878cc99 Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
Robert Watson
ad3a630f7e Minor restyling and cleanup around ipport_tick().
MFC after:	1 month
2006-06-02 08:18:27 +00:00
Oleg Bulyzhin
6a7d5cb645 Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9).
Since tags are kept while packet resides in kernelspace, it's possible to
use other kernel facilities (like netgraph nodes) for altering those tags.

Submitted by:	Andrey Elsukov <bu7cher at yandex dot ru>
Submitted by:	Vadim Goncharov <vadimnuclight at tpu dot ru>
Approved by:	glebius (mentor)
Idea from:	OpenBSD PF
MFC after:	1 month
2006-05-24 13:09:55 +00:00
Maxim Konovalov
d45e4f9945 o In udp|rip_disconnect() acquire a socket lock before the socket
state modification.  To prevent races do that while holding inpcb
lock.

Reviewed by:	rwatson
2006-05-21 19:28:46 +00:00
Maxim Konovalov
635354c446 o Add missed error check: in ip_ctloutput() sooptcopyin() returns a
result but we never examine it.

Reviewed by:	rwatson
MFC after:	2 weeks
2006-05-21 17:52:08 +00:00
Bruce M Simpson
8d7d85149e Initialize the new members of struct ip_moptions as
a defensive programming measure.

Note that whilst these members are not used by the ip_output()
path, we are passing an instance of struct ip_moptions here
which is declared on the stack (which could be considered a
bad thing).

ip_output() does not consume struct ip_moptions, but in case it
does in future, declare an in_multi vector on the stack too to
behave more like ip_findmoptions() does.
2006-05-18 19:51:08 +00:00
Gleb Smirnoff
e5f88c4492 Since m_pullup() can return a new mbuf, change gre_input2() to
return mbuf back to gre_input(). If the former returns mbuf back
to the latter, then pass it to raw_input().

Coverity ID:	829
2006-05-16 11:15:22 +00:00
Gleb Smirnoff
ffb761f624 - Backout one line from 1.78. The tp can be freed by tcp_drop().
- Style next line.

Coverity ID:	912
2006-05-16 10:51:26 +00:00
Maxim Konovalov
eb16472f74 o In rip_disconnect() do not call rip_abort(), just mark a socket
as not connected.  In soclose() case rip_detach() will kill inpcb for
us later.

It makes rawconnect regression test do not panic a system.

Reviewed by:	rwatson
X-MFC after:	with all 1th April inpcb changes
2006-05-15 09:28:57 +00:00
Max Laier
0e7185f6e7 Use only lower 64bit of src/dest (and src/dest port) for hashing of IPv6
connections and get rid of the flow_id as it is not guaranteed to be stable
some (most?) current implementations seem to just zero it out.

PR:		kern/88664
Reported by:	jylefort
Submitted by:	Joost Bekkers (w/ changes)
Tested by	"regisr" <regisrApoboxDcom>
2006-05-14 23:42:24 +00:00
Bruce M Simpson
3548bfc964 Fix a long-standing limitation in IPv4 multicast group membership.
By making the imo_membership array a dynamically allocated vector,
this minimizes disruption to existing IPv4 multicast code. This
change breaks the ABI for the kernel module ip_mroute.ko, and may
cause a small amount of churn for folks working on the IGMPv3 merge.

Previously, sockets were subject to a compile-time limitation on
the number of IPv4 group memberships, which was hard-coded to 20.
The imo_membership relationship, however, is 1:1 with regards to
a tuple of multicast group address and interface address. Users who
ran routing protocols such as OSPF ran into this limitation on machines
with a large system interface tree.
2006-05-14 14:22:49 +00:00
Max Laier
656faadcb8 Remove ip6fw. Since ipfw has full functional IPv6 support now and - in
contrast to ip6fw - is properly lockes, it is time to retire ip6fw.
2006-05-12 20:39:23 +00:00
Max Laier
e93187482d Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing
seperately.  Also use pfil hook/unhook instead of keeping the check
functions in pfil just to return there based on the sysctl.  While here fix
some whitespace on a nearby SYSCTL_ macro.
2006-05-12 04:41:27 +00:00
Max Laier
432288dcb6 Don't claim "(+ipv6)" if we didn't build with INET6. 2006-05-11 15:22:38 +00:00
Robert Watson
59b8854eee Modify UDP to use sosend_dgram() instead of sosend(). This allows
for signicantly optimized UDP socket I/O when using a single UDP
socket from many threads or processes that share it, by avoiding
significant locking and other overhead in the general sosend()
path that isn't necessary for simple datagram sockets.  Specifically,
this change results in a significant performance improvement for
threaded name service in BIND9 under load.

Suggested by:	Jinmei_Tatsuya at isc dot org
2006-05-06 11:24:59 +00:00
Bjoern A. Zeeb
91b309a1c4 Make sure the ip data pointer is correct before touching it again
after ipsec4_output processing else KAME IPSec using the handbook
configuration with gif(4) will panic the kernel.

Problem reported by:    t. patterson <tp lot.org>
Tested by:              t. patterson <tp lot.org>
2006-05-05 07:31:03 +00:00
Robert Watson
3127286870 Only return (tw) from tcp_twclose() if reuse is passed, otherwise
return NULL.  In principle this shouldn't change the behavior, but
avoids returning a potentially invalid/inappropriate pointer to
the caller.

Found with:	Coverity Prevent (tm)
Submitted by:	pjd
MFC after:	3 months
2006-05-05 06:50:23 +00:00
Pawel Jakub Dawidek
1d7d0bfe5e /tmp/cvsTXPIwQ 2006-05-05 06:24:34 +00:00
Marcel Moolenaar
7c5a8ab212 In in_pcbdrop(), fix !INVARIANTS build. 2006-04-25 23:23:13 +00:00
Robert Watson
8e3f3b169e Rename 'last' to 'inp' in udp_append(): the name 'last' is due to
the fact that the loop through inpcb's in udp_input() tracks the
last inpcb while looping.  We keep that name in the calling loop
but not in the delivery routine itself.

MFC after:	3 months
2006-04-25 17:38:08 +00:00
Robert Watson
10702a2840 Abstract inpcb drop logic, previously just setting of INP_DROPPED in TCP,
into in_pcbdrop().  Expand logic to detach the inpcb from its bound
address/port so that dropping a TCP connection releases the inpcb resource
reservation, which since the introduction of socket/pcb reference count
updates, has been persisting until the socket closed rather than being
released implicitly due to prior freeing of the inpcb on TCP drop.

MFC after:	3 months
2006-04-25 11:17:35 +00:00
Robert Watson
c78cbc7b1d Instead of calling tcp_usr_detach() from tcp_usr_abort(), break out
common pcb tear-down logic into tcp_detach(), which is called from
either.  Invoke tcp_drop() from the tcp_usr_abort() path rather than
tcp_disconnect(), as we want to drop it immediately not perform a
FIN sequence.  This is one reason why some people were experiencing
panics in sodealloc(), as the netisr and aborting thread were
simultaneously trying to tear down the socket.  This bug could often
be reproduced using repeated runs of the listenclose regression test.

MFC after:	3 months
PR:		96090
Reported by:	Peter Kostouros <kpeter at melbpc dot org dot au>, kris
Tested by:	Peter Kostouros <kpeter at melbpc dot org dot au>, kris
2006-04-24 08:20:02 +00:00
Robert Watson
9106a6d6b0 Replace isn_mtx direct use with ISN_*() lock macros so that locking
details/strategy can be changed without touching every use.

MFC after:	3 months
2006-04-23 12:27:42 +00:00
Robert Watson
4c0e8f41f6 Introduce a new TCP mutex, isn_mtx, which protects the initial sequence
number state, rather than re-using pcbinfo.  This introduces some
additional mutex operations during isn query, but avoids hitting the TCP
pcbinfo lock out of yet another frequently firing TCP timer.

MFC after:	3 months
2006-04-22 19:23:24 +00:00
Robert Watson
602cc7f12b Assert the inpcb lock when rehashing an inpcb.
Improve consistency of style around some current assertions.

MFC after:	3 months
2006-04-22 19:15:20 +00:00
Robert Watson
6466b28a40 Remove pcbinfo locking from in_setsockaddr() and in_setpeeraddr();
holding the inpcb lock is sufficient to prevent races in reading
the address and port, as both the inpcb lock and pcbinfo lock are
required to change the address/port.

Improve consistency of spelling in assertions about inp != NULL.

MFC after:	3 months
2006-04-22 19:10:02 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00