Author SHA1 Message Date
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti-foot-shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Xin LI
9fe5092de1 Address an edge condition that we found at work, where the carp(4)
interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at
cold boot.  This behavior is not observed when the carp(4) interface
is created slightly later, when the underlying interface is fully
up.

Before this change, what happens at boot is roughly:

 - ifconfig creates em0 interface;
 - ifconfig clones a carp device using em0;
   (em0's link state is DOWN at this point)
 - carp state: INIT -> BACKUP [*]
 - carp state: BACKUP -> MASTER
 - [Some negotiation between em0 and the switch]
 - em0 kicks up link state change event
   (em0's link state is now UP at this point)
 - do_link_state_change() -> carp_carpdev_state()
 - carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
 - carp state: INIT -> BACKUP
 - carp state: BACKUP -> MASTER

At the [*] stage, em0 has not received any broadcast message from the other
node, so carp(4) assumes our node is the master and sets the link
state to "UP" after becoming master.  At [+], the master status
is forcibly set to "INIT", then an election is held, after which our
node actually becomes the master.

We believe that at the [*] stage, the master status should remain as
"INIT" since the underlying parent interface's link state is not up.

Obtained from:	iXsystems, Inc.
Reported by:	jpaetzel
MFC after:	2 months
2010-08-08 07:04:27 +00:00
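A minimal sketch of the state rule argued for in 9fe5092de1, assuming a simplified three-state model. The enum and carp_next_state() are hypothetical and are not the actual ip_carp.c code; the point is only that a vhost whose parent link is down stays in INIT instead of advancing to BACKUP and then MASTER.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified carp(4) states; names follow the commit text. */
enum carp_state { CARP_INIT, CARP_BACKUP, CARP_MASTER };

/*
 * Hypothetical helper: choose the next state for a carp vhost.  While the
 * parent interface's link is down, stay in (or fall back to) INIT instead
 * of advancing to BACKUP and then MASTER.
 */
static enum carp_state
carp_next_state(enum carp_state cur, bool parent_link_up, bool master_heard)
{
	assert(cur == CARP_INIT || cur == CARP_BACKUP || cur == CARP_MASTER);

	if (!parent_link_up)
		return (CARP_INIT);
	switch (cur) {
	case CARP_INIT:
		return (CARP_BACKUP);
	case CARP_BACKUP:
		/* No advertisement heard from another master: take over. */
		return (master_heard ? CARP_BACKUP : CARP_MASTER);
	default:
		/* MASTER stays MASTER in this simplified model. */
		return (cur);
	}
}
```

With em0 still down at clone time, repeated calls keep returning CARP_INIT, so the spurious LINK_UP, LINK_DOWN, LINK_UP sequence never starts.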
Ed Schouten
367698346b Don't use struct timezone.
The timezone structure acquired by gettimeofday() is not used at all.
Just remove it.
2010-08-08 02:51:32 +00:00
Michael Tuexen
87a37484eb Fix a bug where endpoints bound to wildcard addresses were
using addresses not announced to the peer due to address
scoping.

MFC after: 3 weeks
2010-08-05 16:52:13 +00:00
Michael Tuexen
d2604d08d0 Cleanup code.
MFC after: 2 weeks
2010-08-01 08:06:59 +00:00
Bjoern A. Zeeb
19291ab3de Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check;
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer.  It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extent.

Reviewed by:	rwatson
MFC after:	6 days
2010-07-31 21:33:18 +00:00
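A userland sketch of the KASSERT() pattern described in 19291ab3de, using assert() as a stand-in for KASSERT(). struct ex_llentry and ex_llinfo_timer() are hypothetical. The argument is checked before it is assigned, and nothing follows the check because a failed assertion does not return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a link-layer entry. */
struct ex_llentry {
	int fired;	/* how many times the timer ran for this entry */
};

/*
 * Timer callback whose argument is mandatory.  The assertion documents the
 * contract (like KASSERT(), it compiles away in non-debug builds, here
 * with NDEBUG).  No "return" after the check is needed: a failed assert()
 * aborts.
 */
static void
ex_llinfo_timer(void *arg)
{
	struct ex_llentry *lle;

	assert(arg != NULL);
	lle = arg;	/* only assign the arg after the check */
	lle->fired++;
}
```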
Bjoern A. Zeeb
4579930d2e MFp4 @181628:
Free the rtentry after we disconnected it from the FIB and are counting
it as rttrash.  There might still be a chance we leak it from a different
code path but there is nothing we can do about this here.

Sponsored by:	ISPsystem (in February)
Reviewed by:	julian (in February)
MFC after:	2 weeks
2010-07-31 15:31:23 +00:00
Andre Oppermann
28a53f037a Fix a bug in syncache where the initial CWND for new incoming connections
was limited to one segment under the faulty assumption of a retransmit.
Due to this the opportunity to initialize the increased congestion window
according to RFC3390 was missed.

Support for RFC3465 introduced in r187289 uncovered the bug as the ACK
to SYN/ACK no longer caused snd_cwnd increase by MSS (actually, this
increase shouldn't happen as it's explicitly forbidden by RFC3390, but
it's another issue).  Snd_cwnd remains really small (1*MSS + 1) and this
causes really bad interaction with delayed acks on the other side.

The variable name sc_rxmits is a bit misleading as it counts all transmits,
not just retransmits.

Submitted by:	Maxim Dounin <mdounin-at-mdounin-dot-ru>
MFC after:	10 days
2010-07-30 21:45:53 +00:00
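For reference, RFC 3390 bounds the initial window at min(4*MSS, max(2*MSS, 4380 bytes)) and permits only one segment after a lost SYN or SYN/ACK. The sketch below (function names hypothetical, not the syncache code) contrasts that bound with the one-segment window that the bug in 28a53f037a applied to every new incoming connection.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RFC 3390 upper bound on the initial congestion window, in bytes:
 *     min(4 * MSS, max(2 * MSS, 4380))
 */
static uint32_t
rfc3390_initial_cwnd(uint32_t mss)
{
	uint32_t four, two_or_floor;

	assert(mss > 0);
	four = 4 * mss;
	two_or_floor = (2 * mss > 4380) ? 2 * mss : 4380;
	return (four < two_or_floor ? four : two_or_floor);
}

/* What the buggy path used: one segment, as if the SYN/ACK was retransmitted. */
static uint32_t
retransmit_initial_cwnd(uint32_t mss)
{
	return (mss);
}
```

For a typical Ethernet MSS of 1460 the RFC 3390 window is 4380 bytes (three segments) rather than 1460.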
Randall Stewart
753358d725 Fix the comment block that has the nice
table to really have the nice table :-)

MFC after:	1 month
2010-07-29 12:01:59 +00:00
Randall Stewart
44fbe46280 Fix PR-SCTP bugs. A full-sized frame of
PR-SCTP FWD-TSNs would not be sent, which
caused a stalled connection. The rwnd
calculation was also off on the receiver side for
PR-SCTP.
MFC after:	1 month
2010-07-29 11:37:04 +00:00
Gleb Smirnoff
b9bff254af Fix operation of "netgraph" action in conjunction with the
net.inet.ip.fw.one_pass sysctl.

The "ngtee" action is still broken.

PR:		kern/148885
Submitted by:	Nickolay Dudorov <nnd mail.nsk.ru>
2010-07-27 14:26:34 +00:00
Michael Tuexen
74e906fa94 Fix a bug where the length of a FORWARD-TSN chunk was set incorrectly in
the chunk. This resulted in malformed frames.
Remove a duplicate assignment.

MFC after: 2 weeks
2010-07-26 09:26:55 +00:00
Randall Stewart
8db924defb If a socket still exists, make sure that we
report the chunks that were not sent. In either
case, carefully remove the data if it does not
get taken by the reporting routines.

MFC after:	2 weeks
2010-07-26 09:22:52 +00:00
Randall Stewart
6c065bbe06 When counting the number of chunks in the
retransmission queue to validate the retran count, we
need to include the chunks in the control send queue
too. Otherwise the count will not match and you will get
the invariant warning if INVARIANTS is enabled.

MFC after:	2 weeks
2010-07-26 09:20:55 +00:00
Lawrence Stewart
79848522b5 - Move common code from the hook functions that fills in a packet node struct to
  a separate inline function. This further reduces duplicate code that didn't
  have a good reason to stay as it was.

- Reorder the malloc of a pkt_node struct in the hook functions such that it
  only occurs if we managed to find a usable tcpcb associated with the packet.

- Make the inp_locally_locked variable's type consistent with the prototype of
  siftr_siftdata().

Sponsored by:	FreeBSD Foundation
2010-07-18 05:09:10 +00:00
Warner Losh
43e05a6523 machine/cpu.h isn't appropriate for this file, so remove it 2010-07-16 06:32:38 +00:00
Luigi Rizzo
71ad35a185 remove some conditional #ifdefs (no-op on FreeBSD);
run the timer routine on cpu 0.
2010-07-15 14:43:12 +00:00
Luigi Rizzo
297151a0f3 whitespace fixes 2010-07-15 14:37:59 +00:00
Luigi Rizzo
e6fef96ef4 fix a comment and final empty line 2010-07-15 14:37:02 +00:00
Lawrence Stewart
adc5f0109d The SIFTR DPCPU statistics struct was not being zeroed between enable/disable
cycles so the values would accumulate rather than reset for each cycle.

Sponsored by:	FreeBSD Foundation
2010-07-13 08:23:46 +00:00
Lawrence Stewart
985147dec6 Catch up with the rename of DPCPU_SUM to DPCPU_VARSUM in r209978.
Sponsored by:	FreeBSD Foundation
2010-07-13 07:00:57 +00:00
Gleb Smirnoff
281b584e8e Improve last commit: use bpf_mtap2() to avoid stack usage.
Prodded by:	julian
2010-07-09 11:27:33 +00:00
Gleb Smirnoff
a5f9fc17c2 Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with
a stub m_hdr instead of a full mbuf.

PR:		kern/148050
2010-07-08 13:07:40 +00:00
Randall Stewart
478fbccb67 This fixes a crash in SCTP. It was possible to have a
large number of packets queued to a crashing process.
In a specific case you may get 2 ABORTs back (from
say two packets in flight). If the aborts happened to
be processed at the same time, it's possible to have
one free the association while the other is trying
to report all the outbound packets. When this occurred
it could lead to a crash.

MFC after:	3 days
2010-07-03 14:03:31 +00:00
Lawrence Stewart
a5548bf685 Import the Statistical Information For TCP Research (SIFTR) kernel module into
FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log
file, providing the ability to make highly granular measurements of TCP
connection state. The tool is aimed at system administrators, developers and
researchers alike. Please take it for a spin and test it out - the man page
should have all the information required to get you going.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

Sponsored by:	Cisco URP, FreeBSD Foundation
Reviewed by:	dwmalone, gnn, rpaulo
Tested by:	Many on freebsd-current@ and elsewhere over the years
MFC after:	1 month
2010-07-03 13:32:39 +00:00
Randall Stewart
606c58db25 Fix a bug that WILL cause a panic. Basically
a read lock is taken to check the vtag-timewait cache.
Then in two cases (where a vtag is bad i.e. in the time-wait
state) the write-unlock is called NOT the read-unlock. Under
conditions where lots of associations are coming and going
this will cause the system to panic at some point.

MFC after:	3 days
2010-07-02 09:53:26 +00:00
Gleb Smirnoff
24536f92c5 After processing the O_SKIPTO opcode our cmd points to the next rule, and
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.

To fix this, exit the inner loop with the continue operator forcibly and
explicitly.

PR:		kern/147798
2010-06-29 16:57:30 +00:00
Michael Tuexen
370d524f00 Fix a bug I introduced in r209470.
MFC after: 3 days
2010-06-24 07:43:25 +00:00
Michael Tuexen
749c49ac62 * Implement sctp_does_stcb_own_this_addr() correctly. It was taking the
wrong side into account.
* sctp_findassociation_ep_addr() must check the local address if available.
This fixes a bug where ABORT chunks were accepted even in the case where
the local address was not owned by the endpoint.
Thanks to brucec for pointing out a bug in my first version of the fix.
MFC after: 3 days
2010-06-23 15:19:07 +00:00
Michael Tuexen
cd1386ab50 Fix a race condition in the shutdown handling.
The race condition resulted in a panic.

MFC after: 3 days
2010-06-18 09:01:44 +00:00
Michael Tuexen
fc066a6137 * Fix a bug where the length of the ASCONF-ACK was calculated wrong due
to using an uninitialized variable.
* Fix a bug where a NULL pointer was dereferenced when interfaces
  come and go at a high rate.
* Fix a bug where inps were not deregistered from iterators.
* Fix a race condition in freeing an association.
* Fix a refcount problem related to the iterator.
Each of the above bugs results in a panic. It shows up when
interfaces come and go at a high rate.

Obtained from: rrs (partly)
MFC after: 3 days
2010-06-14 21:25:07 +00:00
Randall Stewart
ec4c19fcf0 3 Fixes -
a) There was a case where an ICMP message could cause
   us to return leaving a stuck lock on an stcb.
b) The iterator needed some tweaks to fix its lock
   ordering.
c) The ITERATOR_LOCK is no longer needed in the freeing
   of a stcb. Now that the timer based one is gone we don't
   have a multiple resume situation. Add to that that there
   was a path out of the freeing of an assoc that
   did NOT release the iterator_lock; it was time to clean
   this old code up and in the process fix the lock bug.

MFC after:	1 week
2010-06-11 03:54:00 +00:00
Randall Stewart
41291ef07f Found by Michael. In cases where we run
out of memory (no more inp space) we don't
properly NULL the INP on return.

Obtained from:	tuexen
MFC after:	3 Days
2010-06-09 22:05:29 +00:00
Randall Stewart
b3a44e469d Fix several bugs all having to do with freeing an
sctp_inpcb:
1) Make sure not to remove the flag on the PCB until
   after the close() caller is back in control with the
   lock. Otherwise a quickly freeing assoc could kill the
   inpcb and cause a panic.

2) Make sure all calls to log_closing have not released
   the locks before calling the log function, we don't
   want the logging function to crash us due to a freed
   inpcb.

3) Make sure that when we get to the end, we release all
   locks (after removing them from view) and as long as
   we are NOT the inp-kill timer removing the inp, call
   the callout_drain() function so a racing timer won't
   later call in and cause a racing crash.
MFC after:	1 week
2010-06-09 16:42:42 +00:00
Randall Stewart
8dcde5165e BUG: Turns out we need to use both bit maps
to calculate the cum-ack (we were not doing
it for the NR-Sack case). With this fix
NR-sack should now work correctly.
MFC after:	1 week
2010-06-09 16:39:18 +00:00
Randall Stewart
9b2e0767e2 2 Bugs:
1) Only use both mapping arrays when NR sack is off. This
   way we can hold off moving the cumack (not the best but
   workable) when NR-sack is on.

2) We must make sure to just return on the move of the
   bit to the NR array if the cum-ack has already gone
   past the TSN. This prevents marking a bit behind the
   array and hitting the invariant code that panics us.

MFC after:	1 week
2010-06-08 03:39:31 +00:00
Randall Stewart
66bd30bd4f This fixes a BUG in the handling of the cum-ack calculation.
We were only paying attention to the nr-mapping-array, which
seems to make sense on the surface: by definition, things
up to the cum-ack should be deliverable and thus in the nr-mapping-array.
However (there is always a gotcha), that's not true when it
comes to large messages. The stack may hold the message
while re-assembling it and not deliver it, based on several
thresholds. If that happens (which it would for smaller
large messages) then the cum-ack is figured wrong. We
now properly use both arrays in the cum-ack calculation.

MFC after:	1 week.
2010-06-07 18:29:10 +00:00
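A minimal sketch of the cum-ack rule from 66bd30bd4f, assuming each mapping array is a bitmap in which bit i marks TSN base_tsn + i as received. The function and parameter names are hypothetical. A TSN counts toward the cum-ack if it is set in either array, so a large message still held for reassembly (present only in the renegable array) does not stall the cum-ack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: "mapping" marks TSNs received but still held (e.g.
 * for reassembly), "nr_mapping" marks TSNs already delivered.  Returns the
 * highest TSN such that every TSN from base_tsn up to it is present in
 * either array.  If base_tsn itself is missing, returns base_tsn - 1
 * (TSN arithmetic is modulo 2^32).
 */
static uint32_t
cum_ack(uint32_t base_tsn, const uint8_t *mapping, const uint8_t *nr_mapping,
    size_t len_bytes)
{
	size_t i;

	assert(mapping != NULL && nr_mapping != NULL);
	for (i = 0; i < len_bytes * 8; i++) {
		uint8_t bits = (uint8_t)(mapping[i / 8] | nr_mapping[i / 8]);

		if ((bits & (1u << (i % 8))) == 0)
			break;	/* first gap ends the cumulative run */
	}
	/* i consecutive TSNs starting at base_tsn are present. */
	return (base_tsn + (uint32_t)i - 1);
}
```

Looking at nr_mapping alone (the pre-fix behavior) would stop at the first TSN that is only in the renegable array.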
Randall Stewart
b9771f0404 Oops... my bad.. we don't need a SOCK_UNLOCK() after
calling socantrcvmore_locked() since it will unlock
the lock for you.

MFC after:	1 week
2010-06-07 11:33:20 +00:00
Randall Stewart
9ed1e280f6 Fix so we call socantrcvmore_locked so we
don't see a race where we unlock to call
the non-locked version and have the socket
go away.

MFC after:	1 week
2010-06-07 04:01:38 +00:00
Randall Stewart
8ce4a9a255 1) Optimize the cleanup and don't always depend on
   the timer. This is done by considering the locks
   we will destroy and if they are contended we consider
   it the same as a reference count being up. Fixing this
   appears to clean up another crash that was appearing with
   all the timers where the socket buf lock got corrupted.

2) Fix the sysctl code to take a lot more care when looking
   at INP's that are in the GONE or ALLGONE state.

MFC after:	1 week
2010-06-06 20:34:17 +00:00
Randall Stewart
0c7dc84076 Ok, yet another bug in killing off all the hundreds
of apitesters.. Basically we end up with attempting
to destroy a lock that's contended on. A cookie echo
arrives at the same time that the close is happening.
The close gets the lock but the cookie echo has already
passed the check for the gone flag and is then locked
waiting on the create lock.. when we go to destroy it
bam. For now we do the timer destroy for all calls
to close.. We can probably optimize this later so that
we check what's being contended on and if there is contention
then do the timer thing. but this is probably safest since
the inp has been removed from all lists and references and
only the timer can find it.. once the locks are released all
other places will instantly see the GONE flag and bail (that's
what the change in sctp_input does; it was one place that was lacking
the bail code).

MFC after:	1 week
2010-06-06 19:24:32 +00:00
Randall Stewart
faa1e3f4a9 1) Further enhance the INVARIANTS check that no locks are
   held by checking the create and inp locks as well.

2) Fix a bug: when a socket is closed and an INIT-ACK
   is returned, we now do NOT unlock the locked_tcb unless it's
   different (an unlikely scenario). If we blindly unlock as
   we were doing before, we can end up unlocking the actual
   stcb that's about to be sent down to the free function, which
   requires the lock be held.

MFC after:	1 week
2010-06-06 16:11:16 +00:00
Randall Stewart
7c82e9fa93 Fix a bug in the sctp_inpcb_free. Basically if the socket
was set up to do an abortive close, an association that was
in the accept_queue could get stuck and never freed. Now
we properly start the kill timer on the socket and turn
off the flag (same thing we do for the graceful close method).
MFC after:	1 week
2010-06-06 16:09:12 +00:00
Randall Stewart
3d7001cdcb Fix a bug in sctp_abort_assoc(). DON'T call the sctp_inpcb_free
when the gone flag is set. You don't know what locks the
caller has set and there is already a kill timer running.

MFC after:	1 week
2010-06-06 16:07:40 +00:00
Randall Stewart
2c6b25b4cd Hopefully this fixes a LOR by making it
so we only hold the iterator lock during
updates to the iterator's work.

MFC after:	1 week
2010-06-06 02:33:46 +00:00
Randall Stewart
a67294246e Bruce's fix for some return's in
error legs.

MFC after:	1 week
2010-06-06 02:32:20 +00:00
Randall Stewart
8e57327bbf Purge out a Windows def that somehow slipped
past the scrubber.

MFC after:	1 Week
2010-06-05 21:39:52 +00:00
Randall Stewart
1909799a4c Spacing issues
MFC after:	1 Week
2010-06-05 21:33:16 +00:00
Randall Stewart
aca14c2aa8 This change does the following:
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention
   to the RESEND marking on retransmitting control
   chunks, and worse, we were not decrementing the
   retran count, which could cause us to loop forever.
3) Add in the validate_no_lock function on INVARIANTS
   so that we will really check all ways out to be sure
   a lock does not slip out locked.

MFC after:	1 week.
2010-06-05 21:27:43 +00:00
Randall Stewart
791437b51c Use the proper increment macro when increasing the
number on sent_queue_retran_cnt.

MFC after:	1 week
2010-06-05 21:22:58 +00:00
Randall Stewart
28085b2e10 This does two changes:
1) Makes it so that the INVARIANT function validate nolocks is
   available anywhere.
2) Fixes a BUG where a close has been done on a collision socket
   and the cookie processing would return leaving a lock held.
MFC after:	1 week
2010-06-05 21:20:28 +00:00
Randall Stewart
62fb761ff2 This fixes a bug in the close up of a socket that
had un-accepted assocs. Basically the assoc (and inp)
would get stuck and never get cleaned up.

MFC after:	1 week
2010-06-05 21:17:23 +00:00
Marko Zec
7c4b8137cd Virtualize the IPv4 multicast routing code.
Submitted by:	iprebeg
Reviewed by:	bms, bz, Pavlin Radoslavov
MFC after:	30 days
2010-06-02 15:44:43 +00:00
Qing Li
0ed6142b31 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
Randall Stewart
f751743351 This adds back the Iterator to the sctp
code base. We now properly have ONE thread
that services all VNETs. Also we purge out
the old timer based iterator code which had
multiple LORs and other issues.

MFC after:	3 days
2010-05-16 17:03:56 +00:00
Randall Stewart
ea9b0170bf Fix an old, long-standing bug in generating a
fwd-tsn. This would appear when more TSNs needed to be
skipped than fit in the size of an mbuf.

MFC after:	3 days
2010-05-12 18:33:25 +00:00
Randall Stewart
83128708b4 More PR-SCTP bugs:
- Make sure that when you kick the streams you add correctly
  using a 16 bit unsigned.
- Make sure when sending out you allow FWD-TSN to skip over
  and list the ACKED chunks in the stream/seq list (so the
  rcv will kick the stream)
MFC after:	3 days
2010-05-12 18:00:15 +00:00
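SCTP stream sequence numbers are 16 bits wide, so the "add correctly using a 16 bit unsigned" item in 83128708b4 is about wraparound. A sketch with hypothetical helper names (ssn_add(), ssn_gt()), using RFC 1982-style serial comparison; it is not the sctp_indata.c code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance a 16-bit stream sequence number; wraps 65535 -> 0.  Steps of
 * half the sequence space or more would make serial comparison ambiguous.
 */
static uint16_t
ssn_add(uint16_t ssn, uint16_t n)
{
	assert(n < 0x8000);
	return ((uint16_t)(ssn + n));
}

/* Serial-number "greater than" for 16-bit values (RFC 1982 style). */
static int
ssn_gt(uint16_t a, uint16_t b)
{
	return (a != b && (uint16_t)(a - b) < 0x8000);
}
```

Doing the addition in a wider type and storing it without the cast back to uint16_t is exactly the kind of mistake that leaves a stream waiting for a sequence number that can never arrive.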
Michael Tuexen
091430c121 Get rid of unused constants.
MFC after: 3 days.
2010-05-12 16:10:33 +00:00
Randall Stewart
7898f4085c This fixes PR-SCTP issues:
- Slide the map at the proper place.
- Mark the bits in the nr_array ONLY if there
  is no marking.
- When generating a FWD-TSN we allow us to skip past
  ACKED chunks too.

MFC after:	1 week
2010-05-12 13:45:46 +00:00
Randall Stewart
88a7eb29d2 This fixes a bug with the one-2-one model socket when a
user sets up a socket to a server, sends data, and closes
the socket before the server has called accept(). It used
to NOT work at all. Now we add a flag to the assoc and
defer assoc cleanup so that the accept will succeed.
2010-05-11 17:02:29 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
7a657e630d Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by:	(ten 211.ru)
Tested by:	(ten 211.ru)
MFC after:	4 days
2010-04-27 15:07:08 +00:00
Bruce M Simpson
fd963b9929 Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.

Submitted by:	Matthew Luckie
MFC after:	3 days
2010-04-27 14:14:21 +00:00
Michael Tuexen
6dbd88581d Sending a FWDTSN chunk should not affect the retran count.
MFC after: 3 days.
2010-04-25 19:00:37 +00:00
Michael Tuexen
475d0674a6 Undo my latest fix since that wasn't one at all.
MFC after: 3 days.
2010-04-25 15:04:57 +00:00
Michael Tuexen
f31e6c7f26 * Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code
  which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the
  "number of retransmissions" policy.

MFC after: 3 days.
2010-04-23 08:19:47 +00:00
Bjoern A. Zeeb
1c044382c3 Avoid memory access after free. Use the (shortened) copy for the
ipsec mtu lookup as well.

PR:		kern/145736
Submitted by:	Peter Molnar (peter molnar.cc)
MFC after:	3 days
2010-04-21 10:21:34 +00:00
Michael Tuexen
ee94f0a272 Update highest_tsn variables when sliding mapping arrays. 2010-04-20 08:51:21 +00:00
Michael Tuexen
553aff12d4 Really print the nr_mapping array when it should be printed.
MFC after: 3 days.
2010-04-20 08:50:19 +00:00
Luigi Rizzo
6ba1ccc0f2 whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)
2010-04-19 16:17:30 +00:00
Kenneth D. Merry
3579cf4c4f Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was
causing TSO to break for the Xen netfront driver.

Reviewed by:	gibbs, rwatson
MFC after:	7 days
2010-04-19 15:15:36 +00:00
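A sketch of the flag bug fixed in 3579cf4c4f, with a hypothetical packet-header struct and made-up flag values (the real CSUM_* definitions live in sys/mbuf.h). Assigning CSUM_TSO discards checksum flags that were already requested, while OR-ing it in preserves them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; not the sys/mbuf.h values. */
#define EX_CSUM_TCP	0x0002u
#define EX_CSUM_TSO	0x0020u

/* Hypothetical stand-in for the m_pkthdr checksum field. */
struct ex_pkthdr {
	uint32_t csum_flags;
};

/* Buggy: plain assignment wipes EX_CSUM_TCP and any other flags. */
static void
tso_prepare_buggy(struct ex_pkthdr *ph)
{
	assert(ph != NULL);
	ph->csum_flags = EX_CSUM_TSO;
}

/* Fixed: add EX_CSUM_TSO while keeping the existing flags. */
static void
tso_prepare_fixed(struct ex_pkthdr *ph)
{
	assert(ph != NULL);
	ph->csum_flags |= EX_CSUM_TSO;
}
```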
Michael Tuexen
307b49efef Get delayed SACK working again.
MFC after: 3 days.
2010-04-19 14:15:58 +00:00
Michael Tuexen
37f144eb5d Fix a bug where SACKs are not sent when they should.
Move some protection code to INVARIANTS.
Cleanups.

MFC after: 3 days.
2010-04-17 12:22:44 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c, when freeing entire tables, make sure that in case we cancel
a pending callout we remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb
0f08182a03 Try to help with a virtualized dummynet after r206428.
This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.
2010-04-10 22:11:01 +00:00
Rui Paulo
9c251892c0 Honor the CE bit even when the CWR bit is set.
PR:		145600
Submitted by:	Richard Scheffenegger <rs at netapp.com>
MFC after:	1 week
2010-04-10 12:47:06 +00:00
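A sketch of the receiver-side ordering implied by 9c251892c0, assuming the RFC 3168 behavior in which CWR tells the receiver to stop echoing ECE and a CE mark tells it to start. The struct and ecn_input() are hypothetical, not the tcp_input.c code. Handling CWR before CE means a segment carrying both still triggers an echo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection receiver state. */
struct ex_ecn_rcv {
	bool send_ece;	/* set ECE on outgoing ACKs */
};

/*
 * Process one incoming segment.  CWR means the sender has reduced its
 * window, so stop echoing; a CE mark on the same segment is a new
 * congestion signal and must turn ECE back on.
 */
static void
ecn_input(struct ex_ecn_rcv *st, bool cwr, bool ce)
{
	assert(st != NULL);
	if (cwr)
		st->send_ece = false;
	if (ce)
		st->send_ece = true;
}
```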
Bruce M Simpson
933fc4dde6 Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.

In inp_join_group():
 If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
 MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
 legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
 not violate POLA for apps using the old API.
 It also stops us falling through to kicking IGMP state transactions
 in what is otherwise a no-op case.
 [This has already been dealt with in HEAD, but make it explicit before
  we MFC the change to 8.]

In inp_leave_group():
 Fix a bogus conditional.
 Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
 where it actually belongs.
 If an interface was specified, by primary IPv4 address, for ioctl
 IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
 then and only then should we look up the ifp from the IPv4 address in
 mreqs.imr_interface.
 If not, we fall through to imo_match_group() as before, but only in
 the IP_DROP_MEMBERSHIP case.

With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.

Found by:	ermal (with pfSense)
MFC after:	3 days
2010-04-10 12:05:31 +00:00
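A minimal sketch of the join semantics described for inp_join_group() in 933fc4dde6, using a hypothetical fixed-size membership list. Re-joining a group the socket already belongs to returns EADDRINUSE and changes nothing, so no IGMP state change would be kicked. The ENOBUFS return for a full list is an assumption of this sketch, not something the commit states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_GROUPS	8

/* Hypothetical per-socket membership list (IPv4 group addresses). */
struct ex_moptions {
	uint32_t groups[EX_MAX_GROUPS];
	size_t ngroups;
};

static int
ex_join_group(struct ex_moptions *imo, uint32_t group)
{
	size_t i;

	assert(imo != NULL);
	for (i = 0; i < imo->ngroups; i++)
		if (imo->groups[i] == group)
			return (EADDRINUSE);	/* already an ASM member: no-op */
	if (imo->ngroups == EX_MAX_GROUPS)
		return (ENOBUFS);		/* assumption of this sketch */
	imo->groups[imo->ngroups++] = group;
	return (0);
}
```

For example, joining 224.0.0.5 (0xe0000005) twice returns 0 and then EADDRINUSE, with the list still holding a single entry.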
Luigi Rizzo
368a605202 This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and I still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables); however, I first need
to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.
2010-04-09 18:02:19 +00:00
Luigi Rizzo
c11e54acfc no need to pass an argument to dn_compat_calc_size()
MFC after:	3 days
2010-04-09 16:06:53 +00:00
Luigi Rizzo
7f0de52d2c Hopefully fix the recent breakage in rule deletion.
A few more tests and this will also go into -stable where
the problem is more critical.
2010-04-07 08:23:58 +00:00
Michael Tuexen
aed5947cd0 Fix an off-by-one bug in zeroing out the mapping arrays.
Fix sctp_print_mapping_array().

MFC after: 1 week
2010-04-06 18:57:50 +00:00
Michael Tuexen
c1589eec14 Use also SCTP/IPv6 checksum offloading in special cases.
MFC after: 2 weeks
2010-04-03 23:51:41 +00:00
Michael Tuexen
b5c164935e * Fix some race conditions in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
  FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
2010-04-03 15:40:14 +00:00
Xin LI
b80d1bf60e Add definition of IPv6 mobility header's protocol number, as assigned by
IANA and defined in RFC 3775.

Obtained from:	KAME
2010-03-31 23:02:25 +00:00
Luigi Rizzo
af84b6f8a7 fix bug in previous commit related to rule deletion
(stable/8 just fixed moments ago)
2010-03-31 02:20:22 +00:00
Luigi Rizzo
10afb58b81 remove a leftover debugging message 2010-03-29 12:27:49 +00:00
Luigi Rizzo
296ec631be Fix handling of set manipulations.
This patch has two fixes for potential kernel panics (one wrong
index, one access to the wrong lock) and two fixes to wrong logic
in a conditional. The potential panics are also on stable/8,
so I am going to MFC the fix quickly.
2010-03-29 12:19:23 +00:00
Randall Stewart
ff014514ee Adds the option of keeping per-cpu statistics in SCTP. This
may be useful since it gets rid of atomics but I want it to
remain an option until I can do further testing on whether it really
speeds things up.
2010-03-24 20:02:40 +00:00
Randall Stewart
7fa19ca6c1 lagging file I forgot to commit with my nr-sack fixes... oops
Reviewed by:	tuexen@freebsd.org
2010-03-24 20:01:14 +00:00
Randall Stewart
77acdc2565 Fix for NR-Sack code. The code was NOT working properly when
enabled. Basically most of the operations were incorrect causing
bad sacks when you enabled nr-sack. The fixes range across
4 files and unify most of the processing so that we only test
nr_sack flags to decide which type of sack to generate.

Optimization left for this is to combine the sack generation
code and make it capable of generating either sack thus shrinking
out a routine.

Reviewed by:	tuexen@freebsd.org
2010-03-24 19:45:36 +00:00
Luigi Rizzo
592a685e33 Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when I did the mtag cleanup three months ago.

PR:		145004
2010-03-24 15:16:59 +00:00
Randall Stewart
0e13104de6 Fixes a bug where SACKs in the face of
mapping_array expansion would break. Basically
once we expanded the array we no longer had both
mapping arrays in sync which the sack processing code depends on.
This would mean we were randomly referring to memory that was probably
not there. This mostly just gave us bad sack results going back to the peer.
If INVARIANTS was on, of course we would hit the panic routine in the sack_check
call.

We also add a print routine for the place where one would panic in
INVARIANTS so one can see what the main mapping array holds.

Reviewed by: tuexen@freebsd.org
MFC after:	2 weeks
2010-03-23 01:36:50 +00:00
Kip Macy
3059584e2a - boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
  when near the flow limit
- stop allocating new flows when within 3% of maxflows; don't start
  allocating again until below 12.5%

MFC after:	7 days
2010-03-22 23:04:12 +00:00
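The last item of 3059584e2a describes hysteresis on flow allocation. A sketch under one reading of the thresholds (stop once usage is within 3% of maxflows, resume only after usage falls more than 12.5% below it); that reading, the struct, and ex_flow_may_alloc() are assumptions, not the flowtable code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation gate with hysteresis. */
struct ex_flow_gate {
	uint32_t maxflows;
	bool stopped;
};

/* Returns true if a new flow may be allocated with "inuse" flows live. */
static bool
ex_flow_may_alloc(struct ex_flow_gate *g, uint32_t inuse)
{
	uint64_t high, low;

	assert(g != NULL && g->maxflows > 0);
	high = (uint64_t)g->maxflows * 97 / 100;	/* within 3% of max */
	low = (uint64_t)g->maxflows * 7 / 8;		/* 12.5% below max */

	if (!g->stopped && inuse >= high)
		g->stopped = true;
	else if (g->stopped && inuse < low)
		g->stopped = false;
	return (!g->stopped);
}
```

The gap between the two thresholds keeps the table from flapping between allocating and refusing when usage hovers near the limit.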
Luigi Rizzo
3b4d8b3f7a Add a priority-based packet scheduler.
Sponsored by:	The ONELAB2 Project
Submitted by:	Riccardo Panicucci
2010-03-21 16:30:32 +00:00
Luigi Rizzo
b4eacea680 no need for ipfw_flush_tables(), we just need ipfw_destroy_tables() 2010-03-21 15:54:07 +00:00
Luigi Rizzo
2baa9be5d7 revise documentation 2010-03-21 15:52:55 +00:00
Kip Macy
87aedea449 - spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
- don't default to acquiring tcbinfo lock exclusively in rexmt

MFC after:	7 days
2010-03-20 19:47:30 +00:00
Bjoern A. Zeeb
d0e157f6aa Add pcb reference counting to the pcblist sysctl handler functions
to ensure type stability while caching the pcb pointers for the
copyout.

Reviewed by:	rwatson
MFC after:	7 days
2010-03-17 18:28:27 +00:00
Luigi Rizzo
0804384f1d small fixes to estimate the buffer size when requesting all pipes/flows. 2010-03-15 18:09:21 +00:00
Luigi Rizzo
f9f7bde3bc + implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
  pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
  enabled with the command

        queue X config mask queue ...

  and makes it possible to support priority-based schedulers, where
  packets should be grouped according to the priority and not some
  fields in the 5-tuple.
  This is implemented as follows:
  - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
    without changing the size or shape of the structure, so there are
    no ABI changes. In passing, also document how other fields are
    used, and remove some useless assignments in ip_fw2.c

  - implement small changes in the userland code to set/read the field;

  - revise the functions in ip_dummynet.c to manipulate masks so they
    also handle the additional field;

There are no ABI changes in this commit.
2010-03-15 17:14:27 +00:00
Robert Watson
9bcd427b89 Abstract out initialization of most aspects of struct inpcbinfo from
their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and
create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy()
to do this work in a central spot.  As inpcbinfo becomes more complex
due to ongoing work to add connection groups, this will reduce code
duplication.

MFC after:      1 month
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-14 18:59:11 +00:00
Randall Stewart
1966e5b5a1 The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
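For illustration, here is a userland sketch of how a caller might derive the offset it passes to the delayed checksum routine. The helper name is hypothetical. It reads raw header bytes instead of struct ip/ip6_hdr, and it ignores IPv6 extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN 40	/* size of the fixed IPv6 header, sizeof(struct ip6_hdr) */

/*
 * Hypothetical helper: offset of the SCTP header in a raw packet.
 * Returns 0 for an unknown IP version.
 */
static size_t
sctp_hdr_offset(const uint8_t *pkt)
{
	switch (pkt[0] >> 4) {			/* IP version nibble */
	case 4:
		/* ip_hl counts 32-bit words, hence ip_hl << 2 bytes */
		return (size_t)(pkt[0] & 0x0f) << 2;
	case 6:
		return IP6_HDR_LEN;
	default:
		return 0;
	}
}
```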
PR:		144529
MFC after:	2 weeks
2010-03-12 22:58:52 +00:00
Kip Macy
d4121a02c0 - restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
  (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after:	7 days
2010-03-12 05:03:26 +00:00
Luigi Rizzo
5007b59f26 implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transferred.
2010-03-11 22:42:33 +00:00
Luigi Rizzo
642dddf0f8 fix handling of commands issued by the RELENG_7 version of /sbin/ipfw.
Submitted by:	Riccardo Panicucci
2010-03-10 14:21:05 +00:00
Qing Li
c7ea0aa648 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is failover. The issue with the
current code is that the interface link state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today, and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
from userland always returns an error even though the command
executed successfully.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
Luigi Rizzo
feadd2b1ca cosmetic changes and C++ compatibility 2010-03-08 11:27:39 +00:00
Luigi Rizzo
d12cc63303 don't use C++ keywords as variable names 2010-03-08 11:27:08 +00:00
Luigi Rizzo
b854138d5f do not report an error unnecessarily 2010-03-08 11:22:47 +00:00
Bjoern A. Zeeb
376aadf896 Destroy TCP UMA zones (empty or not) upon network stack teardown
to avoid leaking them, which otherwise makes UMA/vmstat unhappy with every stopped vnet.
We will still leak pages (especially for zones marked NOFREE).

Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC after:	5 days
2010-03-07 15:58:44 +00:00
Bjoern A. Zeeb
e253cdd07c When unloading ipfw or tearing down a virtual network stack, not only
flush the ipfw tables but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00
Robert Watson
1f821c53f0 Locking the tcbinfo structure should not be necessary in tcp_timer_delack(),
so don't.

MFC after:      1 week
Reviewed by:    bz
Sponsored by:   Juniper Networks
2010-03-07 14:23:44 +00:00
Robert Watson
2bf3ce088d Add comment in tcp_discardcb() talking about how we don't, but should,
address TCP races relating to not calling tcp_drain() on stopped callouts.

Discussed with:	bz
2010-03-07 14:13:59 +00:00
Robert Watson
68b5629bf5 Make udp_set_kernel_tunneling() less forgiving when its invariants are
violated: so_pcb can never be NULL for a valid UDP socket, and it is
always SOCK_DGRAM.  Use sotoinpcb() as the rest of the UDP code does.

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-07 10:47:47 +00:00
Robert Watson
1d7429e0a9 Remove unnecessary locking of divcbinfo lock from div_output(): this has not
been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was
guaranteed to be stable when a valid socket reference is held (as it is in
the output path).

MFC after:	1 week
Reviewed by:	bz
Sponsored by:	Juniper Networks
2010-03-06 22:04:45 +00:00
Robert Watson
8296cddfdd Add a comment to tcp_usr_accept() to indicate why it is we acquire the
tcbinfo lock there: r175612, which re-added it, masked a race between
sonewconn(2) and accept(2) that could allow an incompletely initialized
address on a newly-created socket on a listen queue to be exposed.  Full
details can be found in that commit message.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-06 21:38:31 +00:00
Bjoern A. Zeeb
391dab1c2d Destroy UDP UMA zones (empty or not) upon network stack teardown
to avoid leaking them, which makes the VM subsystem unhappy with every stopped vnet(*).
We will still leak pages (especially as zones are marked NOFREE).

(*) This will also keep vmstat -z more usable.

Sponsored by:	ISPsystem
MFC after:	5 days
2010-03-06 21:24:32 +00:00
Robert Watson
66f80e90ef Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE()
to match other pcbinfo locking macros.

MFC after:	1 week
2010-03-06 21:24:11 +00:00
Luigi Rizzo
67d079f342 plug a memory leak on pipe's reconfiguration 2010-03-05 17:53:28 +00:00
Luigi Rizzo
6a82d14731 fix a memory leak when deleting RED queues 2010-03-05 12:58:19 +00:00
Luigi Rizzo
b05934e2cb portability fixes 2010-03-04 21:52:40 +00:00
Luigi Rizzo
ae8b199313 don't use keywords as variable names. 2010-03-04 21:01:59 +00:00
Luigi Rizzo
44e510399b use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by:	Francesco Magno
2010-03-04 16:53:38 +00:00
Luigi Rizzo
6aada3117b improve compatibility with RELENG_7.2 2010-03-04 16:52:26 +00:00
Luigi Rizzo
cc4d3c30ea Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
Joel Dahl
7df6f59359 The NetBSD Foundation has granted permission to remove clauses 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-01 17:05:46 +00:00
Bjoern A. Zeeb
aa3f803697 Upon virtual network stack teardown properly release the TCP syncache
resources.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 21:45:04 +00:00
Michael Tuexen
7b470fc31c Fix handling of SHUTDOWN-ACK chunk in COOKIE_WAIT and COOKIE_ECHOED.
MFC after: 1 week
2010-02-20 20:30:40 +00:00
Bjoern A. Zeeb
9802380e41 Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().

Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.

Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC After:	5 days
2010-02-20 19:59:52 +00:00
Michael Tuexen
7291848a0b * Fix another u_long -> uint32_t issue.
* Remove an unused global variable.
* Fix an issue reported by Bruce Cran related to reusing SCTP sockets which
  were connected.

MFC after: 1 week
2010-02-19 18:00:38 +00:00
Pawel Jakub Dawidek
957d68dd91 No need to include security/mac/mac_framework.h here. 2010-02-18 22:26:01 +00:00
Michael Tuexen
63eda93d1a Use uint32_t instead of u_long.
MFC after: 1 week
2010-02-18 13:46:54 +00:00
Luigi Rizzo
27c9c97a3e remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone
2010-02-17 13:06:06 +00:00
Michael Tuexen
8d9d061323 Add missing SCTP_PACKED. Spotted by Irene Ruengeler.
MFC after: 1 week
2010-02-13 21:38:15 +00:00
Bjoern A. Zeeb
fffb9f1d9c Properly free resources when destroying the TCP hostcache while
tearing down a network stack (in the VIMAGE jail+vnet case).

To do so, break out the logic from tcp_hc_purge() into an internal
function that can be called from both the sysctl handler and
tcp_hc_destroy().

Sponsored by:	ISPsystem
Reviewed by:	silby, lstewart
MFC After:	8 days
2010-02-09 21:31:53 +00:00
Michael Tuexen
f1150dc0a5 Restore the checksum received before processing the packet.
MFC after: 1 week
2010-02-04 21:02:29 +00:00
Qing Li
d577d18a00 Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by:	avg
MFC after:	5 days
2010-02-02 20:38:30 +00:00
Luigi Rizzo
dc5fd2595c use u_char instead of u_int for short bitfields.
For our compiler the two constructs are completely equivalent, but
some compilers (including MSC and tcc) use the base type for alignment,
which in the cases touched here results in the bitfields being aligned
to 32 bits instead of the intended 8 bits.

Note that almost all other headers where small bitfields
are used have u_int8_t instead of u_int.

MFC after:	3 days
2010-02-01 14:13:44 +00:00
Michael Tuexen
663fdad84b Use [] instead of [0] for flexible arrays.
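A zero-length array (`[0]`) is a GNU extension. `[]` is the standard C99 flexible array member. The small illustration below uses a made-up chunk structure, not an actual SCTP header. Storage for the flexible member is allocated together with the struct.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length chunk; not an actual SCTP structure. */
struct chunk_sketch {
	uint16_t length;	/* number of bytes in data[] */
	uint8_t data[];		/* C99 flexible array member (was data[0]) */
};

static struct chunk_sketch *
chunk_alloc(const uint8_t *payload, uint16_t len)
{
	/* Allocate the header and the trailing payload in one block. */
	struct chunk_sketch *c = malloc(sizeof(*c) + len);

	if (c == NULL)
		return NULL;
	c->length = len;
	memcpy(c->data, payload, len);
	return c;
}
```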
Obtained from: Bruce Cran
MFC after: 1 week
2010-01-22 07:53:41 +00:00
Michael Tuexen
cd55430963 Get rid of a lot of duplicated code for NR-SACK handling.
Generalize the SACK code to also handle NR-SACKs.
2010-01-17 21:00:28 +00:00
Randall Stewart
e34b217f91 Bug fix: If the allocation of a socket failed and we
freed the inpcb, it was possible to not set the
proper flags on the pcb (i.e. the socket is not there).
This is HIGHLY unlikely since no one else should be
able to find the socket, but for consistency we
do the proper loop thing to make sure that we
mark the socket as gone on the PCB.
2010-01-17 19:47:59 +00:00
Randall Stewart
0812a4d5e6 Pulls out another leaked windows ifdef that somehow
made its way through the scrubber.
2010-01-17 19:40:21 +00:00
Randall Stewart
a10c3242c7 This change syncs up the socketAPI stream-reset
values to match those in linux and the I-D
just released to the IETF.
2010-01-17 19:35:38 +00:00
Randall Stewart
92cf719944 More leaked ifdefs for APPLE and its mobility stuff. 2010-01-17 19:24:30 +00:00
Randall Stewart
33141385fc Remove another set of "leaked" ifdefs that somehow found
their way into FreeBSD.
2010-01-17 19:21:50 +00:00
Randall Stewart
58ac2d97b7 Remove strange APPLE define that leaked
through the scrubber scripts. Scripts are
now fixed so this won't happen again.
2010-01-17 19:17:16 +00:00
Bjoern A. Zeeb
4dcc55a363 Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with:	rwatson
MFC after:	5 days
2010-01-17 13:07:52 +00:00
Bjoern A. Zeeb
592bcae802 Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended for people upgrading from single-IP
jails to multi-IP jails who do not want to change firewall rules,
application ACLs, ... but instead want to force their connections (unless
otherwise changed) to the primary jail IP they have been using for
years, as well as for people preferring to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could cause as well, by the
design of jails. [1]

Reviewed by:	jamie, hrs (ipv6 part)
Pointed out by:	hrs [1]
MFC After:	2 weeks
Asked for by:	Jase Thew (bazerka beardz.net)
2010-01-17 12:57:11 +00:00
Hajimu UMEMOTO
416458131a Change 'me' to match any IPv6 address configured on an interface in
the system as well as any IPv4 address.

Reviewed by:	David Horn <dhorn2000__at__gmail.com>, luigi, qingli
MFC after:	2 weeks
2010-01-17 08:39:48 +00:00
Michael Tuexen
5661a9ed70 Get rid of support of an old version of the SCTP-AUTH draft.
Get rid of unused MD5 code.

MFC after: 1 week
2010-01-16 20:04:17 +00:00
Qing Li
646c800540 Ensure an address is removed from the interface address
list when the installation of that address fails.

PR:		139559
2010-01-08 17:49:24 +00:00
Ruslan Ermilov
acc0fee071 Complete the swap of carp(4) log levels and document the change.
MFC after:	3 days
2010-01-08 16:14:41 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Luigi Rizzo
5afa29b41a we don't use dummynet_drain! 2010-01-07 13:53:47 +00:00
Luigi Rizzo
59a613b14d check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets, which I broke when I
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO
2010-01-07 12:00:54 +00:00
Luigi Rizzo
b2019e1789 Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so I don't feel like
doing a gratuitous renaming, at least for the time being.
2010-01-07 10:39:15 +00:00
Luigi Rizzo
62081e0f8d some header shuffling to help decoupling ip_divert from ipfw 2010-01-07 10:08:05 +00:00
Luigi Rizzo
eb6842e2a9 put ip_len in correct order for ip_output().
This prevents a panic when ipfw generates packets on its own
(such as reject or keepalives for dynamic rules).

Reported by: Chagin Dmitry
2010-01-07 09:28:17 +00:00
Luigi Rizzo
c95477dfa1 this file does not require ip_dummynet.h 2010-01-05 11:00:31 +00:00
Qing Li
ee8a75d320 An existing incomplete ARP entry would expire a subsequent
statically configured entry of the same host. This bug was
caused by the expiration timer not being cancelled when installing
the static entry. Since there is a potential race condition
with respect to timer cancellation, simply check for the
LLE_STATIC bit inside the expiration function instead of
cancelling the active timer.
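Here is a minimal sketch of the approach, using simplified stand-in types. The flag value and field names are assumptions for illustration. The expiry callback re-checks the flag instead of relying on the timer having been cancelled. As a result, a timer that was already armed when the entry became static does no harm.

```c
#include <stdbool.h>

#define LLE_STATIC 0x0002	/* flag value assumed for this sketch */

/* Simplified stand-in for an ARP (lltable) entry. */
struct llentry_sketch {
	unsigned int la_flags;
	bool valid;
};

/* Timer callback: static entries are never expired, even by a stale timer. */
static void
arp_expire_sketch(struct llentry_sketch *lle)
{
	if (lle->la_flags & LLE_STATIC)
		return;
	lle->valid = false;	/* stand-in for freeing the entry */
}
```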

MFC after:	5 days
2010-01-05 00:35:46 +00:00
Luigi Rizzo
7173b6e554 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.
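The sketch below shows the idea of converting ip_len and ip_off once, in one place, instead of tracking their byte order everywhere. It assumes a stripped-down header struct. The macro names are hypothetical stand-ins, not necessarily what the ipfw headers define.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Minimal stand-in for the two struct ip fields involved. */
struct ip_sketch {
	uint16_t ip_len;
	uint16_t ip_off;
};

/* Hypothetical macros: convert both fields together, in one direction. */
#define SKETCH_SET_NET_IPLEN(p) do {		\
	(p)->ip_len = htons((p)->ip_len);	\
	(p)->ip_off = htons((p)->ip_off);	\
} while (0)

#define SKETCH_SET_HOST_IPLEN(p) do {		\
	(p)->ip_len = ntohs((p)->ip_len);	\
	(p)->ip_off = ntohs((p)->ip_off);	\
} while (0)
```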

In passing I also fixed a few typos, made some variables static or local,
removed useless declarations and did other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw I am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first I need to check functionality).
2010-01-04 19:01:22 +00:00
Michael Tuexen
f5366806c6 Correct usage of parentheses.
PR:	kern/142066
Approved by: rrs (mentor)
Obtained from: Henning Petersen, Bruce Cran.
MFC after: 2 weeks
2010-01-04 18:25:38 +00:00
Navdeep Parhar
567145993f Avoid NULL dereference in arpresolve. 2010-01-03 06:43:13 +00:00
Qing Li
ccbb9c359d Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some routing daemons to fail and exit abnormally.

MFC after:	5 days
2009-12-30 22:13:01 +00:00
Qing Li
c7ab66020f Proxy ARP entries could not be added to the system over
IFF_POINTOPOINT link types, because the routing entry returned by the
kernel for the remote end is of an interface type that does not
support ARP. This patch fixes the problem by providing a hint to the
kernel routing code indicating that the prefix route, rather than the
PPP host route, should be returned to the caller. Since a host route
to the local end point is also added to the routing table, and there
can be multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure observed prior to
this patch. The reference count of the loopback route to the local end
is now either incremented or decremented: the first instantiation
creates the entry and the last removal deletes the route entry.
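Here is a minimal sketch of the reference counting described above. A hypothetical struct stands in for the loopback route to the shared local address. Only the first reference installs the route, and only the last release removes it.

```c
/* Hypothetical refcounted loopback route for a shared local address. */
struct lo_route {
	int refcnt;
	int installed;	/* nonzero while the route is in the table */
};

static void
lo_route_ref(struct lo_route *r)
{
	if (r->refcnt++ == 0)
		r->installed = 1;	/* first PPP link: create the route */
}

static void
lo_route_unref(struct lo_route *r)
{
	if (r->refcnt > 0 && --r->refcnt == 0)
		r->installed = 0;	/* last PPP link gone: delete the route */
}
```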

MFC after:	5 days
2009-12-30 21:35:34 +00:00
Shteryana Shopova
7c90b0258f Make sure the multicast forwarding cache entry's stall queue is properly
initialized before trying to insert an entry into it.

PR:		kern/142052
Reviewed by:	bms
MFC after:	now
2009-12-30 08:52:13 +00:00
Luigi Rizzo
bcd3b68dd2 we really need htonl() here, see the comment a few lines above in the code. 2009-12-29 00:02:57 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
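The correct usage is sketched below with the standard `<sys/queue.h>` macros. The struct and list names are made up. In common implementations the initializer expands to `{ NULL }` and ignores its argument, which is why the wrong usages compiled cleanly.

```c
#include <stddef.h>
#include <sys/queue.h>

struct entry {
	int value;
	SLIST_ENTRY(entry) link;
};

/* Declares "struct entry_list", a singly-linked list head. */
SLIST_HEAD(entry_list, entry);

/* Correct usage: the argument names the head being initialized. */
static struct entry_list head = SLIST_HEAD_INITIALIZER(head);

static int
list_sum(const struct entry_list *h)
{
	const struct entry *e;
	int sum = 0;

	SLIST_FOREACH(e, h, link)
		sum += e->value;
	return sum;
}
```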

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Bjoern A. Zeeb
fc74d005d9 Make the compiler happy after r201125:
- + remove two unnecessary initializations in ip_output;
+ + remove one unnecessary initializations in ip_output;
2009-12-28 21:14:18 +00:00
Luigi Rizzo
ec396e61ed introduce a local variable rte acting as a cache of ro->ro_rt
within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
2009-12-28 14:48:32 +00:00
Luigi Rizzo
ca8b83b0fa + remove an unused #define print_ip;
+ remove two unnecessary initializations in ip_output;
+ localize 'len';
+ introduce a temporary variable n to count the number of fragments,
  the compiler seems unable to identify a common subexpression
  (written 3 times, used twice);
+ document some assumptions on ip_len and ip_hl
2009-12-28 14:09:46 +00:00
Luigi Rizzo
e59084e086 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
Luigi Rizzo
830c6e2b97 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)
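The userland illustration below shows why the keys must be big endian. A key compared as a byte string, which is how the radix code treats keys, sorts in numeric order only if it is stored most significant byte first. The helper name is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 32-bit keys the way a byte-string comparison would see them.
 * Converting with htonl() first makes the byte-wise order agree with the
 * numeric order; host-order keys on a little-endian CPU would not.
 */
static int
key_cmp(uint32_t a, uint32_t b)
{
	uint32_t ka = htonl(a), kb = htonl(b);

	return memcmp(&ka, &kb, sizeof(ka));
}
```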

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
Luigi Rizzo
6cc7b9f5d9 readability fixes -- add braces on large blocks, remove unnecessary
initializations
2009-12-28 10:19:53 +00:00
Luigi Rizzo
6730dcaec7 explain details of operation of table lookups, and improve portability 2009-12-28 10:12:35 +00:00
Luigi Rizzo
2082ecd966 diverted packets must re-enter _after_ the matching rule,
or we create loops.
The divert cookie (that can be set from userland too)
contains the matching rule nr, so we must start from nr+1.
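Here is a simplified sketch of choosing the re-entry point. It uses a sorted array of rule numbers and a hypothetical helper name. Starting from nr+1 skips every rule that shares the matching rule's number.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * rulenums[] holds rule numbers in ascending order (ipfw allows several
 * rules with the same number). The divert cookie carries the number of
 * the rule that matched, so processing resumes at the first rule numbered
 * cookie + 1 or higher. Resuming at the matching rule would divert the
 * packet again and loop.
 */
static size_t
reinject_start(const uint16_t *rulenums, size_t n, uint16_t cookie)
{
	size_t i = 0;

	while (i < n && rulenums[i] <= cookie)
		i++;
	return i;	/* == n means no later rule exists */
}
```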

Reported by: Joe Marcus Clarke
2009-12-27 10:19:10 +00:00
Luigi Rizzo
4a3c1bd27f fix poor indentation resulting from a merge 2009-12-24 17:35:28 +00:00
Luigi Rizzo
84918f5bc8 mostly style changes, such as removal of trailing whitespace,
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.

As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.
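The count-then-allocate pattern is sketched below on a plain linked list. The names are hypothetical. In the kernel, the data would have to stay stable between the two passes, for example under a lock.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/*
 * Export a list into a freshly allocated array. The first pass only
 * counts, so there is exactly one allocation and no realloc() failure
 * path that could leak a partially filled buffer.
 */
static int *
export_list(const struct node *head, size_t *countp)
{
	const struct node *p;
	size_t n = 0, i = 0;
	int *buf;

	*countp = 0;
	for (p = head; p != NULL; p = p->next)
		n++;
	if (n == 0)
		return NULL;
	buf = malloc(n * sizeof(*buf));
	if (buf == NULL)
		return NULL;
	for (p = head; p != NULL; p = p->next)
		buf[i++] = p->val;
	*countp = n;
	return buf;
}
```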
2009-12-23 18:53:11 +00:00
Luigi Rizzo
3ae19c3ba3 fix build with the new fast lookup structure.
Also remove some unnecessary headers
2009-12-23 12:15:21 +00:00
Luigi Rizzo
6aab896346 fix build on 64-bit architectures.
Also fix the indentation on a few lines.
2009-12-23 12:00:50 +00:00
Luigi Rizzo
de240d1013 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

  1. introduce an IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do an O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>
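A userland sketch of items 2 and 4 follows. It assumes a map sorted by (rulenum, rule_id) and a chain id that changes whenever the map is rebuilt. All type and function names are hypothetical. A cached slot is trusted only when its chain id matches. Otherwise, the rule is found again by binary search.

```c
#include <stddef.h>
#include <stdint.h>

struct rule {
	uint32_t rulenum;
	uint32_t rule_id;
};

struct chain {
	uint32_t id;		/* changes whenever the map is replaced */
	size_t n;
	struct rule **map;	/* sorted by (rulenum, rule_id) */
};

struct rule_ref {
	uint32_t slot;
	uint32_t chain_id;
	uint32_t rulenum;
	uint32_t rule_id;
};

/* Index of the first rule >= (rulenum, rule_id), in O(log N). */
static size_t
find_slot(const struct chain *c, uint32_t rulenum, uint32_t rule_id)
{
	size_t lo = 0, hi = c->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct rule *r = c->map[mid];

		if (r->rulenum < rulenum ||
		    (r->rulenum == rulenum && r->rule_id < rule_id))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* O(1) if the map is unchanged since the reference was taken, else O(log N). */
static size_t
resolve_ref(const struct chain *c, const struct rule_ref *ref)
{
	if (ref->chain_id == c->id && ref->slot < c->n)
		return ref->slot;
	return find_slot(c, ref->rulenum, ref->rule_id);
}
```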

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
John Baldwin
43d9473499 - Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove
the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info
  structure.

Reviewed by:	rwatson
MFC after:	2 weeks
2009-12-22 15:47:40 +00:00
Luigi Rizzo
46fdc2bf60 some mostly cosmetic changes in preparation for upcoming work:
+ in many places, replace &V_layer3_chain with a local
  variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
  replacing static variables;
+ remove some spurious comments and extern declarations;
+ document which lock protects certain data structures
2009-12-22 13:53:34 +00:00
Ruslan Ermilov
bec5f27f73 Added proper attribution.
Requested by:	luigi
2009-12-18 17:22:21 +00:00
Luigi Rizzo
1328a38b96 Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' option on rules
you are interested in, e.g.

	ipfw add 5000 count log ....

and run
	tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(4), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.
2009-12-17 23:11:16 +00:00
Luigi Rizzo
60ab046a41 simplify and document lookup_next_rule() 2009-12-17 17:27:12 +00:00
Luigi Rizzo
59cd9f65f9 simplify the code that finds the next rule after reinjections
MFC after:	1 week
2009-12-17 12:27:54 +00:00
Luigi Rizzo
53638988bc remove a duplicate sysctl entry 2009-12-16 18:03:35 +00:00
Luigi Rizzo
1b5691c61e bring back a couple of #include that are supplied by nesting,
and explain why they are used.
2009-12-16 13:00:37 +00:00
Luigi Rizzo
97219abf05 Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
  static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
  (the same should be done for variables);
- try to pack variable declarations in a uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
  slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after:	1 month
2009-12-16 10:48:40 +00:00
Warner Losh
26bbc1fc5a Quick fix to make this compile:
Remove redundant extern declarations.
If the maintainer has a better fix, then feel free to back this out.
2009-12-16 03:26:37 +00:00
Luigi Rizzo
22f123afad more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).

Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.

MFC after:	1 month
2009-12-15 21:24:12 +00:00
Luigi Rizzo
70228fb346 Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
	now references the two new source files

netinet/ip_fw.h
	remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
	new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
	ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
	code related to dynamic rules

netinet/ipfw/ip_fw2.c
	removed the pieces that go in the new files

netinet/ipfw/ip_fw_nat.c
	minor rearrangement to remove LOOKUP_NAT from the
	main headers. This requires a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure I caught all of them.

MFC after:	1 month
2009-12-15 16:15:14 +00:00
Luigi Rizzo
472099c4b0 implement a new match option,
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N

which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.

Work supported by the Onelab project.

MFC after:	1 week
2009-12-15 09:46:27 +00:00
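A minimal sketch of what the `lookup` match does conceptually: pick one packet field, search table N for it, and on a hit expose the entry's value as `tablearg`. All names here (`pkt_info`, `lookup_table`, `lookup_match`) are hypothetical, and the linear scan is a simplification; the real ipfw tables are not organized this way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the fields the lookup option can select. */
enum lookup_field { LOOKUP_DST_IP, LOOKUP_SRC_IP, LOOKUP_DST_PORT,
                    LOOKUP_SRC_PORT, LOOKUP_UID, LOOKUP_JAIL };

/* Hypothetical, simplified view of the packet being matched. */
struct pkt_info {
    uint32_t dst_ip, src_ip;
    uint16_t dst_port, src_port;
    uint32_t uid, jail;
};

/* Hypothetical table entry: key -> value handed back as tablearg. */
struct table_entry { uint32_t key; uint32_t value; };
struct lookup_table { const struct table_entry *entries; size_t n; };

static uint32_t select_field(const struct pkt_info *p, enum lookup_field f)
{
    switch (f) {
    case LOOKUP_DST_IP:   return p->dst_ip;
    case LOOKUP_SRC_IP:   return p->src_ip;
    case LOOKUP_DST_PORT: return p->dst_port;
    case LOOKUP_SRC_PORT: return p->src_port;
    case LOOKUP_UID:      return p->uid;
    case LOOKUP_JAIL:     return p->jail;
    }
    return 0;
}

/* Returns 1 on a match and stores the entry's value in *tablearg. */
static int lookup_match(const struct lookup_table *t,
    const struct pkt_info *p, enum lookup_field f, uint32_t *tablearg)
{
    uint32_t key = select_field(p, f);

    for (size_t i = 0; i < t->n; i++) {   /* linear scan for brevity */
        if (t->entries[i].key == key) {
            *tablearg = t->entries[i].value;
            return 1;
        }
    }
    return 0;
}
```

Dispatching on the returned `tablearg` is what makes the option useful for fields other than the addresses.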
Bjoern A. Zeeb
de0bd6f76b Throughout the network stack we have a few places of
if (jailed(cred))
left.  If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with:	julian, zec, jamie, rwatson (back in Sept)
MFC after:	5 days
2009-12-13 13:57:32 +00:00
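The distinction can be sketched as below. The structs are heavily simplified stand-ins (assumed, not the kernel's `struct prison`/`struct ucred`), and a NULL `cr_prison` stands for "not jailed"; the real `jailed()` decides that differently. The point is only the predicate: a jail with its own vnet is not treated as jailed for network-stack checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; field names are hypothetical. */
struct prison { bool has_own_vnet; };
struct ucred  { struct prison *cr_prison; };   /* NULL: not jailed here */

static bool jailed(const struct ucred *cred)
{
    return cred->cr_prison != NULL;
}

/* Jailed, but only in the classic sense: no private network stack. */
static bool jailed_without_vnet(const struct ucred *cred)
{
    return jailed(cred) && !cred->cr_prison->has_own_vnet;
}
```

Network code that used `jailed()` would call `jailed_without_vnet()` instead, while code outside the network stack keeps the original semantics.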
Luigi Rizzo
b2089673e5 use div64 when converting back the burst value for userland 2009-12-10 18:37:14 +00:00
Luigi Rizzo
89717f91ef when draining a flowset free the entire chain, not just one packet. 2009-12-10 18:34:07 +00:00
Luigi Rizzo
478cae8a97 centralize the code to free a packet (or a chain) while in dummynet.
Remove an old macro and its stale comment.
2009-12-10 15:17:34 +00:00
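Freeing "the entire chain, not just one packet" (this and the previous commit) amounts to walking the packet links and saving each successor before releasing the current node. A minimal sketch, assuming a hypothetical `struct pkt` with a `next_pkt` link and `free()` in place of the kernel's mbuf release routine:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a queued packet. */
struct pkt {
    struct pkt *next_pkt;
};

/* Free every packet in the chain; returns how many were freed. */
static int free_pkt_chain(struct pkt *head)
{
    int n = 0;

    while (head != NULL) {
        struct pkt *next = head->next_pkt;  /* read link before freeing */
        free(head);
        head = next;
        n++;
    }
    return n;
}
```

Keeping this in one helper means a drain path cannot accidentally free only the head and leak the rest.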
Oleg Bulyzhin
22746035ec Fix burst processing for WF2Q pipes - do not increase available burst size
unless the pipe is idle. This should fix the following issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.

MFC after:	1 week
2009-12-05 23:27:21 +00:00
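One way to read the rule "do not increase available burst size unless pipe is idle" is sketched below: credit accrues with elapsed time only while nothing is queued, and is capped at the configured burst. The struct, field names and units are assumptions for illustration, not dummynet's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified pipe state. */
struct pipe_state {
    uint64_t burst;     /* configured maximum burst, in bits */
    uint64_t numbytes;  /* accumulated credit, in bits */
    bool     idle;      /* true when no packets are queued */
};

/* Add credit for `ticks` ticks at `bw` bits per tick, but only while
 * the pipe is idle; the credit never exceeds the configured burst. */
static void refill_burst(struct pipe_state *p, uint64_t bw, uint64_t ticks)
{
    if (!p->idle)
        return;
    uint64_t credit = p->numbytes + bw * ticks;
    p->numbytes = credit > p->burst ? p->burst : credit;
}
```

If credit also grew while packets were queued, a busy pipe could later send a burst on top of its steady rate, which matches the "exceeding configured pipe bandwidth" symptom.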
Luigi Rizzo
f573a0a634 adjust comment in previous commit after Julian's explanation 2009-12-05 11:51:32 +00:00
Luigi Rizzo
bc0d5982e2 remove a dead block of code, document how the ipfw clients are
hooked and the difference in handling the 'enable' variable
for layer2 and layer3. The latter needs fixing once I figure out
how it worked pre-vnet.

MFC after:	7 days
2009-12-05 09:13:06 +00:00
Luigi Rizzo
e99816f1eb fix build with VNET enabled
Reported by: David Wolfskill
2009-12-05 08:32:12 +00:00
Hajimu UMEMOTO
2ea64e8ef9 Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard
coded number.

Spotted by:	bz
2009-12-04 15:39:37 +00:00
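For reference, the userland equivalent of the pattern: size text buffers with `INET_ADDRSTRLEN` (16) and `INET6_ADDRSTRLEN` (46) so they always fit the longest textual address plus the terminating NUL. The helper names below are hypothetical; `inet_ntop()` is the standard POSIX call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

static char v4buf[INET_ADDRSTRLEN];   /* "255.255.255.255" + NUL */
static char v6buf[INET6_ADDRSTRLEN];  /* longest IPv6 text form + NUL */

static const char *v4_to_str(const struct in_addr *a)
{
    return inet_ntop(AF_INET, a, v4buf, sizeof(v4buf));
}

static const char *v6_to_str(const struct in6_addr *a)
{
    return inet_ntop(AF_INET6, a, v6buf, sizeof(v6buf));
}
```

Using `sizeof` on buffers declared with the constants keeps the size argument and the allocation in sync.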
Luigi Rizzo
4f60c0b97d preparation work to replace the monster switch in ipfw_chk() with
table of functions.

This commit (which is heavily based on work done by Marta Carbone
in this year's GSOC project), removes the goto's and explicit
return from the inner switch(), so we will have an easier time when
putting the blocks into individual functions.

MFC after:	3 weeks
2009-12-03 14:22:15 +00:00
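The direction described here, a table of per-opcode functions instead of one large switch, can be sketched as follows. The opcodes, `struct ctx` and `struct insn` are hypothetical and much smaller than ipfw's; the sketch only shows why removing `goto`s and early returns matters: each case must become a self-contained function returning match or no-match.

```c
#include <stddef.h>

/* Hypothetical opcodes and packet context. */
enum op { O_ACCEPT_ALL, O_PROTO, O_DST_PORT, O_LAST };
struct ctx  { int proto; int dst_port; };
struct insn { enum op opcode; int arg; };

typedef int (*match_fn)(const struct ctx *, const struct insn *);

static int m_all(const struct ctx *c, const struct insn *i)
{ (void)c; (void)i; return 1; }

static int m_proto(const struct ctx *c, const struct insn *i)
{ return c->proto == i->arg; }

static int m_dst_port(const struct ctx *c, const struct insn *i)
{ return c->dst_port == i->arg; }

/* One entry per opcode replaces one case of the switch. */
static const match_fn match_table[O_LAST] = {
    [O_ACCEPT_ALL] = m_all,
    [O_PROTO]      = m_proto,
    [O_DST_PORT]   = m_dst_port,
};

/* A rule matches if all of its instructions match. */
static int rule_matches(const struct ctx *c, const struct insn *rule, size_t n)
{
    int match = 1;

    for (size_t k = 0; k < n && match; k++)
        match = match_table[rule[k].opcode](c, &rule[k]);
    return match;
}
```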
Hajimu UMEMOTO
a22e82b87b Teach the debug prints about IPv6. 2009-12-03 11:16:53 +00:00
Luigi Rizzo
3c95089ef4 - initialize src_ip in the main loop to prevent a compiler warning
(gcc 4.x under Linux; not sure how real the complaint is).
- rename a macro argument to prevent name clashes.
- add the macro name on a couple of #endif
- add a blank line for readability.

MFC after:	3 days
2009-12-02 17:50:52 +00:00
Luigi Rizzo
3429911d4d Dispatch sockopt calls to ipfw and dummynet
using the new option numbers, IP_FW3 and IP_DUMMYNET3.
Right now the modules return an error if called with those arguments
so there is no danger of unwanted behaviour.

MFC after:	3 days
2009-12-02 15:50:43 +00:00
Luigi Rizzo
0a13f6b1b3 small changes for portability and diff reduction wrt/ FreeBSD 7.
No functional differences.

- use the div64() macro to wrap 64 bit divisions
  (which almost always are 64 / 32 bits) so they are easier
  to handle with compilers or OS that do not have native
  support for 64bit divisions;

- use a local variable for p_numbytes even if not strictly
  necessary on HEAD, as it reduces diffs with FreeBSD7

- in dummynet_send() check that a tag is present before
  dereferencing the pointer.

- add a couple of blank lines for readability near the end of a function

MFC after:	3 days
2009-12-02 15:20:31 +00:00
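The commit does not show the macro itself. A minimal sketch, assuming a plain signed 64-bit division: the value of wrapping it is that a platform without native 64-bit divide only has to redefine `div64()` in one place. The `burst_bits_to_bytes()` helper is hypothetical and mirrors the "convert the burst value back for userland" use in an adjacent commit.

```c
#include <stdint.h>

/* Assumed definition; a port could map this to a helper routine. */
#define div64(a, b) ((int64_t)(a) / (int64_t)(b))

/* Hypothetical use: report a burst stored in bits as bytes. */
static int64_t burst_bits_to_bytes(int64_t bits)
{
    return div64(bits, 8);
}
```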
Hajimu UMEMOTO
fd63c04193 Teach send_pkt() and ipfw_tick() about IPv6.
This fixes keep-alives not working for IPv6 dynamic rules.

PR:		kern/117234
Submitted by:	mlaier, Joost Bekkers <joost__at__jodocus.org>
MFC after:	1 month
2009-12-02 14:32:01 +00:00
Gleb Smirnoff
e81ab87652 Until this moment carp(4) used a strange logging priority. It used debug
priority for such important information as MASTER/BACKUP state change,
and used a normal logging priority for such innocent messages as receiving
short packet (which is a normal VRRP packet between some other routers) or
receiving a CARP packet on a non-carp interface (someone else running CARP).

This commit shifts message logging priorities to a more sane default.
2009-12-02 13:24:21 +00:00
Luigi Rizzo
de9fc6bcd4 Add new sockopt names for ipfw and dummynet.
This commit is just grabbing entries for the new names
that will be used in the future, so you don't need to
rebuild anything now.

MFC after:	3 days
2009-12-02 10:36:41 +00:00
Luigi Rizzo
9565806f16 change the type of the opcode from enum *:8 to u_int8_t
so the size and alignment of the ipfw_insn is not compiler dependent.
No changes in the code generated by gcc.

There was only one instance of this kind in our entire source tree,
so I suspect the old definition was a poor choice (which I made).

MFC after:	3 days
2009-12-02 08:52:06 +00:00
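Why the type change matters: C only guarantees `_Bool`, `signed int` and `unsigned int` as bit-field types, so an `enum ... :8` field is an implementation-defined extension whose storage unit and alignment can vary between compilers. A fixed `uint8_t` removes that freedom. The struct below follows the `opcode`/`len`/`arg1` shape of `ipfw_insn`, but treat the exact layout as an assumption; the size check holds on common ABIs.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  opcode;   /* was an 8-bit enum bit-field */
    uint8_t  len;
    uint16_t arg1;
} ipfw_insn_sketch;

/* One 32-bit word per instruction on mainstream ABIs. */
_Static_assert(sizeof(ipfw_insn_sketch) == 4, "unexpected insn size");
```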
Michael Tuexen
dec7fa27c6 Use the default stack size for the iterator thread.
This fixes a crash reported by Irene Ruengeler.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-27 17:25:19 +00:00
Bruce M Simpson
a8cf681de2 Correct a comment.
MFC after:	1 day
2009-11-19 13:21:37 +00:00
Michael Tuexen
7e6206af12 Fix a bug where the system panics when a SHUTDOWN is received with an
illegal TSN.

Approved by: rrs (mentor)
MFC after: ASAP
2009-11-18 12:17:06 +00:00
Michael Tuexen
0e891bcdc1 Get rid of the unused field addr_over, which was never really used,
only copied around.

Approved by: rrs (mentor)
2009-11-17 23:03:38 +00:00
Michael Tuexen
83fc1165c5 Always use LIST_EMPTY instead of sometimes using SCTP_LIST_EMPTY,
which is defined as LIST_EMPTY.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 20:56:14 +00:00
Michael Tuexen
2ab6846a23 Fix a bug where queued ASCONF messages are not sent out.
Approved by: rrs (mentor)
Obtained from:	Irene Ruengeler
MFC after: 1 month
2009-11-17 13:36:21 +00:00
Michael Tuexen
b6c5780299 Fix a memory leak when destroying an SCTP stack.
Clean up sctp_pcb_finish().
Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:13:58 +00:00
Michael Tuexen
87b4fcd323 Do not start the iterator when there are no associations.
This fixes a bug found by Irene Ruengeler.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:11:23 +00:00
Michael Tuexen
1e01164145 Temporarily disable the thread-based iterator. It does not work with vnet.
Approved by: rrs (mentor)
2009-11-17 13:09:50 +00:00
Michael Tuexen
cf458c646d Allow the UMA to free data. This resolves the UMA related bug reported
by Julian.

Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:08:15 +00:00
Michael Tuexen
7a9b5b2040 Do not hold the lock longer than necessary.
Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 13:05:51 +00:00
Bruce M Simpson
793c70425a Fix a functional regression in multicast.
Userland daemons need to see IGMP traffic regardless of the group;
omit the imo filter check if the proto is IGMP. The kernel part
of IGMP will have already filtered appropriately at this point.

MFC after:      ASAP
Submitted by:   Franz Struwig
Reported by:    Ivor Prebeg, Franz Struwig
2009-11-15 11:07:22 +00:00
Attilio Rao
758801232c Move inet_aton() (the counterpart of inet_ntoa(), already present in libkern)
into libkern in order to make it usable by modules other than alias_proxy.

Obtained from:	Sandvine Incorporated
Sponsored by:	Sandvine Incorporated
MFC:		1 week
2009-11-12 00:46:28 +00:00
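The function follows the familiar BSD contract of its userland namesake: parse a dotted-quad string into a `struct in_addr` in network byte order, returning nonzero on success and 0 on a malformed string. The sketch below uses the userland `inet_aton()`; assuming the libkern version behaves the same is reasonable but not shown by this commit. `parse_ipv4()` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Returns 1 and the address in host byte order, or 0 on bad input. */
static int parse_ipv4(const char *s, uint32_t *out_host_order)
{
    struct in_addr a;

    if (inet_aton(s, &a) == 0)
        return 0;
    *out_host_order = ntohl(a.s_addr);  /* s_addr is network order */
    return 1;
}
```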
Edward Tomasz Napierala
4f7418a09f Remove ifdefed out part of code, which seems to have originated a decade ago
in OpenBSD.  As it is now, there is no way for this to be useful, since IPsec
is free to forward packets via whatever interface it wants, so checking
capabilities of the interface passed from ip_output (fetched from the routing
table) serves no purpose.

Discussed with:	sam@
2009-11-09 19:53:34 +00:00
Oleg Bulyzhin
57edc1bbf3 style(9): add missing parentheses 2009-11-09 09:12:45 +00:00
John Baldwin
c6d9480519 Several years ago a feature was added to TCP that caused soreceive() to
send an ACK right away if data was drained from a TCP socket that had
previously advertised a zero-sized window.  The current code requires the
receive window to be exactly zero for this to kick in.  If window scaling is
enabled and the window is smaller than the scale, then the effective window
that is advertised is zero.  However, in that case the zero-sized window
handling is not enabled because the window is not exactly zero.  The fix
changes the code to check the raw window value against zero.

Reviewed by:	bz
MFC after:	1 week
2009-11-06 16:55:05 +00:00
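The arithmetic behind the bug: the 16-bit window field carries the receive window shifted right by the negotiated scale, so any window smaller than `1 << scale` goes on the wire as 0. With scale 7, a 100-byte window is advertised as `100 >> 7 == 0` even though the raw value is not zero. The function names below are hypothetical and only contrast the two checks.

```c
#include <stdint.h>

/* Value placed in the TCP header's window field. */
static uint16_t advertised_window(uint32_t recv_window, unsigned scale)
{
    return (uint16_t)(recv_window >> scale);
}

/* Old-style check: only fires when the unscaled window is exactly 0. */
static int old_zero_window(uint32_t recv_window, unsigned scale)
{
    (void)scale;
    return recv_window == 0;
}

/* Fixed-style check: fires when the peer actually saw a zero window. */
static int new_zero_window(uint32_t recv_window, unsigned scale)
{
    return advertised_window(recv_window, scale) == 0;
}
```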
Oleg Bulyzhin
5661377e37 Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC after:	3 days
2009-11-03 08:41:14 +00:00
Michael Tuexen
08abf6399a Improve round robin stream scheduler and cleanup some code.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-29 17:40:33 +00:00
Christian Brueffer
621882f0bc Close a stream file descriptor leak.
PR:		138130
Submitted by:	Patroklos Argyroudis <argp@census-labs.com>
MFC after:	1 week
2009-10-28 12:10:29 +00:00
Michael Tuexen
d18f7e0a98 Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-27 18:17:07 +00:00
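The commit does not say which formula was wrong. For reference, RFC 4960 section 7.2.3 specifies two reactions: on loss detected via SACK, `ssthresh = max(cwnd/2, 4*MTU)`, `cwnd = ssthresh`, `partial_bytes_acked = 0`; on T3-rtx timer expiry, `ssthresh = max(cwnd/2, 4*MTU)` and `cwnd = 1*MTU`. A sketch with a hypothetical per-path struct:

```c
#include <stdint.h>

/* Hypothetical per-destination congestion state. */
struct sctp_cc_sketch { uint32_t cwnd, ssthresh, partial_bytes_acked; };

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* RFC 4960 7.2.3: loss detected from SACK (fast retransmit). */
static void on_sack_loss(struct sctp_cc_sketch *cc, uint32_t mtu)
{
    cc->ssthresh = max_u32(cc->cwnd / 2, 4 * mtu);
    cc->cwnd = cc->ssthresh;
    cc->partial_bytes_acked = 0;
}

/* RFC 4960 7.2.3: T3-rtx timeout, fall back to slow start. */
static void on_t3_timeout(struct sctp_cc_sketch *cc, uint32_t mtu)
{
    cc->ssthresh = max_u32(cc->cwnd / 2, 4 * mtu);
    cc->cwnd = mtu;
}
```

With an MTU of 1500, the `4*MTU` floor (6000 bytes) keeps ssthresh from collapsing after repeated losses.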
Michael Tuexen
ac9bce0f3b Improve the round robin stream scheduler.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-26 19:23:34 +00:00
Robert Watson
99b96cf934 Correct spelling typo in ip_input comment.
Pointed out by:	N.J. Mann <njm at njm.me.uk>,
		John Nielsen <john at jnielsen.net>, julian (!), lstewart
MFC after:	2 days
2009-10-24 09:18:26 +00:00
Qing Li
6cb2b4e7a8 Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
MFC after:	3 days
2009-10-23 18:27:34 +00:00
Robert Watson
0d3d0d74ea Improve grammar in ip_input comment while attempting to maintain what
might be its meaning.

MFC after:	3 days
2009-10-23 13:35:00 +00:00
Qing Li
fc02323563 In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
MFC after:	immediately
Verified by:	rnoland, yongari
2009-10-20 17:55:42 +00:00
Robert Watson
6426657e9f Rewrap ip_input() comment so that it prints more nicely.
MFC after:	3 days
2009-10-18 11:23:56 +00:00
Qing Li
93704ac5d7 This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.

Reviewed by:	kmacy
MFC after:	3 days
2009-10-15 06:12:04 +00:00
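Why a 1-second lifetime kept the attempt count at 1: if the incomplete entry has already expired by the next resolve call, a fresh entry is created and the counter restarts. The sketch below models that with a hypothetical `struct arp_entry` and `resolve_attempt()`; only the 20-second value comes from the commit.

```c
#include <stdbool.h>
#include <time.h>

#define INCOMPLETE_LIFETIME 20   /* seconds, per this commit */

/* Hypothetical, simplified incomplete ARP entry. */
struct arp_entry {
    bool   valid;    /* entry present in the table */
    time_t expire;   /* when the incomplete entry times out */
    int    asked;    /* requests sent for this entry so far */
};

/* Reuse a live incomplete entry so `asked` keeps counting; only start
 * over (new entry, counter reset) once the old one has expired. */
static int resolve_attempt(struct arp_entry *e, time_t now)
{
    if (!e->valid || now >= e->expire) {
        e->valid = true;
        e->expire = now + INCOMPLETE_LIFETIME;
        e->asked = 0;
    }
    return ++e->asked;
}
```

With a lifetime of 1 and calls one second apart, every call would take the reset branch and return 1, so a "too many attempts" error could never be reported.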
Bjoern A. Zeeb
852da713c3 Compare pointer to NULL rather than 0.
MFC after:	1 month
2009-10-13 20:29:14 +00:00
Michael Tuexen
f71e78a1d9 Fix a race condition where a mutex was destroyed while sleeping on it.
Found while analyzing a report from julian. It might fix his bug.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-11 12:23:56 +00:00
Julian Elischer
0b4b0b0fee Virtualize the pfil hooks so that different jails may choose different
packet filters. Also allows ipfw to be enabled in one jail and disabled
in another. In 8.0 it's a global setting.

Sitting around in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
Michael Tuexen
45623593fb Correct include order as indicated by bz.
Approved by: re (mentor)
MFC after: 3 days
2009-10-10 13:59:18 +00:00
Michael Tuexen
3b1de911e0 Do not include vnet.h twice.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-09 19:30:23 +00:00
Michael Tuexen
9dd512290c Use correct arguments when calling SCTP_RTALLOC().
Approved by: rrs (mentor)
MFC after: 0 days
2009-10-08 20:33:12 +00:00
Randall Stewart
806a5b8414 Fix so that round robin stream scheduling works as advertised
MFC after:	0 days
2009-10-08 11:36:06 +00:00
Robert Watson
f681a5fdd4 Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.

MFC after:	3 days
2009-10-06 20:35:41 +00:00
Robert Watson
883e9bc41d In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order.  In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.

MFC after:	3 days
Reported by:	Kamigishi Rei <spambox at haruhiism.net>
Reviewed by:	bz
2009-10-05 22:24:13 +00:00
Qing Li
b4a22c365c Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuitous ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after:	immediately
2009-10-02 01:45:11 +00:00