Commit Graph

3899 Commits

Author SHA1 Message Date
Michael Tuexen
9c7635e18b Fix the SCTP_WITH_NO_CSUM option when used in combination with
interfaces supporting CRC offload. While at it, make use of the fact
that the loopback interface provides CRC offloading.

MFC after: 4 weeks
2010-08-29 18:50:30 +00:00
Michael Tuexen
e24ea413e0 Bugfix: Do not send a packet drop report in response to a received
INIT-ACK with incorrect CRC.
2010-08-28 21:15:00 +00:00
Michael Tuexen
20083c2eb1 Fix the switching on/off of CMT using sysctl and socket option.
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.

MFC after: 4 weeks
2010-08-28 17:59:51 +00:00
John Baldwin
98b9eb0db2 Simplify the tcp pcblist estimate logic slightly.
MFC after:	3 days
2010-08-27 18:17:46 +00:00
Andre Oppermann
8502ec25dc Use timestamp modulo comparison macro for automatic receive buffer
scaling to correctly handle wrapping of ticks value.

MFC after:	1 week
2010-08-27 12:34:53 +00:00
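A plain `a > b` on tick values gives the wrong answer once the counter wraps. Below is a minimal sketch of the modulo comparison idea, assuming a 32-bit unsigned tick counter. `ticks_after()` is a hypothetical name, not the macro the commit actually uses:

```c
#include <stdint.h>

/*
 * Wraparound-safe "a is later than b" for a free-running 32-bit tick
 * counter (hypothetical helper, illustration only).
 */
static int
ticks_after(uint32_t a, uint32_t b)
{
	/*
	 * The unsigned difference is computed mod 2^32; reading it as a
	 * signed value gives the shortest signed distance from b to a.
	 */
	return ((int32_t)(a - b) > 0);
}
```

Reading the difference as a signed 32-bit value treats any gap smaller than 2^31 ticks as "forward". This is the usual serial-number convention, and it keeps working across the wrap from 0xFFFFFFFF to 0.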
Ana Kukec
1db8d1f843 MFp4: anchie_soc2009 branch:
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space.  The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normally as if the module were not loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by:	bz (mentor)
2010-08-19 11:31:03 +00:00
Andre Oppermann
c3f0bdc66b If a TCP connection has been idle for one retransmit timeout or more
it must reset its congestion window back to the initial window.

RFC3390 has increased the initial window from 1 segment to up to
4 segments.

The initial window increase of RFC3390 wasn't reflected in the
restart window, which remained at its original defaults of 4 segments
for local and 1 segment for all other connections.  Both values are
controllable through sysctl net.inet.tcp.local_slowstart_flightsize
and net.inet.tcp.slowstart_flightsize.

The increase helps TCP's slow start algorithm to open up the congestion
window much faster.

Reviewed by:	lstewart
MFC after:	1 week
2010-08-18 18:05:54 +00:00
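For reference, here is a sketch of the RFC 3390 initial window and a restart-after-idle rule built on it. The helper names are hypothetical. The restart window is taken as min(IW, cwnd), following RFC 5681. The sysctls named above are not modeled:

```c
#include <stdint.h>

/* RFC 3390 initial window in bytes: min(4*MSS, max(2*MSS, 4380)). */
static uint32_t
rfc3390_initial_window(uint32_t mss)
{
	uint32_t two_seg = 2 * mss;
	uint32_t cap = two_seg > 4380 ? two_seg : 4380;	/* max(2*MSS, 4380) */

	return (4 * mss < cap ? 4 * mss : cap);		/* min(4*MSS, cap) */
}

/*
 * Hypothetical helper: after being idle for at least one retransmit
 * timeout, shrink cwnd to the restart window min(IW, cwnd).
 */
static uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t mss, uint32_t idle_ticks,
    uint32_t rto_ticks)
{
	if (idle_ticks < rto_ticks)
		return (cwnd);		/* not idle long enough; keep cwnd */

	uint32_t iw = rfc3390_initial_window(mss);

	return (cwnd < iw ? cwnd : iw);
}
```

With a 1460-byte MSS, the formula yields 4380 bytes (three segments). The old restart value for non-local connections was a single segment.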
Andre Oppermann
b7d747ecec Untangle the net.inet.tcp.log_in_vain and net.inet.tcp.log_debug
sysctls and remove any side effects.

Both sysctls share the same backend infrastructure and, due to the
way it was implemented, enabling net.inet.tcp.log_in_vain would also
cause log_debug output to be generated.  This was surprising and
eventually annoying to the user.

The log output backend is kept the same but a little shim is inserted
to properly separate log_in_vain and log_debug and to remove any side
effects.

PR:		kern/137317
MFC after:	1 week
2010-08-18 17:39:47 +00:00
Bjoern A. Zeeb
2278f9927d When calculating the expected memory size for userspace, also take the
number of syncache entries into account for the surplus we add to account
for a possible increase of records in the re-entry window.

Discussed with:		jhb, silby
MFC after:		1 week
2010-08-18 09:28:12 +00:00
John Baldwin
c007b96a78 Ensure a minimum "slop" of 10 extra pcb structures when providing a
memory size estimate to userland for pcb list sysctls.  The previous
behavior of a "slop" of n/8 does not work well for small values of n
(e.g. no slop at all if you have fewer than 8 open UDP connections).

Reviewed by:	bz
MFC after:	1 week
2010-08-17 16:41:16 +00:00
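Here is a sketch of the estimate described above. It assumes `n` is the current pcb count and `rec_size` is the size of one exported record. Both names and the function are hypothetical:

```c
#include <stddef.h>

/* Size estimate for a pcb list sysctl: n records plus max(n/8, 10) slop. */
static size_t
pcblist_estimate(size_t n, size_t rec_size)
{
	size_t slop = n / 8;

	if (slop < 10)
		slop = 10;	/* minimum slop for small n */
	return ((n + slop) * rec_size);
}
```

With 3 open UDP pcbs, the old n/8 rule adds no slop at all. This version reserves room for 13 records.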
Andre Oppermann
e4e9266071 Fix the interaction between 'ICMP fragmentation needed' MTU updates,
path MTU discovery and the tcp_minmss limiter for very small MTUs.

When the MTU suggested by the gateway via ICMP (or, if there isn't
one, the next smaller step from ip_next_mtu()) is lower than the
floor enforced by net.inet.tcp.minmss (default 216), the value is
ignored and the default MSS (512) is used instead.  However, the
DF flag in the IP header is still set in tcp_output(), preventing
fragmentation by the gateway.

Fix this by using tcp_minmss as the MSS and clearing the DF flag if
the suggested MTU is too low.  This turns off path MTU discovery
for the remainder of the session and allows fragmentation to be
done by the gateway.

Only MTUs smaller than 256 are affected.  The smallest official
MTU specified is for AX.25 packet radio at 256 octets.

PR:		kern/146628
Tested by:	Matthew Luckie <mjl-at-luckie org nz>
MFC after:	1 week
2010-08-15 13:25:18 +00:00
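Here is a sketch of the decision described above. It assumes a fixed IP+TCP header length and a `minmss` floor that mirrors net.inet.tcp.minmss. The struct and function are hypothetical, and the ip_next_mtu() fallback is not modeled:

```c
#include <stdbool.h>
#include <stdint.h>

struct pmtu_result {
	uint32_t mss;	/* MSS to use for the rest of the connection */
	bool	 df;	/* whether to keep setting DF on outgoing packets */
};

/*
 * Hypothetical sketch: hdrlen is IP + TCP header bytes (40 for IPv4
 * without options), minmss is the configured MSS floor.
 */
static struct pmtu_result
mss_for_icmp_mtu(uint32_t mtu, uint32_t hdrlen, uint32_t minmss)
{
	struct pmtu_result r;

	if (mtu <= hdrlen || mtu - hdrlen < minmss) {
		/* Too small: use the floor and let the gateway fragment. */
		r.mss = minmss;
		r.df = false;
	} else {
		r.mss = mtu - hdrlen;
		r.df = true;
	}
	return (r);
}
```

With IPv4 and no options (40 header bytes), an MTU of exactly 256 still yields an MSS of 216 with DF kept. This matches the note that only MTUs below 256 are affected.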
Andre Oppermann
0e678ed825 Initializing the new error variable to zero in syncache_socket()
is not necessary.

Noticed by:	bz
2010-08-15 13:07:08 +00:00
Andre Oppermann
943044b01f Add more logging points for failures in syncache_socket() to
report when a new socket couldn't be created because one of
in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.

Logging is conditional on net.inet.tcp.log_debug being enabled.

MFC after:	1 week
2010-08-15 09:30:13 +00:00
Andre Oppermann
153e5b57af When using TSO and sending more than TCP_MAXWIN, sendalot is set
and we loop back to 'again'.  If the remainder is less than or equal
to one full segment, the TSO flag was not cleared even though
it isn't necessary anymore.  Enabling the TSO flag on a segment
that doesn't require any offloaded segmentation by the NIC may
cause confusion in the driver or hardware.

Reset the internal tso flag in tcp_output() on every iteration
of sendalot.

PR:		kern/132832
Submitted by:	Renaud Lienhart <renaud-at-vmware com>
MFC after:	1 week
2010-08-14 21:41:33 +00:00
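Here is a simplified model of the sendalot loop. It shows why the flag must be reset at the top of each pass. The function is hypothetical, and `maxwin` stands in for TCP_MAXWIN (65535):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the tcp_output() "sendalot" loop: each pass
 * sends at most maxwin bytes, and TSO is only requested for chunks
 * larger than one segment. Returns how many passes requested TSO.
 */
static int
count_tso_passes(uint32_t total, uint32_t maxwin, uint32_t mss)
{
	int tso_passes = 0;

	while (total > 0) {
		bool tso = false;	/* reset on every iteration (the fix) */
		uint32_t len = total > maxwin ? maxwin : total;

		if (len > mss)
			tso = true;	/* NIC must segment this chunk */
		if (tso)
			tso_passes++;
		total -= len;		/* remainder left: loop again */
	}
	return (tso_passes);
}
```

Consider 66535 bytes with a 1460-byte MSS. Only the first 65535-byte pass requests TSO. Without the per-iteration reset, the 1000-byte remainder would inherit the flag.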
Andre Oppermann
40fe9eff47 Change the messages of the ICMP bad port bandwidth limiter from
a kernel printf to a log output with the priority of LOG_NOTICE.

This way the messages still show up in /var/log/messages but no
longer spam the console every other second on busy servers that
are port scanned:
 "Limiting open port RST response from 114 to 100 packets/sec"

PR:		kern/147352
Submitted by:	Eugene Grosbein <eugen-at-eg sd rdtc ru>
MFC after:	1 week
2010-08-14 21:04:27 +00:00
Andre Oppermann
bee4e5afa9 Disable TCP inflight limiter by default.
It was experimental and interferes with the normal congestion control
algorithms by instating a separate, possibly lower, ceiling for the
amount of data that is in flight to the remote host.  With high speed
internet connections the inflight limit frequently has been estimated
too low due to the noisy nature of the RTT measurements.

This code gives way for the upcoming pluggable congestion control
framework.  It is the task of the congestion control algorithm to
set the congestion window and amount of inflight data without external
interference.

Reviewed by:	lstewart
MFC after:	1 week
Removal after:	1 month
2010-08-14 20:40:55 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti-foot-shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Xin LI
9fe5092de1 Address an edge condition that we found at work, where the carp(4)
interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at
cold boot.  This behavior is not observed when carp(4) interface
is created slightly later, when the underlying interface is fully
up.

Before this change, what happened at boot was roughly:

 - ifconfig creates em0 interface;
 - ifconfig clones a carp device using em0;
   (em0's link state is DOWN at this point)
 - carp state: INIT -> BACKUP [*]
 - carp state: BACKUP -> MASTER
 - [Some negotiation between em0 and the switch]
 - em0 kicks up link state change event
   (em0's link state is now UP at this point)
 - do_link_state_change() -> carp_carpdev_state()
 - carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
 - carp state: INIT -> BACKUP
 - carp state: BACKUP -> MASTER

At the [*] stage, em0 had not received any broadcast message from the other
node, so carp(4) assumed our node was the master and set the link
state to "UP" after becoming master.  At [+], the master status
is forcibly set to "INIT", then an election is held, after which our
node actually becomes master.

We believe that at the [*] stage, the master status should remain as
"INIT" since the underlying parent interface's link state is not up.

Obtained from:	iXsystems, Inc.
Reported by:	jpaetzel
MFC after:	2 months
2010-08-08 07:04:27 +00:00
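Here is a simplified sketch of the rule proposed above: a carp instance cannot leave INIT while its parent link is down. The enum and function are hypothetical, and priority, preemption, and advertisement timers are not modeled:

```c
#include <stdbool.h>

enum carp_state { CARP_INIT, CARP_BACKUP, CARP_MASTER };

/*
 * Hypothetical transition function. heard_master stands in for
 * "received an advertisement from another node".
 */
static enum carp_state
carp_next_state(enum carp_state cur, bool parent_link_up, bool heard_master)
{
	if (!parent_link_up)
		return (CARP_INIT);	/* never elect ourselves on a dead link */

	switch (cur) {
	case CARP_INIT:
		return (CARP_BACKUP);
	case CARP_BACKUP:
		return (heard_master ? CARP_BACKUP : CARP_MASTER);
	case CARP_MASTER:
	default:
		return (cur);		/* preemption not modeled */
	}
}
```

Under this rule, the [*] step in the boot sequence above would stay in INIT until em0 reports link UP. That avoids the extra UP/DOWN/UP flap.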
Ed Schouten
367698346b Don't use struct timezone.
The timezone structure acquired by gettimeofday() is not used at all.
Just remove it.
2010-08-08 02:51:32 +00:00
Michael Tuexen
87a37484eb Fix a bug where endpoints bound to wildcard addresses were
using addresses not announced to the peer due to address
scoping.

MFC after: 3 weeks
2010-08-05 16:52:13 +00:00
Michael Tuexen
d2604d08d0 Cleanup code.
MFC after: 2 weeks
2010-08-01 08:06:59 +00:00
Bjoern A. Zeeb
19291ab3de Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer.  It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extent.

Reviewed by:	rwatson
MFC after:	6 days
2010-07-31 21:33:18 +00:00
Bjoern A. Zeeb
4579930d2e MFp4 @181628:
Free the rtentry after we disconnected it from the FIB and are counting
it as rttrash.  There might still be a chance we leak it from a different
code path but there is nothing we can do about this here.

Sponsored by:	ISPsystem (in February)
Reviewed by:	julian (in February)
MFC after:	2 weeks
2010-07-31 15:31:23 +00:00
Andre Oppermann
28a53f037a Fix a bug in syncache where the initial CWND for new incoming connections
was limited to one segment under the faulty assumption of a retransmit.
Due to this the opportunity to initialize the increased congestion window
according to RFC3390 was missed.

Support for RFC3465 introduced in r187289 uncovered the bug as the ACK
to SYN/ACK no longer caused snd_cwnd increase by MSS (actually, this
increase shouldn't happen as it's explicitly forbidden by RFC3390, but
it's another issue).  Snd_cwnd remains really small (1*MSS + 1) and this
causes really bad interaction with delayed acks on other side.

The variable name sc_rxmits is a bit misleading as it counts all transmits,
not just retransmits.

Submitted by:	Maxim Dounin <mdounin-at-mdounin-dot-ru>
MFC after:	10 days
2010-07-30 21:45:53 +00:00
Randall Stewart
753358d725 Fix the comment block that has the nice
table to really have the nice table :-)

MFC after:	1 month
2010-07-29 12:01:59 +00:00
Randall Stewart
44fbe46280 PR-SCTP bugs. Basically, a full-sized frame of
PR-SCTP FWD-TSNs would not be sent and thus
cause a stalled connection. Also, the rwnd
calculation was off on the receiver side for
PR-SCTP.
MFC after:	1 month
2010-07-29 11:37:04 +00:00
Gleb Smirnoff
b9bff254af Fix operation of "netgraph" action in conjunction with the
net.inet.ip.fw.one_pass sysctl.

The "ngtee" action is still broken.

PR:		kern/148885
Submitted by:	Nickolay Dudorov <nnd mail.nsk.ru>
2010-07-27 14:26:34 +00:00
Michael Tuexen
74e906fa94 Fix a bug where the length of a FORWARD-TSN chunk was set incorrectly in
the chunk. This resulted in malformed frames.
Remove a duplicate assignment.

MFC after: 2 weeks
2010-07-26 09:26:55 +00:00
Randall Stewart
8db924defb Make sure that we report chunks that were not sent if
the socket still exists. In either case, carefully remove
the data if it does not get taken by the reporting
routines.

MFC after:	2 weeks
2010-07-26 09:22:52 +00:00
Randall Stewart
6c065bbe06 When counting the number of chunks in the
retransmission queue to validate the retran count, we
need to include the chunks in the control send queue
too. Otherwise the count will not match and you will get
the INVARIANTS warning if INVARIANTS is enabled.

MFC after:	2 weeks
2010-07-26 09:20:55 +00:00
Lawrence Stewart
79848522b5 - Move common code from the hook functions that fills in a packet node struct to
  a separate inline function. This further reduces duplicate code that didn't
  have a good reason to stay as it was.

- Reorder the malloc of a pkt_node struct in the hook functions such that it
  only occurs if we managed to find a usable tcpcb associated with the packet.

- Make the inp_locally_locked variable's type consistent with the prototype of
  siftr_siftdata().

Sponsored by:	FreeBSD Foundation
2010-07-18 05:09:10 +00:00
Warner Losh
43e05a6523 machine/cpu.h isn't appropriate for this file, so remove it 2010-07-16 06:32:38 +00:00
Luigi Rizzo
71ad35a185 remove some conditional #ifdefs (no-op on FreeBSD);
run the timer routine on cpu 0.
2010-07-15 14:43:12 +00:00
Luigi Rizzo
297151a0f3 whitespace fixes 2010-07-15 14:37:59 +00:00
Luigi Rizzo
e6fef96ef4 fix a comment and final empty line 2010-07-15 14:37:02 +00:00
Lawrence Stewart
adc5f0109d The SIFTR DPCPU statistics struct was not being zeroed between enable/disable
cycles so the values would accumulate rather than reset for each cycle.

Sponsored by:	FreeBSD Foundation
2010-07-13 08:23:46 +00:00
Lawrence Stewart
985147dec6 Catch up with the rename of DPCPU_SUM to DPCPU_VARSUM in r209978.
Sponsored by:	FreeBSD Foundation
2010-07-13 07:00:57 +00:00
Gleb Smirnoff
281b584e8e Improve last commit: use bpf_mtap2() to avoid stack usage.
Prodded by:	julian
2010-07-09 11:27:33 +00:00
Gleb Smirnoff
a5f9fc17c2 Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with
a stub m_hdr instead of a full mbuf.

PR:		kern/148050
2010-07-08 13:07:40 +00:00
Randall Stewart
478fbccb67 This fixes a crash in SCTP. It was possible to have a
large number of packets queued to a crashing process.
In a specific case you may get 2 ABORTs back (from,
say, two packets in flight). If the aborts happened to
be processed at the same time, it's possible to have
one free the association while the other is trying
to report all the outbound packets. When this occurred
it could lead to a crash.

MFC after:	3 days
2010-07-03 14:03:31 +00:00
Lawrence Stewart
a5548bf685 Import the Statistical Information For TCP Research (SIFTR) kernel module into
FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log
file, providing the ability to make highly granular measurements of TCP
connection state. The tool is aimed at system administrators, developers and
researchers alike. Please take it for a spin and test it out - the man page
should have all the information required to get you going.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

Sponsored by:	Cisco URP, FreeBSD Foundation
Reviewed by:	dwmalone, gnn, rpaulo
Tested by:	Many on freebsd-current@ and elsewhere over the years
MFC after:	1 month
2010-07-03 13:32:39 +00:00
Randall Stewart
606c58db25 Fix a bug that WILL cause a panic. Basically,
a read lock is taken to check the vtag-timewait cache.
Then in two cases (where a vtag is bad, i.e. in the time-wait
state) the write-unlock is called, NOT the read-unlock. Under
conditions where lots of associations are coming and going
this will cause the system to panic at some point.

MFC after:	3 days
2010-07-02 09:53:26 +00:00
Gleb Smirnoff
24536f92c5 After processing the O_SKIPTO opcode our cmd points to the next rule, and
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.

To fix this, exit the inner loop with the continue operator forcibly and
explicitly.

PR:		kern/147798
2010-06-29 16:57:30 +00:00
Michael Tuexen
370d524f00 Fix a bug I introduced in r209470.
MFC after: 3 days
2010-06-24 07:43:25 +00:00
Michael Tuexen
749c49ac62 * Implement sctp_does_stcb_own_this_addr() correctly. It was taking the
wrong side into account.
* sctp_findassociation_ep_addr() must check the local address if available.
This fixes a bug where ABORT chunks were accepted even in the case where
the local address was not owned by the endpoint.
Thanks to brucec for pointing out a bug in my first version of the fix.
MFC after: 3 days
2010-06-23 15:19:07 +00:00
Michael Tuexen
cd1386ab50 Fix a race condition in the shutdown handling.
The race condition resulted in a panic.

MFC after: 3 days
2010-06-18 09:01:44 +00:00
Michael Tuexen
fc066a6137 * Fix a bug where the length of the ASCONF-ACK was calculated wrong due
to using an uninitialized variable.
* Fix a bug where a NULL pointer was dereferenced when interfaces
  come and go at a high rate.
* Fix a bug where inps were not deregistered from iterators.
* Fix a race condition in freeing an association.
* Fix a refcount problem related to the iterator.
Each of the above bugs results in a panic. It shows up when
interfaces come and go at a high rate.

Obtained from: rrs (partly)
MFC after: 3 days
2010-06-14 21:25:07 +00:00
Randall Stewart
ec4c19fcf0 3 Fixes -
a) There was a case where an ICMP message could cause
   us to return leaving a stuck lock on an stcb.
b) The iterator needed some tweaks to fix its lock
   ordering.
c) The ITERATOR_LOCK is no longer needed in the freeing
   of a stcb. Now that the timer based one is gone we don't
   have a multiple resume situation. Add to that that there
   was somewhere a path out of the freeing of an assoc that
   did NOT release the iterator_lock.. it was time to clean
   this old code up and in the process fix the lock bug.

MFC after:	1 week
2010-06-11 03:54:00 +00:00
Randall Stewart
41291ef07f Found by Michael. In cases where we run
out of memory (no more inp space) we don't
properly NULL the INP on return.

Obtained from:	tuexen
MFC after:	3 Days
2010-06-09 22:05:29 +00:00
Randall Stewart
b3a44e469d Fix several bugs all having to do with freeing an
sctp_inpcb:
1) Make sure not to remove the flag on the PCB until
   after the close() caller is back in control with the
   lock. Otherwise a quickly freeing assoc could kill the
   inpcb and cause a panic.

2) Make sure all calls to log_closing have not released
   the locks before calling the log function, we don't
   want the logging function to crash us due to a freed
   inpcb.

3) Make sure that when we get to the end, we release all
   locks (after removing them from view) and as long as
   we are NOT the inp-kill timer removing the inp, call
   the callout_drain() function so a racing timer won't
   later call in and cause a racing crash.
MFC after:	1 week
2010-06-09 16:42:42 +00:00
Randall Stewart
8dcde5165e BUG: It turns out we need to use both bit maps
to calculate the cum-ack (we were not doing
it for the NR-Sack case). With this fix
NR-sack should now work correctly.
MFC after:	1 week
2010-06-09 16:39:18 +00:00
Randall Stewart
9b2e0767e2 2 Bugs:
1) Only use both mapping arrays when NR sack is off. This
   way we can hold off moving the cumack (not the best but
   workable) when NR-sack is on.

2) We must make sure to just return on the move of the
   bit to the NR array if the cum-ack has already gone
   past the TSN. This prevents marking a bit behind the
   array and hitting the invariant code that panics us.

MFC after:	1 week
2010-06-08 03:39:31 +00:00
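Fix (2) depends on comparing TSNs with 32-bit serial arithmetic. Below is a minimal sketch, assuming RFC 1982-style comparison. `tsn_gt()` and `should_move_to_nr()` are hypothetical names, not the SCTP stack's own macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 1982-style "a is after b" for 32-bit TSNs (hypothetical name). */
static bool
tsn_gt(uint32_t a, uint32_t b)
{
	return (a != b && (uint32_t)(a - b) < 0x80000000u);
}

/*
 * Hypothetical guard: only move a TSN's bit to the NR array if the
 * TSN is still ahead of the cumulative ack; otherwise just return.
 */
static bool
should_move_to_nr(uint32_t tsn, uint32_t cumack)
{
	/* At or behind the cum-ack: the bit would land before the array. */
	return (tsn_gt(tsn, cumack));
}
```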
Randall Stewart
66bd30bd4f This fixes a BUG in the handling of the cum-ack calculation.
We were only paying attention to the nr-mapping-array, which
seems to make sense on the surface: by definition, things
up to the cum-ack should be deliverable and thus in the nr-mapping-array.
However (there is always a gotcha) that's not true when it
comes to large messages. The stack may hold the message
while reassembling it and not deliver it based on several
thresholds. If that happens (which it would for smaller
large messages) then the cum-ack is figured wrong. We
now properly use both arrays in the cum-ack calculation.

MFC after:	1 week.
2010-06-07 18:29:10 +00:00
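Here is a minimal sketch of the corrected cum-ack calculation. It assumes both mapping arrays are plain bitmaps indexed by `tsn - base_tsn`. The function and its layout are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: a TSN counts as received if its bit is set in
 * either map (renegable or non-renegable). The cum-ack is the last TSN
 * of the leading run of received TSNs, or base_tsn - 1 if none.
 */
static uint32_t
compute_cumack(const uint8_t *map, const uint8_t *nr_map, size_t nbytes,
    uint32_t base_tsn)
{
	size_t i;

	for (i = 0; i < nbytes * 8; i++) {
		uint8_t bit = (uint8_t)(1u << (i % 8));

		/* Stop at the first TSN missing from both arrays. */
		if (((map[i / 8] | nr_map[i / 8]) & bit) == 0)
			break;
	}
	/* i consecutive TSNs starting at base_tsn have been received. */
	return (base_tsn + (uint32_t)i - 1);
}
```

Suppose TSN 1000 sits only in the renegable map because it is held for reassembly. Looking at the NR map alone would then report a cum-ack of 999, while ORing the two maps gives the correct value.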
Randall Stewart
b9771f0404 Oops... my bad.. we don't need a SOCK_UNLOCK() after
calling socantrcvmore_locked() since it will unlock
the lock for you.

MFC after:	1 week
2010-06-07 11:33:20 +00:00
Randall Stewart
9ed1e280f6 Fix so we call socantrcvmore_locked so we
don't see a race where we unlock to call
the non-locked version and have the socket
go away.

MFC after:	1 week
2010-06-07 04:01:38 +00:00
Randall Stewart
8ce4a9a255 1) Optimize the cleanup and don't always depend on
   the timer. This is done by considering the locks
   we will destroy and if they are contended we consider
   it the same as a reference count being up. Fixing this
   appears to clean up another crash that was appearing with
   all the timers where the socket buf lock got corrupted.

2) Fix the sysctl code to take a lot more care when looking
   at INP's that are in the GONE or ALLGONE state.

MFC after:	1 week
2010-06-06 20:34:17 +00:00
Randall Stewart
0c7dc84076 Ok, yet another bug in killing off all the hundreds
of apitesters.. Basically we end up attempting
to destroy a lock that's contended on. A cookie echo
arrives at the same time that the close is happening.
The close gets the lock but the cookie echo has already
passed the check for the gone flag and is then locked
waiting on the create lock.. when we go to destroy it,
bam. For now we do the timer destroy for all calls
to close.. We can probably optimize this later so that
we check what's being contended on and if there is contention
then do the timer thing. But this is probably safest since
the inp has been removed from all lists and references and
only the timer can find it.. once the locks are released all
other places will instantly see the GONE flag and bail (that's
what the change in sctp_input is: one place that was lacking
the bail code).

MFC after:	1 week
2010-06-06 19:24:32 +00:00
Randall Stewart
faa1e3f4a9 1) Further enhance the INVARIANTS lock validation (that no locks
are held) by checking the create and inp locks as well.

2) Fix a bug: when a socket is closed and an INIT-ACK
   is returned, we now do NOT unlock the locked_tcb unless it's
   different (an unlikely scenario). If we blindly unlock as
   we were doing before, we can end up unlocking the actual
   stcb that's about to be sent down to the free function, which
   requires the lock be held.

MFC after:	1 week
2010-06-06 16:11:16 +00:00
Randall Stewart
7c82e9fa93 Fix a bug in sctp_inpcb_free. Basically, if the socket
was set up to do an abortive close, an association that was
in the accept_queue could get stuck and never freed. Now
we properly start the kill timer on the socket and turn
off the flag (same thing we do for the graceful close method).
MFC after:	1 week
2010-06-06 16:09:12 +00:00
Randall Stewart
3d7001cdcb Fix a bug in sctp_abort_assoc(). DON'T call the sctp_inpcb_free
when the gone flag is set. You don't know what locks the
caller has set and there is already a kill timer running.

MFC after:	1 week
2010-06-06 16:07:40 +00:00
Randall Stewart
2c6b25b4cd Hopefully this fixes a LOR by making it
so we only hold the iterator lock during
updates to the iterator's work.

MFC after:	1 week
2010-06-06 02:33:46 +00:00
Randall Stewart
a67294246e Bruce's fix for some returns in
error legs.

MFC after:	1 week
2010-06-06 02:32:20 +00:00
Randall Stewart
8e57327bbf Purge out a Windows def that somehow slipped
past the scrubber.

MFC after:	1 Week
2010-06-05 21:39:52 +00:00
Randall Stewart
1909799a4c Spacing issues
MFC after:	1 Week
2010-06-05 21:33:16 +00:00
Randall Stewart
aca14c2aa8 This change does the following:
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention
   to the RESEND marking on retransmitting control
   chunks.. and worse we were not decrementing the
   retran count that could cause us to loop forever.
3) Add in the validate_no_lock function on invariants
   so that we will really check all ways out to be sure
   a lock does not slip out locked.

MFC after:	1 week.
2010-06-05 21:27:43 +00:00
Randall Stewart
791437b51c Use the proper increment macro when increasing the
number on sent_queue_retran_cnt.

MFC after:	1 week
2010-06-05 21:22:58 +00:00
Randall Stewart
28085b2e10 This does two changes:
1) Makes it so that the INVARIANT function validate nolocks is
   available anywhere.
2) Fixes a BUG where a close has been done on a collision socket
   and the cookie processing would return leaving a lock held.
MFC after:	1 week
2010-06-05 21:20:28 +00:00
Randall Stewart
62fb761ff2 This fixes a bug in closing a socket that
had un-accepted assocs. Basically the assoc (and inp)
would get stuck and never get cleaned up.

MFC after:	1 week
2010-06-05 21:17:23 +00:00
Marko Zec
7c4b8137cd Virtualize the IPv4 multicast routing code.
Submitted by:	iprebeg
Reviewed by:	bms, bz, Pavlin Radoslavov
MFC after:	30 days
2010-06-02 15:44:43 +00:00
Qing Li
0ed6142b31 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
Randall Stewart
f751743351 This adds back the Iterator to the sctp
code base. We now properly have ONE thread
that services all VNET's. Also we purge out
the old timer based iterator code which had
multiple LORs and other issues.

MFC after:	3 days
2010-05-16 17:03:56 +00:00
Randall Stewart
ea9b0170bf Fix an old, long-standing bug in generating a
FWD-TSN. This would appear when more TSNs needed to be
skipped than fit in an mbuf.

MFC after:	3 days
2010-05-12 18:33:25 +00:00
Randall Stewart
83128708b4 More PR-SCTP bugs:
- Make sure that when you kick the streams you add correctly
  using a 16-bit unsigned.
- Make sure when sending out you allow FWD-TSN to skip over
  and list the ACKED chunks in the stream/seq list (so the
  rcv will kick the stream).
MFC after:	3 days
2010-05-12 18:00:15 +00:00
Michael Tuexen
091430c121 Get rid of unused constants.
MFC after: 3 days.
2010-05-12 16:10:33 +00:00
Randall Stewart
7898f4085c This fixes PR-SCTP issues:
- Slide the map at the proper place.
- Mark the bits in the nr_array ONLY if there
  is no marking.
- When generating a FWD-TSN, allow us to skip past
  ACKED chunks too.

MFC after:	1 week
2010-05-12 13:45:46 +00:00
Randall Stewart
88a7eb29d2 This fixes a bug with the one-2-one model socket when a
user sets up a socket to a server, sends data, and closes
the socket before the server has called accept(). It used
to NOT work at all. Now we add a flag to the assoc and
defer assoc cleanup so that the accept will succeed.
2010-05-11 17:02:29 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
7a657e630d Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by:	(ten 211.ru)
Tested by:	(ten 211.ru)
MFC after:	4 days
2010-04-27 15:07:08 +00:00
Bruce M Simpson
fd963b9929 Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.

Submitted by:	Matthew Luckie
MFC after:	3 days
2010-04-27 14:14:21 +00:00
Michael Tuexen
6dbd88581d Sending a FWDTSN chunk should not affect the retran count.
MFC after: 3 days.
2010-04-25 19:00:37 +00:00
Michael Tuexen
475d0674a6 Undo my latest fix since that wasn't one at all.
MFC after: 3 days.
2010-04-25 15:04:57 +00:00
Michael Tuexen
f31e6c7f26 * Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code
  which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the
  "number of retransmissions" policy.

MFC after: 3 days.
2010-04-23 08:19:47 +00:00
Bjoern A. Zeeb
1c044382c3 Avoid memory access after free. Use the (shortened) copy for the
ipsec mtu lookup as well.

PR:		kern/145736
Submitted by:	Peter Molnar (peter molnar.cc)
MFC after:	3 days
2010-04-21 10:21:34 +00:00
Michael Tuexen
ee94f0a272 Update highest_tsn variables when sliding mapping arrays. 2010-04-20 08:51:21 +00:00
Michael Tuexen
553aff12d4 Really print the nr_mapping array when it should be printed.
MFC after: 3 days.
2010-04-20 08:50:19 +00:00
Luigi Rizzo
6ba1ccc0f2 whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)
2010-04-19 16:17:30 +00:00
Kenneth D. Merry
3579cf4c4f Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was
causing TSO to break for the Xen netfront driver.

Reviewed by:	gibbs, rwatson
MFC after:	7 days
2010-04-19 15:15:36 +00:00
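The bug is the classic assignment-versus-OR mistake on a flags word. The sketch below uses hypothetical `EX_` flag values. The real CSUM_* constants live in sys/mbuf.h and are not reproduced here:

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define EX_CSUM_TCP	0x0002u
#define EX_CSUM_TSO	0x0020u

/* Buggy pattern: assignment drops CSUM_TCP and any other set flags. */
static uint32_t
set_tso_buggy(uint32_t csum_flags)
{
	(void)csum_flags;		/* previous flags are ignored */
	return (EX_CSUM_TSO);
}

/* Fixed pattern: OR in the TSO bit and keep existing flags. */
static uint32_t
set_tso_fixed(uint32_t csum_flags)
{
	return (csum_flags | EX_CSUM_TSO);
}
```

In the kernel, the word in question is the mbuf's `m_pkthdr.csum_flags`. A driver such as Xen netfront that expects the TCP checksum flag alongside TSO sees it vanish under the buggy pattern.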
Michael Tuexen
307b49efef Get delayed SACK working again.
MFC after: 3 days.
2010-04-19 14:15:58 +00:00
Michael Tuexen
37f144eb5d Fix a bug where SACKs are not sent when they should.
Move some protection code to INVARIANTS.
Cleanups.

MFC after: 3 days.
2010-04-17 12:22:44 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c, when freeing entire tables, make sure that in case we cancel
a pending callout we remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
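Here is a sketch of the reference-counting rule described above. It assumes the timer primitive reports whether it replaced a pending timer, as callout_reset(9) does by returning nonzero. The struct and helpers are hypothetical stand-ins, not the llentry API:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a link-layer entry and its timer. */
struct entry {
	int	refcnt;
	bool	timer_pending;
};

/* Models callout_reset(): true if a pending timer was replaced. */
static bool
timer_reset(struct entry *e)
{
	bool was_pending = e->timer_pending;

	e->timer_pending = true;
	return (was_pending);
}

/* Take a reference for the new timer; drop the cancelled timer's one. */
static void
entry_arm_timer(struct entry *e)
{
	e->refcnt++;
	if (timer_reset(e))
		e->refcnt--;
}

/* The timer fires: it owns exactly one reference and releases it. */
static void
entry_timer_fire(struct entry *e)
{
	e->timer_pending = false;
	e->refcnt--;
}
```

The invariant is that a pending timer always holds exactly one reference. Re-arming an already pending timer therefore leaves the count unchanged instead of leaking one.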
Bjoern A. Zeeb
0f08182a03 Try to help with a virtualized dummynet after r206428.
This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.
2010-04-10 22:11:01 +00:00
Rui Paulo
9c251892c0 Honor the CE bit even when the CWR bit is set.
PR:		145600
Submitted by:	Richard Scheffenegger <rs at netapp.com>
MFC after:	1 week
2010-04-10 12:47:06 +00:00
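Here is a sketch of one plausible shape for the receiver-side logic. It assumes, per RFC 3168, that CWR stops ECE echoing and that a CE mark starts it again. The function is hypothetical:

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: snd_ece says whether we keep setting ECE on
 * outgoing ACKs. Returns the updated value for one received segment.
 */
static bool
update_snd_ece(bool snd_ece, bool seg_has_cwr, bool ip_ce_marked)
{
	if (seg_has_cwr)
		snd_ece = false;	/* sender reacted; stop echoing */
	if (ip_ce_marked)
		snd_ece = true;		/* new congestion: echo even if CWR */
	return (snd_ece);
}
```

CWR is processed first and CE is checked independently afterwards. As a result, a segment carrying both still leaves ECE set.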
Bruce M Simpson
933fc4dde6 Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.

In inp_join_group():
 If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
 MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
 legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
 not violate POLA for apps using the old API.
 It also stops us falling through to kicking IGMP state transactions
 in what is otherwise a no-op case.
 [This has already been dealt with in HEAD, but make it explicit before
  we MFC the change to 8.]

In inp_leave_group():
 Fix a bogus conditional.
 Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
 where it actually belongs.
 If an interface was specified, by primary IPv4 address, for ioctl
 IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
 then and only then should we look up the ifp from the IPv4 address in
 mreqs.imr_interface.
 If not, we fall through to imo_match_group() as before, but only in
 the IP_DROP_MEMBERSHIP case.

With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.

Found by:	ermal (with pfSense)
MFC after:	3 days
2010-04-10 12:05:31 +00:00
Luigi Rizzo
368a605202 This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and I still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables); however, I first need
to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.
2010-04-09 18:02:19 +00:00
Luigi Rizzo
c11e54acfc no need to pass an argument to dn_compat_calc_size()
MFC after:	3 days
2010-04-09 16:06:53 +00:00
Luigi Rizzo
7f0de52d2c Hopefully fix the recent breakage in rule deletion.
A few more tests and this will also go into -stable where
the problem is more critical.
2010-04-07 08:23:58 +00:00
Michael Tuexen
aed5947cd0 Fix an off-by-one bug in zeroing out the mapping arrays.
Fix sctp_print_mapping_array().

MFC after: 1 week
2010-04-06 18:57:50 +00:00
Michael Tuexen
c1589eec14 Use also SCTP/IPv6 checksum offloading in special cases.
MFC after: 2 weeks
2010-04-03 23:51:41 +00:00
Michael Tuexen
b5c164935e * Fix some race conditions in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
  FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
2010-04-03 15:40:14 +00:00