the SIFTR pfil(9) hook functions to or from all network stacks. This patch
allows packets inbound or outbound from a vnet to be "seen" by SIFTR.
Additional work is required to allow SIFTR to actually generate log messages for
all vnet related packets because the siftr_findinpcb() function does not yet
search for inpcbs across all vnets. This issue will be fixed separately.
Reported and tested by: David Hayes <dahayes at swin edu au>
MFC after: 3 days
Retransmitted Packets
Zero Window Advertisements
Out of Order Receives
These statistics are available via the -T argument to
netstat(1).
MFC after: 2 weeks
vnets to select their own default CC algorithm independent of each other and the
base system. If the base system or a vnet has set a default which gets unloaded,
we reset that netstack's default to NewReno.
Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 3 months
is small, so there is no good reason not to declare the buffer at the top.
- Fix a whitespace nit.
Sponsored by: FreeBSD Foundation
MFC after: 11 weeks
X-MFC with: r215166
Any found to be using the algorithm that is about to go away are switched back
to NewReno to avoid leaving dangling pointers which would trigger a panic. For
VIMAGE kernels, there is a list per vnet to walk, yet the implementation was
only examining one of the vnet lists.
Fix the implementation of the above feature for VIMAGE kernels by looping
through all active TCP control blocks across all vnets.
Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 11 weeks
runs on boot and each time a vnet jail is created. Running cc_init() multiple
times results in a panic when attempting to initialise the cc_list lock again,
and so r215166 effectively broke the use of vnet jails.
Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded
on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in
this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is
initialised before module registration is attempted.
Sponsored by: FreeBSD Foundation
Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com>
MFC after: 11 weeks
X-MFC with: r215166
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.
This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.
Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..
Reviewed by: bz, rpaulo
MFC after: 3 weeks
the "sockarg" ipfw option matches packets associated to
a local socket and with a non-zero so_user_cookie value.
The value is made available as tablearg, so it can be used
as a skipto target or pipe number in ipfw/dummynet rules.
Code by Paul Joe, manpage by me.
Submitted by: Paul Joe
MFC after: 1 week
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/
- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.
- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.
- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.
- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.
- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the forseeable future. Implementations of
additional different algorithms will become available in the near future.
- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
that relies on the size of "struct tcpcb" is required.
Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo
Tested by: David Hayes (and many others over the years)
MFC after: 3 months
tree in preparation for another large code import. Swinburne University is the
legal entity that owns copyright and the 2-clause BSD licence is acceptable.
even if there is no route out to that mcast address. The code in
in_pcb inadvertantly would error (no route) even though
the user may have specified the address with the
proper socket option (to specify the egress interface).
Thanks bz for reminding me I forgot to commit this ;-)
Reviewed by: bz
MFC after: 1 week
function from the timer code to util, rename it appropriately and
also fix a bug in sctp_get_prev_mtu(), where calling it with a
value existing in the MTU table did not return a smaller one.
MFC after: 3 days.
r198301 itself. It also broke the logic of not sending more than one
ARP request per second, that consequently lead to a potential problem
of flooding network with broadcast packets.
MFC after: 1 week
legacy and IPv6 route destination address.
Previously in case of IPv6, there was a memory overwrite due to not enough
space for the IPv6 address.
PR: kern/122565
MFC After: 2 weeks
Make it harder to exploit certain in_control() related races between the
intiial lookup at the beginning and the time we will remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.
(*) It is believed that with the current code and locking strategy we
cannot completely fix all race.
Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after: 1 week
too coarse grained to be useful and the default value significantly degrades TCP
performance on moderate to high bandwidth-delay product paths with non-zero loss
(e.g. 5+Mbps connections across the public Internet often suffer).
Replace the outgoing mechanism with an individual per-queue limit based on the
number of MSS segments that fit into the socket's receive buffer. This should
strike a good balance between performance and the potential for resource
exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer
autotuning (which is enabled by default), the reassembly queue tracks the
socket buffer and benefits too.
As the XXX comment suggests, my testing uncovered some unexpected behaviour
which requires further investigation. By using so->so_rcv.sb_hiwat
instead of sbspace(&so->so_rcv), we allow more segments to be held across both
the socket receive buffer and reassembly queue than we probably should. The
tradeoff is better performance in at least one common scenario, versus a devious
sender's ability to consume more resources on a FreeBSD receiver.
Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo
MFC after: 2 weeks
"net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone
stats. The value returned by the cursegments sysctl is approximate owing to
the way in which uma_zone_get_cur is implemented.
- Discontinue use of V_tcp_reass_qsize as a global reassembly segment count
variable in the reassembly implementation. The variable was used without
proper synchronisation and was duplicating accounting done by UMA already. The
lack of synchronisation was particularly problematic on SMP systems
terminating many TCP sessions, resulting in poor TCP performance for
connections with non-zero packet loss.
Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo (as part of a larger patch)
MFC after: 2 weeks
This fixes the bug where setting bw > 1 MTU/tick resulted in
infinite bandwidth if io_fast=1
PR: 147245 148429
Obtained from: Riccardo Panicucci
MFC after: 3 days
net.inet.ip.fw.one_pass and always moved to the next rule
in case of a successful nat.
This should fix several related PR (waiting for feedback
before closing them)
PR: 145167 149572 150141
MFC after: 3 days
un-expiring.
The previous version of code have no locking when testing rt_refcnt.
The result of the lack of locking may result in a condition where
a routing entry have a reference count but at the same time have
RTPRF_OURS bit set and an expiration timer. These would eventually
lead to a panic:
panic: rtqkill route really not free
When the system have ICMP redirects accepted from local gateway
in a moderate frequency, for instance.
Commit this workaround for now until we have some better solution.
PR: kern/149804
Reviewed by: bz
Tested by: Zhao Xin, Pete French
MFC after: 2 weeks
not be used outside of the reassembly queue implementation. Provide a new
function to flush all segments from a reassembly queue and call it from the
appropriate places instead of manipulating the queue directly.
Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo
MFC after: 2 weeks
in the kernel (just as inet_ntoa() and inet_aton()) are and sync their
prototype accordingly with already mentioned functions.
Sponsored by: Sandvine Incorporated
Reviewed by: emaste, rstone
Approved by: dfr
MFC after: 2 weeks
separate the decision logic, of whether we can do TSO, and the
calculation of the burst length into two distinct parts.
Change the way the TSO burst length calculation is done. While
TSO could do bursts of 65535 bytes that can't be represented in
ip_len together with the IP and TCP header. Account for that and
use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
have the same value of 64K). When more data is available prevent
less than MSS sized segments from being sent during the current
TSO burst.
Add two more KASSERTs to ensure the integrity of the packets.
Tested by: Ben Wilber <ben-at-desync com>
MFC after: 10 days
to give way for the pluggable congestion control framework. It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.
In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.
In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI. It is always set to 0.
In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI. It is always set to 0.
These unused variable in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.
No MFC is planned. The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.
a small difference in the last paragraph though) as suggested by jhb.
Clarify that the 'reviewed by' in r212653 by lstewart was for the
functional change, not the comments in the committed version.
artificial power-of-2 rounded number to their real values specified
in RFC879 and RFC2460.
From the history and existing comments it appears that the rounded
numbers were intended to be advantageous for the kernel and mbuf
system. However this hasn't been the case at for at least a long
time. The mbuf clusters used in tcp_output() have enough space
to hold the larger real value for the default MSS for both IPv4 and
IPv6. Note that the default MSS is only used when path MTU discovery
is disabled.
Update and expand related comments.
Reviewed by: lsteward (including some word-smithing)
MFC after: 2 weeks
using ipproto_{un,}register() and the newly created ip6proto_{un,}register()
so that it can again receive IPPROTO_CARP packets allowing its state machine
to work.
Reviewed by: bz
Approved by: ken (mentor)
In protosw we define pr_protocol as short, while on the wire
it is an uint8_t. That way we can have "internal" protocols
like DIVERT, SEND or gaps for modules (PROTO_SPACER).
Switch ipproto_{un,}register to accept a short protocol number(*)
and do an upfront check for valid boundries. With this we
also consistently report EPROTONOSUPPORT for out of bounds
protocols, as we did for proto == 0. This allows a caller
to not error for this case, which is especially important
if we want to automatically call these from domain handling.
(*) the functions have been without any in-tree consumer
since the initial introducation, so this is considered save.
Implement ip6proto_{un,}register() similarly to their legacy IP
counter parts to allow modules to hook up dynamically.
Reviewed by: philip, will
MFC after: 1 week
pseudo-interface. This leads to a panic due to uninitialized
if_broadcastaddr address. Initialize it and implement ip_output()
method to prevent mbuf leak later.
ipfw pseudo-interface should never send anything therefore call
panic(9) in if_start() method.
PR: kern/149807
Submitted by: Dmitrij Tejblum
MFC after: 2 weeks
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.
MFC after: 4 weeks
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.
The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.
Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space. The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.
The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.
Approved by: bz (mentor)
it must reset its congestion window back to the initial window.
RFC3390 has increased the initial window from 1 segment to up to
4 segments.
The initial window increase of RFC3390 wasn't reflected into the
restart window which remained at its original defaults of 4 segments
for local and 1 segment for all other connections. Both values are
controllable through sysctl net.inet.tcp.local_slowstart_flightsize
and net.inet.tcp.slowstart_flightsize.
The increase helps TCP's slow start algorithm to open up the congestion
window much faster.
Reviewed by: lstewart
MFC after: 1 week
sysctl's and remove any side effects.
Both sysctl's share the same backend infrastructure and due to the
way it was implemented enabling net.inet.tcp.log_in_vain would also
cause log_debug output to be generated. This was surprising and
eventually annoying to the user.
The log output backend is kept the same but a little shim is inserted
to properly separate log_in_vain and log_debug and to remove any side
effects.
PR: kern/137317
MFC after: 1 week
number of syncache entries into account for the surplus we add to account
for a possible increase of records in the re-entry window.
Discussed with: jhb, silby
MFC after: 1 week
memory size estimate to userland for pcb list sysctls. The previous
behavior of a "slop" of n/8 does not work well for small values of n
(e.g. no slop at all if you have less than 8 open UDP connections).
Reviewed by: bz
MFC after: 1 week
path MTU discovery and the tcp_minmss limiter for very small MTU's.
When the MTU suggested by the gateway via ICMP, or if there isn't
any the next smaller step from ip_next_mtu(), is lower than the
floor enforced by net.inet.tcp.minmss (default 216) the value is
ignored and the default MSS (512) is used instead. However the
DF flag in the IP header is still set in tcp_output() preventing
fragmentation by the gateway.
Fix this by using tcp_minmss as the MSS and clear the DF flag if
the suggested MTU is too low. This turns off path MTU dissovery
for the remainder of the session and allows fragmentation to be
done by the gateway.
Only MTU's smaller than 256 are affected. The smallest official
MTU specified is for AX.25 packet radio at 256 octets.
PR: kern/146628
Tested by: Matthew Luckie <mjl-at-luckie org nz>
MFC after: 1 week
report when a new socket couldn't be created because one of
in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.
Logging is conditional on net.inet.tcp.log_debug being enabled.
MFC after: 1 week
and we loop back to 'again'. If the remainder is less or equal
to one full segment, the TSO flag was not cleared even though
it isn't necessary anymore. Enabling the TSO flag on a segment
that doesn't require any offloaded segmentation by the NIC may
cause confusion in the driver or hardware.
Reset the internal tso flag in tcp_output() on every iteration
of sendalot.
PR: kern/132832
Submitted by: Renaud Lienhart <renaud-at-vmware com>
MFC after: 1 week
a kernel printf to a log output with the priority of LOG_NOTICE.
This way the messages still show up in /var/log/messages but no
longer spam the console every other second on busy servers that
are port scanned:
"Limiting open port RST response from 114 to 100 packets/sec"
PR: kern/147352
Submitted by: Eugene Grosbein <eugen-at-eg sd rdtc ru>
MFC after: 1 week
It was experimental and interferes with the normal congestion control
algorithms by instating a separate, possibly lower, ceiling for the
amount of data that is in flight to the remote host. With high speed
internet connections the inflight limit frequently has been estimated
too low due to the noisy nature of the RTT measurements.
This code gives way for the upcoming pluggable congestion control
framework. It is the task of the congestion control algorithm to
set the congestion window and amount of inflight data without external
interference.
Reviewed by: lstewart
MFC after: 1 week
Removal after: 1 month
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.
Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.
This commit requires IP6PROTOSPACER, introduced in r211115.
Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks
interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at
cold boot. This behavior is not observed when carp(4) interface
is created slightly later, when the underlying interface is fully
up.
Before this change what happen at boot is roughly:
- ifconfig creates em0 interface;
- ifconfig clones a carp device using em0;
(em0's link state is DOWN at this point)
- carp state: INIT -> BACKUP [*]
- carp state: BACKUP -> MASTER
- [Some negotiate between em0 and switch]
- em0 kicks up link state change event
(em0's link state is now up DOWN at this point)
- do_link_state_change() -> carp_carpdev_state()
- carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
- carp state: INIT -> BACKUP
- carp state: BACKUP -> MASTER
At the [*] stage, em0 did not received any broadcast message from other
node, and assume our node is the master, thus carp(4) sets the link
state to "UP" after becoming a master. At [+], the master status
is forcely set to "INIT", then an election is casted, after which our
node would actually become a master.
We believe that at the [*] stage, the master status should remain as
"INIT" since the underlying parent interface's link state is not up.
Obtained from: iXsystems, Inc.
Reported by: jpaetzel
MFC after: 2 months
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.
In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer. It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.
Reviewed by: rwatson
MFC after: 6 days
Free the rtentry after we diconnected it from the FIB and are counting
it as rttrash. There might still be a chance we leak it from a different
code path but there is nothing we can do about this here.
Sponsored by: ISPsystem (in February)
Reviewed by: julian (in February)
MFC after: 2 weeks
was limited to one segment under the faulty assumption of a retransmit.
Due to this the opportunity to initialize the increased congestion window
according to RFC3390 was missed.
Support for RFC3465 introduced in r187289 uncovered the bug as the ACK
to SYN/ACK no longer caused snd_cwnd increase by MSS (actually, this
increase shouldn't happen as it's explicitly forbidden by RFC3390, but
it's another issue). Snd_cwnd remains really small (1*MSS + 1) and this
causes really bad interaction with delayed acks on other side.
The variable name sc_rxmits is a bit misleading as it counts all transmits,
not just retransmits.
Submitted by: Maxim Dounin <mdounin-at-mdounin-dot-ru>
MFC after: 10 days
PR SCTP FWD-TSN's would not be sent and thus
cause a stalled connection. Also the rwnd
Calculation was also off on the receiver side for
PR-SCTP.
MFC after: 1 month
retransmission queue to validate the retran count, we
need to include the chunks in the control send queue
too. Otherwise the count will not match and you will get
the invarient warning if invarients are on.
MFC after: 2 weeks
a separate inline function. This further reduces duplicate code that didn't
have a good reason to stay as it was.
- Reorder the malloc of a pkt_node struct in the hook functions such that it
only occurs if we managed to find a usable tcpcb associated with the packet.
- Make the inp_locally_locked variable's type consistent with the prototype of
siftr_siftdata().
Sponsored by: FreeBSD Foundation
large number of packets queued to a crashing process.
In a specific case you may get 2 ABORT's back (from
say two packets in flight). If the aborts happened to
be processed at the same time its possible to have
one free the association while the other is trying
to report all the outbound packets. When this occured
it could lead to a crash.
MFC after: 3 days
FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log
file, providing the ability to make highly granular measurements of TCP
connection state. The tool is aimed at system administrators, developers and
researchers alike. Please take it for a spin and test it out - the man page
should have all the information required to get you going.
Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: dwmalone, gnn, rpaulo
Tested by: Many on freebsd-current@ and elsewhere over the years
MFC after: 1 month
a read-lock is being called to check the vtag-timewait cache.
Then in two cases (where a vtag is bad i.e. in the time-wait
state) the write-unlock is called NOT the read-unlock. Under
conditions where lots of associations are coming and going
this will cause the system to panic at some point.
MFC after: 3 days
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.
To fix this, exit the inner loop with the continue operator forcibly and
explicitly.
PR: kern/147798
wrong side into account.
* sctp_findassociation_ep_addr() must check the local address if available.
This fixes a bug where ABORT chunks were accepted even in the case where
the local was not owned by the endpoint.
Thanks to brucec for pointing out a bug in my first version of the fix.
MFC after: 3 days
to using an uninitialized variable.
* Fix a bug where a NULL pointer was dereferenced when interfaces
come and go at a high rate.
* Fix a bug where inps where not deregistered from iterators.
* Fix a race condition in freeing an association.
* Fix a refcount problem related to the iterator.
Each of the above bug results in a panic. It shows up when
interfaces come and go at a high rate.
Obtained from: rrs (partly)
MFC after: 3 days
a) There was a case where a ICMP message could cause
us to return leaving a stuck lock on an stcb.
b) The iterator needed some tweaks to fix its lock
ordering.
c) The ITERATOR_LOCK is no longer needed in the freeing
of a stcb. Now that the timer based one is gone we don't
have a multiple resume situation. Add to that that there
was somewhere a path out of the freeing of an assoc that
did NOT release the iterator_lock.. it was time to clean
this old code up and in the process fix the lock bug.
MFC after: 1 week
sctp_inpcb:
1) Make sure not to remove the flag on the PCB until
after the close() caller is back in control with the
lock. Otherwise a quickly freeing assoc could kill the
inpcb and cause a panic.
2) Make sure all calls to log_closing have not released
the locks before calling the log function, we don't
want the logging function to crash us due to a freed
inpcb.
3) Make sure that when we get to the end, we release all
locks (after removing them from view) and as long as
we are NOT the inp-kill timer removing the inp, call
the callout_drain() function so a racing timer won't
later call in and cause a racing crash.
MFC after: 1 week
1) Only use both mapping arrays when NR sack is off. This
way we can hold off moving the cumack (not the best but
workable) when NR-sack is on.
2) We must make sure to just return on the move of the
bit to the NR array if the cum-ack as already went
past the TSN. This prevents marking a bit behind the
array and hitting the invariant code that panic's us.
MFC after: 1 week
We were only paying attention to the nr-mapping-array. Which
seems to make sense on the surface, by definition things
up to the cum-ack should be deliverable thus in the nr-mapping-array.
However (there is always a gotcha) thats not true when it
comes to large messages. The stack may hold the message
while re-assembling it not not deliver it based on several
thresholds. If that happens (which it would for smaller
large messages) then the cum-ack is figured wrong. We
now properly use both arrays in the cum-ack calculation.
MFC after: 1 week.
the timer. This is done by considering the locks
we will destroy and if they are contended we consider
it the same as a reference count being up. Fixing this
appears to cleanup another crash that was appearing with
all the timers where the socket buf lock got corrupted.
2) Fix the sysctl code to take a lot more care when looking
at INP's that are in the GONE or ALLGONE state.
MFC after: 1 week
of apitesters.. Basically we end up with attempting
to destroy a lock thats contended on. A cookie echo
arrives at the same time that the close is happening.
The close gets the lock but the cookie echo has already
passed the check for the gone flag and is then locked
waiting on the create lock.. when we go to destroy it
bam. For now we do the timer destroy for all calls
to close.. We can probably optimize this later so that
we check whats being contended on and if there is contention
then do the timer thing. but this is probably safest since
the inp has been removed from all lists and references and
only the timer can find it.. once the locks are released all
other places will instantly see the GONE flag and bail (thats
what the change in sctp_input is one place that was lacking
the bail code).
MFC after: 1 week
held by checking the create and inp locks as well.
2) Fix a bug in that when a socket is closed an INIT-ACK
is returned, we do NOT unlock the locked_tcb unless its
different (an unlikely scenario). If we blindly unlock as
we were doing before we can end up unlocking the actual
stcb thats about to be sent down to the free function which
requires the lock be held.
MFC after: 1 week
was setup to do an abortive close an association that was
in the accept_queue could get stuck and never freed. Now
we properly start the kill timer on the socket and turn
off the flag (same thing we do for the graceful close method).
MFC after: 1 week
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention
to the RESEND marking on retransmitting control
chunks.. and worse we were not decrementing the
retran count that could cause us to loop forever.
3) Add in the valdiate_no_lock function on invariants
so that we will really check all ways out to be sure
a lock does not slip out locked.
MFC after: 1 week.