Commit Graph

3050 Commits

Author SHA1 Message Date
Randall Stewart
9ceab0faf0 - When a SCTP socket is closed, but the last data
SACK is lost, we would incorrectly abort the association
  instead of retransmitting the SACK.
Approved by:	re@freebsd.org (Ken Smith)
2007-06-29 15:14:23 +00:00
Randall Stewart
97c76f10a0 - Update bindx address checking to properly screen out address
per the socket api, adding port validation. We allow port 0
  or the already bound port number and no others.

Approved by:	re@freebsd.org (Ken Smith)
2007-06-25 19:05:26 +00:00
Randall Stewart
a964e8de4c - Fix type casts in calling sctp_m_getptr, it expects a int not
an unsigned (returned by sizeof) also add cast to  comparison check
  for size bounds.
Approved by:	re(bmah@freebsd.org)
2007-06-22 14:40:09 +00:00
Randall Stewart
671d309c7c - Fix stream reset so it limits the number of streams that can be listed
- Fix fwd-tsn to use proper accessor so it does not overrun mbufs
- Fix stream reset error reporting to actually work (it has always been
  broken if the peer rejects a stream reset)
- Some 64 bit friendly changes

Approved by:	re(bmah@freebsd.org)
2007-06-22 13:50:56 +00:00
Randall Stewart
ea1fbec59a - Two more static analisys bugs found by cisco's tool on a subsequent
run.
2007-06-18 22:36:52 +00:00
Randall Stewart
eacc51c5b6 - Fixes cstatic issues found by cisco sa tool (missing frees and such
on error legs)
- align sctp_sockstore to 64 bit boundary ..
2007-06-18 21:59:15 +00:00
Maxim Konovalov
d069a5d478 o Make ipfw set more robust -- now it is possible:
- to show a specific set: ipfw set 3 show
    - to delete rules from the set: ipfw set 9 delete 100 200 300
    - to flush the set: ipfw set 4 flush
    - to reset rules counters in the set: ipfw set 1 zero

PR:		kern/113388
Submitted by:	Andrey V. Elsukov
Approved by:	re (kensmith)
MFC after:	6 weeks
2007-06-18 17:52:37 +00:00
Randall Stewart
d95ddf0251 Add additional logging level mask for packet_logging too. 2007-06-18 13:57:37 +00:00
Randall Stewart
19d8ca2eaf - The packet log needs to copy all of the buffer not to the end. 2007-06-17 23:43:37 +00:00
Randall Stewart
75298de2a0 Back out last change to inpcb_free. Turns out we need
to hold off freeing if there is data pending ... someone
might do send/close. Which means we want the data to
go and then close it after startup. Added comments to
the code as well to note that this is done for a reason.
2007-06-17 19:27:46 +00:00
Matt Jacob
cce418d3bf Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0)
codepath.
2007-06-17 04:07:11 +00:00
Randall Stewart
e42a0f5e72 - For sctp_input/sctp6_input add announcment when a packet arrives (debug)
- re-factor the packet drop in sctp_output a bit more, we don't need the
   trim after all, but the size calc is now corrected.
 - When a assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user
   closes, it should not matter if data is queued, the assoc should be
   purged.
 - In error leg a missing free_chunk when iph comes in NULL (should not
   happen but just in case).
2007-06-17 01:36:02 +00:00
Matt Jacob
27d65ef267 Replace incorrect local OFFSET_OF macro with the correct and generic
offsetof macro.
2007-06-17 00:33:34 +00:00
Matt Jacob
fbdd20a1ae Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr
to nonzero you fulfill the same function as the variable 'cmp'. so you
might as well zero match and test against it later.

Reviewed by:	timeout on review request
2007-06-17 00:31:24 +00:00
Randall Stewart
ca2cc3feac - Better handle sending large pkt-drops. We were not triming
the data with m_adj if a large pkt arrived with a bad csum
  some systems can't handle you not triming the tail (think panda :-D)
2007-06-16 14:03:15 +00:00
Randall Stewart
48dabb921d - Raise max range of sctp_logging sysctl so panda does not disallow
us to turn on logging levels.
2007-06-16 03:28:18 +00:00
Randall Stewart
72fb6fdb41 - Matthew's changes to get inlines out, plus a few of my own
to deal with the VRF inline function -> becomes a macro now.
Submitted by:	Matthew Jacobs
2007-06-16 00:33:47 +00:00
Matt Jacob
3c010a416c Garbage collect some debug code that not only no longer could
work but in fact probably causes a random pointer dereferences.
Garbage collect the tp variable too.
2007-06-15 22:54:11 +00:00
Randall Stewart
b9e7085a57 Name change SCTP_KTR_SUBSYS -> KTR_SCTP 2007-06-15 20:54:12 +00:00
Randall Stewart
0a374fd92a Remove extraneous extern (its gotten from sctp_sysctl.h) 2007-06-15 20:23:41 +00:00
Randall Stewart
cba882dfcc When removing a stream from the output-stream-wheel, if its the
first stream we saw we must update the starting point in the
wheel, else we may loop in an endless loop.
2007-06-15 19:49:13 +00:00
Randall Stewart
e1461651a4 - Update the comment lines in sctp_input.c
- We need to init the INP_LOCK since otherwise for
  non-SMP kernels you crash when you set the TOS.
2007-06-15 19:28:58 +00:00
Bruce M Simpson
f64a3b042a Stub out imported IGMPv3 definitions which clash with those of
the XORP router; the IGMPv3 definitions will be updated at a later
point in time when IGMPv3/MLDv2 support is fully merged.
2007-06-15 18:59:10 +00:00
Randall Stewart
458303da65 - Issue one, new stack reduction left packet_drop handling still
thinking it had the whole chunk. This could cause a crash if
  a large packet drop came in. Fixed by adjusting the trunc length
  down to the limit.
- Large sacks with lots of segments could also have same issue. Changed
  duplicate and segment handling to use proper get_m_ptr function to
  pull each block from mbuf chains.
2007-06-15 17:59:57 +00:00
Randall Stewart
22a6719709 - Add VRF id to sctp_ifa structure, needed mainly in panda but useful
during deletes of ifa's in diff VRF's when applicable.
2007-06-15 03:16:48 +00:00
Randall Stewart
629b8f3e0f KTR_GEN -> KTR_SUBSYS (for Kris). 2007-06-15 02:34:36 +00:00
Randall Stewart
80fefe0a08 - Fix so ifn's are properly deleted when the ref count goes to 0.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
  normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
  of the logging #ifdef's with a few exceptions reducing
  the number of config options for SCTP.
2007-06-14 22:59:04 +00:00
Randall Stewart
db4fd95b0e - fix bindx to check addresses against socket's protocol family 2007-06-13 14:39:41 +00:00
Robert Watson
2281b8f054 Remove IPX over IP tunneling support, which allows IPX routing over IP
tunnels, and was not MPSAFE.  The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.

This removes one of five remaining consumers of NET_NEEDS_GIANT.

Approved by:	re (kensmith)
2007-06-13 14:01:43 +00:00
Randall Stewart
9a97252585 - Fixed cookie handling to calc an RTO when
its an INIT collision case.
- Fixed RTO calc to maintain a seperate variable to track
  if a RTO calc as been done, this allows the RTO var to be
  doubled during initial timeouts.
- Reduces the amount of stack used by process control.
- Use a constant for the peer chunk overhead.
- Name change to spell candidate correctly.
2007-06-13 01:31:53 +00:00
Bruce M Simpson
71498f308b Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
Randall Stewart
35918f8571 - Restructure so bindx functions are not done inline to socket option
but are a seperate call that can be re-used if needed.
- 64 bit issues
  o re-arrange cookie so it is better 64 bit aligned
  o For wire level things we need the packed attribute.
2007-06-12 11:21:00 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Andre Oppermann
f194524fb1 Fix a case in tcp_do_segment() where tcp_update_sack_list() would
be called with an incorrect segment end value.  tcp_reass() may
trim segments when they overlap with already existing ones in the
reassembly queue.  Instead of saving the segment end value before
the call to tcp_reass() compute it on the fly based on the effective
segment length afterwards.

This bug was not really problematic as no information got lost and
the eventual SACK information computation was correct nontheless.

MFC after:	1 week
2007-06-10 21:07:21 +00:00
Andre Oppermann
e8949f7407 Fix style for comments, be more verbose and add some more. 2007-06-10 20:59:22 +00:00
Andre Oppermann
104ebb2a45 Make the handling of the tcp window explicit for the SYN_SENT case
in tcp_outout().  This is currently not strictly necessary but paves
the way to simplify the entire SYN options handling quite a bit.
Clarify comment.  No change in effective behavour with this commit.

RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
2007-06-09 21:19:12 +00:00
Andre Oppermann
5396d0f8d8 Remove some bogosity from the SYN_SENT case in tcp_do_segment
and simplify handling of the send/receive window scaling.  No
change in effective behavour.

RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.

Noticed by:	yar
2007-06-09 21:09:49 +00:00
Andre Oppermann
b7de7d87a0 Don't send pure window updates when the peer has closed the connection
and won't ever send more data.
2007-06-09 19:39:14 +00:00
Andre Oppermann
f58747375d Handle a race condition on >2 core machines in tcp_timer() when
a timer issues a shutdown and a simultaneous close on the socket
happens.  This race condition is inherent in the current socket/
inpcb life cycle system but can be handled well.

Reported by:	kris
Tested by:	kris (on 8-core machine)
2007-06-09 17:49:39 +00:00
Randall Stewart
2bf083e4c9 - Opps.. takes out debug printfs I accidentally left in :-( 2007-06-09 13:53:27 +00:00
Randall Stewart
d00aff5d79 - fix send_failed notification contents
- Reorder send failed to be in correct order.
- Fixed calulation of init-ack to be right off
  mbuf lengths instead of the precalculated value. This
  will fix one 64 bit platform issue.
2007-06-09 13:46:57 +00:00
Yaroslav Tykhiy
22b971db87 Replace a constant with an already defined symbolic name for it.
Tested with: md5(1)
2007-06-08 13:43:28 +00:00
Yaroslav Tykhiy
dba3c50842 Add a sysctl for the purge run interval so that it can
be tuned along with the rest of hostcache parameters.
The new sysctl name is `net.inet.tcp.hostcache.prune'.
2007-06-08 13:35:51 +00:00
Randall Stewart
108df27c0b - RTO was not being initialized to 0, thus the rtt calculation
algoritm would not go through the proper initialization.
- The initialization was incorrect as well, causing problems in
  sat networks with > 1sec RTT
- Get rid of magic numbers in RTT calculations.
2007-06-08 10:57:11 +00:00
Andre Oppermann
45024be06f In tcp_hc_insert() we may have the case where we have hit the global
cache size limit but this bucket row is empty.  Normally we want to
recycle the oldest entry in the bucket row.  If there isn't any the
TAILQ_REMOVE leads to a panic by trying to remove a non-existing
element.  Fix this by just returning NULL and failing the insert.
This is not a problem as the TCP hostache is only advisory.

Submitted by:	jhb
2007-06-07 21:41:50 +00:00
Andre Oppermann
1f939165ce Correctly print SEQ and IRS in the corresponding log message in
syncache_expand().
2007-06-06 22:10:12 +00:00
Gleb Smirnoff
e9bf9fb67c Do not leak lock in the case of EEXIST error.
PR:		kern/92776
Submitted by:	Ed Schouten <Ed.Schouten tunix.nl>
2007-06-06 14:21:49 +00:00
Randall Stewart
5f26a41d17 - Fixes a case where doing a sysctl would leave locks held
when coping out association data.
- Fixes a small bug that prevented the SCTP_UNORDERED indication
  from going up to the app on a recv in the sinfo_flags field.
2007-06-06 00:40:41 +00:00
David Malone
041b706b2f Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
Randall Stewart
f4c93d2405 - fix initial pcb vrf setting when the initial vrf is not the
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
  unique across systems (including different VRFs) this makes it easier
  to do ifn lookups.
2007-06-02 11:05:08 +00:00
Randall Stewart
ad21a36485 - Take out the broken table-id concept. Panda Routers have a M-VRF
concept that is NOT well thought out for a multi-homed transport
  protocol. So the useless table-id entries passed around need to
  be removed.
- Add a event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
  with the largest ssthresh (the compare was wrong).
2007-06-01 11:19:54 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Robert Watson
abc7d91030 (1) In tcp_usrclosed(), tp can never become NULL, so don't test for NULL
before handling the socket disconnection case.

(2) Clean up surrounding comments and formatting.

Found with:	Coverity Prevent(tm) (1)
CID:		2203
2007-05-31 12:06:02 +00:00
Randall Stewart
4c9179ad6c - Fixed (Apple) compiler warnings in sctp_input.c, sctputil.c, sctp_output.c
- Fixed a LOR in handling a cookie. Turns out create lock is applied.
  And if we abort processing, this causes LOR. Changed to force the
  timer to clean up, that way create lock is released.
2007-05-30 22:34:21 +00:00
Randall Stewart
0696e1203e - Fix a memory overwrite when the mapping array
is expanded, size of expansion was not taken int consideration.
-  Fix so vtag hash is 1 bigger so that it modulo's out
   correctly, avoids a panic when restart with right modulo happens.
-  do not dereference stcb when control->do_not_ref_stcb is set
-  Fix up packet logging to not often use a lock and also to
   add to options.
-  Fix some logging option duplication in the sctputil.h
2007-05-30 17:39:45 +00:00
Randall Stewart
3c6f353630 Adds gcc attribute to prevent inlining of a function. If
it goes inline we may well blow the stack if witness and
such are enabled.
2007-05-29 14:17:47 +00:00
Randall Stewart
6b4ae3566a - Fix spelling errors in comments per Ruslan (.. thanks... ) 2007-05-29 11:53:27 +00:00
Randall Stewart
207304d4b7 - Fixes so we won't try to start a timer when we
hold a wq lock for the iterator. Panda uses a
  silly recursive lock they hold through the timer.
- Add poor mans wireshark compile option..
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.

Reviewed by:	gnn
2007-05-29 09:29:03 +00:00
Andre Oppermann
8d573cc158 Make log messages more verbose and simpler to understand for non-experts.
Update comments to be more conscious, verbose and fully reflect reality.
2007-05-28 23:27:44 +00:00
Andre Oppermann
e885b205c6 Fix indentation of the syncache_expand() section in tcp_input(). 2007-05-28 11:35:40 +00:00
Randall Stewart
d61a0ae066 - fixed autclose to not allow setting on 1-2-1 model.
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
  no chance of a endless loop. Only one call to random per bind
  as well.
- fixes so set_peer_primary pre-screens addresses to be
  valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
  to thus track and use an association value instead of a inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on off not
  the timer.. since the timer may still run if unconf address
  are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
  any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
  taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
  of the non-pad version. This saves about 92 bytes from each struct
  by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
  tcb's value.
- The delayed ack time could be under a tick, this fixes so
  it bounds it to at least 1 tick for platforms whos tick
  is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Defered stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
  had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
  not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details

Reviewed by:	gnn
2007-05-28 11:17:24 +00:00
Andre Oppermann
a160e6302c Refactor and rewrite in parts the SYN handling code on listen sockets
in tcp_input():

 o tighten the checks on allowed TCP flags to be RFC793 and
   tcp-secure conform
 o log check failures to syslog at LOG_DEBUG level
 o rearrange the code flow to be easier to follow
 o add KASSERTs to validate assumptions of the code flow

Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enable
that controls the behavior on socket creation failure for a otherwise
successful 3-way handshake.  The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits.  The
sysctl allows to chose between two options to deal with this.  One is
to send a reset to the other endpoint to notify it about the failure
(default).  The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.

Reviewed by:	rwatson (in general)
2007-05-28 11:03:53 +00:00
Robert Watson
e487a5e2a0 Normalize spelling and grammar in TCP hostcache comments. 2007-05-27 19:39:26 +00:00
Robert Watson
c214db75f2 In tcp_timer_2msl(), tp can never become NULL, so don't check it for
NULL before entering tcp_trace().

Found with:	Coverity Prevent(tm)
CID:		1840
2007-05-27 17:52:02 +00:00
Robert Watson
b312d4b0ba Don't assign sp to the value of s when we're about to assign it instead to
s + strlen(s).

Found with:	Coverity Prevent(tm)
CID:		2243
2007-05-27 17:02:54 +00:00
Andre Oppermann
faedb66c2a The printf %b list in PRINT_TH_FLAGS has to be in octal numbering.
Thus convert \8 to \10 and the warnings go away.

Pointed out by:	sam, ru, thompsa
2007-05-25 21:28:49 +00:00
Andre Oppermann
a250f3820c Add CWR back into the PRINT_TH_FLAGS list as gcc42 doesn't complain
about \8 in a string anymore.
2007-05-23 19:16:21 +00:00
Andre Oppermann
ec05a17370 In tcp_log_addrs():
o add the hex output of the th_flags field to the example log
   line in comments
 o simplify the log line length calculation and make it less
   evil
 o correct the test for the length panic; the line isn't on
   the stack but malloc'ed
2007-05-23 19:07:53 +00:00
Andre Oppermann
d2ddf5d4b0 Be more restrictive with segment validity checks in syncache_expand()
and log check failures to syslog at LOG_DEBUG level.

Always prefill the sc->sc_ts field to use it in the checks.
2007-05-18 21:42:25 +00:00
Andre Oppermann
5df429a002 o Add syslog logging under LOG_DEBUG to various failures caused by
bogus segments
o Add more KASSERT()s
o Update comments
2007-05-18 21:13:01 +00:00
Andre Oppermann
df541e5fc1 Add tcp_log_addrs() function to generate and standardized TCP log line
for use thoughout the tcp subsystem.

It is IPv4 and IPv6 aware creates a line in the following format:

 "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"

A "\n" is not included at the end.  The caller is supposed to add
further information after the standard tcp log header.

The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use.  All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.

Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr) have to be supplied.

Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be casted and passed
as void * pointers.

tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
    void *ip6hdr)

Usage example:

 struct ip *ip;
 char *tcplog;

 if (tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) {
	log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
	    tcplog, __func__);
	free(s, M_TCPLOG);
 }
2007-05-18 19:58:37 +00:00
John Baldwin
0ba5d2eedb Fix statistical accounting for bytes and packets during sack retransmits.
MFC after:	1 week
Submitted by:	mohans
2007-05-18 19:56:24 +00:00
JINMEI Tatuya
187069853c - Disabled responding to NI queries from a global address by default as
specified in RFC4620.  A new flag for icmp6_nodeinfo was added to enable the
  feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
  flags is clearer (i.e., defined specific macro names instead of using
  hard-coded values).

Approved by:	gnn (mentor)
MFC after:	1 week
2007-05-17 21:20:24 +00:00
Randall Stewart
3c503c28da - Fixed 1-2-1 model to not worry about associd in sockopts
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
  it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
  checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if its 0, On the assoc
  level set we changed it to 0 opps.
- Two more static analysis issues identified by the cisco
  tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
  input argument.
- When sending failed due to a no route to host, we leaked
  the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by:	gnn
2007-05-17 12:16:24 +00:00
Oleg Bulyzhin
7e17f8b864 Unbreak IPv4 kernel build. 2007-05-17 00:05:13 +00:00
Robert Watson
6751f8364e Remove leading spaces before tabs spotted thanks to silby using
kwrite to read ip_input.c.
2007-05-16 20:46:58 +00:00
Andre Oppermann
abb91d889a Remove now unused stuff forgotten in the previous commit. 2007-05-16 17:55:22 +00:00
Andre Oppermann
2104448fe7 Move TIME_WAIT related functions and timer handling from files
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:

 tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()

 tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
 tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
 tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

This is a mechanical move with appropriate renames and making
them static if used only locally.

The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
2007-05-16 17:14:25 +00:00
David Malone
39629c92cc When verifying the IPv4 UDP checksum, don't overwrite the checksum
value in the mbuf with the result of the calculation. Previously,
if we chose to return an ICMP message, the quoted UDP checksum bytes
would be different to what was sent.

PR:		112471
Submitted by:	Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after:	3 weeks
2007-05-16 09:12:16 +00:00
Andre Oppermann
ec9c755352 Complete the (mechanical) move of the TCP reassembly and timewait
functions from their origininal place to their own files.

TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait   from tcp_subr.c  -> tcp_timewait.c
2007-05-13 22:16:13 +00:00
Andre Oppermann
57615c7e86 Drop everything that doesn't belong into this new file.
It's neither functional not connected to the build yet.
2007-05-11 21:17:53 +00:00
Andre Oppermann
1433541aa4 Drop everything that doesn't belong into this new file.
It's neither functional nor connected to the build yet.
2007-05-11 21:04:57 +00:00
Andre Oppermann
0489b64c5e Make the TCP timer callout obtain Giant if the network stack is marked
as non-mpsafe.

This change is to be removed when all protocols are mp-safe.
2007-05-11 20:52:47 +00:00
Andre Oppermann
504abdc6e6 Add the timestamp offset to struct tcptw so we can generate proper
ACKs in TIME_WAIT state that don't get dropped by the PAWS check
on the receiver.
2007-05-11 18:29:39 +00:00
Robert Watson
632bbf0f5b Coalesce two identical UCB licenses into a single license instance with
one set of copyright years.

White space and comment cleanup.

Export $FreeBSD$ via __FBSDID.
2007-05-11 11:21:43 +00:00
Robert Watson
b34aab2337 Minor white space and style cleanups. 2007-05-11 11:05:30 +00:00
Robert Watson
c59b9aa51b White space and style cleanup. 2007-05-11 11:00:48 +00:00
Robert Watson
d22e451d5b Minor white space/style normalization. 2007-05-11 10:50:31 +00:00
Robert Watson
4d41cc2fe6 Normalize style a bit: reduce pseudo-randomness of comment layout and
white space.  Remove 'register'.
2007-05-11 10:48:30 +00:00
Robert Watson
54d642bbe5 Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
2007-05-11 10:20:51 +00:00
Robert Watson
169db7b25d Remove unneeded wrappers for in_setsockaddr() and in_setpeeraddr(), which
used to exist so pcbinfo locks could be acquired, but are no longer
required as a result of socket/pcb reference model refinements.
2007-05-11 09:54:53 +00:00
Andre Oppermann
4b8e42baab Fix an incorrect replace of a timer reference made during the TCP timer
rewrite in rev. 1.132.  This unmasked yet another bug that causes certain
connections to get indefinately stuck in LAST_ACK state.
2007-05-10 23:11:29 +00:00
Robert Watson
f2565d68a4 Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.
2007-05-10 15:58:48 +00:00
Randall Stewart
ad81507eed Two major items here:
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
  a macro that does all of this. This removes all printfs from
  the code and makes the code more portable and easier to
  read.
- Static Analysis (cisco) - found a few bugs, but mostly we
  add checks for NULL pointers and such to make the tool
  happy. We now pass the Cisco SA tools checks except for
  where it does not understand tailq/lists. We still need
  to look at the coverity tools output too (this is like
  the cisco SA tool) and see if it wants us to fix any other
  items. Hopefully this will be the last major churn in the
  code other than bug fixes.
2007-05-09 13:30:06 +00:00
Maxim Konovalov
d30d90dc80 o Fix style(9) bugs introduced in the last commit.
Pointed out by:	bde
2007-05-09 11:39:46 +00:00
Maxim Konovalov
10fe523e99 o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.
PR:		kern/112517
Submitted by:	vd
2007-05-09 06:09:40 +00:00
Randall Stewart
b100636770 - Copyright change, cisco's silly tool wants it to say:
"Copyright (c) 2001-2007, by Cisco Systems,"
   instead of
       *Copyright (c) 2001-2007, Cisco Systems,"

-  Also fix a few straglers that were still in 2006.
2007-05-08 17:01:12 +00:00
Randall Stewart
b0552ae214 - Get rid of the sctp_inpcb_free() "magic numbers", now they
are sensible defines that tell what you are directing
   the function to do.
2007-05-08 15:53:03 +00:00
Randall Stewart
6e55db5445 - Static analyisis fixes for cisco's commit (this is equivilant
to the coverity tool.. may even be the same one.. not sure).
-  A bug in the way sctp_abort() and friends were
   setting the IP_CLOSE flag.. and NOT passing the
   last argument as a (,1)... so that things would
   get freed..
2007-05-08 14:32:53 +00:00
Randall Stewart
17205ecc85 - More macros for OS compatabilty
-  PR-SCTP would ignore FWD-TSN's above a rwnd's worth
   of TSN's (1 byte msgs).. this left the peer hopelessly
   out of sync.. or an attacker. So now we abort the assoc.
-  New IFN hash, also rename hashes to match addr/ifn now
   that the vrf has multiple.
-  Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
   as defined in the Socket API ID.
-  Export MTU information via sysctl.
-  Vrf's need table id's. This is default for
   BSD, but may be other things later when BSD
   fully supports VRFs.
-  Additional stream reset bug (caught by cisco dev-test).
-  Additional validations for the address in sending a message (socket api).
-------- and -----
-  Fix association notifications not to give the active open
   side false notifications.
-  Fix so sendfile and SENDALL will work properly (missing
   flag to say socket sender is done).
-  Fix Bug that prevented COOKIES from being retransmitted.
-  Break out connectx into helper sub-models so that iox routines can
   reuse the helpers.
-  When an address is added during system init (non-dynamic mode) make
   sure that the "defer use" flag is not set.
** its compiling on XR now :-D **

Reviewed by:	gnn
2007-05-08 00:21:05 +00:00
Robert Watson
9df79d84c1 Rather than selectively zeroing fields in the tcp_debug structure
throughout tcp_trace(), zero the entire structure up front.

Minor style fixes.
2007-05-07 14:05:23 +00:00
Robert Watson
6db851a281 Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr()
and in_setsockaddr(), containing only stale comments on why they
exist, remove them and initialize the protosw for UDP to directly
reference in_setpeeraddr() and in_setsockaddr().
2007-05-07 13:51:24 +00:00
Robert Watson
af1ee11d54 Minor style tweaks. 2007-05-07 13:47:39 +00:00
Robert Watson
434a0d24dd When setting up timewait state for a TCP connection, don't hold the
socket lock over a crhold() of so_cred: so_cred is constant after
socket creation, so doesn't require locking to read.
2007-05-07 13:04:25 +00:00
Andre Oppermann
1a5537409f Remove unused requested_s_scale from struct tcpcb. 2007-05-06 16:04:36 +00:00
Andre Oppermann
3529149e9a Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of
a decdicated sack_enable int for this bool.  Change all users accordingly.
2007-05-06 15:56:31 +00:00
Andre Oppermann
0ca3f933eb o Remove redundant tcp reassembly check in header prediction code
o Rearrange code to make intent in TCPS_SYN_SENT case more clear
 o Assorted style cleanup
 o Comment clarification for tcp_dropwithreset()
2007-05-06 15:41:06 +00:00
Andre Oppermann
c5ad39b910 Reorder the TCP header prediction test to check for the most volatile
values first to spend less time on a fallback to normal processing.
2007-05-06 15:23:51 +00:00
Andre Oppermann
679d9708b6 Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment
and change it to a void function.

We use a compressed structure for TCPS_TIME_WAIT to save memory.  Any late
late segments arriving for such a connection is handled directly in the TW
code.
2007-05-06 15:16:05 +00:00
Andre Oppermann
37ba9d112a Fix two comments. 2007-05-06 13:38:25 +00:00
Randall Stewart
6114cd961a Two bugs:
- Locks were not being unlocked when an invalid size chunk is
    sent in.
  - When a notification comes in, we cannot use it to look up
    the fragment interleave stream information since its not
    on a stream.
2007-05-06 00:01:17 +00:00
Robert Watson
6087c3c29e Add global mutex tcp_debug_mtx, which will protect global TCP debugging
state tcp_debug, tcp_debx.  Acquire and drop as required in tcp_trace().

Move to ANSI C function header, correct prototype types so that short TCP
state is no longer promoted to int unnecessarily.

Add comments.

MFC after:	3 weeks
2007-05-04 23:43:18 +00:00
Robert Watson
1cd6eadfbb Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the
pcbinfo lock will be released as well, not just the pcb lock.
2007-05-04 17:45:52 +00:00
Randall Stewart
1bb552e88d Fixes a missing unlock in the one-2-one hash table, if
it was full and a collision occured, then we would leave
a inp locked. Also fixes a missing inp unlock if IPSEC was
on and it failed during the attach. Bug found by Weongyo Jeong.
2007-05-04 15:19:10 +00:00
Bjoern A. Zeeb
7a92401aea Add support for filtering on Routing Header Type 0 and
Mobile IPv6 Routing Header Type 2 in addition to filter
on the non-differentiated presence of any Routing Header.

MFC after:	3 weeks
2007-05-04 11:15:41 +00:00
Robert Watson
7abab91135 sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags
on each socket buffer with the socket buffer's mutex.  This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.

This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.

While here, fix two historic bugs:

(1) a bug allowing I/O to be occasionally interlaced during long I/O
    operations (discovere by Isilon).

(2) a bug in which failed non-blocking acquisition of the socket buffer
    I/O serialization lock might be ignored (discovered by sam).

SCTP portion of this patch submitted by rrs.
2007-05-03 14:42:42 +00:00
Randall Stewart
d06c82f169 - Somehow the disable fragment option got lost. We could
set/clear it but would not do it. Now we will.
-  Moved to latest socket api for extended sndrcv info struct.
-  Moved to support all new levels of fragment interleave (0-2).
-  Codenomicon security test updates - length checks and such.
-  Bug in stream reset (2 actually).
-  setpeerprimary could unlock a null pointer, fixed.
-  Added a flag in the pcb so netstat can see if we are listening easier.

Obtained from:	(some of the Listen changes from Weongyo Jeong)
2007-05-02 12:50:13 +00:00
Robert Watson
84ca8aa609 Remove unused pcbinfo arguments to in_setsockaddr() and
in_setpeeraddr().
2007-05-01 16:31:02 +00:00
Robert Watson
712fc218a0 Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches.  Clean up and comment structure
fields in structure definition.
2007-04-30 23:12:05 +00:00
Maxim Konovalov
1e2f57057d o Kill EOLWS while I'm here. 2007-04-30 20:26:11 +00:00
Maxim Konovalov
38ec733c53 o Fix strtoul() error conditions check.
PR:		kern/108211
Submitted by:	Yong Tang
MFC after:	2 weeks
2007-04-30 20:22:11 +00:00
Andre Oppermann
9fa198bead o Fix INP lock leak in the minttl case
o Remove indirection in the decision of unlocking inp
o Further annotation of locking in tcp_input()
2007-04-23 19:41:47 +00:00
Randall Stewart
ee7f985774 Fixes cut and paste bug using wrong pointer reference. 2007-04-23 00:51:49 +00:00
Randall Stewart
58967d8d46 Moves the PCB features and flags from sctp_pcb.h to
sctp.h so that netstat can access and display these
values.
2007-04-22 12:12:38 +00:00
Randall Stewart
9a6142d8cd - Somehow the disable fragment option got lost. We could
set/clear it but would not do it. Now we will.
-  Moved to latest socket api for extended sndrcv info struct.
-  Moved to support all new levels of fragment interleave.
2007-04-22 11:06:27 +00:00
Andre Oppermann
df47e4377b o Remove unncessary TOF_SIGLEN flag from struct tcpopt
o Correctly set to->to_signature in tcp_dooptions()
o Update comments
2007-04-20 15:28:01 +00:00
Andre Oppermann
7824d002c0 Add more KASSERT's. 2007-04-20 15:21:29 +00:00
Andre Oppermann
0d957bba48 o Remove unused and redundant TCP option definitions
o Replace usage of MAX_TCPOPTLEN with the correctly constructed and
  derived MAX_TCPOPTLEN
2007-04-20 15:08:09 +00:00
Andre Oppermann
4d6e713043 Remove bogus check for accept queue length and associated failure handling
from the incoming SYN handling section of tcp_input().

Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed.  It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake.  It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.

Change return value of syncache_add() to void.  No status communication
is required.
2007-04-20 14:34:54 +00:00
Andre Oppermann
e207f80039 Simplifly syncache_expand() and clarify its semantics. Zero is returned
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies.  True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.

For both cases an RST is sent back in tcp_input().

A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.

Reported by:	kris (the panic)
2007-04-20 13:51:34 +00:00
Andre Oppermann
0a5df51410 Only update TCP timestamp on SYN duplication if it is present on
current SYN in syncache_add().  Otherwise disable timestamps.
2007-04-20 13:36:48 +00:00
Andre Oppermann
c73f70b728 o Plug memory leak in syncache_add() on MAC label allocation failure.
o Simplify code flow with 'done' goto label.
o Remove mbuf argument from syncache_respond().  It doesn't make use
  of it.
2007-04-20 13:30:08 +00:00
Randall Stewart
f1f73e5718 - More work on making send lock contention.
- Removed free-oqueue cache.
- Fix counter for sq entries
- Increased the amount of information retained
  on ASOC_TSN logging on the association.
- Made it so with the ASOC_TSN logging on
  sending or recieving an abort we dump the log.
- Went through and added invariant's around some
  panic's that needed them.
- decrements went to atomic_subtact_int instead of add -1
- Removed residual count increment that threw off a
  strm oq count.
- Tracks and complaints if we don't have a LAST fragment and
  clean up the sp structure.
- Track a new stat that counts number of abandoned msgs that
  happen if you close without reading.
- Fix lookup of frag point to be aware of a 0 assoc-id.
Reviewed by:	gnn
2007-04-19 11:28:43 +00:00
Andre Oppermann
bbf4e1cb47 Make tcp_twrespond() use tcp_addoptions() instead of a home grown version. 2007-04-18 18:14:39 +00:00
Andre Oppermann
9eab54debf When we run into the syncache entry limits syncache_add() tries
to free the oldest entry in the current bucket row.  The global
entry limit may be smaller than the bucket rows and their limit
combined however.  Thus only try to free a syncache entry if we
found one in this bucket row.

Reported by:	kris
2007-04-17 15:25:14 +00:00
Robert Watson
c9791cfb3e Shorten text string for ip_fw2 dynamic rules zone by removing the word
"zone", which is generally not present in zone names.  This reduces the
incidence of line-wrapping in "vmstat -z " using 80-column displays.

MFC after:	3 days
2007-04-17 09:28:36 +00:00
Robert Watson
215c8d75b8 Remove unused variable tcbinfo_mtx. 2007-04-15 21:03:23 +00:00
Randall Stewart
f1d6e6dc71 Fix stupid syntax error - Pointy hat to me :-( 2007-04-15 13:03:14 +00:00
Randall Stewart
478d3f0901 - Add more comments to sctps_stats struture in sctp_uio.h
- Fix bug that prevented EEOR mode from working
  and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did
  this especially for EEOR mode, still need to look at
  why I need a lock when removing from the tailq and the
  ->next is NOT null. A lock fixes it but it implies a
  bug yet exists.
- Activated Andre's proposed changes to better use the mbuf
  infrastructure.
- Fixed places that were not using the aloc macro's to take
  advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to
  get the right data structures in place (suggested by Max Laier).
2007-04-15 11:58:26 +00:00
Max Laier
d0cf96b407 Fix a typeo - unbreak the build. 2007-04-14 18:27:34 +00:00
Randall Stewart
c105859eee - fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
  if we failed to get init/or/cookie to bring up an assoc. Change
  so we don't just give a generic "comm lost" but look at actual
  states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
  compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truely portable
  for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
  during window probes on the sent queue. This would then
  cause an out-of-order problem and assure that the flight
  size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
  overruns, flight size off as well as false retransmissions.g
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
  was active, fix to make strict sacks a bit smarter about what
  the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
  commit of at least m_last().. need to adjust for 6.2 as
  well.. since m_last won't exist.
Reviewed by:	gnn
2007-04-14 09:44:09 +00:00
Ruslan Ermilov
7480de4305 Make "struct tcp_timer" visible only to the kernel, and unbreak world. 2007-04-11 14:08:42 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
Robert Watson
6493245ded Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser
checks to see whether bind() can reuse a port/address combination while
it's already in use (for some definition of use).
2007-04-10 15:58:38 +00:00
Paolo Pisati
c326cd0e62 Prevent the usage of an uninitialized variable: do not accept
StartMediaTx message before an OpnRcvChnAck message was received.

Reviewed by:	glebius
Approved by:	glebius (mentor)
MFC after:      3 days
Found with:	Coverity Prevent(tm)
CID:		498
2007-04-07 09:52:36 +00:00
Paolo Pisati
f4296f2246 Silence Coverity about an unused variable.
Reviewed by: 	glebius
Approved by: 	glebius (mentor)
MFC after: 	3 days
CID: 		538
2007-04-07 09:47:39 +00:00
Andre Oppermann
995a77176f Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some
further INP_INFO_WLOCK_ASSERT() while there.
2007-04-04 18:30:16 +00:00
Andre Oppermann
0c38fd0a7a Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
Andre Oppermann
beaa515e95 Some local and style(9) cleanups. 2007-04-04 15:30:31 +00:00
Andre Oppermann
5dd9dfefd6 Retire unused TCP_SACK_DEBUG. 2007-04-04 14:44:15 +00:00
Andre Oppermann
b728e90260 In tcp_dooptions() skip over SACK options if it is a SYN segment. 2007-04-04 14:39:49 +00:00
Alexander Kabaev
edb2e5dca3 Include string.h for non-kernel builds to get proper memcpy prototype. 2007-04-04 03:16:59 +00:00
Alexander Kabaev
d8164209b3 Include string.h for non-kernel builds to get proper strcpy, strlen
prototypes.
2007-04-04 03:14:15 +00:00
Alexander Kabaev
9160afee7c Do not assign result of (char *) cast to u_char * variable. 2007-04-04 03:10:42 +00:00
Julian Elischer
1bd69ee131 Since we switched to using monatomically increasing timestamps,
they have been reported back to the userland as being in 1970.
Add boot time to the timestamp to give the time in the scale of the 'current'
real timescale.  Not perfect if you change the time a lot but good enough
to keep all the rules correct relative to each other correct in terms
of time relative to "now".
2007-04-03 22:45:50 +00:00
Randall Stewart
bff64a4db3 - fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
  packet (1000 addresses).
- flight size correcting updated to include one message only
  and to handle case where the peer does not cumack the
  next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
  since we tried to unlock the destroyed mutex. Fixes
  so we properly exit when we need to destroy an assoc.
  (Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
  and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by:	gnn
2007-04-03 11:15:32 +00:00
Randall Stewart
5e54f665f0 - Found bug in min split point bundling which caused
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
  both how big a msg must be as well as how much needs
  to be left over.
- With our new algo in place, we need to implicitly
  set "end of msg" on the sp-> structure otherwise we
  end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
  header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
  will kill the assoc via the kill timer and send an
  abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
  us from fragmenting a small complete message when
  we needed to.
- Window probes were not marked back to unsent and
  flight adjusted when a sack came in with no
  window change or accepting of the probe data.
  We now fix this with having a mark on the net and
  the chunk so we can clear it out when the sack arrives
  forcing it to retran just like it was "new" this
  improves the handling of window probes, which were
  dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
2007-03-31 11:47:30 +00:00
Bruce M Simpson
f7e083af90 Fix a bug in IPv4 address configuration exposed by refcounting.
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only;
   that is, when an IPv4 address is first configured on an interface.
 * Do not join it for subsequent IPv4 addresses as this violates IGMP.
 * Be sure to leave the group when all IPv4 addresses have been removed
   from the interface.
 * Add two DIAGNOSTIC printfs related to the issue.

Further care and attention is needed in this area; it is suggested that
netinet's attachment to the ifnet structure be compartmentalized and
non-implicit.

Bug found by:	andre
MFC after:	1 month
2007-03-29 21:39:22 +00:00
Andre Oppermann
1929eae1cc When blackholing do a 'dropunlock' in the new world order to prevent the
INP_INFO_LOCK from leaking.

Reported by:	ache
Found by:	rwatson
2007-03-28 12:58:13 +00:00
Robert Watson
77c78838f0 Remove stale comment about not enabling inpcb and inpcbinfo lock assertions
when IPv6 is enabled.

MFC after:	3 days
2007-03-28 00:50:20 +00:00
Andre Oppermann
07b64b901a In tcp_sack_doack() remove too tight KASSERT() added in last revision. This
function may be called without any TCP SACK option blocks present.  Protect
iteration over SACK option blocks by checking for SACK options present flag
first.

Bug reported by:	wkoszek, keramida, Nicolas Blais
2007-03-25 23:27:26 +00:00
Robert Watson
30916a2d1d Replace a comment about RSVP/mrouting with a different but similar comment
explaining that some more locking is needed.  The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.

Spotted by:	kris
2007-03-25 21:49:50 +00:00
Maxim Konovalov
14739780bd o Use a define for a buffer size.
Prodded by:	db

o Add missed vars for TCPDEBUG in tcp_do_segment().

Prodded by:	tinderbox
2007-03-24 22:15:02 +00:00
Andre Oppermann
302ce8d690 Split tcp_input() into its two functional parts:
o tcp_input() now handles TCP segment sanity checks and preparations
   including the INPCB lookup and syncache.
 o tcp_do_segment() handles all data and ACK processing and is IPv4/v6
   agnostic.

Change all KASSERT() messages to ("%s: ", __func__).

The changes in this commit are primarily of mechanical nature and no
functional changes besides the function split are made.

Discussed with:	rwatson
2007-03-23 20:16:50 +00:00
Andre Oppermann
4dfdffe9e2 Tidy up some code to conform better to surroundings and style(9), 0 = NULL
and space/tab.
2007-03-23 19:11:22 +00:00
Andre Oppermann
fc30a25199 Bring SACK option handling in tcp_dooptions() in line with all other
options and ajust users accordingly.
2007-03-23 18:33:21 +00:00
Bruce M Simpson
73ec8173eb Purge two redundant case labels. 2007-03-23 09:43:36 +00:00
Gleb Smirnoff
1daaa65d3f Remove global list of all llinfo_arp entries and use a callout per
instance expiry of the ARP entries. Since we no longer abuse the IPv4
radix head lock, we can now enter arp_rtrequest() with a lock held on
an arbitrary rt_entry.

Reviewed by:	bms
2007-03-22 10:37:53 +00:00
Andre Oppermann
ad3f9ab320 ANSIfy function declarations and remove register keywords for variables.
Consistently apply style to all function declarations.
2007-03-21 19:37:55 +00:00
Andre Oppermann
f7608d9e7f Match up SYSCTL declarations in style. 2007-03-21 19:34:12 +00:00
Andre Oppermann
eec9d82d8e Subtract optlen in the maximum length check for TSO and finally avoid
slightly oversized TSO mbuf chains.

Submitted by:	kmacy
2007-03-21 19:04:07 +00:00
Andre Oppermann
b10fbdeafa Tidy up IPFIREWALL_FORWARD sections and comments. 2007-03-21 18:56:03 +00:00
Andre Oppermann
794235b737 Update and clarify comments in first section of tcp_input(). 2007-03-21 18:52:58 +00:00
Andre Oppermann
db33b3e6a7 Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove
old dead T/TCP code.
2007-03-21 18:49:43 +00:00
Andre Oppermann
574b696407 Tidy up tcp_log_in_vain and blackhole. 2007-03-21 18:36:49 +00:00
Andre Oppermann
85c497918c Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it
doesn't impede normal operation negatively and is only a few lines of
code.  It's close relatives blackhole and log_in_vain aren't options
either.
2007-03-21 18:25:28 +00:00
Andre Oppermann
e406f5a1c9 Remove tcp_minmssoverload DoS detection logic. The problem it tried to
protect us from wasn't really there and it only bloats the code.  Should
the problem surface in the future we can simply resurrect it from cvs
history.
2007-03-21 18:05:54 +00:00
Bruce M Simpson
c7547d1aaf Increase default size of raw IP send and receive buffers to the same as
udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB)
are unnecessarily fragmented.

A common use case for this is OSPF link-state database synchronization
during adjacency bringup on a high speed network with a large MTU.

It is not possible to auto-tune this setting until a socket is bound to
a given interface, and because the laddr part of the inpcb tuple may be
overridden, it makes no sense to do so. Applications may request a larger
socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options.

Certain applications such as Quagga ospfd do not probe for interface MTU
and therefore do not increase SO_SENDBUF in this use case.
XORP is not affected by this problem as it preemptively uses SO_SENDBUF
and SO_RECVBUF to account for any possible additional latency in XRL IPC.

PR:		kern/108375
Requested by:	Vladimir Ivanov
MFC after:	1 week
2007-03-20 13:15:20 +00:00
Randall Stewart
62c1ff9c48 - window update sacks sent incorrectly after
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
  express sack handling since it refered to an unused
  variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
2007-03-20 10:23:11 +00:00
Bruce M Simpson
ec002fee99 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
Andre Oppermann
6489fe6553 Match up SYSCTL declaration style. 2007-03-19 19:00:51 +00:00
Andre Oppermann
8b8ed7a78e Match up SYSCTL_INT declarations in style. 2007-03-19 18:42:27 +00:00
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() make use of it.

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Randall Stewart
6a27c37636 Adds a hash table to speed local address lookup
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by:	gnn
2007-03-19 11:11:16 +00:00
Randall Stewart
132dea7d5a - errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
  pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
2007-03-19 06:53:02 +00:00
Bruce M Simpson
27f8eaaf03 In IPv4 fast forwarding path, send ICMP unreachable messages for
routes which have RTF_REJECT set *and* a zero expiry timer.

PR:		kern/109246
MFC after:	10 days
Submitted by:	Ingo Flaschberger
2007-03-18 23:05:20 +00:00
Andre Oppermann
9daba64ed5 Unbreak IPv6 after consolidation of TCP options insertion.
Submitted by:	tegge
2007-03-17 11:52:54 +00:00
Kip Macy
9ad2c608c2 Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
  being changed anyway

Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
2007-03-17 06:40:09 +00:00
Robert Watson
8d0d6d112f Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl. 2007-03-16 13:42:26 +00:00
Andre Oppermann
02a1a64357 Consolidate insertion of TCP options into a segment from within tcp_output()
and syncache_respond() into its own generic function tcp_addoptions().

tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

In struct tcpopt rename to_requested_s_scale to just to_wscale.

Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."

Reviewed by:	silby, mohans, julian
Sponsored by:	TCP/IP Optimization Fundraise 2005
2007-03-15 15:59:28 +00:00
Randall Stewart
42551e993f - Sysctl's move to seperate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
  built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
  made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
  well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and loosing potential to
  send when we inflate the flight size. We now inflate
  the cwnd too and deflate it later when the revoked
  chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vfr
  via compile switch that is scrubbed from BSD but we won't
  need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
  a #def (30).

Reviewed by:	gnn
2007-03-15 11:27:14 +00:00
Bruce M Simpson
5c51891ef7 Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address
is within the locally scoped multicast range 224.0.0.0/24.
2007-03-15 08:44:22 +00:00
Bruce M Simpson
1b7f038498 Fix IP_SENDSRCADDR semantics.
* To use this option with a UDP socket, it must be bound to a local port,
   and INADDR_ANY, to disallow possible collisions with existing udp inpcbs
   bound to the same port on other interfaces at send time.

 * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with
   INADDR_ANY will be rejected as it is ambiguous.

 * If the socket is bound to an address other than INADDR_ANY, specifying
   IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup().

Reviewed by:	silence on -net
Tested with:	src/tools/regression/netinet/ipbroadcast
MFC after:	4 days
2007-03-08 15:26:54 +00:00
Qing Li
95ad8418dc This patch is provided to fix a couple of deployment issues observed
in the field. In one situation, one end of the TCP connection sends
a back-to-back RST packet, with delayed ack, the last_ack_sent variable
has not been update yet. When tcp_insecure_rst is turned off, the code
treats the RST as invalid because last_ack_sent instead of rcv_nxt is
compared against th_seq. Apparently there is some kind of firewall that
sits in between the two ends and that RST packet is the only RST
packet received. With short lived HTTP connections, the symptom is
a large accumulation of connections over a short period of time .

The +/-(1) factor is to take care of implementations out there that
generate RST packets with these types of sequence numbers. This
behavior has also been observed in live environments.

Reviewed by:	silby, Mike Karels
MFC after:	1 week
2007-03-07 23:21:59 +00:00
Bruce M Simpson
44c4d7b2cb Purge an out-of-date comment. 2007-03-04 16:32:19 +00:00
Bruce M Simpson
a3fd02d88b Fix undirected broadcast sends for the case where SO_DONTROUTE has also
been set at the socket layer, in our somewhat convoluted IPv4 source
selection logic in ip_output().

IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255
must always be delivered on a local link with a TTL of 1.

If IP_ONESBCAST has been set at the socket layer, also perform destination
interface lookup for point-to-point interfaces based on the destination
address of the link; previously it was not possible to use the option with
such interfaces; also, the destination/broadcast address fields map to the
same field within struct ifnet, which doesn't help matters.

One more valid fix going forward for these issues is to treat 255.255.255.255
as a destination in its own right in the forwarding trie. Other
implementations do this. It fits with the use of multiple paths, though
it then becomes necessary to specify interface preference.
This hack will eventually go away when that comes to pass.

Reviewed by:	andre
MFC after:	1 week
2007-03-01 13:29:30 +00:00
Andre Oppermann
6aa5b62315 Prevent TSO mbuf chain from overflowing a few bytes by subtracting the
TCP options size before the TSO total length calculation.

Bug found by:	kmacy
2007-03-01 13:12:09 +00:00
Mohan Srinivasan
4a32dc299f In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
2007-02-28 20:48:00 +00:00
Bruce M Simpson
85e0793497 Style: Move declaration of subsystem mutex to where other
mutexes are in this file, and use macros for dealing with it.
2007-02-28 20:02:24 +00:00
Gleb Smirnoff
8bec3467b1 Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't
be returned up to the caller.

PR:		100172
Submitted by:	"Andrew - Supernews" <andrew supernews.net>
Reviewed by:	rwatson, bms
2007-02-28 12:47:49 +00:00