2877 Commits

Author SHA1 Message Date
rrs
94bb7c7cd5 - fix bindx to check addresses against socket's protocol family 2007-06-13 14:39:41 +00:00
rwatson
20ee7c7a9b Remove IPX over IP tunneling support, which allows IPX routing over IP
tunnels, and was not MPSAFE.  The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.

This removes one of five remaining consumers of NET_NEEDS_GIANT.

Approved by:	re (kensmith)
2007-06-13 14:01:43 +00:00
rrs
20ad8ecfb0 - Fixed cookie handling to calc an RTO when
its an INIT collision case.
- Fixed RTO calc to maintain a seperate variable to track
  if a RTO calc as been done, this allows the RTO var to be
  doubled during initial timeouts.
- Reduces the amount of stack used by process control.
- Use a constant for the peer chunk overhead.
- Name change to spell candidate correctly.
2007-06-13 01:31:53 +00:00
bms
ffd77d9ba5 Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
rrs
1cd5c5dd06 - Restructure so bindx functions are not done inline to socket option
but are a seperate call that can be re-used if needed.
- 64 bit issues
  o re-arrange cookie so it is better 64 bit aligned
  o For wire level things we need the packed attribute.
2007-06-12 11:21:00 +00:00
rwatson
00b02345d4 Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
andre
445024c7ff Fix a case in tcp_do_segment() where tcp_update_sack_list() would
be called with an incorrect segment end value.  tcp_reass() may
trim segments when they overlap with already existing ones in the
reassembly queue.  Instead of saving the segment end value before
the call to tcp_reass() compute it on the fly based on the effective
segment length afterwards.

This bug was not really problematic as no information got lost and
the eventual SACK information computation was correct nontheless.

MFC after:	1 week
2007-06-10 21:07:21 +00:00
andre
51abe2d2e3 Fix style for comments, be more verbose and add some more. 2007-06-10 20:59:22 +00:00
andre
ccc729300f Make the handling of the tcp window explicit for the SYN_SENT case
in tcp_outout().  This is currently not strictly necessary but paves
the way to simplify the entire SYN options handling quite a bit.
Clarify comment.  No change in effective behavour with this commit.

RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
2007-06-09 21:19:12 +00:00
andre
5a4573d5f0 Remove some bogosity from the SYN_SENT case in tcp_do_segment
and simplify handling of the send/receive window scaling.  No
change in effective behavour.

RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.

Noticed by:	yar
2007-06-09 21:09:49 +00:00
andre
e8333a59d1 Don't send pure window updates when the peer has closed the connection
and won't ever send more data.
2007-06-09 19:39:14 +00:00
andre
01595fc7b2 Handle a race condition on >2 core machines in tcp_timer() when
a timer issues a shutdown and a simultaneous close on the socket
happens.  This race condition is inherent in the current socket/
inpcb life cycle system but can be handled well.

Reported by:	kris
Tested by:	kris (on 8-core machine)
2007-06-09 17:49:39 +00:00
rrs
6787a70c5f - Opps.. takes out debug printfs I accidentally left in :-( 2007-06-09 13:53:27 +00:00
rrs
4ecfa17662 - fix send_failed notification contents
- Reorder send failed to be in correct order.
- Fixed calulation of init-ack to be right off
  mbuf lengths instead of the precalculated value. This
  will fix one 64 bit platform issue.
2007-06-09 13:46:57 +00:00
yar
41c16ea348 Replace a constant with an already defined symbolic name for it.
Tested with: md5(1)
2007-06-08 13:43:28 +00:00
yar
da0f8d2ada Add a sysctl for the purge run interval so that it can
be tuned along with the rest of hostcache parameters.
The new sysctl name is `net.inet.tcp.hostcache.prune'.
2007-06-08 13:35:51 +00:00
rrs
9708f20177 - RTO was not being initialized to 0, thus the rtt calculation
algoritm would not go through the proper initialization.
- The initialization was incorrect as well, causing problems in
  sat networks with > 1sec RTT
- Get rid of magic numbers in RTT calculations.
2007-06-08 10:57:11 +00:00
andre
9082e6724d In tcp_hc_insert() we may have the case where we have hit the global
cache size limit but this bucket row is empty.  Normally we want to
recycle the oldest entry in the bucket row.  If there isn't any the
TAILQ_REMOVE leads to a panic by trying to remove a non-existing
element.  Fix this by just returning NULL and failing the insert.
This is not a problem as the TCP hostache is only advisory.

Submitted by:	jhb
2007-06-07 21:41:50 +00:00
andre
c08bded78f Correctly print SEQ and IRS in the corresponding log message in
syncache_expand().
2007-06-06 22:10:12 +00:00
glebius
27c6e3f25e Do not leak lock in the case of EEXIST error.
PR:		kern/92776
Submitted by:	Ed Schouten <Ed.Schouten tunix.nl>
2007-06-06 14:21:49 +00:00
rrs
f084c2f975 - Fixes a case where doing a sysctl would leave locks held
when coping out association data.
- Fixes a small bug that prevented the SCTP_UNORDERED indication
  from going up to the app on a recv in the sinfo_flags field.
2007-06-06 00:40:41 +00:00
dwmalone
771efb08f5 Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
rrs
18555b3d48 - fix initial pcb vrf setting when the initial vrf is not the
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
  unique across systems (including different VRFs) this makes it easier
  to do ifn lookups.
2007-06-02 11:05:08 +00:00
rrs
f978918265 - Take out the broken table-id concept. Panda Routers have a M-VRF
concept that is NOT well thought out for a multi-homed transport
  protocol. So the useless table-id entries passed around need to
  be removed.
- Add a event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
  with the largest ssthresh (the compare was wrong).
2007-06-01 11:19:54 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
rwatson
dbd8bd0d5f (1) In tcp_usrclosed(), tp can never become NULL, so don't test for NULL
before handling the socket disconnection case.

(2) Clean up surrounding comments and formatting.

Found with:	Coverity Prevent(tm) (1)
CID:		2203
2007-05-31 12:06:02 +00:00
rrs
2568e75f53 - Fixed (Apple) compiler warnings in sctp_input.c, sctputil.c, sctp_output.c
- Fixed a LOR in handling a cookie. Turns out create lock is applied.
  And if we abort processing, this causes LOR. Changed to force the
  timer to clean up, that way create lock is released.
2007-05-30 22:34:21 +00:00
rrs
00ab510c7c - Fix a memory overwrite when the mapping array
is expanded, size of expansion was not taken int consideration.
-  Fix so vtag hash is 1 bigger so that it modulo's out
   correctly, avoids a panic when restart with right modulo happens.
-  do not dereference stcb when control->do_not_ref_stcb is set
-  Fix up packet logging to not often use a lock and also to
   add to options.
-  Fix some logging option duplication in the sctputil.h
2007-05-30 17:39:45 +00:00
rrs
e0e3597bea Adds gcc attribute to prevent inlining of a function. If
it goes inline we may well blow the stack if witness and
such are enabled.
2007-05-29 14:17:47 +00:00
rrs
a8c7b867a5 - Fix spelling errors in comments per Ruslan (.. thanks... ) 2007-05-29 11:53:27 +00:00
rrs
f827c93ac6 - Fixes so we won't try to start a timer when we
hold a wq lock for the iterator. Panda uses a
  silly recursive lock they hold through the timer.
- Add poor mans wireshark compile option..
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.

Reviewed by:	gnn
2007-05-29 09:29:03 +00:00
andre
c611006e89 Make log messages more verbose and simpler to understand for non-experts.
Update comments to be more conscious, verbose and fully reflect reality.
2007-05-28 23:27:44 +00:00
andre
799843834b Fix indentation of the syncache_expand() section in tcp_input(). 2007-05-28 11:35:40 +00:00
rrs
953518c197 - fixed autclose to not allow setting on 1-2-1 model.
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
  no chance of a endless loop. Only one call to random per bind
  as well.
- fixes so set_peer_primary pre-screens addresses to be
  valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
  to thus track and use an association value instead of a inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on off not
  the timer.. since the timer may still run if unconf address
  are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
  any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
  taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
  of the non-pad version. This saves about 92 bytes from each struct
  by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
  tcb's value.
- The delayed ack time could be under a tick, this fixes so
  it bounds it to at least 1 tick for platforms whos tick
  is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Defered stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
  had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
  not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details

Reviewed by:	gnn
2007-05-28 11:17:24 +00:00
andre
b4b7eeb094 Refactor and rewrite in parts the SYN handling code on listen sockets
in tcp_input():

 o tighten the checks on allowed TCP flags to be RFC793 and
   tcp-secure conform
 o log check failures to syslog at LOG_DEBUG level
 o rearrange the code flow to be easier to follow
 o add KASSERTs to validate assumptions of the code flow

Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enable
that controls the behavior on socket creation failure for a otherwise
successful 3-way handshake.  The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits.  The
sysctl allows to chose between two options to deal with this.  One is
to send a reset to the other endpoint to notify it about the failure
(default).  The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.

Reviewed by:	rwatson (in general)
2007-05-28 11:03:53 +00:00
rwatson
ef37256e5e Normalize spelling and grammar in TCP hostcache comments. 2007-05-27 19:39:26 +00:00
rwatson
3b371f76aa In tcp_timer_2msl(), tp can never become NULL, so don't check it for
NULL before entering tcp_trace().

Found with:	Coverity Prevent(tm)
CID:		1840
2007-05-27 17:52:02 +00:00
rwatson
933cc5abb3 Don't assign sp to the value of s when we're about to assign it instead to
s + strlen(s).

Found with:	Coverity Prevent(tm)
CID:		2243
2007-05-27 17:02:54 +00:00
andre
21be5aeb9a The printf %b list in PRINT_TH_FLAGS has to be in octal numbering.
Thus convert \8 to \10 and the warnings go away.

Pointed out by:	sam, ru, thompsa
2007-05-25 21:28:49 +00:00
andre
7fbbdcecf4 Add CWR back into the PRINT_TH_FLAGS list as gcc42 doesn't complain
about \8 in a string anymore.
2007-05-23 19:16:21 +00:00
andre
f4178bd6f5 In tcp_log_addrs():
o add the hex output of the th_flags field to the example log
   line in comments
 o simplify the log line length calculation and make it less
   evil
 o correct the test for the length panic; the line isn't on
   the stack but malloc'ed
2007-05-23 19:07:53 +00:00
andre
8a6eb18a69 Be more restrictive with segment validity checks in syncache_expand()
and log check failures to syslog at LOG_DEBUG level.

Always prefill the sc->sc_ts field to use it in the checks.
2007-05-18 21:42:25 +00:00
andre
1411450edd o Add syslog logging under LOG_DEBUG to various failures caused by
bogus segments
o Add more KASSERT()s
o Update comments
2007-05-18 21:13:01 +00:00
andre
696f13b7fd Add tcp_log_addrs() function to generate and standardized TCP log line
for use thoughout the tcp subsystem.

It is IPv4 and IPv6 aware creates a line in the following format:

 "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"

A "\n" is not included at the end.  The caller is supposed to add
further information after the standard tcp log header.

The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use.  All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.

Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr) have to be supplied.

Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be casted and passed
as void * pointers.

tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
    void *ip6hdr)

Usage example:

 struct ip *ip;
 char *tcplog;

 if (tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) {
	log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
	    tcplog, __func__);
	free(s, M_TCPLOG);
 }
2007-05-18 19:58:37 +00:00
jhb
b753e575a7 Fix statistical accounting for bytes and packets during sack retransmits.
MFC after:	1 week
Submitted by:	mohans
2007-05-18 19:56:24 +00:00
jinmei
17983d4327 - Disabled responding to NI queries from a global address by default as
specified in RFC4620.  A new flag for icmp6_nodeinfo was added to enable the
  feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
  flags is clearer (i.e., defined specific macro names instead of using
  hard-coded values).

Approved by:	gnn (mentor)
MFC after:	1 week
2007-05-17 21:20:24 +00:00
rrs
f03ff79b8e - Fixed 1-2-1 model to not worry about associd in sockopts
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
  it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
  checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if its 0, On the assoc
  level set we changed it to 0 opps.
- Two more static analysis issues identified by the cisco
  tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
  input argument.
- When sending failed due to a no route to host, we leaked
  the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by:	gnn
2007-05-17 12:16:24 +00:00
oleg
fdfea0ef5b Unbreak IPv4 kernel build. 2007-05-17 00:05:13 +00:00
rwatson
c6ef292874 Remove leading spaces before tabs spotted thanks to silby using
kwrite to read ip_input.c.
2007-05-16 20:46:58 +00:00
andre
8f2f6eb60e Remove now unused stuff forgotten in the previous commit. 2007-05-16 17:55:22 +00:00