Commit Graph

2742 Commits

Author SHA1 Message Date
Andre Oppermann
1929eae1cc When blackholing do a 'dropunlock' in the new world order to prevent the
INP_INFO_LOCK from leaking.

Reported by:	ache
Found by:	rwatson
2007-03-28 12:58:13 +00:00
Robert Watson
77c78838f0 Remove stale comment about not enabling inpcb and inpcbinfo lock assertions
when IPv6 is enabled.

MFC after:	3 days
2007-03-28 00:50:20 +00:00
Andre Oppermann
07b64b901a In tcp_sack_doack() remove too tight KASSERT() added in last revision. This
function may be called without any TCP SACK option blocks present.  Protect
iteration over SACK option blocks by checking for SACK options present flag
first.

Bug reported by:	wkoszek, keramida, Nicolas Blais
2007-03-25 23:27:26 +00:00
Robert Watson
30916a2d1d Replace a comment about RSVP/mrouting with a different but similar comment
explaining that some more locking is needed.  The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.

Spotted by:	kris
2007-03-25 21:49:50 +00:00
Maxim Konovalov
14739780bd o Use a define for a buffer size.
Prodded by:	db

o Add missed vars for TCPDEBUG in tcp_do_segment().

Prodded by:	tinderbox
2007-03-24 22:15:02 +00:00
Andre Oppermann
302ce8d690 Split tcp_input() into its two functional parts:
o tcp_input() now handles TCP segment sanity checks and preparations
   including the INPCB lookup and syncache.
 o tcp_do_segment() handles all data and ACK processing and is IPv4/v6
   agnostic.

Change all KASSERT() messages to ("%s: ", __func__).

The changes in this commit are primarily of mechanical nature and no
functional changes besides the function split are made.

Discussed with:	rwatson
2007-03-23 20:16:50 +00:00
Andre Oppermann
4dfdffe9e2 Tidy up some code to conform better to surroundings and style(9), 0 = NULL
and space/tab.
2007-03-23 19:11:22 +00:00
Andre Oppermann
fc30a25199 Bring SACK option handling in tcp_dooptions() in line with all other
options and ajust users accordingly.
2007-03-23 18:33:21 +00:00
Bruce M Simpson
73ec8173eb Purge two redundant case labels. 2007-03-23 09:43:36 +00:00
Gleb Smirnoff
1daaa65d3f Remove global list of all llinfo_arp entries and use a callout per
instance expiry of the ARP entries. Since we no longer abuse the IPv4
radix head lock, we can now enter arp_rtrequest() with a lock held on
an arbitrary rt_entry.

Reviewed by:	bms
2007-03-22 10:37:53 +00:00
Andre Oppermann
ad3f9ab320 ANSIfy function declarations and remove register keywords for variables.
Consistently apply style to all function declarations.
2007-03-21 19:37:55 +00:00
Andre Oppermann
f7608d9e7f Match up SYSCTL declarations in style. 2007-03-21 19:34:12 +00:00
Andre Oppermann
eec9d82d8e Subtract optlen in the maximum length check for TSO and finally avoid
slightly oversized TSO mbuf chains.

Submitted by:	kmacy
2007-03-21 19:04:07 +00:00
Andre Oppermann
b10fbdeafa Tidy up IPFIREWALL_FORWARD sections and comments. 2007-03-21 18:56:03 +00:00
Andre Oppermann
794235b737 Update and clarify comments in first section of tcp_input(). 2007-03-21 18:52:58 +00:00
Andre Oppermann
db33b3e6a7 Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove
old dead T/TCP code.
2007-03-21 18:49:43 +00:00
Andre Oppermann
574b696407 Tidy up tcp_log_in_vain and blackhole. 2007-03-21 18:36:49 +00:00
Andre Oppermann
85c497918c Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it
doesn't impede normal operation negatively and is only a few lines of
code.  It's close relatives blackhole and log_in_vain aren't options
either.
2007-03-21 18:25:28 +00:00
Andre Oppermann
e406f5a1c9 Remove tcp_minmssoverload DoS detection logic. The problem it tried to
protect us from wasn't really there and it only bloats the code.  Should
the problem surface in the future we can simply resurrect it from cvs
history.
2007-03-21 18:05:54 +00:00
Bruce M Simpson
c7547d1aaf Increase default size of raw IP send and receive buffers to the same as
udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB)
are unnecessarily fragmented.

A common use case for this is OSPF link-state database synchronization
during adjacency bringup on a high speed network with a large MTU.

It is not possible to auto-tune this setting until a socket is bound to
a given interface, and because the laddr part of the inpcb tuple may be
overridden, it makes no sense to do so. Applications may request a larger
socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options.

Certain applications such as Quagga ospfd do not probe for interface MTU
and therefore do not increase SO_SENDBUF in this use case.
XORP is not affected by this problem as it preemptively uses SO_SENDBUF
and SO_RECVBUF to account for any possible additional latency in XRL IPC.

PR:		kern/108375
Requested by:	Vladimir Ivanov
MFC after:	1 week
2007-03-20 13:15:20 +00:00
Randall Stewart
62c1ff9c48 - window update sacks sent incorrectly after
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
  express sack handling since it refered to an unused
  variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
2007-03-20 10:23:11 +00:00
Bruce M Simpson
ec002fee99 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
Andre Oppermann
6489fe6553 Match up SYSCTL declaration style. 2007-03-19 19:00:51 +00:00
Andre Oppermann
8b8ed7a78e Match up SYSCTL_INT declarations in style. 2007-03-19 18:42:27 +00:00
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() make use of it.

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Randall Stewart
6a27c37636 Adds a hash table to speed local address lookup
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by:	gnn
2007-03-19 11:11:16 +00:00
Randall Stewart
132dea7d5a - errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
  pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
2007-03-19 06:53:02 +00:00
Bruce M Simpson
27f8eaaf03 In IPv4 fast forwarding path, send ICMP unreachable messages for
routes which have RTF_REJECT set *and* a zero expiry timer.

PR:		kern/109246
MFC after:	10 days
Submitted by:	Ingo Flaschberger
2007-03-18 23:05:20 +00:00
Andre Oppermann
9daba64ed5 Unbreak IPv6 after consolidation of TCP options insertion.
Submitted by:	tegge
2007-03-17 11:52:54 +00:00
Kip Macy
9ad2c608c2 Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
  being changed anyway

Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
2007-03-17 06:40:09 +00:00
Robert Watson
8d0d6d112f Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl. 2007-03-16 13:42:26 +00:00
Andre Oppermann
02a1a64357 Consolidate insertion of TCP options into a segment from within tcp_output()
and syncache_respond() into its own generic function tcp_addoptions().

tcp_addoptions() is alignment agnostic and does optimal packing in all cases.

In struct tcpopt rename to_requested_s_scale to just to_wscale.

Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."

Reviewed by:	silby, mohans, julian
Sponsored by:	TCP/IP Optimization Fundraise 2005
2007-03-15 15:59:28 +00:00
Randall Stewart
42551e993f - Sysctl's move to seperate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
  built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
  made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
  well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and loosing potential to
  send when we inflate the flight size. We now inflate
  the cwnd too and deflate it later when the revoked
  chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vfr
  via compile switch that is scrubbed from BSD but we won't
  need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
  a #def (30).

Reviewed by:	gnn
2007-03-15 11:27:14 +00:00
Bruce M Simpson
5c51891ef7 Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address
is within the locally scoped multicast range 224.0.0.0/24.
2007-03-15 08:44:22 +00:00
Bruce M Simpson
1b7f038498 Fix IP_SENDSRCADDR semantics.
* To use this option with a UDP socket, it must be bound to a local port,
   and INADDR_ANY, to disallow possible collisions with existing udp inpcbs
   bound to the same port on other interfaces at send time.

 * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with
   INADDR_ANY will be rejected as it is ambiguous.

 * If the socket is bound to an address other than INADDR_ANY, specifying
   IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup().

Reviewed by:	silence on -net
Tested with:	src/tools/regression/netinet/ipbroadcast
MFC after:	4 days
2007-03-08 15:26:54 +00:00
Qing Li
95ad8418dc This patch is provided to fix a couple of deployment issues observed
in the field. In one situation, one end of the TCP connection sends
a back-to-back RST packet, with delayed ack, the last_ack_sent variable
has not been update yet. When tcp_insecure_rst is turned off, the code
treats the RST as invalid because last_ack_sent instead of rcv_nxt is
compared against th_seq. Apparently there is some kind of firewall that
sits in between the two ends and that RST packet is the only RST
packet received. With short lived HTTP connections, the symptom is
a large accumulation of connections over a short period of time .

The +/-(1) factor is to take care of implementations out there that
generate RST packets with these types of sequence numbers. This
behavior has also been observed in live environments.

Reviewed by:	silby, Mike Karels
MFC after:	1 week
2007-03-07 23:21:59 +00:00
Bruce M Simpson
44c4d7b2cb Purge an out-of-date comment. 2007-03-04 16:32:19 +00:00
Bruce M Simpson
a3fd02d88b Fix undirected broadcast sends for the case where SO_DONTROUTE has also
been set at the socket layer, in our somewhat convoluted IPv4 source
selection logic in ip_output().

IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255
must always be delivered on a local link with a TTL of 1.

If IP_ONESBCAST has been set at the socket layer, also perform destination
interface lookup for point-to-point interfaces based on the destination
address of the link; previously it was not possible to use the option with
such interfaces; also, the destination/broadcast address fields map to the
same field within struct ifnet, which doesn't help matters.

One more valid fix going forward for these issues is to treat 255.255.255.255
as a destination in its own right in the forwarding trie. Other
implementations do this. It fits with the use of multiple paths, though
it then becomes necessary to specify interface preference.
This hack will eventually go away when that comes to pass.

Reviewed by:	andre
MFC after:	1 week
2007-03-01 13:29:30 +00:00
Andre Oppermann
6aa5b62315 Prevent TSO mbuf chain from overflowing a few bytes by subtracting the
TCP options size before the TSO total length calculation.

Bug found by:	kmacy
2007-03-01 13:12:09 +00:00
Mohan Srinivasan
4a32dc299f In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
2007-02-28 20:48:00 +00:00
Bruce M Simpson
85e0793497 Style: Move declaration of subsystem mutex to where other
mutexes are in this file, and use macros for dealing with it.
2007-02-28 20:02:24 +00:00
Gleb Smirnoff
8bec3467b1 Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't
be returned up to the caller.

PR:		100172
Submitted by:	"Andrew - Supernews" <andrew supernews.net>
Reviewed by:	rwatson, bms
2007-02-28 12:47:49 +00:00
Gleb Smirnoff
72757d9a53 Toss the code, that handles errors from ip_output(), to make it more
readable:
- Merge two embedded if() into one.
- Introduce switch() block to handle different kinds of errors.

Reviewed by:	rwatson, bms
2007-02-28 12:41:49 +00:00
Bruce M Simpson
ad3b9f70ed Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support.
Obtained from:	OpenSolaris
2007-02-27 14:45:37 +00:00
Mohan Srinivasan
7c72af8770 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
Bruce M Simpson
410052125e Unlock a mutex which should be unlocked before returning.
MFC after:	1 week
2007-02-25 14:22:03 +00:00
Bruce M Simpson
6be2e366d6 Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2007-02-24 11:38:47 +00:00
Robert Watson
afdb42748d Rename two identically named log_in_vain variables: tcp_input.c's static
log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to
udp_log_in_vain.

MFC after:	1 week
2007-02-20 10:20:03 +00:00
Robert Watson
3329b23659 Gratuitous UDP restyling toward style(9) in 7.x. 2007-02-20 10:13:11 +00:00
Robert Watson
03dc38a48b #ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed
with minor adjustments.

Submitted by:	Florian C. Smeets <flo at kasimir dot com>
2007-02-18 08:57:23 +00:00