Commit Graph

2707 Commits

Author SHA1 Message Date
Qing Li
95ad8418dc This patch is provided to fix a couple of deployment issues observed
in the field. In one situation, one end of the TCP connection sends
a back-to-back RST packet, with delayed ack, the last_ack_sent variable
has not been update yet. When tcp_insecure_rst is turned off, the code
treats the RST as invalid because last_ack_sent instead of rcv_nxt is
compared against th_seq. Apparently there is some kind of firewall that
sits in between the two ends and that RST packet is the only RST
packet received. With short lived HTTP connections, the symptom is
a large accumulation of connections over a short period of time .

The +/-(1) factor is to take care of implementations out there that
generate RST packets with these types of sequence numbers. This
behavior has also been observed in live environments.

Reviewed by:	silby, Mike Karels
MFC after:	1 week
2007-03-07 23:21:59 +00:00
Bruce M Simpson
44c4d7b2cb Purge an out-of-date comment. 2007-03-04 16:32:19 +00:00
Bruce M Simpson
a3fd02d88b Fix undirected broadcast sends for the case where SO_DONTROUTE has also
been set at the socket layer, in our somewhat convoluted IPv4 source
selection logic in ip_output().

IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255
must always be delivered on a local link with a TTL of 1.

If IP_ONESBCAST has been set at the socket layer, also perform destination
interface lookup for point-to-point interfaces based on the destination
address of the link; previously it was not possible to use the option with
such interfaces; also, the destination/broadcast address fields map to the
same field within struct ifnet, which doesn't help matters.

One more valid fix going forward for these issues is to treat 255.255.255.255
as a destination in its own right in the forwarding trie. Other
implementations do this. It fits with the use of multiple paths, though
it then becomes necessary to specify interface preference.
This hack will eventually go away when that comes to pass.

Reviewed by:	andre
MFC after:	1 week
2007-03-01 13:29:30 +00:00
Andre Oppermann
6aa5b62315 Prevent TSO mbuf chain from overflowing a few bytes by subtracting the
TCP options size before the TSO total length calculation.

Bug found by:	kmacy
2007-03-01 13:12:09 +00:00
Mohan Srinivasan
4a32dc299f In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
2007-02-28 20:48:00 +00:00
Bruce M Simpson
85e0793497 Style: Move declaration of subsystem mutex to where other
mutexes are in this file, and use macros for dealing with it.
2007-02-28 20:02:24 +00:00
Gleb Smirnoff
8bec3467b1 Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't
be returned up to the caller.

PR:		100172
Submitted by:	"Andrew - Supernews" <andrew supernews.net>
Reviewed by:	rwatson, bms
2007-02-28 12:47:49 +00:00
Gleb Smirnoff
72757d9a53 Toss the code, that handles errors from ip_output(), to make it more
readable:
- Merge two embedded if() into one.
- Introduce switch() block to handle different kinds of errors.

Reviewed by:	rwatson, bms
2007-02-28 12:41:49 +00:00
Bruce M Simpson
ad3b9f70ed Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support.
Obtained from:	OpenSolaris
2007-02-27 14:45:37 +00:00
Mohan Srinivasan
7c72af8770 Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.
2007-02-26 22:25:21 +00:00
Bruce M Simpson
410052125e Unlock a mutex which should be unlocked before returning.
MFC after:	1 week
2007-02-25 14:22:03 +00:00
Bruce M Simpson
6be2e366d6 Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2007-02-24 11:38:47 +00:00
Robert Watson
afdb42748d Rename two identically named log_in_vain variables: tcp_input.c's static
log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to
udp_log_in_vain.

MFC after:	1 week
2007-02-20 10:20:03 +00:00
Robert Watson
3329b23659 Gratuitous UDP restyling toward style(9) in 7.x. 2007-02-20 10:13:11 +00:00
Robert Watson
03dc38a48b #ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed
with minor adjustments.

Submitted by:	Florian C. Smeets <flo at kasimir dot com>
2007-02-18 08:57:23 +00:00
Robert Watson
497057eeea Add "show inpcb", "show tcpcb" DDB commands, which should come in handy
for debugging sblock and other network panics.
2007-02-17 21:02:38 +00:00
Robert Watson
8ca5b13f2f Remove unused inp6_ifindex field from inpcb, as well as unused macro
shortcut for it.
2007-02-16 14:09:24 +00:00
Robert Watson
1f9b46facf Remove unused in6p_ip6_hlim macro shortcut for non-present
inp_depend6.inp6_hlim field in the inpcb.
2007-02-16 13:56:06 +00:00
Randall Stewart
f42a358a6f - Copyright updates (aka 2007)
- ZONE get now also take a type cast so it does the
  cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just
  LIST_EMPTY
- Removal of const in some of the static hmac functions
  (not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the
  RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered
  count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear
  after my testing completes. It allows us to keep a
  small log on each assoc of the last 40 TSN's in/out and
  stream assignment. It is NOT in options and so is only
  good for private builds.
- Some CMT changes in prep for Jana fixing his problem
  with reneging when CMT is enabled (Concurrent Multipath
  Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code
  with the old format (in case the peer does not yet support
  the latest auth code).
- Nonce sum was incorrectly being set in when ecn_nonce was
  NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using
  just data pointers. The mbufs were used to harmonize
  NetBSD code since both Net and Open used this method. We
  have decided to move away from that and more conform to
  FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The
  cookie_how collision case tracking had an endless loop in
  it if you got a second retransmission of a cookie collision
  case. This would lock up  a CPU .. ugly..
- auth function goes to using size_t instead of int which
  conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you
  get the data chunk, deliver it and due to the reference to a ch->
  that every now and then has been deleted (depending on the postion
  in the mbuf) you have an invalid ch->ch.flags.. and thus you don't
  advance the stream sequence number.. so you block the stream
  permanently. The fix is to make local variables of these guys
  and set them up before you have any chance of trimming the
  mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in
  the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where
  we would get an error on receiving with this. Thats because
  it was NOT padded to the same size as the snd_rcv info. We
  increase (add the pad) so the two structs are the same size
  in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for
  socket options was to cast the pointer and validate the size.
  This as been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing
  flag type (the next msg is a notification) and a missing
  scope recovery was also fixed.

Reviewed by:	gnn
2007-02-12 23:24:31 +00:00
Bruce M Simpson
79760c6bdf Use MAXTTL.
Obtained from:	NetBSD
2007-02-10 23:15:28 +00:00
Bruce M Simpson
7a90229b61 If the rendezvous point for a group is not specified, do not send
IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon,
as an optimization to mitigate the effects of high multicast
forwarding load.

This is an experimental change, therefore it must be explicitly enabled by
setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value.
The tunable may be set from the loader or from within the kernel environment
when loading ip_mroute.ko as a module.

Submitted by:	edrt <edrt at citiz.net>
See also:	http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html
2007-02-10 14:48:42 +00:00
Bruce M Simpson
0948f0a28f Build PIM by default as part of the IPv4 multicast forwarding path.
Make PIM dynamically loadable by using encap_attach_func().
PIM may now be loaded into a GENERIC kernel.

Tested with:	ports/net/pimdd && tcpreplay && wireshark
Reviewed by:	Pavlin Radoslavov
2007-02-10 13:59:13 +00:00
Bruce M Simpson
f2bf119ead Store the cached route in vifp in the normal send_packet() case.
The VIFF_TUNNEL case no longer exists, therefore this field is free to
use, and its use eliminates a static data member.
2007-02-08 23:05:08 +00:00
Bruce M Simpson
162c78d481 Nuke the token bucket filter code. Attempting to request rate limiting
by the token bucket filter will result in EINVAL being returned.

If you want to rate-limit traffic in future, use ALTQ or dummynet; this
isn't a general purpose QoS engine.

Preserve the now unused fields in struct vif so as to avoid having to
recompile netstat(1) and other tools.

Reviewed by:	Pavlin Radslavov, Bill Fenner
2007-02-08 22:58:01 +00:00
Bruce M Simpson
aab7b273bf eliminate redundant macro MC_SEND() 2007-02-07 20:36:33 +00:00
Bruce M Simpson
78cb087e34 Remove support for IPIP tunnels in IPv4 multicast forwarding. XORP has
never used them; with mrouted, their functionality may be replaced by
explicitly configuring gif(4) instances and specifying them with the
'phyint' keyword.

Bump __FreeBSD_version to 700030, and update UPDATING.
A doc update is forthcoming.

Discussed on:	net
Reviewed by:	fenner
MFC after:	3 months
2007-02-07 16:04:13 +00:00
Bruce M Simpson
64e740a352 When fast-forwarding is enabled, do not forward directed IPv4 broadcasts
to locally attached broadcast networks.

Note well: This relies on the layer 2 route cloning behaviour in BSD.

PR:		98799
Tested by:	Dmitry Sergienko
MFC after:	1 week
2007-02-05 00:15:40 +00:00
Alan Cox
055867a06c Include opt_ipdivert.h so that the message announcing ipfw correctly
describes the state of IPDIVERT.
2007-02-03 22:11:53 +00:00
Bruce M Simpson
d256723b8b In fast forwarding path, defer processing of 169.254.0.0/16
to ip_input(). See RFC 3927 section 2.7.
2007-02-03 06:46:48 +00:00
Bruce M Simpson
f8429ca2e1 In regular forwarding path, reject packets destined for 169.254.0.0/16
link-local addresses. See RFC 3927 section 2.7.
2007-02-03 06:45:51 +00:00
Bruce M Simpson
d055815799 Comply with RFC 3927, by forcing ARP replies which contain a source
address within the link-local IPv4 prefix 169.254.0.0/16, to be
broadcast at link layer.

Reviewed by:	fenner
MFC after:	2 weeks
2007-02-02 20:31:44 +00:00
Bruce M Simpson
1baaf8347c Expose smoothed RTT and RTT variance measurements to userland via
socket option TCP_INFO.
Note that the units used in the original Linux API are in microseconds,
so use a 64-bit mantissa to convert FreeBSD's internal measurements
from struct tcpcb from ticks.
2007-02-02 18:34:18 +00:00
Gleb Smirnoff
fbfdcf8735 Since rev. 1.94 of netinet/in.c, the netinet layer frees all its
multicast memberships, when interface is detached. Thus, when
an underlying interface is detached, we do not need to free
our multicast memberships.

Reviewed by:	bms
2007-02-02 09:39:09 +00:00
Andre Oppermann
6741ecf595 Auto sizing TCP socket buffers.
Normally the socket buffers are static (either derived from global
defaults or set with setsockopt) and do not adapt to real network
conditions. Two things happen: a) your socket buffers are too small
and you can't reach the full potential of the network between both
hosts; b) your socket buffers are too big and you waste a lot of
kernel memory for data just sitting around.

With automatic TCP send and receive socket buffers we can start with a
small buffer and quickly grow it in parallel with the TCP congestion
window to match real network conditions.

FreeBSD has a default 32K send socket buffer. This supports a maximal
transfer rate of only slightly more than 2Mbit/s on a 100ms RTT
trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send
buffer auto scaling and the default values below it supports 20Mbit/s
at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or
1000%. For the receive side it looks slightly better with a default of
64K buffer size.

New sysctls are:
  net.inet.tcp.sendbuf_auto=1 (enabled)
  net.inet.tcp.sendbuf_inc=8192 (8K, step size)
  net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
  net.inet.tcp.recvbuf_auto=1 (enabled)
  net.inet.tcp.recvbuf_inc=16384 (16K, step size)
  net.inet.tcp.recvbuf_max=262144 (256K, growth limit)

Tested by:	many (on HEAD and RELENG_6)
Approved by:	re
MFC after:	1 month
2007-02-01 18:32:13 +00:00
Andre Oppermann
087b55ea59 Change the way the advertized TCP window scaling is computed. Instead of
upper-bounding it to the size of the initial socket buffer lower-bound it
to the smallest MSS we accept.  Ideally we'd use the actual MSS information
here but it is not available yet.

For socket buffer auto sizing to be effective we need room to grow the
receive window.  The window scale shift is determined at connection setup
and can't be changed afterwards.  The previous, original, method effectively
just did a power of two roundup of the socket buffer size at connection
setup severely limiting the headroom for larger socket buffers.

Tested by:	many (as part of the socket buffer auto sizing patch)
MFC after:	1 month
2007-02-01 17:39:18 +00:00
Bruce M Simpson
1976bc4af7 Import macros IN_LINKLOCAL(), IN_PRIVATE(), IN_LOCAL_GROUP(), IN_ANY_LOCAL().
This is not a functional change.

IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix.
IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix.
IN_LOCAL_GROUP() tests if an address falls within the statically assigned
link-local multicast scope specified in RFC 2365.
IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP().

As with the existing macros in the FreeBSD netinet stack, comparisons
are performed in host-byte order.

See also:	RFC 1918, RFC 2365, RFC 3927
Obtained from:	NetBSD (dyoung@)
MFC after:	2 weeks
2007-01-31 14:34:47 +00:00
Gleb Smirnoff
3cf0d02480 Make it possible that carpdetach() unlocks on return. Then, in
carp_clone_destroy() we are on a safe side, we don't need to
unlock the cif, that can me already non-existent at this point.

Reported by:	Anton Yuzhaninov <citrin rambler-co.ru>
2007-01-25 18:03:40 +00:00
Gleb Smirnoff
62dae1e917 Spacing. 2007-01-25 17:58:16 +00:00
Randall Stewart
93164cf98c - most all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
 - Count error fixed in mapping array which might
   cause a wrong cumack generation.
 - Invariants around panic for case D + printf when no invariants.
 - one-to-one model race condition fixed by using
   a pre-formed connection and then completing the
   work so accept won't happen on a non-formed
   association.
 - Some additional paranoia checks in sctp_output.
 - Locks that were missing in the accept code.

Approved by:	gnn
2007-01-18 09:58:43 +00:00
Randall Stewart
44b7479ba2 - Macroizes the V6ONLY flag check.
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
  unsigned long and really is a uint32_t
- Got rid of the user of MHeaders until they
  are truely needed by lower layers.
- Fixed an initialization problem in the readq structure
  (ordering was off).
- Found yet another collision bug when the random number
  generator returns two numbers on one side (during a collision)
  that are the same. Also added some tracking of cookies
  that will go away when we know that we have the last collision
  bug gone.
- Fixed an init bug for book_size_scale, that was causing
  Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
  Early FR but due to above bug also effected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
  static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
  has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
  shuting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
  did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
  not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was a
  ordering of locks problem in the way we did timers. It
  now conforms to the timeout(9) manual (except for the
  _drain part, we had to do this a different way due
  to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET
  depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
  error when the connection was reset.
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
  error when the connection was reset.
Approved by:	gnn
2007-01-15 15:12:10 +00:00
Maxim Konovalov
95ebcabed8 o Increment requests counter right before send out an ARP query actually.
Otherwise the code could lead to the spurious EHOSTDOWN errors.

PR:		kern/107807
Submitted by:	Dmitrij Tejblum
MFC after:	1 month
2007-01-14 18:44:17 +00:00
Warner Losh
0befead1e0 Marking this as __packed was needed to get the alignment and offset of
members right.  However, it also said it was aligned(1), which meant
that gcc generated really bad code.  Mark this as aligned(4).  This
makes things a little faster on arm (a couple percent), but also saves
about 30k on the size of the kernel for arm.

I talked about doing this with bde, but didn't check with him before
the commit, so I'm hesitant say 'reviewed by: bde'.
2007-01-12 07:23:31 +00:00
Julian Elischer
7e170af886 Remove two lines that somehow snuck back in after testing.
ip is now an argument to the function ipfw_log()
2007-01-09 21:03:07 +00:00
Maxim Konovalov
8b5b885047 o One more typo in the comment.
PR:		kern/107609
Submitted by:	Dr. Markus Waldeck
2007-01-06 13:12:24 +00:00
Paolo Pisati
3d2fff0d3d Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined.
Reviewed: luigi
2007-01-05 12:15:31 +00:00
Paolo Pisati
61c0e134f5 Wrap ipfw nat support in a new kernel config option named
"IPFIREWALL_NAT": this way nat is turned off by default and
POLA is preserved.

Reviewed by: rwatson
2007-01-03 11:12:54 +00:00
Julian Elischer
3b62120e87 Remove a bunch of dependencies in the IP header being the first thing in the
mbuf. First moves toward being able to cope better with having layer 2 (or
other encapsulation data) before the IP header in the packet being examined.
More commits to come to round out this functionality. This commit should
have no practical effect but clears the way for what is coming.
Revirewed by: luigi, yar
MFC After: 2 weeks
2007-01-02 19:57:31 +00:00
Warner Losh
6796a2d434 Fix typo in comment.
Submitted by: remko
2007-01-01 00:35:34 +00:00
Warner Losh
74eb3236c7 Add comment about udp checksums being off in BSD 4.2 compatibility mode.
Submitted by: Dr. Markus Waldeck
PR: kern/106657
2006-12-31 21:34:53 +00:00
John Baldwin
54e3607de6 Whitespace fix and remove an extra cast. 2006-12-30 17:53:28 +00:00