Commit Graph

1822 Commits

Author SHA1 Message Date
andre
f6253c9b05 Enable the following TCP options by default to give it more exposure:
rfc3042  Limited retransmit
 rfc3390  Increasing TCP's initial congestion Window
 inflight TCP inflight bandwidth limiting

All my production server have it enabled and there have been no
issues.  I am confident about having them on by default and it gives
us better overall TCP performance.

Reviewed by:	sam (mentor)
2004-01-06 23:29:46 +00:00
andre
f14c2fc588 According to RFC1812 we have to ignore ICMP redirects when we
are acting as router (ipforwarding enabled).

This doesn't fix the problem that host routes from ICMP redirects
are never removed from the kernel routing table but removes the
problem for machines doing packet forwarding.

Reviewed by:	sam (mentor)
2004-01-06 23:20:07 +00:00
ru
8dbbb519f5 Document the net.inet.ip.subnets_are_local sysctl. 2003-12-30 16:05:03 +00:00
sobomax
96159ed234 Sync with NetBSD:
if_gre.c rev.1.41-1.49

 o Spell output with two ts.
 o Remove assigned-to but not used variable.
 o fix grammatical error in a diagnostic message.
 o u_short -> u_int16_t.
 o gi_len is ip_len, so it has to be network byteorder.

if_gre.h rev.1.11-1.13

 o prototype must not have variable name.
 o u_short -> u_int16_t.
 o Spell address with two d's.

ip_gre.c rev.1.22-1.29

 o KNF - return is not a function.
 o The "osrc" variable in gre_mobile_input() is only ever set but not
   referenced; remove it.
 o correct (false) assumptions on mbuf chain.  not sure if it really helps, but
   anyways, it is necessary to perform m_pullup.
 o correct arg to m_pullup (need to count IP header size as well).
 o remove redundant adjustment of m->m_pkthdr.len.
 o clear m_flags just for safety.
 o tabify.
 o u_short -> u_int16_t.

MFC after:	2 weeks
2003-12-30 11:41:43 +00:00
sam
c165a87f8d o eliminate widespread on-stack mbuf use for bpf by introducing
a new bpf_mtap2 routine that does the right thing for an mbuf
  and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
  when prepending the address family (several places were assuming
  sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
  mbufs have been eliminated; this may better be moved to the bpf
  routines

Reviewed by:	arch@ and several others
2003-12-28 03:56:00 +00:00
maxim
33621bd9ec o Fix a comment: softticks lives in sys/kern/kern_timeout.c.
PR:		kern/60613
Submitted by:	Gleb Smirnoff
MFC after:	3 days
2003-12-27 14:08:53 +00:00
ume
ad7968509b NULL is not 0.
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
2003-12-24 18:22:04 +00:00
ru
fc90128bf7 I didn't notice it right away, but check the right length too. 2003-12-23 14:08:50 +00:00
ru
0b3fdbfa52 Fix a problem introduced in revision 1.84: m_pullup() does not
necessarily return the same mbuf chain so we need to recompute
mtod() consumers after pulling up.
2003-12-23 13:33:23 +00:00
peter
7cc77e03ae Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.
2003-12-23 02:36:43 +00:00
sam
2a76f36902 o move mutex init/destroy logic to the module load/unload hooks;
otherwise they are initialized twice when the code is statically
  configured in the kernel because the module load method gets
  invoked before the user application calls ip_mrouter_init
o add a mutex to synchronize the module init/done operations; this
  sort of was done using the value of ip_mroute but X_ip_mrouter_done
  sets it to NULL very early on which can lead to a race against
  ip_mrouter_init--using the additional mutex means this is safe now
o don't call ip_mrouter_reset from ip_mrouter_init; this now happens
  once at module load and X_ip_mrouter_done does the appropriate
  cleanup work to insure the data structures are in a consistent
  state so that a subsequent init operation inherits good state

Reviewed by:	juli
2003-12-20 18:32:48 +00:00
jhb
bfeab27f15 Fix some becuase -> because typos.
Reported by:	Marco Wertejuk <wertejuk@mwcis.com>
2003-12-17 16:12:01 +00:00
rwatson
8315bb904d Switch TCP over to using the inpcb label when responding in timed
wait, rather than the socket label.  This avoids reaching up to
the socket layer during connection close, which requires locking
changes.  To do this, introduce MAC Framework entry point
mac_create_mbuf_from_inpcb(), which is called from tcp_twrespond()
instead of calling mac_create_mbuf_from_socket() or
mac_create_mbuf_netlayer().  Introduce MAC Policy entry point
mpo_create_mbuf_from_inpcb(), and implementations for various
policies, which generally just copy label data from the inpcb to
the mbuf.  Assert the inpcb lock in the entry point since we
require consistency for the inpcb label reference.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-12-17 14:55:11 +00:00
maxim
90bd792be8 o IN_MULTICAST wants an address in host byte order.
PR:		kern/60304
Submitted by:	demon
MFC after:	1 week
2003-12-16 18:21:47 +00:00
emax
9d07822abc Do not panic when flushing dummynet firewall rules
Reviewed by: andre
Approved by: re (scottl)
2003-12-06 09:01:25 +00:00
andre
d0ffd0a240 Swap destination and source arguments of two bcopy() calls.
Before committing the initial tcp_hostcache I changed them from memcpy()
to conform with FreeBSD style without realizing the difference in argument
definition.

This fixes hostcache operation for IPv6 (in general and explicitly IPv6
path mtu discovery) and T/TCP (RFC1644).

Submitted by:	Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Approved by:	re (rwatson)
2003-12-02 21:25:12 +00:00
sam
69806d879e Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate
code is compiled in to support the O_IPSEC operator.  Previously no
support was included and ipsec rules were always matching.  Note that
we do not return an error when an ipsec rule is added and the kernel
does not have IPsec support compiled in; this is done intentionally
but we may want to revisit this (document this in the man page).

PR:		58899
Submitted by:	Bjoern A. Zeeb
Approved by:	re (rwatson)
2003-12-02 00:23:45 +00:00
andre
c06e5a954d Fix an optimization where I made an ifdef'd out section to broad.
When the hostcache bucket limit is reached the last bucket wasn't
removed from the bucket row but inserted a few lines later at the
bucket row head again.  This leads to infinite loop when the same
bucket row is accessed the next time for a lookup/insert or purge
action.

Tested by:	imp, Matt Smith
Approved by:	re (rwatson)
2003-11-28 16:33:03 +00:00
andre
f3b2f134c7 Fix verify_rev_path() function. The author of this function tried to
cut corners which completely broke down when the routing table locking
was introduced.

Reviewed by:	sam (mentor)
Approved by:	re (rwatson)
2003-11-27 09:40:13 +00:00
andre
4a037b3dd4 Make sure all uses of stack allocated struct route's are properly
zeroed.  Doing a bzero on the entire struct route is not more
expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.

Reviewed by:	sam (mentor)
Approved by:	re  (scottl)
2003-11-26 20:31:13 +00:00
sam
960b35f03a Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2003-11-26 01:40:44 +00:00
andre
f3a38bd433 Restructure a too broad ifdef which was disabling the setting of the
tcp flightsize sysctl value for local networks in the !INET6 case.

Approved by:	re (scottl)
2003-11-25 20:58:59 +00:00
sam
fa98e6052a Correct a problem where ipfw-generated packets were being returned
for ipfw processing w/o an indication the packets were generated
by ipfw--and so should not be processed (this manifested itself
as a LOR.)  The flag bit in the mbuf that was used to mark the
packets was not listed in M_COPYFLAGS so if a packet had a header
prepended (as done by IPsec) the flag was lost.  Correct this by
defining a new M_PROTO6 flag and use it to mark packets that need
this processing.

Reviewed by:	bms
Approved by:	re (rwatson)
MFC after:	2 weeks
2003-11-24 03:57:03 +00:00
sam
083aa028f3 Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines
potentially transmit packets that may enter KAME IPsec w/o Giant if the
callouts are marked MPSAFE.

Reviewed by:	ume
Approved by:	re (rwatson)
2003-11-23 18:13:41 +00:00
tmm
5bae25b0ad bzero() the the sockaddr used for the destination address for
rtalloc_ign() in in_pcbconnect_setup() before it is filled out.
Otherwise, stack junk would be left in sin_zero, which could
cause host routes to be ignored because they failed the comparison
in rn_match().
This should fix the wrong source address selection for connect() to
127.0.0.1, among other things.

Reviewed by:	sam
Approved by:	re (rwatson)
2003-11-23 03:02:00 +00:00
andre
6164d7c280 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
andre
6dca20de07 Remove RTF_PRCLONING from routing table and adjust users of it
accordingly.  The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache.  The
network stack remains fully useable with this change.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 19:47:31 +00:00
maxim
18a4a482e1 Fix an arguments order in check_uidgid() call.
PR:		kern/59314
Submitted by:	Andrey V. Shytov
Approved by:	re (rwatson, jhb)
2003-11-20 10:28:33 +00:00
rwatson
9c969b771a Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
cognet
b60bc6ed20 In rip_abort(), unlock the inpcb if we didn't detach it, or we may
recurse on the lock before destroying the mutex.

Submitted by:	sam
2003-11-17 19:21:53 +00:00
green
fc779a7573 Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but
do not have mh_nextpkt initialized.  Somtimes what's there is "1", and the
ip_input() code pukes trying to m_free() it, rendering divert sockets and
such broken.
This really underscores the need to get rid of MT_TAG.

Reviewed by:	rwatson
2003-11-17 03:17:49 +00:00
andre
5c67a85f68 Make two casts correct for all types of 64bit platforms.
Explained by:	bde
2003-11-16 12:50:33 +00:00
andre
ccae1a80af Correct a cast to make it compile on 64bit platforms (noticed by tinderbox)
and remove two unneccessary variable initializations.
Make the introduction comment more clear with regard which parts of
the packet are touched.

Requested by:	luigi
2003-11-15 17:03:37 +00:00
andre
8ac3f681eb Make ipstealth global as we need it in ip_fastforward too. 2003-11-15 01:45:56 +00:00
andre
30ed90673d Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by:	sam (mentor)
2003-11-14 21:48:57 +00:00
andre
de48630dfb Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:

 o adds full direct process-to-completion IPv4 forwarding code
 o handles ip fragmentation incl. hw support (ip_flow did not)
 o sends icmp needfrag to source if DF is set (ip_flow did not)
 o supports ipfw and ipfilter (ip_flow did not)
 o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
 o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by:	sam (mentor)
2003-11-14 21:02:22 +00:00
sam
64b81ef0ad add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)
Supported by:	FreeBSD Foundation
2003-11-13 05:18:23 +00:00
sam
135a94b34b o reorder some locking asserts to reflect the order of the locks
o correct a read-lock assert in in_pcblookup_local that should be
  a write-lock assert (since time wait close cleanups may alter state)

Supported by:	FreeBSD Foundation
2003-11-13 05:16:56 +00:00
andre
c864ff5792 Move global variables for icmp_input() to its stack. With SMP or
preemption two CPUs can be in the same function at the same time
and clobber each others variables.  Remove register declaration
from local variables.

Reviewed by:	sam (mentor)
2003-11-13 00:32:13 +00:00
andre
39849f80f8 Do not fragment a packet with hardware assistance if it has the DF
bit set.

Reviewed by:	sam (mentor)
2003-11-12 23:35:40 +00:00
bms
3f57e25aeb Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path.
This switch toggles between strict multicast delivery, and traditional
multicast delivery.

The traditional (default) behaviour is to deliver multicast datagrams to all
sockets which are members of that group, regardless of the network interface
where the datagrams were received.

The strict behaviour is to deliver multicast datagrams received on a
particular interface only to sockets whose membership is bound to that
interface.

Note that as a matter of course, multicast consumers specifying INADDR_ANY
for their interface get joined on the interface where the default route
happens to be bound. This switch has no effect if the interface which the
consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.

The original patch has been cleaned up somewhat from that submitted. It has
been tested on a multihomed machine with multiple QuickTime RTP streams
running over the local switch, which doesn't do IGMP snooping.

PR:		kern/58359
Submitted by:	William A. Carrel
Reviewed by:	rwatson
MFC after:	1 week
2003-11-12 20:17:11 +00:00
andre
6c57f18a59 dropwithreset is not needed in this case as tcp_drop() is already notifying
the other side. Before we were sending two RST packets.
2003-11-12 19:38:01 +00:00
rwatson
77ed6e2d1c Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
sam
cff9b0702d correct typos
Pointed out by:	Mike Silbersack
2003-11-11 18:16:54 +00:00
sam
f6943f86fd o add missing inpcb locking in tcp_respond
o replace spl's with lock assertions

Supported by:	FreeBSD Foundation
2003-11-11 17:54:47 +00:00
sam
c2cd984a13 use Giant-less callouts when debug_mpsafenet is non-zero
Supported by:	FreeBSD Foundation
2003-11-10 23:29:33 +00:00
iedowse
14918757c4 In in_pcbconnect_setup(), don't use the cached inp->inp_route unless
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.

Reviewed by:	sam
2003-11-10 22:45:37 +00:00
hsu
d60321e5dd Mark TCP syncache timer as not Giant-free ready yet. 2003-11-10 20:42:04 +00:00
sam
c997776d7c replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by:	FreeBSD Foundation
2003-11-08 23:36:32 +00:00
sam
555f1c248b divert socket fixups:
o pickup Giant in divert_packet to protect sbappendaddr since it
  can be entered through MPSAFE callouts or through ip_input when
  mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
  on an ICMP redirect (may not be needed)

Supported by:	FreeBSD Foundation
2003-11-08 23:09:42 +00:00