1987 Commits

Author SHA1 Message Date
andre
de48630dfb Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:

 o adds full direct process-to-completion IPv4 forwarding code
 o handles ip fragmentation incl. hw support (ip_flow did not)
 o sends icmp needfrag to source if DF is set (ip_flow did not)
 o supports ipfw and ipfilter (ip_flow did not)
 o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
 o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by:	sam (mentor)
2003-11-14 21:02:22 +00:00
sam
64b81ef0ad add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)
Supported by:	FreeBSD Foundation
2003-11-13 05:18:23 +00:00
sam
135a94b34b o reorder some locking asserts to reflect the order of the locks
o correct a read-lock assert in in_pcblookup_local that should be
  a write-lock assert (since time wait close cleanups may alter state)

Supported by:	FreeBSD Foundation
2003-11-13 05:16:56 +00:00
andre
c864ff5792 Move global variables for icmp_input() to its stack. With SMP or
preemption two CPUs can be in the same function at the same time
and clobber each others variables.  Remove register declaration
from local variables.

Reviewed by:	sam (mentor)
2003-11-13 00:32:13 +00:00
andre
39849f80f8 Do not fragment a packet with hardware assistance if it has the DF
bit set.

Reviewed by:	sam (mentor)
2003-11-12 23:35:40 +00:00
bms
3f57e25aeb Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path.
This switch toggles between strict multicast delivery, and traditional
multicast delivery.

The traditional (default) behaviour is to deliver multicast datagrams to all
sockets which are members of that group, regardless of the network interface
where the datagrams were received.

The strict behaviour is to deliver multicast datagrams received on a
particular interface only to sockets whose membership is bound to that
interface.

Note that as a matter of course, multicast consumers specifying INADDR_ANY
for their interface get joined on the interface where the default route
happens to be bound. This switch has no effect if the interface which the
consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.

The original patch has been cleaned up somewhat from that submitted. It has
been tested on a multihomed machine with multiple QuickTime RTP streams
running over the local switch, which doesn't do IGMP snooping.

PR:		kern/58359
Submitted by:	William A. Carrel
Reviewed by:	rwatson
MFC after:	1 week
2003-11-12 20:17:11 +00:00
andre
6c57f18a59 dropwithreset is not needed in this case as tcp_drop() is already notifying
the other side. Before we were sending two RST packets.
2003-11-12 19:38:01 +00:00
rwatson
77ed6e2d1c Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
sam
cff9b0702d correct typos
Pointed out by:	Mike Silbersack
2003-11-11 18:16:54 +00:00
sam
f6943f86fd o add missing inpcb locking in tcp_respond
o replace spl's with lock assertions

Supported by:	FreeBSD Foundation
2003-11-11 17:54:47 +00:00
sam
c2cd984a13 use Giant-less callouts when debug_mpsafenet is non-zero
Supported by:	FreeBSD Foundation
2003-11-10 23:29:33 +00:00
iedowse
14918757c4 In in_pcbconnect_setup(), don't use the cached inp->inp_route unless
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.

Reviewed by:	sam
2003-11-10 22:45:37 +00:00
hsu
d60321e5dd Mark TCP syncache timer as not Giant-free ready yet. 2003-11-10 20:42:04 +00:00
sam
c997776d7c replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by:	FreeBSD Foundation
2003-11-08 23:36:32 +00:00
sam
555f1c248b divert socket fixups:
o pickup Giant in divert_packet to protect sbappendaddr since it
  can be entered through MPSAFE callouts or through ip_input when
  mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
  on an ICMP redirect (may not be needed)

Supported by:	FreeBSD Foundation
2003-11-08 23:09:42 +00:00
sam
854b820d7c assert optional inpcb is passed in locked
Supported by:	FreeBSD Foundation
2003-11-08 23:03:29 +00:00
sam
111b711b82 add locking assertions
Supported by:	FreeBSD Foundation
2003-11-08 23:02:36 +00:00
sam
0860beca46 assert inpcb is locked in udp_output
Supported by:	FreeBSD Foundation
2003-11-08 23:00:48 +00:00
sam
a946bc4854 o correct locking problem: the inpcb must be held across tcp_respond
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond

Supported by:	FreeBSD Foundation
2003-11-08 22:59:22 +00:00
sam
b84b4aa531 use local values instead of chasing pointers
Supported by:	FreeBSD Foundation
2003-11-08 22:57:13 +00:00
sam
8a7d4d7c73 replace mtx_assert by INP_LOCK_ASSERT
Supported by:	FreeBSD Foundation
2003-11-08 22:55:52 +00:00
sam
b88d477c4e add some missing locking
Supported by:	FreeBSD Foundation
2003-11-08 22:53:41 +00:00
sam
354edc9c36 the sbappendaddr call in socket_send must be protected by Giant
because it can happen from an MPSAFE callout

Supported by:	FreeBSD Foundation
2003-11-08 22:51:18 +00:00
sam
e0d3008a3f add locking assertions that turn into noops if INET6 is configured;
this is necessary because the ipv6 code shares the in_pcb code with
ipv4 but (presently) lacks proper locking

Supported by:	FreeBSD Foundation
2003-11-08 22:48:27 +00:00
sam
7f3b205cb8 o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
sam
75598276f5 unbreak compilation of FAST_IPSEC
Supported by:	FreeBSD Foundation
2003-11-08 00:34:34 +00:00
sam
e5e7aba8ca MFp4: reminder that random id code is not reentrant
Supported by:	FreeBSD Foundation
2003-11-07 23:31:29 +00:00
sam
94792986aa Move uid/gid checking logic out of line and lock inpcb usage. This
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).

Supported by:	FreeBSD Foundation
2003-11-07 23:26:57 +00:00
ume
ac7871522d use ipsec_getnhist() instead of obsoleted ipsec_gethist().
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Reviewed by:	Ari Suutari <ari@suutari.iki.fi> (ipfw@)
2003-11-07 20:25:47 +00:00
sam
81a2c9c441 Fix locking of the ip forwarding cache. We were holding a reference
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd.  This caused major havoc (for some
reason it appeared most frequently for folks running natd).  Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us.  This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.

Supported by:	FreeBSD Foundation
2003-11-07 01:47:52 +00:00
ume
373abd9403 - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
rwatson
2ef58bb97d Note that when ip_output() is called from ip_forward(), it will already
have its options inserted, so the opt argument to ip_output()  must be
NULL.
2003-11-03 18:03:05 +00:00
rwatson
2dda0c225b Remove comment about desire for eventual explicit labeling of ICMP
header copy made on input path: this is now handled differently.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-03 18:01:38 +00:00
sam
112f2c3806 Remove bogus RTFREE that was added in rev 1.47. The rmx code operates
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.

Supported by:	FreeBSD Foundation
2003-11-03 06:11:44 +00:00
sam
4462f3d45a Correct rev 1.56 which (incorrectly) reversed the test used to
decide if in_pcbpurgeif0 should be invoked.

Supported by:	FreeBSD Foundation
2003-11-03 03:22:39 +00:00
silby
1bddc356b1 Add an additional check to the tcp_twrecycleable function; I had
previously only considered the send sequence space.  Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.

The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
2003-11-02 07:47:03 +00:00
silby
34fc0661fd - Add a new function tcp_twrecycleable, which tells us if the ISN which
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.

- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.

This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
2003-11-01 07:30:08 +00:00
brooks
f1e94c6f29 Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
sam
9183d53dd7 Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE).  This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by:	FreeBSD Foundation
2003-10-30 23:02:51 +00:00
sam
16414b86ee Potential fix for races shutting down callouts when unloading
the module.  Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed.  Instead
stop the callout first then lock.

Supported by:	FreeBSD Foundation
2003-10-29 19:15:00 +00:00
sam
f65eafde26 o add locking to protect routing table refcnt manipulations
o add some more debugging help for figuring out why folks are
  getting complaints about releasing routing table entries with
  a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG

Supported by:	FreeBSD Foundation
2003-10-29 19:03:58 +00:00
ume
b9fecc82d3 add ECN support in layer-3.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
   make ip{,6}_ecn_egress() return integer to tell the caller that
   this packet should be dropped.
 - handle ECN at fragment reassembly in ip_input.c and frag6.c.

Obtained from:	KAME
2003-10-29 15:07:04 +00:00
ume
36edae8e0d ip6_savecontrol() argument is redundant 2003-10-29 12:52:28 +00:00
sam
409cf5f514 Introduce the notion of "persistent mbuf tags"; these are tags that stay
with an mbuf until it is reclaimed.  This is in contrast to tags that
vanish when an mbuf chain passes through an interface.  Persistent tags
are used, for example, by MAC labels.

Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp.  This fixes problems
with "tag leakage".

Pointed out by:	Jonathan Stone
Reviewed by:	Robert Watson
2003-10-29 05:40:07 +00:00
sam
39ba2e1c90 speedup stream socket recv handling by tracking the tail of
the mbuf chain instead of walking the list for each append

Submitted by:	ps/jayanth
Obtained from:	netbsd (jason thorpe)
2003-10-28 05:47:40 +00:00
ume
bbeee5f0f7 revert following unwanted changes:
- __packed to __attribute__((__packed__)
  -  uintN_t back to u_intN_t

Reported by:	bde
2003-10-25 10:57:08 +00:00
ume
b3c1e80175 correct namespace pollution.
Submitted by:	bde
2003-10-25 09:37:10 +00:00
ume
19c7c976c8 remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{}
according to rfc2292bis.

Obtained from:	KAME
2003-10-24 20:37:05 +00:00
ume
38391a7834 correct tab and order. 2003-10-24 19:51:49 +00:00
ume
881c4fa391 Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542
(aka RFC2292bis).  Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.

Obtained from:	KAME
2003-10-24 18:26:30 +00:00