Commit Graph

1919 Commits

Author SHA1 Message Date
Sam Leffler
252f24a2cf divert socket fixups:
o pickup Giant in divert_packet to protect sbappendaddr since it
  can be entered through MPSAFE callouts or through ip_input when
  mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
  on an ICMP redirect (may not be needed)

Supported by:	FreeBSD Foundation
2003-11-08 23:09:42 +00:00
Sam Leffler
8484384564 assert optional inpcb is passed in locked
Supported by:	FreeBSD Foundation
2003-11-08 23:03:29 +00:00
Sam Leffler
59daba27d9 add locking assertions
Supported by:	FreeBSD Foundation
2003-11-08 23:02:36 +00:00
Sam Leffler
3c47a187b7 assert inpcb is locked in udp_output
Supported by:	FreeBSD Foundation
2003-11-08 23:00:48 +00:00
Sam Leffler
c29afad673 o correct locking problem: the inpcb must be held across tcp_respond
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond

Supported by:	FreeBSD Foundation
2003-11-08 22:59:22 +00:00
Sam Leffler
2a0746208b use local values instead of chasing pointers
Supported by:	FreeBSD Foundation
2003-11-08 22:57:13 +00:00
Sam Leffler
fa286d7db2 replace mtx_assert by INP_LOCK_ASSERT
Supported by:	FreeBSD Foundation
2003-11-08 22:55:52 +00:00
Sam Leffler
50d7c061a3 add some missing locking
Supported by:	FreeBSD Foundation
2003-11-08 22:53:41 +00:00
Sam Leffler
1d78192b35 the sbappendaddr call in socket_send must be protected by Giant
because it can happen from an MPSAFE callout

Supported by:	FreeBSD Foundation
2003-11-08 22:51:18 +00:00
Sam Leffler
e3f268fc89 add locking assertions that turn into noops if INET6 is configured;
this is necessary because the ipv6 code shares the in_pcb code with
ipv4 but (presently) lacks proper locking

Supported by:	FreeBSD Foundation
2003-11-08 22:48:27 +00:00
Sam Leffler
7902224c6b o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
Sam Leffler
27a940c9a2 unbreak compilation of FAST_IPSEC
Supported by:	FreeBSD Foundation
2003-11-08 00:34:34 +00:00
Sam Leffler
aab621f060 MFp4: reminder that random id code is not reentrant
Supported by:	FreeBSD Foundation
2003-11-07 23:31:29 +00:00
Sam Leffler
8f1ee3683d Move uid/gid checking logic out of line and lock inpcb usage. This
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).

Supported by:	FreeBSD Foundation
2003-11-07 23:26:57 +00:00
Hajimu UMEMOTO
aef3a65eb7 use ipsec_getnhist() instead of obsoleted ipsec_gethist().
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Reviewed by:	Ari Suutari <ari@suutari.iki.fi> (ipfw@)
2003-11-07 20:25:47 +00:00
Sam Leffler
ad67584665 Fix locking of the ip forwarding cache. We were holding a reference
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd.  This caused major havoc (for some
reason it appeared most frequently for folks running natd).  Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us.  This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.

Supported by:	FreeBSD Foundation
2003-11-07 01:47:52 +00:00
Hajimu UMEMOTO
0f9ade718d - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
Robert Watson
3de758d3e3 Note that when ip_output() is called from ip_forward(), it will already
have its options inserted, so the opt argument to ip_output()  must be
NULL.
2003-11-03 18:03:05 +00:00
Robert Watson
eecfe773aa Remove comment about desire for eventual explicit labeling of ICMP
header copy made on input path: this is now handled differently.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-03 18:01:38 +00:00
Sam Leffler
04df2fbbb8 Remove bogus RTFREE that was added in rev 1.47. The rmx code operates
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.

Supported by:	FreeBSD Foundation
2003-11-03 06:11:44 +00:00
Sam Leffler
9ce7877897 Correct rev 1.56 which (incorrectly) reversed the test used to
decide if in_pcbpurgeif0 should be invoked.

Supported by:	FreeBSD Foundation
2003-11-03 03:22:39 +00:00
Mike Silbersack
4bd4fa3fe6 Add an additional check to the tcp_twrecycleable function; I had
previously only considered the send sequence space.  Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.

The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
2003-11-02 07:47:03 +00:00
Mike Silbersack
96af9ea52b - Add a new function tcp_twrecycleable, which tells us if the ISN which
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.

- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.

This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
2003-11-01 07:30:08 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Sam Leffler
9c63e9dbd7 Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE).  This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by:	FreeBSD Foundation
2003-10-30 23:02:51 +00:00
Sam Leffler
d0402f1b73 Potential fix for races shutting down callouts when unloading
the module.  Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed.  Instead
stop the callout first then lock.

Supported by:	FreeBSD Foundation
2003-10-29 19:15:00 +00:00
Sam Leffler
3520e9d61d o add locking to protect routing table refcnt manipulations
o add some more debugging help for figuring out why folks are
  getting complaints about releasing routing table entries with
  a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG

Supported by:	FreeBSD Foundation
2003-10-29 19:03:58 +00:00
Hajimu UMEMOTO
59dfcba4aa add ECN support in layer-3.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
   make ip{,6}_ecn_egress() return integer to tell the caller that
   this packet should be dropped.
 - handle ECN at fragment reassembly in ip_input.c and frag6.c.

Obtained from:	KAME
2003-10-29 15:07:04 +00:00
Hajimu UMEMOTO
11de19f44d ip6_savecontrol() argument is redundant 2003-10-29 12:52:28 +00:00
Sam Leffler
9c855a36c1 Introduce the notion of "persistent mbuf tags"; these are tags that stay
with an mbuf until it is reclaimed.  This is in contrast to tags that
vanish when an mbuf chain passes through an interface.  Persistent tags
are used, for example, by MAC labels.

Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp.  This fixes problems
with "tag leakage".

Pointed out by:	Jonathan Stone
Reviewed by:	Robert Watson
2003-10-29 05:40:07 +00:00
Sam Leffler
395bb18680 speedup stream socket recv handling by tracking the tail of
the mbuf chain instead of walking the list for each append

Submitted by:	ps/jayanth
Obtained from:	netbsd (jason thorpe)
2003-10-28 05:47:40 +00:00
Hajimu UMEMOTO
618d51bbdc revert following unwanted changes:
- __packed to __attribute__((__packed__)
  -  uintN_t back to u_intN_t

Reported by:	bde
2003-10-25 10:57:08 +00:00
Hajimu UMEMOTO
16cd67e933 correct namespace pollution.
Submitted by:	bde
2003-10-25 09:37:10 +00:00
Hajimu UMEMOTO
c302f5bc07 remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{}
according to rfc2292bis.

Obtained from:	KAME
2003-10-24 20:37:05 +00:00
Hajimu UMEMOTO
5434eaa208 correct tab and order. 2003-10-24 19:51:49 +00:00
Hajimu UMEMOTO
f95d46333d Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542
(aka RFC2292bis).  Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.

Obtained from:	KAME
2003-10-24 18:26:30 +00:00
Mike Silbersack
0709c23335 Reduce the number of tcp time_wait structs to maxsockets / 5; this ensures
that at most 20% of sockets can be in time_wait at one time, ensuring
that time_wait sockets do not starve real connections from inpcb
structures.

No implementation change is needed, jlemon already implemented a nice
LRU-ish algorithm for tcp_tw structure recycling.

This should reduce the need for sysadmins to lower the default msl on
busy servers.
2003-10-24 05:44:14 +00:00
Sam Leffler
ac6b0748be o restructure initialization code so data structures are setup
when loaded as a module
o cleanup data structures on module unload when no application has
  been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them

Supported by:	FreeBSD Foundation
2003-10-24 00:09:18 +00:00
Mike Silbersack
184dcdc7c8 Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.
2003-10-21 18:28:36 +00:00
Hajimu UMEMOTO
b339980338 enclose IPv6 part with ifdef INET6.
Obtained from:	KAME
2003-10-20 16:19:01 +00:00
Hajimu UMEMOTO
31b3783c8d correct linkmtu handling.
Obtained from:	KAME
2003-10-20 15:27:48 +00:00
Hajimu UMEMOTO
31b1bfe1b0 - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from:	KAME
2003-10-17 15:46:31 +00:00
Sam Leffler
f51f805f7e pfil hooks can modify packet contents so check if the destination
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.

Supported by:	FreeBSD Foundation
Submitted by:	Pyun YongHyeon
2003-10-16 16:25:25 +00:00
Sam Leffler
b15694110f Drop dummynet lock when calling back into the network stack to deliver
packets.  This eliminates a LOR with Giant that caused outbound pipes
to fail.

Supported by:	FreeBSD Foundation
2003-10-16 16:21:25 +00:00
Kirk McKusick
b03587f06a Malloc buckets of size 128 have been having their 64-byte offset
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing.  In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children.  When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.

Submitted by:	Jim Kuhn <jkuhn@sandvine.com>
Reviewed by:	"Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by:	Ben Pfountz <netprince@vt.edu>
MFC after:	1 week
2003-10-16 02:00:12 +00:00
Sam Leffler
b35a1e5d66 purge extraneous ';'s
Supported by:	FreeBSD Foundation
Noticed by:	bde
2003-10-15 18:19:28 +00:00
Sam Leffler
929b31ddab Lock ip forwarding route cache. While we're at it, remove the global
variable ipforward_rt by introducing an ip_forward_cacheinval() call
to use to invalidate the cache.

Supported by:	FreeBSD Foundation
2003-10-14 19:19:12 +00:00
Sam Leffler
888c2a3c4e remove dangling ';'s` that were harmless
Supported by:	FreeBSD Foundation
2003-10-14 18:45:50 +00:00
Hajimu UMEMOTO
06cd0a3f97 - fix typo in comment.
- style.

Obtained from:	KAME
2003-10-07 17:46:18 +00:00
Hajimu UMEMOTO
1ae02d474a nuke unused ICMPV6CTL_NAMES and KEYCTL_NAMES macros. 2003-10-07 15:14:33 +00:00
Hajimu UMEMOTO
8c99329e89 return(code) -> return (code)
Obtained from:	KAME
2003-10-07 15:02:29 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
Sam Leffler
87002f0dc1 hookup ctlinput for fast ipsec versions of esp+ah protocols
Supported by:	FreeBSD Foundation
2003-10-03 22:06:36 +00:00
Sam Leffler
12394d06d8 place some kernel-specific data structures under #ifdef _KERNEL
Sponsored by:	FreeBSD Foundation
2003-10-03 20:58:56 +00:00
Bruce M Simpson
c3b52d6499 Shorten 'bad gateway' AF_LINK message.
Submitted by:	green
2003-10-03 17:22:14 +00:00
Bruce M Simpson
beb2ced8ac Make arp_rtrequest()'s 'bad gateway' messages slightly more informative,
to aid me in tracking down LLINFO inconsistencies in the routing table.

Discussed with:	fenner
2003-10-03 17:21:17 +00:00
Bruce M Simpson
b75bead1f2 Only delete the route if arplookup() tried to create it. Do not delete
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.

Add a more verbose comment about exactly what this code does.

Submitted by:	ru
2003-10-03 09:19:23 +00:00
Ruslan Ermilov
deb62e2887 By popular demand, added the "static ARP" per-interface option. 2003-10-01 08:32:37 +00:00
Hajimu UMEMOTO
5c6ebad8f6 add /*CONSTCOND*/ to reduce diffs against latest KAME.
Obtained from:	KAME
2003-09-25 13:40:06 +00:00
Bruce M Simpson
85cc199400 Fix a logic error in the check to see if arplookup() should free the route.
Noticed by:	Mike Hogsett
Reviewed by:	ru
2003-09-24 20:52:25 +00:00
Sam Leffler
134ea22494 o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by:	"Max Laier" <max@love2party.net>
Supported by:		FreeBSD Foundation
Obtained from:		NetBSD (bits of pfil.h and pfil.c)
2003-09-23 17:54:04 +00:00
Bruce M Simpson
fedf1d01a2 Fix a bug in arplookup(), whereby a hostile party on a locally
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.

Approved by:	jake (mentor)
Reported by:	Apple Product Security <product-security@apple.com>
2003-09-23 16:39:31 +00:00
Joe Marcus Clarke
68f1756b2a Grrr...add the Skinny alias code forgotten in the last commit. 2003-09-23 07:42:33 +00:00
Joe Marcus Clarke
b07fbc17e9 Add Cisco Skinny Station protocol support to libalias, natd, and ppp.
Skinny is the protocol used by Cisco IP phones to talk to Cisco Call
Managers.  With this code, one can use a Cisco IP phone behind a FreeBSD
NAT gateway.

Currently, having the Call Manager behind the NAT gateway is not supported.
More information on enabling Skinny support in libalias, natd, and ppp
can be found in those applications' manpages.

PR:		55843
Reviewed by:	ru
Approved by:	ru
MFC after:	30 days
2003-09-23 07:41:55 +00:00
Sam Leffler
598345da4b Bandaid locking change: mark static rule mutex recursive so re-entry when
sending an ICMP packet doesn't cause a panic.  A better solution is needed;
possibly defering the transmit to a dedicated thread.

Observed by:	"Aaron Wohl" <freebsd@soith.com>
2003-09-17 22:06:47 +00:00
Sam Leffler
f34f3a7097 shuffle code so we don't "continue" and miss a needed unlock operation
Observed by:	Wiktor Niesiobedzki <w@evip.pl>
2003-09-17 21:13:16 +00:00
Sam Leffler
293941a556 Add locking.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage

Sponsored by:	FreeBSD Foundation
2003-09-17 00:56:50 +00:00
Sam Leffler
91176902bc Minor fixups + add locking.
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable
  by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose

Sponsored by:	FreeBSD Foundation
2003-09-17 00:54:04 +00:00
Ruslan Ermilov
78f94aa951 Fix a bunch of off-by-one errors in the range checking code. 2003-09-11 21:40:21 +00:00
Ruslan Ermilov
8e75a37bb0 Fixed -Wpointer-arith warning.
Submitted by:	Stefan Farfeleder
PR:		bin/56653
2003-09-09 23:50:57 +00:00
Ruslan Ermilov
fe08efe680 mdoc(7): Use the new feature of the .In macro. 2003-09-08 19:57:22 +00:00
Sam Leffler
468cf6f61a Add locking.
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and
fixing numerous problems.

Sponsored by:	FreeBSD Foundation
Reviewed by:	Pavlin Radoslavov <pavlin@icir.org>
2003-09-06 04:53:43 +00:00
Sam Leffler
2fad1e931e lock ip fragment queues
Submitted by:	Robert Watson <rwatson@freebsd.org>
Obtained from:	BSD/OS
2003-09-05 00:10:33 +00:00
Sam Leffler
26f91065e7 o add locking
o move the global divsrc socket address to a local variable
  instead of locking it

Sponsored by:	FreeBSD Foundation
2003-09-05 00:00:51 +00:00
Bruce M Simpson
8a538743b5 PR: kern/56343
Reviewed by:	tjr
Approved by:	jake (mentor)
2003-09-03 02:19:29 +00:00
Mike Silbersack
3390d47670 Implement MBUF_STRESS_TEST mark II.
Changes from the original implementation:

- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed.  Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.

- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments.  While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.

- Add two modes of random fragmentation.  Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
2003-09-01 05:55:37 +00:00
Sam Leffler
638ed548b7 add locking
NB: There is a known LOR on the forwarding path; this needs to be resolved
    together with a similar issue in the bridge.  For the moment it is
    believed to be benign.

Sponsored by:	FreeBSD Fondation
2003-09-01 05:12:36 +00:00
Sam Leffler
611ceef62a remove warning about use of old divert sockets; this was marked
for removal before 5.2

Reviewed by:	silence on -net and -arch
2003-09-01 04:27:34 +00:00
Sam Leffler
3b6dd5a9d0 add locking
Sponsored by:	FreeBSD Foundation
2003-09-01 04:23:48 +00:00
Robert Watson
f19389746e Remove redundant initialization of rti; SLIST_FOREACH does that for
us.
2003-08-28 22:15:05 +00:00
Robert Watson
6b48911b00 M_PREPEND() with an argument of M_TRYWAIT can fail, meaning the
returned mbuf can be NULL.  Check for NULL in rip_output() when
prepending an IP header.  This prevents mbuf exhaustion from
causing a local kernel panic when sending raw IP packets.

PR:		kern/55886
Reported by:	Pawel Malachowski <pawmal-posting@freebsd.lublin.pl>
MFC after:	3 days
2003-08-26 14:11:48 +00:00
Jeffrey Hsu
578c5e1212 Remove redundant bzero.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-24 08:27:57 +00:00
Robert Watson
baee0c3e66 Introduce two new MAC Framework and MAC policy entry points:
mac_reflect_mbuf_icmp()
  mac_reflect_mbuf_tcp()

These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket.  For example, in respond to a ping or a RST
packet to a SYN on a closed port.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 18:39:16 +00:00
Robert Watson
b8ecbcd287 Before digging into IGMP locking, do a whitespace and prototype cleanup:
prefer tabs to 8 spaces, focus on consistent indentation, prefer modern
C function prototypes.  Not all the way to style(9), but substantially
closer.
2003-08-20 17:32:17 +00:00
Robert Watson
6c4b2ad305 Move from a custom-crafted singly-linked list to the SLIST_* macros
from queue(3).

Improve vertical compactness by using a IGMP_PRINTF() macro rather
than #ifdefing IGMP_DEBUG a large number of debugging printfs.

Reviewed by:	mdodd (SLIST changes)
2003-08-20 17:09:01 +00:00
Bruce M Simpson
8afa230470 Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.

Suggested by:   fenestro
Obtained from:  BSD/OS
Reviewed by:    mini, sam
Approved by:    jake (mentor)
2003-08-20 14:46:40 +00:00
Sam Leffler
c06eb4e293 Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
Jeffrey Hsu
9ba208b413 * Bug fix in bw_meter_process(): the periodically processed bins
of bw_meter entries were processed up to one second ahead.
  After an unappropriate rescheduling of some of the bw_meter
  entries, the upcalls weren't delivered.

* pim_register_prepare() uses the appropriate sw_csum flag to
  call ip_fragment() so the IP checksum is computed properly.

* Modify pim_register_prepare() to take care of IP packets that
  don't need fragmentation.

* Add-back in_delayed_cksum() to encap_send(), because it seems it
  should be there.

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-19 17:22:51 +00:00
Sam Leffler
53b57cd1ab add missing unlock when in_pcballoc returns an error 2003-08-19 17:11:46 +00:00
David E. O'Brien
4f4a104ee8 style.Makefile(5) 2003-08-18 15:25:39 +00:00
Gordon Tetlow
41d8423f71 Stage 3 of dynamic root support. Make all the libraries needed to run
binaries in /bin and /sbin installed in /lib. Only the versioned files
reside in /lib, the .so symlink continues to live /usr/lib so the
toolchain doesn't need to be modified.
2003-08-17 08:28:46 +00:00
Hartmut Brandt
a9ca5bdbd0 The syncache has made use of TCPDEBUG problematic, because the SYN
segments are lost for the application. This broke, for example,
ports/benchmarks/dbs which needs the SYN segment to filter the
contents of the trace buffer for the connection it is interested in.

This patch makes the SYN segments available again. Unfortunately they
are now associated with the listening socket instead of the new one, so
a change to applications is required, but without this patch it wouldn't
work altogether.

PR:		kern/45966
2003-08-13 10:20:57 +00:00
Hartmut Brandt
91f467d592 The tcp_trace call needs the length of the header. Unfortunately the
code has rotten a bit so that the header length is not correct at
the point when tcp_trace is called. Temporarily compute the correct
value before the call and restore the old value after. This makes
ports/benchmarks/dbs to almost work.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:50:42 +00:00
Hartmut Brandt
3c653157a5 A number of patches in the last years have created new return paths
in tcp_input that leave the function before hitting the tcp_trace
function call for the TCPDEBUG option. This has made TCPDEBUG mostly
useless (and tools like ports/benchmarks/dbs not working). Add
tcp_trace calls to the return paths that could be identified in this
maze.

This is a NOP unless you compile with TCPDEBUG.
2003-08-13 08:46:54 +00:00
Hartmut Brandt
b24521d779 Change the code that enables/disables the ATM channel to use the
new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels
for IP over ATM, but also CBR, VBR and ABR. Change the format of the
link layer address to specify the channel characteristics. The old
format is still supported and opens UBR channels.
2003-08-12 14:20:32 +00:00
Jeffrey Hsu
59ca77f4a1 New PIM header files.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-07 18:17:43 +00:00
Jeffrey Hsu
1e78ac216e 1. Basic PIM kernel support
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):

options	MROUTING		# Multicast routing
options	PIM			# Protocol Independent Multicast

2. Add support for advanced multicast API setup/configuration and
extensibility.

3. Add support for kernel-level PIM Register encapsulation.
Disabled by default.  Can be enabled by the advanced multicast API.

4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-07 18:16:59 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Hartmut Brandt
20e57b1045 Ups. I forgot this one in the SIOCATMENA/SIOCATMDIS removal commit.
This change allows one to specify almost the complete traffic parameters
for IPoverATM channels through the routing table. Up to now we used
4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed.
If the address is longer, however, the 5th byte is interpreted as the
traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the
parameters for this traffic class:

  UBR: 0 byte or 3 byte PCR
  CBR: 3 byte PCR
  VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
  ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM,
       1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF

A script to generate the corresponding 'route add' arguments will follow soon.
2003-08-06 15:56:37 +00:00
Jeffrey Hsu
1b6002ec30 * makes mfc[MFCTBLSIZ] and vif[MAXVIFS] tables accessible via
sysctl:
  - sysctlbyname("net.inet.ip.mfctable", ...)
  - sysctlbyname("net.inet.ip.viftable", ...)

  This change is needed so netstat can use sysctlbyname() to read
  the data from those tables.
  Otherwise, in some cases "netstat -g" may fail to report the
  multicast forwarding information (e.g., if we run a multicast
  router on PicoBSD).

* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast
  routing daemon, set properly "im->im_vif" to the receiving
  incoming interface of the packet that triggered that upcall
  rather than to the expected incoming interface of that packet.

* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"

* Few formatting nits (e.g., replace extra spaces with TABs)

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-05 17:01:33 +00:00
Hartmut Brandt
7e3d4432af When adding a channel for INET failed at the device level (ioctl) the
code used to call rtrequest(RTM_DELETE, ...). This is a problem, because
the function that just has called us (route_output)
is not really happy with the route it just is creating beeing ripped out
from under it. Unfortunately we also cannot return an error from
ifa_rtrequest. Therefore mark the route just as RTF_REJECT.
2003-08-05 14:59:06 +00:00
Hartmut Brandt
5246b4ff88 Make this file to conform more to style(9) before really touching it. 2003-08-05 13:58:04 +00:00
Maxim Konovalov
e1bd2f381a o Fix a typo in previous commit. 2003-07-31 10:24:36 +00:00
Maxim Konovalov
853af3f3f0 o Do not overwrite saved interrupt priority level by alloc_hash(),
use a separate variable.
o Restore interrupt priority level before return (no-op in HEAD).

Spotted by:	Don Bowman <don@sandvine.com>
MFC after:	5 days
2003-07-25 09:59:16 +00:00
Sam Leffler
1f76a5e218 add IPSEC_FILTERGIF suport for FAST_IPSEC
PR:		kern/51922
Submitted by:	Eric Masson <e-masson@kisoft-services.com>
MFC after:	1 week
2003-07-22 18:58:34 +00:00
Mike Silbersack
7dc7f0311e Minor fix to the MBUF_STRESS_TEST code so that it keeps
pkthdr.len consistant at all times.  (Some debugging
code I'm working on is tripped otherwise.)

MFC after:	3 days
2003-07-19 05:50:32 +00:00
Robert Watson
83503a9227 Add a comment above rip_ctloutput() documenting that the privilege
check for raw IP system management operations is often (although
not always) implicit due to the namespacing of raw IP sockets.  I.e.,
you have to have privilege to get a raw IP socket, so much of the
management code sitting on raw IP sockets assumes that any requests
on the socket should be granted privilege.

Obtained from:	TrustedBSD Project
Product of:	France
2003-07-18 16:10:36 +00:00
Jeffrey Hsu
a12569ec4f Drop Giant around syncache timer processing. 2003-07-17 11:19:25 +00:00
Luigi Rizzo
4805529cf8 Allow set 31 to be used for rules other than 65535.
Set 31 is still special because rules belonging to it are not deleted
by the "ipfw flush" command, but must be deleted explicitly with
"ipfw delete set 31" or by individual rule numbers.

This implement a flexible form of "persistent rules" which you might
want to have available even after an "ipfw flush".
Note that this change does not violate POLA, because you could not
use set 31 in a ruleset before this change.

sbin/ipfw changes to allow manipulation of set 31 will follow shortly.

Suggested by: Paul Richards
2003-07-15 23:07:34 +00:00
Jeffrey Hsu
9d11646de7 Unify the "send high" and "recover" variables as specified in the
lastest rev of the spec.  Use an explicit flag for Fast Recovery. [1]

Fix bug with exiting Fast Recovery on a retransmit timeout
diagnosed by Lu Guohan. [2]

Reviewed by:		Thomas Henderson <thomas.r.henderson@boeing.com>
Reported and tested by:	Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
Approved by:		Thomas Henderson <thomas.r.henderson@boeing.com>,
			Sally Floyd <floyd@acm.org> [1]
2003-07-15 21:49:53 +00:00
Luigi Rizzo
72e02d4dac Implement comments embedded into ipfw2 instructions.
Since we already had 'O_NOP' instructions which always match, all
I needed to do is allow the NOP command to have arbitrary length
(i.e. move its label in a different part of the switch() which
validates instructions).

The kernel must know nothing about comments, everything else is
done in userland (which will be described in the upcoming ipfw2.c
commit).
2003-07-12 05:54:17 +00:00
Luigi Rizzo
7a1dfbc0d3 Merge the handlers of O_IP_SRC_MASK and O_IP_DST_MASK opcodes, and
support matching a list of addr/mask pairs so one can write
more efficient rulesets which were not possible before e.g.

    add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16

The change is fully backward compatible.
ipfw2 and manpage commit to follow.

MFC after: 3 days
2003-07-08 07:44:42 +00:00
Luigi Rizzo
c3e5b9f154 Implement the 'ipsec' option to match packets coming out of an ipsec tunnel.
Should work with both regular and fast ipsec (mutually exclusive).
See manpage for more details.

Submitted by: Ari Suutari (ari.suutari@syncrontech.com)
Revised by: sam
MFC after: 1 week
2003-07-04 21:42:32 +00:00
Luigi Rizzo
f030c1518d Correct some comments, add opcode O_IPSEC to match packets
coming out of an ipsec tunnel.
2003-07-04 21:39:51 +00:00
Luigi Rizzo
5d3b4c2480 Remove a stale comment, fix indentation. 2003-06-28 14:23:22 +00:00
Luigi Rizzo
b5f3c4cff3 whitespace fix 2003-06-28 14:16:53 +00:00
Luigi Rizzo
9c1cfc8650 remove unused file (ipfw2 is the default in RELENG_5 and above; the old
ipfw1 has been unused and unmaintained for a long time).
2003-06-24 07:12:11 +00:00
Luigi Rizzo
ec4270c021 Fix typo in a (commented out) debugging string.
Spotted by: diff
2003-06-23 21:38:21 +00:00
Luigi Rizzo
67ab48d1ae Remove whitespace at end of line. 2003-06-23 21:18:56 +00:00
Luigi Rizzo
44c884e134 Add support for multiple values and ranges for the "iplen", "ipttl",
"ipid" options. This feature has been requested by several users.
On passing, fix some minor bugs in the parser.  This change is fully
backward compatible so if you have an old /sbin/ipfw and a new
kernel you are not in trouble (but you need to update /sbin/ipfw
if you want to use the new features).

Document the changes in the manpage.

Now you can write things like

	ipfw add skipto 1000 iplen 0-500

which some people were asking to give preferential treatment to
short packets.

The 'MFC after' is just set as a reminder, because I still need
to merge the Alpha/Sparc64 fixes for ipfw2 (which unfortunately
change the size of certain kernel structures; not that it matters
a lot since ipfw2 is entirely optional and not the default...)

PR: bin/48015

MFC after: 1 week
2003-06-22 17:33:19 +00:00
Mike Silbersack
fcaf9f9146 Map icmp time exceeded responses to EHOSTUNREACH rather than 0 (no error);
this makes connect act more sensibly in these cases.

PR:				50839
Submitted by:			Barney Wolff <barney@pit.databus.com>
Patch delayed by laziness of:	silby
MFC after:			1 week
2003-06-17 06:21:08 +00:00
Ruslan Ermilov
ada24e690c In the PKT_ALIAS_PROXY_ONLY mode, make sure to preserve the
original source IP address, as promised in the manual page.

Spotted by:	Vaclav Petricek
2003-06-13 21:54:01 +00:00
Ruslan Ermilov
9c88dc8855 Removed a couple of .Xo/.Xc that are leftovers of the "ninth-argument
limit" mdoc(7) atavism.
2003-06-13 21:39:22 +00:00
Ruslan Ermilov
7176089886 Clarify that original address and port when doing transparent proxying
are _destination_ address and port.
2003-06-13 21:36:24 +00:00
Ruslan Ermilov
61de149d30 Added myself to the AUTHORS section. 2003-06-13 21:32:01 +00:00
Philippe Charnier
9703a107f2 The .Fn function 2003-06-08 09:53:08 +00:00
Robert Watson
042bbfa3b5 When setting fragment queue pointers to NULL, or comparing them with
NULL, use NULL rather than 0 to improve readability.
2003-06-06 19:32:48 +00:00
Jeffrey Hsu
f058535deb Compensate for decreasing the minimum retransmit timeout.
Reviewed by:	jlemon
2003-06-04 10:03:55 +00:00
Bernd Walter
330462a315 Change handling to support strong alignment architectures such as alpha and
sparc64.

PR:		alpha/50658
Submitted by:	rizzo
Tested on:	alpha
2003-06-04 01:17:37 +00:00
Kelly Yancey
ed7ea0e1ab Account for packets processed at layer-2 (i.e. net.link.ether.ipfw=1).
MFC after:	2 weeks
2003-06-02 23:54:09 +00:00
Ruslan Ermilov
234dfc904a A new API function PacketAliasRedirectDynamic() can be used
to mark a fully specified static link as dynamic; i.e. make
it a one-time link.
2003-06-01 23:15:00 +00:00
Ruslan Ermilov
f1a529f3da Make the PacketAliasSetAddress() function call optional. If it
is not called, and no static rules match an outgoing packet, the
latter retains its source IP address.  This is in support of the
"static NAT only" mode.
2003-06-01 22:49:59 +00:00
Poul-Henning Kamp
4df05d61bd Remove unused variables.
Found by:       FlexeLint
2003-06-01 09:20:38 +00:00
Poul-Henning Kamp
e4d2978dd8 Add /* FALLTHROUGH */
Found by:       FlexeLint
2003-05-31 19:07:22 +00:00
Garrett Wollman
6e49b1fe55 Don't generate an ip_id for packets with the DF bit set; ip_id is
only meaningful for fragments.  Also don't bother to byte-swap the
ip_id when we do generate it; it is only used at the receiver as a
nonce.  I tried several different permutations of this code with no
measurable difference to each other or to the unmodified version, so
I've settled on the one for which gcc seems to generate the best code.
(If anyone cares to microoptimize this differently for an architecture
where it actually matters, feel free.)

Suggested by:	Steve Bellovin's paper in IMW'02
2003-05-31 17:55:21 +00:00
Robert Watson
430c635447 Correct a bug introduced with reduced TCP state handling; make
sure that the MAC label on TCP responses during TIMEWAIT is
properly set from either the socket (if available), or the mbuf
that it's responding to.

Unfortunately, this is made somewhat difficult by the TCP code,
as tcp_twstart() calls tcp_twrespond() after discarding the socket
but without a reference to the mbuf that causes the "response".
Passing both the socket and the mbuf works arounds this--eventually
it might be good to make sure the mbuf always gets passed in in
"response" scenarios but working through this provided to
complicate things too much.

Approved by:	re (scottl)
Reviewed by:	hsu
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-07 05:26:27 +00:00
Robert Watson
688fe1d954 Trim a call to mac_create_mbuf_from_mbuf() since m_tag meta-data
copying for mbuf headers now works properly in m_dup_pkthdr(), so
we don't need to do an explicit copy.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-06 20:34:04 +00:00
Matthew N. Dodd
e97c58c8cf Add definitions for IN6ADDR_LINKLOCAL_ALLMDNS_INIT and INADDR_ALLMDNS_GROUP. 2003-04-29 22:03:46 +00:00
Matthew N. Dodd
4957466b8e IP_RECVTTL socket option.
Reviewed by:	Stuart Cheshire <cheshire@apple.com>
2003-04-29 21:36:18 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
David E. O'Brien
152385d122 Explicitly declare 'int' parameters. 2003-04-21 16:27:46 +00:00
David E. O'Brien
bfd738788b style.Makefile(5) 2003-04-20 18:38:59 +00:00
Mike Silbersack
53dcc544a8 Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended
to include more than just frag tests.
2003-04-12 06:11:46 +00:00
Robert Watson
cacd79e2c9 Remove a potential panic condition introduced by reduced TCP wait
state.  Those changed attempted to work around the changed invariant
that inp->in_socket was sometimes now NULL, but the logic wasn't
quite right, meaning that inp->in_socket would be dereferenced by
cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC
were in use.  Attempt to clarify and correct the logic.

Note: the work-around originally introduced with the reduced TCP
wait state handling to use cr_cansee() instead of cr_canseesocket()
in this case isn't really right, although it "Does the right thing"
for most of the cases in the base system.  We'll need to address
this at some point in the future.

Pointed out by:	dcs
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-04-10 20:33:10 +00:00
Dag-Erling Smørgrav
fe58453891 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
Dag-Erling Smørgrav
212059bd83 Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage. 2003-04-04 12:14:00 +00:00
Matthew N. Dodd
2c56e246fa Back out support for RFC3514.
RFC3514 poses an unacceptale risk to compliant systems.
2003-04-02 20:14:44 +00:00
Matthew N. Dodd
4f6425f7ae - Use the correct constant define.
- Add a missing break.
2003-04-02 18:02:58 +00:00
Matthew N. Dodd
8faf6df9b3 Sync constant define with NetBSD.
Requested by:	 Tom Spindler <dogcow@babymeat.com>
2003-04-02 10:28:47 +00:00
Jeffrey Hsu
48d2549c3e Observe conservation of packets when entering Fast Recovery while
doing Limited Transmit.  Only artificially inflate the congestion
window by 1 segment instead of the usual 3 to take into account
the 2 already sent by Limited Transmit.

Approved in principle by:	Mark Allman <mallman@grc.nasa.gov>,
Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
2003-04-01 21:16:46 +00:00
Matthew N. Dodd
09139a4537 Implement support for RFC 3514 (The Security Flag in the IPv4 Header).
(See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)

This fulfills the host requirements for userland support by
way of the setsockopt() IP_EVIL_INTENT message.

There are three sysctl tunables provided to govern system behavior.

	net.inet.ip.rfc3514:

		Enables support for rfc3514.  As this is an
		Informational RFC and support is not yet widespread
		this option is disabled by default.

	net.inet.ip.hear_no_evil

		 If set the host will discard all received evil packets.

	net.inet.ip.speak_no_evil

		If set the host will discard all transmitted evil packets.

The IP statistics counter 'ips_evil' (available via 'netstat') provides
information on the number of 'evil' packets recieved.

For reference, the '-E' option to 'ping' has been provided to demonstrate
and test the implementation.
2003-04-01 08:21:44 +00:00
Maxim Konovalov
7778283b40 Fix indentation. 2003-03-27 15:00:10 +00:00
Maxim Konovalov
be1e4c5162 o Protect set_fs_param() by splimp(9).
Quote from kern/37573:

	There is an obvious race in netinet/ip_dummynet.c:config_pipe().
	Interrupts are not blocked when changing the params of an
	existing pipe.  The specific crash observed:

	... -> config_pipe -> set_fs_parms -> config_red

	malloc a new w_q_lookup table but take an interrupt before
	intializing it, interrupt handler does:

	... -> dummynet_io -> red_drops

	red_drops dereferences the uninitialized (zeroed) w_q_lookup
	table.

o Flush accumulated credits for idle pipes.
o Flush accumulated credits when change pipe characteristics.
o Change dn_flow_queue.numbytes type to unsigned long.

	Overlapping dn_flow_queue->numbytes in ready_event() leads to
	numbytes becomes negative and SET_TICKS() macro returns a very
	big value.  heap_insert() overlaps dn_key again and inserts a
	queue to a ready heap with a sched_time points to the past.
	That leads to an "infinity" loop.

PR:		kern/33234, kern/37573, misc/42459, kern/43133,
		kern/44045, kern/48099
Submitted by:	Mike Hibler <mike@cs.utah.edu> (kern/37573)
MFC after:	6 weeks
2003-03-27 14:56:36 +00:00
Robert Watson
5e7ce4785f Modify the mac_init_ipq() MAC Framework entry point to accept an
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue.  This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-26 15:12:03 +00:00
Maxime Henrion
511e01e2d6 Try to make the MBUF_FRAG_TEST code work better.
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole
  chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's
  more clear.

This is still not quite right, because we're clearly abusing m_split() to
do something it was not designed for, but at least it works now.  We
should probably move this code into a m_fragment() function when it's
correct.
2003-03-25 23:49:14 +00:00
Mike Silbersack
9d9edc5693 Add the MBUF_FRAG_TEST option. When compiled in, this option
allows you to tell ip_output to fragment all outgoing packets
into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes.
This is an excellent way to test if network drivers can properly
handle long mbuf chains being passed to them.

net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation)
so that you can at least boot before your network driver dies. :)
2003-03-25 05:45:05 +00:00
Maxime Henrion
aecfcdb824 Use __packed instead of __attribute__((__packed__)). 2003-03-22 00:25:14 +00:00
Matthew N. Dodd
57842a38fd Add a sysctl node allowing the specification of an address mask to use
when replying to ICMP Address Mask Request packets.
2003-03-21 15:43:06 +00:00
Matthew N. Dodd
21150298bb Add comments regarding the ICMP timestamp fields. 2003-03-21 15:28:10 +00:00
Crist J. Clark
010dabb047 Add a 'verrevpath' option that verifies the interface that a packet
comes in on is the same interface that we would route out of to get to
the packet's source address. Essentially automates an anti-spoofing
check using the information in the routing table.

Experimental. The usage and rule format for the feature may still be
subject to change.
2003-03-15 01:13:00 +00:00
Jeffrey Hsu
7792ea2700 Greatly simplify the unlocking logic by holding the TCP protocol lock until
after FIN_WAIT_2 processing.

Helped with debugging:	Doug Barton
2003-03-13 11:46:57 +00:00
Jeffrey Hsu
da3a8a1a4f Add support for RFC 3390, which allows for a variable-sized
initial congestion window.
2003-03-13 01:43:45 +00:00
Jeffrey Hsu
582a954b00 Implement the Limited Transmit algorithm (RFC 3042). 2003-03-12 20:27:28 +00:00
Sam Leffler
4a692a1fc2 correct two more flag misuses; m_tag* use malloc flags 2003-03-12 14:45:22 +00:00
Jonathan Lemon
a3b6edc353 Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure.
Sponsored by: DARPA, NAI Labs
2003-03-08 22:07:52 +00:00
Jonathan Lemon
607b0b0cc9 Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one.  Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs
2003-03-08 22:06:20 +00:00
Peter Wemm
3c6b084e96 Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.
2003-03-05 19:24:24 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jonathan Lemon
272c5dfe93 In timewait state, if the incoming segment is a pure in-sequence ack
that matches snd_max, then do not respond with an ack, just drop the
segment.  This fixes a problem where a simultaneous close results in
an ack loop between two time-wait states.

Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
Sponsored by: DARPA, NAI Labs
2003-02-26 18:20:41 +00:00
Jonathan Lemon
ef6b48deb9 The TCP protocol lock may still be held if the reassembly queue dropped FIN.
Detect this case and drop the lock accordingly.

Sponsored by: DARPA, NAI Labs
2003-02-26 13:55:13 +00:00
Mike Silbersack
a75a485d62 Fix a condition so that ip reassembly queues are emptied immediately
when maxfragpackets is dropped to 0.

Noticed by:	bmah
2003-02-26 07:28:35 +00:00
Robert Watson
9327ee33bf When generating a TCP response to a connection, not only test if the
tcpcb is NULL, but also its connected inpcb, since we now allow
elements of a TCP connection to hang around after other state, such
as the socket, has been recycled.

Tested by:	dcs
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-02-25 14:08:41 +00:00
Maxim Konovalov
b36f5b3735 style(9): join lines. 2003-02-25 11:53:11 +00:00
Maxim Konovalov
99e8617d24 Ip reassembly queue structure has ipq_nfrags now. Count a number of
dropped ip fragments precisely.

Reviewed by:	silby
2003-02-25 11:49:01 +00:00
Jeffrey Hsu
edf02ff15d Hold the TCP protocol lock while modifying the connection hash table. 2003-02-25 01:32:03 +00:00
Mike Silbersack
af9c7d06d5 Fix a comment which didn't match the new cookie behavior.
Submitted by:	Scott Renfro <scott@renfro.org>
MFC after:	1 day
2003-02-24 03:15:48 +00:00
Jeffrey Hsu
11a20fb8b6 tcp_twstart() need to be called with the TCP protocol lock held to avoid
a race condition with the TCP timer routines.
2003-02-24 00:52:03 +00:00
Jeffrey Hsu
2fbef91887 Pass the right function to callout_reset() for a compressed
TIME-WAIT control block.
2003-02-24 00:48:12 +00:00
Mike Silbersack
a432399c56 Improve the security and performance of syncookies:
Security improvements:
- Increase the size of each syncookie secret from 32 to 128 bits
  in order to make brute force attacks on the secrets much more
  difficult.
- Always return the lowest order dword from the MD5 hash; this
  allows us to expose 2 more bits of the cookie and makes ACK
  floods which seek to guess the cookie value more difficult.

Performance improvements:
- Increase the lifetime of each syncookie from 4 seconds to 16
  seconds.  This increases the usefulness of syncookies during
  an attack.
- From Yahoo!: Reduce the number of calls to MD5Update; this
  results in a ~17% increase in cookie generation time here.

Reviewed by:	hsu, jayanth, jlemon, nectar
MFC After:	15 seconds
2003-02-23 19:04:23 +00:00
Jonathan Lemon
f243998be5 Yesterday just wasn't my day. Remove testing delta that crept into the diff.
Pointy hat provided by: sam
2003-02-23 15:40:36 +00:00
Sam Leffler
14dd6717f8 Add a new config option IPSEC_FILTERGIF to control whether or not
packets coming out of a GIF tunnel are re-processed by ipfw, et. al.
By default they are not reprocessed.  With the option they are.

This reverts 1.214.  Prior to that change packets were not re-processed.
After they were which caused problems because packets do not have
distinguishing characteristics (like a special network if) that allows
them to be filtered specially.

This is really a stopgap measure designed for immediate MFC so that
4.8 has consistent handling to what was in 4.7.

PR:		48159
Reviewed by:	Guido van Rooij <guido@gvr.org>
MFC after:	1 day
2003-02-23 00:47:06 +00:00
Jonathan Lemon
a14c749f04 Check to see if the TF_DELACK flag is set before returning from
tcp_input().  This unbreaks delack handling, while still preserving
correct T/TCP behavior

Tested by: maxim
Sponsored by: DARPA, NAI Labs
2003-02-22 21:54:57 +00:00
Mike Silbersack
375386e284 Add the ability to limit the number of IP fragments allowed per packet,
and enable it by default, with a limit of 16.

At the same time, tweak maxfragpackets downward so that in the worst
possible case, IP reassembly can use only 1/2 of all mbuf clusters.

MFC after: 	3 days
Reviewed by:	hsu
Liked by:	bmah
2003-02-22 06:41:47 +00:00
Poul-Henning Kamp
d25ecb917b - m = m_gethdr(M_NOWAIT, MT_HEADER);
+       m = m_gethdr(M_DONTWAIT, MT_HEADER);

'nuff said.
2003-02-21 23:17:12 +00:00
Crist J. Clark
b0d226932e The ancient and outdated concept of "privileged ports" in UNIX-type
OSes has probably caused more problems than it ever solved. Allow the
user to retire the old behavior by specifying their own privileged
range with,

  net.inet.ip.portrange.reservedhigh  default = IPPORT_RESERVED - 1
  net.inet.ip.portrange.reservedlo    default = 0

Now you can run that webserver without ever needing root at all. Or
just imagine, an ftpd that can really drop privileges, rather than
just set the euid, and still do PORT data transfers from 20/tcp.

Two edge cases to note,

  # sysctl net.inet.ip.portrange.reservedhigh=0

Opens all ports to everyone, and,

  # sysctl net.inet.ip.portrange.reservedhigh=65535

Locks all network activity to root only (which could actually have
been achieved before with ipfw(8), but is somewhat more
complicated).

For those who stick to the old religion that 0-1023 belong to root and
root alone, don't touch the knobs (or even lock them by raising
securelevel(8)), and nothing changes.
2003-02-21 05:28:27 +00:00
Jonathan Lemon
8608c4c1f9 Remove unused variables in the IPSEC case.
Submitted by:  Lars Eggert <larse@ISI.EDU>
2003-02-20 18:22:21 +00:00
Jonathan Lemon
ffae8c5a7e Unbreak non-IPV6 compilation.
Caught by: phk
Sponsored by: DARPA, NAI Labs
2003-02-19 23:43:04 +00:00
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Jonathan Lemon
7990938421 Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the
routine does not require a tcpcb to operate.  Since we no longer keep
template mbufs around, move pseudo checksum out of this routine, and
merge it with the length update.

Sponsored by: DARPA, NAI Labs
2003-02-19 22:18:06 +00:00
Jonathan Lemon
414462252a Correct comments. 2003-02-19 21:33:46 +00:00
Jonathan Lemon
3bfd6421c2 Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
   - fix bug where a single pass through tcp_input might not delay acks
   - use callout_active() instead of callout_pending()

Sponsored by: DARPA, NAI Labs
2003-02-19 21:18:23 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Maxim Konovalov
b52d5ea3d2 o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket
cr_uid.

Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its
own macro for a similar purpose that is why ipfw2 in RELENG_4 processes
uid rules correctly. I will MFC the diff for code consistency.

Reported by:	Oleg Baranov <ol@csa.ru>
Reviewed by:	luigi
MFC after:	1 month
2003-02-17 13:39:57 +00:00
Jeffrey Hsu
4b40c56c28 Take advantage of pre-existing lock-free synchronization and type stable memory
to avoid acquiring SMP locks during expensive copyout process.
2003-02-15 02:37:57 +00:00
Jeffrey Hsu
85e8b24343 The protocol lock is always held in the dropafterack case, so we don't
need to check for it at runtime.
2003-02-13 22:14:22 +00:00
Jeffrey Hsu
3dc7ebf9ff in_pcbnotifyall() requires an exclusive protocol lock for notify functions
which modify the connection list, namely, tcp_notify().
2003-02-12 23:55:07 +00:00
Jeffrey Hsu
6d45d64a8f Properly document that syncache timer processing requires an
exclusive TCP protocol lock.
2003-02-12 00:42:12 +00:00
Seigo Tanimura
cd6c2a8874 s/IPSSEC/IPSEC/ 2003-02-11 10:51:56 +00:00
Jeffrey Hsu
24652ff6e1 Get cosmetic changes out of the way before I add routing table SMP locks. 2003-02-10 22:01:34 +00:00