Commit Graph

348 Commits

Author SHA1 Message Date
Bruce M Simpson
d10910e6ce Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Qing Li
6e6b3f7cbc This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
Marko Zec
385195c062 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Marko Zec
f02493cbbd Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-28 23:30:51 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Ed Maste
d2035ffb7a Move CTASSERT from header file to source file, per implementation note now
in the CTASSERT man page.

Submitted by:	Ryan Stone
2008-09-26 18:30:11 +00:00
Julian Elischer
b53c8130e5 Another V_ forgotten 2008-08-25 05:49:16 +00:00
Julian Elischer
5ed3800e41 Fix some of the formatting fixes.. It's amazing how some thing stand out
in a commit message.
2008-08-20 01:24:55 +00:00
Julian Elischer
ac957cd271 A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.
2008-08-20 01:05:56 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Robert Watson
59dd72d040 Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred).  Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired.  This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by:	bz
MFC after:	1 month
X-MFC note:	We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
		but the rest can go back.
2008-07-04 00:21:38 +00:00
Bjoern A. Zeeb
62ee136457 Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029,
generating an RTM_MISS for every IP packet forwarded making user space
routing daemons unhappy.

PR:		kern/123621, kern/124540, kern/122338
Reported by:	Paul <paul gtcomm.net>, Mike Tancsa <mike sentex.net> on net@
Tested by:	Paul and Mike
Reviewed by:	andre
MFC after:	3 days
2008-07-03 12:44:36 +00:00
Julian Elischer
8b07e49a00 Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

  One thing where FreeBSD has been falling behind, and which by chance I
  have some time to work on is "policy based routing", which allows
  different
  packet streams to be routed by more than just the destination address.

  Constraints:
  ------------

  I want to make some form of this available in the 6.x tree
  (and by extension 7.x) , but FreeBSD in general needs it so I might as
  well do it in -current and back port the portions I need.

  One of the ways that this can be done is to have the ability to
  instantiate multiple kernel routing tables (which I will now
  refer to as "Forwarding Information Bases" or "FIBs" for political
  correctness reasons). Which FIB a particular packet uses to make
  the next hop decision can be decided by a number of mechanisms.
  The policies these mechanisms implement are the "Policies" referred
  to in "Policy based routing".

  One of the constraints I have if I try to back port this work to
  6.x is that it must be implemented as a EXTENSION to the existing
  ABIs in 6.x so that third party applications do not need to be
  recompiled in timespan of the branch.

  This first version will not have some of the bells and whistles that
  will come with later versions. It will, for example, be limited to 16
  tables in the first commit.
  Implementation method, Compatible version. (part 1)
  -------------------------------
  For this reason I have implemented a "sufficient subset" of a
  multiple routing table solution in Perforce, and back-ported it
  to 6.x. (also in Perforce though not  always caught up with what I
  have done in -current/P4). The subset allows a number of FIBs
  to be defined at compile time (8 is sufficient for my purposes in 6.x)
  and implements the changes needed to allow IPV4 to use them. I have not
  done the changes for ipv6 simply because I do not need it, and I do not
  have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

  Other protocol families are left untouched and should there be
  users with proprietary protocol families, they should continue to work
  and be oblivious to the existence of the extra FIBs.

  To understand how this is done, one must know that the current FIB
  code starts everything off with a single dimensional array of
  pointers to FIB head structures (One per protocol family), each of
  which in turn points to the trie of routes available to that family.

  The basic change in the ABI compatible version of the change is to
  extent that array to be a 2 dimensional array, so that
  instead of protocol family X looking at rt_tables[X] for the
  table it needs, it looks at rt_tables[Y][X] when for all
  protocol families except ipv4 Y is always 0.
  Code that is unaware of the change always just sees the first row
  of the table, which of course looks just like the one dimensional
  array that existed before.

  The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
  are all maintained, but refer only to the first row of the array,
  so that existing callers in proprietary protocols can continue to
  do the "right thing".
  Some new entry points are added, for the exclusive use of ipv4 code
  called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
  which have an extra argument which refers the code to the correct row.

  In addition, there are some new entry points (currently called
  rtalloc_fib() and friends) that check the Address family being
  looked up and call either rtalloc() (and friends) if the protocol
  is not IPv4 forcing the action to row 0 or to the appropriate row
  if it IS IPv4 (and that info is available). These are for calling
  from code that is not specific to any particular protocol. The way
  these are implemented would change in the non ABI preserving code
  to be added later.

  One feature of the first version of the code is that for ipv4,
  the interface routes show up automatically on all the FIBs, so
  that no matter what FIB you select you always have the basic
  direct attached hosts available to you. (rtinit() does this
  automatically).

  You CAN delete an interface route from one FIB should you want
  to but by default it's there. ARP information is also available
  in each FIB. It's assumed that the same machine would have the
  same MAC address, regardless of which FIB you are using to get
  to it.

  This brings us as to how the correct FIB is selected for an outgoing
  IPV4 packet.

  Firstly, all packets have a FIB associated with them. if nothing
  has been done to change it, it will be FIB 0. The FIB is changed
  in the following ways.

  Packets fall into one of a number of classes.

  1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility call setfib
     that acts a bit like nice..

         setfib -3 ping target.example.com # will use fib 3 for ping.

     It is an obvious extension to make it a property of a jail
     but I have not done so. It can be achieved by combining the setfib and
     jail commands.

  2/ packets received on an interface for forwarding.
     By default these packets would use table 0,
     (or possibly a number settable in a sysctl(not yet)).
     but prior to routing the firewall can inspect them (see below).
     (possibly in the future you may be able to associate a FIB
     with packets received on an interface..  An ifconfig arg, but not yet.)

  3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would over-ride a fib associated by
     a more default source. (such as cases 1 or 2).

  4/ a tcp listen socket associated with a fib will generate
     accept sockets that are associated with that same fib.

  5/ Packets generated in response to some other packet (e.g. reset
     or icmp packets). These should use the FIB associated with the
     packet being reponded to.

  6/ Packets generated during encapsulation.
     gif, tun and other tunnel interfaces will encapsulate using the FIB
     that was in effect withthe proces that set up the tunnel.
     thus setfib 1 ifconfig gif0 [tunnel instructions]
     will set the fib for the tunnel to use to be fib 1.

  Routing messages would be associated with their
  process, and thus select one FIB or another.
  messages from the kernel would be associated with the fib they
  refer to and would only be received by a routing socket associated
  with that fib. (not yet implemented)

  In addition Netstat has been edited to be able to cope with the
  fact that the array is now 2 dimensional. (It looks in system
  memory using libkvm (!)). Old versions of netstat see only the first FIB.

  In addition two sysctls are added to give:
  a) the number of FIBs compiled in (active)
  b) the default FIB of the calling process.

  Early testing experience:
  -------------------------

  Basically our (IronPort's) appliance does this functionality already
  using ipfw fwd but that method has some drawbacks.

  For example,
  It can't fully simulate a routing table because it can't influence the
  socket's choice of local address when a connect() is done.

  Testing during the generating of these changes has been
  remarkably smooth so far. Multiple tables have co-existed
  with no notable side effects, and packets have been routes
  accordingly.

  ipfw has grown 2 new keywords:

  setfib N ip from anay to any
  count ip from any to any fib N

  In pf there seems to be a requirement to be able to give symbolic names to the
  fibs but I do not have that capacity. I am not sure if it is required.

  SCTP has interestingly enough built in support for this, called VRFs
  in Cisco parlance. it will be interesting to see how that handles it
  when it suddenly actually does something.

  Where to next:
  --------------------

  After committing the ABI compatible version and MFCing it, I'd
  like to proceed in a forward direction in -current. this will
  result in some roto-tilling in the routing code.

  Firstly: the current code's idea of having a separate tree per
  protocol family, all of the same format, and pointed to by the
  1 dimensional array is a bit silly. Especially when one considers that
  there is code that makes assumptions about every protocol having the
  same internal structures there. Some protocols don't WANT that
  sort of structure. (for example the whole idea of a netmask is foreign
  to appletalk). This needs to be made opaque to the external code.

  My suggested first change is to add routing method pointers to the
  'domain' structure, along with information pointing the data.
  instead of having an array of pointers to uniform structures,
  there would be an array pointing to the 'domain' structures
  for each protocol address domain (protocol family),
  and the methods this reached would be called. The methods would have
  an argument that gives FIB number, but the protocol would be free
  to ignore it.

  When the ABI can be changed it raises the possibilty of the
  addition of a fib entry into the "struct route". Currently,
  the structure contains the sockaddr of the desination, and the resulting
  fib entry. To make this work fully, one could add a fib number
  so that given an address and a fib, one can find the third element, the
  fib entry.

  Interaction with the ARP layer/ LL layer would need to be
  revisited as well. Qing Li has been working on this already.

  This work was sponsored by Ironport Systems/Cisco

Reviewed by:    several including rwatson, bz and mlair (parts each)
Obtained from:  Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
Bjoern A. Zeeb
b835b6fe2b Take the route mtu into account, if available, when sending an
ICMP unreach, frag needed.  Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.

In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu()
to avoid any further conditional maths.

Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.

PR:		kern/122338
Tested by:	Mark Cammidge  mark peralex.com
Reviewed by:	silence on net@
MFC after:	2 weeks
2008-04-09 05:17:18 +00:00
Guido van Rooij
d23d475fb4 Consider the following situation:
1. A packet comes in that is to be forwarded
2. The destination of the packet is rewritten by some firewall code
3. The next link's MTU is too small
4. The packet has the DF bit set

Then the current code is such that instead of setting the next
link's MTU in the ICMP error, ip_next_mtu() is called and a guess
is sent as to which MTU is supposed to be tried next. This is because
in this case ip_forward() is called with srcrt set to 1. In that
case the ia pointer remains NULL but it is needed to get the MTU
of the interface the packet is to be sent out from.
Thus, we always set ia to the outgoing interface.

MFC after:	2 weeks
2007-12-02 13:00:47 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Mike Silbersack
4b421e2daa Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by:	re (kensmith)
2007-10-07 20:44:24 +00:00
Bjoern A. Zeeb
cc977adc71 Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
Also rename the related functions in a similar way.
There are no functional changes.

For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.

With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.

The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.

Discussed at:			BSDCan 2007
Best new name suggested by:	rwatson
Reviewed by:			rwatson
Approved by:			re (bmah)
2007-08-05 16:16:15 +00:00
George V. Neville-Neil
b2630c2934 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Robert Watson
6751f8364e Remove leading spaces before tabs spotted thanks to silby using
kwrite to read ip_input.c.
2007-05-16 20:46:58 +00:00
Robert Watson
f2565d68a4 Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.
2007-05-10 15:58:48 +00:00
Robert Watson
30916a2d1d Replace a comment about RSVP/mrouting with a different but similar comment
explaining that some more locking is needed.  The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.

Spotted by:	kris
2007-03-25 21:49:50 +00:00
Andre Oppermann
6489fe6553 Match up SYSCTL declaration style. 2007-03-19 19:00:51 +00:00
Bruce M Simpson
f8429ca2e1 In regular forwarding path, reject packets destined for 169.254.0.0/16
link-local addresses. See RFC 3927 section 2.7.
2007-02-03 06:45:51 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Julian Elischer
010b65f54a revert last change.. premature.. need to wait until if_ethersubr.c
uses pfil to get to ipfw.
2006-10-21 00:16:31 +00:00
Julian Elischer
3df668cc38 Move some variables to a more likely place
and remove "temporary" stuff that is not needed any more.
2006-10-20 19:32:08 +00:00
Julian Elischer
b7522c27d2 Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was
in older versions of FreeBSD. This option is pointless as it is needed in just
about every interesting usage of forward that I have ever seen. It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x  or 7.x
Reviewed by:	glebius
MFC after:	1 week
2006-08-17 00:37:03 +00:00
Max Laier
e93187482d Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing
seperately.  Also use pfil hook/unhook instead of keeping the check
functions in pfil just to return there based on the sysctl.  While here fix
some whitespace on a nearby SYSCTL_ macro.
2006-05-12 04:41:27 +00:00
Pawel Jakub Dawidek
1d7d0bfe5e /tmp/cvsTXPIwQ 2006-05-05 06:24:34 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
Oleg Bulyzhin
6edb555dbc Fix five years old bug in ip_reass(): if we are using 'full' (i.e. including
pseudo header) hardware rx checksum offloading ip_reass() fails to calculate
TCP/UDP checksum for reassembled packet correctly.  This also should fix
recent 'NFS over UDP over bge' issue exposed by if_bge.c rev. 1.123

Reviewed by:	sam (earlier version), bde
Approved by:	glebius (mentor)
MFC after:	2 weeks
2006-02-07 11:48:10 +00:00
Christian S.J. Peron
604afec496 Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:

- Drop the variables stored in the phil_head structure which were specific to
  conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
  macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
  of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
  hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
  PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
  hooks to be ran, and returns if not. This check is already performed by the
  IP stacks when they call:

        if (!PFIL_HOOKED(ph))
                goto skip_hooks;

- Drop in assertion which makes sure that the number of hooks never drops
  below 0 for good measure. This in theory should never happen, and if it
  does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
  the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
  API instead of our home rolled version
- Convert the inlined functions to macros

Reviewed by:	mlaier, andre, glebius
Thanks to:	jhb for the new locking API
2006-02-02 03:13:16 +00:00
Andre Oppermann
1dfcf0d2a3 Move the IPSEC related code blocks to their own file to unclutter
and signifincantly improve the readability of ip_input() and
ip_output() again.

The resulting IPSEC hooks in ip_input() and ip_output() may be
used later on for making IPSEC loadable.

This move is mostly mechanical and should preserve current IPSEC
behaviour as-is.  Nothing shall prevent improvements in the way
IPSEC interacts with the IPv4 stack.

Discussed with:	bz, gnn, rwatson; (earlier version)
2006-02-01 13:55:03 +00:00
Andre Oppermann
ab48768b20 When doing IP forwarding with [FAST_]IPSEC compiled into the kernel
ip_forward() would report back a zero MTU in ICMP needfrag messages
because on a IPSEC SP lookup failure no MTU got computed.

Fix this by changing the logic to compute a new MTU in any case if
IPSEC didn't do it.

Change MTU computation logic to use egress interface MTU if available
or the next smaller MTU compared to the current packet size instead
of falling back to a very small fixed MTU.

Fix associated comment.

PR:		kern/91412
MFC after:	3 days
2006-01-24 17:57:19 +00:00
Robert Watson
d248c7d7f5 Modify the IP fragment reassembly code so that it uses a new UMA zone,
ipq_zone, to allocate fragment headers from, rather than using cast mbuf
storage.  This was one of the few remaining uses of mbuf storage for
local data structures that relied on dtom().  Implement the resource
limit on ipq's using UMA zone limits, but preserve current sysctl
semantics using a sysctl proc.

MFC after:	3 weeks
2006-01-15 18:58:21 +00:00
Robert Watson
dfa60d9354 Staticize ipqlock, since it is local to ip_input.c.
MFC after:	3 days
2006-01-15 17:05:48 +00:00
Ruslan Ermilov
f4e9888107 Fix -Wundef. 2005-12-04 02:12:43 +00:00
Andre Oppermann
22f2c8b5db Remove 'ipprintfs' which were protected under DIAGNOSTIC. It doesn't
have any know to enable it from userland and could only be enabled by
either setting it to 1 at compile time or through the kernel debugger.

In the future it may be brought back as KTR tracing points.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-19 17:04:52 +00:00
Andre Oppermann
ef39adf007 Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
  ip_dooptions(struct mbuf *m, int pass)
  save_rte(m, option, dst)
  ip_srcroute(m0)
  ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
  ip_insertoptions(m, opt, phlen)
  ip_optcopy(ip, jp)
  ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 20:12:40 +00:00
Andre Oppermann
780b2f698c In ip_forward() copy as much into the temporary error mbuf as we
have free space in it.  Allocate correct mbuf from the beginning.
This allows icmp_error() to quote the entire TCP header in error
messages.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 14:44:48 +00:00
Ruslan Ermilov
4a0d6638b3 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
Andre Oppermann
e0aec68255 Use the correct mbuf type for MGET(). 2005-08-30 16:35:27 +00:00
Robert Watson
dd5a318ba3 Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
Robert Watson
b77634d046 Remove spl() calls from ip_slowtimo(), as IP fragment queue locking was
merged several years ago.

Submitted by:	gnn
MFC after:	1 day
2005-07-19 12:14:22 +00:00
Andre Oppermann
c773494edd Pass icmp_error() the MTU argument directly instead of
an interface pointer.  This simplifies a couple of uses
and removes some XXX workarounds.
2005-05-04 13:09:19 +00:00