Commit Graph

16 Commits

Author SHA1 Message Date
Luigi Rizzo
cc4d3c30ea Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
Luigi Rizzo
59a613b14d check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets which i broke when i
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO
2010-01-07 12:00:54 +00:00
Luigi Rizzo
62081e0f8d some header shuffling to help decoupling ip_divert from ipfw 2010-01-07 10:08:05 +00:00
Luigi Rizzo
c95477dfa1 this file does not require ip_dummynet.h 2010-01-05 11:00:31 +00:00
Luigi Rizzo
7173b6e554 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
Luigi Rizzo
e59084e086 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
Luigi Rizzo
830c6e2b97 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
Luigi Rizzo
de240d1013 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

 1. introduce a IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do a O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
Luigi Rizzo
97219abf05 Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
  static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
  (the same should be done for variables);
- try to pack variable declaration in an uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
  slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after:	1 month
2009-12-16 10:48:40 +00:00
Luigi Rizzo
70228fb346 Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
	now references the two new source files

netinet/ip_fw.h
	remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
	new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
	ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
	code related to dynamic rules

netinet/ipfw/ip_fw2.c
	removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
	minor rearrangement to remove LOOKUP_NAT from the
	main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after:	1 month
2009-12-15 16:15:14 +00:00
Julian Elischer
0b4b0b0fee Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
Julian Elischer
c4b21cbe4a Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.

This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0

Reviewed by:	zec@, rwatson@, bz@(earlier version)
Approved by:	re (rwatson)
MFC after:	Immediatly
2009-08-21 11:20:10 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Oleg Bulyzhin
dda10d624c Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after:	1 month
Tested by:	Mikolaj Golub
2009-06-09 21:27:11 +00:00
Luigi Rizzo
908e960ea6 move kernel ipfw-related sources to a separate directory,
adjust conf/files and modules' Makefiles accordingly.

No code or ABI changes so this and most of previous related
changes can be easily MFC'ed

MFC after:	5 days
2009-06-05 19:22:47 +00:00