Commit Graph

79 Commits

Author SHA1 Message Date
Luigi Rizzo
5007b59f26 implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transfered.
2010-03-11 22:42:33 +00:00
Luigi Rizzo
642dddf0f8 fix handling of commands issued by RELENG_7 version of /sbin/ipfw,
Submitted by:	Riccardo Panicucci
2010-03-10 14:21:05 +00:00
Luigi Rizzo
feadd2b1ca cosmetic changes and C++ compatibility 2010-03-08 11:27:39 +00:00
Luigi Rizzo
d12cc63303 don't use C++ keywords as variable names 2010-03-08 11:27:08 +00:00
Luigi Rizzo
b854138d5f do not report an error unnecessarily 2010-03-08 11:22:47 +00:00
Bjoern A. Zeeb
e253cdd07c Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00
Luigi Rizzo
67d079f342 plug a memory leak on pipe's reconfiguration 2010-03-05 17:53:28 +00:00
Luigi Rizzo
6a82d14731 fix a memory leak when deleting RED queues 2010-03-05 12:58:19 +00:00
Luigi Rizzo
b05934e2cb portability fixes 2010-03-04 21:52:40 +00:00
Luigi Rizzo
ae8b199313 don't use keywords as variable names. 2010-03-04 21:01:59 +00:00
Luigi Rizzo
44e510399b use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by:	Francesco Magno
2010-03-04 16:53:38 +00:00
Luigi Rizzo
6aada3117b improve compatibility with RELENG_7.2 2010-03-04 16:52:26 +00:00
Luigi Rizzo
cc4d3c30ea Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
Luigi Rizzo
27c9c97a3e remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone
2010-02-17 13:06:06 +00:00
Hajimu UMEMOTO
416458131a Change 'me' to match any IPv6 address configured on an interface in
the system as well as any IPv4 address.

Reviewed by:	David Horn <dhorn2000__at__gmail.com>, luigi, qingli
MFC after:	2 weeks
2010-01-17 08:39:48 +00:00
Luigi Rizzo
5afa29b41a we don't use dummynet_drain! 2010-01-07 13:53:47 +00:00
Luigi Rizzo
59a613b14d check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets which i broke when i
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO
2010-01-07 12:00:54 +00:00
Luigi Rizzo
b2019e1789 Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so i don't feel like
doing a gratuitous renaming, at least for the time being.
2010-01-07 10:39:15 +00:00
Luigi Rizzo
62081e0f8d some header shuffling to help decoupling ip_divert from ipfw 2010-01-07 10:08:05 +00:00
Luigi Rizzo
eb6842e2a9 put ip_len in correct order for ip_output().
This prevents a panic when ipfw generates packets on its own
(such as reject or keepalives for dynamic rules).

Reported by: Chagin Dmitry
2010-01-07 09:28:17 +00:00
Luigi Rizzo
c95477dfa1 this file does not require ip_dummynet.h 2010-01-05 11:00:31 +00:00
Luigi Rizzo
7173b6e554 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
Luigi Rizzo
bcd3b68dd2 we really need htonl() here, see the comment a few lines above in the code. 2009-12-29 00:02:57 +00:00
Luigi Rizzo
e59084e086 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
Luigi Rizzo
830c6e2b97 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
Luigi Rizzo
6cc7b9f5d9 readability fixes -- add braces on large blocks, remove unnecessary
initializations
2009-12-28 10:19:53 +00:00
Luigi Rizzo
6730dcaec7 explain details of operation of table lookups, and improve portability 2009-12-28 10:12:35 +00:00
Luigi Rizzo
2082ecd966 diverted packet must re-enter _after_ the matching rule,
or we create loops.
The divert cookie (that can be set from userland too)
contains the matching rule nr, so we must start from nr+1.

Reported by: Joe Marcus Clarke
2009-12-27 10:19:10 +00:00
Luigi Rizzo
4a3c1bd27f fix poor indentation resulting from a merge 2009-12-24 17:35:28 +00:00
Luigi Rizzo
84918f5bc8 mostly style changes, such as removal of trailing whitespace,
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.

As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.
2009-12-23 18:53:11 +00:00
Luigi Rizzo
3ae19c3ba3 fix build with the new fast lookup structure.
Also remove some unnecessary headers
2009-12-23 12:15:21 +00:00
Luigi Rizzo
6aab896346 fix build on 64-bit architectures.
Also fix the indentation on a few lines.
2009-12-23 12:00:50 +00:00
Luigi Rizzo
de240d1013 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

 1. introduce a IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do a O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
Luigi Rizzo
46fdc2bf60 some mostly cosmetic changes in preparation for upcoming work:
+ in many places, replace &V_layer3_chain with a local
  variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
  replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures
2009-12-22 13:53:34 +00:00
Ruslan Ermilov
bec5f27f73 Added proper attribution.
Requested by:	luigi
2009-12-18 17:22:21 +00:00
Luigi Rizzo
1328a38b96 Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.

	ipfw add 5000 count log ....

and run
	tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.
2009-12-17 23:11:16 +00:00
Luigi Rizzo
60ab046a41 simplify and document lookup_next_rule() 2009-12-17 17:27:12 +00:00
Luigi Rizzo
59cd9f65f9 simplify the code that finds the next rule after reinjections
MFC after:	1 week
2009-12-17 12:27:54 +00:00
Luigi Rizzo
53638988bc remove a duplicate sysctl entry 2009-12-16 18:03:35 +00:00
Luigi Rizzo
1b5691c61e bring back a couple of #include that are supplied by nesting,
and explain why they are used.
2009-12-16 13:00:37 +00:00
Luigi Rizzo
97219abf05 Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
  static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
  (the same should be done for variables);
- try to pack variable declaration in an uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
  slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after:	1 month
2009-12-16 10:48:40 +00:00
Warner Losh
26bbc1fc5a Quick fix to make this compile:
Remove redundant extern declearations.
If the maintainer has a better fix, then feel free to back this out.
2009-12-16 03:26:37 +00:00
Luigi Rizzo
22f123afad more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).

Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.

MFC after:	1 month
2009-12-15 21:24:12 +00:00
Luigi Rizzo
70228fb346 Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
	now references the two new source files

netinet/ip_fw.h
	remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
	new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
	ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
	code related to dynamic rules

netinet/ipfw/ip_fw2.c
	removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
	minor rearrangement to remove LOOKUP_NAT from the
	main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after:	1 month
2009-12-15 16:15:14 +00:00
Luigi Rizzo
472099c4b0 implement a new match option,
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N

which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.

Work supported by the Onelab project.

MFC after:	1 week
2009-12-15 09:46:27 +00:00
Luigi Rizzo
b2089673e5 use div64 when converting back the burst value for userland 2009-12-10 18:37:14 +00:00
Luigi Rizzo
89717f91ef when draining a flowset free the entire chain, not just one packet. 2009-12-10 18:34:07 +00:00
Luigi Rizzo
478cae8a97 centralize the code to free a packet (or a chain) while in dummynet.
Remove an old macro and its stale comment.
2009-12-10 15:17:34 +00:00
Oleg Bulyzhin
22746035ec Fix burst processing for WF2Q pipes - do not increase available burst size
unless pipe is idle. This should fix follwing issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.

MFC after:	1 week
2009-12-05 23:27:21 +00:00
Luigi Rizzo
f573a0a634 adjust comment in previous commit after Julian's explanation 2009-12-05 11:51:32 +00:00