* Use one u16 from op3 header to implement opcode versioning.
* IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current).
* Every getsockopt request is now handled in ip_fw_table.c
* Rename new opcodes:
IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY
IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE
IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST
IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO
IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH
* Add some docs about using given opcodes.
* Group some legacy opcode/handlers.
Kernel changes:
* Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set)
* Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set)
* Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables)
Userland changes:
* move tables code to separate tables.c file
* get rid of tables_max
* switch "all"/list handling to new opcodes
Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
new ip_fw_table_algo.c file.
Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+ char name[64];
+ int idx;
+ ta_init *init;
+ ta_destroy *destroy;
+ table_lookup_t *lookup;
+ ta_prepare_add *prepare_add;
+ ta_prepare_del *prepare_del;
+ ta_add *add;
+ ta_del *del;
+ ta_flush_entry *flush_entry;
+ ta_foreach *foreach;
+ ta_dump_entry *dump_entry;
+ ta_dump_xentry *dump_xentry;
+};
* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
->tablestate pointer (array of 32 bytes structures necessary for
runtime lookups (can be probably shrinked to 16 bytes later):
+struct table_info {
+ table_lookup_t *lookup; /* Lookup function */
+ void *state; /* Lookup radix/other structure */
+ void *xstate; /* eXtended state */
+ u_long data; /* Hints for given func */
+};
* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_clt 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().
* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
currenly implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).
Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)
Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embeded opcodes.
* Add/improve support for destroy/info cmds.
* Add namedobject set-aware api capable of searching/allocation objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode
Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize
Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced
Sponsored by: Yandex LLC
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().
Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.
PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks
re-links dynamic states to default rule instead of
flushing on rule deletion.
This can be useful while performing ruleset reload
(think about `atomic` reload via changing sets).
Currently it is turned off by default.
MFC after: 2 weeks
Sponsored by: Yandex LLC
Print warning for IPv4 address strings which are valid in
inet_aton() but not valid in inet_pton(). (1)
Found by: Özkan KIRIK <ozkan.kirik@gmail.com>
Submitted by: Ian Smith <smithi@nimnet.asn.au> (1)
MFC after: 2 weeks
Sponsored by: Yandex LLC
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.
Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).
Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):
Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin
PR: kern/102471, kern/121122
MFC after: 2 weeks
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.
Suggested by: andre
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.
Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks
for an uninitialized variable.
unused parameters and variables are annotated with
(void)foo; /* UNUSED */
instead of __unused, because this code needs to build
also on linux and windows.
- Add a note to the ipfw(8) man page about the rules no longer being
case sensitive.
- Fix some typos in the man page.
PR: docs/164772
Reviewed by: bz
Approved by: gabor (doc mentor, src committer)
MFC after: 2 weeks
net.inet.ip.fw.tables_max is now read-write.
- Bump IPFW_TABLES_MAX to 65535
Default number of tables is still 128
- Remove IPFW_TABLES_MAX from ipfw(8) code.
Sponsored by Yandex LLC
Approved by: kib(mentor)
MFC after: 2 weeks
- Add support for IPv6 and interface extended tables
- Make number of tables to be loader tunable in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds
No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.
IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;
New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST
ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add host' now assumes 'host' to be interface name instead of
hostname.
New tunable:
net.inet.ip.fw.tables_max controls number of table supported by ipfw in given
VNET instance. 128 is still the default value.
New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out
This is a bit hackish, special interface name '\1' is used to signal interface
table number is passed in p.glob field.
Sponsored by Yandex LLC
Reviewed by: ae
Approved by: ae (mentor)
MFC after: 4 weeks
The index() and rindex() functions were marked LEGACY in the 2001
revision of POSIX and were subsequently removed from the 2008 revision.
The strchr() and strrchr() functions are part of the C standard.
This makes the source code a lot more consistent, as most of these C
files also call into other str*() routines. In fact, about a dozen
already perform strchr() calls.
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.
Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)
possible to organize subroutines with rules.
The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.
Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@
applicable here, since modifies the string. Switch to strchr().
- Restore support for undocumented optional parameters of
redir_port and redir_proto, that were disabled in 220835.
- While here, change !isalpha() checks on optinal parameters
for isdigit().
Submitted by: Alexander V. Chernikov <melifaro ipfw.ru>
PR: kern/143653
"globalport" option for multiple NAT instances.
If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, packet aliased
according to that entry, otherwise packet is passed unchanged.
User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.
PR: kern/157867
Submitted by: Alexander V. Chernikov (previous version)
Tested by: Eugene Grosbein
the "sockarg" ipfw option matches packets associated to
a local socket and with a non-zero so_user_cookie value.
The value is made available as tablearg, so it can be used
as a skipto target or pipe number in ipfw/dummynet rules.
Code by Paul Joe, manpage by me.
Submitted by: Paul Joe
MFC after: 1 week
It's a bit more pedantic regarding .Bl list elements. This has an added
benefit of unbreaking the ipfw(8) manpage, where groff was silently
skipping one list element.
thus don't depend on one_pass flag anymore.
This is a POLA violation, but it is quite difficult to restore
the old behavior with new code. Also, the new behavior matches
behavior of the older "tee" action, and this is more intuitive.
ipfw add 100 allow ip from { 1.2.3.4 or 5.6.7.8 }
(note that the above example could be better written as
ipfw add 100 allow dst-ip 1.2.3.4,5.6.7.8
Submitted by: Riccardo Panicucci
dscp as a search key in table lookups;
+ (re)implement a sysctl variable to control the expire frequency of
pipes and queues when they become empty;
+ add 'queue number' as optional part of the flow_id. This can be
enabled with the command
queue X config mask queue ...
and makes it possible to support priority-based schedulers, where
packets should be grouped according to the priority and not some
fields in the 5-tuple.
This is implemented as follows:
- redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
without changing the size or shape of the structure, so there are
no ABI changes. On passing, also document how other fields are
used, and remove some useless assignments in ip_fw2.c
- implement small changes in the userland code to set/read the field;
- revise the functions in ip_dummynet.c to manipulate masks so they
also handle the additional field;
There are no ABI changes in this commit.
of ip->ip_tos) in a table. This can be useful to direct traffic to
different pipes/queues according to the DSCP of the packet, as follows:
ipfw add 100 queue tablearg lookup dscp 3 // table 3 maps dscp->queue
This change is a no-op (but harmless) until the two-line kernel
side is committed, which will happen shortly.
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.
The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.
In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.
Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.
Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.
Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.
CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N
which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.
Work supported by the Onelab project.
MFC after: 1 week
it seems that now it is necessary for 'forward' to work outside lo0.
The bug (and fix) was reported on 8.0. This patch probably applies
to RELENG_7 as well.
It seems that 'pf' has a similar bug.
Submitted by: Lytochkin Boris
MFC after: 3 days
"profile" files (bandwidth is mandatory when using a
profile, so it makes sense to have everything in one place).
Update the manpage accordingly.
Submitted by: Marta Carbone
pipes, queues, tags, rule numbers and so on.
These are all different namespaces, and the only thing they have in
common is the fact they use a 16-bit slot to represent the argument.
There is some confusion in the code, mostly for historical reasons,
on how the values 0 and 65535 should be used. At the moment, 0 is
forbidden almost everywhere, while 65535 is used to represent a
'tablearg' argument, i.e. the result of the most recent table() lookup.
For now, try to use explicit constants for the min and max allowed
values, and do not overload the default rule number for that.
Also, make the MTAG_IPFW declaration only visible to the kernel.
NOTE: I think the issue needs to be revisited before 8.0 is out:
the 2^16 namespace limit for rule numbers and pipe/queue is
annoying, and we can easily bump the limit to 2^32 which gives
a lot more flexibility in partitioning the namespace.
MFC after: 5 days
types of MAC overheads such as preambles, link level retransmissions
and more.
Note- this commit changes the userland/kernel ABI for pipes
(but not for ordinary firewall rules) so you need to rebuild
kernel and /sbin/ipfw to use dummynet features.
Please check the manpage for details on the new feature.
The MFC would be trivial but it breaks the ABI, so it will
be postponed until after 7.2 is released.
Interested users are welcome to apply the patch manually
to their RELENG_7 tree.
Work supported by the European Commission, Projects Onelab and
Onelab2 (contract 224263).
above to avoid referencing undefined terms (humans are not compilers
but still care about these things).
Change some .Sh to .Ss to better reflect the structure of the text.
No new content.
Usual moving of code with no changes from ipfw2.c to the
newly created files, and addition of prototypes to ipfw2.h
I have added forward declarations for ipfw_insn_* in ipfw2.h
to avoid a global dependency on ip_fw.h
In this episode:
- introduce a common header with a minimal set of common definitions;
- bring the main() function and options parser in main.c
- rename the main functions with an ipfw_ prefix
No code changes except for the introduction of a global variable,
resvd_set_number, which stores the RESVD_SET value from ip_fw.h
and is used to remove the dependency of main.c from ip_fw.h
(and the subtree of dependencies) for just a single constant.
program name, and ignore that entry. ipfw2.c code instead skips
this entry and starts with options at offset 0, relying on a more
tolerant implementation of the library.
This change fixes the issue by always passing a program name
in the first entry to getopt. The motivation for this change
is to remove a potential compatibility issue should we use
a different getopt() implementation in the future.
No functional changes.
Submitted by: Marta Carbone (parts)
MFC after: 4 weeks
There are still several signed/unsigned warnings left, which
require a bit more study for a proper fix.
This file has grown beyond reasonable limits.
We really need to split it into separate components (ipv4, ipv6,
dummynet, nat, table, userland-kernel communication ...) so we can
make mainteinance easier.
MFC after: 1 weeks
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from anay to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)
Note these sysctls in the man page and warn against increasing them
without thinking first.
MFC after: 3 weeks
table 'values' as IP addresses, use an explicit argument (-i).
This is a 'POLA' issue. This is a low risk change and should be MFC'd
to RELENG_6 and RELENG 7. it might be put as an errata item for 6.3.
(not sure about 6.2).
Fix suggested by: Eugene Grosbein
PR: 120720
MFC After: 3 days
exposing them to all consumers of ip_fw.h. These structures are
used in both ipfw(8) and ipfw(4), but not part of the user<->kernel
interface for other applications to use, rather, shared
implementation.
MFC after: 3 days
Reported by: Paul Vixie <paul at vix dot com>
- refer to the dummynet(4) man page only once, later use rather
the .Nm macro.
- use .Va macro when refering to the sysctl variables
- grammar and markup fixes
Reviewed by: keramida, trhodes, ru (roughly)
MFC-after: 1 week
If it is set to zero value (default) dummynet module will try to emulate
real link as close as possible (bandwidth & latency): packet will not leave
pipe faster than it should be on real link with given bandwidth.
(This is original behaviour of dummynet which was altered in previous commit)
If it is set to non-zero value only bandwidth is enforced: packet's latency
can be lower comparing to real link with given bandwidth.
- Document recently introduced dummynet(4) sysctl variables.
Requested by: luigi, julian
MFC after: 3 month
$ ipfw -n add 1 allow layer2 not mac-type ip
00001 allow ip from any to any layer2 not not mac-type 0x0800
PR: bin/115372
Submitted by: Andrey V. Elsukov
Approved by: re (hrs)
MFC after: 3 weeks
pack a set number correctly.
Submitted by: oleg
o Plug a memory leak.
Submitted by: oleg and Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 1 week
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
- to show a specific set: ipfw set 3 show
- to delete rules from the set: ipfw set 9 delete 100 200 300
- to flush the set: ipfw set 4 flush
- to reset rules counters in the set: ipfw set 1 zero
PR: kern/113388
Submitted by: Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 6 weeks
Before:
$ ipfw -n add 100 count icmp from any to any mac-type 0x01
00100 count icmp 0x0001
$ ipfw -n add 100 count icmp from any to any mac any any
00100 count icmp MAC any any any
After:
$ ipfw -n add 100 count icmp from any to any mac-type 0x01
00100 count icmp from any to any mac-type 0x0001
$ ipfw -n add 100 count icmp from any to any mac any any
00100 count icmp from any to any MAC any any
PR: bin/112244
Submitted by: Andrey V. Elsukov
MFC after: 1 month
With the second (and last) part of my previous Summer of Code work, we get:
-ipfw's in kernel nat
-redirect_* and LSNAT support
General information about nat syntax and some examples are available
in the ipfw (8) man page. The redirect and LSNAT syntax are identical
to natd, so please refer to natd (8) man page.
To enable in kernel nat in rc.conf, two options were added:
o firewall_nat_enable: equivalent to natd_enable
o firewall_nat_interface: equivalent to natd_interface
Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet
to continue being checked by the firewall ruleset after being
(de)aliased.
NOTA BENE: due to some problems with libalias architecture, in kernel
nat won't work with TSO enabled nic, thus you have to disable TSO via
ifconfig (ifconfig foo0 -tso).
Approved by: glebius (mentor)
address, to avoid confusing the users that a full address is
always required.
Submitted by: Josh Paetzel <josh@tcbug.org> (through freebsd-doc)
MFC after: 3 days
otherwise this command
ipfw add allow ipv6-icmp from any to 2002::1 icmp6types 1,2,128,129
turns into icmp6types 1,2,32,33,34,...94,95,128,129
PR: 102422 (part 1)
Submitted by: Andrey V. Elsukov <bu7cher at yandex.ru>
MFC after: 5 days
having trouble with the "me6" keyword. Also, we were using inet_pton on
the wrong variable in one place.
Reviewed by: mlaier (previous version of patch)
Obtained from: Sascha Blank (inet_pton change)
MFC after: 1 week
for example:
fwd tablearg ip from any to table(1)
where table 1 has entries of the form:
1.1.1.0/24 10.2.3.4
208.23.2.0/24 router2
This allows trivial implementation of a secondary routing table implemented
in the firewall layer.
I expect more work (under discussion with Glebius) to follow this to clean
up some of the messy parts of ipfw related to tables.
Reviewed by: Glebius
MFC after: 1 month
- 'tag' & 'untag' action parameters.
- 'tagged' & 'limit' rule options.
Rule examples:
pipe 1 tag tablearg ip from table(1) to any
allow ip from any to table(2) tagged tablearg
allow tcp from table(3) to any 25 setup limit src-addr tablearg
sbin/ipfw/ipfw2.c:
1) new macros
GET_UINT_ARG - support of 'tablearg' keyword, argument range checking.
PRINT_UINT_ARG - support of 'tablearg' keyword.
2) strtoport(): do not silently truncate/accept invalid port list expressions
like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup.
Approved by: glebius (mentor)
MFC after: 1 month
Since tags are kept while packet resides in kernelspace, it's possible to
use other kernel facilities (like netgraph nodes) for altering those tags.
Submitted by: Andrey Elsukov <bu7cher at yandex dot ru>
Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru>
Approved by: glebius (mentor)
Idea from: OpenBSD PF
MFC after: 1 month
doesn't exist or add one that is already present, if the -q flag
is set. Useful for "ipfw -q /dev/stdin" when the command above is
invoked from something like python or TCL to feed commands
down the throat of ipfw.
MFC in: 1 week
action argument with the value obtained from table lookup. The feature
is now applicable only to "pipe", "queue", "divert", "tee", "netgraph"
and "ngtee" rules.
An example usage:
ipfw pipe 1000 config bw 1000Kbyte/s
ipfw pipe 4000 config bw 4000Kbyte/s
ipfw table 1 add x.x.x.x 1000
ipfw table 1 add x.x.x.y 4000
ipfw pipe tablearg ip from table(1) to any
In the example above the rule will throw different packets to different pipes.
TODO:
- Support "skipto" action, but without searching all rules.
- Improve parser, so that it warns about bad rules. These are:
- "tablearg" argument to action, but no "table" in the rule. All
traffic will be blocked.
- "tablearg" argument to action, but "table" searches for entry with
a specific value. All traffic will be blocked.
- "tablearg" argument to action, and two "table" looks - for src and
for dst. The last lookup will match.
IPv6 support was committed:
- Stop treating `ip' and `ipv6' as special in `proto' option as they
conflict with /etc/protocols.
- Disuse `ipv4' in `proto' option as it is corresponding to `ipv6'.
- When protocol is specified as numeric, treat it as it is even it is
41 (ipv6).
- Allow zero for protocol as it is valid number of `ip'.
Still, we cannot specify an IPv6 over an IPv4 tunnel like before such
as:
pass ipv6 from any to any
But, now, you can specify it like:
pass ip4 from any to any proto ipv6
PR: kern/89472
Reported by: Ga l Roualland <gael.roualland__at__dial.oleane.com>
MFC after: 1 week
that debug.mpsafenet be set to 0. It is still possible for dead locks to
occur while these filtering options are used due to the layering violation
inherent in their implementation.
Discussed: -current, rwatson, glebius
* Correct handling of IPv6 Extension Headers.
* Add unreach6 code.
* Add logging for IPv6.
Submitted by: sysctl handling derived from patch from ume needed for ip6fw
Obtained from: is_icmp6_query and send_reject6 derived from similar
functions of netinet6,ip6fw
Reviewed by: ume, gnn; silence on ipfw@
Test setup provided by: CK Software GmbH
MFC after: 6 days
policy. It may be used to provide more detailed classification of
traffic without actually having to decide its fate at the time of
classification.
MFC after: 1 week
This is the last requirement before we can retire ip6fw.
Reviewed by: dwhite, brooks(earlier version)
Submitted by: dwhite (manpage)
Silence from: -ipfw
with the kernel compile time option:
options IPFIREWALL_FORWARD_EXTENDED
This option has to be specified in addition to IPFIRWALL_FORWARD.
With this option even packets targeted for an IP address local
to the host can be redirected. All restrictions to ensure proper
behaviour for locally generated packets are turned off. Firewall
rules have to be carefully crafted to make sure that things like
PMTU discovery do not break.
Document the two kernel options.
PR: kern/71910
PR: kern/73129
MFC after: 1 week