freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	ae01d73c04	Add ipfw support for setting/matching DiffServ codepoints (DSCP). Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg. Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words). Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order): Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin PR: kern/102471, kern/121122 MFC after: 2 weeks	2013-03-20 10:35:33 +00:00
Alexander V. Chernikov	bdf942c3f0	Revert r234834 per luigi@ request. Cleaner solution (e.g. adding another header) should be done here. Original log: Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Requested by: luigi Approved by: kib(mentor)	2012-05-03 08:56:43 +00:00
Alexander V. Chernikov	7bd5e9b143	Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Approved by: ae(mentor) MFC after: 2 weeks	2012-04-30 10:22:23 +00:00
Alexander V. Chernikov	732d27b32d	- Permit number of ipfw tables to be changed in runtime. net.inet.ip.fw.tables_max is now read-write. - Bump IPFW_TABLES_MAX to 65535 Default number of tables is still 128 - Remove IPFW_TABLES_MAX from ipfw(8) code. Sponsored by Yandex LLC Approved by: kib(mentor) MFC after: 2 weeks	2012-03-25 20:37:59 +00:00
Alexander V. Chernikov	f8bee51a69	- Add ipfw eXtended tables permitting radix to be used for any kind of keys. - Add support for IPv6 and interface extended tables - Make number of tables to be loader tunable in range 0..65534. - Use IP_FW3 opcode for all new extended table cmds No ABI changes are introduced. Old userland will see valid tables for IPv4 tables and no entries otherwise. Flush works for any table. IP_FW3 socket option is used to encapsulate all new opcodes: /* IP_FW3 header/opcodes / typedef struct _ip_fw3_opheader { uint16_t opcode; / Operation opcode / uint16_t reserved[3]; / Align to 64-bit boundary */ } ip_fw3_opheader; New opcodes added: IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST ipfw(8) table argument parsing behavior is changed: 'ipfw table 999 add host' now assumes 'host' to be interface name instead of hostname. New tunable: net.inet.ip.fw.tables_max controls number of table supported by ipfw in given VNET instance. 128 is still the default value. New syntax: ipfw add skipto tablearg ip from any to any via table(42) in ipfw add skipto tablearg ip from any to any via table(4242) out This is a bit hackish, special interface name '\1' is used to signal interface table number is passed in p.glob field. Sponsored by Yandex LLC Reviewed by: ae Approved by: ae (mentor) MFC after: 4 weeks	2012-03-12 14:07:57 +00:00
Bjoern A. Zeeb	8a006adb24	Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regession test uitilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)	2011-08-20 17:05:11 +00:00
Andrey V. Elsukov	9527ec6e52	Add new rule actions "call" and "return" to ipfw. They make possible to organize subroutines with rules. The "call" action saves the current rule number in the internal stack and rules processing continues from the first rule with specified number (similar to skipto action). If later a rule with "return" action is encountered, the processing returns to the first rule with number of "call" rule saved in the stack plus one or higher. Submitted by: Vadim Goncharov Discussed by: ipfw@, luigi@	2011-06-29 10:06:58 +00:00
Gleb Smirnoff	9d0a2ddf69	- Rewrite functions that copyin/out NAT configuration, so that they calculate required memory size dynamically. - Fix races on chain re-lock. - Introduce new field to ip_fw_chain - generation count. Now utilized only in the NAT configuration, but can be utilized wider in ipfw. - Get rid of NAT_BUF_LEN in ip_fw.h PR: kern/143653	2011-04-19 15:06:33 +00:00
Luigi Rizzo	ae99fd0e07	The first customer of the SO_USER_COOKIE option: the "sockarg" ipfw option matches packets associated to a local socket and with a non-zero so_user_cookie value. The value is made available as tablearg, so it can be used as a skipto target or pipe number in ipfw/dummynet rules. Code by Paul Joe, manpage by me. Submitted by: Paul Joe MFC after: 1 week	2010-11-12 13:05:17 +00:00
Luigi Rizzo	f9f7bde3bc	+ implement (two lines) the kernel side of 'lookup dscp N' to use the dscp as a search key in table lookups; + (re)implement a sysctl variable to control the expire frequency of pipes and queues when they become empty; + add 'queue number' as optional part of the flow_id. This can be enabled with the command queue X config mask queue ... and makes it possible to support priority-based schedulers, where packets should be grouped according to the priority and not some fields in the 5-tuple. This is implemented as follows: - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but without changing the size or shape of the structure, so there are no ABI changes. On passing, also document how other fields are used, and remove some useless assignments in ip_fw2.c - implement small changes in the userland code to set/read the field; - revise the functions in ip_dummynet.c to manipulate masks so they also handle the additional field; There are no ABI changes in this commit.	2010-03-15 17:14:27 +00:00
Luigi Rizzo	cc4d3c30ea	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
Luigi Rizzo	de240d1013	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id> All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t Operation costs now are as follows: Function Old Now Planned ------------------------------------------------------------------- + skipto X, non cached O(N) O(log N) + skipto X, cached O(1) O(1) XXX dynamic rule lookup O(1) O(log N) O(1) + skipto tablearg O(N) O(1) + reinject, non cached O(N) O(log N) + reinject, cached O(1) O(1) + kernel blocked during setsockopt() O(N) O(1) ------------------------------------------------------------------- The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
Luigi Rizzo	70228fb346	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files now references the two new source files netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h. netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions netinet/ipfw/ip_fw_log.c ipfw_log and related functions netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
Luigi Rizzo	9565806f16	change the type of the opcode from enum *:8 to u_int8_t so the size and alignment of the ipfw_insn is not compiler dependent. No changes in the code generated by gcc. There was only one instance of this kind in our entire source tree, so i suspect the old definition was a poor choice (which i made). MFC after: 3 days	2009-12-02 08:52:06 +00:00
Julian Elischer	c4b21cbe4a	Fix ipfw's initialization functions to get the correct order of evaluation to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers instead of the modevent handler. Correct some spelling errors in comments in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when ipfw is unloaded. This patch is a minimal patch for 8.0 I have a much larger patch that actually fixes the underlying problems that will be applied after 8.0 Reviewed by: zec@, rwatson@, bz@(earlier version) Approved by: re (rwatson) MFC after: Immediatly	2009-08-21 11:20:10 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Oleg Bulyzhin	dda10d624c	Close long existed race with net.inet.ip.fw.one_pass = 0: If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc) it carries pointer to matching ipfw rule. If this packet then reinjected back to ipfw, ruleset processing starts from that rule. If rule was deleted meanwhile, due to existed race condition panic was possible (as well as other odd effects like parsing rules in 'reap list'). P.S. this commit changes ABI so userland ipfw related binaries should be recompiled. MFC after: 1 month Tested by: Mikolaj Golub	2009-06-09 21:27:11 +00:00
Luigi Rizzo	b87ce5545b	Several ipfw options and actions use a 16-bit argument to indicate pipes, queues, tags, rule numbers and so on. These are all different namespaces, and the only thing they have in common is the fact they use a 16-bit slot to represent the argument. There is some confusion in the code, mostly for historical reasons, on how the values 0 and 65535 should be used. At the moment, 0 is forbidden almost everywhere, while 65535 is used to represent a 'tablearg' argument, i.e. the result of the most recent table() lookup. For now, try to use explicit constants for the min and max allowed values, and do not overload the default rule number for that. Also, make the MTAG_IPFW declaration only visible to the kernel. NOTE: I think the issue needs to be revisited before 8.0 is out: the 2^16 namespace limit for rule numbers and pipe/queue is annoying, and we can easily bump the limit to 2^32 which gives a lot more flexibility in partitioning the namespace. MFC after: 5 days	2009-06-05 16:16:07 +00:00
Luigi Rizzo	115a40c7bf	More cleanup in preparation of ipfw relocation (no actual code change): + move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c; + remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly; + remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly; + move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it; To be merged together with rev 193497 MFC after: 5 days	2009-06-05 13:44:30 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform accross vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Marko Zec	f6dfe47a14	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
Paolo Pisati	eb2e411915	Implement an ipfw action to reassemble ip packets: reass.	2009-04-01 20:23:47 +00:00
Luigi Rizzo	0906f40fd8	fw_debug has been unused for ages, so remove it from the list of sysctl_variables. I would also remove it from the VNET record but I am unsure if there is any ABI issue -- so for the time being just mark it as unused in ip_fw.h, and then we will collect the garbage at some appropriate time in the future. MFC after: 3 days	2009-03-02 22:11:48 +00:00
Luigi Rizzo	35b78b7520	remove dependency on eventhandler.h, we only need a forward declaration	2009-02-16 15:08:41 +00:00
Bjoern A. Zeeb	1b193af610	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
Marko Zec	385195c062	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
Robert Watson	1f6ef666b5	Fix content and spelling of comment on _ipfw_insn.len -- a count of 32-bit words, not 32-byte words. MFC after: 3 days	2008-10-10 14:33:47 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Roman Kurakin	f7b5554eb7	Export IPFW_TABLES_MAX value for compiled in defaults.	2008-09-21 20:42:42 +00:00
Roman Kurakin	eb29d14ccb	Make the commet for the default rule number more clear. Submitted by: yar@	2008-09-14 06:14:06 +00:00
Roman Kurakin	8191aa7c0b	Export the IPFW_DEFAULT_RULE outside ip_fw2.c. This number in not only the default rule number but also the maximum rule number. User space software such as ipfw and natd should be aware of its value. The software that already includes ip_fw.h should use the defined value. All other a expected to use sysctl (as discussed on net@). MFC after: 5 days. Discussed on: net@	2008-09-06 16:47:07 +00:00
Julian Elischer	8b07e49a00	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
Robert Watson	bcf5b9fa38	Fix a comment typo. MFC after: 3 days	2008-04-29 21:21:15 +00:00
Paolo Pisati	531c890b8a	Move ipfw's nat code into its own kld: ipfw_nat.	2008-02-29 22:27:19 +00:00
Robert Watson	bb5081a7eb	Hide ipfw internal data structures behind IPFW_INTERNAL rather than exposing them to all consumers of ip_fw.h. These structures are used in both ipfw(8) and ipfw(4), but not part of the user<->kernel interface for other applications to use, rather, shared implementation. MFC after: 3 days Reported by: Paul Vixie <paul at vix dot com>	2008-01-25 14:38:27 +00:00
Bjoern A. Zeeb	7a92401aea	Add support for filtering on Routing Header Type 0 and Mobile IPv6 Routing Header Type 2 in addition to filter on the non-differentiated presence of any Routing Header. MFC after: 3 weeks	2007-05-04 11:15:41 +00:00
Paolo Pisati	ff2f6fe80f	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
Julian Elischer	afad78e259	comply with style police Submitted by: ru MFC after: 1 month	2006-08-18 22:36:05 +00:00
Julian Elischer	c487be961a	Allow ipfw to forward to a destination that is specified by a table. for example: fwd tablearg ip from any to table(1) where table 1 has entries of the form: 1.1.1.0/24 10.2.3.4 208.23.2.0/24 router2 This allows trivial implementation of a secondary routing table implemented in the firewall layer. I expect more work (under discussion with Glebius) to follow this to clean up some of the messy parts of ipfw related to tables. Reviewed by: Glebius MFC after: 1 month	2006-08-17 22:49:50 +00:00
Oleg Bulyzhin	6a7d5cb645	Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9). Since tags are kept while packet resides in kernelspace, it's possible to use other kernel facilities (like netgraph nodes) for altering those tags. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru> Approved by: glebius (mentor) Idea from: OpenBSD PF MFC after: 1 month	2006-05-24 13:09:55 +00:00
Max Laier	e93187482d	Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing seperately. Also use pfil hook/unhook instead of keeping the check functions in pfil just to return there based on the sysctl. While here fix some whitespace on a nearby SYSCTL_ macro.	2006-05-12 04:41:27 +00:00
Ruslan Ermilov	ea9dce1461	When sending a packet from dummynet, indicate that we're forwarding it so that ip_id etc. don't get overwritten. This fixes forwarding of fragmented IP packets through a dummynet pipe -- fragments came out with modified and different(!) ip_id's, making it impossible to reassemble a datagram at the receiver side. Submitted by: Alexander Karptsov (reworked by me) MFC after: 3 days	2006-02-14 06:36:39 +00:00
Gleb Smirnoff	40b1ae9e00	Add a new feature for optimizining ipfw rulesets - substitution of the action argument with the value obtained from table lookup. The feature is now applicable only to "pipe", "queue", "divert", "tee", "netgraph" and "ngtee" rules. An example usage: ipfw pipe 1000 config bw 1000Kbyte/s ipfw pipe 4000 config bw 4000Kbyte/s ipfw table 1 add x.x.x.x 1000 ipfw table 1 add x.x.x.y 4000 ipfw pipe tablearg ip from table(1) to any In the example above the rule will throw different packets to different pipes. TODO: - Support "skipto" action, but without searching all rules. - Improve parser, so that it warns about bad rules. These are: - "tablearg" argument to action, but no "table" in the rule. All traffic will be blocked. - "tablearg" argument to action, but "table" searches for entry with a specific value. All traffic will be blocked. - "tablearg" argument to action, and two "table" looks - for src and for dst. The last lookup will match.	2005-12-13 12:16:03 +00:00
Gleb Smirnoff	b090e4ce1f	Garbage-collect now unused struct _ipfw_insn_pipe and flush_pipe_ptrs(), thus removing a few XXXes. Document the ABI breakage in UPDATING.	2005-11-29 08:59:41 +00:00
Bjoern A. Zeeb	9066356ba1	* Add dynamic sysctl for net.inet6.ip6.fw. * Correct handling of IPv6 Extension Headers. * Add unreach6 code. * Add logging for IPv6. Submitted by: sysctl handling derived from patch from ume needed for ip6fw Obtained from: is_icmp6_query and send_reject6 derived from similar functions of netinet6,ip6fw Reviewed by: ume, gnn; silence on ipfw@ Test setup provided by: CK Software GmbH MFC after: 6 days	2005-08-13 11:02:34 +00:00
Max Laier	57cd6d263b	Add support for IPv4 only rules to IPFW2 now that it supports IPv6 as well. This is the last requirement before we can retire ip6fw. Reviewed by: dwhite, brooks(earlier version) Submitted by: dwhite (manpage) Silence from: -ipfw	2005-06-03 01:10:28 +00:00
Gleb Smirnoff	a1429ad928	IPFW version 2 is the only option in HEAD and RELENG_5. Thus, cleanup unnecessary now ifdefs.	2005-05-04 13:12:52 +00:00
Brooks Davis	8195404bed	Add IPv6 support to IPFW and Dummynet. Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)	2005-04-18 18:35:05 +00:00

1 2 3

147 Commits