o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.
Obtained from: Yandex LLC
MFC after: 42 days
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685
IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Main user-visible changes are related to tables:
* Tables are now identified by names, not numbers.
There can be up to 65k tables with up to 63-byte long names.
* Tables are now set-aware (default off), so you can switch/move
them atomically with rules.
* More functionality is supported (swap, lock, limits, user-level lookup,
batched add/del) by generic table code.
* New table types are added (flow) so you can match multiple packet fields at once.
* Ability to add different type of lookup algorithms for particular
table type has been added.
* New table algorithms are added (cidr:hash, iface:array, number:array and
flow:hash) to make certain types of lookup more effective.
* Table value are now capable of holding multiple data fields for
different tablearg users
Performance changes:
* Main ipfw lock was converted to rmlock
* Rule counters were separated from rule itself and made per-cpu.
* Radix table entries fits into 128 bytes
* struct ip_fw is now more compact so more rules will fit into 64 bytes
* interface tables uses array of existing ifindexes for faster match
ABI changes:
All functionality supported by old ipfw(8) remains functional.
Old & new binaries can work together with the following restrictions:
* Tables named other than ^\d+$ are shown as table(65535) in
ruleset in old binaries
Internal changes:.
Changing table ids to numbers resulted in format modification for
most sockopt codes. Old sopt format was compact, but very hard to
extend (no versioning, inability to add more opcodes), so
* All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes.
* The remaining opcodes were also converted to be able to eliminate
all older opcodes at once
* All IP_FW3 handlers uses special API instead of calling sooptcopy*
directly to ease adding another communication methods
* struct ip_fw is now different for kernel and userland
* tablearg value has been changed to 0 to ease future extensions
* table "values" are now indexes in special value array which
holds extended data for given index
* Batched add/delete has been added to tables code
* Most changes has been done to permit batched rule addition.
* interface tracking API has been added (started on demand)
to permit effective interface tables operations
* O(1) skipto cache, currently turned off by default at
compile-time (eats 512K).
* Several steps has been made towards making libipfw:
* most of new functions were separated into "parse/prepare/show
and actuall-do-stuff" pieces (already merged).
* there are separate functions for parsing text string into "struct ip_fw"
and printing "struct ip_fw" to supplied buffer (already merged).
* Probably some more less significant/forgotten features
MFC after: 1 month
Sponsored by: Yandex LLC
This is the last major change in given branch.
Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
table items. Currently table addition may required multiple UH drops/
acquires which is quite tricky due to atomic table modificatio/swap
support, shared array resize, etc. Deal with it by calling special
notifier capable of rolling back state before actually performing
swap/resize operations. Original operation then restarts itself after
acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.
Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags dixes to support u32 values.
* valtype is now bitmask of
<skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
New values can hold distinct values for each of this types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..
Some examples:
3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
kindex: 2, type: addr
references: 0, valtype: skipto,limit,ipv4,ipv6
algorithm: addr:radix
items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1
opt_inet6.h into kmod.mk by forcing almost everybody to eat the same
dogfood. While at it, consolidate the opt_bpf.h and opt_mroute.h
targets here too.
* Rewrite interface tables to use interface indexes
Kernel changes:
* Add generic interface tracking API:
- ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates
state & bumps ref)
- ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links comsumer & runs its callback to
update ifindex)
- ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer)
- ipfw_iface_unref(unlocked, drops reference)
Additionally, consumer callbacks are called in interface withdrawal/departure.
* Rewrite interface tables to use iface tracking API. Currently tables are
implemented the following way:
runtime data is stored as sorted array of {ifidx, val} for existing interfaces
full data is stored inside namedobj instance (chained hashed table).
* Add IP_FW_XIFLIST opcode to dump status of tracked interfaces
* Pass @chain ptr to most non-locked algorithm callbacks:
(prepare_add, prepare_del, flush_entry ..). This may be needed for better
interaction of given algorithm an other ipfw subsystems
* Add optional "change_ti" algorithm handler to permit updating of
cached table_info pointer (happens in case of table_max resize)
* Fix small bug in ipfw_list_tables()
* Add badd (insert into sorted array) and bdel (remove from sorted array) funcs
Userland changes:
* Add "iflist" cmd to print status of currently tracked interface
* Add stringnum_cmp for better interface/table names sorting
reside, and move there ipfw(4) and pf(4).
o Move most modified parts of pf out of contrib.
Actual movements:
sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5
sys/netinet/ipfw -> sys/netpfil/ipfw
The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.
Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.
The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.
Discussed with: bz, luigi
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.
This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.
While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.
Reported by: pluknet
Reviewed by: plunket, jhb
MFC After: 3 days
build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the
assumption that the private L2 hook (which hopefully eventually will be a
pfil hook as well) can still be useful.
Allow building the module without inet as well.
Glanced at by: jhb
MFC after: 3 days
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.
Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week
to list them all in the Makefile for the module,
otherwise it won't load due to missing symbols.
The problem only affected head with ipfw built as a module.
Reported by David Horn
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.
Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
adjust conf/files and modules' Makefiles accordingly.
No code or ABI changes so this and most of previous related
changes can be easily MFC'ed
MFC after: 5 days
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.
For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.
Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation
exists to allow the mandatory access control policy to properly initialize
mbufs generated by the firewall. An example where this might happen is keep
alive packets, or ICMP error packets in response to other packets.
This takes care of kernel panics associated with un-initialize mbuf labels
when the firewall generates packets.
[1] I modified this patch from it's original version, the initial patch
introduced a number of entry points which were programmatically
equivalent. So I introduced only one. Instead, we should leverage
mac_create_mbuf_netlayer() which is used for similar situations,
an example being icmp_error()
This will minimize the impact associated with the MFC
Submitted by: mlaier [1]
MFC after: 1 week
This is a RELENG_6 candidate
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.
However there are many changes how ipfw is and its add-on's are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files
Add netinet/ip_fw_pfil.c.
conf/options
Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile
Add ip_fw_pfil.c.
net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.
netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c
(Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.
netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)
netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.
netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.
Approved by: re (scottl)
This means that the kernel can be totally self contained now and is not
dependent on the last buildworld to update /usr/share/mk. This might
also make it easier to build 5.x kernels on 4.0 boxes etc, assuming
gensetdefs and config(8) are updated.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
LKM'ness. ACTUALLY_LKM_NOT_KERNEL is supposed to be so ugly that it
only gets used until <machine/conf.h> goes away. bsd.kmod.mk should
define a better-named general macro for this. Some places use
PSEUDO_LKM. This is another bad name.
Makefile:
Added IPFIREWALL_VERBOSE_LIMIT option (commented out).
firewall should *NOT* be compiled into kernel.
Then it can be loaded.This is misc module but i'v
got no problemms with it,so shouldn't you i suppose..
BTW this is very stupid to have one module in CVS
for ALL lkm's...