freebsd-skq

Author	SHA1	Message	Date
melifaro	f063418bd7	Please GCC by specifying proper cast.	2014-10-04 13:46:10 +00:00
melifaro	ea0abe8630	Bump max rule size to 512 opcodes.	2014-10-04 12:46:26 +00:00
melifaro	e8d559896c	Sync to HEAD@r272516.	2014-10-04 12:42:37 +00:00
melifaro	08c555cee7	Add "ipfw_ctl3" FEATURE to indicate presence of new ipfw interface.	2014-10-04 12:10:32 +00:00
melifaro	e2a6d82545	Switch ipfw to use rmlock for runtime locking.	2014-10-04 11:40:35 +00:00
melifaro	41c6784b49	Bump max rule size to 512 opcodes.	2014-10-04 10:15:49 +00:00
melifaro	c0f26d5a55	Make linear_skipto turned off by default.	2014-10-03 15:54:51 +00:00
melifaro	d8b683d70f	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embedded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
glebius	713d87864c	Use rn_detachhead() instead of direct free(9) for radix tables. Sponsored by: Nginx, Inc.	2014-10-01 13:35:41 +00:00
sbruno	22da1e9569	Fix NULL pointer deref in ipfw when using dummynet at layer 2. Drop packet if pkg->ifp is NULL, which is the case here. ref. https://github.com/HardenedBSD/hardenedBSD commit 4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7 PR 193861 -- DUMMYNET LAYER2: kernel panic in this case a kernel panic occurs. Hence, when we do not get an interface, we just drop the packet in question. PR: 193681 Submitted by: David Carlier <david.carlier@hardenedbsd.org> Obtained from: Hardened BSD MFC after: 2 weeks Relnotes: yes	2014-09-25 02:26:05 +00:00
melifaro	a95acb50bd	Add pre-alpha version of DXR lookup module. It does build but (currently) does not work. This change is not intended to be merged along with other ipfw changes.	2014-09-21 18:15:09 +00:00
glebius	16745af543	Mechanically convert to if_inc_counter().	2014-09-19 09:19:29 +00:00
glebius	72f04611ec	Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops. Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There was no API to read this statistic. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-19 09:01:19 +00:00
glebius	759eeea220	- Provide a sleepable lock to protect against ioctl() vs ioctl() races. - Use the new lock to protect against simultaneous DIOCSTART and/or DIOCSTOP ioctls. Reported & tested by: jmallett Sponsored by: Nginx, Inc.	2014-09-12 08:39:15 +00:00
melifaro	f7e6823045	Make ipfw_nat module use IP_FW3 codes. Kernel changes: * Split kernel/userland nat structures eliminating IPFW_INTERNAL hack. * Add IP_FW_NAT44_* codes resembling old ones. * Assume that instances can be named (no kernel support currently). * Use both UH+WLOCK locks for all configuration changes. * Provide full ABI support for old sockopts. Userland changes: * Use IP_FW_NAT44_* codes for nat operations. * Remove undocumented ability to show ranges of nat "log" entries.	2014-09-07 18:30:29 +00:00
melifaro	595fec1055	Change copyrights to the proper one.	2014-09-05 14:19:02 +00:00
melifaro	21fa37c8e5	Sync to HEAD@r271160.	2014-09-05 13:52:39 +00:00
melifaro	03b9e62107	* Use modular opcode handling inside ipfw_ctl3() instead of static switch. * Provide hints for subsystem initializers if they are called for the first/last time. * Convert every IP_FW3 opcode user to use new sopt API.	2014-09-05 11:11:15 +00:00
melifaro	d8fb572c36	Be consistent and use same arguments for ctl3 opcodes. Move legacy IP_FW_TABLE_XGETSIZE handling to separate function.	2014-09-03 21:57:06 +00:00
glebius	2e01608625	Clean up unused CSUM_FRAGMENT. Sponsored by: Nginx, Inc.	2014-09-03 08:30:18 +00:00
melifaro	9677452b6e	* Fix crash due to forgotten value refcounting in ipfw_link_table_values() * Fix argument order in rollback_toperation_state() * Make flush_table() use operation state API to ease checks.	2014-09-02 20:46:18 +00:00
melifaro	416d664184	Add more comments on newly-added functions. Add back opstate handler function.	2014-09-02 14:27:12 +00:00
glebius	0cbf499e97	Explicitly free packet on PF_DROP, otherwise a "quick" rule with "route-to" may still forward it. PR: 177808 Submitted by: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH	2014-09-01 13:00:45 +00:00
melifaro	a1eca3cc0c	Add support for multi-field values inside ipfw tables. This is the last major change in given branch. Kernel changes: * Use 64-bytes structures to hold multi-value variables. * Use shared array to hold values from all tables (assume each table algo is capable of holding 32-byte variables). * Add some placeholders to support per-table value arrays in future. * Use simple eventhandler-style API to ease the process of adding new table items. Currently table addition may require multiple UH drops/acquires which is quite tricky due to atomic table modification/swap support, shared array resize, etc. Deal with it by calling special notifier capable of rolling back state before actually performing swap/resize operations. Original operation then restarts itself after acquiring UH lock. * Bump all objhash users default values to at least 64 * Fix custom hashing inside objhash. Userland changes: * Add support for dumping shared value array via "vlist" internal cmd. * Some small print/fill_flags fixes to support u32 values. * valtype is now bitmask of <skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>. New values can hold distinct values for each of these types. * Provide special "legacy" type which assumes all values are the same. * More helpers/docs following.. Some examples: 3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6 3:41 [1] zfscurr0# ipfw table mimimi info +++ table(mimimi), set(0) +++ kindex: 2, type: addr references: 0, valtype: skipto,limit,ipv4,ipv6 algorithm: addr:radix items: 0, size: 296 3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1 added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1 3:42 [1] zfscurr0# ipfw table mimimi list +++ table(mimimi), set(0) +++ 10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1	2014-08-31 23:51:09 +00:00
melifaro	631be4d79a	* Make objhash api a bit more abstract by providing ability to specify own hash/compare functions. * Add requirement for table algorithms to copy "value" field in @add callback instead of "prepare_add". * Document existing requirement for table algorithms to store value of deleted record to @tei.	2014-08-30 17:18:11 +00:00
melifaro	06eb65b248	Whitespace/style changes merged from projects/ipfw.	2014-08-23 17:57:06 +00:00
melifaro	cf94663e69	Sync to HEAD@r270409.	2014-08-23 14:58:31 +00:00
melifaro	2e65f120c8	Simplify table reference/create chain.	2014-08-23 12:41:39 +00:00
melifaro	3498dca96e	* Use OP_ADD/OP_DEL macro instead of plain integers. * ipfw_foreach_table_tentry() to permit listing arbitrary ipfw table using standard format.	2014-08-23 11:27:49 +00:00
glebius	4242d9acba	Do not lookup source node twice when pf_map_addr() is used. PR: 184003 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 14:16:08 +00:00
glebius	9227a25906	pf_map_addr() can fail and in this case we should drop the packet, otherwise bad consequences including a routing loop can occur. Move pf_set_rt_ifp() earlier in state creation sequence and inline it, cutting some extra code. PR: 183997 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 14:02:24 +00:00
melifaro	b921074dbb	Make room for multi-type values in struct tentry.	2014-08-15 12:58:32 +00:00
glebius	45bdeab3db	Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL. PR: 127920 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-15 04:35:34 +00:00
kevlo	dd40fa7e62	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
melifaro	6f8397b648	Replace "cidr" table type with "addr" type. Suggested by: luigi	2014-08-14 21:43:20 +00:00
melifaro	7c57f4c90d	* Add cidr:kfib algo type just for fun. It binds kernel fib of given number to a table. Example: # ipfw table fib2 create algo "cidr:kfib fib=2" # ipfw table fib2 info +++ table(fib2), set(0) +++ kindex: 2, type: cidr, locked valtype: number, references: 0 algorithm: cidr:kfib fib=2 items: 11, size: 288 # ipfw table fib2 list +++ table(fib2), set(0) +++ 10.0.0.0/24 0 127.0.0.1/32 0 ::/96 0 ::1/128 0 ::ffff:0.0.0.0/96 0 2a02:978:2::/112 0 fe80::/10 0 fe80:1::/64 0 fe80:2::/64 0 fe80:3::/64 0 ff02::/16 0 # ipfw table fib2 lookup 10.0.0.5 10.0.0.0/24 0 # ipfw table fib2 lookup 2a02:978:2::11 2a02:978:2::/112 0 # ipfw table fib2 detail +++ table(fib2), set(0) +++ kindex: 2, type: cidr, locked valtype: number, references: 0 algorithm: cidr:kfib fib=2 items: 11, size: 288 IPv4 algorithm radix info items: 0 itemsize: 200 IPv6 algorithm radix info items: 0 itemsize: 200	2014-08-14 20:17:23 +00:00
glebius	7d0b571895	- Count global pf(4) statistics in counter(9). - Do not count global number of states and of src_nodes, use uma_zone_get_cur() to obtain values. - Struct pf_status becomes merely an ioctl API structure, and moves to netpfil/pf/pf.h with its constants. - V_pf_status is now of type struct pf_kstatus. Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-14 18:57:46 +00:00
melifaro	9b0fd0e183	* Document internal commands. * Do not require/set default table type if algo name is specified. * Add TA_FLAG_READONLY option for algorithms.	2014-08-14 17:31:04 +00:00
melifaro	a5e98ab07d	Clean up kernel interaction in ip_fw_iface.c Suggested by: ae	2014-08-14 13:24:59 +00:00
melifaro	ac476df0ec	Fix crash in case of iflist request on non-initialized tracker.	2014-08-14 08:42:16 +00:00
melifaro	ef7f079c1d	* Fix displaying dynamic rules for large rulesets. * Clean up some comments.	2014-08-14 08:21:22 +00:00
melifaro	9d56937f2a	Fix assertion.	2014-08-13 16:53:12 +00:00
melifaro	03e33c1ac5	Sync to HEAD@r269943.	2014-08-13 16:20:41 +00:00
melifaro	21ceaa3a9f	* Pass proper table set numbers from userland side. * Ignore them, but honor V_fw_tables_sets value on kernel side.	2014-08-13 12:04:45 +00:00
melifaro	2bb7ccb159	* Add jump_linear() function utilizing calculated skipto cache. * Update description for jump_fast() * Make jump_fast() users use JUMP() macro which is resolved to jump_fast() by default.	2014-08-13 09:34:33 +00:00
melifaro	1c05300c17	* Clarify ipfw_swap_table operations * Ensure <add|del>_table_entry handle ta change properly.	2014-08-12 17:03:13 +00:00
melifaro	37a5b4aafb	* Rename ipfw_[un]bind_table_rule to ipfw_[un]ref_rule_tables * Update their descriptions.	2014-08-12 16:08:13 +00:00
melifaro	20eb17aed6	Change tablearg value to be 0 (try #2). Most of the tablearg-supported opcodes do not accept 0 as valid value: O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET, O_NETGRAPH, O_NGTEE, O_NAT treat 0 as invalid input. The rest are O_SETDSCP and O_SETFIB. 'Fix' them by adding high-order bit (0x8000) set for non-tablearg values. Do translation in kernel for old clients (import_rule0 / export_rule0), teach current ipfw(8) binary to add/remove given bit. This change does not affect handling SETDSCP values, but limits O_SETFIB values to 32767 instead of 65k. Since currently we have either old (16) or new (2^32) max fibs, this should not be a big deal: we're definitely OK for former and have to add another opcode to deal with latter, regardless of tablearg value.	2014-08-12 15:51:48 +00:00
melifaro	ac4e64f311	Do not use index 0 for tables.	2014-08-12 14:19:45 +00:00
melifaro	7f14a3576e	* Rename has_space to need_modify to be consistent with 0 as return values. * document all callbacks supported by algorithms code.	2014-08-12 14:09:15 +00:00
melifaro	324833519e	No functional changes, do better functions grouping.	2014-08-12 10:22:46 +00:00
melifaro	8c5ec3a86c	Simplify table auto-creation for old userland users.	2014-08-12 09:48:54 +00:00
melifaro	d633efff82	Simplify add/del_table_entry() by making their common pieces common functions.	2014-08-11 22:38:13 +00:00
melifaro	9266cc6d8f	Update functions descriptions.	2014-08-11 20:00:51 +00:00
melifaro	25473f8f4a	* Add the ability to lock/unlock given table from changes. Example: # ipfw table si lock # ipfw table si info +++ table(si), set(0) +++ kindex: 0, type: cidr, locked valtype: number, references: 0 algorithm: cidr:radix items: 0, size: 288 # ipfw table si add 4.5.6.7 ignored: 4.5.6.7/32 0 ipfw: Adding record failed: table is locked # ipfw table si unlock # ipfw table si add 4.5.6.7 added: 4.5.6.7/32 0 # ipfw table si lock # ipfw table si delete 4.5.6.7 ignored: 4.5.6.7/32 0 ipfw: Deleting record failed: table is locked # ipfw table si unlock # ipfw table si delete 4.5.6.7 deleted: 4.5.6.7/32 0	2014-08-11 18:09:37 +00:00
melifaro	377bb9d131	* Add support for batched add/delete for ipfw tables * Add support for atomic batches add (all or none). * Fix panic on deleting non-existing entry in radix algo. Examples: # si is empty # ipfw table si add 1.1.1.1/32 1111 2.2.2.2/32 2222 added: 1.1.1.1/32 1111 added: 2.2.2.2/32 2222 # ipfw table si add 2.2.2.2/32 2200 4.4.4.4/32 4444 exists: 2.2.2.2/32 2200 added: 4.4.4.4/32 4444 ipfw: Adding record failed: record already exists ^^^^^ Returns error but keeps inserted items # ipfw table si list +++ table(si), set(0) +++ 1.1.1.1/32 1111 2.2.2.2/32 2222 4.4.4.4/32 4444 # ipfw table si atomic add 3.3.3.3/32 3333 4.4.4.4/32 4400 5.5.5.5/32 5555 added(reverted): 3.3.3.3/32 3333 exists: 4.4.4.4/32 4400 ignored: 5.5.5.5/32 5555 ipfw: Adding record failed: record already exists ^^^^^ Returns error and reverts added records # ipfw table si list +++ table(si), set(0) +++ 1.1.1.1/32 1111 2.2.2.2/32 2222 4.4.4.4/32 4444	2014-08-11 17:34:25 +00:00
melifaro	5b47ece0e9	* Use 2 32-bits field inside rule instead of 2 pointer to save skipto state. * Introduce ipfw_reap_add() to unify unlinking rules/adding it to reap queue * Unbreak FreeBSD7 export format.	2014-08-09 09:11:26 +00:00
melifaro	57d917cb99	Kernel changes: * Fix buffer calculation for table dumps * Fix IPv6 radix entries addition broken in r269371. Userland changes: * Fix bug in retrieving static ruleset * Fix several bugs in retrieving table list	2014-08-08 21:09:22 +00:00
melifaro	deeb40d882	Partially revert previous commit: "0" value is perfectly valid for O_SETFIB and O_SETDSCP, so tablearg remains to be 65535 for now.	2014-08-08 15:33:26 +00:00
melifaro	bc102dcade	* Switch tablearg value from 65535 to 0. * Use u16 table kidx instead of integer one for iface opcode. * Provide compatibility layer for old clients.	2014-08-08 14:23:20 +00:00
melifaro	2a5da00f23	* Add IP_FW_TABLE_XMODIFY opcode * Since there seems to be lack of consensus on strict value typing, remove non-default value types. Use userland-only "value format type" to print values. Kernel changes: * Add IP_FW_XMODIFY to permit table run-time modifications. Currently we support changing limit and value format type. Userland changes: * Support IP_FW_XMODIFY opcode. * Support specifying value format type (ftype) in table create/modify req * Fine-print value type/value format type.	2014-08-08 09:27:49 +00:00
melifaro	3ad34df447	Remove IP_FW_TABLES_XGETSIZE opcode. It is superseded by IP_FW_TABLES_XLIST.	2014-08-08 06:36:26 +00:00
kevlo	7727a3c215	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
melifaro	c2c120701d	Since all of base IP_FW opcodes have been converted to IP_FW3, switch default sopt handler to ipfw_ctl3. Add some comments for ipfw_get_sopt* api.	2014-08-07 22:08:43 +00:00
melifaro	61bb76b813	Kernel changes: * Implement proper checks for switching between global and set-aware tables * Split IP_FW_DEL mess into the following opcodes: * IP_FW_XDEL (del rules matching pattern) * IP_FW_XMOVE (move rules matching pattern to another set) * IP_FW_SET_SWAP (swap between 2 sets) * IP_FW_SET_MOVE (move one set to another one) * IP_FW_SET_ENABLE (enable/disable sets) * Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration. * Use unified ipfw_range_tlv as range description for all of the above. * Check dynamic states IFF there was non-zero number of deleted dyn rules, * Del relevant dynamic states with single traversal instead of per-rule one. Userland changes: * Switch ipfw(8) to use new opcodes.	2014-08-07 21:37:31 +00:00
melifaro	42eca8abfb	Implement atomic ipfw table swap. Kernel changes: * Add opcode IP_FW_TABLE_XSWAP * Add support for swapping 2 tables with the same type/ftype/vtype. * Make skipto cache init after ipfw locks init. Userland changes: * Add "table X swap Y" command.	2014-08-03 21:37:12 +00:00
melifaro	c7e5ac0567	Implement O(1) skipto using indexed array. This adds 512K (2 * sizeof(u32) * 65k) bytes to the memory footprint. This feature is optional and may be turned on at any time (however, it starts immediately in this commit; this will be changed).	2014-08-03 15:49:03 +00:00
melifaro	6e882e1221	Show algorithm-specific data in "table info" output.	2014-08-03 12:19:45 +00:00
melifaro	688e206691	Be consistent on cidr:radix function naming: use algo name instead of "cidr".	2014-08-03 09:53:34 +00:00
melifaro	4cdc519f54	Remove unneeded headers.	2014-08-03 09:48:54 +00:00
melifaro	7bb611530d	Whitespace changes.	2014-08-03 09:40:50 +00:00
melifaro	d27a1eeff2	* Move all algo-specific structures to the top of algo definition. * Be consistent on naming variables in different algos. * Use exponential array grow in iface:array and number:array.	2014-08-03 09:04:36 +00:00
melifaro	bfd5bf65d9	Store entry value back in @tei on entry update/deletion as another step to batched atomic updates.	2014-08-03 08:32:54 +00:00
melifaro	a1876c68a2	* Fix case when returning more that 4096 bytes of data * Use different approach to ensure algo has enough space to store N elements: - explicitly ask algo (under UH_WLOCK) before/after insertion. This (along with existing reallocation callbacks) really guarantees us that it is safe to insert N elements at once while holding UH_WLOCK+WLOCK. - remove old aflags/flags approach	2014-08-02 17:18:47 +00:00
melifaro	178311d9d4	* Permit limiting number of items in table. Kernel changes: * Add TEI_FLAGS_DONTADD entry flag to indicate that insert is not possible * Support given flag in all algorithms * Add "limit" field to ipfw_xtable_info * Add actual limiting code into add_table_entry() Userland changes: * Add "limit" option as "create" table sub-option. Limit modification is currently impossible. * Print human-readable errors in table entry addition/deletion code.	2014-08-01 15:17:46 +00:00
melifaro	6d7452f13b	Do not perform memset() on ta_buf in algo callbacks: it is already zeroed by base code.	2014-08-01 08:39:47 +00:00
melifaro	f9c6e04aff	Simplify radix operations: use unified tei_to_sockaddr_ent() to generate keys for add/delete calls.	2014-08-01 08:28:18 +00:00
melifaro	4dc5f97e56	* Use TA_FLAG_DEFAULT for default algorithm selection instead of exporting algorithm structures directly. * Pass needed state buffer size in algo structures as preparation for tables add/del requests batching.	2014-08-01 07:35:17 +00:00
melifaro	58e70e361d	* Add new "flow" table type to support N=1..5-tuple lookups * Add "flow:hash" algorithm Kernel changes: * Add O_IP_FLOW_LOOKUP opcode to support "flow" lookups * Add IPFW_TABLE_FLOW table type * Add "struct tflow_entry" as storage for 6-tuple flows * Add "flow:hash" algorithm. Basically it is auto-growing chained hash table. Additionally, we store mask of fields we need to compare in each instance. * Increase ipfw_obj_tentry size by adding struct tflow_entry * Add per-algorithm stat (ifpw_ta_tinfo) to ipfw_xtable_info * Increase algoname length: 32 -> 64 (algo options passed there as string) * Assume every table type can be customized by flags, use u8 to store "tflags" field. * Simplify ipfw_find_table_entry() by providing @tentry directly to algo callback. * Fix bug in cidr:chash resize procedure. Userland changes: * add "flow table(NAME)" syntax to support n-tuple checking tables. * make fill_flags() separate function to ease working with _s_x arrays * change "table info" output to reflect longer "type" fields Syntax: ipfw table fl2 create type flow:[src-ip][,proto][,src-port][,dst-ip][dst-port] [algo flow:hash] Examples: 0:02 [2] zfscurr0# ipfw table fl2 create type flow:src-ip,proto,dst-port algo flow:hash 0:02 [2] zfscurr0# ipfw table fl2 info +++ table(fl2), set(0) +++ kindex: 0, type: flow:src-ip,proto,dst-port valtype: number, references: 0 algorithm: flow:hash items: 0, size: 280 0:02 [2] zfscurr0# ipfw table fl2 add 2a02:6b8::333,tcp,443 45000 0:02 [2] zfscurr0# ipfw table fl2 add 10.0.0.92,tcp,80 22000 0:02 [2] zfscurr0# ipfw table fl2 list +++ table(fl2), set(0) +++ 2a02:6b8::333,6,443 45000 10.0.0.92,6,80 22000 0:02 [2] zfscurr0# ipfw add 200 count tcp from me to 78.46.89.105 80 flow 'table(fl2)' 00200 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 0:03 [2] zfscurr0# ipfw show 00200 0 0 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 65535 617 59416 allow ip from any to any 0:03 [2] zfscurr0# telnet -s 10.0.0.92 78.46.89.105 80 Trying 78.46.89.105... .. 0:04 [2] zfscurr0# ipfw show 00200 5 272 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2) 65535 682 66733 allow ip from any to any	2014-07-31 20:08:19 +00:00
melifaro	4419c812fe	* Add number:array algorithm lookup method. Kernel changes: * s/IPFW_TABLE_U32/IPFW_TABLE_NUMBER/ * Force "lookup <port|uid|gid|jid>" to be IPFW_TABLE_NUMBER * Support "lookup" method for number tables * Add number:array algorithm (i32 as key, auto-growing). Userland changes: * Support named tables in "lookup <tag> Table" * Fix handling of "table(NAME,val)" case * Support printing "number" table data.	2014-07-30 14:52:26 +00:00
melifaro	2ca9167fd0	* Add "lookup" method for cidr:hash algorithm type. * Add auto-grow ability to cidr:hash type. * Fix some bugs / simplify implementation for cidr:hash.	2014-07-30 12:39:49 +00:00
melifaro	23cdd03b9c	Fix "flush" cmd for algorithms with non-default parameters.	2014-07-30 09:17:40 +00:00
melifaro	389a854346	* Introduce ipfw_ctl3() handler and move all IP_FW3 opcodes there. The long-term goal is to switch remaining opcodes to IP_FW3 versions and use ipfw_ctl3() as default handler simplifying ipfw(4) interaction with external world.	2014-07-29 23:06:06 +00:00
melifaro	bf787a59a7	* Dump available table algorithms via "ipfw talist" cmd. Kernel changes: * Add type/refcount fields to table algo instances. * Add IP_FW_TABLES_ALIST opcode to export available algorithms to userland. Userland changes: * Fix cores on empty input inside "ipfw table" handler. * Add "ipfw talist" cmd to print available kernel algorithms. * Change "table info" output to reflect long algorithm config lines.	2014-07-29 22:44:26 +00:00
melifaro	7e2cb6d901	* Copy ta structures to stable storage to ease future extension. * Remove algo .lookup field since table lookup function is set by algo code.	2014-07-29 21:38:06 +00:00
melifaro	ce5a8379b8	* Add new ipfw cidr algorithm: hash table. Algorithm works with both IPv4 and IPv6 prefixes, /32 and /128 ranges are assumed by default. It works the following way: input IP address is masked to specified mask, hashed and searched inside hash bucket. Current implementation does not support "lookup" method and hash auto-resize. This will be changed soon. some examples: ipfw table mi_test2 create type cidr algo cidr:hash ipfw table mi_test create type cidr algo "cidr:hash masks=/30,/64" ipfw table mi_test2 info +++ table(mi_test2), set(0) +++ type: cidr, kindex: 7 valtype: number, references: 0 algorithm: cidr:hash items: 0, size: 220 ipfw table mi_test info +++ table(mi_test), set(0) +++ type: cidr, kindex: 6 valtype: number, references: 0 algorithm: cidr:hash masks=/30,/64 items: 0, size: 220 ipfw table mi_test add 10.0.0.5/30 ipfw table mi_test add 10.0.0.8/30 ipfw table mi_test add 2a02:6b8:b010::1/64 25 ipfw table mi_test list +++ table(mi_test), set(0) +++ 10.0.0.4/30 0 10.0.0.8/30 0 2a02:6b8:b010::/64 25	2014-07-29 19:49:38 +00:00
melifaro	286880219b	* Change algorithm names to "type:algo" (e.g. "iface:array", "cidr:radix") format. * Pass number of items changed in add/del hooks to permit adding/deleting multiple values at once.	2014-07-29 08:00:13 +00:00
melifaro	fa3f38a6a0	* Add generic ipfw interface tracking API * Rewrite interface tables to use interface indexes Kernel changes: * Add generic interface tracking API: - ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates state & bumps ref) - ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links consumer & runs its callback to update ifindex) - ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer) - ipfw_iface_unref(unlocked, drops reference) Additionally, consumer callbacks are called in interface withdrawal/departure. * Rewrite interface tables to use iface tracking API. Currently tables are implemented the following way: runtime data is stored as sorted array of {ifidx, val} for existing interfaces full data is stored inside namedobj instance (chained hashed table). * Add IP_FW_XIFLIST opcode to dump status of tracked interfaces * Pass @chain ptr to most non-locked algorithm callbacks: (prepare_add, prepare_del, flush_entry ..). This may be needed for better interaction of given algorithm and other ipfw subsystems * Add optional "change_ti" algorithm handler to permit updating of cached table_info pointer (happens in case of table_max resize) * Fix small bug in ipfw_list_tables() * Add badd (insert into sorted array) and bdel (remove from sorted array) funcs Userland changes: * Add "iflist" cmd to print status of currently tracked interface * Add stringnum_cmp for better interface/table names sorting	2014-07-28 19:01:25 +00:00
melifaro	505e5ae081	* Require explicit table creation before use on kernel side. * Add resize callbacks for upcoming table-based algorithms. Kernel changes: * s/ipfw_modify_table/ipfw_manage_table_ent/ * Simplify add_table_entry(): make table creation a separate piece of code. Do not perform creation if not in "compat" mode. * Add ability to perform modification of algorithm state (like table resize). The following callbacks were added: - prepare_mod (allocate new state, without locks) - fill_mod (UH_WLOCK, copy old state to new one) - modify (UH_WLOCK + WLOCK, switch state) - flush_mod (no locks, flushes allocated data) Given callbacks are called if table modification has been requested by add or delete callbacks. Additional u64 tc->'flags' field was added to pass these requests. * Change add/del table ent format: permit adding/removing multiple entries at once (only 1 supported at the moment). Userland changes: * Auto-create tables with warning	2014-07-26 13:37:25 +00:00
glebius	98615618b9	On machines with strict alignment copy pfsync_state_key from packet on stack to avoid unaligned access. PR: 187381 Submitted by: Lytochkin Boris <lytboris gmail.com>	2014-07-10 12:41:58 +00:00
melifaro	deb9ca0f18	* Reduce size of ipfw table entries for cidr/iface: Since old structures had _value as the last field, every table match required 3 cache lines instead of 2. Fix this by - using the fact that supplied masks are duplicated inside radix - using lightweight sa_in6 structure as key for IPv6 Before (amd64): sizeof(table_entry): 136 sizeof(table_xentry): 160 After (amd64): sizeof(radix_cidr_entry): 120 sizeof(radix_cidr_xentry): 128 sizeof(radix_iface): 128 * Fix memory leak for table entry update * Do some more sanity checks while deleting entry * Do not store masks for host routes Sponsored by: Yandex LLC	2014-07-09 18:52:12 +00:00
melifaro	3f7d90b385	* Use different rule structures in kernel/userland. * Switch kernel to use per-cpu counters for rules. * Keep ABI/API. Kernel changes: * Each rule is now exported as TLV with optional extensible counter block (ip_fW_bcounter for base one) and ip_fw_rule for rule&cmd data. * Counters need to be explicitly requested by IPFW_CFG_GET_COUNTERS flag. * Separate counters from rules in kernel and clean up ip_fw a bit. * Pack each rule in IPFW_TLV_RULE_ENT tlv to ease parsing. * Introduce versioning in container TLV (may be needed in future). * Fix ipfw_cfg_lheader broken u64 alignment. Userland changes: * Use set_mask from cfg header when requesting config * Fix incorrect read accounting in ipfw_show_config() * Use IPFW_RULE_NOOPT flag instead of playing with _pad * Fix "ipfw -d list": do not print counters for dynamic states * Some small fixes	2014-07-08 23:11:15 +00:00
melifaro	7189aec01e	* Prepare to pass other dynamic states via ipfw_dump_config() Kernel changes: * Change dump format for dynamic states: each state is now stored inside ipfw_obj_dyntlv last dynamic state is indicated by IPFW_DF_LAST flag * Do not perform sooptcopyout() for !SOPT_GET requests. Userland changes: * Introduce foreach_state() function handler to ease work with different states passed by ipfw_dump_config().	2014-07-06 23:26:34 +00:00
melifaro	0eba52a18e	* Add "lookup" table functionality to permit userland entry lookups. * Bump table dump format preserving old ABI. Kernel side: * Add IP_FW_TABLE_XFIND to handle "lookup" request from userland. * Add ta_find_tentry() algorithm callbacks/handlers to support lookups. * Fully switch to ipfw_obj_tentry for various table dumps: algorithms are now required to support the latest (ipfw_obj_tentry) entry dump format, the rest is handled by generic dump code. IP_FW_TABLE_XLIST opcode version bumped (0 -> 1). * Eliminate legacy ta_dump_entry algo handler: dump_table_entry() converts data from current to legacy format. Userland side: * Add "lookup" table parameter. * Change the way table type is guessed: call table_get_info() first, and check value for IPv4/IPv6 type IFF table does not exist. * Fix table_get_list(): do more tries if supplied buffer is not enough. * Separate table_show_entry() from table_show_list().	2014-07-06 18:16:04 +00:00
melifaro	dfa3781d78	* Issue warning while requesting ruleset with new tables via legacy binary. Convert each unresolved table as table 65535 (which cannot be used normally). * Perform s/^ipfw_// for add_table_entry, del_table_entry and flush_table since these are internal functions exported to keep legacy interface. * Remove macro TABLE_SET. Operations with tables can be done in any set, the only thing net.inet.ip.fw.tables_sets affects is the set in which tables are looked up while binding them to the rule.	2014-07-04 07:02:11 +00:00
melifaro	99023231d3	Fully switch to named tables: Kernel changes: * Introduce ipfw_obj_tentry table entry structure to force u64 alignment. * Support "update-on-existing-key" "add" behavior (TEI_FLAGS_UPDATED). * Use "subtype" field to distinguish between IPv4 and IPv6 table records instead of previous hack. * Add value type (vtype) field for kernel tables. Current types are number, ip and dscp * Fix sets mask retrieval for old binaries * Fix crash while using interface tables Userland changes: * Switch ipfw_table_handler() to use named-only tables. * Add "table NAME create [type {cidr|iface|u32} [valtype {number|ip|dscp}] ..." * Switch ipfw_table_handler to match_token()-based parser. * Switch ipfw_sets_handler to use new ipfw_get_config() for mask retrieval. * Allow ipfw set X table ... syntax to permit using per-set table namespaces.	2014-07-03 22:25:59 +00:00
melifaro	75913dd997	* Add new IP_FW_XADD opcode which permits to a) specify table ids as names b) add multiple rules at once. Partially convert current code for atomic addition of multiple rules.	2014-06-29 22:35:47 +00:00
melifaro	145faf7cb6	Enable kernel-side rule filtering based on user request. Make do_get3() function return real error.	2014-06-29 09:29:27 +00:00
melifaro	5d627fdb8b	Support showing named tables in ipfw(8) rule listing. Kernel changes: * change base TLV header to be u64 (so size can be u32). * Introduce ipfw_obj_ctlv generic container TLV. * Add IP_FW_XGET opcode which is now used for atomic configuration retrieval. One can specify needed configuration pieces to retrieve via flags field. Currently supported are IPFW_CFG_GET_STATIC (static rules) and IPFW_CFG_GET_STATES (dynamic states). Other configuration pieces (tables, pipes, etc..) support is planned. Userland changes: * Switch ipfw(8) to use new IP_FW_XGET for rule listing. * Split rule listing code get and show pieces. * Make several steps forward towards libipfw: permit printing states and rules (partially) to supplied buffer. do not die on malloc/kernel failure inside given printing functions. stop assuming cmdline_opts is global symbol.	2014-06-28 23:20:24 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
melifaro	9ff102accc	Use different approach for filling large datasets to userspace: Instead of trying to allocate big contiguous chunk of memory, use intermediate-sized (page size) buffer as sliding window reducing number of sooptcopyout() calls to perform. This reduces dump functions complexity and provides additional layer of abstraction. User-visible api consists of 2 functions: ipfw_get_sopt_space() - gets contiguous amount of storage (or NULL) and ipfw_get_sopt_header() - the same, but zeroes the rest of the buffer.	2014-06-27 10:07:00 +00:00
melifaro	8bc233982f	* Add IP_FW_TABLE_XCREATE / IP_FW_TABLE_XMODIFY opcodes. * Add 'algoname' string to ipfw_xtable_info permitting to specify lookup algorithm with parameters. * Rework part of ipfw_rewrite_table_uidx() Sponsored by: Yandex LLC	2014-06-16 13:05:07 +00:00
melifaro	e0a90fc44a	Remove unused ipfw_dump_xtable().	2014-06-15 13:43:44 +00:00
melifaro	b06860b3e2	Simplify opcode handling. * Use one u16 from op3 header to implement opcode versioning. * IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current). * Every getsockopt request is now handled in ip_fw_table.c * Rename new opcodes: IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH * Add some docs about using given opcodes. * Group some legacy opcode/handlers.	2014-06-15 13:40:27 +00:00
melifaro	fe9646e6ff	Move further to eliminate next pieces of number-assuming code inside tables. Kernel changes: * Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set) * Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set) * Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables) Userland changes: * move tables code to separate tables.c file * get rid of tables_max * switch "all"/list handling to new opcodes	2014-06-14 22:47:25 +00:00
melifaro	0001953a35	Move most of external table structures/functions to separate ip_fw_table.h	2014-06-14 11:13:02 +00:00
melifaro	f9fb63fe8c	Add API to ease adding new algorithms/new tabletypes to ipfw. Kernel-side changelog: * Split general tables code and algorithm-specific table data. Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to new ip_fw_table_algo.c file. Tables code now supports any algorithm implementing the following callbacks: +struct table_algo { + char name[64]; + int idx; + ta_init init; + ta_destroy destroy; + table_lookup_t lookup; + ta_prepare_add prepare_add; + ta_prepare_del prepare_del; + ta_add add; + ta_del del; + ta_flush_entry flush_entry; + ta_foreach foreach; + ta_dump_entry dump_entry; + ta_dump_xentry dump_xentry; +}; Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to ->tablestate pointer (array of 32 bytes structures necessary for runtime lookups (can be probably shrunk to 16 bytes later)): +struct table_info { + table_lookup_t lookup; /* Lookup function */ + void *state; /* Lookup radix/other structure */ + void *xstate; /* eXtended state */ + u_long data; /* Hints for given func */ +}; Add count method for namedobj instance to ease size calculations * Bump ip_fw3 buffer in ipfw_clt 128->256 bytes. * Improve bitmask resizing on tables_max change. * Remove table numbers checking from most places. * Fix wrong nesting in ipfw_rewrite_table_uidx(). * Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently implemented for IPFW_OBJTYPE_TABLE). * Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data, currently implemented for IPFW_OBJTYPE_TABLE). * Add IP_FW_OBJ_INFO (requests info for one object of given type). Some name changes: s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics) s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics) Userland changes: * Add do_set3() cmd to ipfw2 to ease dealing with op3-embedded opcodes. * Add/improve support for destroy/info cmds.	2014-06-14 10:58:39 +00:00
melifaro	01ec53e019	Make ipfw tables use names as user-level identifier internally: * Add namedobject set-aware api capable of searching/allocation objects by their name/idx. * Switch tables code to use string ids for configuration tasks. * Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks. * Reduce number of arguments passed to ipfw_table_add/del by using separate structure. * Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support) * Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference) * Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode Namedobj more detailed: * Blackbox api providing methods to add/del/search/enumerate objects * Statically-sized hashes for names/indexes * Per-set bitmask to indicate free indexes * Separate methods for index alloc/delete/resize Basically, there should not be any user-visible changes except the following: * reducing table_max is not supported * flush & add change table type won't work if table is referenced Sponsored by: Yandex LLC	2014-06-12 09:59:11 +00:00
hiren	a877260646	DNOLD_IS_ECN introduced by r266941 is not required. DNOLD_* flags are for compat with old binaries. Suggested by: luigi	2014-06-01 20:19:17 +00:00
hiren	cc47b6d947	ECN marking implementation for dummynet. Changes include both DCTCP and RFC 3168 ECN marking methodology. DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Submitted by: Midori Kato (aoimidori27@gmail.com) Worked with: Lars Eggert (lars@netapp.com) Reviewed by: luigi, hiren	2014-06-01 07:28:24 +00:00
jhb	91a569ad69	Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count, not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).	2014-05-29 19:17:10 +00:00
ae	26693dfcb9	Since ipfw nat configures all options in one step, we should set all bits in the mask when calling LibAliasSetMode() to properly clear unneeded options. PR: 189655 MFC after: 1 week Sponsored by: Yandex LLC	2014-05-18 14:25:19 +00:00
melifaro	f4783a05e9	Fix wrong formatting of 0.0.0.0/X table records in ipfw(8). Add `flags` u16 field to the hole in ipfw_table_xentry structure. Kernel has been guessing address family for supplied record based on xent length size. Userland, however, has been getting fixed-size ipfw_table_xentry structures guessing address family by checking address by IN6_IS_ADDR_V4COMPAT(). Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records. PR: bin/189471 Submitted by: Dennis Yusupoff <dyr@smartspb.net> MFC after: 2 weeks	2014-05-17 13:45:03 +00:00
glebius	9412c23d6c	o In pf_normalize_ip() we don't need mtag in !(PFRULE_FRAGCROP\|PFRULE_FRAGDROP) case. o In the (PFRULE_FRAGCROP\|PFRULE_FRAGDROP) case we should allocate mtag if we don't find any. Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2014-05-17 12:30:27 +00:00
trociny	bd951d3fdb	Define startup order the same way as it is in dummynet.	2014-04-26 08:05:16 +00:00
glebius	597bcfe53d	The current API for adding rules with pool addresses is the following: - DIOCADDADDR adds addresses and puts them into V_pf_pabuf - DIOCADDRULE takes all addresses from V_pf_pabuf and links them into rule. The ugly part is that if address is a table, then it is initialized in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't supply ruleset. But if address is a dynaddr, we need address family, and address family could be different for different addresses in one rule, so dynaddr is initialized in DIOCADDADDR. This leads to the entangled state of addresses on V_pf_pabuf. Some are initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf) can lead to a panic on a NULL table address. Since proper fix requires API/ABI change, for now simply plug the panic in pf_empty_pool(). Reported by: danger	2014-04-25 11:36:11 +00:00
mm	532d55ab5f	Backport from projects/pf r263908: De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. MFC after: 1 week	2014-04-20 09:17:48 +00:00
ae	d70382e43a	Set oif only for outgoing packets. PR: 188543 MFC after: 1 week Sponsored by: Yandex LLC	2014-04-16 14:37:11 +00:00
glebius	97ee1da70b	Backout r257223,r257224,r257225,r257246,r257710. The changes caused some regressions in ICMP handling, and right now me and Baptiste are out of time on analyzing them. PR: 188253	2014-04-16 09:25:20 +00:00
brueffer	f19c513644	Free resources in error cases; re-indent a curly brace while here. CID: 1199366 Found with: Coverity Prevent(tm) MFC after: 1 week	2014-04-13 21:13:33 +00:00
mm	257cccbfaa	Merge from projects/pf r264198: Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic. Reviewed by: trociny MFC after: 1 week	2014-04-07 07:06:13 +00:00
mm	c4f653f608	Merge from projects/pf r251993 (glebius@): De-vnet hash sizes and hash masks. Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny MFC after: 1 month	2014-03-25 06:55:53 +00:00
glebius	0825c0b36c	Fix breakage in ipfw+VIMAGE after r261590. PR: kern/187665 Sponsored by: Nginx, Inc.	2014-03-21 17:07:18 +00:00
glebius	8a3e4bbebb	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
glebius	c23c087e5b	Instead of playing games with casts simply add 3 more members to the structure pf_rule, that are used when the structure is passed via ioctl(). PR: 187074	2014-03-05 00:40:03 +00:00
mm	d2d19c03ef	Revert r262196 I am going to split this into two individual patches and test it with the projects/pf branch that may get merged later.	2014-02-19 17:06:04 +00:00
mm	73f4b67f38	De-virtualize pf_mtag_z [1] Process V_pf_overloadqueue in vnet context [2] This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf and vnet jails. pf inside jails remains broken. PR: kern/182964 Submitted by: glebius@FreeBSD.org [2], myself [1] Tested by: rodrigc@FreeBSD.org, myself MFC after: 2 weeks	2014-02-18 22:17:12 +00:00
gnn	1804af5050	Summary: Two quick edits to the implementation notes as they're no longer stored in netinet but in netpfil.	2014-02-15 18:36:31 +00:00
dim	4e20d58892	Under sys/netpfil/ipfw, surround two IPv6-specific static functions with #ifdef INET6, since they are unused when INET6 is disabled. MFC after: 3 days	2014-02-15 12:25:01 +00:00
glebius	1ea1d562a3	Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. Thanks to Dennis for providing me shell access to problematic box and his help with reproducing, debugging and investigating the problem. Thanks to: Dennis Yusupoff <dyr smartspb.net> Also reported by: dumbbell, pgj, Rambler Sponsored by: Nginx, Inc.	2014-02-14 10:05:21 +00:00
melifaro	c32089edca	Reorder struct ip_fw_chain: * move rarely-used fields down * move uh_lock to different cacheline * remove some unused fields Sponsored by: Yandex LLC	2014-01-24 09:13:30 +00:00
glebius	3f1e8f48cd	Remove NULL pointer dereference. CID: 1009118	2014-01-22 15:58:43 +00:00
glebius	4d8db193db	Fix resource leak and simplify code for DIOCCHANGEADDR. CID: 1007035	2014-01-22 15:44:38 +00:00
melifaro	104ab6ec12	Revert r260548. We really should not use IPFW_WLOCK() here but this requires some more playing with IPFW_UH_WLOCK(). Leave till later.	2014-01-11 18:27:34 +00:00
melifaro	9f930faa0d	We don't need chain write lock since we're not modifying its contents. LibAliasSetAddress() uses its own mutex to serialize changes. While here, convert ifp->if_xname access to if_name() function. MFC after: 2 weeks Sponsored by: Yandex LLC	2014-01-11 16:50:41 +00:00
glebius	353906d3d2	When pf_get_translation() fails, it should leave *sn pointer pristine, otherwise we will panic in pf_test_rule(). PR: 182557	2014-01-06 19:05:04 +00:00
melifaro	c491eeb2f3	Use rnh_matchaddr instead of rnh_lookup for longest-prefix match. rnh_lookup is effectively the same as rnh_matchaddr if called with empty network mask. MFC after: 2 weeks	2014-01-03 23:11:26 +00:00
dim	320e3d9bba	Fix incorrect header guard define in sys/netpfil/pf/pf.h, which snuck in in r257186. Found by clang 3.4.	2013-12-22 19:47:22 +00:00
glebius	964c4daeba	Fix fallout from r258479: in pf_free_src_node() the node must already be unlinked. Reported by: Konstantin Kukushkin <dark rambler-co.ru> Sponsored by: Nginx, Inc.	2013-12-22 12:10:36 +00:00
melifaro	ce16a97371	Add net.inet.ip.fw.dyn_keep_states sysctl which re-links dynamic states to default rule instead of flushing on rule deletion. This can be useful while performing ruleset reload (think about `atomic` reload via changing sets). Currently it is turned off by default. MFC after: 2 weeks Sponsored by: Yandex LLC	2013-12-18 20:17:05 +00:00
melifaro	031fdfe55b	Simplify O_NAT opcode handling. MFC after: 2 weeks Sponsored by: Yandex LLC	2013-11-28 15:28:51 +00:00
melifaro	c9cfc8e322	Check ipfw table numbers in both user and kernel space before rule addition. Found by: Saychik Pavel <umka@localka.net> MFC after: 2 weeks Sponsored by: Yandex LLC	2013-11-28 10:28:28 +00:00
rodrigc	ad77255ba1	In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);". This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit(). Therefore, vnet_ipfw_nat_uninit() must be called before vnet_ipfw_uninit(), but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions. In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c, IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN and IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER Consequently, if VIMAGE is enabled, and jails are created and destroyed, the system sometimes crashes, because we are trying to use a deleted lock. To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html ) Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL, so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().	2013-11-25 20:20:34 +00:00
glebius	f889028338	The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity, where "m" is number of source nodes and "n" is number of states. Thus, on heavy loaded router its processing consumed a lot of CPU time. Reimplement it with O(m+n) complexity. We first scan through source nodes and disconnect matching ones, putting them on the freelist and marking with a cookie value in their expire field. Then we scan through the states, detecting references to source nodes with a cookie, and disconnect them as well. Then the freelist is passed to pf_free_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> PR: kern/176763 Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:22:26 +00:00
glebius	c884926273	To support upcoming changes change internal API for source node handling: - Removed pf_remove_src_node(). - Introduce pf_unlink_src_node() and pf_unlink_src_node_locked(). These function do not proceed with freeing of a node, just disconnect it from storage. - New function pf_free_src_nodes() works on a list of previously disconnected nodes and frees them. - Utilize new API in pf_purge_expired_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:16:34 +00:00
glebius	70fc573907	Fix off by ones when scanning source nodes hash. Sponsored by: Nginx, Inc.	2013-11-22 18:57:27 +00:00
glebius	3583187336	Style: don't compare unsigned <= 0. Sponsored by: Nginx, Inc.	2013-11-22 18:54:06 +00:00
luigi	de6fdc14ad	add a counter on the struct mq (a queue of mbufs), and add a block for userspace compiling.	2013-11-22 05:02:37 +00:00
luigi	9b36fc77a2	disable some ipfw match options when compiling in userspace	2013-11-22 05:01:38 +00:00
luigi	f0a80e6e72	make this code compile in userspace on OSX	2013-11-22 05:00:18 +00:00
luigi	0a6b47e718	more support for userspace compiling of this code: emulate the uma_zone for dynamic rules.	2013-11-22 04:59:17 +00:00
luigi	b0b354e495	make ipfw_check_packet() and ipfw_check_frame() public, so they can be used in the userspace version of ipfw/dummynet (normally using netmap for the I/O path). This is the first of a few commits to ease compiling the ipfw kernel code in userspace.	2013-11-22 04:57:50 +00:00
glebius	544cc7da1e	Some fixups to pf_get_sport after r257223: - Do not return blindly if proto isn't ICMP. - The dport is in network order, so fix comparisons. - Remove ridiculous htonl(arc4random()). - Push local variable to a narrower block.	2013-11-14 14:20:35 +00:00
glebius	c5f4e2274d	Fix fallout from r257223. Since pf_test_state_icmp() can call pf_icmp_state_lookup() twice, we need to unlock previously found state. Reported & tested by: gavin	2013-11-05 16:54:25 +00:00
glebius	bce78dfe17	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in wrong namespace net.link.ether, originally it had been at another wrong namespace. It was incorrectly documented at incorrect manual page arp(8). Since new-ARP commit, the tunable has been consulted only on route addition, and ignored on route deletion. Behaviour of a system with tunable turned off is not fully correct, and has no advantages compared to normal behavior.	2013-11-05 07:32:09 +00:00
glebius	b0e57b04de	Code logic of handling PFTM_PURGE into pf_find_state().	2013-11-04 08:20:06 +00:00
glebius	7a73235dfd	Remove unused PFTM_UNTIL_PACKET const.	2013-11-04 08:15:59 +00:00
glebius	5da8adaa10	- Fix VIMAGE build. - Fix build with gcc.	2013-10-28 10:12:19 +00:00
glebius	f469ae1d45	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
bapt	64ca26f948	Import pf.c 1.638 from OpenBSD Original log: Some ICMP types that also have icmp_id, pointed out by markus@ Obtained from: OpenBSD	2013-10-27 20:56:23 +00:00
bapt	1ff3794c18	Import pf.c 1.636 from OpenBSD Original log: Make sure pd2 has a pointer to the icmp header in the payload; fixes panic seen with some icmp types in icmp error message payloads. Obtained from: OpenBSD	2013-10-27 20:52:09 +00:00
bapt	b033691537	Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type in one port of the state key, using the type to determine which side should be the id, and which should be the type. Also: - Handle ICMP6 messages which are typically sent to multicast addresses but receive unicast replies, by doing fallthrough lookups against the correct multicast address. - Clear up some mistaken assumptions in the PF code: - Not all ICMP packets have an icmp_id, so simulate one based on other data if we can, otherwise set it to 0. - Don't modify the icmp id field in NAT unless it's echo - Use the full range of possible id's when NATing icmp6 echo Difference with OpenBSD version: - C99ify the new code - WITHOUT_INET6 safe Reviewed by: glebius Obtained from: OpenBSD	2013-10-27 20:44:42 +00:00
glebius	e352aa585e	Move new pf includes to the pf directory. The pfvar.h remain in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz	2013-10-27 16:25:57 +00:00
glebius	2c1ec831c9	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
philip	eb3631100b	Use the correct EtherType for logging IPv6 packets. Reviewed by: melifaro Approved by: re (kib, glebius) MFC after: 3 days	2013-09-28 15:49:36 +00:00
glebius	8159278dbd	Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit: date: 2010/02/04 14:10:12; author: sthen; state: Exp; lines: +24 -19; pf_get_sport() picks a random port from the port range specified in a nat rule. It should check to see if it's in-use (i.e. matches an existing PF state), if it is, it cycles sequentially through other ports until it finds a free one. However the check was being done with the state keys the wrong way round so it was never actually finding the state to be in-use. - switch the keys to correct this, avoiding random state collisions with nat. Fixes PR 6300 and problems reported by robert@ and viq. - check pf_get_sport() return code in pf_test(); if port allocation fails the packet should be dropped rather than sent out untranslated. Help/ok claudio@. Some additional changes to 1.12: - We also need to bzero() the key to zero padding, otherwise key won't match. - Collapse two if blocks into one with \|\|, since both conditions lead to the same processing. - Only naddr changes in the cycle, so move initialization of other fields above the cycle. - s/u_intXX_t/uintXX_t/g PR: kern/181690 Submitted by: Olivier Cochard-Labbé <olivier cochard.me> Sponsored by: Nginx, Inc.	2013-09-02 10:14:25 +00:00
mav	413bf347cd	Make dummynet use new direct callout(9) execution mechanism. Since the only thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't need extra switch to the clock SWI context. On idle system this change halves the number of active CPU cycles and wakes up only one CPU from sleep instead of two. I was going to make this change much earlier as part of calloutng project, but waited for better solution with skipping idle ticks to be implemented. Unfortunately with 10.0 release coming it is better to get at least this.	2013-08-24 13:34:36 +00:00
trociny	583ac34809	Make ipfw nat init/uninit work correctly for VIMAGE: * Do per vnet instance cleanup (previously it was only for vnet0 on module unload, and led to libalias leaks and possible panics due to stale pointer dereferences). * Instead of protecting ipfw hooks registering/deregistering by only vnet0 lock (which does not prevent pointers access from another vnets), introduce per vnet ipfw_nat_loaded variable. The variable is set after hooks are registered and unset before they are deregistered. * Devirtualize ifaddr_event_tag as we run only one event handler for all vnets. * It is supposed that ifaddr_change event handler is called in the interface vnet context, so add an assertion. Reviewed by: zec MFC after: 2 weeks	2013-08-24 11:59:51 +00:00
andre	7cc6cc696c	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
ae	2b407e5f3f	Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration. Reported by: Eugene M. Zheganin	2013-07-29 13:17:18 +00:00
glebius	6459f4d509	Improve locking strategy between keys hash and ID hash. Before this change state creating sequence was: 1) lock wire key hash 2) link state's wire key 3) unlock wire key hash 4) lock stack key hash 5) link state's stack key 6) unlock stack key hash 7) lock ID hash 8) link into ID hash 9) unlock ID hash What could happen here is that other thread finds the state via key hash lookup after 6), locks ID hash and does some processing of the state. When the thread creating state unblocks, it finds the state it was inserting already non-virgin. Now we perform proper interlocking between key hash locks and ID hash lock: 1) lock wire & stack hashes 2) link state's keys 3) lock ID hash 4) unlock wire & stack hashes 5) link into ID hash 6) unlock ID hash To achieve that, the following hacking was performed in pf_state_key_attach(): - Key hash mutex is marked with MTX_DUPOK. - To avoid deadlock on 2 key hash mutexes, we lock them in order determined by their address value. - pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked the conflicting state synchronously. In theory this could require locking a third key hash, which we can't do now. Now we do not remove the state immediately, instead we leave this task to the purge thread. To avoid conflicts in a short period before state is purged, we push to the very end of the TAILQ. - On success, before dropping key hash locks, pf_state_key_attach() locks ID hash and returns. Tested by: Ian FREISLICH <ianf clue.co.za>	2013-06-13 06:07:19 +00:00
glebius	35ec1b4a11	Return meaningful error code from pf_state_key_attach() and pf_state_insert().	2013-05-11 18:06:51 +00:00
glebius	3a8ddef6a9	Better debug message.	2013-05-11 18:03:36 +00:00
glebius	4a8f8f585a	Fix DIOCADDSTATE operation.	2013-05-11 17:58:26 +00:00
glebius	375ef2e633	Invalid creatorid is always EINVAL, not only when we are in verbose mode.	2013-05-11 17:57:52 +00:00
glebius	8adbc6e4ae	Improve KASSERT() message.	2013-05-06 21:44:06 +00:00
glebius	b3233b1bbb	Simplify printf().	2013-05-06 21:43:15 +00:00
melifaro	858e632fa7	Use unified method for accessing / updating cached rule pointers. MFC after: 2 weeks	2013-05-04 18:24:30 +00:00
eadler	a5a9ec51d6	Correct a few sizeof()s Submitted by: swildner@DragonFlyBSD.org Reviewed by: alfred	2013-05-01 04:37:34 +00:00
glebius	ccddbf9365	Remove useless ifdef KLD_MODULE from dummynet module unload path. This fixes panic on unload. Reported by: pho	2013-04-29 06:11:19 +00:00
glebius	b4bc270e8f	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
melifaro	bbeb8a5ba2	Fix ipfw rule validation partially broken by r248552. Pointed by: avg MFC with: r248552	2013-04-01 11:28:52 +00:00
ae	3d1df10de4	When we are removing a specific set, call ipfw_expire_dyn_rules only once. Obtained from: Yandex LLC MFC after: 1 week	2013-03-25 07:43:46 +00:00
melifaro	31a6358fff	Add ipfw support for setting/matching DiffServ codepoints (DSCP). Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg. Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words). Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order): Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin PR: kern/102471, kern/121122 MFC after: 2 weeks	2013-03-20 10:35:33 +00:00
ae	23037c29f1	Separate the locking macros that are used in the packet flow path from others. This helps easy switch to use pfil(4) lock.	2013-03-19 06:04:17 +00:00
glebius	b37af62b9e	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
glebius	37a43650ed	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.	2013-03-12 13:42:47 +00:00
melifaro	063bdc75f8	Fix callout expiring dynamic rules. PR: kern/175530 Submitted by: Vladimir Spiridenkov <vs@gtn.ru> MFC after: 2 weeks	2013-03-02 14:47:10 +00:00
glebius	f8098d720c	Finish the r244185. This fixes ever growing counter of pfsync bad length packets, which was actually harmless. Note that peers with different version of head/ may grow this counter, but it is harmless - all pfsync data is processed. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-15 09:03:56 +00:00
glebius	52213d7415	In netpfil/pf: - Add my copyright to files I've touched a lot this year. - Add dash in front of all copyright notices according to style(9). - Move $OpenBSD$ down below copyright notices. - Remove extra line between cdefs.h and __FBSDID.	2012-12-28 09:19:49 +00:00
melifaro	a7a75993c7	Add parentheses to IP_FW_ARG_TABLEARG() definition. Suggested by: glebius MFC with: r244633	2012-12-23 18:35:42 +00:00
melifaro	911df5a332	Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks. Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log(). Noticed by: Vitaliy Tokarenko <rphone@ukr.net> MFC after: 2 weeks	2012-12-23 16:28:18 +00:00
pjd	c4178b76f6	Warn about reaching various PF limits. Reviewed by: glebius Obtained from: WHEEL Systems	2012-12-17 10:10:13 +00:00
trociny	8458a615d7	In pfioctl, if the permission checks failed we returned with vnet context set. As the checks don't require vnet context, this is fixed by setting vnet after the checks. PR: kern/160541 Submitted by: Nikos Vassiliadis (slightly different approach)	2012-12-15 17:19:36 +00:00
glebius	18f1859422	Fix error in r235991. No-sleep version of IFNET_RLOCK() should be used here, since we may hold the main pf rulesets rwlock. Reported by: Fleuriot Damien <ml my.gd>	2012-12-14 13:01:16 +00:00
glebius	ae970fa20c	Fix VIMAGE build broken in r244185. Submitted by: Nikolai Lifanov <lifanov mail.lifanov.com>	2012-12-14 08:02:35 +00:00
glebius	9ffc5fc1cc	Merge rev. 1.119 from OpenBSD: date: 2009/03/31 01:21:29; author: dlg; state: Exp; lines: +9 -16 ... this also firms up some of the input parsing so it handles short frames a bit better. This actually fixes reading beyond mbuf data area in pfsync_input(), that may happen at certain pfsync datagrams.	2012-12-13 12:51:22 +00:00

392 Commits