Commit Graph

74 Commits

Author SHA1 Message Date
Alexander V. Chernikov
74b22066b0 Make rule table kernel-index rewriting support any kind of objects.
Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
 values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
 do a) for them.

Kernel does almost the same when exporting rules to userland:
 prepares array of used tables in all rules in range, and
 prepends it before the actual ruleset retaining actual in-kernel
 indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
  between userland string identifier and in-kernel index.
  (See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
 shared index and names instead of numbers (in next commit).

Sponsored by:	Yandex LLC
2015-04-27 08:29:39 +00:00
Alexander V. Chernikov
0caab00959 * Make sure table algorithm destroy hook is always called without locks
* Explicitly lock freeing interface references in ta_destroy_ifidx
* Change ipfw_iface_unref() to require UH lock
* Add forgotten ipfw_iface_unref() to destroy_ifidx_locked()

PR:		kern/197276
Submitted by:	lev
Sponsored by:	Yandex LLC
2015-02-05 13:49:04 +00:00
Alexander V. Chernikov
038263c36a Remove unused variable.
Found by:	Coverity
CID:		1245739
2014-11-04 10:25:52 +00:00
Alexander V. Chernikov
4040f4ecd6 Perform more checks on the number of tables supplied by user. 2014-10-19 11:15:19 +00:00
Alexander V. Chernikov
4c060d851c Fix core on table destroy inroduced by table values code.
Rename @ti array copy to 'ti_copy'.
2014-10-09 14:33:20 +00:00
Alexander V. Chernikov
79e86902e9 Notify table algo aboute runtime data change on table flush. 2014-10-07 16:46:11 +00:00
Alexander V. Chernikov
8ebca97f5e * Fix crash in interface tracker due to using old "linked" field.
* Ensure we're flushing entries without any locks held.
* Free memory in (rare) case when interface tracker fails to register ifp.
* Add KASSERT on table values refcounts.
2014-10-07 10:54:53 +00:00
Alexander V. Chernikov
d4e1b51578 Fix build with gcc. 2014-10-04 13:57:14 +00:00
Alexander V. Chernikov
ccba94b8fc Switch ipfw to use rmlock for runtime locking. 2014-10-04 11:40:35 +00:00
Alexander V. Chernikov
1a33e79969 Change copyrights to the proper one. 2014-09-05 14:19:02 +00:00
Alexander V. Chernikov
6b988f3a27 * Use modular opcode handling inside ipfw_ctl3() instead of static switch.
* Provide hints for subsystem initializers if they are called for
  the first/last time.
* Convert every IP_FW3 opcode user to use new sopt API.
2014-09-05 11:11:15 +00:00
Alexander V. Chernikov
e822d9364e Be consistent and use same arguments for ctl3 opcodes.
Move legacy IP_FW_TABLE_XGETSIZE handling to separate function.
2014-09-03 21:57:06 +00:00
Alexander V. Chernikov
fb4b37a357 * Fix crash due to forgotten value refcouting in ipfw_link_table_values()
* Fix argument order in rollback_toperation_state()
* Make flush_table() use operation state API to ease checks.
2014-09-02 20:46:18 +00:00
Alexander V. Chernikov
71af39bf34 Add more comments on newly-added functions.
Add back opstate handler function.
2014-09-02 14:27:12 +00:00
Alexander V. Chernikov
0cba2b2802 Add support for multi-field values inside ipfw tables.
This is the last major change in given branch.

Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
  each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
  table items. Currently table addition may required multiple UH drops/
  acquires which is quite tricky due to atomic table modificatio/swap
  support, shared array resize, etc. Deal with it by calling special
  notifier capable of rolling back state before actually performing
  swap/resize operations. Original operation then restarts itself after
  acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.

Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags dixes to support u32 values.
* valtype is now bitmask of
  <skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
  New values can hold distinct values for each of this types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..

Some examples:

3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
 kindex: 2, type: addr
 references: 0, valtype: skipto,limit,ipv4,ipv6
 algorithm: addr:radix
 items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1
2014-08-31 23:51:09 +00:00
Alexander V. Chernikov
867708f7eb Simplify table reference/create chain. 2014-08-23 12:41:39 +00:00
Alexander V. Chernikov
4dff4ae028 * Use OP_ADD/OP_DEL macro instead of plain integers.
* ipfw_foreach_table_tentry() to permit listing
  arbitrary ipfw table using standart format.
2014-08-23 11:27:49 +00:00
Alexander V. Chernikov
4bbd15771b Make room for multi-type values in struct tentry. 2014-08-15 12:58:32 +00:00
Alexander V. Chernikov
c21034b744 Replace "cidr" table type with "addr" type.
Suggested by:	luigi
2014-08-14 21:43:20 +00:00
Alexander V. Chernikov
d3b00c08bc * Add cidr:kfib algo type just for fun. It binds kernel fib
of given number to a table.

Example:
# ipfw table fib2 create algo "cidr:kfib fib=2"
# ipfw table fib2 info
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
# ipfw table fib2 list
+++ table(fib2), set(0) +++
10.0.0.0/24 0
127.0.0.1/32 0
::/96 0
::1/128 0
::ffff:0.0.0.0/96 0
2a02:978:2::/112 0
fe80::/10 0
fe80:1::/64 0
fe80:2::/64 0
fe80:3::/64 0
ff02::/16 0
# ipfw table fib2 lookup 10.0.0.5
10.0.0.0/24 0
# ipfw table fib2 lookup 2a02:978:2::11
2a02:978:2::/112 0
# ipfw table fib2 detail
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
 IPv4 algorithm radix info
  items: 0 itemsize: 200
 IPv6 algorithm radix info
  items: 0 itemsize: 200
2014-08-14 20:17:23 +00:00
Alexander V. Chernikov
fd0869d547 * Document internal commands.
* Do not require/set default table type if algo name is specified.
* Add TA_FLAG_READONLY option for algorithms.
2014-08-14 17:31:04 +00:00
Alexander V. Chernikov
18ad419788 * Fix displaying dynamic rules for large rulesets.
* Clean up some comments.
2014-08-14 08:21:22 +00:00
Alexander V. Chernikov
fddbbf75c8 Fix assertion. 2014-08-13 16:53:12 +00:00
Alexander V. Chernikov
40e5f498de * Pass proper table set numbers from userland side.
* Ignore them, but honor V_fw_tables_sets value on kernel side.
2014-08-13 12:04:45 +00:00
Alexander V. Chernikov
c8d5d3088b * Clarify ipfw_swap_table operations
* Ensure <add|del>_table_entry handle ta change properly.
2014-08-12 17:03:13 +00:00
Alexander V. Chernikov
e5eec6dd21 * Rename ipfw_[un]bind_table_rule to ipfw_[un]ref_rule_tables
* Update their descriptions.
2014-08-12 16:08:13 +00:00
Alexander V. Chernikov
1940fa7727 Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes does not accept 0 as valid value:
 O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
 O_NETGRAPH, O_NGTEE, O_NAT treats 0 as invalid input.

The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.

This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.
2014-08-12 15:51:48 +00:00
Alexander V. Chernikov
301290bc6d * Rename has_space to need_modify to be consistent with 0 as return values.
* document all callbacks supported by algorithms code.
2014-08-12 14:09:15 +00:00
Alexander V. Chernikov
f99fbf96c4 No functional changes, do better functions grouping. 2014-08-12 10:22:46 +00:00
Alexander V. Chernikov
0468c5bae9 Simplify table auto-creation for old userland users. 2014-08-12 09:48:54 +00:00
Alexander V. Chernikov
1bc0d45749 Simplify add/del_table_entry() by making their common pieces
common functions.
2014-08-11 22:38:13 +00:00
Alexander V. Chernikov
35e1bbd089 Update functions descriptions. 2014-08-11 20:00:51 +00:00
Alexander V. Chernikov
4f43138ade * Add the abilify to lock/unlock given table from changes.
Example:

# ipfw table si lock
# ipfw table si info
+++ table(si), set(0) +++
 kindex: 0, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:radix
 items: 0, size: 288
# ipfw table si add 4.5.6.7
ignored: 4.5.6.7/32 0
ipfw: Adding record failed: table is locked
# ipfw table si unlock
# ipfw table si add 4.5.6.7
added: 4.5.6.7/32 0
# ipfw table si lock
# ipfw table si delete 4.5.6.7
ignored: 4.5.6.7/32 0
ipfw: Deleting record failed: table is locked
# ipfw table si unlock
# ipfw table si delete 4.5.6.7
deleted: 4.5.6.7/32 0
2014-08-11 18:09:37 +00:00
Alexander V. Chernikov
3a845e1076 * Add support for batched add/delete for ipfw tables
* Add support for atomic batches add (all or none).
* Fix panic on deleting non-existing entry in radix algo.

Examples:

# si is empty
# ipfw table si add 1.1.1.1/32 1111 2.2.2.2/32 2222
added: 1.1.1.1/32 1111
added: 2.2.2.2/32 2222
# ipfw table si add 2.2.2.2/32 2200 4.4.4.4/32 4444
exists: 2.2.2.2/32 2200
added: 4.4.4.4/32 4444
ipfw: Adding record failed: record already exists
^^^^^ Returns error but keeps inserted items
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
# ipfw table si atomic add 3.3.3.3/32 3333 4.4.4.4/32 4400 5.5.5.5/32 5555
added(reverted): 3.3.3.3/32 3333
exists: 4.4.4.4/32 4400
ignored: 5.5.5.5/32 5555
ipfw: Adding record failed: record already exists
^^^^^ Returns error and reverts added records
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
2014-08-11 17:34:25 +00:00
Alexander V. Chernikov
030b184f10 * Use 2 32-bits field inside rule instead of 2 pointer to save skipto state.
* Introduce ipfw_reap_add() to unify unlinking rules/adding it to reap queue
* Unbreak FreeBSD7 export format.
2014-08-09 09:11:26 +00:00
Alexander V. Chernikov
720ee730c6 Kernel changes:
* Fix buffer calculation for table dumps
* Fix IPv6 radix entiries addition broken in r269371.

Userland changes:
* Fix bug in retrieving statric ruleset
* Fix several bugs in retrieving table list
2014-08-08 21:09:22 +00:00
Alexander V. Chernikov
8bd1921248 Partially revert previous commit:
"0" value is perfectly valid for O_SETFIB and O_SETDSCP,
  so tablearg remains to be 655535 for now.
2014-08-08 15:33:26 +00:00
Alexander V. Chernikov
2c452b20dd * Switch tablearg value from 65535 to 0.
* Use u16 table kidx instead of integer on for iface opcode.
* Provide compability layer for old clients.
2014-08-08 14:23:20 +00:00
Alexander V. Chernikov
adf3b2b9d8 * Add IP_FW_TABLE_XMODIFY opcode
* Since there seems to be lack of consensus on strict value typing,
  remove non-default value types. Use userland-only "value format type"
  to print values.

Kernel changes:
* Add IP_FW_XMODIFY to permit table run-time modifications.
  Currently we support changing limit and value format type.

Userland changes:
* Support IP_FW_XMODIFY opcode.
* Support specifying value format type (ftype) in tablble create/modify req
* Fine-print value type/value format type.
2014-08-08 09:27:49 +00:00
Alexander V. Chernikov
28ea4fa355 Remove IP_FW_TABLES_XGETSIZE opcode.
It is superseded by IP_FW_TABLES_XLIST.
2014-08-08 06:36:26 +00:00
Alexander V. Chernikov
a73d728d31 Kernel changes:
* Implement proper checks for switching between global and set-aware tables
* Split IP_FW_DEL mess into the following opcodes:
  * IP_FW_XDEL (del rules matching pattern)
  * IP_FW_XMOVE (move rules matching pattern to another set)
  * IP_FW_SET_SWAP (swap between 2 sets)
  * IP_FW_SET_MOVE (move one set to another one)
  * IP_FW_SET_ENABLE (enable/disable sets)
* Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration.
* Use unified ipfw_range_tlv as range description for all of the above.
* Check dynamic states IFF there was non-zero number of deleted dyn rules,
* Del relevant dynamic states with singe traversal instead of per-rule one.

Userland changes:
* Switch ipfw(8) to use new opcodes.
2014-08-07 21:37:31 +00:00
Alexander V. Chernikov
46d5200874 Implement atomic ipfw table swap.
Kernel changes:
* Add opcode IP_FW_TABLE_XSWAP
* Add support for swapping 2 tables with the same type/ftype/vtype.
* Make skipto cache init after ipfw locks init.

Userland changes:
* Add "table X swap Y" command.
2014-08-03 21:37:12 +00:00
Alexander V. Chernikov
5f379342d2 Show algorithm-specific data in "table info" output. 2014-08-03 12:19:45 +00:00
Alexander V. Chernikov
d20facb2f3 Remove unneded headers. 2014-08-03 09:48:54 +00:00
Alexander V. Chernikov
b6ee846e04 * Fix case when returning more that 4096 bytes of data
* Use different approach to ensure algo has enough space to store N elements:
  - explicitly ask algo (under UH_WLOCK) before/after insertion.  This (along
    with existing reallocation callbacks) really guarantees us that it is safe
    to insert N elements at once while holding UH_WLOCK+WLOCK.
  - remove old aflags/flags approach
2014-08-02 17:18:47 +00:00
Alexander V. Chernikov
4c0c07a552 * Permit limiting number of items in table.
Kernel changes:
* Add TEI_FLAGS_DONTADD entry flag to indicate that insert is not possible
* Support given flag in all algorithms
* Add "limit" field to ipfw_xtable_info
* Add actual limiting code into add_table_entry()

Userland changes:
* Add "limit" option as "create" table sub-option. Limit modification
  is currently impossible.
* Print human-readable errors in table enry addition/deletion code.
2014-08-01 15:17:46 +00:00
Alexander V. Chernikov
57a1cf954e * Use TA_FLAG_DEFAULT for default algorithm selection instead of
exporting algorithm structures directly.

* Pass needed state buffer size in algo structures as preparation
  for tables add/del requests batching.
2014-08-01 07:35:17 +00:00
Alexander V. Chernikov
914bffb6ab * Add new "flow" table type to support N=1..5-tuple lookups
* Add "flow:hash" algorithm

Kernel changes:
* Add O_IP_FLOW_LOOKUP opcode to support "flow" lookups
* Add IPFW_TABLE_FLOW table type
* Add "struct tflow_entry" as strage for 6-tuple flows
* Add "flow:hash" algorithm. Basically it is auto-growing chained hash table.
  Additionally, we store mask of fields we need to compare in each instance/

* Increase ipfw_obj_tentry size by adding struct tflow_entry
* Add per-algorithm stat (ifpw_ta_tinfo) to ipfw_xtable_info
* Increase algoname length: 32 -> 64 (algo options passed there as string)
* Assume every table type can be customized by flags, use u8 to store "tflags" field.
* Simplify ipfw_find_table_entry() by providing @tentry directly to algo callback.
* Fix bug in cidr:chash resize procedure.

Userland changes:
* add "flow table(NAME)" syntax to support n-tuple checking tables.
* make fill_flags() separate function to ease working with _s_x arrays
* change "table info" output to reflect longer "type" fields

Syntax:
ipfw table fl2 create type flow:[src-ip][,proto][,src-port][,dst-ip][dst-port] [algo flow:hash]

Examples:

0:02 [2] zfscurr0# ipfw table fl2 create type flow:src-ip,proto,dst-port algo flow:hash
0:02 [2] zfscurr0# ipfw table fl2 info
+++ table(fl2), set(0) +++
 kindex: 0, type: flow:src-ip,proto,dst-port
 valtype: number, references: 0
 algorithm: flow:hash
 items: 0, size: 280
0:02 [2] zfscurr0# ipfw table fl2 add 2a02:6b8::333,tcp,443 45000
0:02 [2] zfscurr0# ipfw table fl2 add 10.0.0.92,tcp,80 22000
0:02 [2] zfscurr0# ipfw table fl2 list
+++ table(fl2), set(0) +++
2a02:6b8::333,6,443 45000
10.0.0.92,6,80 22000
0:02 [2] zfscurr0# ipfw add 200 count tcp from me to 78.46.89.105 80 flow 'table(fl2)'
00200 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
0:03 [2] zfscurr0# ipfw show
00200   0     0 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
65535 617 59416 allow ip from any to any
0:03 [2] zfscurr0# telnet -s 10.0.0.92 78.46.89.105 80
Trying 78.46.89.105...
..
0:04 [2] zfscurr0# ipfw show
00200   5   272 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
65535 682 66733 allow ip from any to any
2014-07-31 20:08:19 +00:00
Alexander V. Chernikov
b23d5de9b6 * Add number:array algorithm lookup method.
Kernel changes:
* s/IPFW_TABLE_U32/IPFW_TABLE_NUMBER/
* Force "lookup <port|uid|gid|jid>" to be IPFW_TABLE_NUMBER
* Support "lookup" method for number tables
* Add number:array algorihm (i32 as key, auto-growing).

Userland changes:
* Support named tables in "lookup <tag> Table"
* Fix handling of "table(NAME,val)" case
* Support printing "number" table data.
2014-07-30 14:52:26 +00:00
Alexander V. Chernikov
daabb523cc Fix "flush" cmd for algorithms wih non-default parameters. 2014-07-30 09:17:40 +00:00