Kernel changes:
* Change dump format for dynamic states:
each state is now stored inside ipfw_obj_dyntlv
last dynamic state is indicated by IPFW_DF_LAST flag
* Do not perform sooptcopyout() for !SOPT_GET requests.
Userland changes:
* Introduce foreach_state() function handler to ease work
with different states passed by ipfw_dump_config().
* Bump table dump format preserving old ABI.
Kernel size:
* Add IP_FW_TABLE_XFIND to handle "lookup" request from userland.
* Add ta_find_tentry() algorithm callbacks/handlers to support lookups.
* Fully switch to ipfw_obj_tentry for various table dumps:
algorithms are now required to support the latest (ipfw_obj_tentry) entry
dump format, the rest is handled by generic dump code.
IP_FW_TABLE_XLIST opcode version bumped (0 -> 1).
* Eliminate legacy ta_dump_entry algo handler:
dump_table_entry() converts data from current to legacy format.
Userland side:
* Add "lookup" table parameter.
* Change the way table type is guessed: call table_get_info() first,
and check value for IPv4/IPv6 type IFF table does not exist.
* Fix table_get_list(): do more tries if supplied buffer is not enough.
* Sparate table_show_entry() from table_show_list().
Convert each unresolved table as table 65535 (which cannot be used normally).
* Perform s/^ipfw_// for add_table_entry, del_table_entry and flush_table since
these are internal functions exported to keep legacy interface.
* Remove macro TABLE_SET. Operations with tables can be done in any set, the only
thing net.inet.ip.fw.tables_sets affects is the set in which tables are looked
up while binding them to the rule.
Kernel changes:
* Introduce ipfw_obj_tentry table entry structure to force u64 alignment.
* Support "update-on-existing-key" "add" bahavior (TEI_FLAGS_UPDATED).
* Use "subtype" field to distingush between IPv4 and IPv6 table records
instead of previous hack.
* Add value type (vtype) field for kernel tables. Current types are
number,ip and dscp
* Fix sets mask retrieval for old binaries
* Fix crash while using interface tables
Userland changes:
* Switch ipfw_table_handler() to use named-only tables.
* Add "table NAME create [type {cidr|iface|u32} [valtype {number|ip|dscp}] ..."
* Switch ipfw_table_handler to match_token()-based parser.
* Switch ipfw_sets_handler to use new ipfw_get_config() for mask retrieval.
* Allow ipfw set X table ... syntax to permit using per-set table namespaces.
Kernel changes:
* change base TLV header to be u64 (so size can be u32).
* Introduce ipfw_obj_ctlv generc container TLV.
* Add IP_FW_XGET opcode which is now used for atomic configuration
retrieval. One can specify needed configuration pieces to retrieve
via flags field. Currently supported are
IPFW_CFG_GET_STATIC (static rules) and
IPFW_CFG_GET_STATES (dynamic states).
Other configuration pieces (tables, pipes, etc..) support is planned.
Userland changes:
* Switch ipfw(8) to use new IP_FW_XGET for rule listing.
* Split rule listing code get and show pieces.
* Make several steps forward towards libipfw:
permit printing states and rules(paritally) to supplied buffer.
do not die on malloc/kernel failure inside given printing functions.
stop assuming cmdline_opts is global symbol.
Instead of trying to allocate bing contiguous chunk of memory,
use intermediate-sized (page size) buffer as sliding window
reducing number of sooptcopyout() calls to perform.
This reduces dump functions complexity and provides additional
layer of abstraction.
User-visible api consists of 2 functions:
ipfw_get_sopt_space() - gets contigious amount of storage (or NULL)
and
ipfw_get_sopt_header() - the same, but zeroes the rest of the buffer.
* Add 'algoname' string to ipfw_xtable_info permitting to specify lookup
algoritm with parameters.
* Rework part of ipfw_rewrite_table_uidx()
Sponsored by: Yandex LLC
* Use one u16 from op3 header to implement opcode versioning.
* IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current).
* Every getsockopt request is now handled in ip_fw_table.c
* Rename new opcodes:
IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY
IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE
IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST
IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO
IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH
* Add some docs about using given opcodes.
* Group some legacy opcode/handlers.
Kernel changes:
* Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set)
* Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set)
* Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables)
Userland changes:
* move tables code to separate tables.c file
* get rid of tables_max
* switch "all"/list handling to new opcodes
Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
new ip_fw_table_algo.c file.
Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+ char name[64];
+ int idx;
+ ta_init *init;
+ ta_destroy *destroy;
+ table_lookup_t *lookup;
+ ta_prepare_add *prepare_add;
+ ta_prepare_del *prepare_del;
+ ta_add *add;
+ ta_del *del;
+ ta_flush_entry *flush_entry;
+ ta_foreach *foreach;
+ ta_dump_entry *dump_entry;
+ ta_dump_xentry *dump_xentry;
+};
* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
->tablestate pointer (array of 32 bytes structures necessary for
runtime lookups (can be probably shrinked to 16 bytes later):
+struct table_info {
+ table_lookup_t *lookup; /* Lookup function */
+ void *state; /* Lookup radix/other structure */
+ void *xstate; /* eXtended state */
+ u_long data; /* Hints for given func */
+};
* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_clt 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().
* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
currenly implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).
Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)
Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embeded opcodes.
* Add/improve support for destroy/info cmds.
* Add namedobject set-aware api capable of searching/allocation objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode
Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize
Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced
Sponsored by: Yandex LLC
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().
Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.
PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
if we don't find any.
Tested by: Ian FREISLICH <ianf cloudseed.co.za>
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
them into rule.
The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.
This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.
Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().
Reported by: danger
De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.
MFC after: 1 week
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.
Reviewed by: Nikos Vassiliadis, trociny@
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.
Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
race prone. Some just gather statistics, but some are later used in
different calculations.
A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.
Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.
Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.
Thanks to: Dennis Yusupoff <dyr smartspb.net>
Also reported by: dumbbell, pgj, Rambler
Sponsored by: Nginx, Inc.
LibAliasSetAddress() uses its own mutex to serialize changes.
While here, convert ifp->if_xname access to if_name() function.
MFC after: 2 weeks
Sponsored by: Yandex LLC