/*-
 * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
 *
 * Copyright (c) 2002-2009 Luigi Rizzo, Universita` di Pisa
 * Copyright (c) 2014 Yandex LLC
 * Copyright (c) 2014 Alexander V. Chernikov
 *
 * Supported by: Valeria Paoli
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

/*
 * Control socket and rule management routines for ipfw.
 * Control is currently implemented via IP_FW3 setsockopt() code.
 */

#include "opt_ipfw.h"
#include "opt_inet.h"
#ifndef INET
#error IPFIREWALL requires INET.
#endif /* INET */
#include "opt_inet6.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>	/* struct m_tag used by nested headers */
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/priv.h>
#include <sys/proc.h>
#include <sys/rwlock.h>
#include <sys/rmlock.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/sysctl.h>
#include <sys/syslog.h>
#include <sys/fnv_hash.h>
#include <net/if.h>
#include <net/route.h>
#include <net/vnet.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>

#include <netinet/in.h>
#include <netinet/ip_var.h>	/* hooks */
#include <netinet/ip_fw.h>

#include <netpfil/ipfw/ip_fw_private.h>
#include <netpfil/ipfw/ip_fw_table.h>

#ifdef MAC
#include <security/mac/mac_framework.h>
#endif

static int ipfw_ctl(struct sockopt *sopt);
static int check_ipfw_rule_body(ipfw_insn *cmd, int cmd_len,
    struct rule_check_info *ci);
static int check_ipfw_rule1(struct ip_fw_rule *rule, int size,
    struct rule_check_info *ci);
static int check_ipfw_rule0(struct ip_fw_rule0 *rule, int size,
    struct rule_check_info *ci);
static int rewrite_rule_uidx(struct ip_fw_chain *chain,
    struct rule_check_info *ci);

#define	NAMEDOBJ_HASH_SIZE	32

struct namedobj_instance {
	struct namedobjects_head	*names;
	struct namedobjects_head	*values;
	uint32_t	nn_size;	/* names hash size */
	uint32_t	nv_size;	/* number hash size */
	u_long		*idx_mask;	/* used items bitmask */
	uint32_t	max_blocks;	/* number of "long" blocks in bitmask */
	uint32_t	count;		/* number of items */
	uint16_t	free_off[IPFW_MAX_SETS];	/* first possible free offset */
	objhash_hash_f	*hash_f;
	objhash_cmp_f	*cmp_f;
};
#define	BLOCK_ITEMS	(8 * sizeof(u_long))	/* Number of items for ffsl() */

static uint32_t objhash_hash_name(struct namedobj_instance *ni,
    const void *key, uint32_t kopt);
static uint32_t objhash_hash_idx(struct namedobj_instance *ni, uint32_t val);
static int objhash_cmp_name(struct named_object *no, const void *name,
    uint32_t set);

MALLOC_DEFINE(M_IPFW, "IpFw/IpAcct", "IpFw/IpAcct chain's");

static int dump_config(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int add_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int del_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int clear_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int move_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int manage_sets(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int dump_soptcodes(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);
static int dump_srvobjects(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd);

/* ctl3 handler data */
struct mtx ctl3_lock;
#define	CTL3_LOCK_INIT()	mtx_init(&ctl3_lock, "ctl3_lock", NULL, MTX_DEF)
#define	CTL3_LOCK_DESTROY()	mtx_destroy(&ctl3_lock)
#define	CTL3_LOCK()	mtx_lock(&ctl3_lock)
#define	CTL3_UNLOCK()	mtx_unlock(&ctl3_lock)

static struct ipfw_sopt_handler *ctl3_handlers;
static size_t ctl3_hsize;
static uint64_t ctl3_refct, ctl3_gencnt;
#define	CTL3_SMALLBUF	4096			/* small page-size write buffer */
#define	CTL3_LARGEBUF	16 * 1024 * 1024	/* handle large rulesets */

static int ipfw_flush_sopt_data(struct sockopt_data *sd);

static struct ipfw_sopt_handler	scodes[] = {
	{ IP_FW_XGET,		0,	HDIR_GET,	dump_config },
	{ IP_FW_XADD,		0,	HDIR_BOTH,	add_rules },
	{ IP_FW_XDEL,		0,	HDIR_BOTH,	del_rules },
	{ IP_FW_XZERO,		0,	HDIR_SET,	clear_rules },
	{ IP_FW_XRESETLOG,	0,	HDIR_SET,	clear_rules },
	{ IP_FW_XMOVE,		0,	HDIR_SET,	move_rules },
	{ IP_FW_SET_SWAP,	0,	HDIR_SET,	manage_sets },
	{ IP_FW_SET_MOVE,	0,	HDIR_SET,	manage_sets },
	{ IP_FW_SET_ENABLE,	0,	HDIR_SET,	manage_sets },
	{ IP_FW_DUMP_SOPTCODES,	0,	HDIR_GET,	dump_soptcodes },
	{ IP_FW_DUMP_SRVOBJECTS, 0,	HDIR_GET,	dump_srvobjects },
};

static int
set_legacy_obj_kidx(struct ip_fw_chain *ch, struct ip_fw_rule0 *rule);
static struct opcode_obj_rewrite *find_op_rw(ipfw_insn *cmd,
    uint16_t *puidx, uint8_t *ptype);
static int ref_rule_objects(struct ip_fw_chain *ch, struct ip_fw *rule,
    struct rule_check_info *ci, struct obj_idx *oib, struct tid_info *ti);
static int ref_opcode_object(struct ip_fw_chain *ch, ipfw_insn *cmd,
    struct tid_info *ti, struct obj_idx *pidx, int *unresolved);
static void unref_rule_objects(struct ip_fw_chain *chain, struct ip_fw *rule);
static void unref_oib_objects(struct ip_fw_chain *ch, ipfw_insn *cmd,
    struct obj_idx *oib, struct obj_idx *end);
static int export_objhash_ntlv(struct namedobj_instance *ni, uint16_t kidx,
    struct sockopt_data *sd);

/*
 * Opcode object rewriter variables
 */
struct opcode_obj_rewrite *ctl3_rewriters;
static size_t ctl3_rsize;

/*
 * static variables followed by global ones
 */

VNET_DEFINE_STATIC(uma_zone_t, ipfw_cntr_zone);
#define	V_ipfw_cntr_zone		VNET(ipfw_cntr_zone)

void
ipfw_init_counters()
{

	V_ipfw_cntr_zone = uma_zcreate("IPFW counters",
	    IPFW_RULE_CNTR_SIZE, NULL, NULL, NULL, NULL,
	    UMA_ALIGN_PTR, UMA_ZONE_PCPU);
}

void
ipfw_destroy_counters()
{

	uma_zdestroy(V_ipfw_cntr_zone);
}

struct ip_fw *
ipfw_alloc_rule(struct ip_fw_chain *chain, size_t rulesize)
{
	struct ip_fw *rule;

	rule = malloc(rulesize, M_IPFW, M_WAITOK | M_ZERO);
	rule->cntr = uma_zalloc_pcpu(V_ipfw_cntr_zone, M_WAITOK | M_ZERO);
	rule->refcnt = 1;

	return (rule);
}

void
ipfw_free_rule(struct ip_fw *rule)
{

	/*
	 * We don't release refcnt here, since this function
	 * can be called without any locks held. The caller
	 * must release the reference under IPFW_UH_WLOCK, and
	 * then call this function if the refcount becomes 1.
	 */
	if (rule->refcnt > 1)
		return;
	uma_zfree_pcpu(V_ipfw_cntr_zone, rule->cntr);
	free(rule, M_IPFW);
}
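
/*
 * Usage sketch (illustrative, not part of the original file): per the
 * comment above, a caller holding IPFW_UH_WLOCK drops its reference
 * first and frees only when it held the last one:
 *
 *	if (rule->refcnt == 1)
 *		ipfw_free_rule(rule);	// last reference, safe to free
 *	else
 *		rule->refcnt--;		// other holders remain
 */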

/*
 * Find the smallest rule >= key, id.
 * We could use bsearch but it is so simple that we code it directly.
 */
int
ipfw_find_rule(struct ip_fw_chain *chain, uint32_t key, uint32_t id)
{
	int i, lo, hi;
	struct ip_fw *r;

	for (lo = 0, hi = chain->n_rules - 1; lo < hi;) {
		i = (lo + hi) / 2;
		r = chain->map[i];
		if (r->rulenum < key)
			lo = i + 1;	/* continue from the next one */
		else if (r->rulenum > key)
			hi = i;		/* this might be good */
		else if (r->id < id)
			lo = i + 1;	/* continue from the next one */
		else /* r->id >= id */
			hi = i;		/* this might be good */
	}
	return hi;
}
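
/*
 * Worked example (illustrative): with a map sorted by <rulenum, id>
 * holding <100,1> <100,7> <200,2> at slots 0..2, the function acts as
 * a lower-bound search:
 *
 *	ipfw_find_rule(chain, 100, 0) -> 0	// first rule number 100
 *	ipfw_find_rule(chain, 100, 7) -> 1	// exact <rulenum, id> hit
 *	ipfw_find_rule(chain, 150, 0) -> 2	// smallest rule >= 150
 */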

/*
 * Builds skipto cache on rule set @map.
 */
static void
update_skipto_cache(struct ip_fw_chain *chain, struct ip_fw **map)
{
	int *smap, rulenum;
	int i, mi;

	IPFW_UH_WLOCK_ASSERT(chain);

	mi = 0;
	rulenum = map[mi]->rulenum;
	smap = chain->idxmap_back;

	if (smap == NULL)
		return;

	for (i = 0; i < 65536; i++) {
		smap[i] = mi;
		/* Use the same rule index until i < rulenum */
		if (i != rulenum || i == 65535)
			continue;
		/* Find next rule with num > i */
		rulenum = map[++mi]->rulenum;
		while (rulenum == i)
			rulenum = map[++mi]->rulenum;
	}
}
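
/*
 * Example (illustrative): with rules numbered 100, 200 and 65535 at
 * map slots 0, 1 and 2, the cache being built is
 *
 *	smap[0..100]     = 0	// first slot with rulenum >= i
 *	smap[101..200]   = 1
 *	smap[201..65535] = 2
 *
 * so a "skipto N" can resume matching at map[smap[N]] in O(1).
 */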

/*
 * Swaps prepared (backup) index with current one.
 */
static void
swap_skipto_cache(struct ip_fw_chain *chain)
{
	int *map;

	IPFW_UH_WLOCK_ASSERT(chain);
	IPFW_WLOCK_ASSERT(chain);

	map = chain->idxmap;
	chain->idxmap = chain->idxmap_back;
	chain->idxmap_back = map;
}

/*
 * Allocate and initialize skipto cache.
 */
void
ipfw_init_skipto_cache(struct ip_fw_chain *chain)
{
	int *idxmap, *idxmap_back;

	idxmap = malloc(65536 * sizeof(int), M_IPFW, M_WAITOK | M_ZERO);
	idxmap_back = malloc(65536 * sizeof(int), M_IPFW, M_WAITOK);

	/*
	 * Note we may be called at any time after initialization,
	 * for example, on first skipto rule, so we need to
	 * provide valid chain->idxmap on return
	 */

	IPFW_UH_WLOCK(chain);
	if (chain->idxmap != NULL) {
		IPFW_UH_WUNLOCK(chain);
		free(idxmap, M_IPFW);
		free(idxmap_back, M_IPFW);
		return;
	}

	/* Set backup pointer first to permit building cache */
	chain->idxmap_back = idxmap_back;
	update_skipto_cache(chain, chain->map);
	IPFW_WLOCK(chain);
	/* It is now safe to set chain->idxmap ptr */
	chain->idxmap = idxmap;
	swap_skipto_cache(chain);
	IPFW_WUNLOCK(chain);
	IPFW_UH_WUNLOCK(chain);
}

/*
 * Destroys skipto cache.
 */
void
ipfw_destroy_skipto_cache(struct ip_fw_chain *chain)
{

	if (chain->idxmap != NULL)
		free(chain->idxmap, M_IPFW);
	if (chain->idxmap_back != NULL)
		free(chain->idxmap_back, M_IPFW);
}

/*
 * Allocate a new map; returns with the chain locked.
 * @extra is the number of entries to add or delete.
 */
static struct ip_fw **
get_map(struct ip_fw_chain *chain, int extra, int locked)
{

	for (;;) {
		struct ip_fw **map;
		u_int i, mflags;

		mflags = M_ZERO | ((locked != 0) ? M_NOWAIT : M_WAITOK);

		i = chain->n_rules + extra;
		map = malloc(i * sizeof(struct ip_fw *), M_IPFW, mflags);
		if (map == NULL) {
			printf("%s: cannot allocate map\n", __FUNCTION__);
			return NULL;
		}
		if (!locked)
			IPFW_UH_WLOCK(chain);
		if (i >= chain->n_rules + extra)	/* good */
			return map;
		/* otherwise we lost the race, free and retry */
		if (!locked)
			IPFW_UH_WUNLOCK(chain);
		free(map, M_IPFW);
	}
}
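
/*
 * Usage sketch (illustrative): inserting one rule typically starts as
 *
 *	map = get_map(chain, 1, 0);	// 0: we do not hold the lock yet
 *	// on return the chain is UH-write-locked; copy the old entries
 *	// plus the new rule into map, then publish it via swap_map()
 */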

/*
 * Swap the maps. It is supposed to be called with IPFW_UH_WLOCK held.
 */
static struct ip_fw **
swap_map(struct ip_fw_chain *chain, struct ip_fw **new_map, int new_len)
{
	struct ip_fw **old_map;

	IPFW_WLOCK(chain);
	chain->id++;
	chain->n_rules = new_len;
	old_map = chain->map;
	chain->map = new_map;
	swap_skipto_cache(chain);
	IPFW_WUNLOCK(chain);
	return old_map;
}
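
/*
 * Illustrative continuation of the get_map() sketch above: the new map
 * is published under the brief IPFW_WLOCK taken inside swap_map(), and
 * the old one is freed outside any critical section:
 *
 *	old_map = swap_map(chain, map, chain->n_rules + 1);
 *	IPFW_UH_WUNLOCK(chain);
 *	free(old_map, M_IPFW);	// O(N) cleanup happens unlocked
 */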

static void
export_cntr1_base(struct ip_fw *krule, struct ip_fw_bcounter *cntr)
{
	struct timeval boottime;

	cntr->size = sizeof(*cntr);

	if (krule->cntr != NULL) {
		cntr->pcnt = counter_u64_fetch(krule->cntr);
		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
		cntr->timestamp = krule->timestamp;
	}
	if (cntr->timestamp > 0) {
		getboottime(&boottime);
		cntr->timestamp += boottime.tv_sec;
	}
}

static void
export_cntr0_base(struct ip_fw *krule, struct ip_fw_bcounter0 *cntr)
{
	struct timeval boottime;

	if (krule->cntr != NULL) {
		cntr->pcnt = counter_u64_fetch(krule->cntr);
		cntr->bcnt = counter_u64_fetch(krule->cntr + 1);
		cntr->timestamp = krule->timestamp;
	}
	if (cntr->timestamp > 0) {
		getboottime(&boottime);
		cntr->timestamp += boottime.tv_sec;
	}
}

/*
 * Copies rule @urule from v1 userland format (current)
 * to kernel @krule.
 * Assume @krule is zeroed.
 */
static void
import_rule1(struct rule_check_info *ci)
{
	struct ip_fw_rule *urule;
	struct ip_fw *krule;

	urule = (struct ip_fw_rule *)ci->urule;
	krule = (struct ip_fw *)ci->krule;

	/* copy header */
	krule->act_ofs = urule->act_ofs;
	krule->cmd_len = urule->cmd_len;
	krule->rulenum = urule->rulenum;
	krule->set = urule->set;
	krule->flags = urule->flags;

	/* Save rulenum offset */
	ci->urule_numoff = offsetof(struct ip_fw_rule, rulenum);

	/* Copy opcodes */
	memcpy(krule->cmd, urule->cmd, krule->cmd_len * sizeof(uint32_t));
}

/*
 * Export rule into v1 format (current).
 * Layout:
 * [ ipfw_obj_tlv(IPFW_TLV_RULE_ENT)
 *   [ ip_fw_rule ] OR
 *   [ ip_fw_bcounter ip_fw_rule] (depends on rcntrs).
 * ]
 * Assume @data is zeroed.
 */
static void
export_rule1(struct ip_fw *krule, caddr_t data, int len, int rcntrs)
{
	struct ip_fw_bcounter *cntr;
	struct ip_fw_rule *urule;
	ipfw_obj_tlv *tlv;

	/* Fill in TLV header */
	tlv = (ipfw_obj_tlv *)data;
	tlv->type = IPFW_TLV_RULE_ENT;
	tlv->length = len;

	if (rcntrs != 0) {
		/* Copy counters */
		cntr = (struct ip_fw_bcounter *)(tlv + 1);
		urule = (struct ip_fw_rule *)(cntr + 1);
		export_cntr1_base(krule, cntr);
	} else
		urule = (struct ip_fw_rule *)(tlv + 1);

	/* copy header */
	urule->act_ofs = krule->act_ofs;
	urule->cmd_len = krule->cmd_len;
	urule->rulenum = krule->rulenum;
	urule->set = krule->set;
	urule->flags = krule->flags;
	urule->id = krule->id;

	/* Copy opcodes */
	memcpy(urule->cmd, krule->cmd, krule->cmd_len * sizeof(uint32_t));
}
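
/*
 * Layout example (illustrative): with @rcntrs != 0 the buffer @data is
 * filled as
 *
 *	ipfw_obj_tlv	(type IPFW_TLV_RULE_ENT, length = @len)
 *	ip_fw_bcounter	(packet/byte counters, timestamp)
 *	ip_fw_rule	(header followed by cmd_len opcode words)
 *
 * which is the record a ruleset dump walks entry by entry.
 */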
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Copies rule @urule from FreeBSD8 userland format (v0)
|
|
|
|
* to kernel @krule.
|
|
|
|
* Assume @krule is zeroed.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
import_rule0(struct rule_check_info *ci)
|
|
|
|
{
|
|
|
|
struct ip_fw_rule0 *urule;
|
|
|
|
struct ip_fw *krule;
|
2014-08-08 14:23:20 +00:00
|
|
|
int cmdlen, l;
|
|
|
|
ipfw_insn *cmd;
|
Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes does not accept 0 as valid value:
O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
O_NETGRAPH, O_NGTEE, O_NAT treats 0 as invalid input.
The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.
This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.
2014-08-12 15:51:48 +00:00
|
|
|
ipfw_insn_limit *lcmd;
|
2014-08-08 14:23:20 +00:00
|
|
|
ipfw_insn_if *cmdif;
|
2014-07-08 23:11:15 +00:00
|
|
|
|
|
|
|
urule = (struct ip_fw_rule0 *)ci->urule;
|
|
|
|
krule = (struct ip_fw *)ci->krule;
|
2014-06-29 22:35:47 +00:00
|
|
|
|
2014-07-08 23:11:15 +00:00
|
|
|
/* copy header */
|
|
|
|
krule->act_ofs = urule->act_ofs;
|
|
|
|
krule->cmd_len = urule->cmd_len;
|
|
|
|
krule->rulenum = urule->rulenum;
|
|
|
|
krule->set = urule->set;
|
|
|
|
if ((urule->_pad & 1) != 0)
|
|
|
|
krule->flags |= IPFW_RULE_NOOPT;
|
|
|
|
|
|
|
|
/* Save rulenum offset */
|
|
|
|
ci->urule_numoff = offsetof(struct ip_fw_rule0, rulenum);
|
|
|
|
|
|
|
|
/* Copy opcodes */
|
|
|
|
memcpy(krule->cmd, urule->cmd, krule->cmd_len * sizeof(uint32_t));
|
2014-08-08 14:23:20 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Alter opcodes:
|
2016-08-11 10:10:10 +00:00
|
|
|
* 1) convert tablearg value from 65535 to 0
|
|
|
|
* 2) Add high bit to O_SETFIB/O_SETDSCP values (to make room
|
|
|
|
* for targ).
|
Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes does not accept 0 as valid value:
O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
O_NETGRAPH, O_NGTEE, O_NAT treats 0 as invalid input.
The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.
This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.
2014-08-12 15:51:48 +00:00
|
|
|
* 3) convert table number in iface opcodes to u16
|
2016-08-11 10:10:10 +00:00
|
|
|
* 4) convert old `nat global` into new 65535
|
2014-08-08 14:23:20 +00:00
|
|
|
*/
|
Change tablearg value to be 0 (try #2).
	l = krule->cmd_len;
	cmd = krule->cmd;
	cmdlen = 0;

	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);

		switch (cmd->opcode) {
		/* Opcodes supporting tablearg */
		case O_TAG:
		case O_TAGGED:
		case O_PIPE:
		case O_QUEUE:
		case O_DIVERT:
		case O_TEE:
		case O_SKIPTO:
		case O_CALLRETURN:
		case O_NETGRAPH:
		case O_NGTEE:
		case O_NAT:
			if (cmd->arg1 == IP_FW_TABLEARG)
				cmd->arg1 = IP_FW_TARG;
			else if (cmd->arg1 == 0)
				cmd->arg1 = IP_FW_NAT44_GLOBAL;
			break;
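		/*
		 * Unlike the opcodes above, O_SETFIB and O_SETDSCP accept
		 * 0 as a valid argument, so plain values are stored with
		 * the high-order bit (0x8000) set to tell them apart from
		 * tablearg (which is 0 in kernel format).
		 */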
		case O_SETFIB:
		case O_SETDSCP:
			if (cmd->arg1 == IP_FW_TABLEARG)
				cmd->arg1 = IP_FW_TARG;
			else
				cmd->arg1 |= 0x8000;
			break;
		case O_LIMIT:
			lcmd = (ipfw_insn_limit *)cmd;
			if (lcmd->conn_limit == IP_FW_TABLEARG)
				lcmd->conn_limit = IP_FW_TARG;
			break;
		/* Interface tables */
		case O_XMIT:
		case O_RECV:
		case O_VIA:
			/* Interface table, possibly */
			cmdif = (ipfw_insn_if *)cmd;
			if (cmdif->name[0] != '\1')
				break;

			cmdif->p.kidx = (uint16_t)cmdif->p.glob;
			break;
		}
	}
}
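
/*
 * Example (illustrative): a v0 "skipto tablearg" rule arrives with
 * arg1 == IP_FW_TABLEARG (65535) and is stored internally as
 * IP_FW_TARG (0), while "setfib 5" arrives as 5 and is stored as
 * (5 | 0x8000); export_rule0() below performs the inverse mapping.
 */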

/*
 * Copies rule @krule from kernel to FreeBSD8 userland format (v0)
 */
static void
export_rule0(struct ip_fw *krule, struct ip_fw_rule0 *urule, int len)
{
	int cmdlen, l;
	ipfw_insn *cmd;
	ipfw_insn_limit *lcmd;
	ipfw_insn_if *cmdif;

	/* copy header */
	memset(urule, 0, len);
	urule->act_ofs = krule->act_ofs;
	urule->cmd_len = krule->cmd_len;
	urule->rulenum = krule->rulenum;
	urule->set = krule->set;
	if ((krule->flags & IPFW_RULE_NOOPT) != 0)
		urule->_pad |= 1;

	/* Copy opcodes */
	memcpy(urule->cmd, krule->cmd, krule->cmd_len * sizeof(uint32_t));

	/* Export counters */
	export_cntr0_base(krule, (struct ip_fw_bcounter0 *)&urule->pcnt);

	/*
	 * Alter opcodes:
	 * 1) convert tablearg value from 0 to 65535
	 * 2) Remove highest bit from O_SETFIB/O_SETDSCP values.
	 * 3) convert table number in iface opcodes to int
	 */
	l = urule->cmd_len;
	cmd = urule->cmd;
	cmdlen = 0;

	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);

		switch (cmd->opcode) {
		/* Opcodes supporting tablearg */
		case O_TAG:
		case O_TAGGED:
		case O_PIPE:
		case O_QUEUE:
		case O_DIVERT:
		case O_TEE:
		case O_SKIPTO:
		case O_CALLRETURN:
		case O_NETGRAPH:
		case O_NGTEE:
		case O_NAT:
			if (cmd->arg1 == IP_FW_TARG)
				cmd->arg1 = IP_FW_TABLEARG;
			else if (cmd->arg1 == IP_FW_NAT44_GLOBAL)
				cmd->arg1 = 0;
			break;
		case O_SETFIB:
		case O_SETDSCP:
			if (cmd->arg1 == IP_FW_TARG)
				cmd->arg1 = IP_FW_TABLEARG;
			else
				cmd->arg1 &= ~0x8000;
			break;
		case O_LIMIT:
			lcmd = (ipfw_insn_limit *)cmd;
			if (lcmd->conn_limit == IP_FW_TARG)
				lcmd->conn_limit = IP_FW_TABLEARG;
			break;
		/* Interface tables */
		case O_XMIT:
		case O_RECV:
		case O_VIA:
			/* Interface table, possibly */
			cmdif = (ipfw_insn_if *)cmd;
			if (cmdif->name[0] != '\1')
				break;

			cmdif->p.glob = cmdif->p.kidx;
			break;
		}
	}
}

/*
 * Add new rule(s) to the list possibly creating rule number for each.
 * Update the rule_number in the input struct so the caller knows it as well.
 * Must be called without IPFW_UH held
 */
static int
commit_rules(struct ip_fw_chain *chain, struct rule_check_info *rci, int count)
{
	int error, i, insert_before, tcount;
	uint16_t rulenum, *pnum;
	struct rule_check_info *ci;
	struct ip_fw *krule;
	struct ip_fw **map;	/* the new array of pointers */

	/* Check if we need to do table/obj index remap */
	tcount = 0;
	for (ci = rci, i = 0; i < count; ci++, i++) {
		if (ci->object_opcodes == 0)
			continue;

		/*
		 * Rule has some object opcodes.
		 * We need to find (and create non-existing)
		 * kernel objects, and reference existing ones.
		 */
		error = rewrite_rule_uidx(chain, ci);
		if (error != 0) {
			/*
			 * rewrite failed, state for current rule
			 * has been reverted. Check if we need to
			 * revert more.
			 */
			if (tcount > 0) {
				/*
				 * We have some more table rules
				 * we need to rollback.
				 */
				IPFW_UH_WLOCK(chain);
				while (ci != rci) {
					ci--;
					if (ci->object_opcodes == 0)
						continue;
					unref_rule_objects(chain, ci->krule);
				}
				IPFW_UH_WUNLOCK(chain);
			}

			return (error);
		}

		tcount++;
	}

	/* get_map returns with IPFW_UH_WLOCK if successful */
	map = get_map(chain, count, 0 /* not locked */);
	if (map == NULL) {
		if (tcount > 0) {
			/* Unbind tables */
			IPFW_UH_WLOCK(chain);
			for (ci = rci, i = 0; i < count; ci++, i++) {
				if (ci->object_opcodes == 0)
					continue;

				unref_rule_objects(chain, ci->krule);
			}
			IPFW_UH_WUNLOCK(chain);
		}

		return (ENOSPC);
	}

	if (V_autoinc_step < 1)
		V_autoinc_step = 1;
	else if (V_autoinc_step > 1000)
		V_autoinc_step = 1000;

	/* FIXME: Handle count > 1 */
	ci = rci;
	krule = ci->krule;
	rulenum = krule->rulenum;

	/* find the insertion point, we will insert before */
	insert_before = rulenum ? rulenum + 1 : IPFW_DEFAULT_RULE;
	i = ipfw_find_rule(chain, insert_before, 0);
	/* duplicate first part */
	if (i > 0)
		bcopy(chain->map, map, i * sizeof(struct ip_fw *));
	map[i] = krule;
	/* duplicate remaining part, we always have the default rule */
	bcopy(chain->map + i, map + i + 1,
	    sizeof(struct ip_fw *) *(chain->n_rules - i));
	if (rulenum == 0) {
		/* Compute rule number and write it back */
		rulenum = i > 0 ? map[i-1]->rulenum : 0;
		if (rulenum < IPFW_DEFAULT_RULE - V_autoinc_step)
			rulenum += V_autoinc_step;
		krule->rulenum = rulenum;
		/* Save number to userland rule */
		pnum = (uint16_t *)((caddr_t)ci->urule + ci->urule_numoff);
		*pnum = rulenum;
	}

	krule->id = chain->id + 1;
	update_skipto_cache(chain, map);
	map = swap_map(chain, map, chain->n_rules + 1);
	chain->static_len += RULEUSIZE0(krule);
	IPFW_UH_WUNLOCK(chain);
	if (map)
		free(map, M_IPFW);
	return (0);
}
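
/*
 * Note: commit_rules() above follows a copy-and-swap pattern: the new
 * pointer array is prepared while holding only IPFW_UH_WLOCK, swap_map()
 * publishes it to the packet path, and the superseded array is freed
 * once the locks are dropped.
 */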

int
ipfw_add_protected_rule(struct ip_fw_chain *chain, struct ip_fw *rule,
    int locked)
{
	struct ip_fw **map;

	map = get_map(chain, 1, locked);
	if (map == NULL)
		return (ENOMEM);
	if (chain->n_rules > 0)
		bcopy(chain->map, map,
		    chain->n_rules * sizeof(struct ip_fw *));
	map[chain->n_rules] = rule;
	rule->rulenum = IPFW_DEFAULT_RULE;
	rule->set = RESVD_SET;
	rule->id = chain->id + 1;
	/* We add the rule at the end of the chain, no need to update skipto cache */
	map = swap_map(chain, map, chain->n_rules + 1);
	chain->static_len += RULEUSIZE0(rule);
	IPFW_UH_WUNLOCK(chain);
	free(map, M_IPFW);
	return (0);
}

/*
 * Adds @rule to the list of rules to reap
 */
void
ipfw_reap_add(struct ip_fw_chain *chain, struct ip_fw **head,
    struct ip_fw *rule)
{

	IPFW_UH_WLOCK_ASSERT(chain);

	/* Unlink rule from everywhere */
	unref_rule_objects(chain, rule);

	rule->next = *head;
	*head = rule;
}

/*
 * Reclaim storage associated with a list of rules.  This is
 * typically the list created using remove_rule.
 * A NULL pointer on input is handled correctly.
 */
void
ipfw_reap_rules(struct ip_fw *head)
{
	struct ip_fw *rule;

	while ((rule = head) != NULL) {
		head = head->next;
		ipfw_free_rule(rule);
	}
}
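
/*
 * Typical usage (sketch): build a reap list under IPFW_UH_WLOCK with
 * ipfw_reap_add(), then free it after the lock is dropped:
 *
 *	reap = NULL;
 *	IPFW_UH_WLOCK(chain);
 *	ipfw_reap_add(chain, &reap, rule);
 *	IPFW_UH_WUNLOCK(chain);
 *	ipfw_reap_rules(reap);
 *
 * delete_range() below follows exactly this pattern.
 */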

/*
 * Rules to keep are
 *	(default || reserved || !match_set || !match_number)
 * where
 *	default ::= (rule->rulenum == IPFW_DEFAULT_RULE)
 *	// the default rule is always protected
 *
 *	reserved ::= (cmd == 0 && n == 0 && rule->set == RESVD_SET)
 *	// RESVD_SET is protected only if cmd == 0 and n == 0 ("ipfw flush")
 *
 *	match_set ::= (cmd == 0 || rule->set == set)
 *	// set number is ignored for cmd == 0
 *
 *	match_number ::= (cmd == 1 || n == 0 || n == rule->rulenum)
 *	// number is ignored for cmd == 1 or n == 0
 *
 */
int
ipfw_match_range(struct ip_fw *rule, ipfw_range_tlv *rt)
{

	/* Don't match default rule for modification queries */
	if (rule->rulenum == IPFW_DEFAULT_RULE &&
	    (rt->flags & IPFW_RCFLAG_DEFAULT) == 0)
		return (0);

	/* Don't match rules in reserved set for flush requests */
	if ((rt->flags & IPFW_RCFLAG_ALL) != 0 && rule->set == RESVD_SET)
		return (0);

	/* If we're filtering by set, don't match other sets */
	if ((rt->flags & IPFW_RCFLAG_SET) != 0 && rule->set != rt->set)
		return (0);

	if ((rt->flags & IPFW_RCFLAG_RANGE) != 0 &&
	    (rule->rulenum < rt->start_rule || rule->rulenum > rt->end_rule))
		return (0);

	return (1);
}
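
/*
 * Example (illustrative): with rt->flags == (IPFW_RCFLAG_SET |
 * IPFW_RCFLAG_RANGE), rt->set == 3, rt->start_rule == 100 and
 * rt->end_rule == 200, only rules in set 3 numbered 100..200 match;
 * the default rule never matches unless IPFW_RCFLAG_DEFAULT is set.
 */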

struct manage_sets_args {
	uint16_t set;
	uint8_t new_set;
};

static int
swap_sets_cb(struct namedobj_instance *ni, struct named_object *no,
    void *arg)
{
	struct manage_sets_args *args;

	args = (struct manage_sets_args *)arg;
	if (no->set == (uint8_t)args->set)
		no->set = args->new_set;
	else if (no->set == args->new_set)
		no->set = (uint8_t)args->set;
	return (0);
}

static int
move_sets_cb(struct namedobj_instance *ni, struct named_object *no,
    void *arg)
{
	struct manage_sets_args *args;

	args = (struct manage_sets_args *)arg;
	if (no->set == (uint8_t)args->set)
		no->set = args->new_set;
	return (0);
}

static int
test_sets_cb(struct namedobj_instance *ni, struct named_object *no,
    void *arg)
{
	struct manage_sets_args *args;

	args = (struct manage_sets_args *)arg;
	if (no->set != (uint8_t)args->set)
		return (0);
	if (ipfw_objhash_lookup_name_type(ni, args->new_set,
	    no->etlv, no->name) != NULL)
		return (EEXIST);
	return (0);
}

/*
 * Generic function to handle moving and swapping sets.
 */
int
ipfw_obj_manage_sets(struct namedobj_instance *ni, uint16_t type,
    uint16_t set, uint8_t new_set, enum ipfw_sets_cmd cmd)
{
	struct manage_sets_args args;
	struct named_object *no;

	args.set = set;
	args.new_set = new_set;
	switch (cmd) {
	case SWAP_ALL:
		return (ipfw_objhash_foreach_type(ni, swap_sets_cb,
		    &args, type));
	case TEST_ALL:
		return (ipfw_objhash_foreach_type(ni, test_sets_cb,
		    &args, type));
	case MOVE_ALL:
		return (ipfw_objhash_foreach_type(ni, move_sets_cb,
		    &args, type));
	case COUNT_ONE:
		/*
		 * @set is used to pass kidx.
		 * When @new_set is zero - reset the object counter,
		 * otherwise increment it.
		 */
		no = ipfw_objhash_lookup_kidx(ni, set);
		if (new_set != 0)
			no->ocnt++;
		else
			no->ocnt = 0;
		return (0);
	case TEST_ONE:
		/* @set is used to pass kidx */
		no = ipfw_objhash_lookup_kidx(ni, set);
		/*
		 * First check the number of references:
		 * when it differs, this means other rules are holding
		 * a reference to the given object, so it is not possible
		 * to change its set.  Note that refcnt may account for
		 * references to some going-to-be-added rules.  Since we
		 * don't know their numbers (or even whether they will be
		 * added) it is perfectly OK to return an error here.
		 */
		if (no->ocnt != no->refcnt)
			return (EBUSY);
		if (ipfw_objhash_lookup_name_type(ni, new_set, type,
		    no->name) != NULL)
			return (EEXIST);
		return (0);
	case MOVE_ONE:
		/* @set is used to pass kidx */
		no = ipfw_objhash_lookup_kidx(ni, set);
		no->set = new_set;
		return (0);
	}
	return (EINVAL);
}
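
/*
 * Usage note: an opcode's manage_sets handler can forward here
 * directly, keeping SWAP/MOVE/TEST semantics uniform across object
 * types; for the *_ONE commands the @set argument is reused to carry
 * the object's kidx, as noted in the cases above.
 */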

/*
 * Delete rules matching range @rt.
 * Saves number of deleted rules in @ndel.
 *
 * Returns 0 on success.
 */
static int
delete_range(struct ip_fw_chain *chain, ipfw_range_tlv *rt, int *ndel)
{
	struct ip_fw *reap, *rule, **map;
	int end, start;
	int i, n, ndyn, ofs;

	reap = NULL;
	IPFW_UH_WLOCK(chain);	/* arbitrate writers */

	/*
	 * Stage 1: Determine range to inspect.
	 * Range is half-open, i.e. [start, end).
	 */
	start = 0;
	end = chain->n_rules - 1;

	if ((rt->flags & IPFW_RCFLAG_RANGE) != 0) {
		start = ipfw_find_rule(chain, rt->start_rule, 0);

		if (rt->end_rule >= IPFW_DEFAULT_RULE)
			rt->end_rule = IPFW_DEFAULT_RULE - 1;
		end = ipfw_find_rule(chain, rt->end_rule, UINT32_MAX);
	}

	if (rt->flags & IPFW_RCFLAG_DYNAMIC) {
		/*
		 * Deletion was requested for dynamic states only.
		 */
		*ndel = 0;
		ipfw_expire_dyn_states(chain, rt);
		IPFW_UH_WUNLOCK(chain);
		return (0);
	}

	/* Allocate new map of the same size */
	map = get_map(chain, 0, 1 /* locked */);
	if (map == NULL) {
		IPFW_UH_WUNLOCK(chain);
		return (ENOMEM);
	}

	n = 0;
	ndyn = 0;
	ofs = start;
	/* 1. bcopy the initial part of the map */
	if (start > 0)
		bcopy(chain->map, map, start * sizeof(struct ip_fw *));
	/* 2. copy active rules between start and end */
	for (i = start; i < end; i++) {
		rule = chain->map[i];
		if (ipfw_match_range(rule, rt) == 0) {
			map[ofs++] = rule;
			continue;
		}

		n++;
		if (ipfw_is_dyn_rule(rule) != 0)
			ndyn++;
	}
	/* 3. copy the final part of the map */
	bcopy(chain->map + end, map + ofs,
	    (chain->n_rules - end) * sizeof(struct ip_fw *));
	/* 4. recalculate skipto cache */
	update_skipto_cache(chain, map);
	/* 5. swap the maps (under UH_WLOCK + WLOCK) */
	map = swap_map(chain, map, chain->n_rules - n);
	/* 6. Remove all dynamic states originated by deleted rules */
	if (ndyn > 0)
		ipfw_expire_dyn_states(chain, rt);
	/* 7. now remove the rules deleted from the old map */
	for (i = start; i < end; i++) {
		rule = map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		chain->static_len -= RULEUSIZE0(rule);
		ipfw_reap_add(chain, &reap, rule);
	}
	IPFW_UH_WUNLOCK(chain);

	ipfw_reap_rules(reap);
	if (map != NULL)
		free(map, M_IPFW);
	*ndel = n;
	return (0);
}

static int
move_objects(struct ip_fw_chain *ch, ipfw_range_tlv *rt)
{
	struct opcode_obj_rewrite *rw;
	struct ip_fw *rule;
	ipfw_insn *cmd;
	int cmdlen, i, l, c;
	uint16_t kidx;

	IPFW_UH_WLOCK_ASSERT(ch);

	/* Stage 1: count number of references by given rules */
	for (c = 0, i = 0; i < ch->n_rules - 1; i++) {
		rule = ch->map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		if (rule->set == rt->new_set) /* nothing to do */
			continue;
		/* Search opcodes with named objects */
		for (l = rule->cmd_len, cmdlen = 0, cmd = rule->cmd;
		    l > 0; l -= cmdlen, cmd += cmdlen) {
			cmdlen = F_LEN(cmd);
			rw = find_op_rw(cmd, &kidx, NULL);
			if (rw == NULL || rw->manage_sets == NULL)
				continue;
			/*
			 * When manage_sets() returns a non-zero value
			 * for the COUNT_ONE command, consider it an
			 * object that doesn't support sets (e.g.
			 * disabled via sysctl), and skip checks for
			 * this object.
			 */
			if (rw->manage_sets(ch, kidx, 1, COUNT_ONE) != 0)
				continue;
			c++;
		}
	}
	if (c == 0) /* No objects found */
		return (0);
	/* Stage 2: verify "ownership" */
	for (c = 0, i = 0; (i < ch->n_rules - 1) && c == 0; i++) {
		rule = ch->map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		if (rule->set == rt->new_set) /* nothing to do */
			continue;
		/* Search opcodes with named objects */
		for (l = rule->cmd_len, cmdlen = 0, cmd = rule->cmd;
		    l > 0 && c == 0; l -= cmdlen, cmd += cmdlen) {
			cmdlen = F_LEN(cmd);
			rw = find_op_rw(cmd, &kidx, NULL);
			if (rw == NULL || rw->manage_sets == NULL)
				continue;
			/* Test for ownership and conflicting names */
			c = rw->manage_sets(ch, kidx,
			    (uint8_t)rt->new_set, TEST_ONE);
		}
	}
	/* Stage 3: change set and cleanup */
	for (i = 0; i < ch->n_rules - 1; i++) {
		rule = ch->map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		if (rule->set == rt->new_set) /* nothing to do */
			continue;
		/* Search opcodes with named objects */
		for (l = rule->cmd_len, cmdlen = 0, cmd = rule->cmd;
		    l > 0; l -= cmdlen, cmd += cmdlen) {
			cmdlen = F_LEN(cmd);
			rw = find_op_rw(cmd, &kidx, NULL);
			if (rw == NULL || rw->manage_sets == NULL)
				continue;
			/* cleanup object counter */
			rw->manage_sets(ch, kidx,
			    0 /* reset counter */, COUNT_ONE);
			if (c != 0)
				continue;
			/* change set */
			rw->manage_sets(ch, kidx,
			    (uint8_t)rt->new_set, MOVE_ONE);
		}
	}
	return (c);
}
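
/*
 * move_objects() is deliberately three-pass: count references held by
 * the matching rules, verify via TEST_ONE that each object is owned
 * exclusively by those rules and has no name conflict in the target
 * set, and only then switch the objects over with MOVE_ONE (the
 * per-object counters are reset in stage 3 either way).
 */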

/*
 * Changes the set of a given rule range @rt.
 *
 * Returns 0 on success.
 */
static int
move_range(struct ip_fw_chain *chain, ipfw_range_tlv *rt)
{
	struct ip_fw *rule;
	int i;

	IPFW_UH_WLOCK(chain);

	/*
	 * Move rules with matching parameters to a new set.
	 * This one is much more complex.  We have to ensure
	 * that all referenced tables (if any) are referenced
	 * by the given rule subset only.  Otherwise, we can't
	 * move them to the new set and have to return an error.
	 */
	if ((i = move_objects(chain, rt)) != 0) {
		IPFW_UH_WUNLOCK(chain);
		return (i);
	}

	/* XXX: We have to do swap holding WLOCK */
	for (i = 0; i < chain->n_rules; i++) {
		rule = chain->map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		rule->set = rt->new_set;
	}

	IPFW_UH_WUNLOCK(chain);

	return (0);
}

/*
 * Returns pointer to action instruction, skips all possible rule
 * modifiers like O_LOG, O_TAG, O_ALTQ.
 */
ipfw_insn *
ipfw_get_action(struct ip_fw *rule)
{
	ipfw_insn *cmd;
	int l, cmdlen;

	cmd = ACTION_PTR(rule);
	l = rule->cmd_len - rule->act_ofs;
	while (l > 0) {
		switch (cmd->opcode) {
		case O_ALTQ:
		case O_LOG:
		case O_TAG:
			break;
		default:
			return (cmd);
		}
		cmdlen = F_LEN(cmd);
		l -= cmdlen;
		cmd += cmdlen;
	}
	panic("%s: rule (%p) has no action opcode", __func__, rule);
	return (NULL);
}
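
/*
 * Example (illustrative): for a rule built from "count log ip from
 * any to any", ACTION_PTR() points at the O_LOG modifier; the loop
 * above steps over it and returns the O_COUNT instruction.
 */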

/*
 * Clear counters for a specific rule.
 * Normally run under IPFW_UH_RLOCK, but these are idempotent ops
 * so we only care that rules do not disappear.
 */
static void
clear_counters(struct ip_fw *rule, int log_only)
{
	ipfw_insn_log *l = (ipfw_insn_log *)ACTION_PTR(rule);

	if (log_only == 0)
		IPFW_ZERO_RULE_COUNTER(rule);
	if (l->o.opcode == O_LOG)
		l->log_left = l->max_log;
}

/*
 * Flushes rule counters and/or log values on matching range.
 *
 * Returns number of items cleared.
 */
static int
clear_range(struct ip_fw_chain *chain, ipfw_range_tlv *rt, int log_only)
{
	struct ip_fw *rule;
	int num;
	int i;

	num = 0;
	rt->flags |= IPFW_RCFLAG_DEFAULT;

	IPFW_UH_WLOCK(chain);	/* arbitrate writers */
	for (i = 0; i < chain->n_rules; i++) {
		rule = chain->map[i];
		if (ipfw_match_range(rule, rt) == 0)
			continue;
		clear_counters(rule, log_only);
		num++;
	}
	IPFW_UH_WUNLOCK(chain);

	return (num);
}

static int
check_range_tlv(ipfw_range_tlv *rt)
{

	if (rt->head.length != sizeof(*rt))
		return (1);
	if (rt->start_rule > rt->end_rule)
		return (1);
	if (rt->set >= IPFW_MAX_SETS || rt->new_set >= IPFW_MAX_SETS)
		return (1);

	if ((rt->flags & IPFW_RCFLAG_USER) != rt->flags)
		return (1);

	return (0);
}
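
/*
 * Example (illustrative): a TLV that targets rules 100..199 in set 3
 * would pass this check when filled as
 *
 *	rt.head.length = sizeof(rt);
 *	rt.flags = IPFW_RCFLAG_RANGE | IPFW_RCFLAG_SET;
 *	rt.start_rule = 100; rt.end_rule = 199;
 *	rt.set = 3;
 *
 * assuming both flags are part of the IPFW_RCFLAG_USER mask.
 */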

/*
 * Delete rules matching specified parameters
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_header ipfw_range_tlv ]
 * Reply: [ ipfw_obj_header ipfw_range_tlv ]
 *
 * Saves number of deleted rules in ipfw_range_tlv->new_set.
 *
 * Returns 0 on success.
 */
static int
del_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	ipfw_range_header *rh;
	int error, ndel;

	if (sd->valsize != sizeof(*rh))
		return (EINVAL);

	rh = (ipfw_range_header *)ipfw_get_sopt_space(sd, sd->valsize);

	if (check_range_tlv(&rh->range) != 0)
		return (EINVAL);

	ndel = 0;
	if ((error = delete_range(chain, &rh->range, &ndel)) != 0)
		return (error);

	/* Save number of rules deleted */
	rh->range.new_set = ndel;
	return (0);
}

/*
 * Move rules/sets matching specified parameters
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_header ipfw_range_tlv ]
 *
 * Returns 0 on success.
 */
static int
move_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	ipfw_range_header *rh;

	if (sd->valsize != sizeof(*rh))
		return (EINVAL);

	rh = (ipfw_range_header *)ipfw_get_sopt_space(sd, sd->valsize);

	if (check_range_tlv(&rh->range) != 0)
		return (EINVAL);

	return (move_range(chain, &rh->range));
}

/*
 * Clear rule accounting data matching specified parameters
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_header ipfw_range_tlv ]
 * Reply: [ ipfw_obj_header ipfw_range_tlv ]
 *
 * Saves number of cleared rules in ipfw_range_tlv->new_set.
 *
 * Returns 0 on success.
 */
static int
clear_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	ipfw_range_header *rh;
	int log_only, num;
	char *msg;

	if (sd->valsize != sizeof(*rh))
		return (EINVAL);

	rh = (ipfw_range_header *)ipfw_get_sopt_space(sd, sd->valsize);

	if (check_range_tlv(&rh->range) != 0)
		return (EINVAL);

	log_only = (op3->opcode == IP_FW_XRESETLOG);

	num = clear_range(chain, &rh->range, log_only);

	if (rh->range.flags & IPFW_RCFLAG_ALL)
		msg = log_only ? "All logging counts reset" :
		    "Accounting cleared";
	else
		msg = log_only ? "logging count reset" : "cleared";

	if (V_fw_verbose) {
		int lev = LOG_SECURITY | LOG_NOTICE;
		log(lev, "ipfw: %s.\n", msg);
	}

	/* Save number of rules cleared */
	rh->range.new_set = num;
	return (0);
}

static void
enable_sets(struct ip_fw_chain *chain, ipfw_range_tlv *rt)
{
	uint32_t v_set;

	IPFW_UH_WLOCK_ASSERT(chain);

	/* Change enabled/disabled sets mask */
	v_set = (V_set_disable | rt->set) & ~rt->new_set;
	v_set &= ~(1 << RESVD_SET);	/* keep RESVD_SET always enabled */
	IPFW_WLOCK(chain);
	V_set_disable = v_set;
	IPFW_WUNLOCK(chain);
}
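
/*
 * Example (illustrative): for IP_FW_SET_ENABLE, rt->set and
 * rt->new_set carry bitmasks rather than set numbers; rt->set = 0x06
 * with rt->new_set = 0x01 disables sets 1 and 2 while enabling set 0,
 * and RESVD_SET can never be disabled.
 */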

static int
swap_sets(struct ip_fw_chain *chain, ipfw_range_tlv *rt, int mv)
{
	struct opcode_obj_rewrite *rw;
	struct ip_fw *rule;
	int i;

	IPFW_UH_WLOCK_ASSERT(chain);

	if (rt->set == rt->new_set) /* nothing to do */
		return (0);

	if (mv != 0) {
		/*
		 * Before moving the rules we need to check that
		 * there aren't any conflicting named objects.
		 */
		for (rw = ctl3_rewriters;
		    rw < ctl3_rewriters + ctl3_rsize; rw++) {
			if (rw->manage_sets == NULL)
				continue;
			i = rw->manage_sets(chain, (uint8_t)rt->set,
			    (uint8_t)rt->new_set, TEST_ALL);
			if (i != 0)
				return (EEXIST);
		}
	}
	/* Swap or move two sets */
	for (i = 0; i < chain->n_rules - 1; i++) {
		rule = chain->map[i];
		if (rule->set == (uint8_t)rt->set)
			rule->set = (uint8_t)rt->new_set;
		else if (rule->set == (uint8_t)rt->new_set && mv == 0)
			rule->set = (uint8_t)rt->set;
	}
	for (rw = ctl3_rewriters; rw < ctl3_rewriters + ctl3_rsize; rw++) {
		if (rw->manage_sets == NULL)
			continue;
		rw->manage_sets(chain, (uint8_t)rt->set,
		    (uint8_t)rt->new_set, mv != 0 ? MOVE_ALL : SWAP_ALL);
	}
	return (0);
}

/*
 * Swaps or moves rule sets.
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_header ipfw_range_tlv ]
 *
 * Returns 0 on success.
 */
static int
manage_sets(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
        ipfw_range_header *rh;
        int ret;

        if (sd->valsize != sizeof(*rh))
                return (EINVAL);

        rh = (ipfw_range_header *)ipfw_get_sopt_space(sd, sd->valsize);

        if (rh->range.head.length != sizeof(ipfw_range_tlv))
                return (1);
        /* enable_sets() expects bitmasks. */
        if (op3->opcode != IP_FW_SET_ENABLE &&
            (rh->range.set >= IPFW_MAX_SETS ||
            rh->range.new_set >= IPFW_MAX_SETS))
                return (EINVAL);

        ret = 0;
        IPFW_UH_WLOCK(chain);
        switch (op3->opcode) {
        case IP_FW_SET_SWAP:
        case IP_FW_SET_MOVE:
                ret = swap_sets(chain, &rh->range,
                    op3->opcode == IP_FW_SET_MOVE);
                break;
        case IP_FW_SET_ENABLE:
                enable_sets(chain, &rh->range);
                break;
        }
        IPFW_UH_WUNLOCK(chain);

        return (ret);
}
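
/*
 * Example (a rough sketch, not part of this file): userland reaches
 * the handlers above through the IP_FW3 socket option.  Assuming the
 * ipfw_range_header layout from ip_fw.h, a swap request looks roughly
 * like:
 *
 *      ipfw_range_header rh;
 *
 *      memset(&rh, 0, sizeof(rh));
 *      rh.opheader.opcode = IP_FW_SET_SWAP;
 *      rh.range.head.length = sizeof(ipfw_range_tlv);
 *      rh.range.set = 1;               // swap set 1 ...
 *      rh.range.new_set = 2;           // ... with set 2
 *      setsockopt(sock, IPPROTO_IP, IP_FW3, &rh, sizeof(rh));
 */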

/**
 * Remove all rules with given number, or do set manipulation.
 * Assumes chain != NULL && *chain != NULL.
 *
 * The argument is a uint32_t. The low 16 bits are the rule or set number;
 * the next 8 bits are the new set; the top 8 bits indicate the command:
 *
 *      0       delete rules numbered "rulenum"
 *      1       delete rules in set "rulenum"
 *      2       move rules "rulenum" to set "new_set"
 *      3       move rules from set "rulenum" to set "new_set"
 *      4       swap sets "rulenum" and "new_set"
 *      5       delete rules "rulenum" and set "new_set"
 */
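/*
 * For example, arg = (4 << 24) | (2 << 16) | 1 is command 4 and swaps
 * sets 1 and 2, while arg = 100 (command 0) deletes all rules numbered
 * 100.
 */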
static int
del_entry(struct ip_fw_chain *chain, uint32_t arg)
{
        uint32_t num;           /* rule number or old_set */
        uint8_t cmd, new_set;
        int do_del, ndel;
        int error = 0;
        ipfw_range_tlv rt;

        num = arg & 0xffff;
        cmd = (arg >> 24) & 0xff;
        new_set = (arg >> 16) & 0xff;

        if (cmd > 5 || new_set > RESVD_SET)
                return (EINVAL);
        if (cmd == 0 || cmd == 2 || cmd == 5) {
                if (num >= IPFW_DEFAULT_RULE)
                        return (EINVAL);
        } else {
                if (num > RESVD_SET)    /* old_set */
                        return (EINVAL);
        }

        /* Convert old requests into new representation */
        memset(&rt, 0, sizeof(rt));
        rt.start_rule = num;
        rt.end_rule = num;
        rt.set = num;
        rt.new_set = new_set;
        do_del = 0;

        switch (cmd) {
        case 0: /* delete rules numbered "rulenum" */
                if (num == 0)
                        rt.flags |= IPFW_RCFLAG_ALL;
                else
                        rt.flags |= IPFW_RCFLAG_RANGE;
                do_del = 1;
                break;
        case 1: /* delete rules in set "rulenum" */
                rt.flags |= IPFW_RCFLAG_SET;
                do_del = 1;
                break;
        case 5: /* delete rules "rulenum" and set "new_set" */
                rt.flags |= IPFW_RCFLAG_RANGE | IPFW_RCFLAG_SET;
                rt.set = new_set;
                rt.new_set = 0;
                do_del = 1;
                break;
        case 2: /* move rules "rulenum" to set "new_set" */
                rt.flags |= IPFW_RCFLAG_RANGE;
                break;
        case 3: /* move rules from set "rulenum" to set "new_set" */
                IPFW_UH_WLOCK(chain);
                error = swap_sets(chain, &rt, 1);
                IPFW_UH_WUNLOCK(chain);
                return (error);
        case 4: /* swap sets "rulenum" and "new_set" */
                IPFW_UH_WLOCK(chain);
                error = swap_sets(chain, &rt, 0);
                IPFW_UH_WUNLOCK(chain);
                return (error);
        default:
                return (ENOTSUP);
        }

        if (do_del != 0) {
                if ((error = delete_range(chain, &rt, &ndel)) != 0)
                        return (error);

                if (ndel == 0 && (cmd != 1 && num != 0))
                        return (EINVAL);

                return (0);
        }

        return (move_range(chain, &rt));
}

/**
 * Reset some or all counters on firewall rules.
 * The argument `arg' is a u_int32_t. The low 16 bits are the rule number,
 * the next 8 bits are the set number, the top 8 bits are the command:
 *      0       work with rules from all sets;
 *      1       work with rules only from the specified set.
 * The rule number is zero when all entries should be cleared.
 * log_only is 1 if we only want to reset logs, zero otherwise.
 */
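/*
 * For example, arg = (1 << 24) | (3 << 16) | 200 resets the counters
 * of rule 200 only if that rule is in set 3, while arg = 0 clears the
 * counters on all rules.
 */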
static int
zero_entry(struct ip_fw_chain *chain, u_int32_t arg, int log_only)
{
        struct ip_fw *rule;
        char *msg;
        int i;
        uint16_t rulenum = arg & 0xffff;
        uint8_t set = (arg >> 16) & 0xff;
        uint8_t cmd = (arg >> 24) & 0xff;

        if (cmd > 1)
                return (EINVAL);
        if (cmd == 1 && set > RESVD_SET)
                return (EINVAL);

        IPFW_UH_RLOCK(chain);
        if (rulenum == 0) {
                V_norule_counter = 0;
                for (i = 0; i < chain->n_rules; i++) {
                        rule = chain->map[i];
                        /* Skip rules not in our set. */
                        if (cmd == 1 && rule->set != set)
                                continue;
                        clear_counters(rule, log_only);
                }
                msg = log_only ? "All logging counts reset" :
                    "Accounting cleared";
        } else {
                int cleared = 0;

                for (i = 0; i < chain->n_rules; i++) {
                        rule = chain->map[i];
                        if (rule->rulenum == rulenum) {
                                if (cmd == 0 || rule->set == set)
                                        clear_counters(rule, log_only);
                                cleared = 1;
                        }
                        if (rule->rulenum > rulenum)
                                break;
                }
                if (!cleared) { /* we did not find any matching rules */
                        IPFW_UH_RUNLOCK(chain);
                        return (EINVAL);
                }
                msg = log_only ? "logging count reset" : "cleared";
        }
        IPFW_UH_RUNLOCK(chain);

        if (V_fw_verbose) {
                int lev = LOG_SECURITY | LOG_NOTICE;

                if (rulenum)
                        log(lev, "ipfw: Entry %d %s.\n", rulenum, msg);
                else
                        log(lev, "ipfw: %s.\n", msg);
        }
        return (0);
}

/*
 * Check rule head in FreeBSD11 format
 */
static int
check_ipfw_rule1(struct ip_fw_rule *rule, int size,
    struct rule_check_info *ci)
{
        int l;

        if (size < sizeof(*rule)) {
                printf("ipfw: rule too short\n");
                return (EINVAL);
        }

        /* Check for valid cmd_len */
        l = roundup2(RULESIZE(rule), sizeof(uint64_t));
        if (l != size) {
                printf("ipfw: size mismatch (have %d want %d)\n", size, l);
                return (EINVAL);
        }
        if (rule->act_ofs >= rule->cmd_len) {
                printf("ipfw: bogus action offset (%u > %u)\n",
                    rule->act_ofs, rule->cmd_len - 1);
                return (EINVAL);
        }

        if (rule->rulenum > IPFW_DEFAULT_RULE - 1)
                return (EINVAL);

        return (check_ipfw_rule_body(rule->cmd, rule->cmd_len, ci));
}

/*
 * Check rule head in FreeBSD8 format
 */
static int
check_ipfw_rule0(struct ip_fw_rule0 *rule, int size,
    struct rule_check_info *ci)
{
        int l;

        if (size < sizeof(*rule)) {
                printf("ipfw: rule too short\n");
                return (EINVAL);
        }

        /* Check for valid cmd_len */
        l = sizeof(*rule) + rule->cmd_len * 4 - 4;
        if (l != size) {
                printf("ipfw: size mismatch (have %d want %d)\n", size, l);
                return (EINVAL);
        }
        if (rule->act_ofs >= rule->cmd_len) {
                printf("ipfw: bogus action offset (%u > %u)\n",
                    rule->act_ofs, rule->cmd_len - 1);
                return (EINVAL);
        }

        if (rule->rulenum > IPFW_DEFAULT_RULE - 1)
                return (EINVAL);

        return (check_ipfw_rule_body(rule->cmd, rule->cmd_len, ci));
}
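
/*
 * Validate a rule's microinstruction array: walk the opcodes, enforce
 * per-opcode size constraints, require exactly one action opcode (and
 * that it comes last), and count opcodes referencing named kernel
 * objects in ci->object_opcodes.
 */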
static int
check_ipfw_rule_body(ipfw_insn *cmd, int cmd_len, struct rule_check_info *ci)
{
        int cmdlen, l;
        int have_action;

        have_action = 0;

        /*
         * Now go for the individual checks. Very simple ones, basically only
         * instruction sizes.
         */
        for (l = cmd_len; l > 0; l -= cmdlen, cmd += cmdlen) {
                cmdlen = F_LEN(cmd);
                if (cmdlen > l) {
                        printf("ipfw: opcode %d size truncated\n",
                            cmd->opcode);
                        return (EINVAL);
                }
                switch (cmd->opcode) {
                case O_PROBE_STATE:
                case O_KEEP_STATE:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        ci->object_opcodes++;
                        break;
                case O_PROTO:
                case O_IP_SRC_ME:
                case O_IP_DST_ME:
                case O_LAYER2:
                case O_IN:
                case O_FRAG:
                case O_DIVERTED:
                case O_IPOPT:
                case O_IPTOS:
                case O_IPPRECEDENCE:
                case O_IPVER:
                case O_SOCKARG:
                case O_TCPFLAGS:
                case O_TCPOPTS:
                case O_ESTAB:
                case O_VERREVPATH:
                case O_VERSRCREACH:
                case O_ANTISPOOF:
                case O_IPSEC:
#ifdef INET6
                case O_IP6_SRC_ME:
                case O_IP6_DST_ME:
                case O_EXT_HDR:
                case O_IP6:
#endif
                case O_IP4:
                case O_TAG:
                case O_SKIP_ACTION:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        break;

                case O_EXTERNAL_ACTION:
                        if (cmd->arg1 == 0 ||
                            cmdlen != F_INSN_SIZE(ipfw_insn)) {
                                printf("ipfw: invalid external "
                                    "action opcode\n");
                                return (EINVAL);
                        }
                        ci->object_opcodes++;
                        /*
                         * Do we have O_EXTERNAL_INSTANCE or O_EXTERNAL_DATA
                         * opcode?
                         */
                        if (l != cmdlen) {
                                l -= cmdlen;
                                cmd += cmdlen;
                                cmdlen = F_LEN(cmd);
                                if (cmd->opcode == O_EXTERNAL_DATA)
                                        goto check_action;
                                if (cmd->opcode != O_EXTERNAL_INSTANCE) {
                                        printf("ipfw: invalid opcode "
                                            "next to external action %u\n",
                                            cmd->opcode);
                                        return (EINVAL);
                                }
                                if (cmd->arg1 == 0 ||
                                    cmdlen != F_INSN_SIZE(ipfw_insn)) {
                                        printf("ipfw: invalid external "
                                            "action instance opcode\n");
                                        return (EINVAL);
                                }
                                ci->object_opcodes++;
                        }
                        goto check_action;

                case O_FIB:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        if (cmd->arg1 >= rt_numfibs) {
                                printf("ipfw: invalid fib number %d\n",
                                    cmd->arg1);
                                return (EINVAL);
                        }
                        break;

                case O_SETFIB:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        if ((cmd->arg1 != IP_FW_TARG) &&
                            ((cmd->arg1 & 0x7FFF) >= rt_numfibs)) {
                                printf("ipfw: invalid fib number %d\n",
                                    cmd->arg1 & 0x7FFF);
                                return (EINVAL);
                        }
                        goto check_action;

                case O_UID:
                case O_GID:
                case O_JAIL:
                case O_IP_SRC:
                case O_IP_DST:
                case O_TCPSEQ:
                case O_TCPACK:
                case O_PROB:
                case O_ICMPTYPE:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_u32))
                                goto bad_size;
                        break;

                case O_LIMIT:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_limit))
                                goto bad_size;
                        ci->object_opcodes++;
                        break;

                case O_LOG:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_log))
                                goto bad_size;

                        ((ipfw_insn_log *)cmd)->log_left =
                            ((ipfw_insn_log *)cmd)->max_log;

                        break;

                case O_IP_SRC_MASK:
                case O_IP_DST_MASK:
                        /* only odd command lengths */
                        if ((cmdlen & 1) == 0)
                                goto bad_size;
                        break;

                case O_IP_SRC_SET:
                case O_IP_DST_SET:
                        if (cmd->arg1 == 0 || cmd->arg1 > 256) {
                                printf("ipfw: invalid set size %d\n",
                                    cmd->arg1);
                                return (EINVAL);
                        }
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_u32) +
                            (cmd->arg1 + 31) / 32)
                                goto bad_size;
                        break;

                case O_IP_SRC_LOOKUP:
                        if (cmdlen > F_INSN_SIZE(ipfw_insn_u32))
                                goto bad_size;
                        /* FALLTHROUGH */
                case O_IP_DST_LOOKUP:
                        if (cmd->arg1 >= V_fw_tables_max) {
                                printf("ipfw: invalid table number %d\n",
                                    cmd->arg1);
                                return (EINVAL);
                        }
                        if (cmdlen != F_INSN_SIZE(ipfw_insn) &&
                            cmdlen != F_INSN_SIZE(ipfw_insn_u32) + 1 &&
                            cmdlen != F_INSN_SIZE(ipfw_insn_u32))
                                goto bad_size;
                        ci->object_opcodes++;
                        break;
                case O_IP_FLOW_LOOKUP:
                        if (cmd->arg1 >= V_fw_tables_max) {
                                printf("ipfw: invalid table number %d\n",
                                    cmd->arg1);
                                return (EINVAL);
                        }
                        if (cmdlen != F_INSN_SIZE(ipfw_insn) &&
                            cmdlen != F_INSN_SIZE(ipfw_insn_u32))
                                goto bad_size;
                        ci->object_opcodes++;
                        break;
                case O_MACADDR2:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_mac))
                                goto bad_size;
                        break;

                case O_NOP:
                case O_IPID:
                case O_IPTTL:
                case O_IPLEN:
                case O_TCPDATALEN:
                case O_TCPMSS:
                case O_TCPWIN:
                case O_TAGGED:
                        if (cmdlen < 1 || cmdlen > 31)
                                goto bad_size;
                        break;

                case O_DSCP:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_u32) + 1)
                                goto bad_size;
                        break;

                case O_MAC_TYPE:
                case O_IP_SRCPORT:
                case O_IP_DSTPORT: /* XXX artificial limit, 30 port pairs */
                        if (cmdlen < 2 || cmdlen > 31)
                                goto bad_size;
                        break;

                case O_RECV:
                case O_XMIT:
                case O_VIA:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_if))
                                goto bad_size;
                        ci->object_opcodes++;
                        break;

                case O_ALTQ:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_altq))
                                goto bad_size;
                        break;

                case O_PIPE:
                case O_QUEUE:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        goto check_action;

                case O_FORWARD_IP:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_sa))
                                goto bad_size;
                        goto check_action;
#ifdef INET6
                case O_FORWARD_IP6:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_sa6))
                                goto bad_size;
                        goto check_action;
#endif /* INET6 */

                case O_DIVERT:
                case O_TEE:
                        if (ip_divert_ptr == NULL)
                                return (EINVAL);
                        else
                                goto check_size;
                case O_NETGRAPH:
                case O_NGTEE:
                        if (ng_ipfw_input_p == NULL)
                                return (EINVAL);
                        else
                                goto check_size;
                case O_NAT:
                        if (!IPFW_NAT_LOADED)
                                return (EINVAL);
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_nat))
                                goto bad_size;
                        goto check_action;
                case O_CHECK_STATE:
                        ci->object_opcodes++;
                        /* FALLTHROUGH */
                case O_FORWARD_MAC: /* XXX not implemented yet */
                case O_COUNT:
                case O_ACCEPT:
                case O_DENY:
                case O_REJECT:
                case O_SETDSCP:
#ifdef INET6
                case O_UNREACH6:
#endif
                case O_SKIPTO:
                case O_REASS:
                case O_CALLRETURN:
check_size:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
check_action:
                        if (have_action) {
                                printf("ipfw: opcode %d, multiple actions"
                                    " not allowed\n",
                                    cmd->opcode);
                                return (EINVAL);
                        }
                        have_action = 1;
                        if (l != cmdlen) {
                                printf("ipfw: opcode %d, action must be"
                                    " last opcode\n",
                                    cmd->opcode);
                                return (EINVAL);
                        }
                        break;
#ifdef INET6
                case O_IP6_SRC:
                case O_IP6_DST:
                        if (cmdlen != F_INSN_SIZE(struct in6_addr) +
                            F_INSN_SIZE(ipfw_insn))
                                goto bad_size;
                        break;

                case O_FLOW6ID:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_u32) +
                            ((ipfw_insn_u32 *)cmd)->o.arg1)
                                goto bad_size;
                        break;

                case O_IP6_SRC_MASK:
                case O_IP6_DST_MASK:
                        if (!(cmdlen & 1) || cmdlen > 127)
                                goto bad_size;
                        break;
                case O_ICMP6TYPE:
                        if (cmdlen != F_INSN_SIZE(ipfw_insn_icmp6))
                                goto bad_size;
                        break;
#endif

                default:
                        switch (cmd->opcode) {
#ifndef INET6
                        case O_IP6_SRC_ME:
                        case O_IP6_DST_ME:
                        case O_EXT_HDR:
                        case O_IP6:
                        case O_UNREACH6:
                        case O_IP6_SRC:
                        case O_IP6_DST:
                        case O_FLOW6ID:
                        case O_IP6_SRC_MASK:
                        case O_IP6_DST_MASK:
                        case O_ICMP6TYPE:
                                printf("ipfw: no IPv6 support in kernel\n");
                                return (EPROTONOSUPPORT);
#endif
                        default:
                                printf("ipfw: opcode %d, unknown opcode\n",
                                    cmd->opcode);
                                return (EINVAL);
                        }
                }
        }
        if (have_action == 0) {
                printf("ipfw: missing action\n");
                return (EINVAL);
        }
        return (0);

bad_size:
        printf("ipfw: opcode %d size %d wrong\n",
            cmd->opcode, cmdlen);
        return (EINVAL);
}

/*
 * Translation of requests for compatibility with FreeBSD 7.2/8.
 * A static variable tells us if we have an old client from userland,
 * and if necessary we translate requests and responses between the
 * two formats.
 */
static int is7 = 0;

struct ip_fw7 {
        struct ip_fw7   *next;          /* linked list of rules */
        struct ip_fw7   *next_rule;     /* ptr to next [skipto] rule */
        /* 'next_rule' is used to pass up 'set_disable' status */

        uint16_t        act_ofs;        /* offset of action in 32-bit units */
        uint16_t        cmd_len;        /* # of 32-bit words in cmd */
        uint16_t        rulenum;        /* rule number */
        uint8_t         set;            /* rule set (0..31) */
        // #define RESVD_SET 31         /* set for default and persistent rules */
        uint8_t         _pad;           /* padding */
        // uint32_t     id;             /* rule id, only in v.8 */
        /* These fields are present in all rules. */
        uint64_t        pcnt;           /* Packet counter */
        uint64_t        bcnt;           /* Byte counter */
        uint32_t        timestamp;      /* tv_sec of last match */

        ipfw_insn       cmd[1];         /* storage for commands */
};
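
/*
 * v7 rules carry no per-rule id (note the commented-out member above),
 * so rules must be translated when talking to old clients;
 * convert_rule_to_7()/convert_rule_to_8() below do the work.
 */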
static int convert_rule_to_7(struct ip_fw_rule0 *rule);
static int convert_rule_to_8(struct ip_fw_rule0 *rule);
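
/*
 * RULESIZE7() is the size in bytes of a v7 rule: the fixed header plus
 * cmd_len 32-bit instruction words, minus the word already counted by
 * the cmd[1] member.
 */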
#ifndef RULESIZE7
#define RULESIZE7(rule) (sizeof(struct ip_fw7) + \
    ((struct ip_fw7 *)(rule))->cmd_len * 4 - 4)
#endif
|
|
|
|
|
|
|
|
|
2009-12-15 21:24:12 +00:00
|
|
|
/*
|
|
|
|
* Copy the static and dynamic rules to the supplied buffer
|
|
|
|
* and return the amount of space actually used.
|
merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.
In detail:
1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.
2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).
3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).
4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>
All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t
Operation costs now are as follows:
Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------
The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI
Supported by: Valeria Paoli
MFC after: 1 month
2009-12-22 19:01:47 +00:00
|
|
|
* Must be run under IPFW_UH_RLOCK
|
2009-12-15 21:24:12 +00:00
|
|
|
*/
static size_t
ipfw_getrules(struct ip_fw_chain *chain, void *buf, size_t space)
{
	char *bp = buf;
	char *ep = bp + space;
	struct ip_fw *rule;
	struct ip_fw_rule0 *dst;
	struct timeval boottime;
	int error, i, l, warnflag;
	time_t boot_seconds;

	warnflag = 0;

	getboottime(&boottime);
	boot_seconds = boottime.tv_sec;
	for (i = 0; i < chain->n_rules; i++) {
		rule = chain->map[i];

		if (is7) {
			/* Convert rule to FreeBSD 7.2 format */
			l = RULESIZE7(rule);
			if (bp + l + sizeof(uint32_t) <= ep) {
				bcopy(rule, bp, l + sizeof(uint32_t));
				error = set_legacy_obj_kidx(chain,
				    (struct ip_fw_rule0 *)bp);
				if (error != 0)
					return (0);
				error = convert_rule_to_7((struct ip_fw_rule0 *)bp);
				if (error)
					return 0; /*XXX correct? */
				/*
				 * XXX HACK. Store the disable mask in the
				 * "next" pointer in a wild attempt to keep
				 * the ABI the same.
				 * Why do we do this on EVERY rule?
				 */
				bcopy(&V_set_disable,
				    &(((struct ip_fw7 *)bp)->next_rule),
				    sizeof(V_set_disable));
				if (((struct ip_fw7 *)bp)->timestamp)
					((struct ip_fw7 *)bp)->timestamp +=
					    boot_seconds;
				bp += l;
			}
			continue; /* go to next rule */
		}

		l = RULEUSIZE0(rule);
		if (bp + l > ep) { /* should not happen */
			printf("overflow dumping static rules\n");
			break;
		}
		dst = (struct ip_fw_rule0 *)bp;
		export_rule0(rule, dst, l);
		error = set_legacy_obj_kidx(chain, dst);

		/*
		 * XXX HACK. Store the disable mask in the "next"
		 * pointer in a wild attempt to keep the ABI the same.
		 * Why do we do this on EVERY rule?
		 *
		 * XXX: "ipfw set show" (ab)uses IP_FW_GET to read the
		 * disabled mask, so we need to fail _after_ saving at
		 * least one mask.
		 */
		bcopy(&V_set_disable, &dst->next_rule, sizeof(V_set_disable));
		if (dst->timestamp)
			dst->timestamp += boot_seconds;
		bp += l;

		if (error != 0) {
			if (error == 2) {
				/* Non-fatal table rewrite error. */
				warnflag = 1;
				continue;
			}
			printf("Stop on rule %d. Fail to convert table\n",
			    rule->rulenum);
			break;
		}
	}
	if (warnflag != 0)
		printf("ipfw: process %s is using legacy interfaces,"
		    " consider rebuilding\n", "");
	ipfw_get_dynamic(chain, &bp, ep); /* protected by the dynamic lock */
	return (bp - (char *)buf);
}
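
/*
 * Illustrative caller sketch (not part of this file): a legacy
 * IP_FW_GET-style handler would allocate a buffer first, take
 * IPFW_UH_RLOCK, run ipfw_getrules() and copy the result out.
 * The sizing estimate below is an assumption for illustration only;
 * a real handler would re-check and retry if the ruleset grew in
 * the meantime.
 */
#if 0
	char *buf;
	size_t size, want;
	int error;

	/* Assumed estimate: static rule bytes plus dynamic-state bytes. */
	want = chain->static_len + ipfw_dyn_len();
	buf = malloc(want, M_TEMP, M_WAITOK | M_ZERO);

	IPFW_UH_RLOCK(chain);
	size = ipfw_getrules(chain, buf, want);
	IPFW_UH_RUNLOCK(chain);

	error = sooptcopyout(sopt, buf, size);
	free(buf, M_TEMP);
#endif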

struct dump_args {
	uint32_t	b;	/* start rule */
	uint32_t	e;	/* end rule */
	uint32_t	rcount;	/* number of rules */
	uint32_t	rsize;	/* rules size */
	uint32_t	tcount;	/* number of tables */
	int		rcounters;	/* counters */
	uint32_t	*bmask;	/* index bitmask of used named objects */
};

void
ipfw_export_obj_ntlv(struct named_object *no, ipfw_obj_ntlv *ntlv)
{

	ntlv->head.type = no->etlv;
	ntlv->head.length = sizeof(*ntlv);
	ntlv->idx = no->kidx;
	strlcpy(ntlv->name, no->name, sizeof(ntlv->name));
}

/*
 * Export named object info in instance @ni, identified by @kidx,
 * to ipfw_obj_ntlv. The TLV is allocated from @sd space.
 *
 * Returns 0 on success.
 */
static int
export_objhash_ntlv(struct namedobj_instance *ni, uint16_t kidx,
    struct sockopt_data *sd)
{
	struct named_object *no;
	ipfw_obj_ntlv *ntlv;

	no = ipfw_objhash_lookup_kidx(ni, kidx);
	KASSERT(no != NULL, ("invalid object kernel index passed"));

	ntlv = (ipfw_obj_ntlv *)ipfw_get_sopt_space(sd, sizeof(*ntlv));
	if (ntlv == NULL)
		return (ENOMEM);

	ipfw_export_obj_ntlv(no, ntlv);
	return (0);
}

static int
export_named_objects(struct namedobj_instance *ni, struct dump_args *da,
    struct sockopt_data *sd)
{
	int error, i;

	for (i = 0; i < IPFW_TABLES_MAX && da->tcount > 0; i++) {
		if ((da->bmask[i / 32] & (1 << (i % 32))) == 0)
			continue;
		if ((error = export_objhash_ntlv(ni, i, sd)) != 0)
			return (error);
		da->tcount--;
	}
	return (0);
}

static int
dump_named_objects(struct ip_fw_chain *ch, struct dump_args *da,
    struct sockopt_data *sd)
{
	ipfw_obj_ctlv *ctlv;
	int error;

	MPASS(da->tcount > 0);
	/* Header first */
	ctlv = (ipfw_obj_ctlv *)ipfw_get_sopt_space(sd, sizeof(*ctlv));
	if (ctlv == NULL)
		return (ENOMEM);
	ctlv->head.type = IPFW_TLV_TBLNAME_LIST;
	ctlv->head.length = da->tcount * sizeof(ipfw_obj_ntlv) +
	    sizeof(*ctlv);
	ctlv->count = da->tcount;
	ctlv->objsize = sizeof(ipfw_obj_ntlv);

	/* Dump table names first (if any) */
	error = export_named_objects(ipfw_get_table_objhash(ch), da, sd);
	if (error != 0)
		return (error);
	/* Then dump the other named objects */
	da->bmask += IPFW_TABLES_MAX / 32;
	return (export_named_objects(CHAIN_TO_SRV(ch), da, sd));
}

/*
 * Dumps static rules with table TLVs in buffer @sd.
 *
 * Returns 0 on success.
 */
static int
dump_static_rules(struct ip_fw_chain *chain, struct dump_args *da,
    struct sockopt_data *sd)
{
	ipfw_obj_ctlv *ctlv;
	struct ip_fw *krule;
	caddr_t dst;
	int i, l;

	/* Dump rules */
	ctlv = (ipfw_obj_ctlv *)ipfw_get_sopt_space(sd, sizeof(*ctlv));
	if (ctlv == NULL)
		return (ENOMEM);
	ctlv->head.type = IPFW_TLV_RULE_LIST;
	ctlv->head.length = da->rsize + sizeof(*ctlv);
	ctlv->count = da->rcount;

	for (i = da->b; i < da->e; i++) {
		krule = chain->map[i];

		l = RULEUSIZE1(krule) + sizeof(ipfw_obj_tlv);
		if (da->rcounters != 0)
			l += sizeof(struct ip_fw_bcounter);
		dst = (caddr_t)ipfw_get_sopt_space(sd, l);
		if (dst == NULL)
			return (ENOMEM);

		export_rule1(krule, dst, l, da->rcounters);
	}

	return (0);
}

int
ipfw_mark_object_kidx(uint32_t *bmask, uint16_t etlv, uint16_t kidx)
{
	uint32_t bidx;

	/*
	 * Maintain separate bitmasks for table and non-table objects.
	 */
	bidx = (etlv == IPFW_TLV_TBL_NAME) ? 0 : IPFW_TABLES_MAX / 32;
	bidx += kidx / 32;
	if ((bmask[bidx] & (1 << (kidx % 32))) != 0)
		return (0);

	bmask[bidx] |= 1 << (kidx % 32);
	return (1);
}
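
/*
 * Worked example of the layout above: a table object
 * (etlv == IPFW_TLV_TBL_NAME) with kidx 37 lands at
 * bidx = 0 + 37 / 32 = 1 and uses bit (1 << (37 % 32)) = 1 << 5;
 * a non-table object with the same kidx lands at
 * bidx = IPFW_TABLES_MAX / 32 + 1, i.e. in the second half of the
 * 2x-sized mask that dump_config() allocates.
 */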

/*
 * Marks every object index used in @rule with a bit in @bmask.
 * Used to generate the bitmask of referenced tables/objects for a
 * given ruleset or part of it.
 */
static void
mark_rule_objects(struct ip_fw_chain *ch, struct ip_fw *rule,
    struct dump_args *da)
{
	struct opcode_obj_rewrite *rw;
	ipfw_insn *cmd;
	int cmdlen, l;
	uint16_t kidx;
	uint8_t subtype;

	l = rule->cmd_len;
	cmd = rule->cmd;
	cmdlen = 0;
	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);

		rw = find_op_rw(cmd, &kidx, &subtype);
		if (rw == NULL)
			continue;

		if (ipfw_mark_object_kidx(da->bmask, rw->etlv, kidx))
			da->tcount++;
	}
}

/*
 * Dumps the requested objects' data.
 * Data layout (version 0)(current):
 * Request: [ ipfw_cfg_lheader ] + IPFW_CFG_GET_* flags
 *   size = ipfw_cfg_lheader.size
 * Reply: [ ipfw_cfg_lheader
 *   [ ipfw_obj_ctlv(IPFW_TLV_TBL_LIST) ipfw_obj_ntlv x N ] (optional)
 *   [ ipfw_obj_ctlv(IPFW_TLV_RULE_LIST)
 *     ipfw_obj_tlv(IPFW_TLV_RULE_ENT) [ ip_fw_bcounter (optional) ip_fw_rule ]
 *   ] (optional)
 *   [ ipfw_obj_ctlv(IPFW_TLV_STATE_LIST) ipfw_obj_dyntlv x N ] (optional)
 * ]
 * NOTE: IPFW_TLV_STATE_LIST has a single valid field: objsize.
 * The rest (size, count) are set to zero and need to be ignored.
 *
 * Returns 0 on success.
 */
static int
dump_config(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	struct dump_args da;
	ipfw_cfg_lheader *hdr;
	struct ip_fw *rule;
	size_t sz, rnum;
	uint32_t hdr_flags, *bmask;
	int error, i;

	hdr = (ipfw_cfg_lheader *)ipfw_get_sopt_header(sd, sizeof(*hdr));
	if (hdr == NULL)
		return (EINVAL);

	error = 0;
	bmask = NULL;
	memset(&da, 0, sizeof(da));
	/*
	 * Allocate needed state.
	 * Note we allocate a 2x-sized mask, for table & srv objects.
	 */
	if (hdr->flags & (IPFW_CFG_GET_STATIC | IPFW_CFG_GET_STATES))
		da.bmask = bmask = malloc(
		    sizeof(uint32_t) * IPFW_TABLES_MAX * 2 / 32, M_TEMP,
		    M_WAITOK | M_ZERO);
	IPFW_UH_RLOCK(chain);

	/*
	 * STAGE 1: Determine size/count for objects in range.
	 * Prepare the used-tables bitmask.
	 */
	sz = sizeof(ipfw_cfg_lheader);
	da.e = chain->n_rules;

	if (hdr->end_rule != 0) {
		/* Handle custom range */
		if ((rnum = hdr->start_rule) > IPFW_DEFAULT_RULE)
			rnum = IPFW_DEFAULT_RULE;
		da.b = ipfw_find_rule(chain, rnum, 0);
		rnum = (hdr->end_rule < IPFW_DEFAULT_RULE) ?
		    hdr->end_rule + 1 : IPFW_DEFAULT_RULE;
		da.e = ipfw_find_rule(chain, rnum, UINT32_MAX) + 1;
	}

	if (hdr->flags & IPFW_CFG_GET_STATIC) {
		for (i = da.b; i < da.e; i++) {
			rule = chain->map[i];
			da.rsize += RULEUSIZE1(rule) + sizeof(ipfw_obj_tlv);
			da.rcount++;
			/* Update bitmask of used objects for given range */
			mark_rule_objects(chain, rule, &da);
		}
		/* Add counters if requested */
		if (hdr->flags & IPFW_CFG_GET_COUNTERS) {
			da.rsize += sizeof(struct ip_fw_bcounter) * da.rcount;
			da.rcounters = 1;
		}
		sz += da.rsize + sizeof(ipfw_obj_ctlv);
	}

	if (hdr->flags & IPFW_CFG_GET_STATES) {
		sz += sizeof(ipfw_obj_ctlv) +
		    ipfw_dyn_get_count(bmask, &i) * sizeof(ipfw_obj_dyntlv);
		da.tcount += i;
	}

	if (da.tcount > 0)
		sz += da.tcount * sizeof(ipfw_obj_ntlv) +
		    sizeof(ipfw_obj_ctlv);

	/*
	 * Fill the header anyway.
	 * Note we have to save the header fields to stable storage:
	 * the buffer inside @sd can be flushed after dumping rules.
	 */
	hdr->size = sz;
	hdr->set_mask = ~V_set_disable;
	hdr_flags = hdr->flags;
	hdr = NULL;

	if (sd->valsize < sz) {
		error = ENOMEM;
		goto cleanup;
	}

	/* STAGE 2: Store actual data */
	if (da.tcount > 0) {
		error = dump_named_objects(chain, &da, sd);
		if (error != 0)
			goto cleanup;
	}

	if (hdr_flags & IPFW_CFG_GET_STATIC) {
		error = dump_static_rules(chain, &da, sd);
		if (error != 0)
			goto cleanup;
	}

	if (hdr_flags & IPFW_CFG_GET_STATES)
		error = ipfw_dump_states(chain, sd);

cleanup:
	IPFW_UH_RUNLOCK(chain);

	if (bmask != NULL)
		free(bmask, M_TEMP);

	return (error);
}
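
/*
 * Hypothetical userland sketch of the request/reply layout documented
 * above dump_config(): fill ipfw_cfg_lheader, ask for everything, and
 * retry with the size reported back by the kernel, which fills in the
 * header even when the caller's buffer is too small.  The do_get3()
 * getsockopt wrapper and the IP_FW_XGET opcode are assumed here, not
 * defined in this file.
 */
#if 0
	ipfw_cfg_lheader *hdr;
	size_t sz;

	sz = sizeof(*hdr);
	for (;;) {
		if ((hdr = calloc(1, sz)) == NULL)
			err(EX_OSERR, "calloc");
		hdr->flags = IPFW_CFG_GET_STATIC | IPFW_CFG_GET_STATES |
		    IPFW_CFG_GET_COUNTERS;
		if (do_get3(IP_FW_XGET, &hdr->opheader, &sz) == 0)
			break;		/* hdr is followed by the TLV lists */
		sz = hdr->size;		/* kernel reported the size it needs */
		free(hdr);
	}
#endif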

int
ipfw_check_object_name_generic(const char *name)
{
	int nsize;

	nsize = sizeof(((ipfw_obj_ntlv *)0)->name);
	if (strnlen(name, nsize) == nsize)
		return (EINVAL);
	if (name[0] == '\0')
		return (EINVAL);
	return (0);
}
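
/*
 * Examples for the check above: the empty name "" fails, and so does
 * any name long enough to fill the whole ipfw_obj_ntlv name buffer
 * with no room for the terminating NUL; a short name such as
 * "mytable" passes.  Callers may still impose stricter, type-specific
 * naming rules on top of this generic check.
 */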

/*
 * Creates non-existent objects referenced by a rule.
 *
 * Returns 0 on success.
 */
int
create_objects_compat(struct ip_fw_chain *ch, ipfw_insn *cmd,
    struct obj_idx *oib, struct obj_idx *pidx, struct tid_info *ti)
{
	struct opcode_obj_rewrite *rw;
	struct obj_idx *p;
	uint16_t kidx;
	int error;

	/*
	 * Compatibility stuff: do the actual creation for non-existing,
	 * but referenced, objects.
	 */
	for (p = oib; p < pidx; p++) {
		if (p->kidx != 0)
			continue;

		ti->uidx = p->uidx;
		ti->type = p->type;
		ti->atype = 0;

		rw = find_op_rw(cmd + p->off, NULL, NULL);
		KASSERT(rw != NULL, ("Unable to find handler for op %d",
		    (cmd + p->off)->opcode));

		if (rw->create_object == NULL)
			error = EOPNOTSUPP;
		else
			error = rw->create_object(ch, ti, &kidx);
		if (error == 0) {
			p->kidx = kidx;
			continue;
		}

		/*
		 * An error happened. We have to roll back everything.
		 * Drop all already acquired references.
		 */
		IPFW_UH_WLOCK(ch);
		unref_oib_objects(ch, cmd, oib, pidx);
		IPFW_UH_WUNLOCK(ch);

		return (error);
	}

	return (0);
}

/*
 * Compatibility function for old ipfw(8) binaries.
 * Rewrites table/nat kernel indices with userland ones.
 * Converts tables matching '/^\d+$/' to their atoi() value.
 * Uses the number 65535 for other tables.
 *
 * Returns 0 on success.
 */
static int
set_legacy_obj_kidx(struct ip_fw_chain *ch, struct ip_fw_rule0 *rule)
{
	struct opcode_obj_rewrite *rw;
	struct named_object *no;
	ipfw_insn *cmd;
	char *end;
	long val;
	int cmdlen, error, l;
	uint16_t kidx, uidx;
	uint8_t subtype;

	error = 0;

	l = rule->cmd_len;
	cmd = rule->cmd;
	cmdlen = 0;
	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);

		/* Check if there is an object index in the given opcode */
		rw = find_op_rw(cmd, &kidx, &subtype);
		if (rw == NULL)
			continue;

		/* Try to find the referenced kernel object */
		no = rw->find_bykidx(ch, kidx);
		if (no == NULL)
			continue;

		val = strtol(no->name, &end, 10);
		if (*end == '\0' && val < 65535) {
			uidx = val;
		} else {
			/*
			 * We are called via a legacy opcode.
			 * Save the error and show the table as a fake
			 * number, not to make ipfw(8) hang.
			 */
			uidx = 65535;
			error = 2;
		}

		rw->update(cmd, uidx);
	}

	return (error);
}
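
/*
 * Example of the legacy conversion above: a kernel object named "12"
 * is shown to old binaries as table number 12, while a name like
 * "flows" cannot be represented, so it is displayed as the fake
 * number 65535 and the non-fatal error 2 is returned, which
 * ipfw_getrules() turns into a single "consider rebuilding" warning.
 */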

/*
 * Unreferences all already-referenced objects in given @cmd rule,
 * using information in @oib.
 *
 * Used to roll back a partially converted rule on error.
 */
static void
unref_oib_objects(struct ip_fw_chain *ch, ipfw_insn *cmd, struct obj_idx *oib,
    struct obj_idx *end)
{
	struct opcode_obj_rewrite *rw;
	struct named_object *no;
	struct obj_idx *p;

	IPFW_UH_WLOCK_ASSERT(ch);

	for (p = oib; p < end; p++) {
		if (p->kidx == 0)
			continue;

		rw = find_op_rw(cmd + p->off, NULL, NULL);
		KASSERT(rw != NULL, ("Unable to find handler for op %d",
		    (cmd + p->off)->opcode));

		/* Find & unref by existing idx */
		no = rw->find_bykidx(ch, p->kidx);
		KASSERT(no != NULL, ("Ref'd object %d disappeared", p->kidx));
		no->refcnt--;
	}
}

/*
 * Removes references from every object used in @rule.
 * Used by the rule removal code.
 */
static void
unref_rule_objects(struct ip_fw_chain *ch, struct ip_fw *rule)
{
	struct opcode_obj_rewrite *rw;
	struct named_object *no;
	ipfw_insn *cmd;
	int cmdlen, l;
	uint16_t kidx;
	uint8_t subtype;

	IPFW_UH_WLOCK_ASSERT(ch);

	l = rule->cmd_len;
	cmd = rule->cmd;
	cmdlen = 0;
	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);

		rw = find_op_rw(cmd, &kidx, &subtype);
		if (rw == NULL)
			continue;
		no = rw->find_bykidx(ch, kidx);

		KASSERT(no != NULL, ("object id %d not found", kidx));
		KASSERT(no->subtype == subtype,
		    ("wrong type %d (%d) for object id %d",
		    no->subtype, subtype, kidx));
		KASSERT(no->refcnt > 0, ("refcount for object %d is %d",
		    kidx, no->refcnt));

		if (no->refcnt == 1 && rw->destroy_object != NULL)
			rw->destroy_object(ch, no);
		else
			no->refcnt--;
	}
}

/*
 * Find and reference object (if any) stored in instruction @cmd.
 *
 * Saves object info in @pidx, sets
 *  - @unresolved to 1 if the object should exist but was not found.
 *
 * Returns non-zero value in case of error.
 */
static int
ref_opcode_object(struct ip_fw_chain *ch, ipfw_insn *cmd, struct tid_info *ti,
    struct obj_idx *pidx, int *unresolved)
{
	struct named_object *no;
	struct opcode_obj_rewrite *rw;
	int error;

	/* Check if this opcode is a candidate for rewrite */
	rw = find_op_rw(cmd, &ti->uidx, &ti->type);
	if (rw == NULL)
		return (0);

	/* Need to rewrite. Save necessary fields */
	pidx->uidx = ti->uidx;
	pidx->type = ti->type;

	/* Try to find referenced kernel object */
	error = rw->find_byname(ch, ti, &no);
	if (error != 0)
		return (error);
	if (no == NULL) {
		/*
		 * Report the unresolved object for automatic
		 * creation.
		 */
		*unresolved = 1;
		return (0);
	}

	/*
	 * Object already exists.
	 * Its subtype should match the expected value.
	 */
	if (ti->type != no->subtype)
		return (EINVAL);

	/* Bump refcount and update kidx. */
	no->refcnt++;
	rw->update(cmd, no->kidx);
	return (0);
}

/*
 * Finds and bumps refcount for objects referenced by given @rule.
 * Auto-creates non-existing tables.
 * Fills in @oib array with userland/kernel indexes.
 *
 * Returns 0 on success.
 */
static int
ref_rule_objects(struct ip_fw_chain *ch, struct ip_fw *rule,
    struct rule_check_info *ci, struct obj_idx *oib, struct tid_info *ti)
{
	struct obj_idx *pidx;
	ipfw_insn *cmd;
	int cmdlen, error, l, unresolved;

	pidx = oib;
	l = rule->cmd_len;
	cmd = rule->cmd;
	cmdlen = 0;
	error = 0;

	IPFW_UH_WLOCK(ch);

	/* Increase refcount on each existing referenced table. */
	for ( ; l > 0 ; l -= cmdlen, cmd += cmdlen) {
		cmdlen = F_LEN(cmd);
		unresolved = 0;

		error = ref_opcode_object(ch, cmd, ti, pidx, &unresolved);
		if (error != 0)
			break;
		/*
		 * Compatibility stuff for old clients:
		 * prepare to automatically create non-existing objects.
		 */
		if (unresolved != 0) {
			pidx->off = rule->cmd_len - l;
			pidx++;
		}
	}

	if (error != 0) {
		/* Unref everything we have already done */
		unref_oib_objects(ch, rule->cmd, oib, pidx);
		IPFW_UH_WUNLOCK(ch);
		return (error);
	}
	IPFW_UH_WUNLOCK(ch);

	/* Perform auto-creation for non-existing objects */
	if (pidx != oib)
		error = create_objects_compat(ch, rule->cmd, oib, pidx, ti);

	/* Calculate real number of dynamic objects */
	ci->object_opcodes = (uint16_t)(pidx - oib);

	return (error);
}
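
/*
 * Illustrative sketch (not built): the two-phase reference scheme used
 * by ref_rule_objects() above, reduced to plain refcounts.  Phase one
 * references objects and records how far it got; on error the recorded
 * prefix is walked back out, leaving all counts unchanged.  The refcnt
 * array and try_ref() callback are hypothetical stand-ins.
 */
#if 0
#include <stddef.h>

static int
ref_all_or_none(int *refcnt, const size_t *objs, size_t n,
    int (*try_ref)(size_t))
{
	size_t done, i;
	int error;

	error = 0;
	for (done = 0; done < n; done++) {
		if ((error = try_ref(objs[done])) != 0)
			break;
		refcnt[objs[done]]++;
	}
	if (error == 0)
		return (0);
	/* Roll back: unref exactly the prefix we referenced. */
	for (i = 0; i < done; i++)
		refcnt[objs[i]]--;
	return (error);
}
#endif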

/*
 * Checks if the opcode references a table of the appropriate type.
 * Adds a reference for the found table if true.
 * Rewrites user-supplied opcode values with kernel ones.
 *
 * Returns 0 on success and appropriate error code otherwise.
 */
static int
rewrite_rule_uidx(struct ip_fw_chain *chain, struct rule_check_info *ci)
{
	int error;
	ipfw_insn *cmd;
	uint8_t type;
	struct obj_idx *p, *pidx_first, *pidx_last;
	struct tid_info ti;

	/*
	 * Prepare an array for storing opcode indices.
	 * Use stack allocation by default.
	 */
	if (ci->object_opcodes <= (sizeof(ci->obuf)/sizeof(ci->obuf[0]))) {
		/* Stack */
		pidx_first = ci->obuf;
	} else
		pidx_first = malloc(
		    ci->object_opcodes * sizeof(struct obj_idx),
		    M_IPFW, M_WAITOK | M_ZERO);

	error = 0;
	type = 0;
	memset(&ti, 0, sizeof(ti));

	/* Use the set the rule is assigned to. */
	ti.set = ci->krule->set;
	if (ci->ctlv != NULL) {
		ti.tlvs = (void *)(ci->ctlv + 1);
		ti.tlen = ci->ctlv->head.length - sizeof(ipfw_obj_ctlv);
	}

	/* Reference all used tables and other objects */
	error = ref_rule_objects(chain, ci->krule, ci, pidx_first, &ti);
	if (error != 0)
		goto free;
	/*
	 * Note that ref_rule_objects() might have updated ci->object_opcodes
	 * to reflect the actual number of object opcodes.
	 */

	/* Perform rewrite of remaining opcodes */
	pidx_last = pidx_first + ci->object_opcodes;
	for (p = pidx_first; p < pidx_last; p++) {
		cmd = ci->krule->cmd + p->off;
		update_opcode_kidx(cmd, p->kidx);
	}

free:
	if (pidx_first != ci->obuf)
		free(pidx_first, M_IPFW);

	return (error);
}

/*
 * Adds one or more rules to ipfw @chain.
 * Data layout (version 0)(current):
 * Request:
 * [
 *   ip_fw3_opheader
 *   [ ipfw_obj_ctlv(IPFW_TLV_TBL_LIST) ipfw_obj_ntlv x N ] (optional *1)
 *   [ ipfw_obj_ctlv(IPFW_TLV_RULE_LIST) ip_fw x N ] (*2) (*3)
 * ]
 * Reply:
 * [
 *   ip_fw3_opheader
 *   [ ipfw_obj_ctlv(IPFW_TLV_TBL_LIST) ipfw_obj_ntlv x N ] (optional)
 *   [ ipfw_obj_ctlv(IPFW_TLV_RULE_LIST) ip_fw x N ]
 * ]
 *
 * Rules in reply are modified to store their actual ruleset number.
 *
 * (*1) TLVs inside IPFW_TLV_TBL_LIST need to be sorted in ascending
 *      order according to their idx field, and there must be no duplicates.
 * (*2) Numbered rules inside IPFW_TLV_RULE_LIST need to be sorted ascending.
 * (*3) Each ip_fw structure needs to be aligned to a u64 boundary.
 *
 * Returns 0 on success.
 */
static int
add_rules(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	ipfw_obj_ctlv *ctlv, *rtlv, *tstate;
	ipfw_obj_ntlv *ntlv;
	int clen, error, idx;
	uint32_t count, read;
	struct ip_fw_rule *r;
	struct rule_check_info rci, *ci, *cbuf;
	int i, rsize;

	op3 = (ip_fw3_opheader *)ipfw_get_sopt_space(sd, sd->valsize);
	ctlv = (ipfw_obj_ctlv *)(op3 + 1);

	read = sizeof(ip_fw3_opheader);
	rtlv = NULL;
	tstate = NULL;
	cbuf = NULL;
	memset(&rci, 0, sizeof(struct rule_check_info));

	if (read + sizeof(*ctlv) > sd->valsize)
		return (EINVAL);

	if (ctlv->head.type == IPFW_TLV_TBLNAME_LIST) {
		clen = ctlv->head.length;
		/* Check size and alignment */
		if (clen > sd->valsize || clen < sizeof(*ctlv))
			return (EINVAL);
		if ((clen % sizeof(uint64_t)) != 0)
			return (EINVAL);

		/*
		 * Some table names or other named objects.
		 * Check them for validity.
		 */
		count = (ctlv->head.length - sizeof(*ctlv)) / sizeof(*ntlv);
		if (ctlv->count != count || ctlv->objsize != sizeof(*ntlv))
			return (EINVAL);

		/*
		 * Check each TLV.
		 * Ensure TLVs are sorted ascending and
		 * there are no duplicates.
		 */
		idx = -1;
		ntlv = (ipfw_obj_ntlv *)(ctlv + 1);
		while (count > 0) {
			if (ntlv->head.length != sizeof(ipfw_obj_ntlv))
				return (EINVAL);

			error = ipfw_check_object_name_generic(ntlv->name);
			if (error != 0)
				return (error);

			if (ntlv->idx <= idx)
				return (EINVAL);

			idx = ntlv->idx;
			count--;
			ntlv++;
		}

		tstate = ctlv;
		read += ctlv->head.length;
		ctlv = (ipfw_obj_ctlv *)((caddr_t)ctlv + ctlv->head.length);
	}

	if (read + sizeof(*ctlv) > sd->valsize)
		return (EINVAL);

	if (ctlv->head.type == IPFW_TLV_RULE_LIST) {
		clen = ctlv->head.length;
		if (clen + read > sd->valsize || clen < sizeof(*ctlv))
			return (EINVAL);
		if ((clen % sizeof(uint64_t)) != 0)
			return (EINVAL);

		/*
		 * TODO: Permit adding multiple rules at once
		 */
		if (ctlv->count != 1)
			return (ENOTSUP);

		clen -= sizeof(*ctlv);

		if (ctlv->count > clen / sizeof(struct ip_fw_rule))
			return (EINVAL);

		/* Allocate state for each rule or use stack */
		if (ctlv->count == 1) {
			memset(&rci, 0, sizeof(struct rule_check_info));
			cbuf = &rci;
		} else
			cbuf = malloc(ctlv->count * sizeof(*ci), M_TEMP,
			    M_WAITOK | M_ZERO);
		ci = cbuf;

		/*
		 * Check each rule for validity.
		 * Ensure numbered rules are sorted ascending
		 * and properly aligned.
		 */
		idx = 0;
		r = (struct ip_fw_rule *)(ctlv + 1);
		count = 0;
		error = 0;
		while (clen > 0) {
			rsize = roundup2(RULESIZE(r), sizeof(uint64_t));
			if (rsize > clen || ctlv->count <= count) {
				error = EINVAL;
				break;
			}

			ci->ctlv = tstate;
			error = check_ipfw_rule1(r, rsize, ci);
			if (error != 0)
				break;

			/* Check sorting */
			if (r->rulenum != 0 && r->rulenum < idx) {
				printf("rulenum %d idx %d\n", r->rulenum, idx);
				error = EINVAL;
				break;
			}
			idx = r->rulenum;

			ci->urule = (caddr_t)r;

			rsize = roundup2(rsize, sizeof(uint64_t));
			clen -= rsize;
			r = (struct ip_fw_rule *)((caddr_t)r + rsize);
			count++;
			ci++;
		}

		if (ctlv->count != count || error != 0) {
			if (cbuf != &rci)
				free(cbuf, M_TEMP);
			return (EINVAL);
		}

		rtlv = ctlv;
		read += ctlv->head.length;
		ctlv = (ipfw_obj_ctlv *)((caddr_t)ctlv + ctlv->head.length);
	}

	if (read != sd->valsize || rtlv == NULL || rtlv->count == 0) {
		if (cbuf != NULL && cbuf != &rci)
			free(cbuf, M_TEMP);
		return (EINVAL);
	}

	/*
	 * The rules passed in seem to be valid.
	 * Allocate storage and try to add them to the chain.
	 */
	for (i = 0, ci = cbuf; i < rtlv->count; i++, ci++) {
		clen = RULEKSIZE1((struct ip_fw_rule *)ci->urule);
		ci->krule = ipfw_alloc_rule(chain, clen);
		import_rule1(ci);
	}

	if ((error = commit_rules(chain, cbuf, rtlv->count)) != 0) {
		/* Free allocated krules */
		for (i = 0, ci = cbuf; i < rtlv->count; i++, ci++)
			ipfw_free_rule(ci->krule);
	}

	if (cbuf != NULL && cbuf != &rci)
		free(cbuf, M_TEMP);

	return (error);
}
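
/*
 * Illustrative sketch (not built): walking a buffer of fixed-size name
 * TLVs the way add_rules() validates IPFW_TLV_TBLNAME_LIST above:
 * every entry must carry the expected length and indices must strictly
 * increase, which also rules out duplicates.  The tlv layout below is
 * a hypothetical stand-in for ipfw_obj_ntlv.
 */
#if 0
#include <stddef.h>
#include <stdint.h>

struct tlv {
	uint16_t length;	/* must equal sizeof(struct tlv) */
	uint16_t idx;		/* must be strictly ascending */
	char	 name[16];
};

static int
check_tlvs(const struct tlv *t, uint32_t count)
{
	int idx;

	idx = -1;
	while (count > 0) {
		if (t->length != sizeof(*t))
			return (-1);	/* malformed entry */
		if (t->idx <= idx)
			return (-1);	/* unsorted or duplicate */
		idx = t->idx;
		count--;
		t++;
	}
	return (0);
}
#endif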

/*
 * Lists all sopts currently registered.
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_lheader ], size = ipfw_obj_lheader.size
 * Reply: [ ipfw_obj_lheader ipfw_sopt_info x N ]
 *
 * Returns 0 on success
 */
static int
dump_soptcodes(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	struct _ipfw_obj_lheader *olh;
	ipfw_sopt_info *i;
	struct ipfw_sopt_handler *sh;
	uint32_t count, n, size;

	olh = (struct _ipfw_obj_lheader *)ipfw_get_sopt_header(sd,sizeof(*olh));
	if (olh == NULL)
		return (EINVAL);
	if (sd->valsize < olh->size)
		return (EINVAL);

	CTL3_LOCK();
	count = ctl3_hsize;
	size = count * sizeof(ipfw_sopt_info) + sizeof(ipfw_obj_lheader);

	/* Fill in header regardless of buffer size */
	olh->count = count;
	olh->objsize = sizeof(ipfw_sopt_info);

	if (size > olh->size) {
		olh->size = size;
		CTL3_UNLOCK();
		return (ENOMEM);
	}
	olh->size = size;

	for (n = 0; n < count; n++) {
		i = (ipfw_sopt_info *)ipfw_get_sopt_space(sd, sizeof(*i));
		KASSERT(i != NULL, ("previously checked buffer is not enough"));
		sh = &ctl3_handlers[n];
		i->opcode = sh->opcode;
		i->version = sh->version;
		i->refcnt = sh->refcnt;
	}
	CTL3_UNLOCK();

	return (0);
}
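
/*
 * Illustrative sketch (not built): the export protocol used by
 * dump_soptcodes() above.  The header is always filled in, so a caller
 * whose buffer is too small still learns the required size from the
 * ENOMEM reply and can retry with a bigger buffer.  Types and names
 * here are hypothetical.
 */
#if 0
#include <errno.h>
#include <stdint.h>

struct list_header {
	uint32_t size;		/* in: buffer size, out: required size */
	uint32_t count;
	uint32_t objsize;
};

static int
export_list(struct list_header *hdr, uint32_t nitems, uint32_t objsize)
{
	uint32_t need;

	need = sizeof(*hdr) + nitems * objsize;
	hdr->count = nitems;
	hdr->objsize = objsize;
	if (need > hdr->size) {
		hdr->size = need;	/* tell caller how much to allocate */
		return (ENOMEM);
	}
	hdr->size = need;
	/* ... copy nitems records after the header ... */
	return (0);
}
#endif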

/*
 * Compares two opcodes.
 * Used both by qsort() and bsearch().
 *
 * Returns 0 if match is found.
 */
static int
compare_opcodes(const void *_a, const void *_b)
{
	const struct opcode_obj_rewrite *a, *b;

	a = (const struct opcode_obj_rewrite *)_a;
	b = (const struct opcode_obj_rewrite *)_b;

	if (a->opcode < b->opcode)
		return (-1);
	else if (a->opcode > b->opcode)
		return (1);

	return (0);
}

/*
 * XXX: Rewrite bsearch()
 */
static int
find_op_rw_range(uint16_t op, struct opcode_obj_rewrite **plo,
    struct opcode_obj_rewrite **phi)
{
	struct opcode_obj_rewrite *ctl3_max, *lo, *hi, h, *rw;

	memset(&h, 0, sizeof(h));
	h.opcode = op;

	rw = (struct opcode_obj_rewrite *)bsearch(&h, ctl3_rewriters,
	    ctl3_rsize, sizeof(h), compare_opcodes);
	if (rw == NULL)
		return (1);

	/* Find the first element matching the same opcode */
	lo = rw;
	for ( ; lo > ctl3_rewriters && (lo - 1)->opcode == op; lo--)
		;

	/* Find the last element matching the same opcode */
	hi = rw;
	ctl3_max = ctl3_rewriters + ctl3_rsize;
	for ( ; (hi + 1) < ctl3_max && (hi + 1)->opcode == op; hi++)
		;

	*plo = lo;
	*phi = hi;

	return (0);
}
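
/*
 * Illustrative sketch (not built): bsearch() over a sorted array with
 * duplicate keys lands on an arbitrary matching element, so the hit is
 * widened to the full [lo, hi] run by linear scans, exactly as
 * find_op_rw_range() does above.  Standalone userland C with
 * hypothetical names.
 */
#if 0
#include <stdlib.h>

static int
cmp_int(const void *_a, const void *_b)
{
	int a = *(const int *)_a, b = *(const int *)_b;

	return ((a > b) - (a < b));
}

static int
find_run(int key, const int *arr, size_t n, const int **plo, const int **phi)
{
	const int *hit, *lo, *hi;

	hit = bsearch(&key, arr, n, sizeof(int), cmp_int);
	if (hit == NULL)
		return (1);
	/* Widen the arbitrary hit to the whole run of equal keys. */
	for (lo = hit; lo > arr && lo[-1] == key; lo--)
		;
	for (hi = hit; hi + 1 < arr + n && hi[1] == key; hi++)
		;
	*plo = lo;
	*phi = hi;
	return (0);
}
#endif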

/*
 * Finds opcode object rewriter based on @cmd's opcode.
 *
 * Returns pointer to handler or NULL.
 */
static struct opcode_obj_rewrite *
find_op_rw(ipfw_insn *cmd, uint16_t *puidx, uint8_t *ptype)
{
	struct opcode_obj_rewrite *rw, *lo, *hi;
	uint16_t uidx;
	uint8_t subtype;

	if (find_op_rw_range(cmd->opcode, &lo, &hi) != 0)
		return (NULL);

	for (rw = lo; rw <= hi; rw++) {
		if (rw->classifier(cmd, &uidx, &subtype) == 0) {
			if (puidx != NULL)
				*puidx = uidx;
			if (ptype != NULL)
				*ptype = subtype;
			return (rw);
		}
	}

	return (NULL);
}

int
classify_opcode_kidx(ipfw_insn *cmd, uint16_t *puidx)
{

	if (find_op_rw(cmd, puidx, NULL) == NULL)
		return (1);
	return (0);
}

void
update_opcode_kidx(ipfw_insn *cmd, uint16_t idx)
{
	struct opcode_obj_rewrite *rw;

	rw = find_op_rw(cmd, NULL, NULL);
	KASSERT(rw != NULL, ("No handler to update opcode %d", cmd->opcode));
	rw->update(cmd, idx);
}

void
ipfw_init_obj_rewriter()
{

	ctl3_rewriters = NULL;
	ctl3_rsize = 0;
}

void
ipfw_destroy_obj_rewriter()
{

	if (ctl3_rewriters != NULL)
		free(ctl3_rewriters, M_IPFW);
	ctl3_rewriters = NULL;
	ctl3_rsize = 0;
}

/*
 * Adds one or more opcode object rewrite handlers to the global array.
 * Function may sleep.
 */
void
ipfw_add_obj_rewriter(struct opcode_obj_rewrite *rw, size_t count)
{
	size_t sz;
	struct opcode_obj_rewrite *tmp;

	CTL3_LOCK();

	for (;;) {
		sz = ctl3_rsize + count;
		CTL3_UNLOCK();
		tmp = malloc(sizeof(*rw) * sz, M_IPFW, M_WAITOK | M_ZERO);
		CTL3_LOCK();
		if (ctl3_rsize + count <= sz)
			break;

		/* Retry */
		free(tmp, M_IPFW);
	}

	/* Merge old & new arrays */
	sz = ctl3_rsize + count;
	memcpy(tmp, ctl3_rewriters, ctl3_rsize * sizeof(*rw));
	memcpy(&tmp[ctl3_rsize], rw, count * sizeof(*rw));
	qsort(tmp, sz, sizeof(*rw), compare_opcodes);
	/* Switch new and free old */
	if (ctl3_rewriters != NULL)
		free(ctl3_rewriters, M_IPFW);
	ctl3_rewriters = tmp;
	ctl3_rsize = sz;

	CTL3_UNLOCK();
}
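
/*
 * Illustrative sketch (not built): the drop-lock-and-allocate pattern
 * from ipfw_add_obj_rewriter() above.  A sleeping allocation cannot be
 * done under the lock, so the lock is released around malloc(); because
 * the array may have grown in the meantime, the size is re-checked
 * under the lock and the allocation retried if it is now too small.
 * Userland analogue using pthreads; all names are hypothetical.
 */
#if 0
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Called with the mutex held; returns with it held. */
static void *
grow_array_locked(size_t *cursize, size_t extra, size_t elemsz)
{
	size_t sz;
	void *tmp;

	for (;;) {
		sz = *cursize + extra;
		pthread_mutex_unlock(&m);
		tmp = malloc(sz * elemsz);	/* may block */
		pthread_mutex_lock(&m);
		if (*cursize + extra <= sz)	/* still big enough? */
			return (tmp);
		/* Lost a race with another grower: retry larger. */
		free(tmp);
	}
}
#endif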

/*
 * Removes one or more object rewrite handlers from the global array.
 */
int
ipfw_del_obj_rewriter(struct opcode_obj_rewrite *rw, size_t count)
{
	size_t sz;
	struct opcode_obj_rewrite *ctl3_max, *ktmp, *lo, *hi;
	int i;

	CTL3_LOCK();

	for (i = 0; i < count; i++) {
		if (find_op_rw_range(rw[i].opcode, &lo, &hi) != 0)
			continue;

		for (ktmp = lo; ktmp <= hi; ktmp++) {
			if (ktmp->classifier != rw[i].classifier)
				continue;

			ctl3_max = ctl3_rewriters + ctl3_rsize;
			sz = (ctl3_max - (ktmp + 1)) * sizeof(*ktmp);
			memmove(ktmp, ktmp + 1, sz);
			ctl3_rsize--;
			break;
		}
	}

	if (ctl3_rsize == 0) {
		if (ctl3_rewriters != NULL)
			free(ctl3_rewriters, M_IPFW);
		ctl3_rewriters = NULL;
	}

	CTL3_UNLOCK();

	return (0);
}

static int
export_objhash_ntlv_internal(struct namedobj_instance *ni,
    struct named_object *no, void *arg)
{
	struct sockopt_data *sd;
	ipfw_obj_ntlv *ntlv;

	sd = (struct sockopt_data *)arg;
	ntlv = (ipfw_obj_ntlv *)ipfw_get_sopt_space(sd, sizeof(*ntlv));
	if (ntlv == NULL)
		return (ENOMEM);
	ipfw_export_obj_ntlv(no, ntlv);
	return (0);
}

/*
 * Lists all service objects.
 * Data layout (v0)(current):
 * Request: [ ipfw_obj_lheader ] size = ipfw_obj_lheader.size
 * Reply: [ ipfw_obj_lheader [ ipfw_obj_ntlv x N ] (optional) ]
 * Returns 0 on success
 */
static int
dump_srvobjects(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
	ipfw_obj_lheader *hdr;
	int count;

	hdr = (ipfw_obj_lheader *)ipfw_get_sopt_header(sd, sizeof(*hdr));
	if (hdr == NULL)
		return (EINVAL);

	IPFW_UH_RLOCK(chain);
	count = ipfw_objhash_count(CHAIN_TO_SRV(chain));
	hdr->size = sizeof(ipfw_obj_lheader) + count * sizeof(ipfw_obj_ntlv);
	if (sd->valsize < hdr->size) {
		IPFW_UH_RUNLOCK(chain);
		return (ENOMEM);
	}
	hdr->count = count;
	hdr->objsize = sizeof(ipfw_obj_ntlv);
	if (count > 0)
		ipfw_objhash_foreach(CHAIN_TO_SRV(chain),
		    export_objhash_ntlv_internal, sd);
	IPFW_UH_RUNLOCK(chain);
	return (0);
}

/*
 * Compares two sopt handlers (code, version and handler ptr).
 * Used both by qsort() and bsearch().
 * The handler pointer is not compared when the search key's handler
 * is NULL, which lets bsearch() match on code/version only.
 *
 * Returns 0 if match is found.
 */
static int
compare_sh(const void *_a, const void *_b)
{
	const struct ipfw_sopt_handler *a, *b;

	a = (const struct ipfw_sopt_handler *)_a;
	b = (const struct ipfw_sopt_handler *)_b;

	if (a->opcode < b->opcode)
		return (-1);
	else if (a->opcode > b->opcode)
		return (1);

	if (a->version < b->version)
		return (-1);
	else if (a->version > b->version)
		return (1);

	/* bsearch helper */
	if (a->handler == NULL)
		return (0);

	if ((uintptr_t)a->handler < (uintptr_t)b->handler)
		return (-1);
	else if ((uintptr_t)a->handler > (uintptr_t)b->handler)
		return (1);

	return (0);
}
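
/*
 * Illustrative sketch (not built): a multi-key comparator usable by
 * both qsort() and bsearch(), where a NULL tie-breaker field in the
 * search key acts as a wildcard, as compare_sh() does above with the
 * handler pointer.  The key layout here is hypothetical.
 */
#if 0
#include <stdint.h>

struct hkey {
	unsigned short	opcode;
	unsigned char	version;
	void		*handler;	/* NULL in a bsearch key: match any */
};

static int
compare_hkey(const void *_a, const void *_b)
{
	const struct hkey *a = _a, *b = _b;

	if (a->opcode != b->opcode)
		return (a->opcode < b->opcode ? -1 : 1);
	if (a->version != b->version)
		return (a->version < b->version ? -1 : 1);
	if (a->handler == NULL)		/* wildcard for lookups */
		return (0);
	if (a->handler != b->handler)
		return ((uintptr_t)a->handler < (uintptr_t)b->handler ?
		    -1 : 1);
	return (0);
}
#endif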

/*
 * Finds sopt handler based on @code and @version.
 *
 * Returns pointer to handler or NULL.
 */
static struct ipfw_sopt_handler *
find_sh(uint16_t code, uint8_t version, sopt_handler_f *handler)
{
	struct ipfw_sopt_handler *sh, h;

	memset(&h, 0, sizeof(h));
	h.opcode = code;
	h.version = version;
	h.handler = handler;

	sh = (struct ipfw_sopt_handler *)bsearch(&h, ctl3_handlers,
	    ctl3_hsize, sizeof(h), compare_sh);

	return (sh);
}

static int
find_ref_sh(uint16_t opcode, uint8_t version, struct ipfw_sopt_handler *psh)
{
	struct ipfw_sopt_handler *sh;

	CTL3_LOCK();
	if ((sh = find_sh(opcode, version, NULL)) == NULL) {
		CTL3_UNLOCK();
		printf("ipfw: ipfw_ctl3 invalid option %dv%d\n",
		    opcode, version);
		return (EINVAL);
	}
	sh->refcnt++;
	ctl3_refct++;
	/* Copy handler data to requested buffer */
	*psh = *sh;
	CTL3_UNLOCK();

	return (0);
}

static void
find_unref_sh(struct ipfw_sopt_handler *psh)
{
	struct ipfw_sopt_handler *sh;

	CTL3_LOCK();
	sh = find_sh(psh->opcode, psh->version, NULL);
	KASSERT(sh != NULL, ("ctl3 handler disappeared"));
	sh->refcnt--;
	ctl3_refct--;
	CTL3_UNLOCK();
}

void
ipfw_init_sopt_handler()
{

	CTL3_LOCK_INIT();
	IPFW_ADD_SOPT_HANDLER(1, scodes);
}

void
ipfw_destroy_sopt_handler()
{

	IPFW_DEL_SOPT_HANDLER(1, scodes);
	CTL3_LOCK_DESTROY();
}

/*
 * Adds one or more sockopt handlers to the global array.
 * Function may sleep.
 */
void
ipfw_add_sopt_handler(struct ipfw_sopt_handler *sh, size_t count)
{
	size_t sz;
	struct ipfw_sopt_handler *tmp;

	CTL3_LOCK();

	for (;;) {
		sz = ctl3_hsize + count;
		CTL3_UNLOCK();
		tmp = malloc(sizeof(*sh) * sz, M_IPFW, M_WAITOK | M_ZERO);
		CTL3_LOCK();
		if (ctl3_hsize + count <= sz)
			break;

		/* Retry */
		free(tmp, M_IPFW);
	}

	/* Merge old & new arrays */
	sz = ctl3_hsize + count;
	memcpy(tmp, ctl3_handlers, ctl3_hsize * sizeof(*sh));
	memcpy(&tmp[ctl3_hsize], sh, count * sizeof(*sh));
	qsort(tmp, sz, sizeof(*sh), compare_sh);
	/* Switch new and free old */
	if (ctl3_handlers != NULL)
		free(ctl3_handlers, M_IPFW);
	ctl3_handlers = tmp;
	ctl3_hsize = sz;
	ctl3_gencnt++;

	CTL3_UNLOCK();
}

/*
 * Removes one or more sockopt handlers from the global array.
 */
int
ipfw_del_sopt_handler(struct ipfw_sopt_handler *sh, size_t count)
{
	size_t sz;
	struct ipfw_sopt_handler *tmp, *h;
	int i;

	CTL3_LOCK();

	for (i = 0; i < count; i++) {
		tmp = &sh[i];
		h = find_sh(tmp->opcode, tmp->version, tmp->handler);
		if (h == NULL)
			continue;

		sz = (ctl3_handlers + ctl3_hsize - (h + 1)) * sizeof(*h);
		memmove(h, h + 1, sz);
		ctl3_hsize--;
	}

	if (ctl3_hsize == 0) {
		if (ctl3_handlers != NULL)
			free(ctl3_handlers, M_IPFW);
		ctl3_handlers = NULL;
	}

	ctl3_gencnt++;

	CTL3_UNLOCK();

	return (0);
}
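
/*
 * Illustrative sketch (not built): deleting an element from a sorted
 * array by shifting the tail down with memmove(), as both
 * ipfw_del_obj_rewriter() and ipfw_del_sopt_handler() do above.
 * Standalone userland C with hypothetical names.
 */
#if 0
#include <stddef.h>
#include <string.h>

static size_t
array_delete(int *arr, size_t n, size_t pos)
{
	/* Shift elements [pos+1, n) down one slot; order is preserved. */
	memmove(&arr[pos], &arr[pos + 1], (n - pos - 1) * sizeof(arr[0]));
	return (n - 1);
}
#endif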

/*
 * Writes data accumulated in @sd to sockopt buffer.
 * Zeroes internal @sd buffer.
 */
static int
ipfw_flush_sopt_data(struct sockopt_data *sd)
{
	struct sockopt *sopt;
	int error;
	size_t sz;

	sz = sd->koff;
	if (sz == 0)
		return (0);

	sopt = sd->sopt;

	if (sopt->sopt_dir == SOPT_GET) {
		error = copyout(sd->kbuf, sopt->sopt_val, sz);
		if (error != 0)
			return (error);
	}

	memset(sd->kbuf, 0, sd->ksize);
	sd->ktotal += sz;
	sd->koff = 0;
	if (sd->ktotal + sd->ksize < sd->valsize)
		sd->kavail = sd->ksize;
	else
		sd->kavail = sd->valsize - sd->ktotal;

	/* Update sopt buffer data */
	sopt->sopt_valsize = sd->ktotal;
	sopt->sopt_val = sd->sopt_val + sd->ktotal;

	return (0);
}
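
/*
 * Illustrative sketch (not built): the window-advance arithmetic from
 * ipfw_flush_sopt_data() above.  After flushing koff bytes, the next
 * window is either a full ksize or whatever remains of the userland
 * buffer (valsize), whichever is smaller.  Standalone helper whose
 * field names mirror struct sockopt_data but are hypothetical here.
 */
#if 0
#include <stddef.h>

struct window {
	size_t ksize;	/* kernel buffer size */
	size_t koff;	/* bytes used in current window */
	size_t ktotal;	/* bytes flushed so far */
	size_t kavail;	/* room left in current window */
	size_t valsize;	/* total userland buffer size */
};

static void
advance_window(struct window *w)
{
	w->ktotal += w->koff;
	w->koff = 0;
	if (w->ktotal + w->ksize < w->valsize)
		w->kavail = w->ksize;	/* full window still fits */
	else
		w->kavail = w->valsize - w->ktotal;	/* last, partial one */
}
#endif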

/*
 * Ensures that @sd buffer has a contiguous @needed number of
 * bytes.
 *
 * Returns pointer to requested space or NULL.
 */
caddr_t
ipfw_get_sopt_space(struct sockopt_data *sd, size_t needed)
{
	int error;
	caddr_t addr;

	if (sd->kavail < needed) {
		/*
		 * Flush the data and try again.
		 */
		error = ipfw_flush_sopt_data(sd);

		if (sd->kavail < needed || error != 0)
			return (NULL);
	}

	addr = sd->kbuf + sd->koff;
	sd->koff += needed;
	sd->kavail -= needed;
	return (addr);
}

/*
 * Requests @needed contiguous bytes from @sd buffer.
 * Function is used to notify the subsystem that we are
 * interested only in the first @needed bytes (request header),
 * so the rest of the buffer can be safely zeroed.
 *
 * Returns pointer to requested space or NULL.
 */
caddr_t
ipfw_get_sopt_header(struct sockopt_data *sd, size_t needed)
{
	caddr_t addr;

	if ((addr = ipfw_get_sopt_space(sd, needed)) == NULL)
		return (NULL);

	if (sd->kavail > 0)
		memset(sd->kbuf + sd->koff, 0, sd->kavail);

	return (addr);
}
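
/*
 * Illustrative sketch (not built): how a dump handler streams records
 * through the sliding-window sockopt buffer above.  Each get-space
 * request that no longer fits triggers a flush to userland, so an
 * export of arbitrary size runs through a small fixed kernel buffer.
 * ipfw_get_sopt_space() and struct sockopt_data are the real helpers
 * defined in this file; the record type and loop are hypothetical.
 */
#if 0
struct record { uint32_t data; };

static int
dump_records(struct sockopt_data *sd, const struct record *rec, int count)
{
	struct record *r;
	int i;

	for (i = 0; i < count; i++) {
		/* May flush the current window behind the scenes */
		r = (struct record *)ipfw_get_sopt_space(sd, sizeof(*r));
		if (r == NULL)
			return (ENOMEM);	/* userland buffer too small */
		*r = rec[i];
	}
	return (0);
}
#endif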

/*
 * New sockopt handler.
 */
int
ipfw_ctl3(struct sockopt *sopt)
{
	int error, locked;
	size_t size, valsize;
	struct ip_fw_chain *chain;
	char xbuf[256];
	struct sockopt_data sdata;
	struct ipfw_sopt_handler h;
	ip_fw3_opheader *op3 = NULL;

	error = priv_check(sopt->sopt_td, PRIV_NETINET_IPFW);
	if (error != 0)
		return (error);

	if (sopt->sopt_name != IP_FW3)
		return (ipfw_ctl(sopt));

	chain = &V_layer3_chain;
	error = 0;

	/* Save original valsize before it is altered via sooptcopyin() */
	valsize = sopt->sopt_valsize;
	memset(&sdata, 0, sizeof(sdata));
	/* Read op3 header first to determine actual operation */
	op3 = (ip_fw3_opheader *)xbuf;
	error = sooptcopyin(sopt, op3, sizeof(*op3), sizeof(*op3));
	if (error != 0)
		return (error);
	sopt->sopt_valsize = valsize;

	/*
	 * Find and reference command.
	 */
	error = find_ref_sh(op3->opcode, op3->version, &h);
	if (error != 0)
		return (error);

	/*
	 * Disallow modifications in really-really secure mode, but still allow
	 * the logging counters to be reset.
	 */
	if ((h.dir & HDIR_SET) != 0 && h.opcode != IP_FW_XRESETLOG) {
		error = securelevel_ge(sopt->sopt_td->td_ucred, 3);
		if (error != 0) {
			find_unref_sh(&h);
			return (error);
		}
	}

	/*
	 * Fill in sockopt_data structure that may be useful for
	 * IP_FW3 get requests.
	 */
	locked = 0;
	if (valsize <= sizeof(xbuf)) {
		/* use on-stack buffer */
		sdata.kbuf = xbuf;
		sdata.ksize = sizeof(xbuf);
		sdata.kavail = valsize;
	} else {
		/*
		 * Determine opcode type/buffer size:
		 * allocate sliding-window buf for data export or
		 * contiguous buffer for special ops.
		 */
		if ((h.dir & HDIR_SET) != 0) {
			/* Set request. Allocate contiguous buffer. */
			if (valsize > CTL3_LARGEBUF) {
				find_unref_sh(&h);
				return (EFBIG);
			}

			size = valsize;
		} else {
			/* Get request. Allocate sliding-window buffer */
			size = (valsize < CTL3_SMALLBUF) ? valsize :
			    CTL3_SMALLBUF;

			if (size < valsize) {
				/* We have to wire user buffer */
				error = vslock(sopt->sopt_val, valsize);
				if (error != 0)
					return (error);
				locked = 1;
			}
		}

		sdata.kbuf = malloc(size, M_TEMP, M_WAITOK | M_ZERO);
		sdata.ksize = size;
		sdata.kavail = size;
	}

	sdata.sopt = sopt;
	sdata.sopt_val = sopt->sopt_val;
	sdata.valsize = valsize;

	/*
	 * Copy either the whole request (if valsize < bsize_max) or
	 * its first bsize_max bytes, which guarantees to most consumers
	 * that all necessary data has been copied.
	 * In any case, copy at least sizeof(ip_fw3_opheader).
	 */
	if ((error = sooptcopyin(sopt, sdata.kbuf, sdata.ksize,
	    sizeof(ip_fw3_opheader))) != 0)
		return (error);
	op3 = (ip_fw3_opheader *)sdata.kbuf;

	/* Finally, run handler */
	error = h.handler(chain, op3, &sdata);
	find_unref_sh(&h);

	/* Flush state and free buffers */
	if (error == 0)
		error = ipfw_flush_sopt_data(&sdata);
	else
		ipfw_flush_sopt_data(&sdata);

	if (locked != 0)
		vsunlock(sdata.sopt_val, valsize);

	/* Restore original pointer and set number of bytes written */
	sopt->sopt_val = sdata.sopt_val;
	sopt->sopt_valsize = sdata.ktotal;
	if (sdata.kbuf != xbuf)
		free(sdata.kbuf, M_TEMP);

	return (error);
}

/**
 * {set|get}sockopt parser.
 */
int
ipfw_ctl(struct sockopt *sopt)
{
#define	RULE_MAXSIZE	(512*sizeof(u_int32_t))
	int error;
	size_t size, valsize;
	struct ip_fw *buf;
	struct ip_fw_rule0 *rule;
	struct ip_fw_chain *chain;
	u_int32_t rulenum[2];
	uint32_t opt;
	struct rule_check_info ci;
	IPFW_RLOCK_TRACKER;

	chain = &V_layer3_chain;
	error = 0;

	/* Save original valsize before it is altered via sooptcopyin() */
	valsize = sopt->sopt_valsize;
	opt = sopt->sopt_name;

	/*
	 * Disallow modifications in really-really secure mode, but still allow
	 * the logging counters to be reset.
	 */
	if (opt == IP_FW_ADD ||
	    (sopt->sopt_dir == SOPT_SET && opt != IP_FW_RESETLOG)) {
		error = securelevel_ge(sopt->sopt_td->td_ucred, 3);
		if (error != 0)
			return (error);
	}

	switch (opt) {
	case IP_FW_GET:
		/*
		 * pass up a copy of the current rules. Static rules
		 * come first (the last of which has number IPFW_DEFAULT_RULE),
		 * followed by a possibly empty list of dynamic rules.
		 * The last dynamic rule has NULL in the "next" field.
		 *
		 * Note that the calculated size is used to bound the
		 * amount of data returned to the user. The rule set may
		 * change between calculating the size and returning the
		 * data in which case we'll just return what fits.
		 */
		for (;;) {
			int len = 0, want;

			size = chain->static_len;
			size += ipfw_dyn_len();
			if (size >= sopt->sopt_valsize)
				break;
			buf = malloc(size, M_TEMP, M_WAITOK | M_ZERO);
			IPFW_UH_RLOCK(chain);
			/* check again how much space we need */
			want = chain->static_len + ipfw_dyn_len();
			if (size >= want)
				len = ipfw_getrules(chain, buf, size);
			IPFW_UH_RUNLOCK(chain);
			if (size >= want)
				error = sooptcopyout(sopt, buf, len);
			free(buf, M_TEMP);
			if (size >= want)
				break;
		}
		break;

	case IP_FW_FLUSH:
		/* locking is done within del_entry() */
		error = del_entry(chain, 0); /* special case, rule=0, cmd=0 means all */
		break;

	case IP_FW_ADD:
		rule = malloc(RULE_MAXSIZE, M_TEMP, M_WAITOK);
		error = sooptcopyin(sopt, rule, RULE_MAXSIZE,
		    sizeof(struct ip_fw7));

		memset(&ci, 0, sizeof(struct rule_check_info));

/*
|
|
|
|
* If the size of commands equals RULESIZE7 then we assume
|
|
|
|
* a FreeBSD7.2 binary is talking to us (set is7=1).
|
|
|
|
* is7 is persistent so the next 'ipfw list' command
|
|
|
|
* will use this format.
|
|
|
|
* NOTE: If wrong version is guessed (this can happen if
|
|
|
|
* the first ipfw command is 'ipfw [pipe] list')
|
|
|
|
* the ipfw binary may crash or loop infinitly...
|
|
|
|
*/
		size = sopt->sopt_valsize;
		if (size == RULESIZE7(rule)) {
			is7 = 1;
			error = convert_rule_to_8(rule);
			if (error) {
				free(rule, M_TEMP);
				return error;
			}
			size = RULESIZE(rule);
		} else
			is7 = 0;
		if (error == 0)
			error = check_ipfw_rule0(rule, size, &ci);
		if (error == 0) {
			/* locking is done within add_rule() */
			struct ip_fw *krule;
			krule = ipfw_alloc_rule(chain, RULEKSIZE0(rule));
			ci.urule = (caddr_t)rule;
			ci.krule = krule;
			import_rule0(&ci);
			error = commit_rules(chain, &ci, 1);
			if (error != 0)
				ipfw_free_rule(ci.krule);
			else if (sopt->sopt_dir == SOPT_GET) {
				if (is7) {
					error = convert_rule_to_7(rule);
					size = RULESIZE7(rule);
					if (error) {
						free(rule, M_TEMP);
						return error;
					}
				}
				error = sooptcopyout(sopt, rule, size);
			}
		}
		free(rule, M_TEMP);
		break;
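
/*
 * Illustrative userland sketch (kept under #if 0; it is not part of the
 * kernel build). The IP_FW_ADD handler above is driven via getsockopt()
 * on a raw IP socket: because sopt_dir is SOPT_GET, the kernel can copy
 * the rule back with its assigned number. The struct name ip_fw_rule0
 * follows this file's kernel-side naming; a real client would use the
 * equivalent legacy layout from its installed ip_fw.h. Error handling
 * and opcode construction are reduced to the bare minimum.
 */
#if 0
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip_fw.h>
#include <string.h>

static int
example_add_allow_all(int s)	/* s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW) */
{
	char buf[sizeof(struct ip_fw_rule0) + sizeof(ipfw_insn)];
	struct ip_fw_rule0 *rule = (struct ip_fw_rule0 *)buf;
	socklen_t sz = sizeof(buf);

	memset(buf, 0, sizeof(buf));
	rule->rulenum = 0;		/* 0: let the kernel pick a number */
	rule->act_ofs = 0;		/* action is the first instruction */
	rule->cmd_len = 1;		/* one 32-bit instruction word */
	rule->cmd[0].opcode = O_ACCEPT;	/* "allow ip from any to any" */
	rule->cmd[0].len = 1;

	/* getsockopt(), not setsockopt(): the rule comes back annotated */
	return (getsockopt(s, IPPROTO_IP, IP_FW_ADD, rule, &sz));
}
#endif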

	case IP_FW_DEL:
		/*
		 * IP_FW_DEL is used for deleting single rules or sets,
		 * and (ab)used to atomically manipulate sets. Argument size
		 * is used to distinguish between the two:
		 *    sizeof(u_int32_t)
		 *	delete single rule or set of rules,
		 *	or reassign rules (or sets) to a different set.
		 *    2*sizeof(u_int32_t)
		 *	atomic disable/enable sets.
		 *	first u_int32_t contains sets to be disabled,
		 *	second u_int32_t contains sets to be enabled.
		 */
		error = sooptcopyin(sopt, rulenum,
			2*sizeof(u_int32_t), sizeof(u_int32_t));
		if (error)
			break;
		size = sopt->sopt_valsize;
		if (size == sizeof(u_int32_t) && rulenum[0] != 0) {
			/* delete or reassign, locking done in del_entry() */
			error = del_entry(chain, rulenum[0]);
		} else if (size == 2*sizeof(u_int32_t)) { /* set enable/disable */
			IPFW_UH_WLOCK(chain);
			V_set_disable =
			    (V_set_disable | rulenum[0]) & ~rulenum[1] &
			    ~(1<<RESVD_SET); /* set RESVD_SET always enabled */
			IPFW_UH_WUNLOCK(chain);
		} else
			error = EINVAL;
		break;
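
/*
 * Illustrative userland sketch (#if 0, same headers as the sketch
 * above): the two-word form of IP_FW_DEL documented in the comment.
 * masks[0] selects sets to disable and masks[1] sets to enable; the
 * kernel applies both atomically under IPFW_UH_WLOCK and keeps
 * RESVD_SET (31) enabled no matter what is requested.
 */
#if 0
static int
example_swap_sets(int s)
{
	u_int32_t masks[2];

	masks[0] = 1 << 1;	/* disable set 1 */
	masks[1] = 1 << 2;	/* enable set 2 */
	return (setsockopt(s, IPPROTO_IP, IP_FW_DEL, masks, sizeof(masks)));
}
#endif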

	case IP_FW_ZERO:
	case IP_FW_RESETLOG: /* argument is a u_int32_t, the rule number */
		rulenum[0] = 0;
		if (sopt->sopt_val != 0) {
			error = sooptcopyin(sopt, rulenum,
				sizeof(u_int32_t), sizeof(u_int32_t));
			if (error)
				break;
		}
		error = zero_entry(chain, rulenum[0],
			sopt->sopt_name == IP_FW_RESETLOG);
		break;

	/*--- TABLE opcodes ---*/
	case IP_FW_TABLE_ADD:
	case IP_FW_TABLE_DEL:
		{
			ipfw_table_entry ent;
			struct tentry_info tei;
			struct tid_info ti;
			struct table_value v;

			error = sooptcopyin(sopt, &ent,
			    sizeof(ent), sizeof(ent));
			if (error)
				break;

			memset(&tei, 0, sizeof(tei));
			tei.paddr = &ent.addr;
			tei.subtype = AF_INET;
			tei.masklen = ent.masklen;
			ipfw_import_table_value_legacy(ent.value, &v);
			tei.pvalue = &v;
			memset(&ti, 0, sizeof(ti));
			ti.uidx = ent.tbl;
			ti.type = IPFW_TABLE_CIDR;

			error = (opt == IP_FW_TABLE_ADD) ?
			    add_table_entry(chain, &ti, &tei, 0, 1) :
			    del_table_entry(chain, &ti, &tei, 0, 1);
		}
		break;
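
/*
 * Illustrative userland sketch (#if 0): building the legacy
 * ipfw_table_entry consumed by the handler above. The single 32-bit
 * value is widened by ipfw_import_table_value_legacy() and ent.tbl
 * becomes the table index (ti.uidx). Needs <arpa/inet.h> for
 * inet_addr() in addition to the headers from the first sketch.
 */
#if 0
static int
example_table_add(int s)
{
	ipfw_table_entry ent;

	memset(&ent, 0, sizeof(ent));
	ent.tbl = 1;				/* table number */
	ent.addr = inet_addr("10.0.0.0");	/* IPv4 prefix base */
	ent.masklen = 24;			/* /24 */
	ent.value = 42;				/* legacy single value */
	return (setsockopt(s, IPPROTO_IP, IP_FW_TABLE_ADD, &ent,
	    sizeof(ent)));
}
#endif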

	case IP_FW_TABLE_FLUSH:
		{
			u_int16_t tbl;
			struct tid_info ti;

			error = sooptcopyin(sopt, &tbl,
			    sizeof(tbl), sizeof(tbl));
			if (error)
				break;
			memset(&ti, 0, sizeof(ti));
			ti.uidx = tbl;
			error = flush_table(chain, &ti);
		}
		break;

	case IP_FW_TABLE_GETSIZE:
		{
			u_int32_t tbl, cnt;
			struct tid_info ti;

			if ((error = sooptcopyin(sopt, &tbl, sizeof(tbl),
			    sizeof(tbl))))
				break;
			memset(&ti, 0, sizeof(ti));
			ti.uidx = tbl;
			IPFW_RLOCK(chain);
			error = ipfw_count_table(chain, &ti, &cnt);
			IPFW_RUNLOCK(chain);
			if (error)
				break;
			error = sooptcopyout(sopt, &cnt, sizeof(cnt));
		}
		break;

	case IP_FW_TABLE_LIST:
		{
			ipfw_table *tbl;
			struct tid_info ti;

			if (sopt->sopt_valsize < sizeof(*tbl)) {
				error = EINVAL;
				break;
			}
			size = sopt->sopt_valsize;
			tbl = malloc(size, M_TEMP, M_WAITOK);
			error = sooptcopyin(sopt, tbl, size, sizeof(*tbl));
			if (error) {
				free(tbl, M_TEMP);
				break;
			}
			tbl->size = (size - sizeof(*tbl)) /
			    sizeof(ipfw_table_entry);
			memset(&ti, 0, sizeof(ti));
			ti.uidx = tbl->tbl;
			IPFW_RLOCK(chain);
			error = ipfw_dump_table_legacy(chain, &ti, tbl);
			IPFW_RUNLOCK(chain);
			if (error) {
				free(tbl, M_TEMP);
				break;
			}
			error = sooptcopyout(sopt, tbl, size);
			free(tbl, M_TEMP);
		}
		break;
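
/*
 * Illustrative userland sketch (#if 0): the two-step dump protocol
 * implemented by the IP_FW_TABLE_GETSIZE and IP_FW_TABLE_LIST handlers
 * above. GETSIZE reads the table number and returns the entry count,
 * which sizes the buffer for LIST. A table can grow between the two
 * calls; a production caller would retry on failure. Assumes
 * <stdlib.h> for calloc()/free() plus the headers from the first
 * sketch.
 */
#if 0
static int
example_table_list(int s, u_int16_t tblnum)
{
	ipfw_table *tbl;
	u_int32_t cnt;
	socklen_t sz;
	int error;

	cnt = tblnum;			/* GETSIZE input: table number */
	sz = sizeof(cnt);
	if (getsockopt(s, IPPROTO_IP, IP_FW_TABLE_GETSIZE, &cnt, &sz) != 0)
		return (-1);

	sz = sizeof(*tbl) + cnt * sizeof(ipfw_table_entry);
	if ((tbl = calloc(1, sz)) == NULL)
		return (-1);
	tbl->tbl = tblnum;
	error = getsockopt(s, IPPROTO_IP, IP_FW_TABLE_LIST, tbl, &sz);
	/* on success, tbl->cnt entries are valid in tbl->ent[] */
	free(tbl);
	return (error);
}
#endif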

	/*--- NAT operations are protected by the IPFW_LOCK ---*/
	case IP_FW_NAT_CFG:
		if (IPFW_NAT_LOADED)
			error = ipfw_nat_cfg_ptr(sopt);
		else {
			printf("IP_FW_NAT_CFG: %s\n",
			    "ipfw_nat not present, please load it");
			error = EINVAL;
		}
		break;

	case IP_FW_NAT_DEL:
		if (IPFW_NAT_LOADED)
			error = ipfw_nat_del_ptr(sopt);
		else {
			printf("IP_FW_NAT_DEL: %s\n",
			    "ipfw_nat not present, please load it");
			error = EINVAL;
		}
		break;

	case IP_FW_NAT_GET_CONFIG:
		if (IPFW_NAT_LOADED)
			error = ipfw_nat_get_cfg_ptr(sopt);
		else {
			printf("IP_FW_NAT_GET_CFG: %s\n",
			    "ipfw_nat not present, please load it");
			error = EINVAL;
		}
		break;

	case IP_FW_NAT_GET_LOG:
		if (IPFW_NAT_LOADED)
			error = ipfw_nat_get_log_ptr(sopt);
		else {
			printf("IP_FW_NAT_GET_LOG: %s\n",
			    "ipfw_nat not present, please load it");
			error = EINVAL;
		}
		break;

	default:
		printf("ipfw: ipfw_ctl invalid option %d\n", sopt->sopt_name);
		error = EINVAL;
	}

	return (error);
#undef RULE_MAXSIZE
}

#define RULE_MAXSIZE	(256*sizeof(u_int32_t))

/* Functions to convert rules 7.2 <==> 8.0 */
static int
convert_rule_to_7(struct ip_fw_rule0 *rule)
{
	/* Used to modify original rule */
	struct ip_fw7 *rule7 = (struct ip_fw7 *)rule;
	/* copy of original rule, version 8 */
	struct ip_fw_rule0 *tmp;

	/* Used to copy commands */
	ipfw_insn *ccmd, *dst;
	int ll = 0, ccmdlen = 0;

	tmp = malloc(RULE_MAXSIZE, M_TEMP, M_NOWAIT | M_ZERO);
	if (tmp == NULL) {
		return 1; //XXX error
	}
	bcopy(rule, tmp, RULE_MAXSIZE);

	/* Copy fields */
	//rule7->_pad = tmp->_pad;
	rule7->set = tmp->set;
	rule7->rulenum = tmp->rulenum;
	rule7->cmd_len = tmp->cmd_len;
	rule7->act_ofs = tmp->act_ofs;
	rule7->next_rule = (struct ip_fw7 *)tmp->next_rule;
	rule7->pcnt = tmp->pcnt;
	rule7->bcnt = tmp->bcnt;
	rule7->timestamp = tmp->timestamp;

	/* Copy commands */
	for (ll = tmp->cmd_len, ccmd = tmp->cmd, dst = rule7->cmd ;
			ll > 0 ; ll -= ccmdlen, ccmd += ccmdlen, dst += ccmdlen) {
		ccmdlen = F_LEN(ccmd);

		bcopy(ccmd, dst, F_LEN(ccmd)*sizeof(uint32_t));

		if (dst->opcode > O_NAT)
			/* O_REASS doesn't exist in the 7.2 version, so
			 * decrement opcode if it is after O_REASS
			 */
			dst->opcode--;

		if (ccmdlen > ll) {
			printf("ipfw: opcode %d size truncated\n",
			    ccmd->opcode);
			free(tmp, M_TEMP);	/* don't leak the temporary copy */
			return EINVAL;
		}
	}
	free(tmp, M_TEMP);

	return 0;
}
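
/*
 * Why both converters shift opcodes around O_NAT: 8.0 inserted O_REASS
 * into the opcode enum immediately after O_NAT, so every opcode past
 * that point is numbered one higher than its 7.2 counterpart. A toy
 * model of the two layouts (enum names and values here are made up;
 * only the relative ordering matters):
 */
#if 0
enum example_ops72 { EX72_NAT = 9, EX72_AFTER };		/* no reass */
enum example_ops80 { EX80_NAT = 9, EX80_REASS, EX80_AFTER };	/* shifted */
/* 8.0 -> 7.2: if (opcode > O_NAT) opcode--;   7.2 -> 8.0: opcode++ */
#endif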

static int
convert_rule_to_8(struct ip_fw_rule0 *rule)
{
	/* Used to modify original rule */
	struct ip_fw7 *rule7 = (struct ip_fw7 *)rule;

	/* Used to copy commands */
	ipfw_insn *ccmd, *dst;
	int ll = 0, ccmdlen = 0;

	/* Copy of original rule */
	struct ip_fw7 *tmp = malloc(RULE_MAXSIZE, M_TEMP, M_NOWAIT | M_ZERO);
	if (tmp == NULL) {
		return 1; //XXX error
	}

	bcopy(rule7, tmp, RULE_MAXSIZE);

	for (ll = tmp->cmd_len, ccmd = tmp->cmd, dst = rule->cmd ;
			ll > 0 ; ll -= ccmdlen, ccmd += ccmdlen, dst += ccmdlen) {
		ccmdlen = F_LEN(ccmd);

		bcopy(ccmd, dst, F_LEN(ccmd)*sizeof(uint32_t));

		if (dst->opcode > O_NAT)
			/* O_REASS doesn't exist in the 7.2 version, so
			 * increment opcode if it is after O_REASS
			 */
			dst->opcode++;

		if (ccmdlen > ll) {
			printf("ipfw: opcode %d size truncated\n",
			    ccmd->opcode);
			free(tmp, M_TEMP);	/* don't leak the temporary copy */
			return EINVAL;
		}
	}

	rule->_pad = tmp->_pad;
	rule->set = tmp->set;
	rule->rulenum = tmp->rulenum;
	rule->cmd_len = tmp->cmd_len;
	rule->act_ofs = tmp->act_ofs;
	rule->next_rule = (struct ip_fw *)tmp->next_rule;
	rule->id = 0; /* XXX see if is ok = 0 */
	rule->pcnt = tmp->pcnt;
	rule->bcnt = tmp->bcnt;
	rule->timestamp = tmp->timestamp;

	free(tmp, M_TEMP);
	return 0;
}

/*
 * Named object api
 *
 */

void
ipfw_init_srv(struct ip_fw_chain *ch)
{

	ch->srvmap = ipfw_objhash_create(IPFW_OBJECTS_DEFAULT);
	ch->srvstate = malloc(sizeof(void *) * IPFW_OBJECTS_DEFAULT,
	    M_IPFW, M_WAITOK | M_ZERO);
}

void
ipfw_destroy_srv(struct ip_fw_chain *ch)
{

	free(ch->srvstate, M_IPFW);
	ipfw_objhash_destroy(ch->srvmap);
}

/*
 * Allocate new bitmask which can be used to enlarge/shrink
 * named instance index.
 */
void
ipfw_objhash_bitmap_alloc(uint32_t items, void **idx, int *pblocks)
{
	size_t size;
	int max_blocks;
	u_long *idx_mask;

	KASSERT((items % BLOCK_ITEMS) == 0,
	    ("bitmask size needs to be a power of 2 and greater than or equal to %zu",
	    BLOCK_ITEMS));

	max_blocks = items / BLOCK_ITEMS;
	size = items / 8;
	idx_mask = malloc(size * IPFW_MAX_SETS, M_IPFW, M_WAITOK);
	/* Mark all as free */
	memset(idx_mask, 0xFF, size * IPFW_MAX_SETS);
	*idx_mask &= ~(u_long)1; /* Skip index 0 */

	*idx = idx_mask;
	*pblocks = max_blocks;
}
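
/*
 * Illustrative sketch (#if 0) of how a free index is claimed from the
 * bitmap initialized above: a set bit means "free", index 0 stays
 * permanently reserved, and ffsl() returns the 1-based position of the
 * lowest set bit in each u_long block. The function name is
 * hypothetical; it mirrors the allocation logic the bitmap is built for.
 */
#if 0
static int
example_alloc_idx(u_long *idx_mask, int max_blocks, uint16_t *pidx)
{
	int i, v;

	for (i = 0; i < max_blocks; i++) {
		v = ffsl(idx_mask[i]);
		if (v == 0)
			continue;	/* no free slot in this block */
		idx_mask[i] &= ~((u_long)1 << (v - 1));	/* mark busy */
		*pidx = i * BLOCK_ITEMS + (v - 1);
		return (0);
	}
	return (1);			/* all indices in use */
}
#endif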

/*
 * Copy current bitmask index to new one.
 */
void
ipfw_objhash_bitmap_merge(struct namedobj_instance *ni, void **idx, int *blocks)
{
	int old_blocks, new_blocks;
	u_long *old_idx, *new_idx;
	int i;

	old_idx = ni->idx_mask;
	old_blocks = ni->max_blocks;
	new_idx = *idx;
	new_blocks = *blocks;

	for (i = 0; i < IPFW_MAX_SETS; i++) {
		memcpy(&new_idx[new_blocks * i], &old_idx[old_blocks * i],
		    old_blocks * sizeof(u_long));
	}
}

/*
 * Swaps current @ni index with new one.
 */
void
ipfw_objhash_bitmap_swap(struct namedobj_instance *ni, void **idx, int *blocks)
{
	int old_blocks;
	u_long *old_idx;

	old_idx = ni->idx_mask;
	old_blocks = ni->max_blocks;

	ni->idx_mask = *idx;
	ni->max_blocks = *blocks;

	/* Save old values */
	*idx = old_idx;
	*blocks = old_blocks;
}

void
ipfw_objhash_bitmap_free(void *idx, int blocks)
{

	free(idx, M_IPFW);
}
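
/*
 * Illustrative sketch (#if 0) of the resize protocol these four bitmap
 * functions exist for: allocate a bigger bitmap, merge the live one
 * into it, swap it in, and free what comes back out. After the swap,
 * the out parameters hold the old bitmap. Locking is omitted; in the
 * kernel this sequence would run with the UH write lock held.
 */
#if 0
static void
example_grow_index(struct namedobj_instance *ni, uint32_t new_items)
{
	void *new_idx;
	int new_blocks;

	ipfw_objhash_bitmap_alloc(new_items, &new_idx, &new_blocks);
	ipfw_objhash_bitmap_merge(ni, &new_idx, &new_blocks);
	ipfw_objhash_bitmap_swap(ni, &new_idx, &new_blocks);
	ipfw_objhash_bitmap_free(new_idx, new_blocks);	/* old bitmap */
}
#endif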

/*
 * Creates named hash instance.
 * Must be called without holding any locks.
 * Return pointer to new instance.
 */
struct namedobj_instance *
ipfw_objhash_create(uint32_t items)
{
	struct namedobj_instance *ni;
	int i;
	size_t size;

	size = sizeof(struct namedobj_instance) +
	    sizeof(struct namedobjects_head) * NAMEDOBJ_HASH_SIZE +
	    sizeof(struct namedobjects_head) * NAMEDOBJ_HASH_SIZE;

	ni = malloc(size, M_IPFW, M_WAITOK | M_ZERO);
	ni->nn_size = NAMEDOBJ_HASH_SIZE;
	ni->nv_size = NAMEDOBJ_HASH_SIZE;

	ni->names = (struct namedobjects_head *)(ni + 1);
	ni->values = &ni->names[ni->nn_size];

	for (i = 0; i < ni->nn_size; i++)
		TAILQ_INIT(&ni->names[i]);

	for (i = 0; i < ni->nv_size; i++)
		TAILQ_INIT(&ni->values[i]);

	/* Set default hashing/comparison functions */
	ni->hash_f = objhash_hash_name;
	ni->cmp_f = objhash_cmp_name;

	/* Allocate bitmask separately due to possible resize */
	ipfw_objhash_bitmap_alloc(items, (void *)&ni->idx_mask, &ni->max_blocks);

	return (ni);
}

void
ipfw_objhash_destroy(struct namedobj_instance *ni)
{

	free(ni->idx_mask, M_IPFW);
	free(ni, M_IPFW);
}

void
ipfw_objhash_set_funcs(struct namedobj_instance *ni, objhash_hash_f *hash_f,
    objhash_cmp_f *cmp_f)
{

	ni->hash_f = hash_f;
	ni->cmp_f = cmp_f;
}
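
/*
 * Illustrative sketch (#if 0): a consumer replacing the default
 * name-based callbacks through ipfw_objhash_set_funcs(), keying
 * objects by a numeric id passed via the @name pointer instead. Both
 * callbacks must agree on how @name and @set are interpreted; the
 * functions below are hypothetical (including the use of no->kidx as
 * the stored key), and cmp returns 0 on match to mirror
 * objhash_cmp_name().
 */
#if 0
static uint32_t
example_hash_u32(struct namedobj_instance *ni, const void *key, uint32_t set)
{

	return (*(const uint32_t *)key);
}

static int
example_cmp_u32(struct named_object *no, const void *key, uint32_t set)
{

	return (!(no->kidx == *(const uint32_t *)key && no->set == set));
}

/* usage: ipfw_objhash_set_funcs(ni, example_hash_u32, example_cmp_u32); */
#endif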

static uint32_t
objhash_hash_name(struct namedobj_instance *ni, const void *name, uint32_t set)
{

	return (fnv_32_str((const char *)name, FNV1_32_INIT));
}

static int
objhash_cmp_name(struct named_object *no, const void *name, uint32_t set)
{

	if ((strcmp(no->name, (const char *)name) == 0) && (no->set == set))
		return (0);

	return (1);
}

static uint32_t
objhash_hash_idx(struct namedobj_instance *ni, uint32_t val)
{
	uint32_t v;

	v = val % (ni->nv_size - 1);

	return (v);
}
|
|
|
|
|
|
|
|
struct named_object *
|
|
|
|
ipfw_objhash_lookup_name(struct namedobj_instance *ni, uint32_t set, char *name)
|
|
|
|
{
|
|
|
|
struct named_object *no;
|
|
|
|
uint32_t hash;
|
|
|
|
|

	hash = ni->hash_f(ni, name, set) % ni->nn_size;

	TAILQ_FOREACH(no, &ni->names[hash], nn_next) {
		if (ni->cmp_f(no, name, set) == 0)
			return (no);
	}

	return (NULL);
}

/*
 * Find named object by @uidx.
 * Check @tlvs for valid data inside.
 *
 * Returns pointer to found TLV or NULL.
 */
ipfw_obj_ntlv *
ipfw_find_name_tlv_type(void *tlvs, int len, uint16_t uidx, uint32_t etlv)
{
	ipfw_obj_ntlv *ntlv;
	uintptr_t pa, pe;
	int l;

	pa = (uintptr_t)tlvs;
	pe = pa + len;
	l = 0;
	for (; pa < pe; pa += l) {
		ntlv = (ipfw_obj_ntlv *)pa;
		l = ntlv->head.length;

		if (l != sizeof(*ntlv))
			return (NULL);

		if (ntlv->idx != uidx)
			continue;
		/*
		 * When userland has specified a zero TLV type, do
		 * not compare it against @etlv.  In some cases
		 * userland does not know which type the object
		 * should have, so search by @uidx and name only.
		 */
		if (ntlv->head.type != 0 &&
		    ntlv->head.type != (uint16_t)etlv)
			continue;

		if (ipfw_check_object_name_generic(ntlv->name) != 0)
			return (NULL);

		return (ntlv);
	}

	return (NULL);
}

/*
 * Finds object config based on either legacy index
 * or name in ntlv.
 * Note @ti structure contains unchecked data from userland.
 *
 * Returns 0 on success and fills in @pno with the found config.
 */
int
ipfw_objhash_find_type(struct namedobj_instance *ni, struct tid_info *ti,
    uint32_t etlv, struct named_object **pno)
{
	char *name;
	ipfw_obj_ntlv *ntlv;
	uint32_t set;

	if (ti->tlvs == NULL)
		return (EINVAL);

	ntlv = ipfw_find_name_tlv_type(ti->tlvs, ti->tlen, ti->uidx, etlv);
	if (ntlv == NULL)
		return (EINVAL);
	name = ntlv->name;

	/*
	 * Use the set provided by @ti instead of the @ntlv one.
	 * This is needed because sets behave differently depending
	 * on V_fw_tables_sets.
	 */
	set = ti->set;
	*pno = ipfw_objhash_lookup_name(ni, set, name);
	if (*pno == NULL)
		return (ESRCH);
	return (0);
}
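
/*
 * Illustrative sketch (not part of the original source): a typical caller
 * fills struct tid_info from a parsed sockopt request and lets
 * ipfw_objhash_find_type() resolve the name TLV into a config.  Variable
 * names are hypothetical; IPFW_TLV_TBL_NAME is assumed as the object type.
 *
 *	struct tid_info ti;
 *	struct named_object *no;
 *	int error;
 *
 *	memset(&ti, 0, sizeof(ti));
 *	ti.uidx = uidx;		// index taken from the rule opcode
 *	ti.set = set;		// set resolved by the caller
 *	ti.tlvs = tlvs;		// TLV area copied in from userland
 *	ti.tlen = tlen;		// its length in bytes
 *	error = ipfw_objhash_find_type(ni, &ti, IPFW_TLV_TBL_NAME, &no);
 *	if (error != 0)
 *		return (error);	// EINVAL: malformed TLV; ESRCH: not found
 */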

/*
 * Find named object by name, considering also its TLV type.
 */
struct named_object *
ipfw_objhash_lookup_name_type(struct namedobj_instance *ni, uint32_t set,
    uint32_t type, const char *name)
{
	struct named_object *no;
	uint32_t hash;

	hash = ni->hash_f(ni, name, set) % ni->nn_size;

	TAILQ_FOREACH(no, &ni->names[hash], nn_next) {
		if (ni->cmp_f(no, name, set) == 0 &&
		    no->etlv == (uint16_t)type)
			return (no);
	}

	return (NULL);
}

struct named_object *
ipfw_objhash_lookup_kidx(struct namedobj_instance *ni, uint16_t kidx)
{
	struct named_object *no;
	uint32_t hash;

	hash = objhash_hash_idx(ni, kidx);

	TAILQ_FOREACH(no, &ni->values[hash], nv_next) {
		if (no->kidx == kidx)
			return (no);
	}

	return (NULL);
}

int
ipfw_objhash_same_name(struct namedobj_instance *ni, struct named_object *a,
    struct named_object *b)
{

	if ((strcmp(a->name, b->name) == 0) && a->set == b->set)
		return (1);

	return (0);
}

void
ipfw_objhash_add(struct namedobj_instance *ni, struct named_object *no)
{
	uint32_t hash;

	hash = ni->hash_f(ni, no->name, no->set) % ni->nn_size;
	TAILQ_INSERT_HEAD(&ni->names[hash], no, nn_next);

	hash = objhash_hash_idx(ni, no->kidx);
	TAILQ_INSERT_HEAD(&ni->values[hash], no, nv_next);

	ni->count++;
}
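
/*
 * Illustrative sketch (not part of the original source): a typical
 * insert/lookup round trip.  "struct example_obj" and its fields are
 * hypothetical; consumers embed struct named_object in their own config,
 * fill name/set/etlv, allocate a kidx and call ipfw_objhash_add().
 *
 *	struct example_obj {
 *		struct named_object no;	// embedded, usually first member
 *		char name_storage[64];	// backing store for no.name
 *	};
 *
 *	obj->no.name = obj->name_storage;
 *	obj->no.set = set;
 *	obj->no.etlv = IPFW_TLV_TBL_NAME;	// assumed object type
 *	if (ipfw_objhash_alloc_idx(ni, &obj->no.kidx) != 0)
 *		return (ENOSPC);
 *	ipfw_objhash_add(ni, &obj->no);
 *
 *	// Later: O(1) average-case lookups by name or kernel index.
 *	no = ipfw_objhash_lookup_name(ni, set, obj->no.name);
 *	no = ipfw_objhash_lookup_kidx(ni, obj->no.kidx);
 */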

void
ipfw_objhash_del(struct namedobj_instance *ni, struct named_object *no)
{
	uint32_t hash;

	hash = ni->hash_f(ni, no->name, no->set) % ni->nn_size;
	TAILQ_REMOVE(&ni->names[hash], no, nn_next);

	hash = objhash_hash_idx(ni, no->kidx);
	TAILQ_REMOVE(&ni->values[hash], no, nv_next);

	ni->count--;
}

uint32_t
ipfw_objhash_count(struct namedobj_instance *ni)
{

	return (ni->count);
}

uint32_t
ipfw_objhash_count_type(struct namedobj_instance *ni, uint16_t type)
{
	struct named_object *no;
	uint32_t count;
	int i;

	count = 0;
	for (i = 0; i < ni->nn_size; i++) {
		TAILQ_FOREACH(no, &ni->names[i], nn_next) {
			if (no->etlv == type)
				count++;
		}
	}
	return (count);
}

/*
 * Runs @f for each found named object.
 * It is safe to delete objects from the callback.
 */
int
ipfw_objhash_foreach(struct namedobj_instance *ni, objhash_cb_t *f, void *arg)
{
	struct named_object *no, *no_tmp;
	int i, ret;

	for (i = 0; i < ni->nn_size; i++) {
		TAILQ_FOREACH_SAFE(no, &ni->names[i], nn_next, no_tmp) {
			ret = f(ni, no, arg);
			if (ret != 0)
				return (ret);
		}
	}
	return (0);
}
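
/*
 * Illustrative sketch (not part of the original source): because the walk
 * uses TAILQ_FOREACH_SAFE, a callback may delete the object it is handed.
 * The callback name is hypothetical; the objhash_cb_t signature is taken
 * from the invocation above.
 *
 *	static int
 *	example_del_set_cb(struct namedobj_instance *ni,
 *	    struct named_object *no, void *arg)
 *	{
 *		uint32_t set = *(uint32_t *)arg;
 *
 *		if (no->set == set)
 *			ipfw_objhash_del(ni, no);
 *		return (0);	// non-zero aborts the walk
 *	}
 *
 *	uint32_t set = 0;
 *	ipfw_objhash_foreach(ni, example_del_set_cb, &set);
 */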

/*
 * Runs @f for each found named object with type @type.
 * It is safe to delete objects from the callback.
 */
int
ipfw_objhash_foreach_type(struct namedobj_instance *ni, objhash_cb_t *f,
    void *arg, uint16_t type)
{
	struct named_object *no, *no_tmp;
	int i, ret;

	for (i = 0; i < ni->nn_size; i++) {
		TAILQ_FOREACH_SAFE(no, &ni->names[i], nn_next, no_tmp) {
			if (no->etlv != type)
				continue;
			ret = f(ni, no, arg);
			if (ret != 0)
				return (ret);
		}
	}
	return (0);
}

/*
 * Marks the given index as free in the instance bitmask.
 * Returns 0 on success.
 */
int
ipfw_objhash_free_idx(struct namedobj_instance *ni, uint16_t idx)
{
	u_long *mask;
	int i, v;

	i = idx / BLOCK_ITEMS;
	v = idx % BLOCK_ITEMS;

	if (i >= ni->max_blocks)
		return (1);

	mask = &ni->idx_mask[i];

	if ((*mask & ((u_long)1 << v)) != 0)
		return (1);

	/* Mark as free */
	*mask |= (u_long)1 << v;

	/* Update free offset */
	if (ni->free_off[0] > i)
		ni->free_off[0] = i;

	return (0);
}

/*
 * Allocates a new index in the given instance and stores it in @pidx.
 * Returns 0 on success.
 */
int
ipfw_objhash_alloc_idx(void *n, uint16_t *pidx)
{
	struct namedobj_instance *ni;
	u_long *mask;
	int i, off, v;

	ni = (struct namedobj_instance *)n;

	off = ni->free_off[0];
	mask = &ni->idx_mask[off];

	for (i = off; i < ni->max_blocks; i++, mask++) {
		if ((v = ffsl(*mask)) == 0)
			continue;

		/* Mark as busy */
		*mask &= ~((u_long)1 << (v - 1));

		ni->free_off[0] = i;

		v = BLOCK_ITEMS * i + v - 1;

		*pidx = v;
		return (0);
	}

	return (1);
}
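
/*
 * Illustrative sketch (not part of the original source): pairing the
 * allocator with the release path.  ffsl() returns the 1-based position of
 * the first set ("free") bit, so the scan hands out the lowest free index
 * at or above the cached free offset.  Variable names are hypothetical.
 *
 *	uint16_t kidx;
 *
 *	if (ipfw_objhash_alloc_idx(ni, &kidx) != 0)
 *		return (ENOSPC);	// bitmap full; caller may resize
 *	// ... use kidx as the object's kernel index ...
 *	ipfw_objhash_free_idx(ni, kidx);	// release on teardown
 */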

/* end of file */