212 Commits

Author SHA1 Message Date
melifaro
f9fb63fe8c Add API to ease adding new algorithms/new tabletypes to ipfw.
Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
  Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
  new ip_fw_table_algo.c file.
  Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+       char            name[64];
+       int             idx;
+       ta_init         *init;
+       ta_destroy      *destroy;
+       table_lookup_t  *lookup;
+       ta_prepare_add  *prepare_add;
+       ta_prepare_del  *prepare_del;
+       ta_add          *add;
+       ta_del          *del;
+       ta_flush_entry  *flush_entry;
+       ta_foreach      *foreach;
+       ta_dump_entry   *dump_entry;
+       ta_dump_xentry  *dump_xentry;
+};

* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
   ->tablestate pointer (array of 32 bytes structures necessary for
   runtime lookups (can be probably shrinked to 16 bytes later):

   +struct table_info {
   +       table_lookup_t  *lookup;        /* Lookup function */
   +       void            *state;         /* Lookup radix/other structure */
   +       void            *xstate;        /* eXtended state */
   +       u_long          data;           /* Hints for given func */
   +};

* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_clt 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().

* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
    implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
    currenly implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).

Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)

Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embeded opcodes.
* Add/improve support for destroy/info cmds.
2014-06-14 10:58:39 +00:00
melifaro
01ec53e019 Make ipfw tables use names as used-level identifier internally:
* Add namedobject set-aware api capable of searching/allocation objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode

Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize

Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced

Sponsored by:	Yandex LLC
2014-06-12 09:59:11 +00:00
hiren
a877260646 DNOLD_IS_ECN introduced by r266941 is not required.
DNOLD_* flags are for compat with old binaries.

Suggested by: luigi
2014-06-01 20:19:17 +00:00
hiren
cc47b6d947 ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by:	Midori Kato (aoimidori27@gmail.com)
Worked with:	Lars Eggert (lars@netapp.com)
Reviewed by:	luigi, hiren
2014-06-01 07:28:24 +00:00
ae
26693dfcb9 Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR:		189655
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-05-18 14:25:19 +00:00
melifaro
f4783a05e9 Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR:		bin/189471
Submitted by:	Dennis Yusupoff <dyr@smartspb.net>
MFC after:	2 weeks
2014-05-17 13:45:03 +00:00
trociny
bd951d3fdb Define startup order the same way as it is in dummynet. 2014-04-26 08:05:16 +00:00
ae
d70382e43a Set oif only for outgoing packets.
PR:		188543
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-04-16 14:37:11 +00:00
brueffer
f19c513644 Free resources and error cases; re-indent a curly brace while here.
CID:		1199366
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-04-13 21:13:33 +00:00
glebius
0825c0b36c Fix breakage in ipfw+VIMAGE after r261590.
PR:		kern/187665
Sponsored by:	Nginx, Inc.
2014-03-21 17:07:18 +00:00
gnn
1804af5050 Summary: Two quick edits to the implementation notes as they're no
longer stored in netinet but in netpfil.
2014-02-15 18:36:31 +00:00
dim
4e20d58892 Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.

MFC after:	3 days
2014-02-15 12:25:01 +00:00
melifaro
c32089edca Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some usused fields

Sponsored by:	Yandex LLC
2014-01-24 09:13:30 +00:00
melifaro
104ab6ec12 Revert r260548. We really should not use IPFW_WLOCK() here
but this requires some more playing with IPFW_UH_WLOCK(). Leave till later.
2014-01-11 18:27:34 +00:00
melifaro
9f930faa0d We don't need chain write lock since we're not modifying its contents.
LibAliasSetAddress() uses its own mutex to serialize changes.

While here, convert ifp->if_xname access to if_name() function.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-01-11 16:50:41 +00:00
melifaro
c491eeb2f3 Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empy network mask.

MFC after:	2 weeks
2014-01-03 23:11:26 +00:00
melifaro
ce16a97371 Add net.inet.ip.fw.dyn_keep_states sysctl which
re-links dynamic states to default rule instead of
flushing on rule deletion.
This can be useful while performing ruleset reload
(think about `atomic` reload via changing sets).
Currently it is turned off by default.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-12-18 20:17:05 +00:00
melifaro
031fdfe55b Simplify O_NAT opcode handling.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 15:28:51 +00:00
melifaro
c9cfc8e322 Check ipfw table numbers in both user and kernel space before rule addition.
Found by:	Saychik Pavel <umka@localka.net>
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2013-11-28 10:28:28 +00:00
rodrigc
ad77255ba1 In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);".
This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit().

Therefore, vnet_ipfw_nat_uninit() *must* be called before vnet_ipfw_uninit(),
but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions.
In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c,
IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN
and
IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER

Consequently, if VIMAGE is enabled, and jails are created and destroyed,
the system sometimes crashes, because we are trying to use a deleted lock.

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL,
so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().
2013-11-25 20:20:34 +00:00
luigi
de6fdc14ad add a counter on the struct mq (a queue of mbufs),
and add a block for userspace compiling.
2013-11-22 05:02:37 +00:00
luigi
9b36fc77a2 disable some ipfw match options when compiling in userspace 2013-11-22 05:01:38 +00:00
luigi
f0a80e6e72 make this code compile in userspace on OSX 2013-11-22 05:00:18 +00:00
luigi
0a6b47e718 more support for userspace compiling of this code:
emulate the uma_zone for dynamic rules.
2013-11-22 04:59:17 +00:00
luigi
b0b354e495 make ipfw_check_packet() and ipfw_check_frame() public,
so they can be used in the userspace version of ipfw/dummynet
(normally using netmap for the I/O path).

This is the first of a few commits to ease compiling the
ipfw kernel code in userspace.
2013-11-22 04:57:50 +00:00
glebius
bce78dfe17 Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.
2013-11-05 07:32:09 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
e352aa585e Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with:		bz
2013-10-27 16:25:57 +00:00
glebius
2c1ec831c9 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
philip
eb3631100b Use the correct EtherType for logging IPv6 packets.
Reviewed by:	melifaro
Approved by:	re (kib, glebius)
MFC after:	3 days
2013-09-28 15:49:36 +00:00
mav
413bf347cd Make dummynet use new direct callout(9) execution mechanism. Since the only
thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't
need extra switch to the clock SWI context.

On idle system this change in half reduces number of active CPU cycles and
wakes up only one CPU from sleep instead of two.

I was going to make this change much earlier as part of calloutng project,
but waited for better solution with skipping idle ticks to be implemented.
Unfortunately with 10.0 release coming it is better get at least this.
2013-08-24 13:34:36 +00:00
trociny
583ac34809 Make ipfw nat init/unint work correctly for VIMAGE:
* Do per vnet instance cleanup (previously it was only for vnet0 on
  module unload, and led to libalias leaks and possible panics due to
  stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
  vnet0 lock (which does not prevent pointers access from another
  vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
  set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
  all vnets.

* It is supposed that ifaddr_change event handler is called in the
  interface vnet context, so add an assertion.

Reviewed by:	zec
MFC after:	2 weeks
2013-08-24 11:59:51 +00:00
melifaro
858e632fa7 Use unified method for accessing / updating cached rule pointers.
MFC after:	2 weeks
2013-05-04 18:24:30 +00:00
eadler
a5a9ec51d6 Correct a few sizeof()s
Submitted by:	swildner@DragonFlyBSD.org
Reviewed by:	alfred
2013-05-01 04:37:34 +00:00
glebius
ccddbf9365 Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by:	pho
2013-04-29 06:11:19 +00:00
glebius
b4bc270e8f Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
melifaro
bbeb8a5ba2 Fix ipfw rule validation partially broken by r248552.
Pointed by:	avg
MFC with:	r248552
2013-04-01 11:28:52 +00:00
ae
3d1df10de4 When we are removing a specific set, call ipfw_expire_dyn_rules only once.
Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-25 07:43:46 +00:00
melifaro
31a6358fff Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
ae
23037c29f1 Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.
2013-03-19 06:04:17 +00:00
melifaro
063bdc75f8 Fix callout expiring dynamic rules.
PR:		kern/175530
Submitted by:	Vladimir Spiridenkov <vs@gtn.ru>
MFC after:	2 weeks
2013-03-02 14:47:10 +00:00
melifaro
a7a75993c7 Add parentheses to IP_FW_ARG_TABLEARG() definition.
Suggested by:	glebius
MFC with:	r244633
2012-12-23 18:35:42 +00:00
melifaro
911df5a332 Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by:	Vitaliy Tokarenko <rphone@ukr.net>
MFC after:	2 weeks
2012-12-23 16:28:18 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
melifaro
6a45724ec7 Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after:	3 weeks
2012-11-30 19:36:55 +00:00
melifaro
c07e3ec124 Make ipfw dynamic states operations SMP-ready.
* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with:	ipfw
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2012-11-30 16:33:22 +00:00
melifaro
b2297df0fc Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Reviewed by:	ae(basically)
MFC after:	2 weeks
2012-11-09 18:23:38 +00:00
melifaro
e570ee3854 Add assertion to enforce 'nat global' locking requierements changed by r241908.
Suggested by:	adrian, glebius
MFC after:	3 days
2012-11-05 22:54:00 +00:00
melifaro
6056a71b0e Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

MFC after:	1 week
2012-11-05 22:30:56 +00:00