Commit Graph

484 Commits

Author SHA1 Message Date
Luigi Rizzo
4af7aed7c6 assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors):
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.
2015-07-10 19:24:36 +00:00
Luigi Rizzo
e38e277fc4 one more warning suppression when compiling the test code in userspace. 2015-07-10 19:18:49 +00:00
Luigi Rizzo
e25716b7cc add code to compute fairness indexes;
cleanups to remove compile warnings.
2015-07-10 18:10:40 +00:00
Ermal Luçi
a5b789f65a ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
Kristof Provost
06ba348d27 pf: Remove frc_direction
We don't use the direction of the fragments for anything. The frc_direction
field is assigned, but never read.
Just remove it.

Differential Revision:	https://reviews.freebsd.org/D2773
Approved by:	philip (mentor)
2015-06-11 17:57:47 +00:00
Kristof Provost
837b925aba pf: Save the protocol number in the pf_fragment
When we try to look up a pf_fragment with pf_find_fragment() we compare (see
pf_frag_compare()) addresses (and family), id but also protocol.  We failed to
save the protocol to the pf_fragment in pf_fragcache(), resulting in failing
reassembly.

Differential Revision:	https://reviews.freebsd.org/D2772
2015-06-11 13:26:16 +00:00
Kristof Provost
0b7eba6ad4 pf: address family must be set when creating a pf_fragment
Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set.
In that scenario we take a different branch in pf_normalize_ip(), taking us to
pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a
pf_fragment, but do not set the address family. This leads to a panic when we
try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to
compare the pf_fragments doesn't know what to do if the address family is not
set.

Simply ensure that the address family is set correctly (always AF_INET in this
path).

PR:			200330
Differential Revision:	https://reviews.freebsd.org/D2769
Approved by:		philip (mentor), gnn (mentor)
2015-06-10 13:44:04 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Luigi Rizzo
62f42cf8ee use proper types to represent function pointers 2015-05-19 16:51:30 +00:00
Luigi Rizzo
352bc63d72 remove a redundant ; at the end of a function
MFC after:	1 week
2015-05-19 15:29:00 +00:00
Luigi Rizzo
bebf3c825f remove an extra ; after MODULE_DEPEND
(would otherwise generate a warning with more verbose compiler flags)

MFC after:	1 week
2015-05-19 14:49:31 +00:00
Gleb Smirnoff
3dd01a884c Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization
from associated structures initialization.  The mutexes are global, while
the structures are per-vnet.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
2015-05-19 14:04:21 +00:00
Gleb Smirnoff
30fe681e44 During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.

Sponsored by:	Nginx, Inc.
2015-05-19 14:02:40 +00:00
Gleb Smirnoff
78680d05d1 A miss from r283061: don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:51:27 +00:00
Gleb Smirnoff
b7f69c506d Don't dereference NULL is pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:05:12 +00:00
Luigi Rizzo
8ff71b031e bugfix (only affecting the "lookup" option in the userspace version of ipfw):
the conditional block should not include the 'else' otherwise
the code does a 'break;' without completing the check
2015-05-13 11:53:25 +00:00
Alexander V. Chernikov
e09c1944a3 Remove ptei->value check from ipfw_link_table_values():
even if there was non-zero number of restarts, we would unref/clear
  all value references and start ipfw_link_table_values() once again
  with (mostly) cleared "tei" buffer.
 Additionally, ptei->ptv stores only to-be-added values, not existing ones.
 This is a forgotten piece of previous value refconting implementation,
  and now it is simply incorrect.
2015-05-12 20:42:42 +00:00
Alexander V. Chernikov
b45fa3fad6 Fix panic when prepare_batch_buffer() returns error. 2015-05-06 07:53:43 +00:00
Alexander V. Chernikov
caf993912e Fix KASSERT introduced in r282155.
Found by:	dhw
2015-04-30 21:51:12 +00:00
Alexander V. Chernikov
e948489558 Fix panic introduced by r282070.
Arm friendly KASSERT() to ease debug of similar crashes.

Submitted by:	Olivier Cochard-Labbé
2015-04-28 17:05:55 +00:00
Alexander V. Chernikov
a1bddc75b4 Fix 'may be used uninitialized' warning not caught by clang. 2015-04-27 10:01:22 +00:00
Alexander V. Chernikov
1a458088ff Use free_nat_instance() for nat instance deletion.
Sponsored by:	Yandex LLC
2015-04-27 09:16:22 +00:00
Alexander V. Chernikov
74b22066b0 Make rule table kernel-index rewriting support any kind of objects.
Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
 values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
 do a) for them.

Kernel does almost the same when exporting rules to userland:
 prepares array of used tables in all rules in range, and
 prepends it before the actual ruleset retaining actual in-kernel
 indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
  between userland string identifier and in-kernel index.
  (See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
 shared index and names instead of numbers (in next commit).

Sponsored by:	Yandex LLC
2015-04-27 08:29:39 +00:00
Gleb Smirnoff
fdf6290ea9 Fix memory leak.
PR:		199670
Reviewed by:	ae
2015-04-27 05:44:09 +00:00
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Kristof Provost
3d1bbe5fa0 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
George V. Neville-Neil
916e17fd56 I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
Kristof Provost
1873dcc8c9 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Kristof Provost
7dce9b515b pf: Deal with runt packets
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
Kristof Provost
798318490e Preserve IPv6 fragment IDs accross reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Andrey V. Elsukov
bf55a0034d The offset variable has been cleared all bits except IP6F_OFF_MASK.
Use ip6f_mf variable instead of checking its bits.
2015-03-31 14:41:29 +00:00
Sergey Kandaurov
a4879be402 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
2530ed9e70 Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value
to obtain IPv4 next hop address in tablearg case.

Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.

Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.

Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.

Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.

Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.

Differential Revision:	https://reviews.freebsd.org/D2015
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-03-13 09:03:25 +00:00
Andrey V. Elsukov
998fbd14b8 Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
Gleb Smirnoff
4ac6485cc6 Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
39a58828ef In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
Gleb Smirnoff
f5ceb22b78 Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
Alexander V. Chernikov
9f925e8a92 Fix IP_FW_NAT44_LIST_NAT size calculation.
Found by:	lev
Sponsored by:	Yandex LLC
2015-02-05 14:54:53 +00:00
Alexander V. Chernikov
0caab00959 * Make sure table algorithm destroy hook is always called without locks
* Explicitly lock freeing interface references in ta_destroy_ifidx
* Change ipfw_iface_unref() to require UH lock
* Add forgotten ipfw_iface_unref() to destroy_ifidx_locked()

PR:		kern/197276
Submitted by:	lev
Sponsored by:	Yandex LLC
2015-02-05 13:49:04 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Alexander V. Chernikov
0b47e42b49 Use ipfw runtime lock only when real modification is required. 2015-01-16 10:49:27 +00:00
Craig Rodrigues
7259906eb0 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
4de985af0b Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Ermal Luçi
7b56cc430a pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
Alexander V. Chernikov
5b07fc31cc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
Alexander V. Chernikov
a458ad86ee Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Alexander V. Chernikov
038263c36a Remove unused variable.
Found by:	Coverity
CID:		1245739
2014-11-04 10:25:52 +00:00
Alexander V. Chernikov
552eb491ab Bump default dynamic limit to 16k entries.
Print better log message when limit is hit.

PR:		193300
Submitted by:	me at nileshgr.com
2014-10-24 13:57:15 +00:00
Alexander V. Chernikov
9e3a53fd35 Rename log2 to tal_log2.
Submitted by:	luigi
2014-10-22 21:20:37 +00:00
Luigi Rizzo
03be41e6a4 remove/fix old code for building ipfw and dummynet in userspace 2014-10-22 05:21:36 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Alexander V. Chernikov
54b38fcf03 Use copyout() directly instead of updating various fields
before/after each sooptcopyout() call.

Found by:	luigi
Sponsored by:	Yandex LLC
2014-10-20 11:21:07 +00:00
Alexander V. Chernikov
4040f4ecd6 Perform more checks on the number of tables supplied by user. 2014-10-19 11:15:19 +00:00
Dag-Erling Smørgrav
99e9de871a Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
Alexander V. Chernikov
0d90989bef Use IPFW_RULE_CNTR_SIZE macro instead of non-relevant ip_fw_cntr structure.
Found by:	luigi
2014-10-18 17:23:41 +00:00
Alexander V. Chernikov
2930362fb1 Fix matching default rule on clear/show commands.
Found by:	Oleg Ginzburg
2014-10-13 13:49:28 +00:00
Alexander V. Chernikov
956f6d3a3c Fix KASSERT typo. 2014-10-11 15:04:50 +00:00
Alexander V. Chernikov
3fd16a3a72 Remove redundant if_notifier declaration. 2014-10-10 20:37:06 +00:00
George V. Neville-Neil
1d2baefc13 Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
Alexander V. Chernikov
5f8ad2bd82 Fix KASSERT argument type. 2014-10-10 18:57:12 +00:00
Alexander V. Chernikov
d699ee2dc9 Fix NOINET6 build for ipfw. 2014-10-10 18:31:35 +00:00
Alexander V. Chernikov
9fe15d0612 Partially fix build on !amd64
Pointed by:	bz
2014-10-10 17:24:56 +00:00
Alexander V. Chernikov
a13a821641 Merge projects/ipfw to HEAD.
Main user-visible changes are related to tables:

* Tables are now identified by names, not numbers.
 There can be up to 65k tables with up to 63-byte long names.
* Tables are now set-aware (default off), so you can switch/move
 them atomically with rules.
* More functionality is supported (swap, lock, limits, user-level lookup,
 batched add/del) by generic table code.
* New table types are added (flow) so you can match multiple packet fields at once.
* Ability to add different type of lookup algorithms for particular
 table type has been added.
* New table algorithms are added (cidr:hash, iface:array, number:array and
 flow:hash) to make certain types of lookup more effective.
* Table value are now capable of holding multiple data fields for
  different tablearg users

Performance changes:
* Main ipfw lock was converted to rmlock
* Rule counters were separated from rule itself and made per-cpu.
* Radix table entries fits into 128 bytes
* struct ip_fw is now more compact so more rules will fit into 64 bytes
* interface tables uses array of existing ifindexes for faster match

ABI changes:
All functionality supported by old ipfw(8) remains functional.
 Old & new binaries can work together with the following restrictions:
* Tables named other than ^\d+$ are shown as table(65535) in
 ruleset in old binaries

Internal changes:.
Changing table ids to numbers resulted in format modification for
 most sockopt codes. Old sopt format was compact, but very hard to
 extend (no versioning, inability to add more opcodes), so
* All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes.
* The remaining opcodes were also converted to be able to eliminate
 all older opcodes at once
* All IP_FW3 handlers uses special API instead of calling sooptcopy*
 directly to ease adding another communication methods
* struct ip_fw is now different for kernel and userland
* tablearg value has been changed to 0 to ease future extensions
* table "values" are now indexes in special value array which
 holds extended data for given index
* Batched add/delete has been added to tables code
* Most changes has been done to permit batched rule addition.
* interface tracking API has been added (started on demand)
 to permit effective interface tables operations
* O(1) skipto cache, currently turned off by default at
 compile-time (eats 512K).

* Several steps has been made towards making libipfw:
  * most of new functions were separated into "parse/prepare/show
    and actuall-do-stuff" pieces (already merged).
  * there are separate functions for parsing text string into "struct ip_fw"
    and printing "struct ip_fw" to supplied buffer (already merged).
* Probably some more less significant/forgotten features

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-10-09 19:32:35 +00:00
Alexander V. Chernikov
f9ab623bf2 Bump ipfw module version. 2014-10-09 16:12:01 +00:00
Alexander V. Chernikov
779b53d008 Sync to HEAD@r272825. 2014-10-09 15:35:28 +00:00
Alexander V. Chernikov
4c060d851c Fix core on table destroy inroduced by table values code.
Rename @ti array copy to 'ti_copy'.
2014-10-09 14:33:20 +00:00
Alexander V. Chernikov
ce575f539f * Wire large user buffer before processing GET request.
* Fix incorrect size calculation for IP_FW_XGET request.
2014-10-09 12:37:53 +00:00
Alexander V. Chernikov
be8bc45790 Add IP_FW_DUMP_SOPTCODES sopt to be able to determine
which opcodes are currently available in kernel.
2014-10-08 11:12:14 +00:00
Alexander V. Chernikov
eadf3b965c Fix possible crash when old value pointer is not updated after array resize. 2014-10-07 18:22:05 +00:00
Alexander V. Chernikov
79e86902e9 Notify table algo aboute runtime data change on table flush. 2014-10-07 16:46:11 +00:00
Alexander V. Chernikov
8ebca97f5e * Fix crash in interface tracker due to using old "linked" field.
* Ensure we're flushing entries without any locks held.
* Free memory in (rare) case when interface tracker fails to register ifp.
* Add KASSERT on table values refcounts.
2014-10-07 10:54:53 +00:00
Alexander V. Chernikov
bbd5a84297 Improve r272609 (O_TCPOPTS).
MFC after:	3 dayes
2014-10-06 12:29:06 +00:00
Alexander V. Chernikov
a5fedf11fc Sync to HEAD@r272609. 2014-10-06 11:29:50 +00:00
Alexander V. Chernikov
3615981425 Fix O_TCPOPTS processing.
Obtained from:	luigi
2014-10-06 11:15:11 +00:00
Alexander V. Chernikov
d4e1b51578 Fix build with gcc. 2014-10-04 13:57:14 +00:00
Alexander V. Chernikov
e530ca7333 Please GCC by specifying proper cast. 2014-10-04 13:46:10 +00:00
Alexander V. Chernikov
e3cadfdb32 Bump max rule size to 512 opcodes. 2014-10-04 12:46:26 +00:00
Alexander V. Chernikov
1ce4b35740 Sync to HEAD@r272516. 2014-10-04 12:42:37 +00:00
Alexander V. Chernikov
60805b89df Add "ipfw_ctl3" FEATURE to indicate presence of new ipfw interface. 2014-10-04 12:10:32 +00:00
Alexander V. Chernikov
ccba94b8fc Switch ipfw to use rmlock for runtime locking. 2014-10-04 11:40:35 +00:00
Alexander V. Chernikov
be3cc1b567 Bump max rule size to 512 opcodes. 2014-10-04 10:15:49 +00:00
Alexander V. Chernikov
f8350f3a23 Make linear_skipto turned off by default. 2014-10-03 15:54:51 +00:00
Alexander V. Chernikov
31f0d081d8 Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
Gleb Smirnoff
495a22b595 Use rn_detachhead() instead of direct free(9) for radix tables.
Sponsored by:	Nginx, Inc.
2014-10-01 13:35:41 +00:00
Sean Bruno
488c0a7ca8 Fix NULL pointer deref in ipfw when using dummynet at layer 2.
Drop packet if pkg->ifp is NULL, which is the case here.

ref. https://github.com/HardenedBSD/hardenedBSD
commit 4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7

PR 193861 --  DUMMYNET LAYER2: kernel panic

in this case a kernel panic occurs. Hence, when we do not get an interface,
we just drop the packet in question.

PR:		193681
Submitted by:	David Carlier <david.carlier@hardenedbsd.org>
Obtained from:	Hardened BSD
MFC after:	2 weeks
Relnotes:	yes
2014-09-25 02:26:05 +00:00
Alexander V. Chernikov
b1d105bc68 Add pre-alfa version of DXR lookup module.
It does build but (currently) does not work.

This change is not intended to be merged along with other ipfw changes.
2014-09-21 18:15:09 +00:00
Gleb Smirnoff
2a6009bfa6 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Gleb Smirnoff
450cecf0a0 - Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
  DIOCSTOP ioctls.

Reported & tested by:	jmallett
Sponsored by:		Nginx, Inc.
2014-09-12 08:39:15 +00:00
Alexander V. Chernikov
d6164b77f8 Make ipfw_nat module use IP_FW3 codes.
Kernel changes:
* Split kernel/userland nat structures eliminating IPFW_INTERNAL hack.
* Add IP_FW_NAT44_* codes resemblin old ones.
* Assume that instances can be named (no kernel support currently).
* Use both UH+WLOCK locks for all configuration changes.
* Provide full ABI support for old sockopts.

Userland changes:
* Use IP_FW_NAT44_* codes for nat operations.
* Remove undocumented ability to show ranges of nat "log" entries.
2014-09-07 18:30:29 +00:00
Alexander V. Chernikov
1a33e79969 Change copyrights to the proper one. 2014-09-05 14:19:02 +00:00
Alexander V. Chernikov
c9daea0b86 Sync to HEAD@r271160. 2014-09-05 13:52:39 +00:00
Alexander V. Chernikov
6b988f3a27 * Use modular opcode handling inside ipfw_ctl3() instead of static switch.
* Provide hints for subsystem initializers if they are called for
  the first/last time.
* Convert every IP_FW3 opcode user to use new sopt API.
2014-09-05 11:11:15 +00:00
Alexander V. Chernikov
e822d9364e Be consistent and use same arguments for ctl3 opcodes.
Move legacy IP_FW_TABLE_XGETSIZE handling to separate function.
2014-09-03 21:57:06 +00:00
Gleb Smirnoff
bf7dcda366 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
Alexander V. Chernikov
fb4b37a357 * Fix crash due to forgotten value refcouting in ipfw_link_table_values()
* Fix argument order in rollback_toperation_state()
* Make flush_table() use operation state API to ease checks.
2014-09-02 20:46:18 +00:00
Alexander V. Chernikov
71af39bf34 Add more comments on newly-added functions.
Add back opstate handler function.
2014-09-02 14:27:12 +00:00
Gleb Smirnoff
b616ae250c Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR:		177808
Submitted by:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by:	InnoGames GmbH
2014-09-01 13:00:45 +00:00
Alexander V. Chernikov
0cba2b2802 Add support for multi-field values inside ipfw tables.
This is the last major change in given branch.

Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
  each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
  table items. Currently table addition may required multiple UH drops/
  acquires which is quite tricky due to atomic table modificatio/swap
  support, shared array resize, etc. Deal with it by calling special
  notifier capable of rolling back state before actually performing
  swap/resize operations. Original operation then restarts itself after
  acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.

Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags dixes to support u32 values.
* valtype is now bitmask of
  <skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
  New values can hold distinct values for each of this types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..

Some examples:

3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
 kindex: 2, type: addr
 references: 0, valtype: skipto,limit,ipv4,ipv6
 algorithm: addr:radix
 items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1
2014-08-31 23:51:09 +00:00
Alexander V. Chernikov
1326363253 * Make objhash api a bit more abstract by providing ability to specify
own hash/compare functions.
* Add requirement for table algorithms to copy "valie" field in @add
  callback instead of "prepare_add".
* Document existing requirement for table algorithms to store value
  of deleted record to @tei.
2014-08-30 17:18:11 +00:00
Alexander V. Chernikov
e86bb35d63 Whitespace/style changes merged from projects/ipfw. 2014-08-23 17:57:06 +00:00
Alexander V. Chernikov
832fd78087 Sync to HEAD@r270409. 2014-08-23 14:58:31 +00:00
Alexander V. Chernikov
867708f7eb Simplify table reference/create chain. 2014-08-23 12:41:39 +00:00
Alexander V. Chernikov
4dff4ae028 * Use OP_ADD/OP_DEL macro instead of plain integers.
* ipfw_foreach_table_tentry() to permit listing
  arbitrary ipfw table using standart format.
2014-08-23 11:27:49 +00:00
Gleb Smirnoff
e85343b1a5 Do not lookup source node twice when pf_map_addr() is used.
PR:		184003
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:16:08 +00:00
Gleb Smirnoff
afab0f7e01 pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR:		183997
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:02:24 +00:00
Alexander V. Chernikov
4bbd15771b Make room for multi-type values in struct tentry. 2014-08-15 12:58:32 +00:00
Gleb Smirnoff
11341cf97e Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.
PR:		127920
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 04:35:34 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Alexander V. Chernikov
c21034b744 Replace "cidr" table type with "addr" type.
Suggested by:	luigi
2014-08-14 21:43:20 +00:00
Alexander V. Chernikov
d3b00c08bc * Add cidr:kfib algo type just for fun. It binds kernel fib
of given number to a table.

Example:
# ipfw table fib2 create algo "cidr:kfib fib=2"
# ipfw table fib2 info
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
# ipfw table fib2 list
+++ table(fib2), set(0) +++
10.0.0.0/24 0
127.0.0.1/32 0
::/96 0
::1/128 0
::ffff:0.0.0.0/96 0
2a02:978:2::/112 0
fe80::/10 0
fe80:1::/64 0
fe80:2::/64 0
fe80:3::/64 0
ff02::/16 0
# ipfw table fib2 lookup 10.0.0.5
10.0.0.0/24 0
# ipfw table fib2 lookup 2a02:978:2::11
2a02:978:2::/112 0
# ipfw table fib2 detail
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
 IPv4 algorithm radix info
  items: 0 itemsize: 200
 IPv6 algorithm radix info
  items: 0 itemsize: 200
2014-08-14 20:17:23 +00:00
Gleb Smirnoff
a9572d8f02 - Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
  use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
  and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-14 18:57:46 +00:00
Alexander V. Chernikov
fd0869d547 * Document internal commands.
* Do not require/set default table type if algo name is specified.
* Add TA_FLAG_READONLY option for algorithms.
2014-08-14 17:31:04 +00:00
Alexander V. Chernikov
98eff10e84 Clean up kernel interaction in ip_fw_iface.c
Suggested by:	ae
2014-08-14 13:24:59 +00:00
Alexander V. Chernikov
35d5a820e5 Fix crash in case of iflist request on non-initialized tracker. 2014-08-14 08:42:16 +00:00
Alexander V. Chernikov
18ad419788 * Fix displaying dynamic rules for large rulesets.
* Clean up some comments.
2014-08-14 08:21:22 +00:00
Alexander V. Chernikov
fddbbf75c8 Fix assertion. 2014-08-13 16:53:12 +00:00
Alexander V. Chernikov
1b833d535b Sync to HEAD@r269943. 2014-08-13 16:20:41 +00:00
Alexander V. Chernikov
40e5f498de * Pass proper table set numbers from userland side.
* Ignore them, but honor V_fw_tables_sets value on kernel side.
2014-08-13 12:04:45 +00:00
Alexander V. Chernikov
ce743e5c77 * Add jump_linear() function utilizing calculated skipto cache.
* Update description for jump_fast()
* Make jump_fast() users use JUMP() macro which is resolved to
    jump_fast() by default.
2014-08-13 09:34:33 +00:00
Alexander V. Chernikov
c8d5d3088b * Clarify ipfw_swap_table operations
* Ensure <add|del>_table_entry handle ta change properly.
2014-08-12 17:03:13 +00:00
Alexander V. Chernikov
e5eec6dd21 * Rename ipfw_[un]bind_table_rule to ipfw_[un]ref_rule_tables
* Update their descriptions.
2014-08-12 16:08:13 +00:00
Alexander V. Chernikov
1940fa7727 Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes does not accept 0 as valid value:
 O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
 O_NETGRAPH, O_NGTEE, O_NAT treats 0 as invalid input.

The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.

This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.
2014-08-12 15:51:48 +00:00
Alexander V. Chernikov
56f43a5e98 Do not use index 0 for tables. 2014-08-12 14:19:45 +00:00
Alexander V. Chernikov
301290bc6d * Rename has_space to need_modify to be consistent with 0 as return values.
* document all callbacks supported by algorithms code.
2014-08-12 14:09:15 +00:00
Alexander V. Chernikov
f99fbf96c4 No functional changes, do better functions grouping. 2014-08-12 10:22:46 +00:00
Alexander V. Chernikov
0468c5bae9 Simplify table auto-creation for old userland users. 2014-08-12 09:48:54 +00:00
Alexander V. Chernikov
1bc0d45749 Simplify add/del_table_entry() by making their common pieces
common functions.
2014-08-11 22:38:13 +00:00
Alexander V. Chernikov
35e1bbd089 Update functions descriptions. 2014-08-11 20:00:51 +00:00
Alexander V. Chernikov
4f43138ade * Add the abilify to lock/unlock given table from changes.
Example:

# ipfw table si lock
# ipfw table si info
+++ table(si), set(0) +++
 kindex: 0, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:radix
 items: 0, size: 288
# ipfw table si add 4.5.6.7
ignored: 4.5.6.7/32 0
ipfw: Adding record failed: table is locked
# ipfw table si unlock
# ipfw table si add 4.5.6.7
added: 4.5.6.7/32 0
# ipfw table si lock
# ipfw table si delete 4.5.6.7
ignored: 4.5.6.7/32 0
ipfw: Deleting record failed: table is locked
# ipfw table si unlock
# ipfw table si delete 4.5.6.7
deleted: 4.5.6.7/32 0
2014-08-11 18:09:37 +00:00
Alexander V. Chernikov
3a845e1076 * Add support for batched add/delete for ipfw tables
* Add support for atomic batches add (all or none).
* Fix panic on deleting non-existing entry in radix algo.

Examples:

# si is empty
# ipfw table si add 1.1.1.1/32 1111 2.2.2.2/32 2222
added: 1.1.1.1/32 1111
added: 2.2.2.2/32 2222
# ipfw table si add 2.2.2.2/32 2200 4.4.4.4/32 4444
exists: 2.2.2.2/32 2200
added: 4.4.4.4/32 4444
ipfw: Adding record failed: record already exists
^^^^^ Returns error but keeps inserted items
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
# ipfw table si atomic add 3.3.3.3/32 3333 4.4.4.4/32 4400 5.5.5.5/32 5555
added(reverted): 3.3.3.3/32 3333
exists: 4.4.4.4/32 4400
ignored: 5.5.5.5/32 5555
ipfw: Adding record failed: record already exists
^^^^^ Returns error and reverts added records
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
2014-08-11 17:34:25 +00:00
Alexander V. Chernikov
030b184f10 * Use 2 32-bits field inside rule instead of 2 pointer to save skipto state.
* Introduce ipfw_reap_add() to unify unlinking rules/adding it to reap queue
* Unbreak FreeBSD7 export format.
2014-08-09 09:11:26 +00:00
Alexander V. Chernikov
720ee730c6 Kernel changes:
* Fix buffer calculation for table dumps
* Fix IPv6 radix entiries addition broken in r269371.

Userland changes:
* Fix bug in retrieving statric ruleset
* Fix several bugs in retrieving table list
2014-08-08 21:09:22 +00:00
Alexander V. Chernikov
8bd1921248 Partially revert previous commit:
"0" value is perfectly valid for O_SETFIB and O_SETDSCP,
  so tablearg remains to be 655535 for now.
2014-08-08 15:33:26 +00:00
Alexander V. Chernikov
2c452b20dd * Switch tablearg value from 65535 to 0.
* Use u16 table kidx instead of integer on for iface opcode.
* Provide compability layer for old clients.
2014-08-08 14:23:20 +00:00
Alexander V. Chernikov
adf3b2b9d8 * Add IP_FW_TABLE_XMODIFY opcode
* Since there seems to be lack of consensus on strict value typing,
  remove non-default value types. Use userland-only "value format type"
  to print values.

Kernel changes:
* Add IP_FW_XMODIFY to permit table run-time modifications.
  Currently we support changing limit and value format type.

Userland changes:
* Support IP_FW_XMODIFY opcode.
* Support specifying value format type (ftype) in tablble create/modify req
* Fine-print value type/value format type.
2014-08-08 09:27:49 +00:00
Alexander V. Chernikov
28ea4fa355 Remove IP_FW_TABLES_XGETSIZE opcode.
It is superseded by IP_FW_TABLES_XLIST.
2014-08-08 06:36:26 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Alexander V. Chernikov
91e721d772 Since all of base IP_FW opcodes has been converted to IP_FW3,
switch default sopt handler to ipfw_clt3.
Add some comments for ipfw_get_sopt* api.
2014-08-07 22:08:43 +00:00
Alexander V. Chernikov
a73d728d31 Kernel changes:
* Implement proper checks for switching between global and set-aware tables
* Split IP_FW_DEL mess into the following opcodes:
  * IP_FW_XDEL (del rules matching pattern)
  * IP_FW_XMOVE (move rules matching pattern to another set)
  * IP_FW_SET_SWAP (swap between 2 sets)
  * IP_FW_SET_MOVE (move one set to another one)
  * IP_FW_SET_ENABLE (enable/disable sets)
* Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration.
* Use unified ipfw_range_tlv as range description for all of the above.
* Check dynamic states IFF there was non-zero number of deleted dyn rules,
* Del relevant dynamic states with singe traversal instead of per-rule one.

Userland changes:
* Switch ipfw(8) to use new opcodes.
2014-08-07 21:37:31 +00:00
Alexander V. Chernikov
46d5200874 Implement atomic ipfw table swap.
Kernel changes:
* Add opcode IP_FW_TABLE_XSWAP
* Add support for swapping 2 tables with the same type/ftype/vtype.
* Make skipto cache init after ipfw locks init.

Userland changes:
* Add "table X swap Y" command.
2014-08-03 21:37:12 +00:00
Alexander V. Chernikov
d5eb80cb0a Implement O(1) skipto using indexed array.
This adds 512K (2 * sizeof(u32) * 65k) bytes to the memory footprint.
This feature is optionaly and may be turned on in any time
(however it starts immediately in this commit. This will be changed.)
2014-08-03 15:49:03 +00:00
Alexander V. Chernikov
5f379342d2 Show algorithm-specific data in "table info" output. 2014-08-03 12:19:45 +00:00
Alexander V. Chernikov
a399f8be58 Be consistent on cidr:radix function naming: use algo name instead
of "cidr".
2014-08-03 09:53:34 +00:00
Alexander V. Chernikov
d20facb2f3 Remove unneded headers. 2014-08-03 09:48:54 +00:00
Alexander V. Chernikov
3fe2ef9129 Whitespace changes. 2014-08-03 09:40:50 +00:00
Alexander V. Chernikov
0bce0c23d6 * Move all algo-specific structures to the top of algo definition.
* Be consistent on naming variables in different algos.
* Use exponential array grow in iface:array and number:array.
2014-08-03 09:04:36 +00:00
Alexander V. Chernikov
648e838045 Store entry value back in @tei on entry update/deletion as another step
to batched atomic updates.
2014-08-03 08:32:54 +00:00
Alexander V. Chernikov
b6ee846e04 * Fix case when returning more that 4096 bytes of data
* Use different approach to ensure algo has enough space to store N elements:
  - explicitly ask algo (under UH_WLOCK) before/after insertion.  This (along
    with existing reallocation callbacks) really guarantees us that it is safe
    to insert N elements at once while holding UH_WLOCK+WLOCK.
  - remove old aflags/flags approach
2014-08-02 17:18:47 +00:00
Alexander V. Chernikov
4c0c07a552 * Permit limiting number of items in table.
Kernel changes:
* Add TEI_FLAGS_DONTADD entry flag to indicate that insert is not possible
* Support given flag in all algorithms
* Add "limit" field to ipfw_xtable_info
* Add actual limiting code into add_table_entry()

Userland changes:
* Add "limit" option as "create" table sub-option. Limit modification
  is currently impossible.
* Print human-readable errors in table enry addition/deletion code.
2014-08-01 15:17:46 +00:00
Alexander V. Chernikov
95c3c1e25f Do not perform memset() on ta_buf in algo callbacks:
it is already zeroed by base code.
2014-08-01 08:39:47 +00:00
Alexander V. Chernikov
2e324d2931 Simplify radix operations: use unified tei_to_sockaddr_ent() to generate
keys for add/delete calls.
2014-08-01 08:28:18 +00:00
Alexander V. Chernikov
57a1cf954e * Use TA_FLAG_DEFAULT for default algorithm selection instead of
exporting algorithm structures directly.

* Pass needed state buffer size in algo structures as preparation
  for tables add/del requests batching.
2014-08-01 07:35:17 +00:00
Alexander V. Chernikov
914bffb6ab * Add new "flow" table type to support N=1..5-tuple lookups
* Add "flow:hash" algorithm

Kernel changes:
* Add O_IP_FLOW_LOOKUP opcode to support "flow" lookups
* Add IPFW_TABLE_FLOW table type
* Add "struct tflow_entry" as strage for 6-tuple flows
* Add "flow:hash" algorithm. Basically it is auto-growing chained hash table.
  Additionally, we store mask of fields we need to compare in each instance/

* Increase ipfw_obj_tentry size by adding struct tflow_entry
* Add per-algorithm stat (ifpw_ta_tinfo) to ipfw_xtable_info
* Increase algoname length: 32 -> 64 (algo options passed there as string)
* Assume every table type can be customized by flags, use u8 to store "tflags" field.
* Simplify ipfw_find_table_entry() by providing @tentry directly to algo callback.
* Fix bug in cidr:chash resize procedure.

Userland changes:
* add "flow table(NAME)" syntax to support n-tuple checking tables.
* make fill_flags() separate function to ease working with _s_x arrays
* change "table info" output to reflect longer "type" fields

Syntax:
ipfw table fl2 create type flow:[src-ip][,proto][,src-port][,dst-ip][dst-port] [algo flow:hash]

Examples:

0:02 [2] zfscurr0# ipfw table fl2 create type flow:src-ip,proto,dst-port algo flow:hash
0:02 [2] zfscurr0# ipfw table fl2 info
+++ table(fl2), set(0) +++
 kindex: 0, type: flow:src-ip,proto,dst-port
 valtype: number, references: 0
 algorithm: flow:hash
 items: 0, size: 280
0:02 [2] zfscurr0# ipfw table fl2 add 2a02:6b8::333,tcp,443 45000
0:02 [2] zfscurr0# ipfw table fl2 add 10.0.0.92,tcp,80 22000
0:02 [2] zfscurr0# ipfw table fl2 list
+++ table(fl2), set(0) +++
2a02:6b8::333,6,443 45000
10.0.0.92,6,80 22000
0:02 [2] zfscurr0# ipfw add 200 count tcp from me to 78.46.89.105 80 flow 'table(fl2)'
00200 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
0:03 [2] zfscurr0# ipfw show
00200   0     0 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
65535 617 59416 allow ip from any to any
0:03 [2] zfscurr0# telnet -s 10.0.0.92 78.46.89.105 80
Trying 78.46.89.105...
..
0:04 [2] zfscurr0# ipfw show
00200   5   272 count tcp from me to 78.46.89.105 dst-port 80 flow table(fl2)
65535 682 66733 allow ip from any to any
2014-07-31 20:08:19 +00:00
Alexander V. Chernikov
b23d5de9b6 * Add number:array algorithm lookup method.
Kernel changes:
* s/IPFW_TABLE_U32/IPFW_TABLE_NUMBER/
* Force "lookup <port|uid|gid|jid>" to be IPFW_TABLE_NUMBER
* Support "lookup" method for number tables
* Add number:array algorihm (i32 as key, auto-growing).

Userland changes:
* Support named tables in "lookup <tag> Table"
* Fix handling of "table(NAME,val)" case
* Support printing "number" table data.
2014-07-30 14:52:26 +00:00
Alexander V. Chernikov
ce2817b51c * Add "lookup" method for cidr:hash algorithm type.
* Add auoto-grow ability to cidr:hash type.
* Fix some bugs / simplify implementation for cidr:hash.
2014-07-30 12:39:49 +00:00
Alexander V. Chernikov
daabb523cc Fix "flush" cmd for algorithms wih non-default parameters. 2014-07-30 09:17:40 +00:00
Alexander V. Chernikov
b429d43c36 * Introduce ipfw_ctl3() handler and move all IP_FW3 opcodes there.
The long-term goal is to switch remaining opcodes to IP_FW3 versions
 and use ipfw_ctl3() as default handler simplifying ipfw(4) interaction
 with external world.
2014-07-29 23:06:06 +00:00
Alexander V. Chernikov
9d099b4f38 * Dump available table algorithms via "ipfw talist" cmd.
Kernel changes:
* Add type/refcount fields to table algo instances.
* Add IP_FW_TABLES_ALIST opcode to export available algorihms to userland.

Userland changes:
* Fix cores on empty input inside "ipfw table" handler.
* Add "ipfw talist" cmd to print availabled kernel algorithms.
* Change "table info" output to reflect long algorithm config lines.
2014-07-29 22:44:26 +00:00
Alexander V. Chernikov
0b565ac0e6 * Copy ta structures to stable storage to ease future extension.
* Remove algo .lookup field since table lookup function is set by algo code.
2014-07-29 21:38:06 +00:00
Alexander V. Chernikov
74b941f042 * Add new ipfw cidr algorihm: hash table.
Algorithm works with both IPv4 and IPv6 prefixes, /32 and /128
ranges are assumed by default.
It works the following way: input IP address is masked to specified
mask, hashed and searched inside hash bucket.

Current implementation does not support "lookup" method and hash auto-resize.
This will be changed soon.

some examples:

ipfw table mi_test2 create type cidr algo cidr:hash
ipfw table mi_test create type cidr algo "cidr:hash masks=/30,/64"

ipfw table mi_test2 info
+++ table(mi_test2), set(0) +++
 type: cidr, kindex: 7
 valtype: number, references: 0
 algorithm: cidr:hash
 items: 0, size: 220

ipfw table mi_test info
+++ table(mi_test), set(0) +++
 type: cidr, kindex: 6
 valtype: number, references: 0
 algorithm: cidr:hash masks=/30,/64
 items: 0, size: 220

ipfw table mi_test add 10.0.0.5/30
ipfw table mi_test add 10.0.0.8/30
ipfw table mi_test add 2a02:6b8:b010::1/64 25

ipfw table mi_test list
+++ table(mi_test), set(0) +++
10.0.0.4/30 0
10.0.0.8/30 0
2a02:6b8:b010::/64 25
2014-07-29 19:49:38 +00:00
Alexander V. Chernikov
adea620132 * Change algorthm names to "type:algo" (e.g. "iface:array", "cidr:radix") format.
* Pass number of items changed in add/del hooks to permit adding/deleting
  multiple values at once.
2014-07-29 08:00:13 +00:00
Alexander V. Chernikov
68394ec88e * Add generic ipfw interface tracking API
* Rewrite interface tables to use interface indexes

Kernel changes:
* Add generic interface tracking API:
 - ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates
  state & bumps ref)
 - ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links comsumer & runs its callback to
  update ifindex)
 - ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer)
 - ipfw_iface_unref(unlocked, drops reference)
Additionally, consumer callbacks are called in interface withdrawal/departure.

* Rewrite interface tables to use iface tracking API. Currently tables are
  implemented the following way:
  runtime data is stored as sorted array of {ifidx, val} for existing interfaces
  full data is stored inside namedobj instance (chained hashed table).

* Add IP_FW_XIFLIST opcode to dump status of tracked interfaces

* Pass @chain ptr to most non-locked algorithm callbacks:
  (prepare_add, prepare_del, flush_entry ..). This may be needed for better
  interaction of given algorithm an other ipfw subsystems

* Add optional "change_ti" algorithm handler to permit updating of
  cached table_info pointer (happens in case of table_max resize)

* Fix small bug in ipfw_list_tables()
* Add badd (insert into sorted array) and bdel (remove from sorted array) funcs

Userland changes:
* Add "iflist" cmd to print status of currently tracked interface
* Add stringnum_cmp for better interface/table names sorting
2014-07-28 19:01:25 +00:00
Alexander V. Chernikov
db785d3199 * Require explicit table creation before use on kernel side.
* Add resize callbacks for upcoming table-based algorithms.

Kernel changes:
* s/ipfw_modify_table/ipfw_manage_table_ent/
* Simplify add_table_entry(): make table creation a separate piece of code.
  Do not perform creation if not in "compat" mode.
* Add ability to perform modification of algorithm state (like table resize).
  The following callbacks were added:
 - prepare_mod (allocate new state, without locks)
 - fill_mod (UH_WLOCK, copy old state to new one)
 - modify (UH_WLOCK + WLOCK, switch state)
 - flush_mod (no locks, flushes allocated data)
 Given callbacks are called if table modification has been requested by add or
   delete callbacks. Additional u64 tc->'flags' field was added to pass these
   requests.
* Change add/del table ent format: permit adding/removing multiple entries
   at once (only 1 supported at the moment).

Userland changes:
* Auto-create tables with warning
2014-07-26 13:37:25 +00:00
Gleb Smirnoff
8ff2bd98d6 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
Alexander V. Chernikov
e0a8b9ee29 * Reduce size of ipfw table entries for cidr/iface:
Since old structures had _value as the last field,
every table match required 3 cache lines instead of 2.
Fix this by
- using the fact that supplied masks are suplicated inside radix
- using lightweigth sa_in6 structure as key for IPv6

Before (amd64):
  sizeof(table_entry): 136
  sizeof(table_xentry): 160
After (amd64):
  sizeof(radix_cidr_entry): 120
  sizeof(radix_cidr_xentry): 128
  sizeof(radix_iface): 128

* Fix memory leak for table entry update
* Do some more sanity checks while deleting entry
* Do not store masks for host routes

Sponsored by:	Yandex LLC
2014-07-09 18:52:12 +00:00
Alexander V. Chernikov
7e767c791f * Use different rule structures in kernel/userland.
* Switch kernel to use per-cpu counters for rules.
* Keep ABI/API.

Kernel changes:
* Each rules is now exported as TLV with optional extenable
  counter block (ip_fW_bcounter for base one) and
  ip_fw_rule for rule&cmd data.
* Counters needs to be explicitly requested by IPFW_CFG_GET_COUNTERS flag.
* Separate counters from rules in kernel and clean up ip_fw a bit.
* Pack each rule in IPFW_TLV_RULE_ENT tlv to ease parsing.
* Introduce versioning in container TLV (may be needed in future).
* Fix ipfw_cfg_lheader broken u64 alignment.

Userland changes:
* Use set_mask from cfg header when requesting config
* Fix incorrect read accouting in ipfw_show_config()
* Use IPFW_RULE_NOOPT flag instead of playing with _pad
* Fix "ipfw -d list": do not print counters for dynamic states
* Some small fixes
2014-07-08 23:11:15 +00:00
Alexander V. Chernikov
6447bae661 * Prepare to pass other dynamic states via ipfw_dump_config()
Kernel changes:
* Change dump format for dynamic states:
  each state is now stored inside ipfw_obj_dyntlv
  last dynamic state is indicated by IPFW_DF_LAST flag
* Do not perform sooptcopyout() for !SOPT_GET requests.

Userland changes:
* Introduce foreach_state() function handler to ease work
  with different states passed by ipfw_dump_config().
2014-07-06 23:26:34 +00:00
Alexander V. Chernikov
81d3153d61 * Add "lookup" table functionality to permit userland entry lookups.
* Bump table dump format preserving old ABI.

Kernel size:
* Add IP_FW_TABLE_XFIND to handle "lookup" request from userland.
* Add ta_find_tentry() algorithm callbacks/handlers to support lookups.
* Fully switch to ipfw_obj_tentry for various table dumps:
  algorithms are now required to support the latest (ipfw_obj_tentry) entry
    dump format, the rest is handled by generic dump code.
  IP_FW_TABLE_XLIST opcode version bumped (0 -> 1).
* Eliminate legacy ta_dump_entry algo handler:
  dump_table_entry() converts data from current to legacy format.

Userland side:
* Add "lookup" table parameter.
* Change the way table type is guessed: call table_get_info() first,
  and check value for IPv4/IPv6 type IFF table does not exist.
* Fix table_get_list(): do more tries if supplied buffer is not enough.
* Sparate table_show_entry() from table_show_list().
2014-07-06 18:16:04 +00:00
Alexander V. Chernikov
1832a7b303 * Issue warning while requesting ruleset with new tables via legacy binary.
Convert each unresolved table as table 65535 (which cannot be used normally).
* Perform s/^ipfw_// for add_table_entry, del_table_entry and flush_table since
  these are internal functions exported to keep legacy interface.
* Remove macro TABLE_SET. Operations with tables can be done in any set, the only
  thing net.inet.ip.fw.tables_sets affects is the set in which tables are looked
  up while binding them to the rule.
2014-07-04 07:02:11 +00:00
Alexander V. Chernikov
ac35ff1784 Fully switch to named tables:
Kernel changes:
* Introduce ipfw_obj_tentry table entry structure to force u64 alignment.
* Support "update-on-existing-key" "add" bahavior (TEI_FLAGS_UPDATED).
* Use "subtype" field to distingush between IPv4 and IPv6 table records
  instead of previous hack.
* Add value type (vtype) field for kernel tables. Current types are
  number,ip and dscp
* Fix sets mask retrieval for old binaries
* Fix crash while using interface tables

Userland changes:
* Switch ipfw_table_handler() to use named-only tables.
* Add "table NAME create [type {cidr|iface|u32} [valtype {number|ip|dscp}] ..."
* Switch ipfw_table_handler to match_token()-based parser.
* Switch ipfw_sets_handler to use new ipfw_get_config() for mask  retrieval.
* Allow ipfw set X table ... syntax to permit using per-set table namespaces.
2014-07-03 22:25:59 +00:00
Alexander V. Chernikov
6c2997ffec * Add new IP_FW_XADD opcode which permits to
a) specify table ids as names
  b) add multiple rules at once.
Partially convert current code for atomic addition of multiple rules.
2014-06-29 22:35:47 +00:00
Alexander V. Chernikov
2aa75134b7 Enable kernel-side rule filtering based on user request.
Make do_get3() function return real error.
2014-06-29 09:29:27 +00:00
Alexander V. Chernikov
563b5ab132 Suppord showing named tables in ipfw(8) rule listing.
Kernel changes:
* change base TLV header to be u64 (so size can be u32).
* Introduce ipfw_obj_ctlv generc container TLV.
* Add IP_FW_XGET opcode which is now used for atomic configuration
  retrieval. One can specify needed configuration pieces to retrieve
  via flags field. Currently supported are
  IPFW_CFG_GET_STATIC (static rules) and
  IPFW_CFG_GET_STATES (dynamic states).
  Other configuration pieces (tables, pipes, etc..) support is planned.

Userland changes:
* Switch ipfw(8) to use new IP_FW_XGET for rule listing.
* Split rule listing code get and show pieces.
* Make several steps forward towards libipfw:
  permit printing states and rules(paritally) to supplied buffer.
  do not die on malloc/kernel failure inside given printing functions.
  stop assuming cmdline_opts is global symbol.
2014-06-28 23:20:24 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander V. Chernikov
2d99a3497d Use different approach for filling large datasets to userspace:
Instead of trying to allocate bing contiguous chunk of memory,
use intermediate-sized (page size) buffer as sliding window
reducing number of sooptcopyout() calls to perform.

This reduces dump functions complexity and provides additional
layer of abstraction.

User-visible api consists of 2 functions:
ipfw_get_sopt_space() - gets contigious amount of storage (or NULL)
and
ipfw_get_sopt_header() - the same, but zeroes the rest of the buffer.
2014-06-27 10:07:00 +00:00
Alexander V. Chernikov
9490a62716 * Add IP_FW_TABLE_XCREATE / IP_FW_TABLE_XMODIFY opcodes.
* Add 'algoname' string to ipfw_xtable_info permitting to specify lookup
algoritm with parameters.
* Rework part of ipfw_rewrite_table_uidx()

Sponsored by:	Yandex LLC
2014-06-16 13:05:07 +00:00
Alexander V. Chernikov
9c3c43aa77 Remove unused ipfw_dump_xtable(). 2014-06-15 13:43:44 +00:00
Alexander V. Chernikov
d3a4f9249c Simplify opcode handling.
* Use one u16 from op3 header to implement opcode versioning.
* IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current).
* Every getsockopt request is now handled in ip_fw_table.c
* Rename new opcodes:
IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY
IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE
IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST
IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO
IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH

* Add some docs about using given opcodes.
* Group some legacy opcode/handlers.
2014-06-15 13:40:27 +00:00
Alexander V. Chernikov
f1220db8d7 Move further to eliminate next pieces of number-assuming code inside tables.
Kernel changes:
* Add IP_FW_OBJ_FLUSH opcode (flush table based on its name/set)
* Add IP_FW_OBJ_DUMP opcode (dumps table data based on its names/set)
* Add IP_FW_OBJ_LISTSIZE / IP_FW_OBJ_LIST opcodes (get list of kernel tables)

Userland changes:
* move tables code to separate tables.c file
* get rid of tables_max
* switch "all"/list handling to new opcodes
2014-06-14 22:47:25 +00:00
Alexander V. Chernikov
ea761a5dc4 Move most of external table structures/functions to separate ip_fw_table.h 2014-06-14 11:13:02 +00:00
Alexander V. Chernikov
9f7d47b025 Add API to ease adding new algorithms/new tabletypes to ipfw.
Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
  Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
  new ip_fw_table_algo.c file.
  Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+       char            name[64];
+       int             idx;
+       ta_init         *init;
+       ta_destroy      *destroy;
+       table_lookup_t  *lookup;
+       ta_prepare_add  *prepare_add;
+       ta_prepare_del  *prepare_del;
+       ta_add          *add;
+       ta_del          *del;
+       ta_flush_entry  *flush_entry;
+       ta_foreach      *foreach;
+       ta_dump_entry   *dump_entry;
+       ta_dump_xentry  *dump_xentry;
+};

* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
   ->tablestate pointer (array of 32 bytes structures necessary for
   runtime lookups (can be probably shrinked to 16 bytes later):

   +struct table_info {
   +       table_lookup_t  *lookup;        /* Lookup function */
   +       void            *state;         /* Lookup radix/other structure */
   +       void            *xstate;        /* eXtended state */
   +       u_long          data;           /* Hints for given func */
   +};

* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_clt 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().

* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
    implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
    currenly implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).

Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)

Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embeded opcodes.
* Add/improve support for destroy/info cmds.
2014-06-14 10:58:39 +00:00
Alexander V. Chernikov
b074b7bbce Make ipfw tables use names as used-level identifier internally:
* Add namedobject set-aware api capable of searching/allocation objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode

Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize

Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced

Sponsored by:	Yandex LLC
2014-06-12 09:59:11 +00:00
Hiren Panchasara
046faba0eb DNOLD_IS_ECN introduced by r266941 is not required.
DNOLD_* flags are for compat with old binaries.

Suggested by: luigi
2014-06-01 20:19:17 +00:00
Hiren Panchasara
fc5e1956d9 ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by:	Midori Kato (aoimidori27@gmail.com)
Worked with:	Lars Eggert (lars@netapp.com)
Reviewed by:	luigi, hiren
2014-06-01 07:28:24 +00:00
John Baldwin
b437b06c79 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
Andrey V. Elsukov
3a5db2d492 Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR:		189655
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-05-18 14:25:19 +00:00
Alexander V. Chernikov
c3015737f3 Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR:		bin/189471
Submitted by:	Dennis Yusupoff <dyr@smartspb.net>
MFC after:	2 weeks
2014-05-17 13:45:03 +00:00
Gleb Smirnoff
0e4f18aa68 o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
Mikolaj Golub
8a477d48e0 Define startup order the same way as it is in dummynet. 2014-04-26 08:05:16 +00:00
Gleb Smirnoff
53f4b0cf9b The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.

This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00