Commit Graph

270 Commits

Author SHA1 Message Date
Andrey V. Elsukov
6951cecf71 Add three helper function to manage tables from external modules.
ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-08-13 15:48:56 +00:00
Andrey V. Elsukov
56132dcc0d Move logging via BPF support into separate file.
* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
  ipfw0 by DLT type used in bpfattach. This interface is intended to
  used by ipfw modules to dump packets with additional info attached.
  Currently pflog format is used. ipfw_bpf_mtap2() function uses second
  argument to determine which interface use for dumping. If dlen is equal
  to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
  PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-08-13 15:41:04 +00:00
Andrey V. Elsukov
d6eb9b0249 Restore "nat global" support.
Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.

PR:		211256
Tested by:	Victor Chernov
MFC after:	3 days
2016-08-11 10:10:10 +00:00
Konstantin Belousov
584b675ed6 Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI.  The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure.  As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance.  Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	jhb (same)
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7302
2016-07-27 11:08:59 +00:00
Andrey V. Elsukov
ed22e564b8 Add named dynamic states support to ipfw(4).
The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced to default.

Reviewed by:	julian
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6674
2016-07-19 04:56:59 +00:00
Andrey V. Elsukov
b867e84e95 Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All matched by rule traffic goes to the NPT module.

Reviewed by:	hrs
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D6420
2016-07-18 19:46:31 +00:00
Don Lewis
98e82c02e5 Fix problems in the FQ-PIE AQM cleanup code that could leak memory or
cause a crash.

Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup()
is not able to use callout_drain() to make sure that all callouts are
finished before it returns, and callout_stop() is not sufficient to make
that guarantee.  After pie_cleanup() returns, dummynet will free a
structure that any remaining callouts will want to access.

Fix these problems by allocating a separate structure to contain the
data used by the callouts.  In pie_cleanup(), call callout_reset_sbt()
to replace the normal callout with a cleanup callout that does the cleanup
work for each sub-queue.  The instance of the cleanup callout that
destroys the last flow will also free the extra allocated block of memory.
Protect the reference count manipulation in the cleanup callout with
DN_BH_WLOCK() to be consistent with all of the other usage of the reference
count where this lock is held by the dummynet code.

Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D7174
2016-07-12 17:32:40 +00:00
Don Lewis
12be18c7d5 Fix a race condition between the main thread in aqm_pie_cleanup() and the
callout thread that can cause a kernel panic.  Always do the final cleanup
in the callout thread by passing a separate callout function for that task
to callout_reset_sbt().

Protect the ref_count decrement in the callout with DN_BH_WLOCK().  All
other ref_count manipulation is protected with this lock.

There is still a tiny window between ref_count reaching zero and the end
of the callout function where it is unsafe to unload the module.  Fixing
this would require the use of callout_drain(), but this can't be done
because dummynet holds a mutex and callout_drain() might sleep.

Remove the callout_pending(), callout_active(), and callout_deactivate()
calls from calculate_drop_prob().  They are not needed because this callout
uses callout_init_mtx().

Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
Approved by:	re (gjb)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D6928
2016-07-05 00:53:01 +00:00
Bjoern A. Zeeb
9ac51e7911 In case of the global eventhandler make sure the current VNET
is still operational before doing any work;  otherwise we might
run into, e.g., destroyed locks.

PR:		210724
Reported by:	olevole olevole.ru
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Obtained from:	projects/vnet
Approved by:	re (gjb)
2016-06-30 19:32:45 +00:00
Bjoern A. Zeeb
31fe4e62fa Move the ipfw_log_bpf() calls from global module initialisation to
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.

Approved by:		re (gjb)
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
2016-06-30 01:33:14 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Alexander V. Chernikov
37aefa2ad1 Fix 4-byte overflow in ipv6_writemask.
This bug could cause some IPv6 table prefix delete requests to fail.

Obtained from:	Yandex LLC
2016-06-05 10:33:53 +00:00
Don Lewis
d673654796 Replace constant expressions that contain multiplications by
fractional floating point values with integer divides.  This will
eliminate any chance that the compiler will generate code to evaluate
the expression using floating point at runtime.

Suggested by:	bde
Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after:	8 days (with r300779 and r300949)
2016-06-01 20:04:24 +00:00
Don Lewis
fe4b5f6659 Cast some expressions that multiply a long long constant by a
floating point constant to int64_t.  This avoids the runtime
conversion of the the other operand in a set of comparisons from
int64_t to floating point and doing the comparisions in floating
point.

Suggested by:	lidl
Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after:	2 weeks (with r300779)
2016-05-29 07:23:56 +00:00
Don Lewis
248c72bfb8 Correct a typo in a comment.
MFC after:	2 weeks (with r300779)
2016-05-26 22:03:28 +00:00
Don Lewis
4e59799e1b Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak
its expression to work on powerpc and sparc64 (gcc compatibility).

Correct a typo in a nearby comment.

MFC after:	2 weeks (with r300779)
2016-05-26 21:44:52 +00:00
Don Lewis
91336b403a Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE).
Centre for Advanced Internet Architectures

Implementing AQM in FreeBSD

* Overview <http://caia.swin.edu.au/freebsd/aqm/index.html>

* Articles, Papers and Presentations
  <http://caia.swin.edu.au/freebsd/aqm/papers.html>

* Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html>

Overview

Recent years have seen a resurgence of interest in better managing
the depth of bottleneck queues in routers, switches and other places
that get congested. Solutions include transport protocol enhancements
at the end-hosts (such as delay-based or hybrid congestion control
schemes) and active queue management (AQM) schemes applied within
bottleneck queues.

The notion of AQM has been around since at least the late 1990s
(e.g. RFC 2309). In recent years the proliferation of oversized
buffers in all sorts of network devices (aka bufferbloat) has
stimulated keen community interest in four new AQM schemes -- CoDel,
FQ-CoDel, PIE and FQ-PIE.

The IETF AQM working group is looking to document these schemes,
and independent implementations are a corner-stone of the IETF's
process for confirming the clarity of publicly available protocol
descriptions. While significant development work on all three schemes
has occured in the Linux kernel, there is very little in FreeBSD.

Project Goals

This project began in late 2015, and aims to design and implement
functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ_PIE
in FreeBSD (with code BSD-licensed as much as practical). We have
chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall
and traffic shaper. Implementation of these AQM schemes in FreeBSD
will:
* Demonstrate whether the publicly available documentation is
  sufficient to enable independent, functionally equivalent implementations

* Provide a broader suite of AQM options for sections the networking
  community that rely on FreeBSD platforms

Program Members:

* Rasool Al Saadi (developer)

* Grenville Armitage (project lead)

Acknowledgements:

This project has been made possible in part by a gift from the
Comcast Innovation Fund.

Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
X-No objection:	core
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6388
2016-05-26 21:40:13 +00:00
Andrey V. Elsukov
d16f495cad Fix the regression introduced in r300143.
When we are creating new dynamic state use MATCH_FORWARD direction to
correctly initialize protocol's state.
2016-05-20 15:00:12 +00:00
Andrey V. Elsukov
96e84c57e1 Move protocol state handling code from lookup_dyn_rule_locked() function
into dyn_update_proto_state(). This allows eliminate the second state
lookup in the ipfw_install_state().
Also remove MATCH_* macros, they are defined in ip_fw_private.h as enum.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-18 12:53:21 +00:00
Andrey V. Elsukov
2685841b38 Make named objects set-aware. Now it is possible to create named
objects with the same name in different sets.

Add optional manage_sets() callback to objects rewriting framework.
It is intended to implement handler for moving and swapping named
object's sets. Add ipfw_obj_manage_sets() function that implements
generic sets handler. Use new callback to implement sets support for
lookup tables.
External actions objects are global and they don't support sets.
Modify eaction_findbyname() to reflect this.
ipfw(8) now may fail to move rules or sets, because some named objects
in target set may have conflicting names.
Note that ipfw_obj_ntlv type was changed, but since lookup tables
actually didn't support sets, this change is harmless.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-17 07:47:23 +00:00
Andrey V. Elsukov
9f2e5ed3cc Fix memory leak possible in error case.
Use free_rule() instead of free(), it will also release memory allocated
for rule counters.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-11 10:04:32 +00:00
Andrey V. Elsukov
b309f085e0 Change the type of objhash_cb_t callback function to be able return an
error code. Use it to interrupt the loop in ipfw_objhash_foreach().

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-06 03:18:51 +00:00
Andrey V. Elsukov
2df1a11ffa Rename find_name_tlv_type() to ipfw_find_name_tlv_type() and make it
global. Use it in ip_fw_table.c instead of find_name_tlv() to reduce
duplicated code.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-05 20:15:46 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Andrey V. Elsukov
9a5be809ab Make create_object callback optional and return EOPNOTSUPP when it isn't
defined. Remove eaction_create_compat() and use designated initializers to
initialize eaction_opcodes structure.

Obtained from:	Yandex LLC
2016-04-27 15:28:25 +00:00
Pedro F. Giffuni
7a6ab8f19e netpfil: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae
2016-04-15 12:24:01 +00:00
Andrey V. Elsukov
2acdf79f53 Add External Actions KPI to ipfw(9).
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 22:51:23 +00:00
Andrey V. Elsukov
4bd916567e Change the type of 'etlv' field in struct named_object to uint16_t.
It should match with the type field in struct ipfw_obj_tlv.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 21:52:31 +00:00
Andrey V. Elsukov
f8e26ca319 Adjust some comments and make ref_opcode_object() static. 2016-04-14 21:45:18 +00:00
Andrey V. Elsukov
b2df1f7ea1 o Teach opcode rewriting framework handle several rewriters for
the same opcode.

o Reduce number of times classifier callback is called. It is
  redundant to call it just after find_op_rw(), since the last
  does call it already and can have all results.

o Do immediately opcode rewrite in the ref_opcode_object().
  This eliminates additional classifier lookup later on bulk update.
  For unresolved opcodes the behavior still the same, we save information
  from classifier callback in the obj_idx array, then perform automatic
  objects creation, then perform rewriting for opcodes using indeces
  from created objects.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 21:31:16 +00:00
Andrey V. Elsukov
f976a4edc0 Move several functions related to opcode rewriting framework from
ip_fw_table.c into ip_fw_sockopt.c and make them static.

Obtained from:	Yandex LLC
2016-04-14 20:49:27 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Andrey V. Elsukov
657592fd65 Use correct size for malloc.
Obtained from:	Yandex LLC
MFC after:	1 week
2016-03-03 13:07:59 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Andrey V. Elsukov
23a6c7330c Fix bug in filling and handling ipfw's O_DSCP opcode.
Due to integer overflow CS4 token was handled as BE.

PR:		207459
MFC after:	1 week
2016-02-24 13:16:03 +00:00
Gleb Smirnoff
cd82d21b2e Fix obvious typo, that lead to incorrect sorting.
Found by:	PVS-Studio
2016-02-18 19:05:30 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Luigi Rizzo
1cdc5f0b87 cleanup and document in some detail the internals of the testing code
for dummynet schedulers
2016-01-27 02:22:31 +00:00
Luigi Rizzo
ff8d60ab4d the _Static_assert was not supposed to be in the commit. 2016-01-27 02:14:08 +00:00
Luigi Rizzo
788c0c66ab bugfix: the scheduler template (dn_schk) for the round robin scheduler
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).

This is a merge candidate for stable and 10.3

MFC after:	3 days
2016-01-27 02:08:30 +00:00
Luigi Rizzo
10d72ffc7d fix various warnings to compile the test code with -Wextra 2016-01-26 23:37:07 +00:00
Luigi Rizzo
fa57c83c70 fix various warnings (signed/unsigned, printf types, unused arguments) 2016-01-26 23:36:18 +00:00
Luigi Rizzo
f6a5c66400 prevent warnings for signed/unsigned comparisons and unused arguments.
Add checks for parameters overflowing 32 bit.
2016-01-26 22:46:58 +00:00
Luigi Rizzo
e72cd9a70d prevent warning for unused argument 2016-01-26 22:45:45 +00:00
Luigi Rizzo
4d85bfeb07 avoid warnings for signed/unsigned comparison and unused arguments 2016-01-26 22:45:05 +00:00
Luigi Rizzo
f51b072d4c Revert one chunk from commit 285362, which introduced an off-by-one error
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To avoid that the problem occurs again, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last july, as
the patch was not merged to other versions.
2016-01-26 04:48:24 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Alexander V. Chernikov
fa7c058bf8 Fix panic on table/table entry delete. The panic could have happened
if more than 64 distinct values had been used.

Table value code uses internal objhash API which requires unique key
  for each object. For value code, pointer to the actual value data
  is used. The actual problem arises from the fact that 'actual' e.g.
  runtime data is stored in array and that array is auto-growing. There is
  special hook (update_tvalue() function) which is used to update the pointers
  after the change. For some reason, object 'key' was not updated.
  Fix this by adding update code to the update_tvalue().

Sponsored by:	Yandex LLC
2016-01-21 18:20:40 +00:00
Alexander V. Chernikov
89fc126add Initialize error value ta_lookup_kfib() by default to please compiler. 2016-01-10 08:37:00 +00:00
Bjoern A. Zeeb
60c274aaf8 Initialize error after r293626 in case neither INET nor INET6 is
compiled into the kernel.  Ideally lots more code would just not
be called (or compiled in) in that case but that requires a lot
more surgery.  For now try to make IP-less kernels compile again.
2016-01-10 08:14:25 +00:00