Commit Graph

520 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
7d7751a071 Make sure pflog is attached after pf is initialized so we can
borrow pf's lock, and also make sure pflog goes after pf is gone
in order to avoid callouts in VNETs to an already freed instance.

Reported by:    Ivan Klymenko, Johan Hendriks  on current@ today
Obtained from:  projects/vnet
Sponsored by:   The FreeBSD Foundation
MFC after:      13 days
Approved by:	re (gjb)
2016-06-23 22:31:10 +00:00
Bjoern A. Zeeb
a8e8c57443 PFSTATE_NOSYNC goes onto state_flags, not sync_state;
this prevents: panic: pfsync_delete_state: unexpected sync state 8

Reviewed by:		kp
Approved by:		re (gjb)
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6942
2016-06-23 21:42:43 +00:00
Bjoern A. Zeeb
a0429b5459 Update pf(4) and pflog(4) to survive basic VNET testing, which includes
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.

Reviewed by:		kp
Approved by:		re (gjb)
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6924
2016-06-23 21:34:38 +00:00
Bjoern A. Zeeb
8147948e19 Import a fix for an old security issue (CVE-2010-3830) in pf which
was not relevant to FreeBSD as only root could open /dev/pf by default.
With VIMAGE this will no longer be the case.  As pf(4) starts to
be supported with VNETs 3rd party users may open /dev/pf inside the
virtual jail instance; thus we need to address this issue after all.
While OpenBSD largely rewrote code parts for the fix [1], and it's
unclear what Apple [3] did, import the minimal fix from NetBSD [2].

[1] http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_ioctl.c.diff?r1=1.235&r2=1.236
[2] http://mail-index.netbsd.org/source-changes/2011/01/19/msg017518.html
[3] https://support.apple.com/en-gb/HT202154

Obtained from:		http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dist/pf/net/pf_ioctl.c.diff?r1=1.42&r2=1.43&only_with_tag=MAIN
MFC After:		2 weeks
Approved by:		re (gjb)
Sponsored by:		The FreeBSD Foundation
Security:		CVE-2010-3830
2016-06-23 05:41:46 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along with (or before) the higher layer protocol being shut down.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enhance helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Kristof Provost
3e248e0fb4 pf: Filter on and set vlan PCP values
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.

Reviewed by:    allanjude, araujo
Approved by:	re (gjb)
Obtained from:  OpenBSD (mostly)
Differential Revision:  https://reviews.freebsd.org/D6786
2016-06-17 18:21:55 +00:00
Alexander V. Chernikov
37aefa2ad1 Fix 4-byte overflow in ipv6_writemask.
This bug could cause some IPv6 table prefix delete requests to fail.

Obtained from:	Yandex LLC
2016-06-05 10:33:53 +00:00
Don Lewis
d673654796 Replace constant expressions that contain multiplications by
fractional floating point values with integer divides.  This will
eliminate any chance that the compiler will generate code to evaluate
the expression using floating point at runtime.

Suggested by:	bde
Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after:	8 days (with r300779 and r300949)
2016-06-01 20:04:24 +00:00
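The change described in d673654796 can be illustrated with a minimal sketch. The macro names and the time base below are assumptions made for illustration and are not taken from the dummynet AQM sources. Multiplying an integer constant by 0.1 yields a double expression, whereas dividing by 10 stays in integer arithmetic.

```c
#include <stdint.h>

/* Hypothetical time base in ticks per second (assumption). */
#define AQM_TIME_1S	((int64_t)1000000)

/*
 * Floating-point form: 0.1 is a double, so the whole expression has
 * type double and a compiler may evaluate it with FP instructions.
 */
#define INTERVAL_FP	(AQM_TIME_1S * 0.1)

/* Integer form: the same interval computed with an integer divide. */
#define INTERVAL_INT	(AQM_TIME_1S / 10)

/* Return the interval using only integer arithmetic. */
int64_t
interval_ticks(void)
{
	return (INTERVAL_INT);
}
```

The integer form only works when the fraction can be written as a divisor; other fractions would need a multiply followed by a divide.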
Don Lewis
fe4b5f6659 Cast some expressions that multiply a long long constant by a
floating point constant to int64_t.  This avoids the runtime
conversion of the other operand in a set of comparisons from
int64_t to floating point and doing the comparisons in floating
point.

Suggested by:	lidl
Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after:	2 weeks (with r300779)
2016-05-29 07:23:56 +00:00
Don Lewis
248c72bfb8 Correct a typo in a comment.
MFC after:	2 weeks (with r300779)
2016-05-26 22:03:28 +00:00
Don Lewis
4e59799e1b Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak
its expression to work on powerpc and sparc64 (gcc compatibility).

Correct a typo in a nearby comment.

MFC after:	2 weeks (with r300779)
2016-05-26 21:44:52 +00:00
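Why wrapping macro arguments in parentheses matters, as in 4e59799e1b, can be shown with a small sketch. CLAMP() below is a hypothetical stand-in, and the real BOUND_VAR() may be written differently.

```c
/*
 * Hypothetical clamp macro in the spirit of BOUND_VAR().
 *
 * Without the inner parentheses, CLAMP(a | 16, 0, 7) would expand to
 *   a | 16 < 0 ? 0 : a | 16 > 7 ? 7 : a | 16
 * which parses as (a | (16 < 0)) ? 0 : ... and yields 0 for a == 1.
 */
#define CLAMP(x, lo, hi) \
	((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Pass an expression argument to show that precedence is preserved. */
int
clamp_example(int a)
{
	return (CLAMP(a | 16, 0, 7));
}
```

Each argument is still evaluated more than once, so arguments with side effects remain unsafe with this style of macro.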
Don Lewis
91336b403a Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE).
Centre for Advanced Internet Architectures

Implementing AQM in FreeBSD

* Overview <http://caia.swin.edu.au/freebsd/aqm/index.html>

* Articles, Papers and Presentations
  <http://caia.swin.edu.au/freebsd/aqm/papers.html>

* Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html>

Overview

Recent years have seen a resurgence of interest in better managing
the depth of bottleneck queues in routers, switches and other places
that get congested. Solutions include transport protocol enhancements
at the end-hosts (such as delay-based or hybrid congestion control
schemes) and active queue management (AQM) schemes applied within
bottleneck queues.

The notion of AQM has been around since at least the late 1990s
(e.g. RFC 2309). In recent years the proliferation of oversized
buffers in all sorts of network devices (aka bufferbloat) has
stimulated keen community interest in four new AQM schemes -- CoDel,
FQ-CoDel, PIE and FQ-PIE.

The IETF AQM working group is looking to document these schemes,
and independent implementations are a corner-stone of the IETF's
process for confirming the clarity of publicly available protocol
descriptions. While significant development work on all three schemes
has occurred in the Linux kernel, there is very little in FreeBSD.

Project Goals

This project began in late 2015, and aims to design and implement
functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ-PIE
in FreeBSD (with code BSD-licensed as much as practical). We have
chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall
and traffic shaper. Implementation of these AQM schemes in FreeBSD
will:
* Demonstrate whether the publicly available documentation is
  sufficient to enable independent, functionally equivalent implementations

* Provide a broader suite of AQM options for sections of the networking
  community that rely on FreeBSD platforms

Program Members:

* Rasool Al Saadi (developer)

* Grenville Armitage (project lead)

Acknowledgements:

This project has been made possible in part by a gift from the
Comcast Innovation Fund.

Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
X-No objection:	core
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D6388
2016-05-26 21:40:13 +00:00
Kristof Provost
b599e8dc59 pf: Fix more ICMP mistranslation
In the default case fix the substitution of the destination address.

PR:		201519
Submitted by:	Max <maximos@als.nnov.ru>
MFC after:	1 week
2016-05-23 13:59:48 +00:00
Kristof Provost
c0c82715b8 pf: Fix ICMP translation
Fix ICMP source address rewriting in rdr scenarios.

PR:		201519
Submitted by:	Max <maximos@als.nnov.ru>
MFC after:	1 week
2016-05-23 12:41:29 +00:00
Kristof Provost
d9f4fce5a7 pf: Fix fragment timeout
We were inconsistent about the use of time_second vs. time_uptime.
Always use time_uptime so the value can be meaningfully compared.

Submitted by:	"Max" <maximos@als.nnov.ru>
MFC after:	4 days
2016-05-20 15:41:05 +00:00
Andrey V. Elsukov
d16f495cad Fix the regression introduced in r300143.
When we are creating new dynamic state use MATCH_FORWARD direction to
correctly initialize protocol's state.
2016-05-20 15:00:12 +00:00
Andrey V. Elsukov
96e84c57e1 Move protocol state handling code from lookup_dyn_rule_locked() function
into dyn_update_proto_state(). This allows eliminating the second state
lookup in the ipfw_install_state().
Also remove MATCH_* macros, they are defined in ip_fw_private.h as enum.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-18 12:53:21 +00:00
Andrey V. Elsukov
2685841b38 Make named objects set-aware. Now it is possible to create named
objects with the same name in different sets.

Add optional manage_sets() callback to objects rewriting framework.
It is intended to implement handler for moving and swapping named
object's sets. Add ipfw_obj_manage_sets() function that implements
generic sets handler. Use new callback to implement sets support for
lookup tables.
External actions objects are global and they don't support sets.
Modify eaction_findbyname() to reflect this.
ipfw(8) now may fail to move rules or sets, because some named objects
in target set may have conflicting names.
Note that ipfw_obj_ntlv type was changed, but since lookup tables
actually didn't support sets, this change is harmless.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-17 07:47:23 +00:00
Andrey V. Elsukov
9f2e5ed3cc Fix memory leak possible in error case.
Use free_rule() instead of free(), it will also release memory allocated
for rule counters.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-11 10:04:32 +00:00
Andrey V. Elsukov
b309f085e0 Change the type of objhash_cb_t callback function to be able return an
error code. Use it to interrupt the loop in ipfw_objhash_foreach().

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-06 03:18:51 +00:00
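The pattern in b309f085e0, a foreach loop whose callback can return an error to stop the iteration, might look roughly like this. The types and function names are hypothetical and are not the ipfw objhash API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue, nonzero to stop. */
typedef int (*obj_cb_t)(int obj, void *arg);

/* Visit each object; stop at the first nonzero return and propagate it. */
int
obj_foreach(const int *objs, size_t n, obj_cb_t cb, void *arg)
{
	size_t i;
	int error;

	for (i = 0; i < n; i++) {
		error = cb(objs[i], arg);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example callback: count visited objects, fail on a negative one. */
int
count_until_negative(int obj, void *arg)
{
	int *visited = arg;

	if (obj < 0)
		return (EINVAL);
	(*visited)++;
	return (0);
}
```

With a void callback the loop would always walk every object; returning an error lets the caller bail out early and see why.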
Andrey V. Elsukov
2df1a11ffa Rename find_name_tlv_type() to ipfw_find_name_tlv_type() and make it
global. Use it in ip_fw_table.c instead of find_name_tlv() to reduce
duplicated code.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-05-05 20:15:46 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Andrey V. Elsukov
9a5be809ab Make create_object callback optional and return EOPNOTSUPP when it isn't
defined. Remove eaction_create_compat() and use designated initializers to
initialize eaction_opcodes structure.

Obtained from:	Yandex LLC
2016-04-27 15:28:25 +00:00
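A minimal sketch of the idea in 9a5be809ab: an optional callback left unset by a designated initializer, with the dispatcher returning EOPNOTSUPP. The structure and its field names are illustrative assumptions and do not reproduce the ipfw structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler table; field names are illustrative only. */
struct opcode_handler {
	int	opcode;
	int	(*create_object)(const char *name);	/* optional */
	int	(*find_byname)(const char *name);
};

/* Trivial lookup used to populate the table. */
int
demo_find(const char *name)
{
	return (name != NULL ? 0 : ENOENT);
}

/* Designated initializers: members not named here are set to NULL/0. */
const struct opcode_handler demo_handler = {
	.opcode = 1,
	.find_byname = demo_find,
};

/* A missing optional callback maps to EOPNOTSUPP. */
int
handler_create(const struct opcode_handler *h, const char *name)
{
	if (h->create_object == NULL)
		return (EOPNOTSUPP);
	return (h->create_object(name));
}
```

This removes the need for a compatibility stub whose only job is to return an error.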
Pedro F. Giffuni
7a6ab8f19e netpfil: for pointers replace 0 with NULL.
These are mostly cosmetic, no functional change.

Found with devel/coccinelle.

Reviewed by:	ae
2016-04-15 12:24:01 +00:00
Andrey V. Elsukov
2acdf79f53 Add External Actions KPI to ipfw(9).
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part of the base system.
Sample modules will be coming soon.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 22:51:23 +00:00
Andrey V. Elsukov
4bd916567e Change the type of 'etlv' field in struct named_object to uint16_t.
It should match with the type field in struct ipfw_obj_tlv.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 21:52:31 +00:00
Andrey V. Elsukov
f8e26ca319 Adjust some comments and make ref_opcode_object() static.
2016-04-14 21:45:18 +00:00
Andrey V. Elsukov
b2df1f7ea1 o Teach opcode rewriting framework to handle several rewriters for
the same opcode.

o Reduce number of times classifier callback is called. It is
  redundant to call it just after find_op_rw(), since the last
  does call it already and can have all results.

o Do opcode rewrite immediately in the ref_opcode_object().
  This eliminates additional classifier lookup later on bulk update.
  For unresolved opcodes the behavior is still the same, we save information
  from classifier callback in the obj_idx array, then perform automatic
  objects creation, then perform rewriting for opcodes using indices
  from created objects.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2016-04-14 21:31:16 +00:00
Andrey V. Elsukov
f976a4edc0 Move several functions related to opcode rewriting framework from
ip_fw_table.c into ip_fw_sockopt.c and make them static.

Obtained from:	Yandex LLC
2016-04-14 20:49:27 +00:00
Pedro F. Giffuni
74b8d63dcc Cleanup unnecessary semicolons from the kernel.
Found with devel/coccinelle.
2016-04-10 23:07:00 +00:00
Kristof Provost
0d8c93313e pf: Improve forwarding detection
When we guess the nature of the outbound packet (output vs. forwarding) we need
to take bridges into account. When bridging the input interface does not match
the output interface, but we're not forwarding. Similarly, it's possible for the
interface to actually be the bridge interface itself (and not a member interface).

PR:		202351
MFC after:	2 weeks
2016-03-16 06:42:15 +00:00
Andrey V. Elsukov
657592fd65 Use correct size for malloc.
Obtained from:	Yandex LLC
MFC after:	1 week
2016-03-03 13:07:59 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Kristof Provost
14b5e85b18 pf: Fix possible out-of-bounds write
In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs,
which is processed in pfr_set_addrs(). At the user's request we also provide
feedback on the deleted addresses, by storing them after the new list
('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()).

This means we write outside the bounds of the buffer we've just allocated.
We need to look at pfrio_size2 instead (i.e. the size the user reserved for our
feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than
pfrio_size though, in which case we'd still read outside of the allocated
buffer. Instead we allocate the largest of the two values.

Reported By:	Paul J Murphy <paul@inetstat.net>
PR:		207463
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D5426
2016-02-25 07:33:59 +00:00
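The fix described in 14b5e85b18, sizing the buffer by the larger of the two user-supplied counts, can be sketched as below. The record type and function name are stand-ins chosen for illustration; this is not the pf code.

```c
#include <stdlib.h>

/* Stand-in for an address record; the real structure differs. */
struct addr_rec {
	unsigned char	bytes[16];
};

/*
 * Allocate one zeroed buffer that is large enough both for the input
 * records (in_count) and for the feedback area the caller reserved
 * (out_count). The chosen record count is stored in *total.
 * calloc() also guards against multiplication overflow.
 */
struct addr_rec *
alloc_addr_buffer(size_t in_count, size_t out_count, size_t *total)
{
	size_t n;

	n = in_count > out_count ? in_count : out_count;
	*total = n;
	return (calloc(n, sizeof(struct addr_rec)));
}
```

Any later read of the input or write of the feedback then stays within `*total` records, whichever count the caller made larger.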
Andrey V. Elsukov
23a6c7330c Fix bug in filling and handling ipfw's O_DSCP opcode.
Due to integer overflow CS4 token was handled as BE.

PR:		207459
MFC after:	1 week
2016-02-24 13:16:03 +00:00
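For context on 23a6c7330c: CS4 is DSCP code point 32 and BE is code point 0, so a 64-entry DSCP bitmap kept in two 32-bit words has to place CS4 in bit 0 of the second word. Shifting a 32-bit value by 32 is undefined in C and may appear to select bit 0 of the first word instead. The sketch below shows one safe way to index such a bitmap; it is an assumption about the general technique, not the actual ipfw code or its fix.

```c
#include <stdint.h>

/* Set the bit for a 6-bit DSCP code point (0..63) in a two-word bitmap. */
void
dscp_set(uint32_t map[2], unsigned int dscp)
{
	dscp &= 0x3f;			/* keep the index in range */
	map[dscp / 32] |= UINT32_C(1) << (dscp % 32);
}

/* Return 1 if the code point is present in the bitmap, else 0. */
int
dscp_test(const uint32_t map[2], unsigned int dscp)
{
	dscp &= 0x3f;
	return ((int)((map[dscp / 32] >> (dscp % 32)) & 1));
}
```

Because the shift count is always `dscp % 32`, it never reaches the width of the word.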
Kristof Provost
c90369f880 in pf_print_state_parts, do not use skw->proto to print the protocol but our
local copy proto that we very carefully set beforehand. skw being NULL is
perfectly valid there.

Obtained from:	OpenBSD (henning)
2016-02-20 12:53:53 +00:00
Gleb Smirnoff
cd82d21b2e Fix obvious typo, that led to incorrect sorting.
Found by:	PVS-Studio
2016-02-18 19:05:30 +00:00
Gleb Smirnoff
8ec07310fa These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h
2016-02-01 17:41:21 +00:00
Luigi Rizzo
1cdc5f0b87 cleanup and document in some detail the internals of the testing code
for dummynet schedulers
2016-01-27 02:22:31 +00:00
Luigi Rizzo
ff8d60ab4d the _Static_assert was not supposed to be in the commit.
2016-01-27 02:14:08 +00:00
Luigi Rizzo
788c0c66ab bugfix: the scheduler template (dn_schk) for the round robin scheduler
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).

This is a merge candidate for stable and 10.3

MFC after:	3 days
2016-01-27 02:08:30 +00:00
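The layout issue in 788c0c66ab, a base scheduler structure followed in memory by a scheduler-private structure whose size the descriptor must advertise, can be sketched as follows. All type and field names except the idea of a data-length field are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Stand-ins for the base template and the round-robin private data. */
struct base_sched {
	char	pad[192];
};
struct rr_private {
	int	min_q;
	int	max_q;
	int	q_bytes;
};

/* Hypothetical descriptor: datalen is the size of the trailing data. */
struct sched_desc {
	const char	*name;
	size_t		 schk_datalen;
};

const struct sched_desc rr_desc = {
	.name = "RR",
	.schk_datalen = sizeof(struct rr_private),
};

/* Allocate the base structure plus the private area in one block. */
struct base_sched *
sched_alloc(const struct sched_desc *d)
{
	return (calloc(1, sizeof(struct base_sched) + d->schk_datalen));
}

/* The private area starts right after the base structure. */
void *
sched_private(struct base_sched *s)
{
	return (s + 1);
}
```

If `schk_datalen` were left at zero, `sched_private()` would point past the end of the allocation and any write through it would corrupt adjacent memory.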
Luigi Rizzo
10d72ffc7d fix various warnings to compile the test code with -Wextra
2016-01-26 23:37:07 +00:00
Luigi Rizzo
fa57c83c70 fix various warnings (signed/unsigned, printf types, unused arguments)
2016-01-26 23:36:18 +00:00
Luigi Rizzo
f6a5c66400 prevent warnings for signed/unsigned comparisons and unused arguments.
Add checks for parameters overflowing 32 bit.
2016-01-26 22:46:58 +00:00
Luigi Rizzo
e72cd9a70d prevent warning for unused argument
2016-01-26 22:45:45 +00:00
Luigi Rizzo
4d85bfeb07 avoid warnings for signed/unsigned comparison and unused arguments
2016-01-26 22:45:05 +00:00
Luigi Rizzo
f51b072d4c Revert one chunk from commit 285362, which introduced an off-by-one error
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To avoid that the problem occurs again, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last july, as
the patch was not merged to other versions.
2016-01-26 04:48:24 +00:00
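The off-by-one in f51b072d4c comes from two conventions for "find last set bit": a 1-based one, where the result for the value 1 is 1 and the result for 0 is 0, and a 0-based one that returns the bit index. The helpers below are portable illustrations with made-up names, not the kernel functions themselves.

```c
/* 1-based convention: position of the highest set bit, 0 if x == 0. */
int
fls_1based(unsigned long x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return (n);
}

/* 0-based convention: index of the highest set bit; x must be nonzero. */
int
fls_0based(unsigned long x)
{
	return (fls_1based(x) - 1);
}
```

Code ported between the two conventions has to add or subtract one when the result is used as a shift index.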
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are a number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 do not use radix locking. On the other hand, routing
  structure does have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not know anything about its consumers' internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still use the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Alexander V. Chernikov
fa7c058bf8 Fix panic on table/table entry delete. The panic could have happened
if more than 64 distinct values had been used.

Table value code uses internal objhash API which requires unique key
  for each object. For value code, pointer to the actual value data
  is used. The actual problem arises from the fact that 'actual' e.g.
  runtime data is stored in array and that array is auto-growing. There is
  special hook (update_tvalue() function) which is used to update the pointers
  after the change. For some reason, object 'key' was not updated.
  Fix this by adding update code to the update_tvalue().

Sponsored by:	Yandex LLC
2016-01-21 18:20:40 +00:00
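The class of bug fixed in fa7c058bf8, hash keys that point into an array which moves when it grows, can be sketched like this. The structures and the function name are hypothetical; the real code uses the ipfw objhash API and update_tvalue().

```c
#include <stdlib.h>

/* Stand-ins: a value slot and an object keyed by a pointer to its slot. */
struct value {
	int	data;
};
struct object {
	struct value	*key;	/* points into the value array */
	size_t		 idx;	/* slot index, stable across growth */
};

/*
 * Double the value array and re-point every object's key at its slot
 * in the (possibly moved) array. Returns 0 on success, -1 on failure.
 */
int
grow_values(struct value **vals, size_t *cap, struct object *objs,
    size_t nobjs)
{
	size_t i, newcap;
	struct value *nv;

	newcap = *cap != 0 ? *cap * 2 : 64;
	nv = realloc(*vals, newcap * sizeof(*nv));
	if (nv == NULL)
		return (-1);
	*vals = nv;
	*cap = newcap;
	for (i = 0; i < nobjs; i++)
		objs[i].key = &nv[objs[i].idx];
	return (0);
}
```

Skipping the final loop leaves every key pointing into freed memory as soon as realloc() moves the array.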
Alexander V. Chernikov
89fc126add Initialize error value ta_lookup_kfib() by default to please compiler.
2016-01-10 08:37:00 +00:00
Bjoern A. Zeeb
60c274aaf8 Initialize error after r293626 in case neither INET nor INET6 is
compiled into the kernel.  Ideally lots more code would just not
be called (or compiled in) in that case but that requires a lot
more surgery.  For now try to make IP-less kernels compile again.
2016-01-10 08:14:25 +00:00
Alexander V. Chernikov
004d3e30a7 Make ipfw addr:kfib lookup algo use new routing KPI.
2016-01-10 06:43:43 +00:00
Alexander V. Chernikov
3673828490 Use already pre-calculated number of entries instead of tc->count.
2016-01-10 00:28:44 +00:00
Alexander V. Chernikov
ea8d14925c Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
Alexander V. Chernikov
460a5b502f Convert pf(4) to the new routing API.
Differential Revision:	https://reviews.freebsd.org/D4763
2016-01-07 10:20:03 +00:00
Hans Petter Selasky
c8cfbc066f Properly drain callouts in the IPFW subsystem to avoid use after free
panics when unloading the dummynet and IPFW modules:

- The callout drain function can sleep and should not be called having
a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)".

- Add a new "dn_gone" variable to prevent asynchronous restart of
dummynet callouts when unloading the dummynet kernel module.

- Call "dn_reschedule()" locked so that "dn_gone" can be set and
checked atomically with regard to starting a new callout.

Reviewed by:	hiren
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3855
2015-12-15 09:02:05 +00:00
Alexander V. Chernikov
65ff3638df Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionally, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happen under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
Andrey V. Elsukov
1cf09efe5d Add destroy_object callback to object rewriting framework.
It is called when last reference to named object is going to be released
and allows to do additional cleanup for implementation of named objects.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-23 22:06:55 +00:00
Bryan Drewery
7143303723 Fix dynamic IPv6 rules showing junk for non-specified address masks.
For example:
  00002      0         0 (19s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
  00002      4       412 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
  00002     10       777 (1s) LIMIT tcp 2001:894:5a24:653::503:1 52023 <-> 2001:894:5a24:653:ca0a:a9ff:fe04:3978 22
  00002      0         0 (17s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> 80f3:70d:23fe:ffff:1005:: 0

Fix this by zeroing the unused address, as is done for IPv4:
  00002     0         0 (18s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
  00002    36     14952 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
  00002     0         0 (0s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> :: 0
  00002     4       345 (274s) LIMIT tcp 2001:894:5a24:653::503:1 34131 <-> 2001:470:1f11:262:ca0a:a9ff:fe04:3978 22

MFC after:	2 weeks
2015-11-17 20:42:08 +00:00
Alexander V. Chernikov
637670e77e Bring back the ability of passing cached route via nd6_output_ifp().
2015-11-15 16:02:22 +00:00
Randall Stewart
7c4676ddee This fixes several places where callout_stop's return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
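The mistake fixed in 7c4676ddee is a truth test on a three-valued return code. The sketch below uses plain integers rather than the callout API. The values -1 and 0 follow the commit message, while treating a positive value as "stopped" is an assumption made here.

```c
/* Result codes: -1 and 0 per the commit text, 1 is an assumption. */
enum stop_result {
	STOP_NOT_RUNNING = -1,	/* already completed or never scheduled */
	STOP_FAILED = 0,	/* could not be stopped */
	STOP_OK = 1		/* assumed: the pending callout was stopped */
};

/* Buggy check: -1 is nonzero, so "not running" reads as "stopped". */
int
stopped_buggy(int ret)
{
	return (ret != 0);
}

/* Corrected check: only a positive result means the stop took effect. */
int
stopped_fixed(int ret)
{
	return (ret > 0);
}
```

Callers that release resources only when a stop succeeded need the second form, otherwise the -1 case takes the wrong branch.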
Alexander V. Chernikov
91e93daf9c Print proper setfib values in ipfw log.
Submitted by:	Denis Schneider <v1ne2go at gmail>
2015-11-08 13:44:21 +00:00
Alexander V. Chernikov
b554a27822 Fix setfib target.
Problem was introduced in r272840 when converting tablearg value to 0.

Submitted by:	Denis Schneider <v1ne2go at gmail>
2015-11-08 12:24:19 +00:00
Kristof Provost
5a505b317a pf: Fix broken rule skip calculation
r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.
2015-11-07 23:51:42 +00:00
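Why an address-family argument of 0 makes such a comparison always false, as 5a505b317a describes for PF_ANEQ(), can be seen in a simplified sketch. ADDR_NEQ() and struct addr are hypothetical and only modelled on the description; the real macro may differ.

```c
#include <stdint.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/* Stand-in address: IPv4 uses word 0, IPv6 uses all four words. */
struct addr {
	uint32_t	addr32[4];
};

/*
 * Family-aware inequality. When af is neither AF_INET nor AF_INET6
 * (for example 0), both branches are false and the result is 0.
 */
#define ADDR_NEQ(a, b, af) \
	(((af) == AF_INET && (a)->addr32[0] != (b)->addr32[0]) || \
	 ((af) == AF_INET6 && ((a)->addr32[0] != (b)->addr32[0] || \
	 (a)->addr32[1] != (b)->addr32[1] || \
	 (a)->addr32[2] != (b)->addr32[2] || \
	 (a)->addr32[3] != (b)->addr32[3])))
```

A caller that passes 0 therefore sees every pair of addresses as equal, which is how skip steps computed from such comparisons can end up too large.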
Andrey V. Elsukov
ee09cb0bfb Remove now obsolete KASSERT.
Actually, object classify callbacks can skip some opcodes, that could
be rewritten. We will determine real number of rewritten opcodes a bit
later in this function.

Reported by:	David H. Wolfskill <david at catwhisker dot org>
2015-11-03 22:23:09 +00:00
Andrey V. Elsukov
748c9559ee Eliminate any conditional increments of object_opcodes in the
check_ipfw_rule_body() function. This function is intended to just
determine that rule has some opcodes that can be rewritten. Then the
ref_rule_objects() function will determine real number of rewritten
opcodes using classify callback.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:34:26 +00:00
Andrey V. Elsukov
f81431cca1 Add ipfw_check_object_name_generic() function to do basic checks for an
object name correctness. Each type of object can do more strict checking
in own implementation. Do such checks for tables in check_table_name().

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:29:46 +00:00
Andrey V. Elsukov
5dc5a0e0aa Implement ipfw internal olist command to list named objects.
Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-11-03 10:21:53 +00:00
Kristof Provost
679e3c77b7 pf: Fix IPv6 checksums with route-to.
When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.

Submitted by:	Luoqi Chen
MFC after:	1 week
2015-10-29 20:45:53 +00:00
Alexander V. Chernikov
78546dad4e Eliminate last rtalloc_ign() caller.
Differential Revision:	https://reviews.freebsd.org/D3927
2015-10-27 21:25:40 +00:00
Kristof Provost
c110fc49da pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
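The pseudo-header checksum mentioned in c110fc49da covers only the addresses, the length and the protocol. A minimal sketch for IPv4 follows; the function name is made up and the arguments are taken in host byte order to keep the example simple, which real network code would not do.

```c
#include <stdint.h>

/*
 * One's-complement sum of the IPv4 pseudo-header (source, destination,
 * protocol, TCP length), folded to 16 bits and NOT complemented.
 */
uint16_t
pseudo_header_sum(uint32_t src, uint32_t dst, uint8_t proto,
    uint16_t tcp_len)
{
	uint32_t sum;

	sum = (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;
	sum += tcp_len;
	/* Fold carries back into the low 16 bits. */
	while ((sum >> 16) != 0)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}
```

A full TCP checksum would continue the same sum over the TCP header and payload and then complement the result, a step that is left to the hardware when offloading is in use.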
Alexander V. Chernikov
c6fb65b1df Bump number of prefixes in O_IP_<SRC|DST> from 15 to 31 (max possible).
PR:		203459
Submitted by:	groos at xiplink.com
MFC after:	2 weeks
2015-10-03 05:42:25 +00:00
Alexander V. Chernikov
1fe201c322 Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_output() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performs lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output_lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
Kristof Provost
2f6c345adf pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set
If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif is the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR:	202351
Reviewed by:	eri
Differential Revision:	https://reviews.freebsd.org/D3534
2015-09-01 19:04:04 +00:00
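The test described in 2f6c345adf, whether the receiving interface is a member of the bridge that is the output interface, can be sketched with stand-in structures. The `struct iface` type and its `bridge` field are hypothetical and do not reflect the kernel's struct ifnet.

```c
#include <stddef.h>

/* Stand-in interface: bridge is the bridge this interface belongs to. */
struct iface {
	const char	*name;
	struct iface	*bridge;	/* NULL if not a bridge member */
};

/*
 * Return 1 when the packet arrived on a member of the bridge that is
 * also the output interface, i.e. it is bridged rather than forwarded.
 */
int
is_bridging(const struct iface *rcvif, const struct iface *ifp)
{
	return (rcvif != NULL && rcvif->bridge == ifp);
}
```

Only when this returns 0 and the two interfaces differ would the packet be treated as forwarded.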
Kristof Provost
64b3b4d611 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
Alexander V. Chernikov
3535eac433 Fix packets/bytes accounting on i386.
Spotted by:	julian
2015-08-27 07:53:58 +00:00
Luiz Otavio O Souza
22932fc9be Reapply r196551 which was accidentally reverted by r223637 (update to
OpenBSD pf 4.5).

Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

This fixes the failure when you hit the self table size and force it to be
resized.

MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-24 21:41:05 +00:00
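A minimal sketch of the class of bug described in the commit above, not the pf code itself: memcpy() takes the destination first and the source second, and the byte count has to be clamped to the smaller buffer. The helper name copy_clamped() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from pf): copy at most dst_cnt elements.
 * memcpy() is (destination, source, size); swapping the first two
 * arguments would overwrite the source instead of filling the target.
 */
static size_t
copy_clamped(int *dst, size_t dst_cnt, const int *src, size_t src_cnt)
{
	size_t n = src_cnt < dst_cnt ? src_cnt : dst_cnt;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n * sizeof(*dst));	/* dst first, then src */
	return (n);
}
```

Returning the number of elements actually copied lets the caller notice when the source held more than the destination could take.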
Luiz Otavio O Souza
0a70aaf8f5 Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewed by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza
f2fc809dcd Fix the copy of addresses passed from userland in table replace command.
The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).

Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Rubicon Communications (Netgate)
2015-08-17 23:03:54 +00:00
Mariusz Zaborski
643ef281cd Use correct src/dst ports when removing states.
Submitted by:	Milosz Kaniewski <m.kaniewski@wheelsystems.com>,
		UMEZAWA Takeshi <umezawa@iij.ad.jp> (original)
Reviewed by:	glebius
Approved by:	pjd (mentor)
Obtained from:	OpenBSD
MFC after:	3 days
2015-08-11 17:24:34 +00:00
Andrey V. Elsukov
b13653baf9 Reduce overhead of ipfw's me6 opcode.
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 10:53:42 +00:00
Kristof Provost
48c29b118e pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

PR:		201879, 201932
MFC after:	5 days
2015-07-29 06:35:36 +00:00
Renato Botelho
299c819a75 Simplify logic added in r285945 as suggested by glebius
Approved by:	glebius
MFC after:	3 days
Sponsored by:	Netgate
2015-07-28 14:59:29 +00:00
Renato Botelho
b1b98a2db7 Respect pf rule log option before logging dropped packets with IP options or
dangerous v6 headers

Reviewed by:	gnn, eri
Approved by:	gnn
Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Netgate
Differential Revision:	https://reviews.freebsd.org/D3222
2015-07-28 10:31:34 +00:00
Gleb Smirnoff
3e437fd2c6 Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by:	dhartmei
2015-07-28 09:36:26 +00:00
Andrey V. Elsukov
af9aa0a837 Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 07:26:31 +00:00
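For readers unfamiliar with checksum adjusting, the following is a sketch of the general technique (incremental update per RFC 1624, equation 3), not the helper functions added by the commit above. The names cksum_full() and cksum_adjust() are hypothetical, and words are treated as plain 16-bit values with no byte-order handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back into the low 16 bits (one's-complement addition). */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Reference: full checksum over n 16-bit words. */
static uint16_t
cksum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	assert(w != NULL);
	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return ((uint16_t)~fold16(sum));
}

/* RFC 1624 eq. 3: HC' = ~(~HC + ~m + m') for one changed word. */
static uint16_t
cksum_adjust(uint16_t cksum, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return ((uint16_t)~fold16(sum));
}
```

Changing the TOS/DSCP byte of an IPv4 header alters a single 16-bit word, so the adjusted checksum can be derived from the old one without walking the whole header again.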
Luigi Rizzo
4af7aed7c6 assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors):
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.
2015-07-10 19:24:36 +00:00
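The "1ULL" item above refers to a common C pitfall: in an expression such as 1 << n the literal 1 has type int, so the shift is performed in int width even when the result is stored in a 64-bit variable. A minimal sketch, with a hypothetical helper name bit64():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a 64-bit value with only bit 'shift' set.
 * The ULL suffix makes the shift happen in unsigned long long, so
 * shifts of 32 or more are well defined.
 */
static uint64_t
bit64(unsigned int shift)
{
	assert(shift < 64);
	return (1ULL << shift);
}
```

With a plain int literal the same shift by 40 would be undefined behaviour on platforms where int is 32 bits wide.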
Luigi Rizzo
e38e277fc4 one more warning suppression when compiling the test code in userspace. 2015-07-10 19:18:49 +00:00
Luigi Rizzo
e25716b7cc add code to compute fairness indexes;
cleanups to remove compile warnings.
2015-07-10 18:10:40 +00:00
Ermal Luçi
a5b789f65a ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
Kristof Provost
06ba348d27 pf: Remove frc_direction
We don't use the direction of the fragments for anything. The frc_direction
field is assigned, but never read.
Just remove it.

Differential Revision:	https://reviews.freebsd.org/D2773
Approved by:	philip (mentor)
2015-06-11 17:57:47 +00:00
Kristof Provost
837b925aba pf: Save the protocol number in the pf_fragment
When we try to look up a pf_fragment with pf_find_fragment() we compare (see
pf_frag_compare()) addresses (and family), id but also protocol.  We failed to
save the protocol to the pf_fragment in pf_fragcache(), resulting in failing
reassembly.

Differential Revision:	https://reviews.freebsd.org/D2772
2015-06-11 13:26:16 +00:00
Kristof Provost
0b7eba6ad4 pf: address family must be set when creating a pf_fragment
Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set.
In that scenario we take a different branch in pf_normalize_ip(), taking us to
pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a
pf_fragment, but do not set the address family. This leads to a panic when we
try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to
compare the pf_fragments doesn't know what to do if the address family is not
set.

Simply ensure that the address family is set correctly (always AF_INET in this
path).

PR:			200330
Differential Revision:	https://reviews.freebsd.org/D2769
Approved by:		philip (mentor), gnn (mentor)
2015-06-10 13:44:04 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Luigi Rizzo
62f42cf8ee use proper types to represent function pointers 2015-05-19 16:51:30 +00:00
Luigi Rizzo
352bc63d72 remove a redundant ; at the end of a function
MFC after:	1 week
2015-05-19 15:29:00 +00:00
Luigi Rizzo
bebf3c825f remove an extra ; after MODULE_DEPEND
(would otherwise generate a warning with more verbose compiler flags)

MFC after:	1 week
2015-05-19 14:49:31 +00:00
Gleb Smirnoff
3dd01a884c Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization
from associated structures initialization.  The mutexes are global, while
the structures are per-vnet.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
2015-05-19 14:04:21 +00:00
Gleb Smirnoff
30fe681e44 During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.

Sponsored by:	Nginx, Inc.
2015-05-19 14:02:40 +00:00
Gleb Smirnoff
78680d05d1 A miss from r283061: don't dereference NULL if pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:51:27 +00:00
Gleb Smirnoff
b7f69c506d Don't dereference NULL if pf_get_mtag() fails.
PR:		200222
Submitted by:	Franco Fichtner <franco opnsense.org>
2015-05-18 15:05:12 +00:00
Luigi Rizzo
8ff71b031e bugfix (only affecting the "lookup" option in the userspace version of ipfw):
the conditional block should not include the 'else' otherwise
the code does a 'break;' without completing the check
2015-05-13 11:53:25 +00:00
Alexander V. Chernikov
e09c1944a3 Remove ptei->value check from ipfw_link_table_values():
even if there was non-zero number of restarts, we would unref/clear
  all value references and start ipfw_link_table_values() once again
  with (mostly) cleared "tei" buffer.
 Additionally, ptei->ptv stores only to-be-added values, not existing ones.
 This is a forgotten piece of previous value refcounting implementation,
  and now it is simply incorrect.
2015-05-12 20:42:42 +00:00
Alexander V. Chernikov
b45fa3fad6 Fix panic when prepare_batch_buffer() returns error. 2015-05-06 07:53:43 +00:00
Alexander V. Chernikov
caf993912e Fix KASSERT introduced in r282155.
Found by:	dhw
2015-04-30 21:51:12 +00:00
Alexander V. Chernikov
e948489558 Fix panic introduced by r282070.
Arm friendly KASSERT() to ease debug of similar crashes.

Submitted by:	Olivier Cochard-Labbé
2015-04-28 17:05:55 +00:00
Alexander V. Chernikov
a1bddc75b4 Fix 'may be used uninitialized' warning not caught by clang. 2015-04-27 10:01:22 +00:00
Alexander V. Chernikov
1a458088ff Use free_nat_instance() for nat instance deletion.
Sponsored by:	Yandex LLC
2015-04-27 09:16:22 +00:00
Alexander V. Chernikov
74b22066b0 Make rule table kernel-index rewriting support any kind of objects.
Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
 values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
 do a) for them.

Kernel does almost the same when exporting rules to userland:
 prepares array of used tables in all rules in range, and
 prepends it before the actual ruleset retaining actual in-kernel
 indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
  between userland string identifier and in-kernel index.
  (See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
 shared index and names instead of numbers (in next commit).

Sponsored by:	Yandex LLC
2015-04-27 08:29:39 +00:00
Gleb Smirnoff
fdf6290ea9 Fix memory leak.
PR:		199670
Reviewed by:	ae
2015-04-27 05:44:09 +00:00
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Kristof Provost
3d1bbe5fa0 pf: Fix forwarding detection
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision:	https://reviews.freebsd.org/D2286
Approved by:		gnn(mentor)
2015-04-14 19:07:37 +00:00
George V. Neville-Neil
916e17fd56 I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision:  https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)
2015-04-14 14:43:42 +00:00
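The check described in the commit above amounts to testing two TCP flag bits together. The sketch below is illustrative only and is not the pf normalisation code; the EX_ constants mirror the conventional FIN (0x01) and SYN (0x02) bit values, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_TH_FIN	0x01	/* conventional TCP FIN flag bit */
#define EX_TH_SYN	0x02	/* conventional TCP SYN flag bit */

/*
 * Hypothetical predicate: true when a segment carries both SYN and FIN,
 * a combination with no legitimate use that should simply be dropped.
 */
static bool
ex_tcp_syn_fin(uint8_t flags)
{
	const uint8_t both = EX_TH_SYN | EX_TH_FIN;

	return ((flags & both) == both);
}
```

Comparing against the combined mask, rather than testing for a non-zero result, ensures that a segment with only one of the two flags is not rejected.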
Kristof Provost
1873dcc8c9 pf: Skip firewall for refragmented ip6 packets
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input()). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments.  Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision:	https://reviews.freebsd.org/D2197
Reviewed by:	glebius, gnn
Approved by:	gnn (mentor)
2015-04-06 19:05:00 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Kristof Provost
7dce9b515b pf: Deal with runt packets
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision:	https://reviews.freebsd.org/D2189
Approved by:		gnn (mentor)
2015-04-01 12:16:56 +00:00
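To illustrate the runt-packet problem above: a 42-byte IPv6 packet (40-byte header plus 2 bytes of payload) is padded to the 46-byte Ethernet minimum payload, so the received buffer is longer than the packet. The sketch below shows one way to compute the length to keep; it is not the pf code, the names are hypothetical, and treating a zero payload length as "do not trim" is an assumption based on how RFC 2675 jumbograms carry their length elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_IP6_HDRLEN	40u	/* fixed IPv6 header size in bytes */

/*
 * Hypothetical helper: number of bytes in the buffer that belong to the
 * packet, given the payload length field from the IPv6 header.
 */
static size_t
ex_ip6_trimmed_len(size_t buflen, uint16_t payload_len)
{
	size_t want;

	assert(buflen >= EX_IP6_HDRLEN);
	if (payload_len == 0)		/* length not in the fixed header */
		return (buflen);
	want = EX_IP6_HDRLEN + payload_len;
	return (want < buflen ? want : buflen);
}
```

Anything beyond the returned length is link-layer padding and must not end up in a reassembled packet.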
Kristof Provost
798318490e Preserve IPv6 fragment IDs across reassembly and refragmentation
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision:	https://reviews.freebsd.org/D2188
Approved by:		gnn (mentor)
2015-04-01 12:15:01 +00:00
Andrey V. Elsukov
bf55a0034d The offset variable already has all bits except IP6F_OFF_MASK cleared.
Use ip6f_mf variable instead of checking its bits.
2015-03-31 14:41:29 +00:00
Sergey Kandaurov
a4879be402 Static'ize pf_fillup_fragment body to match its declaration.
Missed in 278925.
2015-03-26 13:31:04 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
2530ed9e70 Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value
to obtain IPv4 next hop address in tablearg case.

Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.

Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.

Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.

Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.

Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.

Differential Revision:	https://reviews.freebsd.org/D2015
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-03-13 09:03:25 +00:00
Andrey V. Elsukov
998fbd14b8 Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by:	Kristof Provost
MFC after:	1 week
2015-03-12 08:57:24 +00:00
Gleb Smirnoff
4ac6485cc6 Even more fixes to !INET and !INET6 kernels.
In collaboration with:	pluknet
2015-02-17 22:33:22 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
39a58828ef In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by:		Kristof Provost
Differential Revision:	D1767
2015-02-16 07:01:02 +00:00
Gleb Smirnoff
f5ceb22b78 Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by:		Kristof Provost
Tested by:		peter
Differential Revision:	D1765
2015-02-16 03:38:27 +00:00
Alexander V. Chernikov
9f925e8a92 Fix IP_FW_NAT44_LIST_NAT size calculation.
Found by:	lev
Sponsored by:	Yandex LLC
2015-02-05 14:54:53 +00:00
Alexander V. Chernikov
0caab00959 * Make sure table algorithm destroy hook is always called without locks
* Explicitly lock freeing interface references in ta_destroy_ifidx
* Change ipfw_iface_unref() to require UH lock
* Add forgotten ipfw_iface_unref() to destroy_ifidx_locked()

PR:		kern/197276
Submitted by:	lev
Sponsored by:	Yandex LLC
2015-02-05 13:49:04 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immediately panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Alexander V. Chernikov
0b47e42b49 Use ipfw runtime lock only when real modification is required. 2015-01-16 10:49:27 +00:00
Craig Rodrigues
7259906eb0 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
4de985af0b Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Ermal Luçi
7b56cc430a pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
Alexander V. Chernikov
5b07fc31cc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
Alexander V. Chernikov
a458ad86ee Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Alexander V. Chernikov
038263c36a Remove unused variable.
Found by:	Coverity
CID:		1245739
2014-11-04 10:25:52 +00:00
Alexander V. Chernikov
552eb491ab Bump default dynamic limit to 16k entries.
Print better log message when limit is hit.

PR:		193300
Submitted by:	me at nileshgr.com
2014-10-24 13:57:15 +00:00
Alexander V. Chernikov
9e3a53fd35 Rename log2 to tal_log2.
Submitted by:	luigi
2014-10-22 21:20:37 +00:00
Luigi Rizzo
03be41e6a4 remove/fix old code for building ipfw and dummynet in userspace 2014-10-22 05:21:36 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
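The "Logical OR where binary OR was expected" item above is easy to reproduce. In the sketch below the flag values are made up for illustration and are not the real CTLFLAG_* constants; both function names are hypothetical.

```c
#include <assert.h>

#define EX_FLAG_RD	0x2u	/* illustrative value only */
#define EX_FLAG_MPSAFE	0x4u	/* illustrative value only */

/* Bug pattern: || yields 0 or 1, discarding the individual flag bits. */
static unsigned int
ex_combine_logical(unsigned int a, unsigned int b)
{
	return (a || b);
}

/* Intended: | keeps every bit that is set in either operand. */
static unsigned int
ex_combine_bitwise(unsigned int a, unsigned int b)
{
	return (a | b);
}
```

With these values the logical form evaluates to 1, which contains neither flag, while the bitwise form evaluates to 0x6.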
Alexander V. Chernikov
54b38fcf03 Use copyout() directly instead of updating various fields
before/after each sooptcopyout() call.

Found by:	luigi
Sponsored by:	Yandex LLC
2014-10-20 11:21:07 +00:00
Alexander V. Chernikov
4040f4ecd6 Perform more checks on the number of tables supplied by user. 2014-10-19 11:15:19 +00:00
Dag-Erling Smørgrav
99e9de871a Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
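For reference, a compact sketch of the 32-bit MurmurHash3 variant follows. It is written from the public algorithm description and is not the code added by the commit above; the function name ex_murmur3_32() is hypothetical, and input blocks are assembled in little-endian order byte by byte so the result does not depend on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t
ex_rotl32(uint32_t x, int r)
{
	return ((x << r) | (x >> (32 - r)));
}

/* Hypothetical name; 32-bit MurmurHash3 over len bytes with a seed. */
static uint32_t
ex_murmur3_32(const void *data, size_t len, uint32_t seed)
{
	const uint8_t *p = data;
	const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
	uint32_t h = seed, k;
	size_t i;

	assert(data != NULL || len == 0);

	/* Body: mix one 4-byte little-endian block at a time. */
	for (i = 0; i + 4 <= len; i += 4) {
		k = (uint32_t)p[i] | (uint32_t)p[i + 1] << 8 |
		    (uint32_t)p[i + 2] << 16 | (uint32_t)p[i + 3] << 24;
		k *= c1;
		k = ex_rotl32(k, 15);
		k *= c2;
		h ^= k;
		h = ex_rotl32(h, 13);
		h = h * 5 + 0xe6546b64;
	}

	/* Tail: up to three remaining bytes. */
	k = 0;
	switch (len & 3) {
	case 3:
		k ^= (uint32_t)p[i + 2] << 16;
		/* FALLTHROUGH */
	case 2:
		k ^= (uint32_t)p[i + 1] << 8;
		/* FALLTHROUGH */
	case 1:
		k ^= p[i];
		k *= c1;
		k = ex_rotl32(k, 15);
		k *= c2;
		h ^= k;
	}

	/* Finalisation: fold in the length and avalanche the bits. */
	h ^= (uint32_t)len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return (h);
}
```

The seed lets callers derive independent hash functions from the same code, for example one per hash table.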
Alexander V. Chernikov
0d90989bef Use IPFW_RULE_CNTR_SIZE macro instead of non-relevant ip_fw_cntr structure.
Found by:	luigi
2014-10-18 17:23:41 +00:00
Alexander V. Chernikov
2930362fb1 Fix matching default rule on clear/show commands.
Found by:	Oleg Ginzburg
2014-10-13 13:49:28 +00:00
Alexander V. Chernikov
956f6d3a3c Fix KASSERT typo. 2014-10-11 15:04:50 +00:00