112 Commits

Author SHA1 Message Date
glebius
12e7b30255 Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
rodrigc
400655a4d3 Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx.
They are already initialized by MTX_SYSINIT.

Submitted by: Nikos Vassiliadis <nvass@gmx.com>
2015-01-08 17:49:07 +00:00
rodrigc
89bede2eff Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
rodrigc
b15d5b05bd Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR:                     194515
Differential Revision:  D1315
Submitted by:           Nikos Vassiliadis <nvass@gmx.com>
2015-01-06 09:03:03 +00:00
rodrigc
58319f89ed Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
eri
0d0f5282c7 pf(4) needs to have a correct checksum during its processing.
Calculate checksums for the IPv6 path when needed before
delving into pf(4) code as required.

PR:     172648, 179392
Reviewed by:    glebius@
Approved by:    gnn@
Obtained from:  pfSense
MFC after:      1 week
Sponsored by:   Netgate
2014-11-19 13:31:08 +00:00
melifaro
5edf5a79dc Finish r274315: remove union 'u' from struct pf_send_entry.
Suggested by:	kib
2014-11-09 17:01:54 +00:00
melifaro
6b3c0c962e Remove unused 'struct route' fields. 2014-11-09 16:15:28 +00:00
glebius
99f4ec50e8 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
des
325f19fddb Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom.  Document them in hash(9).

MFC after:	1 month
MFC with:	r272906
2014-10-18 22:15:11 +00:00
gnn
23f601a6ca Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Differential Revision:	https://reviews.freebsd.org/D461
Submitted by:	des
Reviewed by:	emaste
MFC after:	1 month
2014-10-10 19:26:26 +00:00
melifaro
d8b683d70f Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
glebius
713d87864c Use rn_detachhead() instead of direct free(9) for radix tables.
Sponsored by:	Nginx, Inc.
2014-10-01 13:35:41 +00:00
glebius
16745af543 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
glebius
72f04611ec Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
glebius
759eeea220 - Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
  DIOCSTOP ioctls.

Reported & tested by:	jmallett
Sponsored by:		Nginx, Inc.
2014-09-12 08:39:15 +00:00
glebius
2e01608625 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
glebius
0cbf499e97 Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR:		177808
Submitted by:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by:	InnoGames GmbH
2014-09-01 13:00:45 +00:00
glebius
4242d9acba Do not lookup source node twice when pf_map_addr() is used.
PR:		184003
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:16:08 +00:00
glebius
9227a25906 pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR:		183997
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 14:02:24 +00:00
glebius
45bdeab3db Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.
PR:		127920
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-15 04:35:34 +00:00
kevlo
dd40fa7e62 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
glebius
7d0b571895 - Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
  use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
  and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by:	InnoGames GmbH
2014-08-14 18:57:46 +00:00
kevlo
7727a3c215 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
glebius
98615618b9 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
jhb
91a569ad69 Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
2014-05-29 19:17:10 +00:00
glebius
9412c23d6c o In pf_normalize_ip() we don't need mtag in
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
  if we don't find any.

Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2014-05-17 12:30:27 +00:00
glebius
597bcfe53d The current API for adding rules with pool addresses is the following:
- DIOCADDADDR adds addresses and puts them into V_pf_pabuf
- DIOCADDRULE takes all addresses from V_pf_pabuf and links
  them into rule.

The ugly part is that if address is a table, then it is initialized
in DIOCADDRULE, because we need ruleset, and DIOCADDADDR doesn't
supply ruleset. But if address is a dynaddr, we need address family,
and address family could be different for different addresses in one
rule, so dynaddr is initialized in DIOCADDADDR.

This leads to the entangled state of addresses on V_pf_pabuf. Some are
initialized, and some not. That's why running pf_empty_pool(&V_pf_pabuf)
can lead to a panic on a NULL table address.

Since proper fix requires API/ABI change, for now simply plug the panic
in pf_empty_pool().

Reported by:	danger
2014-04-25 11:36:11 +00:00
mm
532d55ab5f Backport from projects/pf r263908:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

MFC after:	1 week
2014-04-20 09:17:48 +00:00
glebius
97ee1da70b Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR:		188253
2014-04-16 09:25:20 +00:00
mm
257cccbfaa Merge from projects/pf r264198:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by:	trociny
MFC after:	1 week
2014-04-07 07:06:13 +00:00
mm
c4f653f608 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
glebius
8a3e4bbebb - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
glebius
c23c087e5b Instead of playing games with casts simply add 3 more members to the
structure pf_rule, that are used when the structure is passed via
ioctl().

PR:		187074
2014-03-05 00:40:03 +00:00
mm
d2d19c03ef Revert r262196
I am going to split this into two individual patches and test it with
the projects/pf branch that may get merged later.
2014-02-19 17:06:04 +00:00
mm
73f4b67f38 De-virtualize pf_mtag_z [1]
Process V_pf_overloadqueue in vnet context [2]

This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf
and vnet jails. pf inside jails remains broken.

PR:		kern/182964
Submitted by:	glebius@FreeBSD.org [2], myself [1]
Tested by:	rodrigc@FreeBSD.org, myself
MFC after:	2 weeks
2014-02-18 22:17:12 +00:00
glebius
1ea1d562a3 Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to:		Dennis Yusupoff <dyr smartspb.net>
Also reported by:	dumbbell, pgj, Rambler
Sponsored by:		Nginx, Inc.
2014-02-14 10:05:21 +00:00
glebius
3f1e8f48cd Remove NULL pointer dereference.
CID:	1009118
2014-01-22 15:58:43 +00:00
glebius
4d8db193db Fix resource leak and simplify code for DIOCCHANGEADDR.
CID:	1007035
2014-01-22 15:44:38 +00:00
glebius
353906d3d2 When pf_get_translation() fails, it should leave *sn pointer pristine,
otherwise we will panic in pf_test_rule().

PR:		182557
2014-01-06 19:05:04 +00:00
dim
320e3d9bba Fix incorrect header guard define in sys/netpfil/pf/pf.h, which snuck in
in r257186.  Found by clang 3.4.
2013-12-22 19:47:22 +00:00
glebius
964c4daeba Fix fallout from r258479: in pf_free_src_node() the node must already
be unlinked.

Reported by:	Konstantin Kukushkin <dark rambler-co.ru>
Sponsored by:	Nginx, Inc.
2013-12-22 12:10:36 +00:00
glebius
f889028338 The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity,
where "m" is number of source nodes and "n" is number of states. Thus,
on heavy loaded router its processing consumed a lot of CPU time.

Reimplement it with O(m+n) complexity. We first scan through source
nodes and disconnect matching ones, putting them on the freelist and
marking with a cookie value in their expire field. Then we scan through
the states, detecting references to source nodes with a cookie, and
disconnect them as well. Then the freelist is passed to pf_free_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
PR:		kern/176763
Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:22:26 +00:00
glebius
c884926273 To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
  These function do not proceed with freeing of a node, just disconnect
  it from storage.
- New function pf_free_src_nodes() works on a list of previously
  disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with:	Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by:	InnoGames GmbH
Sponsored by:	Nginx, Inc.
2013-11-22 19:16:34 +00:00
glebius
70fc573907 Fix off by ones when scanning source nodes hash.
Sponsored by:	Nginx, Inc.
2013-11-22 18:57:27 +00:00
glebius
3583187336 Style: don't compare unsigned <= 0.
Sponsored by:	Nginx, Inc.
2013-11-22 18:54:06 +00:00