Commit Graph

58 Commits

Author SHA1 Message Date
Pawel Biernacki
10b49b2302 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by:	kp
Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23634
2020-02-21 16:23:00 +00:00
Kristof Provost
ef1bd1e517 pfsync: Ensure we enter network epoch before calling ip_output
As of r356974 calls to ip_output() require us to be in the network epoch.
That wasn't the case for the calls done from pfsyncintr() and
pfsync_defer_tmo().
2020-01-22 21:01:19 +00:00
Hans Petter Selasky
59854ecf55 Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@ .

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by:	markj @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:54:41 +00:00
Kristof Provost
812483c46e pf: Rename pfsync bucket lock
Previously the main pfsync lock and the bucket locks shared the same name.
This lead to spurious warnings from WITNESS like this:

    acquiring duplicate lock of same type: "pfsync"
     1st pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1402
     2nd pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1429

It's perfectly okay to grab both the main pfsync lock and a bucket lock at the
same time.

We don't need different names for each bucket lock, because we should always
only acquire a single one of those at a time.

MFC after:	1 week
2019-03-16 10:14:03 +00:00
Kristof Provost
6a8ee0f715 pf: fix pfsync breaking carp
Fix missing initialisation of sc_flags into a valid sync state on clone which
breaks carp in pfsync.

This regression was introduce by r342051.

PR:		235005
Submitted by:	smh@FreeBSD.org
Pointy hat to:	kp
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D18882
2019-01-18 08:19:54 +00:00
Kristof Provost
4fc65bcbe3 pfsync: Performance improvement
pfsync code is called for every new state, state update and state
deletion in pf. While pf itself can operate on multiple states at the
same time (on different cores, assuming the states hash to a different
hashrow), pfsync only had a single lock.
This greatly reduced throughput on multicore systems.

Address this by splitting the pfsync queues into buckets, based on the
state id. This ensures that updates for a given connection always end up
in the same bucket, which allows pfsync to still collapse multiple
updates into one, while allowing multiple cores to proceed at the same
time.

The number of buckets is tunable, but defaults to 2 x number of cpus.
Benchmarking has shown improvement, depending on hardware and setup, from ~30%
to ~100%.

MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D18373
2018-12-06 19:27:15 +00:00
Kristof Provost
dde6e1fecb pfsync: Add missing unlock
If we fail to set up the multicast entry for pfsync and return an error
we must release the pfsync lock first.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17506
2018-11-02 17:03:53 +00:00
Kristof Provost
04fe85f068 pfsync: Allow module to be unloaded
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17505
2018-11-02 17:01:18 +00:00
Kristof Provost
fbbf436d56 pfsync: Handle syncdev going away
If the syncdev is removed we no longer need to clean up the multicast
entry we've got set up for that device.

Pass the ifnet detach event through pf to pfsync, and remove our
multicast handle, and mark us as no longer having a syncdev.

Note that this callback is always installed, even if the pfsync
interface is disabled (and thus it's not a per-vnet callback pointer).

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17502
2018-11-02 16:57:23 +00:00
Kristof Provost
26549dfcad pfsync: Ensure uninit is done before pf
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17501
2018-11-02 16:53:15 +00:00
Kristof Provost
5f6cf24e2d pfsync: Make pfsync callbacks per-vnet
The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17499
2018-11-02 16:47:07 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Kristof Provost
de210decd1 pfsync: Fix state sync during initial bulk update
States learned via pfsync from a peer with the same ruleset checksum were not
getting assigned to rules like they should because pfsync_in_upd() wasn't
passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import.

PR:		229092
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	InnoGames GmbH
2018-06-30 12:51:08 +00:00
Kristof Provost
455969d305 pf: Replace rwlock on PF_RULES_LOCK with rmlock
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by:	farrokhi@
Differential Revision:	https://reviews.freebsd.org/D15502
2018-05-30 07:11:33 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Pedro F. Giffuni
8820ecc040 SPDX: Fix some cases wrongly attributed to MIT.
In the cases of BSD-style license variants without clauses, use 0BSD for
the time being in lack of a better description.
2017-11-30 15:10:11 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Marcel Moolenaar
aa8c6a6dca Improve upon r309394
Instead of taking an extra reference to deal with pfsync_q_ins()
and pfsync_q_del() taken and dropping a reference (resp,) make
it optional of those functions to take or drop a reference by
passing an extra argument.

Submitted by:	glebius@
2016-12-10 03:31:38 +00:00
Gleb Smirnoff
296d65b7a9 Backout accidentially leaked in r309746 not yet reviewed patch :( 2016-12-09 18:00:45 +00:00
Gleb Smirnoff
3cbee8caa1 Use counter_ratecheck() in the ICMP rate limiting.
Together with:	rrs, jtl
2016-12-09 17:59:15 +00:00
Marcel Moolenaar
d6d35f1561 Fix use-after-free bugs in pfsync(4)
Use after free happens for state that is deleted. The reference
count is what prevents the state from being freed. When the
state is dequeued, the reference count is dropped and the memory
freed. We can't dereference the next pointer or re-queue the
state.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D8671
2016-12-02 06:15:59 +00:00
Bjoern A. Zeeb
66c00e9efb Proerply virtualize pfsync for bringup after pf is initialized and
teardown of VNETs once pf(4) has been shut down.
Properly split resources into VNET_SYS(UN)INITs and one time module
loading.
While here cover the INET parts in the uninit callpath with proper
#ifdefs.

Approved by:	re (gjb)
Obtained from:  projects/vnet
MFC after:      2 weeks
Sponsored by:   The FreeBSD Foundation
2016-06-23 22:31:44 +00:00
Randall Stewart
7c4676ddee This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Gleb Smirnoff
2a6009bfa6 Mechanically convert to if_inc_counter(). 2014-09-19 09:19:29 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Gleb Smirnoff
8ff2bd98d6 On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR:		187381
Submitted by:	Lytochkin Boris <lytboris gmail.com>
2014-07-10 12:41:58 +00:00
Martin Matuska
d318d97fb5 Merge from projects/pf r251993 (glebius@):
De-vnet hash sizes and hash masks.

Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny

MFC after:	1 month
2014-03-25 06:55:53 +00:00
Gleb Smirnoff
48278b8846 Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to:		Dennis Yusupoff <dyr smartspb.net>
Also reported by:	dumbbell, pgj, Rambler
Sponsored by:		Nginx, Inc.
2014-02-14 10:05:21 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Andrey V. Elsukov
415077bad9 Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration.
Reported by:	Eugene M. Zheganin
2013-07-29 13:17:18 +00:00
Gleb Smirnoff
6828cc99e1 De-vnet hash sizes and hash masks.
Submitted by:	Nikos Vassiliadis <nvass gmx.com>
Reviewed by:	trociny
2013-06-19 13:37:29 +00:00
Gleb Smirnoff
b69d74e834 Invalid creatorid is always EINVAL, not only when we are in verbose mode. 2013-05-11 17:57:52 +00:00
Gleb Smirnoff
f8aa444783 Improve KASSERT() message. 2013-05-06 21:44:06 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Gleb Smirnoff
41a7572b26 Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.
2013-03-12 13:42:47 +00:00
Gleb Smirnoff
e2a55a0021 Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-15 09:03:56 +00:00
Gleb Smirnoff
d8aa10cc35 In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
  - Add dash in front of all copyright notices according to style(9).
  - Move $OpenBSD$ down below copyright notices.
  - Remove extra line between cdefs.h and __FBSDID.
2012-12-28 09:19:49 +00:00
Gleb Smirnoff
4c794f5c06 Fix VIMAGE build broken in r244185.
Submitted by:	Nikolai Lifanov <lifanov mail.lifanov.com>
2012-12-14 08:02:35 +00:00
Gleb Smirnoff
9ff7e6e922 Merge rev. 1.119 from OpenBSD:
date: 2009/03/31 01:21:29;  author: dlg;  state: Exp;  lines: +9 -16
  ...

  this also firms up some of the input parsing so it handles short frames a
  bit better.

This actually fixes reading beyond mbuf data area in pfsync_input(), that
may happen at certain pfsync datagrams.
2012-12-13 12:51:22 +00:00
Gleb Smirnoff
fed7635002 Merge 1.127 from OpenBSD, that closes a regression from 1.125 (merged
as r242694):
  do better detection of when we have a better version of the tcp sequence
  windows than our peer.

  this resolves the last of the pfsync traffic storm issues ive been able to
  produce, and therefore makes it possible to do usable active-active
  statuful firewalls with pf.
2012-12-11 08:37:08 +00:00
Gleb Smirnoff
8db7e13f1d Remove extra PFSYNC_LOCK() in pfsync_bulk_update() which lead to lock
recursion.

Reported by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-12-06 08:22:08 +00:00
Gleb Smirnoff
5da39c565b Revert erroneous r242693. A state may have PFTM_UNLINKED being on the
PFSYNC_S_DEL queue of pfsync.
2012-12-06 08:15:06 +00:00
Gleb Smirnoff
f18ab0ffa3 Merge rev. 1.125 from OpenBSD:
date: 2009/06/12 02:03:51;  author: dlg;  state: Exp;  lines: +59 -69
  rewrite the way states from pfsync are merged into the local state tree
  and the conditions on which pfsync will notify its peers on a stale update.

  each side (ie, the sending and receiving side) of the state update is
  compared separately. any side that is further along than the local state
  tree is merged. if any side is further along in the local state table, an
  update is sent out telling the peers about it.
2012-11-07 07:35:05 +00:00
Gleb Smirnoff
d75efebeab It may happen that pfsync holds the last reference on a state. In this
case keys had already been freed. If encountering such state, then
just release last reference.

Not sure this can happen as a runtime race, but can be reproduced by
the following scenario:

- enable pfsync
- disable pfsync
- wait some time
- enable pfsync
2012-11-07 07:30:40 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00