144 Commits

Author SHA1 Message Date
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Alexander V. Chernikov
643ce94878 Convert pf rtable checks to the new routing KPI.
Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D24386
2020-04-15 13:00:48 +00:00
Pawel Biernacki
10b49b2302 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by:	kp
Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23634
2020-02-21 16:23:00 +00:00
Kristof Provost
cca2ea64e9 pf: Make request_maxcount runtime adjustable
There's no reason for this to be a tunable. It's perfectly safe to
change this at runtime.

Reviewed by:	Lutz Donnerhacke
Differential Revision:	https://reviews.freebsd.org/D22737
2019-12-14 02:06:07 +00:00
Kristof Provost
492f3a312a pf: Add endline to all DPFPRINTF()
DPFPRINTF() doesn't automatically add an endline, so be consistent and
always add it.
2019-11-24 13:53:36 +00:00
Kristof Provost
a0d571cbef pf: Must be in NET_EPOCH to call icmp_error
icmp_reflect(), called through icmp_error(), requires us to be in NET_EPOCH.
Failure to hold it leads to the following panic (with INVARIANTS):

  panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742
  cpuid = 2
  time = 1571233273
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920
  vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980
  panic() at panic+0x43/frame 0xfffffe00e09779e0
  icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0
  icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10
  pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50
  ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0
  fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0
  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0

Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(),
but ip_output() will soon be converted to a function that requires epoch, so
entering NET_EPOCH directly from pf_intr() makes more sense.

Discussed with:	glebius@
2019-10-18 03:36:26 +00:00
Mark Johnston
bff630d1dc Fix the build after r353458.
MFC with:	r353458
Sponsored by:	The FreeBSD Foundation
2019-10-13 00:08:17 +00:00
Mark Johnston
6cc9ab8610 Add a missing include of opt_sctp.h.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-12 23:01:16 +00:00
Kristof Provost
f287767d4f pf: Remove partial RFC2675 support
Remove our (very partial) support for RFC2675 Jumbograms. They're not
used, not actually supported and not a good idea.

Reviewed by:	thj@
Differential Revision:	https://reviews.freebsd.org/D21086
2019-07-29 13:21:31 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Li-Wen Hsu
d086d41363 Remove an unneeded indentation introduced in r223637 to silence a gcc warning
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-05-25 23:58:09 +00:00
Rodney W. Grimes
6c1c6ae537 Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code
There are a few places that use handcrafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
Conrad Meyer
a8a16c7128 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
Kristof Provost
64af73aade pf: Ensure that IP addresses match in ICMP error packets
States in pf(4) let ICMP and ICMP6 packets pass if they have a
packet in their payload that matches an existing connection.  It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet.  Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.

Reported by:	Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv
Obtained from:	OpenBSD
Security:	CVE-2019-5598
2019-03-21 08:09:52 +00:00
Gleb Smirnoff
1830dae3d3 Make the second argument of ip_divert(), which specifies packet direction, a bool.
This allows pf(4) to avoid including ipfw(4) private files.
2019-03-14 22:23:09 +00:00
Kristof Provost
22c58991e3 pf: Small performance tweak
Because fetching a counter is a rather expensive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

From the man page:
    The adaptive timeout values can be defined both globally and for
    each rule.  When used on a per-rule basis, the values relate to the
    number of states created by the rule, otherwise to the total number
    of states.

This handling of adaptive timeouts is done in pf_state_expires().  The
calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
   runs in the else-section and therefore takes "start" and "end" from
   the global default settings and sets "states" to pf_status.states
   (= total number of states).

2. Special rules like
   "pass .. keep state (adaptive.start 500 adaptive.end 1000)"
   have start != 0, run in the if-section and take "start" and "end"
   from the rule and set "states" to the number of states created by
   their rule using counter_u64_fetch().

That's all OK, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements share the pf_default_rule.
   Therefore we have "start != 0" in this case and run the
   if-section, but we should run the else-section instead: do not
   fetch the counter of the pf_default_rule, but take the total
   number of states.
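
The three cases above can be sketched in userland C. This is only an illustration under stated assumptions: struct rule_sketch, adaptive_timeout() and the is_default_rule flag are hypothetical names, not the kernel's own, and a return value of 0 simply stands for "expire immediately".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pf rule; not the kernel struct. */
struct rule_sketch {
	uint32_t adaptive_start;	/* 0 means no per-rule adaptive setting */
	uint32_t adaptive_end;
	uint32_t states_cur;		/* states created by this rule */
	bool	 is_default_rule;	/* true for the shared default rule */
};

/*
 * Scale a timeout as the commit message describes: per-rule values only
 * when the rule has its own adaptive setting and is not the default rule;
 * global values and the total state count otherwise.
 */
static uint32_t
adaptive_timeout(const struct rule_sketch *r, const struct rule_sketch *def,
    uint32_t total_states, uint32_t timeout)
{
	uint32_t start, end, states;

	assert(r && def);
	if (r->adaptive_start != 0 && !r->is_default_rule) {
		/* Case 2: per-rule limits and per-rule state count. */
		start = r->adaptive_start;
		end = r->adaptive_end;
		states = r->states_cur;	/* the costly counter fetch in the kernel */
	} else {
		/* Cases 1 and 3: global limits and total state count. */
		start = def->adaptive_start;
		end = def->adaptive_end;
		states = total_states;
	}

	if (end != 0 && states > start && start < end) {
		if (states >= end)
			return (0);	/* expire immediately */
		/* Widen to 64 bits so the product cannot wrap. */
		return ((uint32_t)((uint64_t)timeout * (end - states) /
		    (end - start)));
	}
	return (timeout);
}
```

With the default rule passed as r (case 3), the else-branch is taken even though its start value is non-zero, so no per-rule counter is read.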

Submitted by:	Andreas Longwitz <longwitz@incore.de>
MFC after:	2 weeks
2019-02-24 17:23:55 +00:00
Patrick Kelsey
8f2ac65690 Reduce the time it takes the kernel to install a new PF config containing a large number of queues
In general, the time savings come from separating the active and
inactive queue lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-based.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19131
2019-02-11 05:17:31 +00:00
Kristof Provost
336683f24f pf: Fix endless loop on NAT exhaustion with sticky-address
When we try to find a source port in pf_get_sport() it's possible that
all available source ports will be in use. In that case we call
pf_map_addr() to try to find a new source IP to try from. If there are
no more available source IPs pf_map_addr() will return 1 and we stop
trying.

However, if sticky-address is set we'll always return the same IP
address, even if we've already tried that one.
We need to check the supplied address, because if that's the one we'd
set it means pf_get_sport() has already tried it, and we should error
out rather than keep trying.
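
A minimal sketch of the termination check described above, in userland C. All names (sk_pool, sk_map_addr, sk_addresses_tried) are hypothetical, and the port scan is reduced to "nothing free"; it only illustrates why comparing against the address already tried ends the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool with an optional sticky source address. */
struct sk_pool {
	bool	 sticky;
	uint32_t sticky_addr;
};

/*
 * Pick the next source address. Returns 0 and stores it in *out, or 1
 * when there is nothing new to offer. 'tried' is the address the caller
 * has just exhausted.
 */
static int
sk_map_addr(const struct sk_pool *pool, uint32_t tried, uint32_t *out)
{
	assert(pool && out);
	if (pool->sticky) {
		/* The sticky address was already tried: report failure. */
		if (pool->sticky_addr == tried)
			return (1);
		*out = pool->sticky_addr;
		return (0);
	}
	return (1);	/* non-sticky selection is out of scope here */
}

/* Count the addresses tried when every port on every address is in use. */
static unsigned
sk_addresses_tried(const struct sk_pool *pool, uint32_t first_addr)
{
	uint32_t addr = first_addr;
	unsigned tried = 0;

	do {
		tried++;	/* pretend the port scan on 'addr' found nothing */
	} while (sk_map_addr(pool, addr, &addr) == 0);
	return (tried);
}
```

Without the comparison against 'tried', the sticky branch would return the same address on every call and the do/while loop would never exit.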

PR:		233867
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18483
2018-12-12 20:15:06 +00:00
Kristof Provost
5b551954ab pf: Prevent integer overflow in PF when calculating the adaptive timeout.
Mainly states of established TCP connections would be affected resulting
in immediate state removal once the number of states is bigger than
adaptive.start.  Disabling adaptive timeouts is a workaround to avoid this bug.
Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr)
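
The class of bug can be illustrated with a small userland sketch. It assumes the scaling formula timeout * (end - states) / (end - start) and 32-bit unsigned operands; the exact types and expression used in the kernel may differ, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scaling done entirely in 32-bit unsigned arithmetic: the product
 * timeout * (end - states) wraps once it exceeds UINT32_MAX.
 */
static uint32_t
scale_timeout_narrow(uint32_t timeout, uint32_t start, uint32_t end,
    uint32_t states)
{
	assert(start < states && states < end);
	return (timeout * (end - states) / (end - start));
}

/* Same formula with the product widened to 64 bits before dividing. */
static uint32_t
scale_timeout_wide(uint32_t timeout, uint32_t start, uint32_t end,
    uint32_t states)
{
	assert(start < states && states < end);
	return ((uint32_t)((uint64_t)timeout * (end - states) /
	    (end - start)));
}
```

With a long timeout such as 86400 seconds and limits in the tens of thousands, the narrow version yields a much shorter timeout than intended, while the wide version stays just below the unscaled value.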

Reported by: Andreas Longwitz <longwitz AT incore.de>
Obtained from:  OpenBSD
MFC after:	2 weeks
2018-12-11 21:44:39 +00:00
Kristof Provost
5f6cf24e2d pfsync: Make pfsync callbacks per-vnet
The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17499
2018-11-02 16:47:07 +00:00
Kristof Provost
13d640d376 pf: Fix copy/paste error in IPv6 address rewriting
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.

CID:		1009561
MFC after:	3 weeks
2018-10-24 00:19:44 +00:00
Kristof Provost
1563a27e1f pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.
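
As a rough sketch of that check in userland C: the flag values are the standard TCP header bits, but the SK_ names and handshake_completed() are made up for this example and are not pf's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits, named locally for this sketch. */
#define SK_TH_FIN	0x01
#define SK_TH_RST	0x04
#define SK_TH_ACK	0x10

/*
 * Treat the handshake as completed only for an ACK that does not also
 * carry RST or FIN.
 */
static bool
handshake_completed(uint8_t tcp_flags)
{
	if ((tcp_flags & SK_TH_ACK) == 0)
		return (false);
	if ((tcp_flags & (SK_TH_RST | SK_TH_FIN)) != 0)
		return (false);
	return (true);
}
```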

PR:		197484
Obtained from:	OpenBSD
MFC after:	2 weeks
2018-10-20 18:37:21 +00:00
John-Mark Gurney
032d3aaa96 Significantly improve pf purge cpu usage by only taking locks
when there is work to do.  This reduces CPU consumption to one
third on systems.  This will help keep the thread CPU usage under
control now that the default hash size has increased.

Reviewed by:	kp
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17097
2018-09-16 00:44:23 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Kristof Provost
32ece669c2 pf: Fix synproxy
Synproxy was accidentally broken by r335569. The 'return (action)' must be
executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP
error) should only be sent if the packet was dropped (i.e. PF_DROP) and the
return flag is set.
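
The two separate conditions can be sketched as follows. The enum, struct and function names are hypothetical; the sketch only shows that "stop processing" and "send an error packet" are decided independently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts; pf has more results than these three. */
enum sk_action { SK_PASS, SK_DROP, SK_SYNPROXY_DROP };

struct sk_verdict {
	bool return_action;	/* leave the test function with 'action' */
	bool send_error;	/* emit a TCP RST or ICMP error */
};

static struct sk_verdict
sk_finish_packet(enum sk_action action, bool rule_has_return)
{
	struct sk_verdict v;

	/* Every non-pass result returns the action to the caller ... */
	v.return_action = (action != SK_PASS);
	/* ... but an error packet is sent only for a real drop. */
	v.send_error = (action == SK_DROP && rule_has_return);
	return (v);
}
```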

PR:		229477
Submitted by:	Andre Albsmeier <mail AT fbsd.e4m.org>
MFC after:	1 week
2018-07-14 10:14:59 +00:00
Kristof Provost
150182e309 pf: Support "return" statements in passing rules when they fail.
Normally pf rules are expected to do one of two things: pass the traffic or
block it. Blocking can be silent - "drop", or loud - "return", "return-rst",
"return-icmp". Yet there is a 3rd category of traffic passing through pf:
packets that match a "pass" rule whose application then fails. This happens
when the redirection table is empty or when src node or state creation fails. Such
rules always fail silently without notifying the sender.

Allow users to configure this behaviour too, so that pf returns an error packet
in these cases.

PR:		226850
Submitted by:	Kajetan Staszkiewicz <vegeta tuxpowered.net>
MFC after:	1 week
Sponsored by:	InnoGames GmbH
2018-06-22 21:59:30 +00:00
Kristof Provost
0b799353d8 pf: Fix deadlock with route-to
If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR:		228782
MFC after:	1 week
2018-06-09 14:17:06 +00:00
Kristof Provost
455969d305 pf: Replace rwlock on PF_RULES_LOCK with rmlock
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by:	farrokhi@
Differential Revision:	https://reviews.freebsd.org/D15502
2018-05-30 07:11:33 +00:00
Sean Bruno
2695c9c109 Retire ixgb(4)
This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.

Submitted by:	kbowling
Reviewed by:	brooks imp jeffrey.e.pieper@intel.com
Relnotes:	yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15234
2018-05-02 15:59:15 +00:00
Kristof Provost
c41420d5dc pf: limit ioctl to a reasonable and tuneable number of elements
pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations.  These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.

Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.

Differential Revision:	https://reviews.freebsd.org/D15018
2018-04-11 11:43:12 +00:00
Kristof Provost
effaab8861 netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by:	ae, kevans
Differential Revision:	https://reviews.freebsd.org/D13715
2018-03-23 16:56:44 +00:00
Kristof Provost
bf56a3fe47 pf: Cope with overly large net.pf.states_hashsize
If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.
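
The fallback pattern might look like this in userland C. Everything here is an assumption made for illustration: sk_alloc_nowait() stands in for a non-sleeping allocation by refusing requests above an arbitrary limit, and the default size is a made-up value rather than pf's real default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_DEFAULT_HASHSIZE	1024		/* hypothetical default */
#define SK_NOWAIT_LIMIT		(1u << 20)	/* hypothetical failure threshold */

struct sk_table {
	void	**buckets;
	size_t	  hashsize;
};

/* Stand-in for a non-sleeping allocation that may fail. */
static void *
sk_alloc_nowait(size_t nmemb, size_t size)
{
	if (nmemb > SK_NOWAIT_LIMIT)
		return (NULL);
	return (calloc(nmemb, size));
}

/* Try the requested size first; fall back to the default on failure. */
static void
sk_table_init(struct sk_table *t, size_t requested)
{
	t->hashsize = requested;
	t->buckets = sk_alloc_nowait(t->hashsize, sizeof(*t->buckets));
	if (t->buckets == NULL) {
		t->hashsize = SK_DEFAULT_HASHSIZE;
		t->buckets = calloc(t->hashsize, sizeof(*t->buckets));
		/* Unlike a sleeping kernel allocation, calloc() can fail. */
		assert(t->buckets != NULL);
	}
}
```

The point of the design is that the table records the size it actually got, so later hashing uses the fallback size rather than the oversized request.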

PR:		209475
Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14367
2018-02-25 08:56:44 +00:00
Kristof Provost
c201b5644d pf: Avoid warning without INVARIANTS
When INVARIANTS is not set the 'last' variable is not used, which can generate
compiler warnings.
If this invariant is ever violated it'd result in a KASSERT failure in
refcount_release(), so this one is not strictly required.
2018-02-01 07:52:06 +00:00
Kristof Provost
6701c43213 pf: States have at least two references
pf_unlink_state() releases a reference to the state without checking if
this is the last reference. It can't be, because pf_state_insert()
initialises it to two. KASSERT() that this is always the case.
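
A toy model of the invariant, with assert() standing in for KASSERT(). The sk_ names are hypothetical and the reference counter is a plain integer rather than the kernel's refcount type.

```c
#include <assert.h>

struct sk_state {
	unsigned refs;
};

/* Insertion starts the state off with two references. */
static void
sk_state_insert(struct sk_state *s)
{
	s->refs = 2;
}

/* Drop one reference; returns nonzero if that was the last one. */
static int
sk_ref_release(struct sk_state *s)
{
	assert(s->refs > 0);
	return (--s->refs == 0);
}

/* Unlinking drops one reference and must never drop the last one. */
static void
sk_unlink_state(struct sk_state *s)
{
	int last = sk_ref_release(s);

	assert(!last);
	(void)last;	/* avoids an unused-variable warning when asserts are off */
}
```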

CID:	1347140
2018-01-24 04:29:16 +00:00
Kristof Provost
5d0020d6d7 pf: Clean all fragments on shutdown
When pf is unloaded, or a vnet jail using pf is stopped we need to
ensure we clean up all fragments, not just the expired ones.
2017-12-31 10:01:31 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Kristof Provost
b7ae43552b pf: Fix vnet purging
pf_purge_thread() breaks up the work of iterating all states (in
pf_purge_expired_states()) and tracks progress in the idx variable.

If multiple vnets exist this results in pf_purge_thread() only calling
pf_purge_expired_states() for part of the states (the first part of the
first vnet, second part of the second vnet and so on).
Combined with the mark-and-sweep approach to cleaning up old rules (in
V_pf_unlinked_rules) that resulted in pf freeing rules that were still
referenced by states. This in turn caused panics when pf_state_expires()
encounters that state and attempts to access the rule.

We need to track the progress per vnet, not globally, so idx is moved
into a per-vnet V_pf_purge_idx.
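
The effect of keeping the index per vnet can be sketched with a toy model. struct sk_vnet and sk_purge_step() are hypothetical; each "vnet" is just a small array of buckets, and visiting a bucket stands in for purging its expired states.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_BUCKETS	8

/* Hypothetical per-vnet purge context: the progress index lives here. */
struct sk_vnet {
	size_t	 nbuckets;
	size_t	 purge_idx;
	unsigned visits[SK_MAX_BUCKETS];
};

/* Process one chunk of buckets, resuming where this vnet left off. */
static void
sk_purge_step(struct sk_vnet *v, size_t chunk)
{
	size_t i;

	assert(v->nbuckets > 0 && v->nbuckets <= SK_MAX_BUCKETS);
	for (i = 0; i < chunk; i++) {
		v->visits[v->purge_idx]++;
		v->purge_idx = (v->purge_idx + 1) % v->nbuckets;
	}
}
```

Alternating sk_purge_step() between two vnets covers every bucket of both, because neither vnet's progress is advanced by work done for the other. With a single shared index, each vnet would only see the part of the range that happened to fall on its turn.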

PR:		219251
Sponsored by:	Hackathon Essen 2017
2017-07-09 17:56:39 +00:00
Kristof Provost
3601d25181 pf: Fix leak of pf_state_keys
If we hit the state limit we returned from pf_create_state() without cleaning
up.

PR:		217997
Submitted by:	Max <maximos@als.nnov.ru>
MFC after:	1 week
2017-04-01 12:22:34 +00:00
Kristof Provost
2f8fb3a868 pf: Fix possible shutdown race
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.

Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep.

Pointed out by: eri, glebius, jhb
Differential Revision:	https://reviews.freebsd.org/D10026
2017-03-22 21:18:18 +00:00
Kristof Provost
08ef4ddb0f pf: Fix rule evaluation after inet6 route-to
In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).

This led pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.

PR:		217883
Submitted by:	Kajetan Staszkiewicz
MFC after:	1 week
Sponsored by:	InnoGames GmbH
2017-03-19 03:06:09 +00:00
Kristof Provost
2a57d24bd1 pf: Fix incorrect rw_sleep() in pf_unload()
When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep()
with it, because it would release a lock we do not hold. There's no need for the
lock either, so we can just tsleep().

While here also make the same change in pf_purge_thread(), because it explicitly
takes the lock before rw_sleep() and then immediately releases it afterwards.
2017-03-12 05:42:57 +00:00
Kristof Provost
f618201314 pf: Do not lose the VNET lock when ending the purge thread
When the pf_purge_thread() exits it must make sure to release the
VNET_LIST_RLOCK it still holds.
kproc_exit() does not return.
2017-03-12 05:00:04 +00:00
Gleb Smirnoff
164aa3ce5e Fix indentation in pf_purge_thread(). No functional change.
2017-01-30 22:47:48 +00:00
Luiz Otavio O Souza
a5c1a50a26 Do not run the pf purge thread while the VNET variables are not
initialized; this can cause a divide by zero (if the VNET initialization
takes too long to complete).

Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-29 02:17:52 +00:00
Kristof Provost
1f4955785d pf: port extended DSCP support from OpenBSD
Ignore the ECN bits on 'tos' and 'set-tos' and allow the use of
DSCP names instead of having to embed their TOS equivalents
as plain numbers.
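
The relationship between DSCP names, TOS bytes and the ECN bits can be sketched as below. The code points are standard ones, but the table is deliberately partial and the sk_ names are hypothetical; this is not the parser's actual lookup table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_DSCP_MASK	0xfc	/* upper six bits of the TOS byte */

struct sk_dscp_name {
	const char *name;
	uint8_t	    dscp;
};

/* A few standard code points; the real list is longer. */
static const struct sk_dscp_name sk_dscp_names[] = {
	{ "cs1", 8 },
	{ "af11", 10 },
	{ "ef", 46 },
};

/* Return the TOS byte for a DSCP name, or -1 if the name is unknown. */
static int
sk_tos_from_name(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(sk_dscp_names) / sizeof(sk_dscp_names[0]); i++)
		if (strcmp(name, sk_dscp_names[i].name) == 0)
			return (sk_dscp_names[i].dscp << 2);
	return (-1);
}

/* Compare TOS values while ignoring the two low-order ECN bits. */
static bool
sk_tos_matches(uint8_t pkt_tos, uint8_t rule_tos)
{
	return ((pkt_tos & SK_DSCP_MASK) == (rule_tos & SK_DSCP_MASK));
}
```

For example, "ef" (DSCP 46) corresponds to a TOS byte of 0xb8, and a packet carrying 0xb9 still matches because only the ECN bits differ.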

Obtained from:	OpenBSD
Sponsored by:	OPNsense
Differential Revision:	https://reviews.freebsd.org/D8165
2016-10-13 20:34:44 +00:00
Kristof Provost
813196a11a pf: remove fastroute tag
The tag fastroute came from ipf and was removed in OpenBSD in 2011. The code
allows skipping the inbound pfil hooks and completely removes the outbound pfil
invocation, albeit looking up a route that the IP stack will likely find on its own.
The code between IPv4 and IPv6 is also inconsistent and has been marked as "XXX"
for years.

Submitted by:	Franco Fichtner <franco@opnsense.org>
Differential Revision:	https://reviews.freebsd.org/D8058
2016-10-04 19:35:14 +00:00
Kristof Provost
0df377cbb8 pf: Add missing byte-order swap to pf_match_addr_range
Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.
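
A minimal illustration of why the swap matters, assuming IPv4 addresses held in network byte order. addr_in_range() is a hypothetical helper, not the signature of pf_match_addr_range().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Range check on IPv4 addresses stored in network byte order. The values
 * are converted to host order first, so that numeric comparison follows
 * the dotted-quad ordering on both little- and big-endian hosts.
 */
static bool
addr_in_range(uint32_t addr_be, uint32_t low_be, uint32_t high_be)
{
	uint32_t addr = ntohl(addr_be);

	return (addr >= ntohl(low_be) && addr <= ntohl(high_be));
}
```

On a little-endian host, comparing the raw network-order values would order addresses by their last octet first, so 10.1.2.1 would wrongly fall inside "10.1.1.1 - 10.1.1.5".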

PR:		211796
Obtained from:	OpenBSD (sthen)
MFC after:	3 days
2016-08-15 12:13:14 +00:00
Bjoern A. Zeeb
a0429b5459 Update pf(4) and pflog(4) to survive basic VNET testing, which includes
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.

Reviewed by:		kp
Approved by:		re (gjb)
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6924
2016-06-23 21:34:38 +00:00
Kristof Provost
3e248e0fb4 pf: Filter on and set vlan PCP values
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.

Reviewed by:    allanjude, araujo
Approved by:	re (gjb)
Obtained from:  OpenBSD (mostly)
Differential Revision:  https://reviews.freebsd.org/D6786
2016-06-17 18:21:55 +00:00
Kristof Provost
b599e8dc59 pf: Fix more ICMP mistranslation
In the default case fix the substitution of the destination address.

PR:		201519
Submitted by:	Max <maximos@als.nnov.ru>
MFC after:	1 week
2016-05-23 13:59:48 +00:00