Avoid potential structure padding leak. r350294 identified a leak via
static analysis; although there's no report of a leak with the
DIOCGETSRCNODES ioctl it's a good practice to zero the memory.
Suggested by: kp
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
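The padding concern above can be illustrated in userland. This is a minimal sketch, not the pf code: the struct and function names are hypothetical, and it only shows the habit of zeroing the whole object before filling in fields that will be exported.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; on most ABIs 'af' is followed by 3 padding bytes. */
struct ex_src_node {
	uint8_t  af;
	uint32_t states;
};

/*
 * Fill 'out' for export. Zeroing the whole object first means padding
 * bytes do not carry stale memory contents to the reader.
 */
void
ex_export_node(struct ex_src_node *out, uint8_t af, uint32_t states)
{
	memset(out, 0, sizeof(*out));
	out->af = af;
	out->states = states;
}
```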
Now that we don't hold a lock during DIOCRSETTFLAGS memory allocation we can
use M_WAITOK.
MFC after: 1 week
Event: Aberdeen hackathon 2019
Pointed out by: glebius@
If pfrio_buffer is NULL during DIOCRSETTFLAGS, copyin() will fault, which we're
not allowed to do with a lock held.
We must count the number of entries in the table and release the lock during
copyin(). Only then can we re-acquire the lock. Note that this is safe, because
pfr_set_tflags() will check if the table and entries exist.
This was discovered by a local syzkaller instance.
MFC after: 1 week
Event: Aberdeen hackathon 2019
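The count, unlock, copy, re-lock sequence described above can be sketched in userland as follows. Everything here is a hypothetical stand-in (the lock only records whether it is held, and ex_copyin() merely imitates copyin() refusing to run under a lock); it is an illustration of the ordering, not the actual ioctl handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the rules lock; it only records whether it is held. */
struct ex_lock { bool held; };

static void ex_acquire(struct ex_lock *l) { l->held = true; }
static void ex_release(struct ex_lock *l) { l->held = false; }

/* Stand-in for copyin(): fails on a NULL source or if called with the lock held. */
static int
ex_copyin(const struct ex_lock *l, const int *src, int *dst, size_t n)
{
	if (l->held || src == NULL)
		return (-1);
	memcpy(dst, src, n * sizeof(*dst));
	return (0);
}

/*
 * Count under the lock, copy with the lock dropped, then re-acquire it and
 * re-validate, since the table may have changed in between.
 */
int
ex_set_flags(struct ex_lock *l, const size_t *table_entries,
    const int *user_buf, size_t user_n, size_t *applied)
{
	size_t n;
	int *tmp;
	int error;

	ex_acquire(l);
	n = *table_entries < user_n ? *table_entries : user_n;
	ex_release(l);

	tmp = calloc(n != 0 ? n : 1, sizeof(*tmp));
	if (tmp == NULL)
		return (-1);
	error = ex_copyin(l, user_buf, tmp, n);
	if (error == 0) {
		ex_acquire(l);
		/* Entries may have gone away while unlocked; apply only what still exists. */
		*applied = n < *table_entries ? n : *table_entries;
		ex_release(l);
	}
	free(tmp);
	return (error);
}
```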
r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the
number of source tracking nodes.
This meant that we never copied the information to userspace, leading to '? ->
?' output from pfctl.
PR: 236368
MFC after: 1 week
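The class of bug described above (a counter reused as an output index without being reset) can be shown with a small sketch. The function and its arguments are hypothetical and operate on a plain array rather than on pf's source node hash.

```c
#include <stddef.h>

/*
 * Two passes over a hypothetical node array: first count, then copy.
 * The counter is reused as the output index, so it must be reset in between.
 */
size_t
ex_export_nodes(const int *nodes, size_t n_nodes, int *out, size_t out_cap)
{
	size_t nr = 0;
	size_t limit;

	for (size_t i = 0; i < n_nodes; i++)
		nr++;				/* pass 1: count */

	limit = nr < out_cap ? nr : out_cap;
	nr = 0;					/* without this reset nothing is copied */
	for (size_t i = 0; i < n_nodes && nr < limit; i++)
		out[nr++] = nodes[i];		/* pass 2: copy */
	return (nr);
}
```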
In general, the time savings come from separating the active and
inactive queue lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-based.
In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.
There are now two new tunables to control the hash size used for each
tag set (default for each is 128):
net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize
Reviewed by: kp
MFC after: 1 week
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D19131
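To illustrate why hash-based tag management helps, here is a minimal sketch of a name-to-tag lookup using a chained hash table. All names are hypothetical, the hash function (FNV-1a) is chosen only for illustration, and the table size simply mirrors the stated default of 128; the real implementation may differ in every detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_HASHSIZE 128	/* mirrors the stated default; a power of 2 in this sketch */
#define EX_MAXTAGS  64

struct ex_tag {
	const char *name;
	uint16_t id;
	struct ex_tag *next;	/* bucket chain */
};

/* Must be zero-initialised before first use. */
struct ex_tagset {
	struct ex_tag *buckets[EX_HASHSIZE];
	struct ex_tag pool[EX_MAXTAGS];
	size_t used;
};

/* FNV-1a string hash reduced to a bucket index. */
static uint32_t
ex_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s != '\0') {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return (h & (EX_HASHSIZE - 1));
}

/* Return the id for 'name', allocating one if needed; 0 when the pool is full. */
uint16_t
ex_tagname2tag(struct ex_tagset *ts, const char *name)
{
	uint32_t b = ex_hash(name);
	struct ex_tag *t;

	/* Only one bucket is scanned, not the whole tag list. */
	for (t = ts->buckets[b]; t != NULL; t = t->next)
		if (strcmp(t->name, name) == 0)
			return (t->id);
	if (ts->used == EX_MAXTAGS)
		return (0);
	t = &ts->pool[ts->used];
	t->name = name;
	t->id = (uint16_t)(++ts->used);
	t->next = ts->buckets[b];
	ts->buckets[b] = t;
	return (t->id);
}
```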
The KPI has been reviewed and cleansed of features that were planned
20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols, with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as the kernel uses the same command structures that the
userland ioctl uses.
In a nutshell, the [KA]PI is about declaring filtering points, declaring
filters, and linking and unlinking them together.
The new [KA]PI makes it possible to reconfigure pfil(9):
change the order of hooks, rehook a filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only, or
prepend/append a filter to an existing list of filters.
It is now possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of the existing packet filters support that yet;
however, limited usage is already possible, e.g. the default ruleset can
be moved to a single interface, as soon as interfaces provide their
filtering points.
Another future feature is the possibility to create pfil heads that provide
not an mbuf pointer but just a memory pointer with a length. That would
allow filtering at very early stages of a packet's lifecycle, e.g. when a
packet has just been received by a NIC and no mbuf has yet been allocated.
Differential Revision: https://reviews.freebsd.org/D18951
Re-evaluating the ALTQ kernel configuration can be expensive,
particularly when there are a large number (hundreds or thousands) of
queues, and is wholly unnecessary in response to events on interfaces
that do not support ALTQ as such interfaces cannot be part of an ALTQ
configuration.
Reviewed by: kp
MFC after: 1 week
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D18918
When cleaning up a vnet we free the counters in V_pf_default_rule and
V_pf_status from shutdown_pf(), but we can still use them later, for example
through pf_purge_expired_src_nodes().
Free them as the very last operation, as they rely on nothing else themselves.
PR: 235097
MFC after: 1 week
psn_len is controlled by user space, but we allocated memory based on it.
Check how much memory we might need at most (i.e. how many source nodes we
have) and limit the allocation to that.
Reported by: markj
MFC after: 1 week
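The allocation limit described above amounts to taking the smaller of the user-supplied length and the most that could ever be needed. A minimal sketch, with hypothetical names and calloc() standing in for the kernel allocator:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate an export buffer no larger than what could actually be used.
 * 'user_len' is untrusted; 'node_count' and 'node_size' come from our own state.
 */
void *
ex_alloc_export(size_t user_len, size_t node_count, size_t node_size,
    size_t *out_len)
{
	size_t need, len;

	/* Saturate instead of overflowing when computing the upper bound. */
	if (node_size != 0 && node_count > SIZE_MAX / node_size)
		need = SIZE_MAX;
	else
		need = node_count * node_size;

	len = user_len < need ? user_len : need;
	*out_len = len;
	return (len != 0 ? calloc(1, len) : NULL);
}
```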
If the syncdev is removed we no longer need to clean up the multicast
entry we've got set up for that device.
Pass the ifnet detach event through pf to pfsync, and remove our
multicast handle, and mark us as no longer having a syncdev.
Note that this callback is always installed, even if the pfsync
interface is disabled (and thus it's not a per-vnet callback pointer).
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17502
The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17499
2^32 bps or greater to be used. Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary. The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps. No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold). pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.
The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications. If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.
All in-tree consumers of the pf(4) interface have been updated. To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.
PR: 211730
Reviewed by: jmallett, kp, loos
MFC after: 2 weeks
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D16782
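The behaviour described above for version 0 consumers (reporting 2^32 - 1 rather than wrapping) is a saturating conversion. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>

/*
 * Report a 64-bit bandwidth through a 32-bit field, saturating at
 * 2^32 - 1 instead of wrapping at the 2^32 boundary.
 */
uint32_t
ex_bandwidth_v0(uint64_t bps)
{
	return (bps > UINT32_MAX ? UINT32_MAX : (uint32_t)bps);
}
```

A plain cast, by contrast, would silently keep only the low 32 bits of a 100 Gbps value.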
When shutting down a vnet jail pf_shutdown() clears the remaining states, which
through pf_clear_states() calls pf_unlink_state().
For synproxy states pf_unlink_state() will send a TCP RST, which eventually
tries to schedule the pf swi in pf_send(). This means we can't remove the
software interrupt until after pf_shutdown().
MFC after: 1 week
Several third parties use at least some of these ioctls. While it would be
better for regression testing if they were used in base (or at least in the
test suite), it's currently not worth the trouble to push through removal.
Submitted by: antoine, markj
Several ioctls are unused in pf, in the sense that no base utility
references them. Additionally, a cursory review of pf-based ports
indicates they're not used elsewhere either. Some of them have been
unused since the original import. As far as I can tell, they're also
unused in OpenBSD. Finally, removing this code removes the need for
future pf work to take them into account.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D16076
Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.
While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.
Submitted by: farrokhi@
Differential Revision: https://reviews.freebsd.org/D15502
pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations. These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.
Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.
Differential Revision: https://reviews.freebsd.org/D15018
Ensure that multiplications for memory allocations cannot overflow, and
that we'll not try to allocate M_WAITOK for potentially overly large
allocations.
MFC after: 1 week
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
There's no obvious limit to the request size for these, so we limit the
requests to something which won't overflow. Change the memory allocation
to M_NOWAIT so excessive requests will fail rather than stall forever.
MFC after: 1 week
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't use M_WAITOK.
MFC after: 1 week
The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of
tables at a time, and as such try to allocate <number of tables> *
sizeof(struct pfr_table). This multiplication can overflow. Thanks to
mallocarray() this is not exploitable, but an overflow does panic the
system.
Arbitrarily limit this to 65535 tables. pfctl only ever processes one
table at a time, so it presents no issues there.
MFC after: 1 week
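The bound described above can be sketched as a range check performed before any multiplication. The names are hypothetical and calloc() stands in for the kernel allocator; only the 65535 limit comes from the text.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_TABLES 65535	/* the arbitrary limit named above */

/*
 * Allocate 'count' elements of 'elem_size', rejecting counts that are
 * out of range or whose total size would overflow.
 */
int
ex_alloc_tables(int count, size_t elem_size, void **out)
{
	*out = NULL;
	if (count < 0 || count > EX_MAX_TABLES)
		return (EINVAL);
	if (elem_size != 0 && (size_t)count > SIZE_MAX / elem_size)
		return (EINVAL);	/* multiplication would overflow */
	*out = calloc(count != 0 ? (size_t)count : 1, elem_size);
	return (*out != NULL ? 0 : ENOMEM);
}
```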
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.
Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.
Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715
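With a flags argument available, the forwarding decision described above reduces to a bit test. In this sketch the flag values and names are hypothetical stand-ins (the real definitions live in the pfil headers), and the two enum values merely represent the ip6_forward() and ip6_output() paths.

```c
/* Hypothetical flag values; the real definitions live in the pfil headers. */
#define EX_PFIL_IN	0x1
#define EX_PFIL_OUT	0x2
#define EX_PFIL_FWD	0x8

enum ex_path { EX_LOCAL_OUTPUT, EX_FORWARD };

/* Decide which path refragmented packets should be handed to. */
enum ex_path
ex_refragment_path(int flags)
{
	return ((flags & EX_PFIL_FWD) != 0 ? EX_FORWARD : EX_LOCAL_OUTPUT);
}
```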
pfioctl() handles several ioctls that take variable-length input. These
include:
- DIOCRADDTABLES
- DIOCRDELTABLES
- DIOCRGETTABLES
- DIOCRGETTSTATS
- DIOCRCLRTSTATS
- DIOCRSETTFLAGS
All of them take a pfioc_table struct as input from userland. One of
its elements (pfrio_size) is used in a buffer length calculation.
The calculation contains an integer overflow which, if triggered, can lead
to out-of-bounds reads and writes later on.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
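How a length calculation of this kind can wrap is easy to show with 32-bit arithmetic. This is only an illustration with hypothetical names and element sizes, not the code from pfioctl().

```c
#include <stdint.h>

/* Buffer length computed in 32-bit arithmetic: wraps for large counts. */
uint32_t
ex_len_unchecked(uint32_t count, uint32_t elem_size)
{
	return (count * elem_size);
}

/*
 * Same computation in 64 bits. Returns 1 and stores the length if it
 * fits in 32 bits, or 0 if the product would have wrapped.
 */
int
ex_len_checked(uint32_t count, uint32_t elem_size, uint32_t *len)
{
	uint64_t wide = (uint64_t)count * elem_size;

	if (wide > UINT32_MAX)
		return (0);
	*len = (uint32_t)wide;
	return (1);
}
```

With a count of 0x10000001 and 16-byte elements, the unchecked version yields a 16-byte length while the caller still believes it has room for the full count, which is what makes later reads and writes go out of bounds.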
Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
No functional change intended.
When running the vnet init code (pf_load_vnet()) we used to iterate over
all vnets, marking them as unhooked.
This is incorrect and leads to panics if pf is unloaded, as the unload
code does not unregister the pfil hooks (because the vnet is marked as
unhooked).
There's no need or reason to touch other vnets during initialisation.
Their pf_load_vnet() function will be triggered, which handles all
required initialisation.
Reviewed by: zec, gnn
Differential Revision: https://reviews.freebsd.org/D10592
vnet_pf_uninit() is called through vnet_deregister_sysuninit() and
linker_file_unload() when the pf module is unloaded. This is executed
after pf_unload() so we end up trying to take locks which have been
destroyed already.
Move pf_unload() to a separate SYSUNINIT() to ensure it's called after
all the vnet_pf_uninit() calls.
Differential Revision: https://reviews.freebsd.org/D10025
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.
Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep.
Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026
Rules are unlinked in shutdown_pf(), so we must call
pf_unload_vnet_purge(), which frees unlinked rules, after that, not
before.
Reviewed by: eri, bz
Differential Revision: https://reviews.freebsd.org/D10040
When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep()
with it, because it would release a lock we do not hold. There's no need for the
lock either, so we can just tsleep().
While here also make the same change in pf_purge_thread(), because it explicitly
takes the lock before rw_sleep() and then immediately releases it afterwards.
The fastroute tag came from ipf and was removed in OpenBSD in 2011. The code
allows skipping the "in" pfil hooks and completely removes the "out" pfil
invocation, while looking up a route that the IP stack will likely find on its
own. The IPv4 and IPv6 code is also inconsistent and has been marked as "XXX"
for years.
Submitted by: Franco Fichtner <franco@opnsense.org>
Differential Revision: https://reviews.freebsd.org/D8058
pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers
expect to get E<foo> error codes.
Map the return values. A pass is 0 (everything is OK), anything else means
pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP
error message.
PR: 207598
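The mapping described above is small enough to sketch directly. The enum below is a hypothetical stand-in for pf's PF_PASS, PF_DROP, ... verdicts; only the rule "pass is 0, everything else is EACCES" comes from the text.

```c
#include <errno.h>

/* Hypothetical verdict codes standing in for pf's PF_PASS, PF_DROP, ... */
enum ex_verdict { EX_PASS = 0, EX_DROP, EX_OTHER };

/* Translate a filter verdict into the errno-style value a hook caller expects. */
int
ex_verdict_to_errno(enum ex_verdict v)
{
	return (v == EX_PASS ? 0 : EACCES);
}
```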
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.
Reviewed by: kp
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6924
than removing the network interfaces first. This change is rather large
and convoluted, as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along with (or before) the higher layer protocol being shut down.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
clean up their own parts, thus we cannot do so again for each interface as
we would end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enhance helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.
Reviewed by: allanjude, araujo
Approved by: re (gjb)
Obtained from: OpenBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D6786
In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs,
which is processed in pfr_set_addrs(). At the user's request we also provide
feedback on the deleted addresses, by storing them after the new list
('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()).
This means we write outside the bounds of the buffer we've just allocated.
We need to look at pfrio_size2 instead (i.e. the size the user reserved for our
feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than
pfrio_size though, in which case we'd still read outside of the allocated
buffer. Instead we allocate the largest of the two values.
Reported By: Paul J Murphy <paul@inetstat.net>
PR: 207463
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D5426
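The sizing rule described above (allocate for the larger of the two user-supplied counts) can be sketched as follows. The names are hypothetical, calloc() stands in for the kernel allocator, and the sketch does not reproduce how pfr_set_addrs() lays out its feedback.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Size the buffer for whichever is larger: the input list ('size') or
 * the area the user reserved for feedback ('size2'). The element count
 * actually allocated is returned through 'total'.
 */
void *
ex_alloc_addrs(size_t size, size_t size2, size_t elem_size, size_t *total)
{
	*total = size > size2 ? size : size2;
	return (calloc(*total != 0 ? *total : 1, elem_size));
}
```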
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.
The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.
To fix this, stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.
PR: 154428, 193579, 198868
Reviewed by: sbruno
MFC after: 1 week
Relnotes: yes
Sponsored by: RootBSD
Differential Revision: https://reviews.freebsd.org/D3779
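Updating a checksum rather than recalculating it is usually done incrementally. The sketch below shows the general technique from RFC 1624 (equation 3) next to a full recomputation for comparison; it is not pf's actual routine, the function names are hypothetical, and it ignores pseudo-header and offload details entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t
ex_fold(uint32_t sum)
{
	while ((sum >> 16) != 0)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Full Internet checksum over 'n' 16-bit words. */
uint16_t
ex_cksum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return ((uint16_t)~ex_fold(sum));
}

/*
 * Incremental update when one word changes from 'old_word' to 'new_word':
 * HC' = ~(~HC + ~m + m'), per RFC 1624 eqn. 3.
 */
uint16_t
ex_cksum_update(uint16_t cksum, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return ((uint16_t)~ex_fold(sum));
}
```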
The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).
Obtained from: pfSense
MFC after: 3 days
Sponsored by: Rubicon Communications (Netgate)