This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.
Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2
OpenBSD commit message:
Use a 32 bit variable to detect integer overflow when searching for
an unused nat port. Prevents a possible endless loop if high port
is 65535 or low port is 0.
report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c
PR: 221201
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: OpenBSD via ElectroBSD
MFC after: 1 week
pf_purge_thread() breaks up the work of iterating all states (in
pf_purge_expired_states()) and tracks progress in the idx variable.
If multiple vnets exist this results in pf_purge_thread() only calling
pf_purge_expired_states() for part of the states (the first part of the
first vnet, second part of the second vnet and so on).
Combined with the mark-and-sweep approach to cleaning up old rules (in
V_pf_unlinked_rules) that resulted in pf freeing rules that were still
referenced by states. This in turn caused panics when pf_state_expires()
encounters that state and attempts to access the rule.
We need to track the progress per vnet, not globally, so idx is moved
into a per-vnet V_pf_purge_idx.
PR: 219251
Sponsored by: Hackathon Essen 2017
When running the vnet init code (pf_load_vnet()) we used to iterate over
all vnets, marking them as unhooked.
This is incorrect and leads to panics if pf is unloaded, as the unload
code does not unregister the pfil hooks (because the vnet is marked as
unhooked).
There's no need or reason to touch other vnets during initialisation.
Their pf_load_vnet() function will be triggered, which handles all
required initialisation.
Reviewed by: zec, gnn
Differential Revision: https://reviews.freebsd.org/D10592
vnet_pf_uninit() is called through vnet_deregister_sysuninit() and
linker_file_unload() when the pf module is unloaded. This is executed
after pf_unload() so we end up trying to take locks which have been
destroyed already.
Move pf_unload() to a separate SYSUNINIT() to ensure it's called after
all the vnet_pf_uninit() calls.
Differential Revision: https://reviews.freebsd.org/D10025
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.
For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).
We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.
Reported by: Antonios Atlasis <aatlasis at secfu net>
MFC after: 3 days
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.
Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep,
Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026
In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).
This lead pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.
PR: 217883
Submitted by: Kajetan Staszkiewicz
MFC after: 1 week
Sponsored by: InnoGames GmbH
Rules are unlinked in shutdown_pf(), so we must call
pf_unload_vnet_purge(), which frees unlinked rules, after that, not
before.
Reviewed by: eri, bz
Differential Revision: https://reviews.freebsd.org/D10040
When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep()
with it, because it would release a lock we do not hold. There's no need for the
lock either, so we can just tsleep().
While here also make the same change in pf_purge_thread(), because it explicitly
takes the lock before rw_sleep() and then immediately releases it afterwards.
If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's
no more memory for it) it frees skp. This is wrong, because skp is a
pf_state_key **, so we need to free *skp, as is done later in the function.
Getting it wrong means we try to free a stack variable of the calling
pf_test_rule() function, and we panic.
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.
This code had an INET6 conditional before this commit, but opt_inet6.h
was not included, so INET6 was never defined. Apparently, pf's OS
fingerprinting hasn't worked with IPv6 for quite some time.
This commit might fix it, but I didn't test that.
Reviewed by: gnn, kp
MFC after: 2 weeks
Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6)
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625
initialized, this can cause a divide by zero (if the VNET initialization
takes to long to complete).
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Instead of taking an extra reference to deal with pfsync_q_ins()
and pfsync_q_del() taken and dropping a reference (resp,) make
it optional of those functions to take or drop a reference by
passing an extra argument.
Submitted by: glebius@
subrulenr is considered unset if it's set to -1, not if it's set to 1.
See contrib/tcpdump/print-pflog.c pflog_print() for a user.
This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0):
rule 0..16777216(match)
instead of the correct output of
rule 0/0(match)
PR: 214832
Submitted by: andywhite@gmail.com
Use after free happens for state that is deleted. The reference
count is what prevents the state from being freed. When the
state is dequeued, the reference count is dropped and the memory
freed. We can't dereference the next pointer or re-queue the
state.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D8671
Ignore the ECN bits on 'tos' and 'set-tos' and allow to use
DCSP names instead of having to embed their TOS equivalents
as plain numbers.
Obtained from: OpenBSD
Sponsored by: OPNsense
Differential Revision: https://reviews.freebsd.org/D8165
The tag fastroute came from ipf and was removed in OpenBSD in 2011. The code
allows to skip the in pfil hooks and completely removes the out pfil invoke,
albeit looking up a route that the IP stack will likely find on its own.
The code between IPv4 and IPv6 is also inconsistent and marked as "XXX"
for years.
Submitted by: Franco Fichtner <franco@opnsense.org>
Differential Revision: https://reviews.freebsd.org/D8058
While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming
for the interface address list in sctp_bsd_addr.c
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D8051
Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.
PR: 211796
Obtained from: OpenBSD (sthen)
MFC after: 3 days
pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers
expect to get E<foo> error codes.
Map the returns values. A pass is 0 (everything is OK), anything else means
pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP
error message.
PR: 207598
teardown of VNETs once pf(4) has been shut down.
Properly split resources into VNET_SYS(UN)INITs and one time module
loading.
While here cover the INET parts in the uninit callpath with proper
#ifdefs.
Approved by: re (gjb)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
borrow pf's lock, and also make sure pflog goes after pf is gone
in order to avoid callouts in VNETs to an already freed instance.
Reported by: Ivan Klymenko, Johan Hendriks on current@ today
Obtained from: projects/vnet
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
Approved by: re (gjb)
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.
Reviewed by: kp
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6924
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.
Reviewed by: allanjude, araujo
Approved by: re (gjb)
Obtained from: OpenBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D6786
We were inconsistent about the use of time_second vs. time_uptime.
Always use time_uptime so the value can be meaningfully compared.
Submitted by: "Max" <maximos@als.nnov.ru>
MFC after: 4 days
When we guess the nature of the outbound packet (output vs. forwarding) we need
to take bridges into account. When bridging the input interface does not match
the output interface, but we're not forwarding. Similarly, it's possible for the
interface to actually be the bridge interface itself (and not a member interface).
PR: 202351
MFC after: 2 weeks
In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs,
which is processed in pfr_set_addrs(). At the users request we also provide
feedback on the deleted addresses, by storing them after the new list
('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()).
This means we write outside the bounds of the buffer we've just allocated.
We need to look at pfrio_size2 instead (i.e. the size the user reserved for our
feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than
pfrio_size though, in which case we'd still read outside of the allocated
buffer. Instead we allocate the largest of the two values.
Reported By: Paul J Murphy <paul@inetstat.net>
PR: 207463
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D5426
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.
So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.
Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).
New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.
MFC after: 1 Month with associated callout change.
r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.