This will make future extensions of the API much easier.
The intent is to remove support for DIOCADDRULE in FreeBSD 14.
Reviewed by: markj (previous version), glebius (previous version)
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29557
When we find a state for packets that was created by a reply-to rule we
still need to process the packet. The state may require us to modify the
packet (e.g. in rdr or nat cases), which we won't do with the shortcut.
MFC after: 2 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
These two have proven to be useful during debugging. We may as well keep
them permanently.
Others will be added as their utility becomes clear.
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29555
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsibly for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.
the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.
another reason for this commit is is to make handling of the changes
that route-to makes consistent with other changes that are made to
packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.
the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@
Obtained from: OpenBSD
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29554
Just like with the packet counters move the timekeeping information into
dn_cfg. This reduces the global name space use for dummynet and will
make subsequent work to add vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Different Revision: https://reviews.freebsd.org/D29246
Move the packets counters into the dn_cfg struct. This reduces the
global name space use for dummynet and will make future work for things
like vnet support and re-use in pf easier.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29245
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
When we request a bulk sync we need to ensure we actually send out that
request, not just buffer it until we have enough data to send a full
packet.
PR: 254236
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29271
stuct pf_pool and struct pf_kpool are different. We should not simply
bcopy() them.
Happily it turns out that their differences were all pointers, and the
userspace provided pointers were overwritten by the kernel, so this did
actually work correctly, but we should fix it anyway.
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29216
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and remove a vnet jail in `procstat -t
12`.
We can remove it manually with intr_event_destroy().
PR: 254171
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29211
We can now counter_u64_free(NULL), so remove the checks.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29190
pf_kkif_free() already checks for NULL, so we don't have to check before
we call it.
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29195
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.
Use them where appropriate.
Reviewed by: ae (previous version), rscheff, tuexen, rgrimes
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29056
Teach pf to read the DSCP value from the IPv6 header so that we can
match on them.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29048
A sequence of overlapping IPv4 fragments could crash the kernel in
pf due to an assertion.
Reported by: Alexander Bluhm
Obtained from: OpenBSD
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
These should only fail if we use them incorrectly, so assert that they
succeed.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (“Netgate”’)
These functions always return 0, which is good, because the code calling
them doesn't handle this error gracefully.
As the functions always succeed remove their return value, and the code
handling their errors (because it was never executed anyway).
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (“Netgate”’)
to be a true RFC 6598 NAT444 setup, where each network segment (e.g. user,
subnet) can have their own dedicated port aliasing ranges.
Reviewed by: donner, kp
Approved by: 0mp (mentor), donner, kp
Differential Revision: https://reviews.freebsd.org/D23450
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.
Reported by: syzbot+81e380344d4a6c37d78a@syzkaller.appspotmail.com
MFC after: 1 week
This was overlooked in the pfi_kkif/pfi_kif splitup and as a result
userspace could no longer tell which interfaces had the skip flag
applied.
MFC after: 2 weeks
Now that we've split up the datastructures used by the kernel and
userspace there's essentually no more overlap between the pf_ruleset.c
code used by userspace and kernelspace.
Copy the userspace bits to the pfctl directory and stop using the kernel
file.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27764
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.
The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:
x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
| + |
| xx + |
|xxx +++|
||A| |
| |A||
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9216962 9526356 9343902 9371057.6 116720.36
+ 5 19427190 19698400 19502922 19546509 109084.92
Difference at 95.0% confidence
1.01755e+07 +/- 164756
108.584% +/- 2.9359%
(Student's t, pooled s = 112967)
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763
Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.
No functional change.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27762
This improves the cache behaviour of pf and results in improved
throughput.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27760
As part of the split between user and kernel mode structures we're
moving all user space usable definitions into pf.h.
No functional change intended.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27757
Introduce a kernel version of struct pf_src_node (pf_ksrc_node).
This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27707
This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.
pf_state is only used in the kernel, so can be safely modified.
Reviewed by: Lutz Donnerhacke, philip
MFC after: 1 week
Sponsed by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27661
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).
Import the OpenBSD fix for this issue.
PR: 240416
Obtained from: OpenBSD
MFC after: 1 week
Reviewed by: tuexen (previous version)
Differential Revision: https://reviews.freebsd.org/D27696
Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids usings getting caught out by the change from tunable
to run time configuration.
Suggested by: Franco Fichtner
MFC after: 3 days
The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.
Reviewed by: imp, emaste, bcr
Sponsored by: DARPA
When updating a table, pf will keep existing table entry structures
corresponding to addresses that are in both of the old and new tables.
However, the update may also enable or disable per-entry counters which
are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the
entries may be missing counters or may have unused counters allocated.
Fix the problem by modifying pfr_ina_commit() to transfer counters
from or to entries in the shadow table.
PR: 251414
Reported by: sigsys@gmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27440
tagname2tag() hashes the tag name before truncating it to 63 characters.
tag_unref() removes the tag from the name hash by computing the hash
over the truncated name. Ensure that both operations compute the same
hash for a given tag.
The larger issue is a lack of string validation in pf(4) ioctl handlers.
This is intended to be fixed with some future work, but an extra safety
belt in tagname2hashindex() is worthwhile regardless.
Reported by: syzbot+a0988828aafb00de7d68@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27346
We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set
PFRULE_SRCTRACK, so update the assertion to match.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27254