We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.
Reported by: syzbot+81e380344d4a6c37d78a@syzkaller.appspotmail.com
MFC after: 1 week
This was overlooked in the pfi_kkif/pfi_kif splitup and as a result
userspace could no longer tell which interfaces had the skip flag
applied.
MFC after: 2 weeks
Now that we've split up the datastructures used by the kernel and
userspace there's essentually no more overlap between the pf_ruleset.c
code used by userspace and kernelspace.
Copy the userspace bits to the pfctl directory and stop using the kernel
file.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27764
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.
The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:
x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
| + |
| xx + |
|xxx +++|
||A| |
| |A||
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9216962 9526356 9343902 9371057.6 116720.36
+ 5 19427190 19698400 19502922 19546509 109084.92
Difference at 95.0% confidence
1.01755e+07 +/- 164756
108.584% +/- 2.9359%
(Student's t, pooled s = 112967)
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763
Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.
No functional change.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27762
This improves the cache behaviour of pf and results in improved
throughput.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27760
As part of the split between user and kernel mode structures we're
moving all user space usable definitions into pf.h.
No functional change intended.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27757
Introduce a kernel version of struct pf_src_node (pf_ksrc_node).
This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.
Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27707
This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.
pf_state is only used in the kernel, so can be safely modified.
Reviewed by: Lutz Donnerhacke, philip
MFC after: 1 week
Sponsed by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27661
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).
Import the OpenBSD fix for this issue.
PR: 240416
Obtained from: OpenBSD
MFC after: 1 week
Reviewed by: tuexen (previous version)
Differential Revision: https://reviews.freebsd.org/D27696
Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids usings getting caught out by the change from tunable
to run time configuration.
Suggested by: Franco Fichtner
MFC after: 3 days
The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.
Reviewed by: imp, emaste, bcr
Sponsored by: DARPA
When updating a table, pf will keep existing table entry structures
corresponding to addresses that are in both of the old and new tables.
However, the update may also enable or disable per-entry counters which
are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the
entries may be missing counters or may have unused counters allocated.
Fix the problem by modifying pfr_ina_commit() to transfer counters
from or to entries in the shadow table.
PR: 251414
Reported by: sigsys@gmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27440
tagname2tag() hashes the tag name before truncating it to 63 characters.
tag_unref() removes the tag from the name hash by computing the hash
over the truncated name. Ensure that both operations compute the same
hash for a given tag.
The larger issue is a lack of string validation in pf(4) ioctl handlers.
This is intended to be fixed with some future work, but an extra safety
belt in tagname2hashindex() is worthwhile regardless.
Reported by: syzbot+a0988828aafb00de7d68@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27346
We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set
PFRULE_SRCTRACK, so update the assertion to match.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27254
It helps to reduce complexity with debugging of large ipfw rulesets.
Also define several constants and translators, that can by used by
dtrace scripts with this probe.
Reviewed by: gnn
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26879
NAT64LSN requires the presence of upper level protocol header
in a IPv4 datagram to find corresponding state to make translation.
Now it will be handled automatically by nat64lsn instance.
Reviewed by: melifaro
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26758
Even if a kif doesn't have an ifp or if_group pointer we still can't delete it
if it's referenced by a rule. In other words: we must check rulerefs as well.
While we're here also teach pfi_kif_unref() not to remove kifs with flags.
Reported-by: syzbot+b31d1d7e12c5d4d42f28@syzkaller.appspotmail.com
MFC after: 2 weeks
If userspace tries to set flags (e.g. 'set skip on <ifspec>') and <ifspec>
doesn't exist we should create a kif so that we apply the flags when the
<ifspec> does turn up.
Otherwise we'd end up in surprising situations where the rules say the
interface should be skipped, but it's not until the rules get re-applied.
Reviewed by: Lutz Donnerhacke <lutz_donnerhacke.de>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26742
"df", "rf" and "offset". This allows to match on specific
bits of ip_off field.
For compatibility reasons lack of keyword means "offset".
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D26021
Upper level protocols defer checksums calculation in hope we have
checksums offloading in a network card. CSUM_DELAY_DATA flag is used
to determine that checksum calculation was deferred. And IP output
routine checks for this flag before pass mbuf to lower layer. Forwarded
packets have not this flag.
NAT64 uses checksums adjustment when it translates IP headers.
In most cases NAT64 is used for forwarded packets, but in case when it
handles locally originated packets we need to finish checksum calculation
that was deferred to correctly adjust it.
Add check for presence of CSUM_DELAY_DATA flag and finish checksum
calculation before adjustment.
Reported and tested by: Evgeniy Khramtsov <evgeniy at khramtsov org>
MFC after: 1 week
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.
PR: 238658
Submitted by: huangfq.daxian@gmail.com
Reviewed by: markj, bz, gnn, kp, melifaro
Approved by: jtl (co-mentor), bz (co-mentor)
Event: July 2020 Bugathon
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D25619
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
This is in preparation for enabling a loadable SCTP stack. Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.
Discussed with: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Right now we optionally allocate 8 counters per table entry, so in
addition to memory consumed by counters, we require 8 pointers worth of
space in each entry even when counters are not allocated (the default).
Instead, define a UMA zone that returns contiguous per-CPU counter
arrays for use in table entries. On amd64 this reduces sizeof(struct
pfr_kentry) from 216 to 160. The smaller size also results in better
slab efficiency, so memory usage for large tables is reduced by about
28%.
Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24843
pf by default does not do per-table address accounting unless the
"counters" keyword is specified in the corresponding pf.conf table
definition. Yet, we always allocate 12 per-CPU counters per table. For
large tables this carries a lot of overhead, so only allocate counters
when they will actually be used.
A further enhancement might be to use a dedicated UMA zone to allocate
counter arrays for table entries, since close to half of the structure
size comes from counter pointers. A related issue is the cost of
zeroing counters, since counter_u64_zero() calls smp_rendezvous() on
some architectures.
Reported by: loos, Jim Pingle <jimp@netgate.com>
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D24803
Nexthop objects implementation, defined in r359823,
introduced sys/net/route directory intended to hold all
routing-related code. Move recently-introduced route_temporal.c and
private route_var.h header there.
Differential Revision: https://reviews.freebsd.org/D24597
The pf_frag_mtx mutex protects the fragments queue. The fragments queue
is virtualised already (i.e. per-vnet) so it makes no sense to block
jail A from accessing its fragments queue while jail B is accessing its
own fragments queue.
Virtualise the lock for improved concurrency.
Differential Revision: https://reviews.freebsd.org/D24504
If we pass an anchor name which doesn't exist pfr_table_count() returns
-1, which leads to an overflow in mallocarray() and thus a panic.
Explicitly check that pfr_table_count() does not return an error.
Reported-by: syzbot+bd09d55d897d63d5f4f4@syzkaller.appspotmail.com
Reviewed by: melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24539
Both DIOCCHANGEADDR and DIOCADDADDR take a struct pf_pooladdr from
userspace. They failed to validate the dyn pointer contained in its
struct pf_addr_wrap member structure.
This triggered assertion failures under fuzz testing in
pfi_dynaddr_setup(). Happily the dyn variable was overruled there, but
we should verify that it's set to NULL anyway.
Reported-by: syzbot+93e93150bc29f9b4b85f@syzkaller.appspotmail.com
Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24431
Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D24386
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
Mark all nodes in pf, pfsync and carp as MPSAFE.
Reviewed by: kp
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23634