Instead serialize against these operations with a dedicated lock.
Prior to the change, When pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln. With
the change there is no slowdown.
Reviewed by: kp (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turns kills single- and multi-threaded performance.
Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Then requests to report the current value are the sum of per-CPU
counters subtracted by the saved value.
Sample timings when loading a config with 100k tables on a 104-way box:
stock:
pfctl -f tables100000.conf 0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf 0.40s user 68.14s system 99% cpu 1:08.54 total
patched:
pfctl -f tables100000.conf 0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf 0.48s user 6.47s system 99% cpu 6.949 total
Reviewed by: kp (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
This call is particularly slow due to the large amount of data it
returns. Remove all fields pfctl does not use. There is no functional
impact to pfctl, but it somewhat speeds up the call.
It might affect other (i.e. non-FreeBSD) code that uses the new
interface, but this call is very new, so there's unlikely to be any. No
releases contained the previous version, so we choose to live with the
ABI modification.
Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30944
The sysctl nodes which use V_dn_cfg must be marked as CTLFLAG_VNET so
that we use the correct per-vnet offset
PR: 256819
Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30974
stats are not shared and consequently per-CPU counters only waste
memory.
No slowdown was measured when passing over 20M pps.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
ipfw_chk() might call m_pullup() and thus can change the mbuf chain
head. In this case, the new chain head has to be returned to the pfil
hook caller, otherwise the pfil hook caller is left with a dangling
pointer.
Note that this affects only the link-layer hooks installed when the
net.link.ether.ipfw sysctl is set to 1.
PR: 256439, 254015, 255069, 255104
Fixes: f355cb3e6
Reviewed by: ae
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30764
Rather than pointers to the headers store full copies. This brings us
slightly closer to what OpenBSD does, and also makes more sense than
storing pointers to stack variable copies of the headers.
Reviewed by: donner, scottl
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30719
copyout() can trigger page faults, so it may potentially sleep.
Reported by: avg
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
In the ioctl path use M_WAITOK allocations whereever possible. These are
less sensitive to memory pressure, and ioctl requests have no hard
deadlines.
Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30702
There's no need to check pointers for NULL before free()ing them.
No functional change.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30382
These are global (i.e. shared across vnets) structures, so we need
global lock to protect them. However, we look up entries in these lists
(find_aqm_type(), find_sched_type()) and return them. We must ensure
that the returned structures cannot go away while we are using them.
Resolve this by using NET_EPOCH(). The structures can be safely accessed
under it, and we postpone their cleanup until we're sure they're no
longer used.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30381
This moves dn_cfg and other parameters into per VNET variables.
The taskqueue and control state remains global.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29274
There is padding between pfr_astats.pfras_a and pfras_packets that was
not getting initialized.
Reported by: KMSAN
Reviewed by: kp, imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30585
We must also remember to free nvlists added to a parent nvlist with
nvlist_append_nvlist_array().
More importantly, when nvlist_pack() allocates memory for us it does so
in the M_NVLIST zone, so we must free it with free(.., M_NVLIST). Using
free(.., M_TEMP) as we did silently failed to free the memory.
MFC after: 3 days
Reported by: kib@
Tested by: kib@
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30595
This simplifies life a bit, by not requiring us to repease the
declaration for every file where we want static probe points.
It also makes the gcc6 build happy.
Add _opt() variants for the uint* functions. These functions set the
provided default value if the nvlist doesn't contain the relevant value.
This is helpful for optional values (e.g. when the API is extended to
add new fields).
While here simplify the header by also using macros to create the
prototypes for the macro-generated function implementations.
Reviewed by: scottl
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30510
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30359
When we create an nvlist and insert it into another nvlist we must
remember to destroy it. The nvlist_add_nvlist() function makes a copy,
just like nvlist_add_string() makes a copy of the string. If we don't
we're leaking memory on every (nvlist-based) ioctl() call.
While here remove two redundant 'break' statements.
PR: 255971
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Floating states get assigned to interface 'all' (V_pfi_all), so when we
try to flush all states for an interface states originally created
through this interface are not flushed. Only if-bound states can be
flushed in this way.
Given that we track the original interface we can check if the state's
interface is 'all', and if so compare to the orig_if instead.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30246
Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30245
We never set 'busy' and never dequeue from the pending mq. Remove this
code.
Reviewed by: ae
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30313
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.
PR: 255852
Reviewed by: donner
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30284
This allows us to kill states created from a rule with route-to/reply-to
set. This is particularly useful in multi-wan setups, where one of the
WAN links goes down.
Submitted by: Steven Brown
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30058
Introduce an nvlist based alternative to DIOCKILLSTATES.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30054
If we reassemble a packet we modify the IP header (to set the length and
remove the fragment offset information), but we failed to update the
checksum. On certain setups (mostly where we did not re-fragment again
afterwards) this could lead to us sending out packets with incorrect
checksums.
PR: 255432
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30026
When parsing the nvlist for a struct pf_addr_wrap we unconditionally
tried to parse "ifname". This broke for PF_ADDR_TABLE when the table
name was longer than IFNAMSIZ. PF_TABLE_NAME_SIZE is longer than
IFNAMSIZ, so this is a valid configuration.
Only parse (or return) ifname or tblname for the corresponding
pf_addr_wrap type.
This manifested as a failure to set rules such as these, where the pfctl
optimiser generated an automatic table:
pass in proto tcp to 192.168.0.1 port ssh
pass in proto tcp to 192.168.0.2 port ssh
pass in proto tcp to 192.168.0.3 port ssh
pass in proto tcp to 192.168.0.4 port ssh
pass in proto tcp to 192.168.0.5 port ssh
pass in proto tcp to 192.168.0.6 port ssh
pass in proto tcp to 192.168.0.7 port ssh
Reported by: Florian Smeets
Tested by: Florian Smeets
Reviewed by: donner
X-MFC-With: 5c11c5a3655842a176124ef2334fcdf830422c8a
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29962
Add 'syncok' field to ifconfig's pfsync interface output. This allows
userspace to figure out when pfsync has completed the initial bulk
import.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29948
Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the customer 'schedule' keyword used by pfSense to terminate states
according to a schedule.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29936
Also add an M_ASSERTMAPPED() macro to verify that all mbufs in the chain
are mapped. Use it in ipfw_nat, which operates on a chain returned by
m_megapullup().
PR: 255164
Reviewed by: ae, gallatin
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29838
Extract the state killing code from pfioctl() and rephrase the filtering
conditions for readability.
No functional change intended.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29795
Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29780
PFRULE_REFS should never be used by userspace, so hide it behind #ifdef
_KERNEL.
MFC after: never
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29779
Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29778
Use M_NOWAIT flag when hash growing is called from callout.
PR: 255041
Reviewed by: kevans
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D29772
MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.
PR: 254577
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29468