739 Commits

Author SHA1 Message Date
Kristof Provost
ea36212bf5 pf: Don't hold PF_RULES_WLOCK during copyin() on DIOCRCLRTSTATS
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.

Reported by:	syzbot+81e380344d4a6c37d78a@syzkaller.appspotmail.com
MFC after:	1 week
2021-01-13 19:49:42 +01:00
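The constraint above can be illustrated with a small userspace sketch: instead of counting the table while holding a lock, the caller-supplied size is checked against a fixed maximum before any copy is attempted. `check_request_size()` and `ioctl_maxcount` are hypothetical names; 65535 is an assumed default for pf_ioctl_maxcount (assumed to be what the net.pf.request_maxcount tunable, mentioned further down this log, controls), and EINVAL is an assumed error choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for pf_ioctl_maxcount; 65535 is an assumed
 * default, not taken from the commit.
 */
static unsigned int ioctl_maxcount = 65535;

/*
 * Validate a caller-supplied element count up front, so no lock has to
 * be held to count the table before the copy. On success returns 0 and
 * stores the byte count; otherwise returns EINVAL for a negative,
 * too-large or overflowing request.
 */
static int
check_request_size(int count, size_t elem_size, size_t *out_bytes)
{
	if (count < 0 || (unsigned int)count > ioctl_maxcount)
		return (EINVAL);
	if (elem_size != 0 && (size_t)count > SIZE_MAX / elem_size)
		return (EINVAL);
	*out_bytes = (size_t)count * elem_size;
	return (0);
}
```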
Kristof Provost
86b653ed7e pf: quiet debugging printfs
Only log these when debugging output is enabled.
2021-01-11 22:30:44 +01:00
Kristof Provost
0fcb03fbac pf: Copy kif flags to userspace
This was overlooked in the pfi_kkif/pfi_kif split and as a result
userspace could no longer tell which interfaces had the skip flag
applied.

MFC after:	2 weeks
2021-01-07 22:26:05 +01:00
Kristof Provost
fda7daf063 pfctl: Stop sharing pf_ruleset.c with the kernel
Now that we've split up the data structures used by the kernel and
userspace there's essentially no more overlap between the pf_ruleset.c
code used by userspace and kernelspace.

Copy the userspace bits to the pfctl directory and stop using the kernel
file.

Reviewed by:	philip
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27764
2021-01-05 23:35:37 +01:00
Kristof Provost
5a3b9507d7 pf: Convert pfi_kkif to use counter_u64
Improve caching behaviour by using counter_u64 rather than variables
shared between cores.

The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16 cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) NICs we see:

x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
+--------------------------------------------------------------------------+
|                                                                        + |
| xx                                                                     + |
|xxx                                                                    +++|
||A|                                                                       |
|                                                                       |A||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       9216962       9526356       9343902     9371057.6     116720.36
+   5      19427190      19698400      19502922      19546509     109084.92
Difference at 95.0% confidence
        1.01755e+07 +/- 164756
        108.584% +/- 2.9359%
        (Student's t, pooled s = 112967)

Reviewed by:	philip
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27763
2021-01-05 23:35:37 +01:00
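The counter(9) idea can be sketched in userspace as one counter slot per CPU, each on its own cache line, summed only when the value is read. This illustrates the technique and is not the counter(9) implementation (the kernel API is counter_u64_alloc(), counter_u64_add() and counter_u64_fetch()); `struct pcpu_counter`, the slot count and the 64-byte cache line are assumptions, and the caller is assumed to pass its own CPU index so that no two writers share a slot.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS    4   /* stand-in for the number of CPUs */
#define CACHELINE 64  /* assumed cache-line size */

/* Each slot is aligned to (and padded out to) its own cache line. */
struct pcpu_slot {
	_Alignas(CACHELINE) uint64_t value;
};

struct pcpu_counter {
	struct pcpu_slot slot[NSLOTS];
};

/* Writer side: only the current CPU's cache line is touched. */
static void
pcpu_counter_add(struct pcpu_counter *c, int cpu, uint64_t n)
{
	c->slot[cpu].value += n;
}

/* Reader side: sum all slots; reads are assumed rarer than writes. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += c->slot[i].value;
	return (sum);
}
```

The trade-off is a more expensive read (a loop over all slots) in exchange for increments that never bounce a shared cache line between cores.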
Kristof Provost
26c841e2a4 pf: Allocate and free pfi_kkif in separate functions
Factor out allocating and freeing pfi_kkif structures. This will be
useful when we change the counters to be counter_u64, so we don't have
to deal with that complexity in the multiple locations where we allocate
pfi_kkif structures.

No functional change.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27762
2021-01-05 23:35:37 +01:00
Kristof Provost
320c11165b pf: Split pfi_kif into a user and kernel space structure
No functional change.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27761
2021-01-05 23:35:37 +01:00
Kristof Provost
c3adacdad4 pf: Change pf_krule counters to use counter_u64
This improves the cache behaviour of pf and results in improved
throughput.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27760
2021-01-05 23:35:37 +01:00
Kristof Provost
e86bddea9f pf: Split pf_rule into kernel and user space versions
No functional change intended.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27758
2021-01-05 23:35:36 +01:00
Kristof Provost
dc865dae89 pf: Migrate pf_rule and related structs to pf.h
As part of the split between user and kernel mode structures we're
moving all user space usable definitions into pf.h.

No functional change intended.

MFC after:      2 weeks
Sponsored by:   Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27757
2021-01-05 23:35:36 +01:00
Kristof Provost
fbbf270eef pf: Use counter_u64 in pf_src_node
Reviewed by:	philip
MFC after:      2 weeks
Sponsored by:   Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27756
2021-01-05 23:35:36 +01:00
Kristof Provost
17ad7334ca pf: Split pf_src_node into a kernel and userspace struct
Introduce a kernel version of struct pf_src_node (pf_ksrc_node).

This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.

Reviewed by:	philip
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27707
2021-01-05 23:35:36 +01:00
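A sketch of why such a split helps: the kernel structure is free to switch to per-CPU counters while the userspace structure keeps plain integers, with an export step in between. All names here are hypothetical and do not mirror the real pf_src_node/pf_ksrc_node fields.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH 4  /* illustrative CPU count */

/* Userspace-visible layout: plain integers, stable ABI. */
struct src_node_user {
	uint64_t packets;
	uint64_t bytes;
};

/* Kernel-only layout: can change freely, e.g. to per-CPU counters. */
struct src_node_kern {
	uint64_t packets[NCPU_SKETCH];
	uint64_t bytes[NCPU_SKETCH];
};

/* Convert the kernel representation into the userspace one. */
static void
src_node_export(const struct src_node_kern *k, struct src_node_user *u)
{
	u->packets = 0;
	u->bytes = 0;
	for (int i = 0; i < NCPU_SKETCH; i++) {
		u->packets += k->packets[i];
		u->bytes += k->bytes[i];
	}
}
```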
Kristof Provost
1c00efe98e pf: Use counter(9) for pf_state byte/packet tracking
This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.

pf_state is only used in the kernel, so it can be safely modified.

Reviewed by:	Lutz Donnerhacke, philip
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D27661
2020-12-23 12:03:21 +01:00
Kristof Provost
c3f69af03a pf: Fix unaligned checksum updates
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).

Import the OpenBSD fix for this issue.

PR:		240416
Obtained from:	OpenBSD
MFC after:	1 week
Reviewed by:	tuexen (previous version)
Differential Revision:	https://reviews.freebsd.org/D27696
2020-12-23 12:03:20 +01:00
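One way to see the alignment requirement: the Internet checksum sums big-endian 16-bit words, so a changed byte must be folded into the word that contains it, as the high byte at an even offset or the low byte at an odd one. The sketch below uses the RFC 1624 incremental update with hypothetical helper names; it is not the OpenBSD code that was imported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement sum into 16 bits. */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Full Internet checksum (RFC 1071) over big-endian 16-bit words. */
static uint16_t
cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;  /* pad the odd byte */
	return ((uint16_t)~fold(sum));
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_update16(uint16_t hc, uint16_t ow, uint16_t nw)
{
	return ((uint16_t)~fold((uint32_t)(uint16_t)~hc + (uint16_t)~ow + nw));
}

/*
 * Change one byte at any offset and update the checksum. The byte is
 * treated as part of the 16-bit word starting at the even offset
 * (off & ~1); at an odd offset it is that word's low byte.
 */
static uint16_t
set_byte(uint8_t *buf, size_t len, size_t off, uint8_t val, uint16_t hc)
{
	size_t w = off & ~(size_t)1;
	uint16_t ow = (uint16_t)(buf[w] << 8 | (w + 1 < len ? buf[w + 1] : 0));

	buf[off] = val;
	uint16_t nw = (uint16_t)(buf[w] << 8 | (w + 1 < len ? buf[w + 1] : 0));
	return (cksum_update16(hc, ow, nw));
}
```

Treating an odd-offset byte as if it were the high byte of its own word would add it with the wrong weight, which is the kind of error an alignment-unaware update produces.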
Alexander V. Chernikov
3ad80c6531 Fix LINT-NOINET6 build after r368571.
Reported by:	mjg
2020-12-14 22:54:32 +00:00
Kristof Provost
3420068a73 pf: Allow net.pf.request_maxcount to be set from loader.conf
Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids users getting caught out by the change from tunable
to runtime configuration.

Suggested by:	Franco Fichtner
MFC after:	3 days
2020-12-12 20:14:39 +00:00
Alexander V. Chernikov
2616eaa3d9 Fix NOINET6 build broken by r368571.
2020-12-12 01:05:31 +00:00
Alexander V. Chernikov
4451d8939c ipfw kfib algo: Use rt accessors instead of accessing rib/rtentry directly.
This removes assumptions about prefix storage and rtentry layout
from external code.

Differential Revision:	https://reviews.freebsd.org/D27450
2020-12-11 23:57:30 +00:00
Brooks Davis
9ee99cec1f hme(4): Remove as previously announced
The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.

Reviewed by:	imp, emaste, bcr
Sponsored by:	DARPA
2020-12-11 21:40:38 +00:00
Mark Johnston
e6aed06fdf pf: Fix table entry counter toggling
When updating a table, pf will keep existing table entry structures
corresponding to addresses that are in both of the old and new tables.
However, the update may also enable or disable per-entry counters which
are allocated separately.  Thus when toggling PFR_TFLAG_COUNTERS, the
entries may be missing counters or may have unused counters allocated.

Fix the problem by modifying pfr_ina_commit() to transfer counters
from or to entries in the shadow table.

PR:		251414
Reported by:	sigsys@gmail.com
Reviewed by:	kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27440
2020-12-02 16:01:43 +00:00
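A sketch of the counter transfer, assuming the shadow entry's counters already reflect the new table's setting: when the kept entry and its shadow disagree about having counters, swapping the pointers gives the kept entry the right state, and any leftover counters are released along with the shadow table. `struct kentry_sketch` and `sync_counters()` are hypothetical, not the pfr_ina_commit() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct kentry_sketch {
	uint64_t *counters;  /* NULL when per-entry counters are disabled */
};

/*
 * kept:   entry preserved from the old table
 * shadow: matching entry built for the new table
 * After the call, kept has counters exactly when the new table wants
 * them; shadow holds whatever must be freed with the shadow table.
 */
static void
sync_counters(struct kentry_sketch *kept, struct kentry_sketch *shadow)
{
	if ((kept->counters == NULL) != (shadow->counters == NULL)) {
		uint64_t *tmp = kept->counters;

		kept->counters = shadow->counters;
		shadow->counters = tmp;
	}
}
```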
Mark Johnston
5d49283f88 pf: Make tag hashing more robust
tagname2tag() hashes the tag name before truncating it to 63 characters.
tag_unref() removes the tag from the name hash by computing the hash
over the truncated name.  Ensure that both operations compute the same
hash for a given tag.

The larger issue is a lack of string validation in pf(4) ioctl handlers.
This is intended to be fixed with some future work, but an extra safety
belt in tagname2hashindex() is worthwhile regardless.

Reported by:	syzbot+a0988828aafb00de7d68@syzkaller.appspotmail.com
Reviewed by:	kp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27346
2020-11-24 16:18:47 +00:00
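The fix amounts to hashing exactly the bytes that will be stored. A sketch, assuming a 64-byte name buffer (63 characters plus NUL) and using FNV-1a as a stand-in for pf's own hash function: the loop stops after 63 characters, so a long name and its truncated copy hash identically. `tag_hash()` and `TAG_NAME_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_NAME_SIZE 64  /* 63 characters plus NUL */

/* FNV-1a over at most TAG_NAME_SIZE - 1 characters of name. */
static uint32_t
tag_hash(const char *name)
{
	uint32_t h = 2166136261u;  /* FNV-1a offset basis */

	for (size_t i = 0; i < TAG_NAME_SIZE - 1 && name[i] != '\0'; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;  /* FNV-1a prime */
	}
	return (h);
}
```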
Kristof Provost
71c9acef8c pf: Fix incorrect assertion
We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set
PFRULE_SRCTRACK, so update the assertion to match.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27254
2020-11-20 10:08:33 +00:00
Andrey V. Elsukov
7ec2f6bce5 Add dtrace SDT probe ipfw:::rule-matched.
It helps reduce the complexity of debugging large ipfw rulesets.
Also define several constants and translators that can be used by
dtrace scripts with this probe.

Reviewed by:	gnn
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D26879
2020-10-21 15:01:33 +00:00
Andrey V. Elsukov
f909db0b19 Add IPv4 fragments reassembling to NAT64LSN.
NAT64LSN requires the upper-level protocol header to be present in an
IPv4 datagram in order to find the corresponding state for translation.
Since only the first fragment carries that header, fragments are now
reassembled automatically by the nat64lsn instance.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D26758
2020-10-13 18:57:42 +00:00
Kristof Provost
52b83a0618 pf: do not remove kifs that are referenced by rules
Even if a kif doesn't have an ifp or if_group pointer we still can't delete it
if it's referenced by a rule. In other words: we must check rulerefs as well.

While we're here also teach pfi_kif_unref() not to remove kifs with flags.

Reported-by: syzbot+b31d1d7e12c5d4d42f28@syzkaller.appspotmail.com
MFC after:   2 weeks
2020-10-13 11:04:00 +00:00
Kristof Provost
c9449e4fb8 pf: create a kif for flags
If userspace tries to set flags (e.g. 'set skip on <ifspec>') and <ifspec>
doesn't exist we should create a kif so that we apply the flags when the
<ifspec> does turn up.

Otherwise we'd end up in surprising situations where the rules say the
interface should be skipped, but it's not until the rules get re-applied.

Reviewed by:	Lutz Donnerhacke <lutz_donnerhacke.de>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26742
2020-10-12 12:39:37 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files
2020-09-01 21:19:14 +00:00
Ed Maste
5e79303ba1 ipfw: style(9) fixes
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Reviewed by:	emaste, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D26126
2020-08-20 16:56:13 +00:00
Gleb Smirnoff
825398f946 ipfw: make the "frag" keyword accept additional options "mf",
"df", "rf" and "offset".  This allows matching on specific
bits of the ip_off field.

For compatibility, the absence of an option means "offset".

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D26021
2020-08-11 15:46:22 +00:00
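A sketch of how these options map onto ip_off, assuming a rule matches only when every requested bit is present. The flag values are the standard ones from <netinet/ip.h>; the FRAG_* selector bits and frag_match() are hypothetical, and ip_off is taken in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* ip_off bits, same values as in <netinet/ip.h>. */
#define IP_RF      0x8000  /* reserved fragment flag */
#define IP_DF      0x4000  /* don't fragment */
#define IP_MF      0x2000  /* more fragments */
#define IP_OFFMASK 0x1fff  /* fragment offset */

/* Hypothetical selector bits for the "frag" options. */
enum { FRAG_MF = 1, FRAG_DF = 2, FRAG_RF = 4, FRAG_OFFSET = 8 };

/* Return 1 if every requested option is present in ip_off, else 0. */
static int
frag_match(uint16_t ip_off, unsigned int opts)
{
	if (opts == 0)  /* no option given: behave like "offset" */
		opts = FRAG_OFFSET;
	if ((opts & FRAG_MF) && !(ip_off & IP_MF))
		return (0);
	if ((opts & FRAG_DF) && !(ip_off & IP_DF))
		return (0);
	if ((opts & FRAG_RF) && !(ip_off & IP_RF))
		return (0);
	if ((opts & FRAG_OFFSET) && !(ip_off & IP_OFFMASK))
		return (0);
	return (1);
}
```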
Andrey V. Elsukov
aaef76e1fd Handle delayed checksums if needed in NAT64.
Upper-level protocols defer checksum calculation in the hope that the
network card supports checksum offloading. The CSUM_DELAY_DATA flag
indicates that the checksum calculation was deferred, and the IP output
routine checks for this flag before passing the mbuf to the lower layer.
Forwarded packets do not have this flag set.

NAT64 adjusts checksums when it translates IP headers. In most cases
NAT64 handles forwarded packets, but when it handles locally originated
packets we need to finish the deferred checksum calculation before we
can correctly adjust it.

Add a check for the CSUM_DELAY_DATA flag and finish the checksum
calculation before the adjustment.

Reported and tested by:	Evgeniy Khramtsov <evgeniy at khramtsov org>
MFC after:	1 week
2020-08-05 09:16:35 +00:00
Tom Jones
b2776a1809 Don't print VNET pointer when initializing dummynet
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.

PR:		238658
Submitted by:	huangfq.daxian@gmail.com
Reviewed by:	markj, bz, gnn, kp, melifaro
Approved by:	jtl (co-mentor), bz (co-mentor)
Event:		July 2020 Bugathon
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D25619
2020-07-13 13:35:36 +00:00
Alexander V. Chernikov
6ad7446c6f Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().
fib[46]_lookup_nh_ represents the pre-epoch generation of the fib API, which provides
fewer guarantees about pointer validity and requires copying data on the stack.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25445
2020-07-02 21:04:08 +00:00
Mark Johnston
1388cfe1b5 ipfw(4): make O_IPVER/ipversion match IPv4 or 6, not just IPv4.
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Reviewed by:	Lutz Donnerhacke
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25227
2020-06-24 15:46:33 +00:00
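For context, IPv4 and IPv6 both carry the version in the high nibble of the first header byte, so a single check can serve both families. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* The version field is the high nibble of byte 0 in IPv4 and IPv6. */
static int
ip_version(const uint8_t *hdr)
{
	return (hdr[0] >> 4);
}

/* Match when the packet's IP version equals the requested one. */
static int
ipver_match(const uint8_t *hdr, int wanted)
{
	return (ip_version(hdr) == wanted);
}
```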
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Eugene Grosbein
47cb0632e8 ipfw: unbreak matching with big table type flow.
Test case:

# n=32769
# ipfw -q table 1 create type flow:proto,dst-ip,dst-port
# jot -w 'table 1 add tcp,127.0.0.1,' $n 1 | ipfw -q /dev/stdin
# ipfw -q add 5 unreach filter-prohib flow 'table(1)'

Without the fix, rule 5 matches nothing if n >= 32769.

With the fix, it works:
# telnet localhost 10001
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Permission denied
telnet: Unable to connect to remote host

MFC after:	2 weeks
Discussed with: ae, melifaro
2020-06-04 14:15:39 +00:00
Andrey V. Elsukov
e43ae8dcb5 Fix O_IP_FLOW_LOOKUP opcode handling.
Do not check table value matching when table lookup has failed.

Reported by:	Sergey Lobanov
MFC after:	1 week
2020-05-29 10:37:42 +00:00
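The corrected order can be sketched as: on a lookup miss, report no match before looking at the value, since the value output is only meaningful on a hit. The table layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct entry_sketch {
	uint32_t key;
	uint32_t value;
};

/* Linear lookup; *val is written only when the key is found. */
static bool
table_lookup(const struct entry_sketch *t, int n, uint32_t key, uint32_t *val)
{
	for (int i = 0; i < n; i++) {
		if (t[i].key == key) {
			*val = t[i].value;
			return (true);
		}
	}
	return (false);
}

/* Match only if the key exists and, optionally, its value matches. */
static bool
flow_lookup_match(const struct entry_sketch *t, int n, uint32_t key,
    bool check_value, uint32_t want)
{
	uint32_t val;

	if (!table_lookup(t, n, key, &val))
		return (false);  /* val is undefined here: do not compare it */
	return (!check_value || val == want);
}
```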
Mark Johnston
c1be839971 pf: Add a new zone for per-table entry counters.
Right now we optionally allocate 8 counters per table entry, so in
addition to memory consumed by counters, we require 8 pointers worth of
space in each entry even when counters are not allocated (the default).

Instead, define a UMA zone that returns contiguous per-CPU counter
arrays for use in table entries.  On amd64 this reduces sizeof(struct
pfr_kentry) from 216 to 160.  The smaller size also results in better
slab efficiency, so memory usage for large tables is reduced by about
28%.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24843
2020-05-16 00:28:12 +00:00
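The size figures above are consistent with simple pointer arithmetic on amd64, where pointers are 8 bytes: replacing 8 counter pointers with a single pointer to a contiguous array saves 7 * 8 = 56 bytes, and 216 - 56 = 160. The structs below are hypothetical layouts; the 152-byte `rest` field is derived from 160 - 8 and only stands in for the other members of struct pfr_kentry.

```c
#include <assert.h>
#include <stdint.h>

/* Before: one pointer per counter (8 pointers). */
struct kentry_ptrs {
	char      rest[152];    /* placeholder for the other members */
	uint64_t *counters[8];
};

/* After: one pointer to a contiguous per-CPU counter array. */
struct kentry_zone {
	char      rest[152];
	uint64_t *counters;
};
```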
Mark Johnston
21121f9bbe pf: Don't allocate per-table entry counters unless required.
pf by default does not do per-table address accounting unless the
"counters" keyword is specified in the corresponding pf.conf table
definition.  Yet, we always allocate 12 per-CPU counters per table.  For
large tables this carries a lot of overhead, so only allocate counters
when they will actually be used.

A further enhancement might be to use a dedicated UMA zone to allocate
counter arrays for table entries, since close to half of the structure
size comes from counter pointers.  A related issue is the cost of
zeroing counters, since counter_u64_zero() calls smp_rendezvous() on
some architectures.

Reported by:	loos, Jim Pingle <jimp@netgate.com>
Reviewed by:	kp
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D24803
2020-05-11 18:47:38 +00:00
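A sketch of the lazy allocation, with hypothetical names and an illustrative counter count: counters are allocated only when the table asks for them, and the accounting path becomes a no-op when the pointer is NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NCOUNTERS 8  /* illustrative number of counters per entry */

struct kentry_sketch {
	uint64_t *counters;  /* NULL unless the table uses "counters" */
};

/* Allocate counters only if requested; returns 0 or -1 on failure. */
static int
kentry_init(struct kentry_sketch *ke, int want_counters)
{
	ke->counters = NULL;
	if (want_counters) {
		ke->counters = calloc(NCOUNTERS, sizeof(*ke->counters));
		if (ke->counters == NULL)
			return (-1);
	}
	return (0);
}

/* Account n units to counter idx; skipped when accounting is off. */
static void
kentry_count(struct kentry_sketch *ke, int idx, uint64_t n)
{
	if (ke->counters != NULL)
		ke->counters[idx] += n;
}
```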
Kristof Provost
1ef06ed8de pf: Improve DIOCADDRULE validation
We expect the addrwrap.p.dyn value to be set to NULL (and assert such),
but do not verify it on input.

Reported-by:	syzbot+936a89182e7d8f927de1@syzkaller.appspotmail.com
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24538
2020-05-03 16:09:35 +00:00
Ed Maste
db462d948f ipfw: whitespace fix in SCTP_ABORT_ASSOCIATION case statement comment
Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Reviewed by:	rgrimes, tuexen
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24602
2020-05-03 03:44:16 +00:00
Alexander V. Chernikov
e7d8af4f65 Move route_temporal.c and route_var.h to net/route.
The nexthop objects implementation, added in r359823,
introduced the sys/net/route directory, intended to hold all
routing-related code. Move the recently introduced route_temporal.c and
the private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
Kristof Provost
df03977dd8 pf: Virtualise pf_frag_mtx
The pf_frag_mtx mutex protects the fragments queue. The fragments queue
is virtualised already (i.e. per-vnet) so it makes no sense to block
jail A from accessing its fragments queue while jail B is accessing its
own fragments queue.

Virtualise the lock for improved concurrency.

Differential Revision:	https://reviews.freebsd.org/D24504
2020-04-26 16:30:00 +00:00
Kristof Provost
a7c8533634 pf: Improve input validation
If we pass an anchor name which doesn't exist, pfr_table_count() returns
-1, which leads to an overflow in mallocarray() and thus a panic.

Explicitly check that pfr_table_count() does not return an error.

Reported-by:	syzbot+bd09d55d897d63d5f4f4@syzkaller.appspotmail.com
Reviewed by:	melifaro
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24539
2020-04-26 16:16:39 +00:00
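A sketch of the added check, with a hypothetical stand-in for pfr_table_count(): converting -1 to size_t yields SIZE_MAX, so the count has to be rejected before it reaches the allocation. The ENOENT return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in: -1 for a missing anchor, else a table count. */
static int
table_count_sketch(int anchor_exists)
{
	return (anchor_exists ? 3 : -1);
}

/*
 * Reject the error value before allocating; (size_t)-1 would otherwise
 * make the size computation overflow.
 */
static int
alloc_tables(int anchor_exists, size_t elem_size, void **out)
{
	int n = table_count_sketch(anchor_exists);

	*out = NULL;
	if (n < 0)
		return (ENOENT);
	*out = calloc((size_t)n, elem_size);  /* n is non-negative here */
	return (*out == NULL && n > 0 ? ENOMEM : 0);
}
```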
Kristof Provost
98582ce381 pf: Improve ioctl() input validation
Both DIOCCHANGEADDR and DIOCADDADDR take a struct pf_pooladdr from
userspace. They failed to validate the dyn pointer contained in its
struct pf_addr_wrap member structure.

This triggered assertion failures under fuzz testing in
pfi_dynaddr_setup(). Fortunately the dyn variable was overwritten there, but
we should verify that it's set to NULL anyway.

Reported-by:	syzbot+93e93150bc29f9b4b85f@syzkaller.appspotmail.com
Reviewed by:	emaste
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24431
2020-04-19 16:10:20 +00:00
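The added validation reduces to rejecting a non-NULL kernel-internal pointer in a userspace-supplied structure. A sketch with a hypothetical struct; EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the relevant part of struct pf_addr_wrap. */
struct addr_wrap_sketch {
	void *dyn;  /* kernel-internal pointer; must be NULL on input */
};

/* Userspace must never be able to hand the kernel a pointer here. */
static int
validate_addr_wrap(const struct addr_wrap_sketch *aw)
{
	return (aw->dyn != NULL ? EINVAL : 0);
}
```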
Kristof Provost
95324dc3f4 pf: Do not allow negative ps_len in DIOCGETSTATES
Userspace may pass a negative ps_len value to us, which causes an
assertion failure in malloc().
Treat negative values as zero, i.e. return the required size.

Reported-by:	syzbot+53370d9d0358ee2a059a@syzkaller.appspotmail.com
Reviewed by:	lutz at donnerhacke.de
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D24447
2020-04-17 14:35:11 +00:00
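A sketch of the resulting size convention, with hypothetical names and plain int arithmetic: any non-positive ps_len is a size query, and a positive one is filled with as many whole states as fit.

```c
#include <assert.h>

/*
 * Return the number of bytes to report (size query) or copy out.
 * Negative lengths are now handled like zero instead of reaching an
 * allocation.
 */
static int
getstates_len(int ps_len, int nstates, int state_size)
{
	int fit;

	if (ps_len <= 0)
		return (nstates * state_size);  /* required buffer size */
	fit = ps_len / state_size;
	if (fit > nstates)
		fit = nstates;
	return (fit * state_size);              /* bytes actually copied */
}
```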
Alexander V. Chernikov
643ce94878 Convert pf rtable checks to the new routing KPI.
Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D24386
2020-04-15 13:00:48 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are marked as NEEDGIANT by default.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Pawel Biernacki
10b49b2302 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by:	kp
Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23634
2020-02-21 16:23:00 +00:00
Hans Petter Selasky
fbb890056e Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

This patch extends r357772.

Differential Revision:	https://reviews.freebsd.org/D23742
Reviewed by:	glebius@
Sponsored by:	Mellanox Technologies
2020-02-18 19:53:36 +00:00
Hans Petter Selasky
b4426a7175 Add missing EPOCH(9) wrapper in ipfw(8).
Backtrace:
panic()
ip_output()
dyn_tick()
softclock_call_cc()
softclock()
ithread_loop()

Differential Revision:	https://reviews.freebsd.org/D23599
Reviewed by:	glebius@ and ae@
Found by:	mmacy@
Reported by:	jmd@
Sponsored by:	Mellanox Technologies
2020-02-11 18:16:29 +00:00