freebsd-skq

Author	SHA1	Message	Date
Kristof Provost	5a3b9507d7	pf: Convert pfi_kkif to use counter_u64 Improve caching behaviour by using counter_u64 rather than variables shared between cores. The result of converting all counters to counter(9) (i.e. this full patch series) is a significant improvement in throughput. As tested by olivier@, on Intel Xeon E5-2697Av4 (16 cores, 32 threads) hardware with Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) NICs we see (histogram omitted; x = FreeBSD 20201223, + = FreeBSD 20201223 with pf patches, both measuring inet packets-per-second): x: N = 5, Min 9216962, Max 9526356, Median 9343902, Avg 9371057.6, Stddev 116720.36; +: N = 5, Min 19427190, Max 19698400, Median 19502922, Avg 19546509, Stddev 109084.92. Difference at 95.0% confidence: 1.01755e+07 +/- 164756, i.e. 108.584% +/- 2.9359% (Student's t, pooled s = 112967). Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27763	2021-01-05 23:35:37 +01:00
Kristof Provost	26c841e2a4	pf: Allocate and free pfi_kkif in separate functions Factor out allocating and freeing pfi_kkif structures. This will be useful when we change the counters to be counter_u64, so we don't have to deal with that complexity in the multiple locations where we allocate pfi_kkif structures. No functional change. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27762	2021-01-05 23:35:37 +01:00
Kristof Provost	320c11165b	pf: Split pfi_kif into a user and kernel space structure No functional change. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27761	2021-01-05 23:35:37 +01:00
Kristof Provost	c3adacdad4	pf: Change pf_krule counters to use counter_u64 This improves the cache behaviour of pf and results in improved throughput. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27760	2021-01-05 23:35:37 +01:00
Kristof Provost	e86bddea9f	pf: Split pf_rule into kernel and user space versions No functional change intended. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27758	2021-01-05 23:35:36 +01:00
Kristof Provost	dc865dae89	pf: Migrate pf_rule and related structs to pf.h As part of the split between user and kernel mode structures we're moving all user space usable definitions into pf.h. No functional change intended. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27757	2021-01-05 23:35:36 +01:00
Kristof Provost	fbbf270eef	pf: Use counter_u64 in pf_src_node Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27756	2021-01-05 23:35:36 +01:00
Kristof Provost	17ad7334ca	pf: Split pf_src_node into a kernel and userspace struct Introduce a kernel version of struct pf_src_node (pf_ksrc_node). This will allow us to improve the in-kernel data structure without breaking userspace compatibility. Reviewed by: philip MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27707	2021-01-05 23:35:36 +01:00
Kristof Provost	1c00efe98e	pf: Use counter(9) for pf_state byte/packet tracking This improves cache behaviour by not writing to the same variable from multiple cores simultaneously. pf_state is only used in the kernel, so it can be safely modified. Reviewed by: Lutz Donnerhacke, philip MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D27661	2020-12-23 12:03:21 +01:00
Kristof Provost	c3f69af03a	pf: Fix unaligned checksum updates The algorithm we use to update checksums only works correctly if the updated data is aligned on 16-bit boundaries (relative to the start of the packet). Import the OpenBSD fix for this issue. PR: 240416 Obtained from: OpenBSD MFC after: 1 week Reviewed by: tuexen (previous version) Differential Revision: https://reviews.freebsd.org/D27696	2020-12-23 12:03:20 +01:00
Kristof Provost	3420068a73	pf: Allow net.pf.request_maxcount to be set from loader.conf Mark request_maxcount as RWTUN so we can set it both at runtime and from loader.conf. This avoids users getting caught out by the change from tunable to runtime configuration. Suggested by: Franco Fichtner MFC after: 3 days	2020-12-12 20:14:39 +00:00
Brooks Davis	9ee99cec1f	hme(4): Remove as previously announced The hme (Happy Meal Ethernet) driver was the onboard NIC in most supported sparc64 platforms. A few PCI NICs do exist, but we have seen no evidence of use on non-sparc systems. Reviewed by: imp, emaste, bcr Sponsored by: DARPA	2020-12-11 21:40:38 +00:00
Mark Johnston	e6aed06fdf	pf: Fix table entry counter toggling When updating a table, pf will keep existing table entry structures corresponding to addresses that are in both of the old and new tables. However, the update may also enable or disable per-entry counters which are allocated separately. Thus when toggling PFR_TFLAG_COUNTERS, the entries may be missing counters or may have unused counters allocated. Fix the problem by modifying pfr_ina_commit() to transfer counters from or to entries in the shadow table. PR: 251414 Reported by: sigsys@gmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27440	2020-12-02 16:01:43 +00:00
Mark Johnston	5d49283f88	pf: Make tag hashing more robust tagname2tag() hashes the tag name before truncating it to 63 characters. tag_unref() removes the tag from the name hash by computing the hash over the truncated name. Ensure that both operations compute the same hash for a given tag. The larger issue is a lack of string validation in pf(4) ioctl handlers. This is intended to be fixed with some future work, but an extra safety belt in tagname2hashindex() is worthwhile regardless. Reported by: syzbot+a0988828aafb00de7d68@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27346	2020-11-24 16:18:47 +00:00
Kristof Provost	71c9acef8c	pf: Fix incorrect assertion We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set PFRULE_SRCTRACK, so update the assertion to match. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27254	2020-11-20 10:08:33 +00:00
Kristof Provost	52b83a0618	pf: do not remove kifs that are referenced by rules Even if a kif doesn't have an ifp or if_group pointer we still can't delete it if it's referenced by a rule. In other words: we must check rulerefs as well. While we're here also teach pfi_kif_unref() not to remove kifs with flags. Reported-by: syzbot+b31d1d7e12c5d4d42f28@syzkaller.appspotmail.com MFC after: 2 weeks	2020-10-13 11:04:00 +00:00
Kristof Provost	c9449e4fb8	pf: create a kif for flags If userspace tries to set flags (e.g. 'set skip on <ifspec>') and <ifspec> doesn't exist we should create a kif so that we apply the flags when the <ifspec> does turn up. Otherwise we'd end up in surprising situations where the rules say the interface should be skipped, but it's not until the rules get re-applied. Reviewed by: Lutz Donnerhacke <lutz_donnerhacke.de> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26742	2020-10-12 12:39:37 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Mark Johnston	95033af923	Add the SCTP_SUPPORT kernel option. This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation. Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2020-06-18 19:32:34 +00:00
Mark Johnston	c1be839971	pf: Add a new zone for per-table entry counters. Right now we optionally allocate 8 counters per table entry, so in addition to memory consumed by counters, we require 8 pointers' worth of space in each entry even when counters are not allocated (the default). Instead, define a UMA zone that returns contiguous per-CPU counter arrays for use in table entries. On amd64 this reduces sizeof(struct pfr_kentry) from 216 to 160. The smaller size also results in better slab efficiency, so memory usage for large tables is reduced by about 28%. Reviewed by: kp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24843	2020-05-16 00:28:12 +00:00
Mark Johnston	21121f9bbe	pf: Don't allocate per-table entry counters unless required. pf by default does not do per-table address accounting unless the "counters" keyword is specified in the corresponding pf.conf table definition. Yet, we always allocate 12 per-CPU counters per table. For large tables this carries a lot of overhead, so only allocate counters when they will actually be used. A further enhancement might be to use a dedicated UMA zone to allocate counter arrays for table entries, since close to half of the structure size comes from counter pointers. A related issue is the cost of zeroing counters, since counter_u64_zero() calls smp_rendezvous() on some architectures. Reported by: loos, Jim Pingle <jimp@netgate.com> Reviewed by: kp MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate) Differential Revision: https://reviews.freebsd.org/D24803	2020-05-11 18:47:38 +00:00
Kristof Provost	1ef06ed8de	pf: Improve DIOCADDRULE validation We expect the addrwrap.p.dyn value to be set to NULL (and assert such), but do not verify it on input. Reported-by: syzbot+936a89182e7d8f927de1@syzkaller.appspotmail.com Reviewed by: melifaro (previous version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24538	2020-05-03 16:09:35 +00:00
Kristof Provost	df03977dd8	pf: Virtualise pf_frag_mtx The pf_frag_mtx mutex protects the fragments queue. The fragments queue is virtualised already (i.e. per-vnet) so it makes no sense to block jail A from accessing its fragments queue while jail B is accessing its own fragments queue. Virtualise the lock for improved concurrency. Differential Revision: https://reviews.freebsd.org/D24504	2020-04-26 16:30:00 +00:00
Kristof Provost	a7c8533634	pf: Improve input validation If we pass an anchor name which doesn't exist pfr_table_count() returns -1, which leads to an overflow in mallocarray() and thus a panic. Explicitly check that pfr_table_count() does not return an error. Reported-by: syzbot+bd09d55d897d63d5f4f4@syzkaller.appspotmail.com Reviewed by: melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24539	2020-04-26 16:16:39 +00:00
Kristof Provost	98582ce381	pf: Improve ioctl() input validation Both DIOCCHANGEADDR and DIOCADDADDR take a struct pf_pooladdr from userspace. They failed to validate the dyn pointer contained in its struct pf_addr_wrap member structure. This triggered assertion failures under fuzz testing in pfi_dynaddr_setup(). Happily the dyn variable was overruled there, but we should verify that it's set to NULL anyway. Reported-by: syzbot+93e93150bc29f9b4b85f@syzkaller.appspotmail.com Reviewed by: emaste MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24431	2020-04-19 16:10:20 +00:00
Kristof Provost	95324dc3f4	pf: Do not allow negative ps_len in DIOCGETSTATES Userspace may pass a negative ps_len value to us, which causes an assertion failure in malloc(). Treat negative values as zero, i.e. return the required size. Reported-by: syzbot+53370d9d0358ee2a059a@syzkaller.appspotmail.com Reviewed by: lutz at donnerhacke.de MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24447	2020-04-17 14:35:11 +00:00
Alexander V. Chernikov	643ce94878	Convert pf rtable checks to the new routing KPI. Switch uRPF to use specific fib(9)-provided uRPF. Switch MSS calculation to the latest fib(9) kpi. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D24386	2020-04-15 13:00:48 +00:00
Pawel Biernacki	10b49b2302	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. Mark all nodes in pf, pfsync and carp as MPSAFE. Reviewed by: kp Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23634	2020-02-21 16:23:00 +00:00
Kristof Provost	e3e03bc159	pf: Apply kif flags to new group members If we have a 'set skip on <ifgroup>' rule, this flag is set on the group kif, but must also be set on all members. pfctl does this when the rules are set, but if groups are added afterwards we must also apply the flags to the new member. If not, new group members will not be skipped until the rules are reloaded. Reported by: dvl@ Reviewed by: glebius@ Differential Revision: https://reviews.freebsd.org/D23254	2020-01-23 22:13:41 +00:00
Kristof Provost	ef1bd1e517	pfsync: Ensure we enter network epoch before calling ip_output As of r356974 calls to ip_output() require us to be in the network epoch. That wasn't the case for the calls done from pfsyncintr() and pfsync_defer_tmo().	2020-01-22 21:01:19 +00:00
Kristof Provost	cca2ea64e9	pf: Make request_maxcount runtime adjustable There's no reason for this to be a tunable. It's perfectly safe to change this at runtime. Reviewed by: Lutz Donnerhacke Differential Revision: https://reviews.freebsd.org/D22737	2019-12-14 02:06:07 +00:00
Kristof Provost	492f3a312a	pf: Add endline to all DPFPRINTF() DPFPRINTF() doesn't automatically add an endline, so be consistent and always add it.	2019-11-24 13:53:36 +00:00
Kristof Provost	a0d571cbef	pf: Must be in NET_EPOCH to call icmp_error icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH. Failure to hold it leads to the following panic (with INVARIANTS): panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742 cpuid = 2 time = 1571233273 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920 vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980 panic() at panic+0x43/frame 0xfffffe00e09779e0 icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0 icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10 pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50 ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0 fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0 Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(), but ip_output() will soon be converted to a function that requires epoch, so entering NET_EPOCH directly from pf_intr() makes more sense. Discussed with: glebius@	2019-10-18 03:36:26 +00:00
Mark Johnston	bff630d1dc	Fix the build after r353458. MFC with: r353458 Sponsored by: The FreeBSD Foundation	2019-10-13 00:08:17 +00:00
Mark Johnston	6cc9ab8610	Add a missing include of opt_sctp.h. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-10-12 23:01:16 +00:00
Gleb Smirnoff	b8a6e03fac	Widen NET_EPOCH coverage. When epoch(9) was introduced to the network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance, mutex-covered areas were as small as possible, and so were the epoch-covered areas that replaced them. However, epoch doesn't introduce any contention, it just delays memory reclaim. So there is no point in minimising epoch-covered areas for performance's sake. Meanwhile, entering/exiting epoch also has a non-zero CPU cost, so doing this less often is a win. Last but not least, it also helps code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside the network epoch. This makes coding both the input and output paths way easier. On the output path we already enter epoch quite early - in ip_output() and in ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, and other ways of injecting packets into the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. The tricky part is the configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch will introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note that, unlike lock recursion, epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111	2019-10-07 22:40:05 +00:00
Ed Maste	c54ee572e5	pf: zero (another) output buffer in pfioctl Avoid potential structure padding leak. r350294 identified a leak via static analysis; although there's no report of a leak with the DIOCGETSRCNODES ioctl it's a good practice to zero the memory. Suggested by: kp MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-31 16:58:09 +00:00
Kristof Provost	f287767d4f	pf: Remove partial RFC2675 support Remove our (very partial) support for RFC2675 Jumbograms. They're not used, not actually supported and not a good idea. Reviewed by: thj@ Differential Revision: https://reviews.freebsd.org/D21086	2019-07-29 13:21:31 +00:00
Ed Maste	532bc58628	pf: zero output buffer in pfioctl Avoid potential structure padding leak. Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: kp MFC after: 3 days Security: Potential kernel memory disclosure Sponsored by: The FreeBSD Foundation	2019-07-24 16:51:14 +00:00
Hans Petter Selasky	59854ecf55	Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters becomes fixed during their lifetime, and use after free and memory leak issues are easier to track, for example by running 'vmstat -m | grep multi'. All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@. Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj@ MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:54:41 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Li-Wen Hsu	d086d41363	Remove an unneeded indentation introduced in r223637 to silence a gcc warning MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-05-25 23:58:09 +00:00
Kristof Provost	1c75b9d2cd	pf: No need to M_NOWAIT in DIOCRSETTFLAGS Now that we don't hold a lock during DIOCRSETTFLAGS memory allocation we can use M_WAITOK. MFC after: 1 week Event: Aberdeen hackathon 2019 Pointed out by: glebius@	2019-04-18 11:37:44 +00:00
Kristof Provost	f5e0d9fcb4	pf: Fix panic on invalid DIOCRSETTFLAGS If pfrio_buffer is NULL during DIOCRSETTFLAGS, copyin() will fault, which we're not allowed to do with a lock held. We must count the number of entries in the table and release the lock during copyin(). Only then can we re-acquire the lock. Note that this is safe, because pfr_set_tflags() will check if the table and entries exist. This was discovered by a local syzkaller instance. MFC after: 1 week Event: Aberdeen hackathon 2019	2019-04-17 16:42:54 +00:00
Rodney W. Grimes	6c1c6ae537	Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code There are a few places that use handcrafted versions of the macros from sys/netinet/in.h, making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317	2019-04-04 19:01:13 +00:00
Conrad Meyer	a8a16c7128	Replace read_random(9) with more appropriate arc4rand(9) KPIs Reviewed by: ae, delphij Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19760	2019-04-04 01:02:50 +00:00
Ed Maste	a342f5772f	pf: use UID_ROOT and GID_WHEEL named constants in make_dev No functional change but improves consistency and greppability of make_dev calls. Discussed with: kp	2019-03-26 21:20:42 +00:00
Kristof Provost	64af73aade	pf: Ensure that IP addresses match in ICMP error packets States in pf(4) let ICMP and ICMP6 packets pass if they have a packet in their payload that matches an existing connection. It was not checked whether the outer ICMP packet has the same destination IP as the source IP of the inner protocol packet. Enforce that these addresses match, to prevent ICMP packets that do not make sense. Reported by: Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv Obtained from: OpenBSD Security: CVE-2019-5598	2019-03-21 08:09:52 +00:00
Kristof Provost	812483c46e	pf: Rename pfsync bucket lock Previously the main pfsync lock and the bucket locks shared the same name. This led to spurious warnings from WITNESS like this: acquiring duplicate lock of same type: "pfsync" 1st pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1402 2nd pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1429 It's perfectly okay to grab both the main pfsync lock and a bucket lock at the same time. We don't need different names for each bucket lock, because we should always only acquire a single one of those at a time. MFC after: 1 week	2019-03-16 10:14:03 +00:00
Kristof Provost	5904868691	pf: Use counter(9) in pf tables. The counters of pf tables are updated outside the rule lock. That means state updates might overwrite each other. Furthermore, allocation and freeing of counters happens outside the lock as well. Use counter(9) for the counters, and always allocate the counter table element, so that the race condition cannot happen any more. PR: 230619 Submitted by: Kajetan Staszkiewicz <vegeta@tuxpowered.net> Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19558	2019-03-15 11:08:44 +00:00
Gleb Smirnoff	1830dae3d3	Make the second argument of ip_divert(), which specifies packet direction, a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
Kristof Provost	f8e7fe32a4	pf: Fix DIOCGETSRCNODES r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the number of source tracking nodes. This meant that we never copied the information to userspace, leading to '? -> ?' output from pfctl. PR: 236368 MFC after: 1 week	2019-03-08 09:33:16 +00:00
Kristof Provost	6f4909de5f	pf: IPv6 fragments with malformed extension headers could be erroneously passed by pf or cause a panic We mistakenly used the extoff value from the last packet to patch the next_header field. If a malicious host sends a chain of fragmented packets where the first packet and the final packet have different lengths or number of extension headers we'd patch the next_header at the wrong offset. This can potentially lead to panics or rule bypasses. Security: CVE-2019-5597 Obtained from: OpenBSD Reported by: Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv	2019-03-01 07:37:45 +00:00
Kristof Provost	22c58991e3	pf: Small performance tweak Because fetching a counter is a rather expensive operation, we should call counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr pass" rule should not cause more effort than separate "rdr" and "pass" rules. For rules with adaptive timeout values the call of counter_u64_fetch() should be accepted, but otherwise not. From the man page: The adaptive timeout values can be defined both globally and for each rule. When used on a per-rule basis, the values relate to the number of states created by the rule, otherwise to the total number of states. This handling of adaptive timeouts is done in pf_state_expires(). The calculation needs three values: start, end and states. 1. Normal rules "pass .." without an adaptive setting (meaning "start = 0") run in the else-section and therefore take "start" and "end" from the global default settings and set "states" to pf_status.states (= total number of states). 2. Special rules like "pass .. keep state (adaptive.start 500 adaptive.end 1000)" have start != 0, run in the if-section, take "start" and "end" from the rule and set "states" to the number of states created by their rule using counter_u64_fetch(). That's all OK, but there is a third case without special handling in pf_state_expires(): 3. All "rdr/nat pass .." statements share the pf_default_rule. Therefore we have "start != 0" in this case and run the if-section, but we should rather run the else-section in this case, not fetch the counter of the pf_default_rule, and take the total number of states instead. Submitted by: Andreas Longwitz <longwitz@incore.de> MFC after: 2 weeks	2019-02-24 17:23:55 +00:00
Patrick Kelsey	d178fee632	Place pf_altq_get_nth_active() under the ALTQ ifdef MFC after: 1 week	2019-02-11 05:39:38 +00:00
Patrick Kelsey	8f2ac65690	Reduce the time it takes the kernel to install a new PF config containing a large number of queues In general, the time savings come from separating the active and inactive queue lists into separate interface and non-interface queue lists, and changing the rule and queue tag management from list-based to hash-based. In HFSC, a linear scan of the class table during each queue destroy was also eliminated. There are now two new tunables to control the hash size used for each tag set (default for each is 128): net.pf.queue_tag_hashsize and net.pf.rule_tag_hashsize. Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D19131	2019-02-11 05:17:31 +00:00
Gleb Smirnoff	d38ca3297c	Return PFIL_CONSUMED if packet was consumed. While here gather all the identical endings of pf_check_*() into single function. PR: 235411	2019-02-02 05:49:05 +00:00
Gleb Smirnoff	b252313f0b	New pfil(9) KPI together with newborn pfil API and control utility. The KPI has been reviewed and cleansed of features that were planned 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols, with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as the kernel uses the same command structures that the userland ioctl uses. In a nutshell, the [KA]PI is about declaring filtering points, declaring filters, and linking and unlinking them together. The new [KA]PI makes it possible to reconfigure pfil(9) configuration: change the order of hooks, rehook a filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to an existing list of filters. Now it is possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of the existing packet filters support that yet; however, limited usage is already possible, e.g. the default ruleset can be moved to a single interface, as soon as interfaces provide their filtering points. Another future feature is the possibility to create pfil heads that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when a packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
Patrick Kelsey	59099cd385	Don't re-evaluate ALTQ kernel configuration due to events on non-ALTQ interfaces Re-evaluating the ALTQ kernel configuration can be expensive, particularly when there are a large number (hundreds or thousands) of queues, and is wholly unnecessary in response to events on interfaces that do not support ALTQ as such interfaces cannot be part of an ALTQ configuration. Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18918	2019-01-28 20:26:09 +00:00
Kristof Provost	d9d146e67b	pf: Fix use-after-free of counters When cleaning up a vnet we free the counters in V_pf_default_rule and V_pf_status from shutdown_pf(), but we can still use them later, for example through pf_purge_expired_src_nodes(). Free them as the very last operation, as they rely on nothing else themselves. PR: 235097 MFC after: 1 week	2019-01-25 01:06:06 +00:00
Kristof Provost	180b0dcbbb	pf: Validate psn_len in DIOCGETSRCNODES psn_len is controlled by user space, but we allocated memory based on it. Check how much memory we might need at most (i.e. how many source nodes we have) and limit the allocation to that. Reported by: markj MFC after: 1 week	2019-01-22 02:13:33 +00:00
Kristof Provost	6a8ee0f715	pf: fix pfsync breaking carp Fix missing initialisation of sc_flags into a valid sync state on clone, which breaks carp in pfsync. This regression was introduced by r342051. PR: 235005 Submitted by: smh@FreeBSD.org Pointy hat to: kp MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18882	2019-01-18 08:19:54 +00:00
Kristof Provost	032dff662c	pf: silence a runtime warning Sometimes, for negated tables, pf can log 'pfr_update_stats: assertion failed'. This warning does not clarify anything for users, so silence it, just as OpenBSD has. PR: 234874 MFC after: 1 week	2019-01-15 08:59:51 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. they will produce buggy code if the same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and, what's more important, they no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has the same problems, but is not touched by this commit. This is a non-functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Kristof Provost	336683f24f	pf: Fix endless loop on NAT exhaustion with sticky-address When we try to find a source port in pf_get_sport() it's possible that all available source ports will be in use. In that case we call pf_map_addr() to try to find a new source IP to try from. If there are no more available source IPs pf_map_addr() will return 1 and we stop trying. However, if sticky-address is set we'll always return the same IP address, even if we've already tried that one. We need to check the supplied address, because if that's the one we'd set it means pf_get_sport() has already tried it, and we should error out rather than keep trying. PR: 233867 MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18483	2018-12-12 20:15:06 +00:00
Kristof Provost	5b551954ab	pf: Prevent integer overflow in PF when calculating the adaptive timeout. Mainly states of established TCP connections would be affected, resulting in immediate state removal once the number of states is bigger than adaptive.start. Disabling adaptive timeouts is a workaround to avoid this bug. Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr) Reported by: Andreas Longwitz <longwitz AT incore.de> Obtained from: OpenBSD MFC after: 2 weeks	2018-12-11 21:44:39 +00:00
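The overflow can be reproduced with a sketch of an adaptive scaling rule. The rule assumed here scales the timeout linearly from full length at `adaptive.start` down to zero at `adaptive.end`. The function names are hypothetical. The input values are illustrative only and are not pf defaults. The buggy variant multiplies two 32-bit values, so the product wraps. The fixed variant widens to 64 bits before multiplying.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: timeout * (end - states) is computed in 32 bits and can wrap. */
static uint32_t
adaptive_timeout_buggy(uint32_t timeout, uint32_t states,
    uint32_t start, uint32_t end)
{
	if (states > start) {
		if (states < end)
			return (timeout * (end - states) / (end - start));
		return (0);
	}
	return (timeout);
}

/* Fixed: widen to 64 bits before multiplying, then scale back down. */
static uint32_t
adaptive_timeout(uint32_t timeout, uint32_t states,
    uint32_t start, uint32_t end)
{
	assert(start < end);
	if (states > start) {
		if (states < end)
			return ((uint32_t)((uint64_t)timeout * (end - states) /
			    (end - start)));
		return (0);
	}
	return (timeout);
}
```

Take a 24-hour timeout (86400 s), `start = 60000`, `end = 120000` and one state above `start`. The correct result is 86398 s. The 32-bit product wraps, and the buggy variant yields only 14815 s.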
Kristof Provost	4fc65bcbe3	pfsync: Performance improvement pfsync code is called for every new state, state update and state deletion in pf. While pf itself can operate on multiple states at the same time (on different cores, assuming the states hash to a different hashrow), pfsync only had a single lock. This greatly reduced throughput on multicore systems. Address this by splitting the pfsync queues into buckets, based on the state id. This ensures that updates for a given connection always end up in the same bucket, which allows pfsync to still collapse multiple updates into one, while allowing multiple cores to proceed at the same time. The number of buckets is tunable, but defaults to 2 x number of cpus. Benchmarking has shown improvement, depending on hardware and setup, from ~30% to ~100%. MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D18373	2018-12-06 19:27:15 +00:00
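The bucketing idea can be sketched as follows. The commit says the queues are split by state id and that the default is 2 x the number of CPUs. Selecting the bucket with a modulo is an assumption made for this illustration, and `state_bucket()` and `default_buckets()` are hypothetical names. The property that matters is determinism: all updates for one state id land in the same bucket.

```c
#include <assert.h>
#include <stdint.h>

/* Default bucket count, following the commit text: 2 x number of CPUs. */
static unsigned
default_buckets(unsigned ncpus)
{
	return (2 * ncpus);
}

/*
 * Map a state id to a bucket (modulo is an assumption). The same id always
 * maps to the same bucket, so updates for one connection can still be
 * merged, while different buckets can be locked independently.
 */
static unsigned
state_bucket(uint64_t state_id, unsigned nbuckets)
{
	assert(nbuckets > 0);
	return ((unsigned)(state_id % nbuckets));
}
```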
Kristof Provost	2b0a4ffadb	pf: add a comment describing why we call pf_map_addr again if the port selection process fails Obtained from: OpenBSD	2018-12-06 18:58:54 +00:00
Kristof Provost	b2e0b24f76	pf: Fix panic on overlapping interface names In rare situations[1] it's possible for two different interfaces to have the same name. This confuses pf, because kifs are indexed by name (which is assumed to be unique). As a result we can end up trying to if_rele(NULL), which panics. Explicitly checking the ifp pointer before if_rele() prevents the panic. Note pf will likely behave in unexpected ways on the overlapping interfaces. [1] Insert an interface in a vnet jail. Rename it to an interface which exists on the host. Remove the jail. There are now two interfaces with the same name in the host.	2018-12-01 09:58:21 +00:00
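The guard described above amounts to a NULL check before dropping the reference. The sketch below is hypothetical and uses invented stand-ins rather than `struct ifnet`, `struct pfi_kkif` or the real `if_rele()`. An `assert()` plays the role of the panic that a NULL argument would trigger.

```c
#include <assert.h>
#include <stddef.h>

struct ifnet_model { int refcount; };      /* hypothetical stand-in */
struct kif_model { struct ifnet_model *ifp; };

/* Like if_rele() in this model: must never be handed NULL. */
static void
ifnet_release(struct ifnet_model *ifp)
{
	assert(ifp != NULL);
	ifp->refcount--;
}

/* The fix: only release a reference we actually hold. */
static void
kif_drop_ifp(struct kif_model *kif)
{
	if (kif->ifp != NULL)
		ifnet_release(kif->ifp);
	kif->ifp = NULL;
}
```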
Kristof Provost	87e4ca37d5	pf: Prevent tables referenced by rules in anchors from getting disabled. PR: 183198 Obtained from: OpenBSD MFC after: 2 weeks	2018-11-08 21:54:40 +00:00
Kristof Provost	58ef854f8b	pf: Fix build if INVARIANTS is not set r340061 included a number of assertions in pf_frent_remove(), but these assertions were the only use of the 'prev' variable. As a result builds without INVARIANTS had an unused variable, and failed. Reported by: vangyzen@	2018-11-02 19:23:50 +00:00
Kristof Provost	14624ab582	pf: Keep a reference to struct ifnets we're using Ensure that the struct ifnet we use can't go away until we're done with it.	2018-11-02 17:05:40 +00:00
Kristof Provost	dde6e1fecb	pfsync: Add missing unlock If we fail to set up the multicast entry for pfsync and return an error we must release the pfsync lock first. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17506	2018-11-02 17:03:53 +00:00
Kristof Provost	04fe85f068	pfsync: Allow module to be unloaded MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17505	2018-11-02 17:01:18 +00:00
Kristof Provost	fbbf436d56	pfsync: Handle syncdev going away If the syncdev is removed we no longer need to clean up the multicast entry we've got set up for that device. Pass the ifnet detach event through pf to pfsync, remove our multicast handle, and mark us as no longer having a syncdev. Note that this callback is always installed, even if the pfsync interface is disabled (and thus it's not a per-vnet callback pointer). MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17502	2018-11-02 16:57:23 +00:00
Kristof Provost	26549dfcad	pfsync: Ensure uninit is done before pf pfsync touches pf memory (for pf_state and the pfsync callback pointers), not the other way around. We need to ensure that pfsync is torn down before pf. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17501	2018-11-02 16:53:15 +00:00
Kristof Provost	5f6cf24e2d	pfsync: Make pfsync callbacks per-vnet The callbacks are installed and removed depending on the state of the pfsync device, which is per-vnet. The callbacks must also be per-vnet. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17499	2018-11-02 16:47:07 +00:00
Kristof Provost	790194cd47	pf: Limit the fragment entry queue length to 64 per bucket. So we have a global limit of 1024 fragments, but it is fine-grained to the region of the packet. Smaller packets may have fewer fragments. This costs another 16 bytes of memory per reassembly and divides the worst case for searching by 8. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17734	2018-11-02 15:32:04 +00:00
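The per-bucket limit can be sketched as a counter per entry point. This is a hypothetical model: `struct frag_queue` and `frag_may_insert()` are invented names, and the real reassembly code tracks fragment entries rather than bare counts. The constants follow the commit: 16 entry points with 64 entries each, giving 16 x 64 = 1024 fragments overall.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAG_ENTRY_POINTS 16   /* entry points per reassembly */
#define FRAG_ENTRY_LIMIT  64   /* per the commit: 64 entries per bucket */

struct frag_queue {            /* hypothetical, simplified */
	unsigned count[FRAG_ENTRY_POINTS];
};

/* Returns true (and counts the entry) if bucket 'idx' still has room. */
static bool
frag_may_insert(struct frag_queue *q, unsigned idx)
{
	assert(idx < FRAG_ENTRY_POINTS);
	if (q->count[idx] >= FRAG_ENTRY_LIMIT)
		return (false);    /* this region of the packet is full */
	q->count[idx]++;
	return (true);
}
```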
Kristof Provost	fd2ea405e6	pf: Split the fragment reassembly queue into smaller parts Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17733	2018-11-02 15:26:51 +00:00
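One way to derive the 16 entry points from the fragment offset is to split the 64 KiB offset space into equal regions. That even split is an assumption for this sketch, and `frag_entry_index()` is a hypothetical name. A lookup then jumps straight to the region for an offset instead of walking the whole list from its head.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_ENTRY_POINTS 16   /* per the commit: 16 entry points */

/* Assumption: equal regions of 0x10000 / 16 = 4096 bytes of offset. */
static unsigned
frag_entry_index(uint16_t off)
{
	unsigned idx = off / (0x10000 / FRAG_ENTRY_POINTS);

	assert(idx < FRAG_ENTRY_POINTS);
	return (idx);
}
```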
Kristof Provost	2b1c354ee6	pf: Count holes rather than fragments for reassembly Avoid traversing the list of fragment entries to check whether the pf(4) reassembly is complete. Instead count the holes that are created when inserting a fragment. If there are no holes left, the fragments are continuous. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17732	2018-11-02 15:23:57 +00:00
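The hole counting can be modelled with a sorted array of non-overlapping fragments. This is a simplified, hypothetical sketch, not the pf data structures. The count starts at 1 because nothing has been received yet. Each insert splits one gap in two (+1) and then closes the gap to its neighbour on each side that it touches (-1 each). The end of the packet counts as closed when the entry is the last fragment, i.e. the "more fragments" flag is clear. Reassembly is complete when the count reaches 0, with no list traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS 64

struct frent {              /* simplified fragment entry */
	uint16_t off;           /* offset in bytes */
	uint16_t len;           /* payload length in bytes */
	bool     mff;           /* "more fragments" flag */
};

struct frag {
	struct frent ent[MAX_FRAGS];  /* kept sorted by offset */
	size_t       n;
	int          holes;           /* 1 while nothing has arrived */
};

static void
frag_init(struct frag *f)
{
	f->n = 0;
	f->holes = 1;
}

/* Change in the hole count caused by inserting entry i. */
static int
frent_holes(const struct frag *f, size_t i)
{
	const struct frent *e = &f->ent[i];
	int holes = 1;                /* one gap is split in two */

	if (i == 0) {
		if (e->off == 0)
			holes--;          /* nothing missing in front */
	} else if (f->ent[i - 1].off + f->ent[i - 1].len == e->off)
		holes--;              /* touches the previous entry */
	if (i + 1 == f->n) {
		if (!e->mff)
			holes--;          /* last fragment of the packet */
	} else if (e->off + e->len == f->ent[i + 1].off)
		holes--;              /* touches the next entry */
	return (holes);
}

/* Insert a non-overlapping fragment; returns true once complete. */
static bool
frag_insert(struct frag *f, struct frent e)
{
	size_t i;

	assert(f->n < MAX_FRAGS);
	for (i = f->n; i > 0 && f->ent[i - 1].off > e.off; i--)
		f->ent[i] = f->ent[i - 1];    /* shift to keep sorted */
	f->ent[i] = e;
	f->n++;
	f->holes += frent_holes(f, i);
	return (f->holes == 0);
}
```

For example, inserting fragments at offsets 0, 16 (last) and then 8, each 8 bytes long, leaves one hole after the first two inserts. The third insert closes it.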
Kristof Provost	19a22ae313	Revert "pf: Limit the maximum number of fragments per packet" This reverts commit r337969. We'll handle this the OpenBSD way, in upcoming commits.	2018-11-02 15:01:59 +00:00
Kristof Provost	99eb00558a	pf: Make ':0' ignore link-local v6 addresses too When users mark an interface to not use aliases they likely also don't want to use the link-local v6 address there. PR: 201695 Submitted by: Russell Yount <Russell.Yount AT gmail.com> Differential Revision: https://reviews.freebsd.org/D17633	2018-10-28 05:32:50 +00:00
Kristof Provost	13d640d376	pf: Fix copy/paste error in IPv6 address rewriting We checked the destination address, but replaced the source address. This was fixed in OpenBSD as part of their NAT rework, which we don't want to import right now. CID: 1009561 MFC after: 3 weeks	2018-10-24 00:19:44 +00:00
Kristof Provost	73c9014569	pf: ifp can never be NULL in pfi_ifaddr_event() There's no point in the NULL check for ifp, because we'll already have dereferenced it by then. Moreover, the event will always have a valid ifp. Replace the late check with an early assertion. CID: 1357338	2018-10-23 23:15:44 +00:00
Kristof Provost	1563a27e1f	pf synproxy will do the 3WHS on behalf of the target machine, and once the 3WHS is completed, establish the backend connection. The trigger for "3WHS completed" is the reception of the first ACK. However, we should not proceed if that ACK also has RST or FIN set. PR: 197484 Obtained from: OpenBSD MFC after: 2 weeks	2018-10-20 18:37:21 +00:00
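The flag test can be expressed as a single mask comparison. In this sketch the segment only counts as completing the handshake when ACK is set and neither RST nor FIN is. The bit values are the standard TCP header control bits. `synproxy_handshake_done()` is a hypothetical name, and the sequence and acknowledgement number checks a real implementation also needs are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCP header control bits (standard values). */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_ACK 0x10

/* May synproxy now open the backend connection for this segment? */
static bool
synproxy_handshake_done(uint8_t flags)
{
	/* ACK must be set; RST and FIN must both be clear. */
	return ((flags & (TH_ACK | TH_RST | TH_FIN)) == TH_ACK);
}

static void
synproxy_examples(void)
{
	assert(synproxy_handshake_done(TH_ACK));
	assert(!synproxy_handshake_done(TH_ACK | TH_RST));
	assert(!synproxy_handshake_done(TH_ACK | TH_FIN));
	assert(!synproxy_handshake_done(TH_SYN));
}
```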
John-Mark Gurney	032d3aaa96	Significantly improve pf purge cpu usage by only taking locks when there is work to do. This reduces CPU consumption to one third on systems. This will help keep the thread CPU usage under control now that the default hash size has increased. Reviewed by: kp Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17097	2018-09-16 00:44:23 +00:00
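The "only lock when there is work" pattern can be sketched on a hypothetical hash table. A plain flag and counter stand in for per-row mutexes, so this single-threaded model counts lock acquisitions rather than locking for real. In a concurrent kernel the emptiness peek is deliberately unlocked. A row that fills up just after the peek is simply handled on the next purge pass.

```c
#include <assert.h>
#include <stddef.h>

#define NROWS 8

struct row {
	int nstates;            /* entries in this hash row */
	int locked;             /* stand-in for a per-row mutex */
};

static int locks_taken;     /* how often a row lock was "acquired" */

static void
row_lock(struct row *r)
{
	assert(!r->locked);
	r->locked = 1;
	locks_taken++;
}

static void
row_unlock(struct row *r)
{
	assert(r->locked);
	r->locked = 0;
}

/* One purge pass; empty rows are skipped without taking their lock. */
static int
purge(struct row *rows, size_t n)
{
	int purged = 0;

	for (size_t i = 0; i < n; i++) {
		if (rows[i].nstates == 0)
			continue;           /* nothing to do: don't lock */
		row_lock(&rows[i]);
		purged += rows[i].nstates;  /* pretend all states expired */
		rows[i].nstates = 0;
		row_unlock(&rows[i]);
	}
	return (purged);
}
```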
Patrick Kelsey	249cc75fd1	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
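The two behaviours described above differ only in how a 64-bit rate is narrowed to 32 bits. The old behaviour wraps at the 2^32 boundary. A version 0 consumer on the new kernel is described as seeing 2^32 - 1. The function names are hypothetical, and the sketch shows only the arithmetic, not the versioned ioctl structures.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: a rate of 2^32 bps or more wraps around. */
static uint32_t
bandwidth_wrapped(uint64_t bps)
{
	return ((uint32_t)bps);
}

/* Version 0 view on a new kernel: saturate at 2^32 - 1 instead. */
static uint32_t
bandwidth_to_v0(uint64_t bps)
{
	return (bps > UINT32_MAX ? UINT32_MAX : (uint32_t)bps);
}

static void
bandwidth_examples(void)
{
	uint64_t bps = 100000000000ULL;    /* 100 Gbps */

	assert(bandwidth_wrapped(bps) == 1215752192U);
	assert(bandwidth_to_v0(bps) == UINT32_MAX);
	assert(bandwidth_to_v0(1000000) == 1000000);
}
```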
Kristof Provost	d47023236c	pf: Limit the maximum number of fragments per packet Similar to the network stack issue fixed in r337782, pf did not limit the number of fragments per packet, which could be exploited to generate high CPU loads with a crafted series of packets. Limit each packet to no more than 64 fragments. This should be sufficient on typical networks to allow maximum-sized IP frames. This addresses the issue for both IPv4 and IPv6. MFC after: 3 days Security: CVE-2018-5391 Sponsored by: Klara Systems	2018-08-17 15:00:10 +00:00
Kristof Provost	e9ddca4a40	pf: Take the IF_ADDR_RLOCK() when iterating over the group list We did do this elsewhere in pf, but the lock was missing here. Sponsored by: Essen Hackathon	2018-08-11 16:37:55 +00:00
Kristof Provost	33b242b533	pf: Fix 'set skip on' for groups The pfi_skip_if() function sometimes caused skipping of groups to work, if the members of the group used the groupname as a name prefix. This is often the case, e.g. group lo usually contains lo0, lo1, ..., but not always. Rather than relying on the name explicitly check for group memberships. Obtained from: OpenBSD (pf_if.c,v 1.62, pf_if.c,v 1.63) Sponsored by: Essen Hackathon	2018-08-11 16:34:30 +00:00
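The difference between the two approaches can be shown on hypothetical interfaces. `struct iface` and both functions are invented for illustration. The prefix heuristic is a simplified reading of "group name used as a name prefix followed by a unit number". An interface renamed to `vpn0` is wrongly matched by the prefix rule, and a `tun0` that really is in group `vpn` is missed. Checking membership gets both right.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAXGROUPS 4

struct iface {                       /* hypothetical, simplified */
	const char *name;
	const char *groups[MAXGROUPS];   /* unused slots are NULL */
};

/* Old-style heuristic: "lo" matches lo0, lo1, ... by name alone. */
static bool
in_group_by_prefix(const struct iface *ifp, const char *group)
{
	size_t n = strlen(group);

	assert(ifp != NULL);
	return (strncmp(ifp->name, group, n) == 0 &&
	    isdigit((unsigned char)ifp->name[n]));
}

/* Check actual group membership instead of the name. */
static bool
in_group_by_membership(const struct iface *ifp, const char *group)
{
	assert(ifp != NULL);
	for (size_t i = 0; i < MAXGROUPS && ifp->groups[i] != NULL; i++)
		if (strcmp(ifp->groups[i], group) == 0)
			return (true);
	return (false);
}
```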
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Kristof Provost	32ece669c2	pf: Fix synproxy Synproxy was accidentally broken by r335569. The 'return (action)' must be executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP error) should only be sent if the packet was dropped (i.e. PF_DROP) and the return flag is set. PR: 229477 Submitted by: Andre Albsmeier <mail AT fbsd.e4m.org> MFC after: 1 week	2018-07-14 10:14:59 +00:00
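The corrected decision can be written as two independent conditions. The enum constants and `struct verdict` below are hypothetical and do not reproduce pf's action values. Any non-pass result makes the caller return. An error packet is only sent for a real drop on a rule with the return flag set.

```c
#include <assert.h>
#include <stdbool.h>

enum action { ACT_PASS, ACT_DROP, ACT_SYNPROXY_DROP };  /* hypothetical */

struct verdict {
	bool return_now;   /* stop processing and return the action */
	bool send_error;   /* emit a TCP RST / ICMP error first */
};

static struct verdict
finish(enum action action, bool rule_has_return)
{
	struct verdict v = { false, false };

	assert(action == ACT_PASS || action == ACT_DROP ||
	    action == ACT_SYNPROXY_DROP);
	if (action != ACT_PASS) {
		v.return_now = true;    /* every non-pass result returns */
		/* Only a genuine drop on a "return" rule sends an error. */
		v.send_error = (action == ACT_DROP && rule_has_return);
	}
	return (v);
}
```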
Kristof Provost	3e603d1ffa	pf: Fix panic on vnet jail shutdown with synproxy When shutting down a vnet jail pf_shutdown() clears the remaining states, which through pf_clear_states() calls pf_unlink_state(). For synproxy states pf_unlink_state() will send a TCP RST, which eventually tries to schedule the pf swi in pf_send(). This means we can't remove the software interrupt until after pf_shutdown(). MFC after: 1 week	2018-07-14 09:11:32 +00:00
Will Andrews	cc535c95ca	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Kristof Provost	de210decd1	pfsync: Fix state sync during initial bulk update States learned via pfsync from a peer with the same ruleset checksum were not getting assigned to rules like they should because pfsync_in_upd() wasn't passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import. PR: 229092 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Obtained from: OpenBSD MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-30 12:51:08 +00:00
Kristof Provost	150182e309	pf: Support "return" statements in passing rules when they fail. Normally pf rules are expected to do one of two things: pass the traffic or block it. Blocking can be silent - "drop", or loud - "return", "return-rst", "return-icmp". Yet there is a 3rd category of traffic passing through pf: packets matching a "pass" rule for which applying the rule fails. This happens when the redirection table is empty or when src node or state creation fails. Such rules always fail silently without notifying the sender. Allow users to configure this behaviour too, so that pf returns an error packet in these cases. PR: 226850 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-22 21:59:30 +00:00
Kristof Provost	0b799353d8	pf: Fix deadlock with route-to If a locally generated packet is routed (with route-to/reply-to/dup-to) out of a different interface it's passed through the firewall again. This meant we lost the inp pointer and if we required the pointer (e.g. for user ID matching) we'd deadlock trying to acquire an inp lock we've already got. Pass the inp pointer along with pf_route()/pf_route6(). PR: 228782 MFC after: 1 week	2018-06-09 14:17:06 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00


380 Commits