freebsd-skq

Author	SHA1	Message	Date
pkelsey	1231387b19	Reduce the time it takes the kernel to install a new PF config containing a large number of queues In general, the time savings come from separating the active and inactive queues lists into separate interface and non-interface queue lists, and changing the rule and queue tag management from list-based to hash-based. In HFSC, a linear scan of the class table during each queue destroy was also eliminated. There are now two new tunables to control the hash size used for each tag set (default for each is 128): net.pf.queue_tag_hashsize net.pf.rule_tag_hashsize Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D19131	2019-02-11 05:17:31 +00:00
kp	0e08a2a107	pfsync: Handle syncdev going away If the syncdev is removed we no longer need to clean up the multicast entry we've got set up for that device. Pass the ifnet detach event through pf to pfsync, and remove our multicast handle, and mark us as no longer having a syncdev. Note that this callback is always installed, even if the pfsync interface is disabled (and thus it's not a per-vnet callback pointer). MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17502	2018-11-02 16:57:23 +00:00
kp	b83dcc801f	pfsync: Make pfsync callbacks per-vnet The callbacks are installed and removed depending on the state of the pfsync device, which is per-vnet. The callbacks must also be per-vnet. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17499	2018-11-02 16:47:07 +00:00
kp	45e82adb5b	pf: Limit the fragment entry queue length to 64 per bucket. So we have a global limit of 1024 fragments, but it is fine grained to the region of the packet. Smaller packets may have fewer fragments. This costs another 16 bytes of memory per reassembly and divides the worst case for searching by 8. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17734	2018-11-02 15:32:04 +00:00
kp	679cd03c3a	pf: Split the fragment reassembly queue into smaller parts Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17733	2018-11-02 15:26:51 +00:00
pkelsey	2e5630c90a	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
kp	7016cbb5d6	pf: Increase default hash table size Now that we (by default) limit the number of states to 100.000 it makes sense to also adjust the default size of the hash table. Based on the benchmarking results in https://github.com/ocochard/netbenches/blob/master/Atom_C2758_8Cores-Chelsio_T540-CR/pf-states_hashsize/results/fbsd12-head.r332390/README.md 128K entries offers a good compromise between performance and memory use. Users may still overrule this setting with the net.pf.states_hashsize and net.pf.source_nodes_hashsize loader(8) tunables.	2018-08-05 13:54:37 +00:00
kp	21b4f170cf	pf: Fix typo in r336221 Reported by: olivier@	2018-07-12 18:07:28 +00:00
kp	f5bc1a9c7b	pf: Increase default state table size The typical system now has a lot more memory than when pf was new, and is also expected to handle more connections. Increase the default size of the state table. Note that users can overrule this using 'set limit states' in pf.conf. From OpenBSD: The year is 2018. Mercury, Bowie, Cash, Motorola and DEC all left us. Just pf still has a default state table limit of 10000. Had! Now it's a tiny little bit more, 100k. lead guitar: me ok chorus: phessler theo claudio benno background school girl laughing: bob Obtained from: OpenBSD	2018-07-12 16:35:35 +00:00
will	5ce23703c1	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
will	af6017a22f	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
kp	fbe5f2b7e0	pf: Add missing include statement rmlocks require <sys/lock.h> as well as <sys/rmlock.h>. Unbreak mips build.	2018-05-30 12:40:37 +00:00
kp	de7905d658	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
kp	337a0778fc	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
kp	109a7b5eec	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
kp	fc599d4911	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
kp	affaad48ea	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
pfg	78a6b08618	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
kp	6f8b05d841	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep. Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
bz	876cb9e018	Update pf(4) and pflog(4) to survive basic VNET testing, which includes proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use. Reviewed by: kp Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6924	2016-06-23 21:34:38 +00:00
kp	b06d3a64e7	pf: Filter on and set vlan PCP values Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it. Reviewed by: allanjude, araujo Approved by: re (gjb) Obtained from: OpenBSD (mostly) Differential Revision: https://reviews.freebsd.org/D6786	2016-06-17 18:21:55 +00:00
glebius	306a6faf84	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
kp	3831c06c1b	pf: Fix compilation warning with gcc While fixing the PF_ANEQ() macro I messed up the parentheses, leading to compilation warnings with gcc. Spotted by: ian Pointy Hat: kp	2015-10-25 18:09:03 +00:00
kp	7d9c2c6af4	PF_ANEQ() macro will in most situations return TRUE comparing two identical IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32 bits (like for IPv6 packet). Those bits contain some garbage and as a result PF_ANEQ() wrongly returns TRUE. Fix: Check if packet is of AF_INET type and if it is then compare only first 32 bits of data. PR: 204005 Submitted by: Miłosz Kaniewski	2015-10-25 13:14:53 +00:00
kp	40bca2754d	pf: Fix TSO issues In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum. The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums. To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum. PR: 154428, 193579, 198868 Reviewed by: sbruno MFC after: 1 week Relnotes: yes Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D3779	2015-10-14 16:21:41 +00:00
kp	2a1a59d8e1	pf: Remove support for 'scrub fragment crop|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
gnn	5d97cb9c5e	Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET that we don't touch random bits of memory. Differential Revision: https://reviews.freebsd.org/D2291	2015-04-15 14:46:45 +00:00
glebius	d0d9f03f17	Always lock the hash row of a source node when updating its 'states' counter. PR: 182401 Sponsored by: Nginx, Inc.	2015-03-17 12:19:28 +00:00
glebius	534401756a	- Improve INET/INET6 scope. - style(9) declarations. - Make couple of local functions static.	2015-02-16 23:50:53 +00:00
glebius	16f1b2f354	Toss declarations to fix regular build and NO_INET6 build.	2015-02-16 21:52:28 +00:00
glebius	2b3895f345	Commit a miss from r278843. Pointy hat to: glebius	2015-02-16 18:33:33 +00:00
brd	0ca7a023e1	Fix build. Approved by: gibbs	2015-02-16 18:06:24 +00:00
glebius	3bef17fedf	Missed from r278831.	2015-02-16 06:02:46 +00:00
glebius	12e7b30255	Back out r276841, r276756, r276747, r276746. The change in r276747 is very very questionable, since it makes vimages more dependent on each other. But the reason for the backout is that it screwed up shutting down the pf purge threads, and now kernel immediately panics on pf module unload. Although module unloading isn't an advertised feature of pf, it is very important for development process. I'd like to not backout r276746, since in general it is good. But since it has introduced numerous build breakages, that later were addressed in r276841, r276756, r276747, I need to back it out as well. Better replay it in clean fashion from scratch.	2015-01-22 01:23:16 +00:00
rodrigc	89bede2eff	Reapply previous patch to fix build. PR: 194515	2015-01-06 16:47:02 +00:00
rodrigc	58319f89ed	Merge: r258322 from projects/pf branch Split functions that initialize various pf parts into their vimage parts and global parts. Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT(). PR: 194515 Differential Revision: D1309 Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com> Reviewed by: trociny, zec, gnn	2015-01-06 08:39:06 +00:00
glebius	7d0b571895	- Count global pf(4) statistics in counter(9). - Do not count global number of states and of src_nodes, use uma_zone_get_cur() to obtain values. - Struct pf_status becomes merely an ioctl API structure, and moves to netpfil/pf/pf.h with its constants. - V_pf_status is now of type struct pf_kstatus. Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-14 18:57:46 +00:00
mm	532d55ab5f	Backport from projects/pf r263908: De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. MFC after: 1 week	2014-04-20 09:17:48 +00:00
mm	c4f653f608	Merge from projects/pf r251993 (glebius@): De-vnet hash sizes and hash masks. Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny MFC after: 1 month	2014-03-25 06:55:53 +00:00
glebius	c23c087e5b	Instead of playing games with casts simply add 3 more members to the structure pf_rule, that are used when the structure is passed via ioctl(). PR: 187074	2014-03-05 00:40:03 +00:00
glebius	1ea1d562a3	Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. Thanks to Dennis for providing me shell access to problematic box and his help with reproducing, debugging and investigating the problem. Thanks to: Dennis Yusupoff <dyr smartspb.net> Also reported by: dumbbell, pgj, Rambler Sponsored by: Nginx, Inc.	2014-02-14 10:05:21 +00:00
glebius	217b478e1f	Revert accidentally leaked changes in r261627.	2014-02-08 09:57:52 +00:00
glebius	02f3acc9c1	Remove never set flag FL_OVERWRITE. The only place where it was checked led to lock/critnest leak.	2014-02-08 09:56:26 +00:00
glebius	c884926273	To support upcoming changes change internal API for source node handling: - Removed pf_remove_src_node(). - Introduce pf_unlink_src_node() and pf_unlink_src_node_locked(). These function do not proceed with freeing of a node, just disconnect it from storage. - New function pf_free_src_nodes() works on a list of previously disconnected nodes and frees them. - Utilize new API in pf_purge_expired_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:16:34 +00:00
glebius	2d853e5460	Add missing 'extern'.	2013-11-22 19:02:22 +00:00
glebius	e352aa585e	Move new pf includes to the pf directory. The pfvar.h remain in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz	2013-10-27 16:25:57 +00:00
glebius	4fe4e9732a	Start splitting pfvar.h into internal and external parts. - Provide pf_altq.h that has only stuff needed for ALTQ. - Start pf.h, that would have all constant values and eventually non-kernel structures. - Build ALTQ w/o pfvar.h, include if_var.h, that before came via pollution. - Build tcpdump w/o pfvar.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:59:58 +00:00
glebius	9062851653	Utilize Jenkins hash with random seed for source nodes storage.	2012-09-20 06:52:05 +00:00
glebius	439d708ae8	Add missing break. Pointy hat to: glebius	2012-09-20 03:09:58 +00:00
glebius	63628d08be	Fix build, pass the pointy hat please.	2012-09-18 12:21:32 +00:00
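
The address-comparison fix described in commits 7d9c2c6af4 and 5d97cb9c5e above can be illustrated with a minimal userland sketch. It is not the kernel's PF_ANEQ() macro: the `struct addr128` type and the `addr_neq()` function are hypothetical names invented for this example, and treating an unknown address family as "different" is an assumption, not behaviour confirmed by the log.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-in for an address stored as four 32-bit words. */
struct addr128 {
	uint32_t addr32[4];
};

/*
 * Return nonzero when a and b differ.
 * For AF_INET only the first 32-bit word is meaningful, so the other
 * three words (which may hold garbage) are never compared.
 */
static int
addr_neq(const struct addr128 *a, const struct addr128 *b, int af)
{
	switch (af) {
	case AF_INET:
		return (a->addr32[0] != b->addr32[0]);
	case AF_INET6:
		return (memcmp(a->addr32, b->addr32, sizeof(a->addr32)) != 0);
	default:
		/* Assumption: unknown family reads no memory, reports "different". */
		return (1);
	}
}
```

Two IPv4 addresses that match in the first word compare as equal here even when the trailing words differ, which is the case the original macro got wrong.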

53 Commits