freebsd-dev

Author	SHA1	Message	Date
Andrey V. Elsukov	f7c4fdee1a	Add "record-state", "set-limit" and "defer-action" rule options to ipfw. "record-state" is similar to "keep-state", but it doesn't produce an implicit O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the same feature as "record-state": it is a single opcode without an implicit O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic states. When a rule with this opcode is matched, the rule's action will not be executed; instead, a dynamic state will be created. When this state is later matched by "check-state", the rule's action will be executed. This allows creating more complicated rulesets. Submitted by: lev MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D1776	2018-07-09 11:35:18 +00:00
Andrew Turner	2bf9501287	Create a new macro for static DPCPU data. On arm64 (and possibly other architectures) we are unable to use static DPCPU data in kernel modules. This is because the compiler will generate PC-relative accesses, however the runtime-linker expects to be able to relocate these. In preparation for fixing this, create two macros, depending on whether the data is global or static. Reviewed by: bz, emaste, markj Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D16140	2018-07-05 17:13:37 +00:00
Will Andrews	cc535c95ca	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Kristof Provost	de210decd1	pfsync: Fix state sync during initial bulk update States learned via pfsync from a peer with the same ruleset checksum were not getting assigned to rules like they should because pfsync_in_upd() wasn't passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import. PR: 229092 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Obtained from: OpenBSD MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-30 12:51:08 +00:00
Kristof Provost	150182e309	pf: Support "return" statements in passing rules when they fail. Normally pf rules are expected to do one of two things: pass the traffic or block it. Blocking can be silent ("drop") or loud ("return", "return-rst", "return-icmp"). Yet there is a third category of traffic passing through pf: packets that match a "pass" rule but for which applying the rule fails. This happens when the redirection table is empty or when src node or state creation fails. Such rules always fail silently without notifying the sender. Allow users to configure this behaviour too, so that pf returns an error packet in these cases. PR: 226850 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-22 21:59:30 +00:00
Andrey V. Elsukov	20efcfc602	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Rmlock fits such workloads better. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
Kristof Provost	0b799353d8	pf: Fix deadlock with route-to If a locally generated packet is routed (with route-to/reply-to/dup-to) out of a different interface it's passed through the firewall again. This meant we lost the inp pointer and if we required the pointer (e.g. for user ID matching) we'd deadlock trying to acquire an inp lock we've already got. Pass the inp pointer along with pf_route()/pf_route6(). PR: 228782 MFC after: 1 week	2018-06-09 14:17:06 +00:00
Mateusz Guzik	4e180881ae	uma: implement provisional api for per-cpu zones Per-cpu zone allocations are very rarely done compared to regular zones. The intent is to avoid pessimizing the latter case with per-cpu specific code. In particular, contrary to the claim in r334824, M_ZERO is sometimes being used for such zones. But the zeroing method is completely different and branching on it in the fast path for regular zones is a waste of time.	2018-06-08 21:40:03 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Andrey V. Elsukov	67ad3c0bf9	Restore the ability to keep states after parent rule deletion. This feature is disabled by default and was removed when the dynamic states implementation changed to be lockless. Now it is reimplemented with small differences: when the dyn_keep_states sysctl variable is enabled, the dyn_match_ipv[46]_state() function doesn't match child states of a deleted rule, and thus they are kept alive until expired. The ipfw_dyn_lookup_state() function checks whether the state was orphaned, and if so, it returns a pointer to default_rule and its position in the rules map. The main visible difference is that orphaned states still have the same rule number that they had before the parent rule was deleted, because a state now has many fields related to the rule, and changing them all atomically to point to default_rule seems hard enough. Reported by: <lantw44 at gmail.com> MFC after: 2 days	2018-05-22 13:28:05 +00:00
Andrey V. Elsukov	4bb8a5b0c9	Remove check for matching the rulenum, ruleid and rule pointer from dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not ready to be committed code, and they are there by accident. Due to the race these checks can lead to creating of duplicate states when concurrent threads in the same time will try to add state for two packets of the same flow, but in reverse directions and matched by different parent rules. Reported by: lev MFC after: 3 days	2018-05-21 16:19:00 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Andrey V. Elsukov	782360dec3	Bring in some last changes in NAT64 implementation: o Modify ipfw(8) to be able to set any prefix6, not just Well-Known, and also show the configured prefix6; o relocate some definitions and macros into their proper place; o convert nat64_debug and nat64_allow_private variables to be VNET-compatible; o add struct nat64_config that keeps generic configuration needed by NAT64 code; o add nat64_check_prefix6() function to check the validity of the user-specified IPv6 prefix according to RFC6052; o use nat64_check_private_ip4() and nat64_embed_ip4() functions instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows using any configured IPv6 prefixes that are allowed by RFC6052; o introduce NAT64_WKPFX flag, which is set when the IPv6 prefix is the Well-Known IPv6 prefix. It is used to reduce the overhead of checking this; o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config structure, and modify the rest of the code accordingly; o remove now unused ro argument from nat64_output() function; o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions; o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile as example. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2018-05-09 11:59:24 +00:00
Sean Bruno	2695c9c109	Retire ixgb(4) This driver was for an early and uncommon legacy PCI 10GbE for a single ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family. Submitted by: kbowling Reviewed by: brooks imp jeffrey.e.pieper@intel.com Relnotes: yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15234	2018-05-02 15:59:15 +00:00
Andrey V. Elsukov	5f69d0a4ff	To avoid possible deadlock do not acquire JQUEUE_LOCK before callout_drain. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-13 10:03:30 +00:00
Andrey V. Elsukov	2d8fcffb99	Fix integer types mismatch for flags field in nat64stl_cfg structure. Also preserve internal flags on NAT64STL reconfiguration. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-12 21:29:40 +00:00
Andrey V. Elsukov	eed302572a	Use cfg->nomatch_verdict as return value from NAT64LSN handler when given mbuf is considered as not matched. If mbuf was consumed or freed during handling, we must return IP_FW_DENY, since ipfw's pfil handler ipfw_check_packet() expects IP_FW_DENY when mbuf pointer is NULL. This fixes KASSERT panics when NAT64 is used with INVARIANTS. Also remove unused nomatch_final field from struct nat64lsn_cfg. Reported by: Justin Holcomb <justin at justinholcomb dot me> Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-12 21:13:30 +00:00
Andrey V. Elsukov	c570565f12	Migrate NAT64 to FIB KPI. Obtained from: Yandex LLC MFC after: 1 week	2018-04-12 21:05:20 +00:00
Kristof Provost	c41420d5dc	pf: limit ioctl to a reasonable and tuneable number of elements pf ioctls frequently take a variable number of elements as argument. This can potentially allow users to request very large allocations. These will fail, but even a failing M_NOWAIT might tie up resources and result in concurrent M_WAITOK allocations entering vm_wait and inducing reclamation of caches. Limit these ioctls to what should be a reasonable value, but allow users to tune it should they need to. Differential Revision: https://reviews.freebsd.org/D15018	2018-04-11 11:43:12 +00:00
Oleg Bulyzhin	3995ad1768	Fix ipfw table creation when net.inet.ip.fw.tables_sets = 0 and non zero set specified on table creation. This fixes following: # sysctl net.inet.ip.fw.tables_sets net.inet.ip.fw.tables_sets: 0 # ipfw table all info # ipfw set 1 table 1 create type addr # ipfw set 1 table 1 create type addr # ipfw add 10 set 1 count ip from table\(1\) to any 00010 count ip from table(1) to any # ipfw add 10 set 1 count ip from table\(1\) to any 00010 count ip from table(1) to any # ipfw table all info --- table(1), set(1) --- kindex: 4, type: addr references: 1, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 3, type: addr references: 1, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 2, type: addr references: 0, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 1, type: addr references: 0, valtype: legacy algorithm: addr:radix items: 0, size: 296 # MFC after: 1 week	2018-04-11 11:12:20 +00:00
Kristof Provost	1a125a2f7f	pf: Improve ioctl validation Ensure that multiplications for memory allocations cannot overflow, and that we'll not try to allocate M_WAITOK for potentially overly large allocations. MFC after: 1 week	2018-04-06 19:36:35 +00:00
Kristof Provost	02214ac854	pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. There's no obvious limit to the request size for these, so we limit the requests to something which won't overflow. Change the memory allocation to M_NOWAIT so excessive requests will fail rather than stall forever. MFC after: 1 week	2018-04-06 19:20:45 +00:00
Kristof Provost	adfe2f6aff	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
Kristof Provost	8748b499c1	pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of tables at a time, and as such try to allocate <number of tables> * sizeof(struct pfr_table). This multiplication can overflow. Thanks to mallocarray() this is not exploitable, but an overflow does panic the system. Arbitrarily limit this to 65535 tables. pfctl only ever processes one table at a time, so it presents no issues there. MFC after: 1 week	2018-04-06 15:01:45 +00:00
Brooks Davis	541d96aaaf	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command definitions are required as struct ifreq is the same size). This is believed to be sufficient to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
Kristof Provost	effaab8861	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
Kristof Provost	b4b8fa3387	pf: Fix memory leak in DIOCRADDTABLES If a user attempts to add two tables with the same name the duplicate table will not be added, but we forgot to free the duplicate table, leaking memory. Ensure we free the duplicate table in the error path. Reported by: Coverity CID: 1382111 MFC after: 3 weeks	2018-03-19 21:13:25 +00:00
Andrey V. Elsukov	12c080e613	Do not try to reassemble IPv6 fragments in "reass" rule. ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets that it gets. Until proper IPv6 fragments handling function will be implemented, pass IPv6 packets to next rule. PR: 170604 MFC after: 1 week	2018-03-12 09:40:46 +00:00
Kristof Provost	bf56a3fe47	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
Andrey V. Elsukov	99493f5a4a	Remove duplicate #include <netinet/ip_var.h>.	2018-02-07 19:12:05 +00:00
Andrey V. Elsukov	b99a682320	Rework ipfw dynamic states implementation to be lockless on fast path. o added struct ipfw_dyn_info that keeps all information needed by ipfw_chk and by the dynamic states implementation; o added DYN_LOOKUP_NEEDED() macro that can be used to determine the need of new lookup of dynamic states; o ipfw_dyn_rule now becomes obsolete. Currently it is used to pass information from kernel to userland only. o IPv4 and IPv6 states are now described by different structures dyn_ipv4_state and dyn_ipv6_state; o IPv6 scope zones support is added; o ipfw(4) now depends on Concurrency Kit; o states are linked with "entry" field using CK_SLIST. This allows lockless lookup and mutex-protected modifications. o the "expired" SLIST field is used for states expiring. o struct dyn_data is used to keep generic information for both IPv4 and IPv6; o struct dyn_parent is used to keep O_LIMIT_PARENT information; o IPv4 and IPv6 states are stored in different hash tables; o O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states; o per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads; o mutexes to protect modification of lists in hash tables are now kept in separate arrays. The 65535 limit on the maximum number of hash buckets is now removed. o Separate lookup and install functions added for IPv4 and IPv6 states and for parent states. o The Jenkins hash function is now used by default. Obtained from: Yandex LLC MFC after: 42 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D12685	2018-02-07 18:59:54 +00:00
Kristof Provost	c201b5644d	pf: Avoid warning without INVARIANTS When INVARIANTS is not set the 'last' variable is not used, which can generate compiler warnings. If this invariant is ever violated it'd result in a KASSERT failure in refcount_release(), so this one is not strictly required.	2018-02-01 07:52:06 +00:00
Andrey V. Elsukov	14a6bab1da	When IPv6 packet is handled by O_REJECT opcode, convert ICMP code specified in the arg1 into ICMPv6 destination unreachable code according to RFC7915. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2018-01-24 12:40:28 +00:00
Kristof Provost	6701c43213	pf: States have at least two references pf_unlink_state() releases a reference to the state without checking if this is the last reference. It can't be, because pf_state_insert() initialises it to two. KASSERT() that this is always the case. CID: 1347140	2018-01-24 04:29:16 +00:00
Pedro F. Giffuni	d821d36419	Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks	2018-01-22 02:08:10 +00:00
Andrey V. Elsukov	d38344208e	Add UDPLite support to ipfw(4). Now it is possible to use UDPLite's port numbers in rules, create dynamic states for UDPLite packets and see "UDPLite" for matched packets in log. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2018-01-19 12:50:03 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Clean up some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Pedro F. Giffuni	454529cd0b	netpfil/ipfw: Make some use of mallocarray(9). Reviewed by: kp, ae Differential Revision: https://reviews.freebsd.org/D13834	2018-01-11 15:29:29 +00:00
Kristof Provost	6273ba66f2	pf: Avoid integer overflow issues by using mallocarray() instead of malloc() pfioctl() handles several ioctls that take variable length input, these include: - DIOCRADDTABLES - DIOCRDELTABLES - DIOCRGETTABLES - DIOCRGETTSTATS - DIOCRCLRTSTATS - DIOCRSETTFLAGS All of them take a pfioc_table struct as input from userland. One of its elements (pfrio_size) is used in a buffer length calculation. The calculation contains an integer overflow which, if triggered, can lead to out-of-bounds reads and writes later on. Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>	2018-01-07 13:35:15 +00:00
Kristof Provost	9d671fee3a	pf: Allow the module to be unloaded pf can now be safely unloaded. Most of this code is exercised on vnet jail shutdown. Don't block unloading.	2017-12-31 16:18:13 +00:00
Kristof Provost	5d0020d6d7	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
Pedro F. Giffuni	6e778a7efd	SPDX: license IDs for some ISC-related files.	2017-12-08 15:57:29 +00:00
Pedro F. Giffuni	8820ecc040	SPDX: Fix some cases wrongly attributed to MIT. In the cases of BSD-style license variants without clauses, use 0BSD for the time being in lack of a better description.	2017-11-30 15:10:11 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Michael Tuexen	665c8a2ee5	Add to ipfw support for sending an SCTP packet containing an ABORT chunk. This is similar to the TCP case, where a TCP RST segment can be sent. There is one limitation: When sending an ABORT in response to an incoming packet, it should be tested if there is no ABORT chunk in the received packet. Currently, it is only checked if the first chunk is an ABORT chunk to avoid parsing the whole packet, which could result in a DOS attack. Thanks to Timo Voelker for helping me to test this patch. Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part) Differential Revision: https://reviews.freebsd.org/D13239	2017-11-26 18:19:01 +00:00
Andrey V. Elsukov	1719df1bb4	Modify ipfw's dynamic states KPI. Hide the locking logic used in the dynamic states implementation from generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule() function to have similar names: ipfw_dyn_install_state() and ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the ipfw_dyn_lookup_state() function. Now this function return NULL when there is no state and pointer to the parent rule when state is found. Thus now there is no need to return pointer to dynamic rule, and no need to hold bucket lock for this state. Remove ipfw_dyn_unlock() function. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D11657	2017-11-23 08:02:02 +00:00
Andrey V. Elsukov	9d15540022	Check that address family of state matches address family of packet. If it is not matched avoid comparing other state fields. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 07:05:25 +00:00
Andrey V. Elsukov	30df59d581	Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c. It is not specific for dynamic states function and called also from generic code. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 06:04:57 +00:00
Andrey V. Elsukov	288bf455bb	Rework rule ranges matching. Use comparison rule id with UINT32_MAX to match all rules with the same rule number. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 05:55:53 +00:00
Andrey V. Elsukov	7143bb7626	Add ipfw_add_protected_rule() function that creates rule with 65535 number in the reserved set 31. Use this function to create default rule. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-22 05:49:21 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Andrey V. Elsukov	66f84fabb3	Add comment for accidentally committed unrelated change in r325960. Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects a packet is IPv4. And in case when it is IPv6, it just translates them as IPv4. This leads to corruption and in some cases to panics. In particular a panic can happen when value of ip6_plen modified to something that leads to IP fragmentation, but actual packet length does not match the IP length. Packets that are not IPv4 will be dropped by NAT rule. Reported by: Viktor Dukhovni <freebsd at dukhovni dot org> MFC after: 1 week	2017-11-17 23:25:06 +00:00
Andrey V. Elsukov	e11f0a0c4c	Unconditionally enable support for O_IPSEC opcode. IPsec support can be loaded as a kernel module, thus do not depend on kernel option IPSEC and always build the O_IPSEC opcode implementation as enabled. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-17 22:40:02 +00:00
Don Lewis	4001fcbe0a	Fix Dummynet AQM packet marking function ecn_mark() and fq_codel / fq_pie schedulers packet classification functions in layer2 (bridge mode). Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie schedulers packet classification functions (fq_codel_classify_flow() and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP) packet. However, this assumption is incorrect if ipfw/dummynet is used to manage layer2 traffic (bridge mode) since mbuf will point at L2 frame. This patch solves this problem by identifying the source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN offset when converting an mbuf pointer to ip pointer if the traffic is from layer2. More specifically, in dummynet packet tagging function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the traffic is from layer2 and set to zero otherwise. Whenever an access to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is used instead of mtod(m, struct ip *) to correctly convert mbuf pointer to ip pointer in both L2 and L3 traffic. Submitted by: lstewart MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12506	2017-10-26 10:11:35 +00:00
Andrey V. Elsukov	5c70ebfa57	Add IPv6 support for O_TCPDATALEN opcode. PR: 222746 MFC after: 1 week	2017-10-24 08:39:05 +00:00
Andrey V. Elsukov	ff0a137952	Fix regression in handling O_FORWARD_IP opcode after r279948. To properly handle 'fwd tablearg,port' opcode, copy sin_port value from sockaddr_in structure stored in the opcode into corresponding hopstore field. PR: 222953 MFC after: 1 week	2017-10-13 11:11:53 +00:00
Michael Tuexen	945906384d	Fix a bug that prevented rules matching port numbers for SCTP packets from actually being matched. While there, make clear in the man page that SCTP port numbers are supported in rules. MFC after: 1 month	2017-10-02 18:25:30 +00:00
Andrey V. Elsukov	5df8171da3	Use in_localip() function instead of unlocked access to addresses hash to determine that an address is our local. PR: 220078 MFC after: 1 week	2017-09-20 22:35:28 +00:00
Andrey V. Elsukov	369bc48dc5	Do not acquire IPFW_WLOCK when a named object is created and destroyed. Acquiring IPFW_WLOCK is required for cases when we are going to change some data that can be accessed during processing of the packet flow. When we create a new named object, there are not yet any rules that reference it, thus holding IPFW_UH_WLOCK is enough to safely update the needed structures. When we destroy an object, we do this only when its reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK, because no one references it. Another case is when we failed to finish some action and thus we are doing a rollback and destroying an object; in this case it is still not referenced by rules and there is no need to acquire IPFW_WLOCK. This also fixes a panic with INVARIANTS due to recursive IPFW_WLOCK acquiring. MFC after: 1 week Sponsored by: Yandex LLC	2017-09-20 22:00:06 +00:00
Kristof Provost	7f3ad01804	pf_get_sport(): Prevent possible endless loop when searching for an unused nat port This is an import of Alexander Bluhm's OpenBSD commit r1.60, the first chunk had to be modified because on OpenBSD the 'cut' declaration is located elsewhere. Upstream report by Jingmin Zhou: https://marc.info/?l=openbsd-pf&m=150020133510896&w=2 OpenBSD commit message: Use a 32 bit variable to detect integer overflow when searching for an unused nat port. Prevents a possible endless loop if high port is 65535 or low port is 0. report and analysis Jingmin Zhou; OK sashan@ visa@ Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c PR: 221201 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: OpenBSD via ElectroBSD MFC after: 1 week	2017-08-08 21:09:26 +00:00
Luiz Otavio O Souza	9ffd0f54a7	Fix a couple of typos in a comment. MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate)	2017-07-21 03:04:55 +00:00
Philip Paeps	b0e1660d53	Fix GRE over IPv6 tunnels with IPFW Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless net.inet6.ip6.fw.deny_unknown_exthdrs was unset). PR: 220640 Submitted by: Kun Xie <kxie@xiplink.com> MFC after: 1 week	2017-07-13 09:01:22 +00:00
Kristof Provost	b7ae43552b	pf: Fix vnet purging pf_purge_thread() breaks up the work of iterating all states (in pf_purge_expired_states()) and tracks progress in the idx variable. If multiple vnets exist this results in pf_purge_thread() only calling pf_purge_expired_states() for part of the states (the first part of the first vnet, second part of the second vnet and so on). Combined with the mark-and-sweep approach to cleaning up old rules (in V_pf_unlinked_rules) that resulted in pf freeing rules that were still referenced by states. This in turn caused panics when pf_state_expires() encounters that state and attempts to access the rule. We need to track the progress per vnet, not globally, so idx is moved into a per-vnet V_pf_purge_idx. PR: 219251 Sponsored by: Hackathon Essen 2017	2017-07-09 17:56:39 +00:00
Andrey V. Elsukov	785c0d4d97	Fix IPv6 extension header parsing. The length field doesn't include the first 8 octets. Obtained from: Yandex LLC MFC after: 3 days	2017-06-29 19:06:43 +00:00
Don Lewis	d196c9ee16	Fix the queue delay estimation in PIE/FQ-PIE when the timestamp (TS) method is used. When packet timestamp is used, the "current_qdelay" keeps storing the last queue delay value calculated in the dequeue function. Therefore, when a burst of packets arrives followed by a pause, the "current_qdelay" will store a high value caused by the burst and stick to that value during the pause because the queue delay measurement is done inside the dequeue function. This causes the drop probability calculation function to calculate high drop probability value instead of zero and prevents the burst allowance mechanism from working properly. Fix this problem by resetting "current_qdelay" inside the drop probability calculation function when the queue length is zero and TS option is used. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-05-19 08:38:03 +00:00
Don Lewis	36fb8be630	The result of right shifting a negative signed value is implementation defined. On machines without arithmetic shift instructions, zero bits may be shifted in from the left, giving a large positive result instead of the desired divide-by power-of-2. Fix this by operating on the absolute value and compensating for the possible negation later. Reverse the order of the underflow/overflow tests and the exponential decay calculation to avoid the possibility of an erroneous overflow detection if p is a sufficiently small non-negative value. Also check for negative values of prob before doing the exponential decay to avoid another instance of right shifting a negative value. Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-05-19 01:23:06 +00:00
Kristof Provost	468cefa22e	pf: Fix vnet initialisation When running the vnet init code (pf_load_vnet()) we used to iterate over all vnets, marking them as unhooked. This is incorrect and leads to panics if pf is unloaded, as the unload code does not unregister the pfil hooks (because the vnet is marked as unhooked). There's no need or reason to touch other vnets during initialisation. Their pf_load_vnet() function will be triggered, which handles all required initialisation. Reviewed by: zec, gnn Differential Revision: https://reviews.freebsd.org/D10592	2017-05-07 14:33:58 +00:00
Kristof Provost	64c79ee733	pf: Fix panic on unload vnet_pf_uninit() is called through vnet_deregister_sysuninit() and linker_file_unload() when the pf module is unloaded. This is executed after pf_unload() so we end up trying to take locks which have been destroyed already. Move pf_unload() to a separate SYSUNINIT() to ensure it's called after all the vnet_pf_uninit() calls. Differential Revision: https://reviews.freebsd.org/D10025	2017-05-03 20:56:54 +00:00
Marko Zec	1e9e374199	Fix VNET leakages in PF by V_irtualizing pfr_ktables and friends. Apparently this resolves a PF-triggered panic when destroying VNET jails. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:34:39 +00:00
Marko Zec	3a36ee404f	Since curvnet is already properly set on entry to event handlers, there's no need to override it, particularly not unconditionally with vnet0. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:30:28 +00:00
Kristof Provost	00eab743ab	pf: Fix possible incorrect IPv6 fragmentation When forwarding pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8 byte and 12 byte fragment pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days	2017-04-20 09:05:53 +00:00
Kristof Provost	4e261006a1	pf: Also clear limit counters The "pfctl -F info" command didn't clear the limit counters (as shown in the "pfctl -vsi" output). Submitted by: Max <maximos@als.nnov.ru>	2017-04-18 20:07:21 +00:00
Andrey V. Elsukov	da62ffd9cd	Avoid undefined behavior. The 'pktid' variable is modified while being used twice between sequence points, probably because htonl() is a macro. Reported by: PVS-Studio MFC after: 1 week	2017-04-14 11:58:41 +00:00
Andrey V. Elsukov	ba3e1361b0	Use address of specific union member instead of whole union address to fix PVS-Studio warnings. MFC after: 1 week	2017-04-14 11:41:09 +00:00
Andrey V. Elsukov	1ca7c3b815	The rule field in the ipfw_dyn_rule structure is used as storage to pass rule number and rule set to userland. In r272840 the kernel internal rule representation was changed and the rulenum field of struct ip_fw_rule got the type uint32_t, but the userlevel representation still has the type uint16_t. To not overflow the size of a pointer on systems with 32-bit pointer size, use a separate variable to copy rulenum and set. Reported by: PVS-Studio MFC after: 1 week	2017-04-14 11:19:09 +00:00
Gleb Smirnoff	9f5efe718f	Fix potential NULL deref. Found by: PVS Studio	2017-04-14 01:56:15 +00:00
Maxim Konovalov	f91eb6adad	o Redundant assignments removed. Found by: PVS-Studio, V519 Reviewed by: ae	2017-04-13 18:13:10 +00:00
Conrad Meyer	bcd8d3b805	dummynet: Use strlcpy to appease static checkers Some dummynet modules used strcpy() to copy from a larger buffer (dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that the lengths of the strings in the dn_aqm buffers were always hardcoded to be smaller than the dn_extra_parms buffer ("CODEL", "PIE"). Use strlcpy() instead, to appease static checkers. No functional change. Reported by: Coverity CIDs: 1356163, 1356165 Sponsored by: Dell EMC Isilon	2017-04-13 17:47:44 +00:00
Andrey V. Elsukov	88d950a650	Remove "IPFW static rules" rmlock. Make PFIL's lock global and use it for this purpose. This reduces the number of locks needed to acquire for each packet. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC No objection from: #network Differential Revision: https://reviews.freebsd.org/D10154	2017-04-03 13:35:04 +00:00
Andrey V. Elsukov	aac74aeac7	Add ipfw_pmod kernel module. The module is designed for modification of packets of any protocol. For now it implements only TCP MSS modification. It adds the external action handler for the "tcp-setmss" action. A rule with the tcp-setmss action does an additional check for protocol and TCP flags. If the SYN flag is present, it parses TCP options and modifies the MSS option if its value is greater than the value configured in the rule. Then it adjusts the TCP checksum if needed. After handling, the search continues with the next rule. Obtained from: Yandex LLC MFC after: 2 weeks Relnotes: yes Sponsored by: Yandex LLC No objection from: #network Differential Revision: https://reviews.freebsd.org/D10150	2017-04-03 03:07:48 +00:00
Andrey V. Elsukov	11c56650f0	Add O_EXTERNAL_DATA opcode support. This opcode can be used to attach some data to an external action opcode. And unlike the O_EXTERNAL_INSTANCE opcode, this opcode does not require creating a named instance to pass configuration arguments to the external action handler. The data comes right after the O_EXTERNAL_ACTION opcode. The userlevel part currently supports formatting for an opcode with ipfw_insn size; by default it expects a u16 numeric value in arg1. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:44:40 +00:00
Andrey V. Elsukov	399ad57874	Add the log formatting for an external action opcode. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:26:30 +00:00
Kristof Provost	3601d25181	pf: Fix leak of pf_state_keys If we hit the state limit we returned from pf_create_state() without cleaning up. PR: 217997 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2017-04-01 12:22:34 +00:00
Andrey V. Elsukov	788e62864f	Reset the cached state of the last lookup in the dynamic states when an external action is completed, but the rule search is continued. An external action handler can change the content of the @args argument, which is used for dynamic state lookup. Enforce a new lookup to be able to install a new state when the search is continued. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-03-31 09:26:08 +00:00
Kristof Provost	2f8fb3a868	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep. Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
Kristof Provost	08ef4ddb0f	pf: Fix rule evaluation after inet6 route-to In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out of a different interface. pf_test6() needs to know that the packet was forwarded (in case it needs to refragment so it knows whether to call ip6_output() or ip6_forward()). This led pf_test6() to try to evaluate rules against the PF_FWD direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT. Once fwdir is set correctly the correct output/forward function will be called. PR: 217883 Submitted by: Kajetan Staszkiewicz MFC after: 1 week Sponsored by: InnoGames GmbH	2017-03-19 03:06:09 +00:00
Don Lewis	46c8aadb6f	Change several constants used by the PIE algorithm from unsigned to signed. - PIE_MAX_PROB is compared to a variable of type int64_t and the type promotion rules can cause the value of that variable to be treated as unsigned. If the value is actually negative, then the result of the comparison is incorrect, causing the algorithm to perform poorly in some situations. Changing the constant to be signed causes the comparison to work correctly. - PIE_SCALE is also compared to signed values. Fortunately they are also compared to zero and negative values are discarded so this is more of a cosmetic fix. - PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small enough that the automatic promotion to unsigned is harmless. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-03-18 23:00:13 +00:00
Kristof Provost	5c172e7059	pf: Fix memory leak on vnet shutdown or unload Rules are unlinked in shutdown_pf(), so we must call pf_unload_vnet_purge(), which frees unlinked rules, after that, not before. Reviewed by: eri, bz Differential Revision: https://reviews.freebsd.org/D10040	2017-03-18 01:37:20 +00:00
Andrey V. Elsukov	3667f39ea3	Use memset with structure size.	2017-03-14 07:57:33 +00:00
Conrad Meyer	49b6a5d60a	nat64lsn: Use memset() with structure, not pointer, size PR: 217738 Submitted by: Svyatoslav <razmyslov at viva64.com> Sponsored by: Viva64 (PVS-Studio)	2017-03-13 17:53:46 +00:00
Kristof Provost	2a57d24bd1	pf: Fix incorrect rw_sleep() in pf_unload() When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep() with it, because it would release a lock we do not hold. There's no need for the lock either, so we can just tsleep(). While here also make the same change in pf_purge_thread(), because it explicitly takes the lock before rw_sleep() and then immediately releases it afterwards.	2017-03-12 05:42:57 +00:00
Kristof Provost	f618201314	pf: Do not lose the VNET lock when ending the purge thread When the pf_purge_thread() exits it must make sure to release the VNET_LIST_RLOCK it still holds. kproc_exit() does not return.	2017-03-12 05:00:04 +00:00
Maxim Konovalov	f621c2cd39	o Typo in the comment fixed. PR: 217617 Submitted by: lutz	2017-03-09 09:54:23 +00:00
Kristof Provost	98a9874f7b	pf: Fix a crash in low-memory situations If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's no more memory for it) it frees skp. This is wrong, because skp is a pf_state_key **, so we need to free *skp, as is done later in the function. Getting it wrong means we try to free a stack variable of the calling pf_test_rule() function, and we panic.	2017-03-06 23:41:23 +00:00
Andrey V. Elsukov	53de37f8ca	Fix the build. Use new ipfw_lookup_table() in the nat64 too. Reported by: cy MFC after: 2 weeks	2017-03-06 00:41:59 +00:00
Andrey V. Elsukov	54e5669d8c	Add IPv6 support to O_IP_DST_LOOKUP opcode. o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of ipfw_insn_u32; o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and remove old ipfw_lookup_table(); o use args->f_id.flow_id6 that is in host byte order to get DSCP value; o add SCTP ports support to 'lookup src/dst-port' opcode; o add IPv6 support to 'lookup src/dst-ip' opcode. PR: 217292 Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9873	2017-03-05 23:48:24 +00:00
Andrey V. Elsukov	c750a56914	Reject invalid object types that can not be used with specific opcodes. When we are doing reference counting of named objects in the new rule, for existing objects check that the opcode references the correct object, otherwise return EINVAL. PR: 217391 MFC after: 1 week Sponsored by: Yandex LLC	2017-03-05 22:19:43 +00:00
Andrey V. Elsukov	43b294a4db	Fix matching table entry value. Use the real table value instead of its index in the valuestate array. When an opcode has size equal to ipfw_insn_u32, this means that it should additionally match the value specified in d[0] with the table entry value. ipfw_table_lookup() returns the table value index, use the TARG_VAL() macro to convert it to its value. The actual 32-bit value is stored in the tag field of the table_value structure, where all unspecified u32 values are kept. PR: 217262 Reviewed by: melifaro MFC after: 1 week Sponsored by: Yandex LLC	2017-03-03 20:22:42 +00:00
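
Commit 785c0d4d97 in the list above notes that the IPv6 extension header length field does not include the first 8 octets. A minimal sketch of that arithmetic follows, assuming the generic extension header layout of RFC 8200, where the length is counted in 8-octet units. The helper name ext_hdr_bytes() is hypothetical and is not taken from the FreeBSD sources; headers with their own length rules (for example the fixed-size fragment header) are out of scope here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the FreeBSD tree.
 * Converts the "Hdr Ext Len" field of a generic IPv6 extension
 * header into the total header size in bytes. The field counts
 * 8-octet units and excludes the first 8 octets, so one unit is
 * added back before scaling.
 */
static size_t
ext_hdr_bytes(uint8_t hdr_ext_len)
{
	return (((size_t)hdr_ext_len + 1) * 8);
}
```

With this helper a length field of 0 describes an 8-byte header, and a parser that used hdr_ext_len * 8 instead would skip 8 bytes too few on every such header.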

614 Commits