freebsd-skq

Author	SHA1	Message	Date
Andrey V. Elsukov	d18c1f26a4	Reapply r345274 with build fixes for 32-bit architectures. Update NAT64LSN implementation: o most of data structures and relations were modified to be able to support a large number of translation states. Now each supported protocol can use the full port range. Port groups now belong to IPv4 alias addresses, not hosts. Each port group can keep several state chunks. This is controlled with the new `states_chunks` config option. State chunks allow several translation states for a single alias address and port, but for different destination addresses. o by default all hash tables now use jenkins hash. o ConcurrencyKit and epoch(9) are used to make NAT64LSN lockless on the fast path. o one NAT64LSN instance can now be used to handle several IPv6 prefixes; the special prefix "::" value should be used for this purpose when the instance is created. o due to the modified internal data structure relations, the socket opcode that does states listing was changed. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-19 10:57:03 +00:00
Andrey V. Elsukov	d6369c2d18	Revert r345274. It appears that not all 32-bit architectures have necessary CK primitives.	2019-03-18 14:00:19 +00:00
Andrey V. Elsukov	d7a1cf06f3	Update NAT64LSN implementation: o most of data structures and relations were modified to be able to support a large number of translation states. Now each supported protocol can use the full port range. Port groups now belong to IPv4 alias addresses, not hosts. Each port group can keep several state chunks. This is controlled with the new `states_chunks` config option. State chunks allow several translation states for a single alias address and port, but for different destination addresses. o by default all hash tables now use jenkins hash. o ConcurrencyKit and epoch(9) are used to make NAT64LSN lockless on the fast path. o one NAT64LSN instance can now be used to handle several IPv6 prefixes; the special prefix "::" value should be used for this purpose when the instance is created. o due to the modified internal data structure relations, the socket opcode that does states listing was changed. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-18 12:59:08 +00:00
Andrey V. Elsukov	5c04f73e07	Add NAT64 CLAT implementation as defined in RFC6877. CLAT is customer-side translator that algorithmically translates 1:1 private IPv4 addresses to global IPv6 addresses, and vice versa. It is implemented as part of ipfw_nat64 kernel module. When module is loaded or compiled into the kernel, it registers "nat64clat" external action. External action named instance can be created using `create` command and then used in ipfw rules. The create command accepts two IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is omitted, IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used. # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in Obtained from: Yandex LLC Submitted by: Boris N. Lytochkin MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC	2019-03-18 11:44:53 +00:00
Andrey V. Elsukov	002cae78da	Add SPDX-License-Identifier and update year in copyright. MFC after: 1 month	2019-03-18 10:50:32 +00:00
Andrey V. Elsukov	b11efc1eb6	Modify struct nat64_config. Add second IPv6 prefix to generic config structure and rename other fields to conform to RFC6877. Now it contains two prefixes and length: PLAT is provider-side translator that translates N:1 global IPv6 addresses to global IPv4 addresses. CLAT is customer-side translator (XLAT) that algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses. Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn) translators. Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept prefix length and use plat_plen to specify prefix length. Retire net.inet.ip.fw.nat64_allow_private sysctl variable. Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to configure this ability separately for each NAT64 instance. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2019-03-18 10:39:14 +00:00
Kristof Provost	812483c46e	pf: Rename pfsync bucket lock Previously the main pfsync lock and the bucket locks shared the same name. This led to spurious warnings from WITNESS like this: acquiring duplicate lock of same type: "pfsync" 1st pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1402 2nd pfsync @ /usr/src/sys/netpfil/pf/if_pfsync.c:1429 It's perfectly okay to grab both the main pfsync lock and a bucket lock at the same time. We don't need different names for each bucket lock, because we should always only acquire a single one of those at a time. MFC after: 1 week	2019-03-16 10:14:03 +00:00
Kristof Provost	5904868691	pf: Use counter(9) in pf tables. The counters of pf tables are updated outside the rule lock. That means state updates might overwrite each other. Furthermore allocation and freeing of counters happens outside the lock as well. Use counter(9) for the counters, and always allocate the counter table element, so that the race condition cannot happen any more. PR: 230619 Submitted by: Kajetan Staszkiewicz <vegeta@tuxpowered.net> Reviewed by: glebius MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19558	2019-03-15 11:08:44 +00:00
Gleb Smirnoff	f355cb3e6f	PFIL_MEMPTR for ipfw link level hook With new pfil(9) KPI it is possible to pass a void pointer with length instead of mbuf pointer to a packet filter. Until this commit no filters supported that, so pfil ran through a shim function pfil_fake_mbuf(). Now the ipfw(4) hook named "default-link", that is instantiated when net.link.ether.ipfw sysctl is on, supports processing pointer/length packets natively. - ip_fw_args now has union for either mbuf or void *, and if flags have non-zero length, then we use the void *. - through ipfw_chk() we handle mem/mbuf cases differently. - ether_header goes away from args. It is ipfw_chk() responsibility to do parsing of Ethernet header. - ipfw_log() now uses different bpf APIs to log packets. Although ipfw_chk() is now capable of processing pointer/length packets, this commit adds support for the link level hook only, see ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet() can be improved too, but that requires more changes since the hook supports more complex actions: NAT, divert, etc. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D19357	2019-03-14 22:52:16 +00:00
Gleb Smirnoff	dc0fa4f712	Remove 'dir' argument from dummynet_io(). This makes it possible to make dn_dir flags private to dummynet. There is still some room for improvement.	2019-03-14 22:32:50 +00:00
Gleb Smirnoff	b00b7e03fd	Reduce argument list to ipfw_divert(), as args holds the rule ref and the direction. While here make 'tee' a bool.	2019-03-14 22:31:12 +00:00
Gleb Smirnoff	cef9f220cd	Remove 'dir' argument in ng_ipfw_input, since ip_fw_args now has this info. While here make 'tee' boolean.	2019-03-14 22:30:05 +00:00
Gleb Smirnoff	b7795b6746	- Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and IPFW_ARGS_OUT are utilized. They are intended to substitute the "dir" parameter that is often passed together with args. - Rename ip_fw_args.oif to ifp and now it is set to either input or output interface, depending on IPFW_ARGS_IN/OUT bit set.	2019-03-14 22:28:50 +00:00
Gleb Smirnoff	1830dae3d3	Make second argument of ip_divert(), that specifies packet direction a bool. This allows pf(4) to avoid including ipfw(4) private files.	2019-03-14 22:23:09 +00:00
Gleb Smirnoff	2d0232783c	Simplify ipfw_bpf_mtap2(). No functional change.	2019-03-14 22:20:48 +00:00
Andrey V. Elsukov	ca0f03e808	Add IP_FW_NAT64 to codes that ipfw_chk() can return. It will be used by upcoming NAT64 changes. We use separate code to avoid propagating EACCES error code to user level applications when NAT64 consumes a packet. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-03-11 10:42:09 +00:00
Andrey V. Elsukov	d76227959a	Add NULL pointer check to nat64_output(). It is possible, that a processed packet was originated by local host, in this case m->m_pkthdr.rcvif is NULL. Check and set it to V_loif to avoid NULL pointer dereference in IP input code, since it is expected that packet has valid receiving interface when netisr processes it. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-03-11 10:33:32 +00:00
Kristof Provost	f8e7fe32a4	pf: Fix DIOCGETSRCNODES r343295 broke DIOCGETSRCNODES by failing to reset 'nr' after counting the number of source tracking nodes. This meant that we never copied the information to userspace, leading to '? -> ?' output from pfctl. PR: 236368 MFC after: 1 week	2019-03-08 09:33:16 +00:00
Andrey V. Elsukov	83354acf5a	Fix the problem with O_LIMIT states introduced in r344018. dyn_install_state() uses `rule` pointer when it creates state. For O_LIMIT states this pointer actually is not struct ip_fw, it is pointer to O_LIMIT_PARENT state, that keeps actual pointer to ip_fw parent rule. Thus we need to cache rule id and number before calling dyn_get_parent_state(), so we can use them later when the `rule` pointer is overridden. PR: 236292 MFC after: 3 days	2019-03-07 04:40:44 +00:00
Kristof Provost	6f4909de5f	pf: IPv6 fragments with malformed extension headers could be erroneously passed by pf or cause a panic We mistakenly used the extoff value from the last packet to patch the next_header field. If a malicious host sends a chain of fragmented packets where the first packet and the final packet have different lengths or number of extension headers we'd patch the next_header at the wrong offset. This can potentially lead to panics or rule bypasses. Security: CVE-2019-5597 Obtained from: OpenBSD Reported by: Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv	2019-03-01 07:37:45 +00:00
Kristof Provost	22c58991e3	pf: Small performance tweak Because fetching a counter is a rather expensive function we should use counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr pass" rule should not cause more effort than separate "rdr" and "pass" rules. For rules with adaptive timeout values the call of counter_u64_fetch() should be accepted, but otherwise not. From the man page: The adaptive timeout values can be defined both globally and for each rule. When used on a per-rule basis, the values relate to the number of states created by the rule, otherwise to the total number of states. This handling of adaptive timeouts is done in pf_state_expires(). The calculation needs three values: start, end and states. 1. Normal rules "pass .." without adaptive setting meaning "start = 0" runs in the else-section and therefore takes "start" and "end" from the global default settings and sets "states" to pf_status.states (= total number of states). 2. Special rules like "pass .. keep state (adaptive.start 500 adaptive.end 1000)" have start != 0, run in the if-section and take "start" and "end" from the rule and set "states" to the number of states created by their rule using counter_u64_fetch(). That's all ok, but there is a third case without special handling in the above code snippet: 3. All "rdr/nat pass .." statements use together the pf_default_rule. Therefore we have "start != 0" in this case and we run the if-section but we better should run the else-section in this case and do not fetch the counter of the pf_default_rule but take the total number of states. Submitted by: Andreas Longwitz <longwitz@incore.de> MFC after: 2 weeks	2019-02-24 17:23:55 +00:00
Andrey V. Elsukov	804a6541db	Remove `set' field from state structure and use set from parent rule. Initially it was introduced because parent rule pointer could be freed, and rule's information could become inaccessible. In r341471 this was changed. And now we don't need this information, and also it can become stale. E.g. rule can be moved from one set to another. This can lead to the parent's set and the state's set not matching. In this case it is possible that static rule will be freed, but dynamic state will not. This can happen when `ipfw delete set N` command is used to delete rules, that were moved to another set. To fix the problem we will use the set number from parent rule. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2019-02-11 18:10:55 +00:00
Patrick Kelsey	d178fee632	Place pf_altq_get_nth_active() under the ALTQ ifdef MFC after: 1 week	2019-02-11 05:39:38 +00:00
Patrick Kelsey	8f2ac65690	Reduce the time it takes the kernel to install a new PF config containing a large number of queues In general, the time savings come from separating the active and inactive queues lists into separate interface and non-interface queue lists, and changing the rule and queue tag management from list-based to hash-based. In HFSC, a linear scan of the class table during each queue destroy was also eliminated. There are now two new tunables to control the hash size used for each tag set (default for each is 128): net.pf.queue_tag_hashsize net.pf.rule_tag_hashsize Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D19131	2019-02-11 05:17:31 +00:00
Gleb Smirnoff	d38ca3297c	Return PFIL_CONSUMED if packet was consumed. While here gather all the identical endings of pf_check_*() into single function. PR: 235411	2019-02-02 05:49:05 +00:00
Gleb Smirnoff	2790ca97d9	Fix build without INET6.	2019-02-01 00:33:17 +00:00
Gleb Smirnoff	b252313f0b	New pfil(9) KPI together with newborn pfil API and control utility. The KPI has been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In a nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it is possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interfaces provide their filtering points. Another future feature is the possibility to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
Gleb Smirnoff	f712b16127	Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock. The pfil(9) system is about to be converted to epoch(9) synchronization, so we need [temporarily] go back with ipfw internal locking. Discussed with: ae	2019-01-31 21:04:50 +00:00
Andrey V. Elsukov	7664b71b62	Fix the bug introduced in r342908, that causes problems with dynamic handling for protocols without port numbers. Since port numbers were uninitialized for protocols like ICMP/ICMPv6, ipfw_chk() used some non-zero values to create dynamic states, and due to this it failed to match replies with created states. Reported by: Oliver Hartmann, Boris Lytochkin Obtained from: Yandex LLC X-MFC after: r342908	2019-01-29 11:18:41 +00:00
Patrick Kelsey	59099cd385	Don't re-evaluate ALTQ kernel configuration due to events on non-ALTQ interfaces Re-evaluating the ALTQ kernel configuration can be expensive, particularly when there are a large number (hundreds or thousands) of queues, and is wholly unnecessary in response to events on interfaces that do not support ALTQ as such interfaces cannot be part of an ALTQ configuration. Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18918	2019-01-28 20:26:09 +00:00
Kristof Provost	d9d146e67b	pf: Fix use-after-free of counters When cleaning up a vnet we free the counters in V_pf_default_rule and V_pf_status from shutdown_pf(), but we can still use them later, for example through pf_purge_expired_src_nodes(). Free them as the very last operation, as they rely on nothing else themselves. PR: 235097 MFC after: 1 week	2019-01-25 01:06:06 +00:00
Kristof Provost	180b0dcbbb	pf: Validate psn_len in DIOCGETSRCNODES psn_len is controlled by user space, but we allocated memory based on it. Check how much memory we might need at most (i.e. how many source nodes we have) and limit the allocation to that. Reported by: markj MFC after: 1 week	2019-01-22 02:13:33 +00:00
Kristof Provost	6a8ee0f715	pf: fix pfsync breaking carp Fix missing initialisation of sc_flags into a valid sync state on clone which breaks carp in pfsync. This regression was introduced by r342051. PR: 235005 Submitted by: smh@FreeBSD.org Pointy hat to: kp MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D18882	2019-01-18 08:19:54 +00:00
Kristof Provost	032dff662c	pf: silence a runtime warning Sometimes, for negated tables, pf can log 'pfr_update_stats: assertion failed'. This warning does not clarify anything for users, so silence it, just as OpenBSD has. PR: 234874 MFC after: 1 week	2019-01-15 08:59:51 +00:00
Andrey V. Elsukov	48266154de	Relax requirement to packet size of CARP protocol and remove version check. CARP shares protocol number 112 with VRRP (RFC 5798). And the size of VRRP packet may be smaller than CARP. ipfw_chk() does m_pullup() to at least sizeof(struct carp_header) and can fail when packet is VRRP. This leads to packet drop and message about failed pullup attempt. Also, RFC 5798 defines version 3 of VRRP protocol, this version number is also unsupported by CARP and such check leads to packet drop. carp_input() does its own checks for protocol version and packet size, so we can remove these checks to be able to pass VRRP packets. PR: 234207 MFC after: 1 week	2019-01-11 01:54:15 +00:00
Andrey V. Elsukov	3b1522c229	Fix the build with INVARIANTS. MFC after: 1 month	2019-01-10 02:01:20 +00:00
Andrey V. Elsukov	1cdf23bc03	Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64. And refactor the code to avoid unneeded initialization to reduce overhead of per-packet processing. ipfw(4) can be invoked by pfil(9) framework for each packet several times. Each call uses on-stack variable of type struct ip_fw_args to keep the state of ipfw(4) processing. Currently this variable has 240 bytes size on amd64. Each time ipfw(4) does bzero() on it, and then it initializes some fields. glebius@ has reported that they at Netflix discovered, that initialization of this variable produces significant overhead on packet processing. After patching I managed to increase performance of packet processing on simple routing with ipfw(4) firewalling by about 11%, from 9.8Mpps up to 11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card). Introduced new field flags, it is used to keep track of which fields were initialized. Some fields were moved into the anonymous union, to reduce the size. They all are mutually exclusive. dummypar field was unused, and therefore it is removed. The hopstore6 field type was changed from sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct ip_fw_args is 128 bytes. ipfw_chk() was modified to properly handle ip_fw_args.flags instead of relying on checking for NULL pointers. Reviewed by: gallatin Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D18690	2019-01-10 01:47:57 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Kristof Provost	336683f24f	pf: Fix endless loop on NAT exhaustion with sticky-address When we try to find a source port in pf_get_sport() it's possible that all available source ports will be in use. In that case we call pf_map_addr() to try to find a new source IP to try from. If there are no more available source IPs pf_map_addr() will return 1 and we stop trying. However, if sticky-address is set we'll always return the same IP address, even if we've already tried that one. We need to check the supplied address, because if that's the one we'd set it means pf_get_sport() has already tried it, and we should error out rather than keep trying. PR: 233867 MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D18483	2018-12-12 20:15:06 +00:00
Kristof Provost	5b551954ab	pf: Prevent integer overflow in PF when calculating the adaptive timeout. Mainly states of established TCP connections would be affected resulting in immediate state removal once the number of states is bigger than adaptive.start. Disabling adaptive timeouts is a workaround to avoid this bug. Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr) Reported by: Andreas Longwitz <longwitz AT incore.de> Obtained from: OpenBSD MFC after: 2 weeks	2018-12-11 21:44:39 +00:00
Kristof Provost	4fc65bcbe3	pfsync: Performance improvement pfsync code is called for every new state, state update and state deletion in pf. While pf itself can operate on multiple states at the same time (on different cores, assuming the states hash to a different hashrow), pfsync only had a single lock. This greatly reduced throughput on multicore systems. Address this by splitting the pfsync queues into buckets, based on the state id. This ensures that updates for a given connection always end up in the same bucket, which allows pfsync to still collapse multiple updates into one, while allowing multiple cores to proceed at the same time. The number of buckets is tunable, but defaults to 2 x number of cpus. Benchmarking has shown improvement, depending on hardware and setup, from ~30% to ~100%. MFC after: 1 week Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D18373	2018-12-06 19:27:15 +00:00
Kristof Provost	2b0a4ffadb	pf: add a comment describing why do we call pf_map_addr again if port selection process fails Obtained from: OpenBSD	2018-12-06 18:58:54 +00:00
Andrey V. Elsukov	d66f9c86fa	Add ability to request listing and deleting only for dynamic states. This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but after rules reloading some state must be deleted. Added new flag '-D' for such purpose. Retire '-e' flag, since there can not be expired states in the meaning that this flag historically had. Also add "verbose" mode for listing of dynamic states, it can be enabled with '-v' flag and adds additional information to states list. This can be useful for debugging. Obtained from: Yandex LLC MFC after: 2 months Sponsored by: Yandex LLC	2018-12-04 16:12:43 +00:00
Andrey V. Elsukov	cefe3d67e2	Reimplement how net.inet.ip.fw.dyn_keep_states works. Turning on of this feature allows to keep dynamic states when parent rule is deleted. But it works only when the default rule is "allow from any to any". Now when rule with dynamic opcode is going to be deleted, and net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference named objects corresponding to this rule, and also reference the rule. And when ipfw_dyn_lookup_state() will find state for deleted parent rule, it will return the pointer to the deleted rule, that is still valid. This implementation doesn't support O_LIMIT_PARENT rules. The refcnt field was added to struct ip_fw to keep reference, also next pointer added to be able to iterate rules and not damage the content when deleted rules are chained. Named objects are referenced only when states are going to be deleted to be able to reuse kidx of named objects when new parent rules will be installed. ipfw_dyn_get_count() function was modified and now it also looks into dynamic states and constructs maps of existing named objects. This is needed to correctly export orphaned states into userland. ipfw_free_rule() was changed to be global, since now dynamic state can free rule, when it is expired and references counter becomes 1. External actions subsystem also modified, since external actions can be deregistered and instances can be destroyed. In these cases deleted rules, that are referenced by orphaned states, must be modified to prevent access to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance() functions added for these purposes. Obtained from: Yandex LLC MFC after: 2 months Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17532	2018-12-04 16:01:25 +00:00
Andrey V. Elsukov	0df76496a6	Add assertion to check that named object has correct type. Obtained from: Yandex LLC MFC after: 1 week	2018-12-04 15:12:28 +00:00
Kristof Provost	b2e0b24f76	pf: Fix panic on overlapping interface names In rare situations[*] it's possible for two different interfaces to have the same name. This confuses pf, because kifs are indexed by name (which is assumed to be unique). As a result we can end up trying to if_rele(NULL), which panics. Explicitly checking the ifp pointer before if_rele() prevents the panic. Note pf will likely behave in unexpected ways on the overlapping interfaces. [*] Insert an interface in a vnet jail. Rename it to an interface which exists on the host. Remove the jail. There are now two interfaces with the same name in the host.	2018-12-01 09:58:21 +00:00
Andrey V. Elsukov	2636ba4d03	Do not limit the mbuf queue length for keepalive packets. It was unlimited before overhaul, and one user reported that this limit can be reached easily. PR: 233562 MFC after: 1 week	2018-11-27 16:51:01 +00:00
Andrey V. Elsukov	b2b5660688	Add ability to use dynamic external prefix in ipfw_nptv6 module. Now an interface name can be specified for nptv6 instance instead of ext_prefix. The module will track if_addr_ext events and when suitable IPv6 address will be added to specified interface, it will be configured as external prefix. When address disappears instance becomes unusable, i.e. it doesn't match any packets. Reviewed by: 0mp (manpages) Tested by: Dries Michiels <driesm dot michiels gmail com> MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D17765	2018-11-12 11:20:59 +00:00
Kristof Provost	87e4ca37d5	pf: Prevent tables referenced by rules in anchors from getting disabled. PR: 183198 Obtained from: OpenBSD MFC after: 2 weeks	2018-11-08 21:54:40 +00:00
Kristof Provost	58ef854f8b	pf: Fix build if INVARIANTS is not set r340061 included a number of assertions in pf_frent_remove(), but these assertions were the only use of the 'prev' variable. As a result builds without INVARIANTS had an unused variable, and failed. Reported by: vangyzen@	2018-11-02 19:23:50 +00:00
Kristof Provost	14624ab582	pf: Keep a reference to struct ifnets we're using Ensure that the struct ifnet we use can't go away until we're done with it.	2018-11-02 17:05:40 +00:00
Kristof Provost	dde6e1fecb	pfsync: Add missing unlock If we fail to set up the multicast entry for pfsync and return an error we must release the pfsync lock first. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17506	2018-11-02 17:03:53 +00:00
Kristof Provost	04fe85f068	pfsync: Allow module to be unloaded MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17505	2018-11-02 17:01:18 +00:00
Kristof Provost	fbbf436d56	pfsync: Handle syncdev going away If the syncdev is removed we no longer need to clean up the multicast entry we've got set up for that device. Pass the ifnet detach event through pf to pfsync, and remove our multicast handle, and mark us as no longer having a syncdev. Note that this callback is always installed, even if the pfsync interface is disabled (and thus it's not a per-vnet callback pointer). MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17502	2018-11-02 16:57:23 +00:00
Kristof Provost	26549dfcad	pfsync: Ensure uninit is done before pf pfsync touches pf memory (for pf_state and the pfsync callback pointers), not the other way around. We need to ensure that pfsync is torn down before pf. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17501	2018-11-02 16:53:15 +00:00
Kristof Provost	5f6cf24e2d	pfsync: Make pfsync callbacks per-vnet The callbacks are installed and removed depending on the state of the pfsync device, which is per-vnet. The callbacks must also be per-vnet. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17499	2018-11-02 16:47:07 +00:00
Kristof Provost	790194cd47	pf: Limit the fragment entry queue length to 64 per bucket. So we have a global limit of 1024 fragments, but it is fine grained to the region of the packet. Smaller packets may have fewer fragments. This costs another 16 bytes of memory per reassembly and divides the worst case for searching by 8. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17734	2018-11-02 15:32:04 +00:00
Kristof Provost	fd2ea405e6	pf: Split the fragment reassembly queue into smaller parts Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17733	2018-11-02 15:26:51 +00:00
Kristof Provost	2b1c354ee6	pf: Count holes rather than fragments for reassembly Avoid traversing the list of fragment entries to check whether the pf(4) reassembly is complete. Instead count the holes that are created when inserting a fragment. If there are no holes left, the fragments are continuous. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17732	2018-11-02 15:23:57 +00:00
Kristof Provost	19a22ae313	Revert "pf: Limit the maximum number of fragments per packet" This reverts commit r337969. We'll handle this the OpenBSD way, in upcoming commits.	2018-11-02 15:01:59 +00:00
Kristof Provost	99eb00558a	pf: Make ':0' ignore link-local v6 addresses too When users mark an interface to not use aliases they likely also don't want to use the link-local v6 address there. PR: 201695 Submitted by: Russell Yount <Russell.Yount AT gmail.com> Differential Revision: https://reviews.freebsd.org/D17633	2018-10-28 05:32:50 +00:00
Eugene Grosbein	5310c19174	ipfw: implement ngtee/netgraph actions for layer-2 frames. Kernel part of ipfw does not support and ignores rules other than "pass", "deny" and dummynet-related for layer-2 (ethernet frames). Others are processed as "pass". Make it support ngtee/netgraph rules just like they are supported for IP packets. For example, this allows us to mirror some frames selectively to another interface for delivery to remote network analyzer over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900" attached to "lower" hook of vlan900's ng_ether(4) node, that would be as simple as: ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0 PR: 213452 MFC after: 1 month Tested-by: Fyodor Ustinov <ufm@ufm.su>	2018-10-27 07:32:26 +00:00
Kristof Provost	13d640d376	pf: Fix copy/paste error in IPv6 address rewriting We checked the destination address, but replaced the source address. This was fixed in OpenBSD as part of their NAT rework, which we don't want to import right now. CID: 1009561 MFC after: 3 weeks	2018-10-24 00:19:44 +00:00
Kristof Provost	73c9014569	pf: ifp can never be NULL in pfi_ifaddr_event() There's no point in the NULL check for ifp, because we'll already have dereferenced it by then. Moreover, the event will always have a valid ifp. Replace the late check with an early assertion. CID: 1357338	2018-10-23 23:15:44 +00:00
Andrey V. Elsukov	ab108c4b07	Do not decrement RST life time if keep_alive is not turned on. This allows the use of different values configured by the user for the sysctl variable net.inet.ip.fw.dyn_rst_lifetime. Obtained from: Yandex LLC MFC after: 3 weeks Sponsored by: Yandex LLC	2018-10-21 16:44:57 +00:00
Andrey V. Elsukov	2ffadd56f5	Call inet_ntop() only when its result is needed. Obtained from: Yandex LLC MFC after: 3 weeks Sponsored by: Yandex LLC	2018-10-21 16:37:53 +00:00
Andrey V. Elsukov	aa2715612c	Retire IPFIREWALL_NAT64_DIRECT_OUTPUT kernel option. And add ability to switch the output method in run-time. Also document some sysctl variables that can be changed for NAT64 module. NAT64 had compile time option IPFIREWALL_NAT64_DIRECT_OUTPUT to use if_output directly from nat64 module. By default the netisr-based output method is used. Now both methods can be used, but they require different handling by rules. Obtained from: Yandex LLC MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D16647	2018-10-21 16:29:12 +00:00
Kristof Provost	1563a27e1f	pf synproxy will do the 3WHS on behalf of the target machine, and once the 3WHS is completed, establish the backend connection. The trigger for "3WHS completed" is the reception of the first ACK. However, we should not proceed if that ACK also has RST or FIN set. PR: 197484 Obtained from: OpenBSD MFC after: 2 weeks	2018-10-20 18:37:21 +00:00
Andrey V. Elsukov	986368d85d	Add extra parentheses to fix "versrcreach" opcode, (oif != NULL) should not be used as condition for ternary operator. Submitted by: Tatsuki Makino <tatsuki_makino at hotmail dot com> Approved by: re (kib) MFC after: 1 week	2018-10-15 10:25:34 +00:00
John-Mark Gurney	032d3aaa96	Significantly improve pf purge cpu usage by only taking locks when there is work to do. This reduces CPU consumption to one third on systems. This will help keep the thread CPU usage under control now that the default hash size has increased. Reviewed by: kp Approved by: re (kib) Differential Revision: https://reviews.freebsd.org/D17097	2018-09-16 00:44:23 +00:00
Patrick Kelsey	249cc75fd1	Extended pf(4) ioctl interface and pfctl(8) to allow bandwidths of 2^32 bps or greater to be used. Prior to this, bandwidth parameters would simply wrap at the 2^32 boundary. The computations in the HFSC scheduler and token bucket regulator have been modified to operate correctly up to at least 100 Gbps. No other algorithms have been examined or modified for correct operation above 2^32 bps (some may have existing computation resolution or overflow issues at rates below that threshold). pfctl(8) will now limit non-HFSC bandwidth parameters to 2^32 - 1 before passing them to the kernel. The extensions to the pf(4) ioctl interface have been made in a backwards-compatible way by versioning affected data structures, supporting all versions in the kernel, and implementing macros that will cause existing code that consumes that interface to use version 0 without source modifications. If version 0 consumers of the interface are used against a new kernel that has had bandwidth parameters of 2^32 or greater configured by updated tools, such bandwidth parameters will be reported as 2^32 - 1 bps by those old consumers. All in-tree consumers of the pf(4) interface have been updated. To update out-of-tree consumers to the latest version of the interface, define PFIOC_USE_LATEST ahead of any includes and use the code of pfctl(8) as a guide for the ioctls of interest. PR: 211730 Reviewed by: jmallett, kp, loos MFC after: 2 weeks Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D16782	2018-08-22 19:38:48 +00:00
Kristof Provost	d47023236c	pf: Limit the maximum number of fragments per packet Similar to the network stack issue fixed in r337782 pf did not limit the number of fragments per packet, which could be exploited to generate high CPU loads with a crafted series of packets. Limit each packet to no more than 64 fragments. This should be sufficient on typical networks to allow maximum-sized IP frames. This addresses the issue for both IPv4 and IPv6. MFC after: 3 days Security: CVE-2018-5391 Sponsored by: Klara Systems	2018-08-17 15:00:10 +00:00
Luiz Otavio O Souza	a0376d4d29	Fix a typo in comment. MFC after: 3 days X-MFC with: r321316 Sponsored by: Rubicon Communications, LLC (Netgate)	2018-08-15 16:36:29 +00:00
Kristof Provost	e9ddca4a40	pf: Take the IF_ADDR_RLOCK() when iterating over the group list We did do this elsewhere in pf, but the lock was missing here. Sponsored by: Essen Hackathon	2018-08-11 16:37:55 +00:00
Kristof Provost	33b242b533	pf: Fix 'set skip on' for groups The pfi_skip_if() function sometimes caused skipping of groups to work, if the members of the group used the groupname as a name prefix. This is often the case, e.g. group lo usually contains lo0, lo1, ..., but not always. Rather than relying on the name explicitly check for group memberships. Obtained from: OpenBSD (pf_if.c,v 1.62, pf_if.c,v 1.63) Sponsored by: Essen Hackathon	2018-08-11 16:34:30 +00:00
Andrey V. Elsukov	5c4aca8218	Use host byte order when comparing mss values. This fixes tcp-setmss action on little endian machines. PR: 225536 Submitted by: John Zielinski	2018-08-08 17:32:02 +00:00
Andrew Turner	5f901c92a8	Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147	2018-07-24 16:35:52 +00:00
Kristof Provost	32ece669c2	pf: Fix synproxy Synproxy was accidentally broken by r335569. The 'return (action)' must be executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP error) should only be sent if the packet was dropped (i.e. PF_DROP) and the return flag is set. PR: 229477 Submitted by: Andre Albsmeier <mail AT fbsd.e4m.org> MFC after: 1 week	2018-07-14 10:14:59 +00:00
Kristof Provost	3e603d1ffa	pf: Fix panic on vnet jail shutdown with synproxy When shutting down a vnet jail pf_shutdown() clears the remaining states, which through pf_clear_states() calls pf_unlink_state(). For synproxy states pf_unlink_state() will send a TCP RST, which eventually tries to schedule the pf swi in pf_send(). This means we can't remove the software interrupt until after pf_shutdown(). MFC after: 1 week	2018-07-14 09:11:32 +00:00
Andrey V. Elsukov	0a2c13d333	Use correct size when we are allocating array for skipto index. Also, there is no need to use M_ZERO for idxmap_back. It will be re-filled just after allocation in update_skipto_cache(). PR: 229665 MFC after: 1 week	2018-07-12 11:38:18 +00:00
Andrey V. Elsukov	f7c4fdee1a	Add "record-state", "set-limit" and "defer-action" rule options to ipfw. "record-state" is similar to "keep-state", but it doesn't produce implicit O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the same feature as "record-state", it is single opcode without implicit O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic states. When a rule with this opcode is matched, the rule's action will not be executed, instead a dynamic state will be created. And when this state is matched by "check-state", then the rule action will be executed. This allows creating more complicated rulesets. Submitted by: lev MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D1776	2018-07-09 11:35:18 +00:00
Andrew Turner	2bf9501287	Create a new macro for static DPCPU data. On arm64 (and possibly other architectures) we are unable to use static DPCPU data in kernel modules. This is because the compiler will generate PC-relative accesses, however the runtime-linker expects to be able to relocate these. In preparation to fix this create two macros depending on if the data is global or static. Reviewed by: bz, emaste, markj Sponsored by: ABT Systems Ltd Differential Revision: https://reviews.freebsd.org/D16140	2018-07-05 17:13:37 +00:00
Will Andrews	cc535c95ca	Revert r335833. Several third-parties use at least some of these ioctls. While it would be better for regression testing if they were used in base (or at least in the test suite), it's currently not worth the trouble to push through removal. Submitted by: antoine, markj	2018-07-04 03:36:46 +00:00
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Kristof Provost	de210decd1	pfsync: Fix state sync during initial bulk update States learned via pfsync from a peer with the same ruleset checksum were not getting assigned to rules like they should because pfsync_in_upd() wasn't passing the PFSYNC_SI_CKSUM flag along to pfsync_state_import. PR: 229092 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Obtained from: OpenBSD MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-30 12:51:08 +00:00
Kristof Provost	150182e309	pf: Support "return" statements in passing rules when they fail. Normally pf rules are expected to do one of two things: pass the traffic or block it. Blocking can be silent - "drop", or loud - "return", "return-rst", "return-icmp". Yet there is a 3rd category of traffic passing through pf: Packets matching a "pass" rule but when applying the rule fails. This happens when redirection table is empty or when src node or state creation fails. Such rules always fail silently without notifying the sender. Allow users to configure this behaviour too, so that pf returns an error packet in these cases. PR: 226850 Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> MFC after: 1 week Sponsored by: InnoGames GmbH	2018-06-22 21:59:30 +00:00
Andrey V. Elsukov	20efcfc602	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
Kristof Provost	0b799353d8	pf: Fix deadlock with route-to If a locally generated packet is routed (with route-to/reply-to/dup-to) out of a different interface it's passed through the firewall again. This meant we lost the inp pointer and if we required the pointer (e.g. for user ID matching) we'd deadlock trying to acquire an inp lock we've already got. Pass the inp pointer along with pf_route()/pf_route6(). PR: 228782 MFC after: 1 week	2018-06-09 14:17:06 +00:00
Mateusz Guzik	4e180881ae	uma: implement provisional api for per-cpu zones Per-cpu zone allocations are very rarely done compared to regular zones. The intent is to avoid pessimizing the latter case with per-cpu specific code. In particular contrary to the claim in r334824, M_ZERO is sometimes being used for such zones. But the zeroing method is completely different and branching on it in the fast path for regular zones is a waste of time.	2018-06-08 21:40:03 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Andrey V. Elsukov	67ad3c0bf9	Restore the ability to keep states after parent rule deletion. This feature is disabled by default and was removed when dynamic states implementation changed to be lockless. Now it is reimplemented with small differences - when dyn_keep_states sysctl variable is enabled, dyn_match_ipv[46]_state() function doesn't match child states of deleted rule. And thus they are kept alive until expired. ipfw_dyn_lookup_state() function does check that state was not orphaned, and if so, it returns pointer to default_rule and its position in the rules map. The main visible difference is that orphaned states still have the same rule number that they had before the parent rule was deleted, because now a state has many fields related to rule and changing them all atomically to point to default_rule seems hard enough. Reported by: <lantw44 at gmail.com> MFC after: 2 days	2018-05-22 13:28:05 +00:00
Andrey V. Elsukov	4bb8a5b0c9	Remove check for matching the rulenum, ruleid and rule pointer from dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not ready to be committed code, and they are there by accident. Due to the race these checks can lead to creating of duplicate states when concurrent threads in the same time will try to add state for two packets of the same flow, but in reverse directions and matched by different parent rules. Reported by: lev MFC after: 3 days	2018-05-21 16:19:00 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Andrey V. Elsukov	782360dec3	Bring in some last changes in NAT64 implementation: o Modify ipfw(8) to be able set any prefix6 not just Well-Known, and also show configured prefix6; o relocate some definitions and macros into proper place; o convert nat64_debug and nat64_allow_private variables to be VNET-compatible; o add struct nat64_config that keeps generic configuration needed to NAT64 code; o add nat64_check_prefix6() function to check validity of the user-specified IPv6 prefix according to RFC6052; o use nat64_check_private_ip4() and nat64_embed_ip4() functions instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows to use any configured IPv6 prefixes that are allowed by RFC6052; o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is Well-Known IPv6 prefix. It is used to reduce overhead to check this; o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config structure. And respectively modify the rest of code; o remove now unused ro argument from nat64_output() function; o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions; o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile as example. Obtained from: Yandex LLC MFC after: 1 month Sponsored by: Yandex LLC	2018-05-09 11:59:24 +00:00
Sean Bruno	2695c9c109	Retire ixgb(4) This driver was for an early and uncommon legacy PCI 10GbE for a single ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family. Submitted by: kbowling Reviewed by: brooks imp jeffrey.e.pieper@intel.com Relnotes: yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15234	2018-05-02 15:59:15 +00:00
Andrey V. Elsukov	5f69d0a4ff	To avoid possible deadlock do not acquire JQUEUE_LOCK before callout_drain. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-13 10:03:30 +00:00
Andrey V. Elsukov	2d8fcffb99	Fix integer types mismatch for flags field in nat64stl_cfg structure. Also preserve internal flags on NAT64STL reconfiguration. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-12 21:29:40 +00:00
Andrey V. Elsukov	eed302572a	Use cfg->nomatch_verdict as return value from NAT64LSN handler when given mbuf is considered as not matched. If mbuf was consumed or freed during handling, we must return IP_FW_DENY, since ipfw's pfil handler ipfw_check_packet() expects IP_FW_DENY when mbuf pointer is NULL. This fixes KASSERT panics when NAT64 is used with INVARIANTS. Also remove unused nomatch_final field from struct nat64lsn_cfg. Reported by: Justin Holcomb <justin at justinholcomb dot me> Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2018-04-12 21:13:30 +00:00
Andrey V. Elsukov	c570565f12	Migrate NAT64 to FIB KPI. Obtained from: Yandex LLC MFC after: 1 week	2018-04-12 21:05:20 +00:00
Kristof Provost	c41420d5dc	pf: limit ioctl to a reasonable and tuneable number of elements pf ioctls frequently take a variable number of elements as argument. This can potentially allow users to request very large allocations. These will fail, but even a failing M_NOWAIT might tie up resources and result in concurrent M_WAITOK allocations entering vm_wait and inducing reclamation of caches. Limit these ioctls to what should be a reasonable value, but allow users to tune it should they need to. Differential Revision: https://reviews.freebsd.org/D15018	2018-04-11 11:43:12 +00:00
Oleg Bulyzhin	3995ad1768	Fix ipfw table creation when net.inet.ip.fw.tables_sets = 0 and non zero set specified on table creation. This fixes following: # sysctl net.inet.ip.fw.tables_sets net.inet.ip.fw.tables_sets: 0 # ipfw table all info # ipfw set 1 table 1 create type addr # ipfw set 1 table 1 create type addr # ipfw add 10 set 1 count ip from table\(1\) to any 00010 count ip from table(1) to any # ipfw add 10 set 1 count ip from table\(1\) to any 00010 count ip from table(1) to any # ipfw table all info --- table(1), set(1) --- kindex: 4, type: addr references: 1, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 3, type: addr references: 1, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 2, type: addr references: 0, valtype: legacy algorithm: addr:radix items: 0, size: 296 --- table(1), set(1) --- kindex: 1, type: addr references: 0, valtype: legacy algorithm: addr:radix items: 0, size: 296 # MFC after: 1 week	2018-04-11 11:12:20 +00:00
Kristof Provost	1a125a2f7f	pf: Improve ioctl validation Ensure that multiplications for memory allocations cannot overflow, and that we'll not try to allocate M_WAITOK for potentially overly large allocations. MFC after: 1 week	2018-04-06 19:36:35 +00:00
Kristof Provost	02214ac854	pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. There's no obvious limit to the request size for these, so we limit the requests to something which won't overflow. Change the memory allocation to M_NOWAIT so excessive requests will fail rather than stall forever. MFC after: 1 week	2018-04-06 19:20:45 +00:00
Kristof Provost	adfe2f6aff	pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS These ioctls can process a number of items at a time, which puts us at risk of overflow in mallocarray() and of impossibly large allocations even if we don't overflow. Limit the allocation to required size (or the user allocation, if that's smaller). That does mean we need to do the allocation with the rules lock held (so the number doesn't change while we're doing this), so it can't M_WAITOK. MFC after: 1 week	2018-04-06 15:54:30 +00:00
Kristof Provost	8748b499c1	pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of tables at a time, and as such try to allocate <number of tables> * sizeof(struct pfr_table). This multiplication can overflow. Thanks to mallocarray() this is not exploitable, but an overflow does panic the system. Arbitrarily limit this to 65535 tables. pfctl only ever processes one table at a time, so it presents no issues there. MFC after: 1 week	2018-04-06 15:01:45 +00:00
Brooks Davis	541d96aaaf	Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command definitions are required as struct ifreq is the same size). This is believed to be sufficient to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900	2018-03-30 18:50:13 +00:00
Kristof Provost	effaab8861	netpfil: Introduce PFIL_FWD flag Forwarded packets passed through PFIL_OUT, which made it difficult for firewalls to figure out if they were forwarding or producing packets. This in turn is an issue for pf for IPv6 fragment handling: it needs to call ip6_output() or ip6_forward() to handle the fragments. Figuring out which was difficult (and until now, incorrect). Having pfil distinguish the two removes an ugly piece of code from pf. Introduce a new variant of the netpfil callbacks with a flags variable, which has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if a packet is forwarded. Reviewed by: ae, kevans Differential Revision: https://reviews.freebsd.org/D13715	2018-03-23 16:56:44 +00:00
Kristof Provost	b4b8fa3387	pf: Fix memory leak in DIOCRADDTABLES If a user attempts to add two tables with the same name the duplicate table will not be added, but we forgot to free the duplicate table, leaking memory. Ensure we free the duplicate table in the error path. Reported by: Coverity CID: 1382111 MFC after: 3 weeks	2018-03-19 21:13:25 +00:00
Andrey V. Elsukov	12c080e613	Do not try to reassemble IPv6 fragments in "reass" rule. ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets that it gets. Until proper IPv6 fragments handling function will be implemented, pass IPv6 packets to next rule. PR: 170604 MFC after: 1 week	2018-03-12 09:40:46 +00:00
Kristof Provost	bf56a3fe47	pf: Cope with overly large net.pf.states_hashsize If the user configures a states_hashsize or source_nodes_hashsize value we may not have enough memory to allocate this. This used to lock up pf, because these allocations used M_WAITOK. Cope with this by attempting the allocation with M_NOWAIT and falling back to the default sizes (with M_WAITOK) if these fail. PR: 209475 Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com> MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14367	2018-02-25 08:56:44 +00:00
Andrey V. Elsukov	99493f5a4a	Remove duplicate #include <netinet/ip_var.h>.	2018-02-07 19:12:05 +00:00
Andrey V. Elsukov	b99a682320	Rework ipfw dynamic states implementation to be lockless on fast path. o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and for dynamic states implementation information; o added DYN_LOOKUP_NEEDED() macro that can be used to determine the need of new lookup of dynamic states; o ipfw_dyn_rule now becomes obsolete. Currently it is used to pass information from kernel to userland only. o IPv4 and IPv6 states now described by different structures dyn_ipv4_state and dyn_ipv6_state; o IPv6 scope zones support is added; o ipfw(4) now depends on Concurrency Kit; o states are linked with "entry" field using CK_SLIST. This allows lockless lookup and protected by mutex modifications. o the "expired" SLIST field is used for states expiring. o struct dyn_data is used to keep generic information for both IPv4 and IPv6; o struct dyn_parent is used to keep O_LIMIT_PARENT information; o IPv4 and IPv6 states are stored in different hash tables; o O_LIMIT_PARENT states now are kept separately from O_LIMIT and O_KEEP_STATE states; o per-cpu dyn_hp pointers are used to implement hazard pointers and they prevent freeing states that are locklessly used by lookup threads; o mutexes to protect modification of lists in hash tables now kept in separate arrays. 65535 limit to maximum number of hash buckets now removed. o Separate lookup and install functions added for IPv4 and IPv6 states and for parent states. o The Jenkins hash function is now used by default. Obtained from: Yandex LLC MFC after: 42 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D12685	2018-02-07 18:59:54 +00:00
Kristof Provost	c201b5644d	pf: Avoid warning without INVARIANTS When INVARIANTS is not set the 'last' variable is not used, which can generate compiler warnings. If this invariant is ever violated it'd result in a KASSERT failure in refcount_release(), so this one is not strictly required.	2018-02-01 07:52:06 +00:00
Andrey V. Elsukov	14a6bab1da	When IPv6 packet is handled by O_REJECT opcode, convert ICMP code specified in the arg1 into ICMPv6 destination unreachable code according to RFC7915. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2018-01-24 12:40:28 +00:00
Kristof Provost	6701c43213	pf: States have at least two references pf_unlink_state() releases a reference to the state without checking if this is the last reference. It can't be, because pf_state_insert() initialises it to two. KASSERT() that this is always the case. CID: 1347140	2018-01-24 04:29:16 +00:00
Pedro F. Giffuni	d821d36419	Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks	2018-01-22 02:08:10 +00:00
Andrey V. Elsukov	d38344208e	Add UDPLite support to ipfw(4). Now it is possible to use UDPLite's port numbers in rules, create dynamic states for UDPLite packets and see "UDPLite" for matched packets in log. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2018-01-19 12:50:03 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Pedro F. Giffuni	454529cd0b	netpfil/ipfw: Make some use of mallocarray(9). Reviewed by: kp, ae Differential Revision: https://reviews.freebsd.org/D13834	2018-01-11 15:29:29 +00:00
Kristof Provost	6273ba66f2	pf: Avoid integer overflow issues by using mallocarray() iso. malloc() pfioctl() handles several ioctl that takes variable length input, these include: - DIOCRADDTABLES - DIOCRDELTABLES - DIOCRGETTABLES - DIOCRGETTSTATS - DIOCRCLRTSTATS - DIOCRSETTFLAGS All of them take a pfioc_table struct as input from userland. One of its elements (pfrio_size) is used in a buffer length calculation. The calculation contains an integer overflow which if triggered can lead to out of bound reads and writes later on. Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>	2018-01-07 13:35:15 +00:00
Kristof Provost	9d671fee3a	pf: Allow the module to be unloaded pf can now be safely unloaded. Most of this code is exercised on vnet jail shutdown. Don't block unloading.	2017-12-31 16:18:13 +00:00
Kristof Provost	5d0020d6d7	pf: Clean all fragments on shutdown When pf is unloaded, or a vnet jail using pf is stopped we need to ensure we clean up all fragments, not just the expired ones.	2017-12-31 10:01:31 +00:00
Pedro F. Giffuni	6e778a7efd	SPDX: license IDs for some ISC-related files.	2017-12-08 15:57:29 +00:00
Pedro F. Giffuni	8820ecc040	SPDX: Fix some cases wrongly attributed to MIT. In the cases of BSD-style license variants without clauses, use 0BSD for the time being in lack of a better description.	2017-11-30 15:10:11 +00:00
Pedro F. Giffuni	fe267a5590	sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. No functional change intended.	2017-11-27 15:23:17 +00:00
Michael Tuexen	665c8a2ee5	Add to ipfw support for sending an SCTP packet containing an ABORT chunk. This is similar to the TCP case, where a TCP RST segment can be sent. There is one limitation: When sending an ABORT in response to an incoming packet, it should be tested if there is no ABORT chunk in the received packet. Currently, it is only checked if the first chunk is an ABORT chunk to avoid parsing the whole packet, which could result in a DOS attack. Thanks to Timo Voelker for helping me to test this patch. Reviewed by: bcr@ (man page part), ae@ (generic, non-SCTP part) Differential Revision: https://reviews.freebsd.org/D13239	2017-11-26 18:19:01 +00:00
Andrey V. Elsukov	1719df1bb4	Modify ipfw's dynamic states KPI. Hide the locking logic used in the dynamic states implementation from generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule() function to have similar names: ipfw_dyn_install_state() and ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the ipfw_dyn_lookup_state() function. Now this function return NULL when there is no state and pointer to the parent rule when state is found. Thus now there is no need to return pointer to dynamic rule, and no need to hold bucket lock for this state. Remove ipfw_dyn_unlock() function. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D11657	2017-11-23 08:02:02 +00:00
Andrey V. Elsukov	9d15540022	Check that address family of state matches address family of packet. If it is not matched avoid comparing other state fields. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 07:05:25 +00:00
Andrey V. Elsukov	30df59d581	Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c. It is not specific for dynamic states function and called also from generic code. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 06:04:57 +00:00
Andrey V. Elsukov	288bf455bb	Rework rule ranges matching. Use comparison rule id with UINT32_MAX to match all rules with the same rule number. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-23 05:55:53 +00:00
Andrey V. Elsukov	7143bb7626	Add ipfw_add_protected_rule() function that creates rule with 65535 number in the reserved set 31. Use this function to create default rule. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-22 05:49:21 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Andrey V. Elsukov	66f84fabb3	Add comment for accidentally committed unrelated change in r325960. Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects a packet is IPv4. And in case when it is IPv6, it just translates them as IPv4. This leads to corruption and in some cases to panics. In particular a panic can happen when value of ip6_plen modified to something that leads to IP fragmentation, but actual packet length does not match the IP length. Packets that are not IPv4 will be dropped by NAT rule. Reported by: Viktor Dukhovni <freebsd at dukhovni dot org> MFC after: 1 week	2017-11-17 23:25:06 +00:00
Andrey V. Elsukov	e11f0a0c4c	Unconditionally enable support for O_IPSEC opcode. IPsec support can be loaded as kernel module, thus do not depend from kernel option IPSEC and always build O_IPSEC opcode implementation as enabled. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-11-17 22:40:02 +00:00
Don Lewis	4001fcbe0a	Fix Dummynet AQM packet marking function ecn_mark() and fq_codel / fq_pie schedulers packet classification functions in layer2 (bridge mode). Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie schedulers packet classification functions (fq_codel_classify_flow() and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP) packet. However, this assumption is incorrect if ipfw/dummynet is used to manage layer2 traffic (bridge mode) since mbuf will point at L2 frame. This patch solves this problem by identifying the source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN offset when converting an mbuf pointer to ip pointer if the traffic is from layer2. More specifically, in dummynet packet tagging function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the traffic is from layer2 and set to zero otherwise. Whenever an access to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is used instead of mtod(m, struct ip *) to correctly convert mbuf pointer to ip pointer in both L2 and L3 traffic. Submitted by: lstewart MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12506	2017-10-26 10:11:35 +00:00
Andrey V. Elsukov	5c70ebfa57	Add IPv6 support for O_TCPDATALEN opcode. PR: 222746 MFC after: 1 week	2017-10-24 08:39:05 +00:00
Andrey V. Elsukov	ff0a137952	Fix regression in handling O_FORWARD_IP opcode after r279948. To properly handle 'fwd tablearg,port' opcode, copy sin_port value from sockaddr_in structure stored in the opcode into corresponding hopstore field. PR: 222953 MFC after: 1 week	2017-10-13 11:11:53 +00:00
Michael Tuexen	945906384d	Fix a bug which avoided that rules for matching port numbers for SCTP packets where actually matched. While there, make clean in the man-page that SCTP port numbers are supported in rules. MFC after: 1 month	2017-10-02 18:25:30 +00:00
Andrey V. Elsukov	5df8171da3	Use in_localip() function instead of unlocked access to addresses hash to determine that an address is our local. PR: 220078 MFC after: 1 week	2017-09-20 22:35:28 +00:00
Andrey V. Elsukov	369bc48dc5	Do not acquire IPFW_WLOCK when a named object is created and destroyed. Acquiring of IPFW_WLOCK is requried for cases when we are going to change some data that can be accessed during processing of packets flow. When we create new named object, there are not yet any rules, that references it, thus holding IPFW_UH_WLOCK is enough to safely update needed structures. When we destroy an object, we do this only when its reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK, because noone references it. The another case is when we failed to finish some action and thus we are doing rollback and destroying an object, in this case it is still not referenced by rules and no need to acquire IPFW_WLOCK. This also fixes panic with INVARIANTS due to recursive IPFW_WLOCK acquiring. MFC after: 1 week Sponsored by: Yandex LLC	2017-09-20 22:00:06 +00:00
Kristof Provost	7f3ad01804	pf_get_sport(): Prevent possible endless loop when searching for an unused nat port This is an import of Alexander Bluhm's OpenBSD commit r1.60, the first chunk had to be modified because on OpenBSD the 'cut' declaration is located elsewhere. Upstream report by Jingmin Zhou: https://marc.info/?l=openbsd-pf&m=150020133510896&w=2 OpenBSD commit message: Use a 32 bit variable to detect integer overflow when searching for an unused nat port. Prevents a possible endless loop if high port is 65535 or low port is 0. report and analysis Jingmin Zhou; OK sashan@ visa@ Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c PR: 221201 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: OpenBSD via ElectroBSD MFC after: 1 week	2017-08-08 21:09:26 +00:00
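
The endless loop described in the pf_get_sport() entry above (7f3ad01804) can be illustrated with a minimal sketch. This is not the pf code: `find_free_port`, `port_free_fn`, and the two sample predicates are hypothetical names, and only the idea of using a 32-bit counter comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical predicate type: returns nonzero when the port is unused. */
typedef int (*port_free_fn)(uint16_t port);

/* Sample predicates, for illustration only. */
int no_port_free(uint16_t port) { (void)port; return 0; }
int only_last_port_free(uint16_t port) { return port == 65535; }

/*
 * Scan [low, high] upward for an unused port.
 * With a uint16_t counter and high == 65535, "cur <= high" would always
 * hold, because cur wraps from 65535 back to 0. A 32-bit counter reaches
 * 65536 instead, which ends the loop.
 * Returns the port found, or -1 when every port in the range is in use.
 */
int32_t find_free_port(uint16_t low, uint16_t high, port_free_fn is_free)
{
	uint32_t cur;

	assert(low <= high);
	for (cur = low; cur <= (uint32_t)high; cur++) {
		if (is_free((uint16_t)cur))
			return (int32_t)cur;
	}
	return -1;
}
```

Calling `find_free_port(65530, 65535, no_port_free)` terminates and reports that the range is exhausted. The same loop with a 16-bit counter would never leave the range.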
Luiz Otavio O Souza	9ffd0f54a7	Fix a couple of typos in a comment. MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate)	2017-07-21 03:04:55 +00:00
Philip Paeps	b0e1660d53	Fix GRE over IPv6 tunnels with IPFW Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless net.inet6.ip6.fw.deny_unknown_exthdrs was unset). PR: 220640 Submitted by: Kun Xie <kxie@xiplink.com> MFC after: 1 week	2017-07-13 09:01:22 +00:00
Kristof Provost	b7ae43552b	pf: Fix vnet purging pf_purge_thread() breaks up the work of iterating all states (in pf_purge_expired_states()) and tracks progress in the idx variable. If multiple vnets exist this results in pf_purge_thread() only calling pf_purge_expired_states() for part of the states (the first part of the first vnet, second part of the second vnet and so on). Combined with the mark-and-sweep approach to cleaning up old rules (in V_pf_unlinked_rules) that resulted in pf freeing rules that were still referenced by states. This in turn caused panics when pf_state_expires() encounters that state and attempts to access the rule. We need to track the progress per vnet, not globally, so idx is moved into a per-vnet V_pf_purge_idx. PR: 219251 Sponsored by: Hackathon Essen 2017	2017-07-09 17:56:39 +00:00
Andrey V. Elsukov	785c0d4d97	Fix IPv6 extension header parsing. The length field doesn't include the first 8 octets. Obtained from: Yandex LLC MFC after: 3 days	2017-06-29 19:06:43 +00:00
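
The length rule mentioned in the entry above (785c0d4d97) can be written out as a small sketch. It assumes the generic IPv6 extension header layout used by the hop-by-hop, routing, and destination options headers, where the length field counts 8-octet units and excludes the first 8 octets. The function name `ip6_ext_hdr_bytes` is hypothetical and does not come from the ipfw sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Total size in bytes of an IPv6 extension header whose length field
 * holds hdr_ext_len. The field is in 8-octet units and does not count
 * the first 8 octets, so a value of 0 still means an 8-byte header.
 */
size_t ip6_ext_hdr_bytes(uint8_t hdr_ext_len)
{
	return ((size_t)hdr_ext_len + 1) * 8;
}
```

A parser that computes `hdr_ext_len * 8` instead would stop 8 bytes short of the next header.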
Don Lewis	d196c9ee16	Fix the queue delay estimation in PIE/FQ-PIE when the timestamp (TS) method is used. When packet timestamp is used, the "current_qdelay" keeps storing the last queue delay value calculated in the dequeue function. Therefore, when a burst of packets arrives followed by a pause, the "current_qdelay" will store a high value caused by the burst and stick to that value during the pause because the queue delay measurement is done inside the dequeue function. This causes the drop probability calculation function to calculate high drop probability value instead of zero and prevents the burst allowance mechanism from working properly. Fix this problem by resetting "current_qdelay" inside the drop probability calculation function when the queue length is zero and TS option is used. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-05-19 08:38:03 +00:00
Don Lewis	36fb8be630	The result of right shifting a negative signed value is implementation defined. On machines without arithmetic shift instructions, zero bits may be shifted in from the left, giving a large positive result instead of the desired divide-by power-of-2. Fix this by operating on the absolute value and compensating for the possible negation later. Reverse the order of the underflow/overflow tests and the exponential decay calculation to avoid the possibility of an erroneous overflow detection if p is a sufficiently small non-negative value. Also check for negative values of prob before doing the exponential decay to avoid another instance of of right shifting a negative value. Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-05-19 01:23:06 +00:00
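
The approach named in the entry above (36fb8be630), operating on the absolute value and restoring the sign afterwards, might look like the following sketch. The helper `div_pow2` is a hypothetical name, and the restriction on `shift` is an assumption made here to keep the example free of overflow. It is not the PIE code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide v by 2^shift without ever right-shifting a negative value.
 * The magnitude is shifted as an unsigned quantity, then the sign is
 * reapplied. The result rounds toward zero.
 * Assumption for this sketch: 1 <= shift <= 63.
 */
int64_t div_pow2(int64_t v, unsigned shift)
{
	int negative = v < 0;
	uint64_t magnitude;

	assert(shift >= 1 && shift < 64);
	/* Unsigned negation is well defined, even for INT64_MIN. */
	magnitude = negative ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
	magnitude >>= shift;
	return negative ? -(int64_t)magnitude : (int64_t)magnitude;
}
```

Because the shift is applied to an unsigned magnitude, the result does not depend on whether the platform implements signed right shifts arithmetically.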
Kristof Provost	468cefa22e	pf: Fix vnet initialisation When running the vnet init code (pf_load_vnet()) we used to iterate over all vnets, marking them as unhooked. This is incorrect and leads to panics if pf is unloaded, as the unload code does not unregister the pfil hooks (because the vnet is marked as unhooked). There's no need or reason to touch other vnets during initialisation. Their pf_load_vnet() function will be triggered, which handles all required initialisation. Reviewed by: zec, gnn Differential Revision: https://reviews.freebsd.org/D10592	2017-05-07 14:33:58 +00:00
Kristof Provost	64c79ee733	pf: Fix panic on unload vnet_pf_uninit() is called through vnet_deregister_sysuninit() and linker_file_unload() when the pf module is unloaded. This is executed after pf_unload() so we end up trying to take locks which have been destroyed already. Move pf_unload() to a separate SYSUNINIT() to ensure it's called after all the vnet_pf_uninit() calls. Differential Revision: https://reviews.freebsd.org/D10025	2017-05-03 20:56:54 +00:00
Marko Zec	1e9e374199	Fix VNET leakages in PF by V_irtualizing pfr_ktables and friends. Apparently this resolves a PF-triggered panic when destroying VNET jails. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:34:39 +00:00
Marko Zec	3a36ee404f	Since curvnet is already properly set on entry to event handlers, there's no need to override it, particularly not unconditionally with vnet0. Submitted by: Peter Blok <peter.blok@bsd4all.org> Reviewed by: kp	2017-04-25 08:30:28 +00:00
Kristof Provost	00eab743ab	pf: Fix possible incorrect IPv6 fragmentation When forwarding pf tracks the size of the largest fragment in a fragmented packet, and refragments based on this size. It failed to ensure that this size was a multiple of 8 (as is required for all but the last fragment), so it could end up generating incorrect fragments. For example, if we received an 8 byte and 12 byte fragment pf would emit a first fragment with 12 bytes of payload and the final fragment would claim to be at offset 8 (not 12). We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so other users won't make the same mistake. Reported by: Antonios Atlasis <aatlasis at secfu net> MFC after: 3 days	2017-04-20 09:05:53 +00:00
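
The constraint in the entry above (00eab743ab) follows from the IPv6 fragment offset being expressed in 8-octet units. The sketch below shows one way to enforce it, by rounding the tracked size down. The commit message does not say how pf adjusts the size, so the rounding and both function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Payload size to use for every fragment except the last one:
 * the largest size seen, rounded down to a multiple of 8.
 */
uint32_t frag_payload_size(uint32_t largest_seen)
{
	return largest_seen & ~(uint32_t)7;
}

/*
 * Convert a byte offset into the 8-octet units stored in the fragment
 * header. An offset that is not a multiple of 8 cannot be represented,
 * which is why non-final fragments must carry a multiple of 8 bytes.
 */
uint16_t fragment_offset_units(uint32_t byte_offset)
{
	assert(byte_offset % 8 == 0);
	return (uint16_t)(byte_offset / 8);
}
```

With the 8-byte and 12-byte fragments from the commit message, the tracked size of 12 becomes 8, so the second fragment really starts at offset 8.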
Kristof Provost	4e261006a1	pf: Also clear limit counters The "pfctl -F info" command didn't clear the limit counters ( as shown in the "pfctl -vsi" output). Submitted by: Max <maximos@als.nnov.ru>	2017-04-18 20:07:21 +00:00
Andrey V. Elsukov	da62ffd9cd	Avoid undefined behavior. The 'pktid' variable is modified while being used twice between sequence points, probably due to htonl() is macro. Reported by: PVS-Studio MFC after: 1 week	2017-04-14 11:58:41 +00:00
Andrey V. Elsukov	ba3e1361b0	Use address of specific union member instead of whole union address to fix PVS-Studio warnings. MFC after: 1 week	2017-04-14 11:41:09 +00:00
Andrey V. Elsukov	1ca7c3b815	The rule field in the ipfw_dyn_rule structure is used as storage to pass rule number and rule set to userland. In r272840 the kernel internal rule representation was changed and the rulenum field of struct ip_fw_rule got the type uint32_t, but userlevel representation still have the type uint16_t. To not overflow the size of pointer on the systems with 32-bit pointer size use separate variable to copy rulenum and set. Reported by: PVS-Studio MFC after: 1 week	2017-04-14 11:19:09 +00:00
Gleb Smirnoff	9f5efe718f	Fix potential NULL deref. Found by: PVS Studio	2017-04-14 01:56:15 +00:00
Maxim Konovalov	f91eb6adad	o Redundant assignments removed. Found by: PVS-Stdio, V519 Reviewed by: ae	2017-04-13 18:13:10 +00:00
Conrad Meyer	bcd8d3b805	dummynet: Use strlcpy to appease static checkers Some dummynet modules used strcpy() to copy from a larger buffer (dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that the lengths of the strings in the dn_aqm buffers were always hardcoded to be smaller than the dn_extra_parms buffer ("CODEL", "PIE"). Use strlcpy() instead, to appease static checkers. No functional change. Reported by: Coverity CIDs: 1356163, 1356165 Sponsored by: Dell EMC Isilon	2017-04-13 17:47:44 +00:00
Andrey V. Elsukov	88d950a650	Remove "IPFW static rules" rmlock. Make PFIL's lock global and use it for this purpose. This reduces the number of locks needed to acquire for each packet. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC No objection from: #network Differential Revision: https://reviews.freebsd.org/D10154	2017-04-03 13:35:04 +00:00
Andrey V. Elsukov	aac74aeac7	Add ipfw_pmod kernel module. The module is designed for modification of a packets of any protocols. For now it implements only TCP MSS modification. It adds the external action handler for "tcp-setmss" action. A rule with tcp-setmss action does additional check for protocol and TCP flags. If SYN flag is present, it parses TCP options and modifies MSS option if its value is greater than configured value in the rule. Then it adjustes TCP checksum if needed. After handling the search continues with the next rule. Obtained from: Yandex LLC MFC after: 2 weeks Relnotes: yes Sponsored by: Yandex LLC No objection from: #network Differential Revision: https://reviews.freebsd.org/D10150	2017-04-03 03:07:48 +00:00
Andrey V. Elsukov	11c56650f0	Add O_EXTERNAL_DATA opcode support. This opcode can be used to attach some data to external action opcode. And unlike to O_EXTERNAL_INSTANCE opcode, this opcode does not require creating of named instance to pass configuration arguments to external action handler. The data is coming just next to O_EXTERNAL_ACTION opcode. The userlevel part currenly supports formatting for opcode with ipfw_insn size, by default it expects u16 numeric value in the arg1. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:44:40 +00:00
Andrey V. Elsukov	399ad57874	Add the log formatting for an external action opcode. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2017-04-03 02:26:30 +00:00
Kristof Provost	3601d25181	pf: Fix leak of pf_state_keys If we hit the state limit we returned from pf_create_state() without cleaning up. PR: 217997 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2017-04-01 12:22:34 +00:00
Andrey V. Elsukov	788e62864f	Reset the cached state of last lookup in the dynamic states when an external action is completed, but the rule search is continued. External action handler can change the content of @args argument, that is used for dynamic state lookup. Enforce the new lookup to be able install new state, when the search is continued. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2017-03-31 09:26:08 +00:00
Kristof Provost	2f8fb3a868	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep, Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
Kristof Provost	08ef4ddb0f	pf: Fix rule evaluation after inet6 route-to In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out of a different interface. pf_test6() needs to know that the packet was forwarded (in case it needs to refragment so it knows whether to call ip6_output() or ip6_forward()). This lead pf_test6() to try to evaluate rules against the PF_FWD direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT. Once fwdir is set correctly the correct output/forward function will be called. PR: 217883 Submitted by: Kajetan Staszkiewicz MFC after: 1 week Sponsored by: InnoGames GmbH	2017-03-19 03:06:09 +00:00
Don Lewis	46c8aadb6f	Change several constants used by the PIE algorithm from unsigned to signed. - PIE_MAX_PROB is compared to variable of int64_t and the type promotion rules can cause the value of that variable to be treated as unsigned. If the value is actually negative, then the result of the comparsion is incorrect, causing the algorithm to perform poorly in some situations. Changing the constant to be signed cause the comparision to work correctly. - PIE_SCALE is also compared to signed values. Fortunately they are also compared to zero and negative values are discarded so this is more of a cosmetic fix. - PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small enough that the automatic promotion to unsigned is harmless. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-03-18 23:00:13 +00:00
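
The comparison problem described in the entry above (46c8aadb6f) can be reproduced in isolation. The constants below are hypothetical stand-ins, not the real PIE_MAX_PROB definition. The explicit cast in the first function spells out what the usual arithmetic conversions do implicitly when an int64_t is compared with an unsigned 64-bit constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; only their signedness matters here. */
#define MAX_PROB_UNSIGNED (1ULL << 47)
#define MAX_PROB_SIGNED   (1LL << 47)

/*
 * Comparison against an unsigned constant: p is converted to unsigned,
 * so a negative p turns into a huge positive value and "exceeds" the
 * maximum.
 */
int exceeds_max_unsigned(int64_t p)
{
	return (uint64_t)p > MAX_PROB_UNSIGNED;
}

/* Comparison against a signed constant: negative p stays negative. */
int exceeds_max_signed(int64_t p)
{
	return p > MAX_PROB_SIGNED;
}
```

For `p == -1` the first function reports that the maximum is exceeded and the second does not, which is the difference the commit relies on.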
Kristof Provost	5c172e7059	pf: Fix memory leak on vnet shutdown or unload Rules are unlinked in shutdown_pf(), so we must call pf_unload_vnet_purge(), which frees unlinked rules, after that, not before. Reviewed by: eri, bz Differential Revision: https://reviews.freebsd.org/D10040	2017-03-18 01:37:20 +00:00
Andrey V. Elsukov	3667f39ea3	Use memset with structure size.	2017-03-14 07:57:33 +00:00
Conrad Meyer	49b6a5d60a	nat64lsn: Use memset() with structure, not pointer, size PR: 217738 Submitted by: Svyatoslav <razmyslov at viva64.com> Sponsored by: Viva64 (PVS-Studio)	2017-03-13 17:53:46 +00:00
Kristof Provost	2a57d24bd1	pf: Fix incorrect rw_sleep() in pf_unload() When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep() with it, because it would release a lock we do not hold. There's no need for the lock either, so we can just tsleep(). While here also make the same change in pf_purge_thread(), because it explicitly takes the lock before rw_sleep() and then immediately releases it afterwards.	2017-03-12 05:42:57 +00:00
Kristof Provost	f618201314	pf: Do not lose the VNET lock when ending the purge thread When the pf_purge_thread() exits it must make sure to release the VNET_LIST_RLOCK it still holds. kproc_exit() does not return.	2017-03-12 05:00:04 +00:00
Maxim Konovalov	f621c2cd39	o Typo in the comment fixed. PR: 217617 Submitted by: lutz	2017-03-09 09:54:23 +00:00
Kristof Provost	98a9874f7b	pf: Fix a crash in low-memory situations If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's no more memory for it) it frees skp. This is wrong, because skp is a pf_state_key **, so we need to free *skp, as is done later in the function. Getting it wrong means we try to free a stack variable of the calling pf_test_rule() function, and we panic.	2017-03-06 23:41:23 +00:00
Andrey V. Elsukov	53de37f8ca	Fix the build. Use new ipfw_lookup_table() in the nat64 too. Reported by: cy MFC after: 2 weeks	2017-03-06 00:41:59 +00:00
Andrey V. Elsukov	54e5669d8c	Add IPv6 support to O_IP_DST_LOOKUP opcode. o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of ipfw_insn_u32; o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and remove old ipfw_lookup_table(); o use args->f_id.flow_id6 that is in host byte order to get DSCP value; o add SCTP ports support to 'lookup src/dst-port' opcode; o add IPv6 support to 'lookup src/dst-ip' opcode. PR: 217292 Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9873	2017-03-05 23:48:24 +00:00
Andrey V. Elsukov	c750a56914	Reject invalid object types that can not be used with specific opcodes. When we doing reference counting of named objects in the new rule, for existing objects check that opcode references to correct object, otherwise return EINVAL. PR: 217391 MFC after: 1 week Sponsored by: Yandex LLC	2017-03-05 22:19:43 +00:00
Andrey V. Elsukov	43b294a4db	Fix matching table entry value. Use real table value instead of its index in valuestate array. When opcode has size equal to ipfw_insn_u32, this means that it should additionally match value specified in d[0] with table entry value. ipfw_table_lookup() returns table value index, use TARG_VAL() macro to convert it to its value. The actual 32-bit value stored in the tag field of table_value structure, where all unspecified u32 values are kept. PR: 217262 Reviewed by: melifaro MFC after: 1 week Sponsored by: Yandex LLC	2017-03-03 20:22:42 +00:00
Andrey V. Elsukov	576429f04b	Fix NPTv6 rule counters when one_pass is not enabled. Consider the rule matching when both @done and @retval values returned from ipfw_run_eaction() are zero. And modify ipfw_nptv6() to return IP_FW_DENY and @done=0 when addresses do not match. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2017-03-01 20:00:19 +00:00
Pedro F. Giffuni	e099b90b80	sys: Replace zero with NULL for pointers. Found with: devel/coccinelle MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9694	2017-02-22 02:35:59 +00:00
Eric van Gyzen	8144690af4	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
Eric van Gyzen	643faabe0d	pf: use inet_ntoa_r() instead of inet_ntoa(); maybe fix IPv6 OS fingerprinting inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. This code had an INET6 conditional before this commit, but opt_inet6.h was not included, so INET6 was never defined. Apparently, pf's OS fingerprinting hasn't worked with IPv6 for quite some time. This commit might fix it, but I didn't test that. Reviewed by: gnn, kp MFC after: 2 weeks Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6) Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:44:44 +00:00
Enji Cooper	bc64f428ad	Fix typos in comments (returing -> returning) MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-02-07 00:09:48 +00:00
Gleb Smirnoff	164aa3ce5e	Fix indentantion in pf_purge_thread(). No functional change.	2017-01-30 22:47:48 +00:00
Luiz Otavio O Souza	a5c1a50a26	Do not run the pf purge thread while the VNET variables are not initialized, this can cause a divide by zero (if the VNET initialization takes to long to complete). Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-29 02:17:52 +00:00
Andrey V. Elsukov	ce3a6cf06a	Initialize IPFW static rules rmlock with RM_RECURSE flag. This lock was replaced from rwlock in r272840. But unlike rwlock, rmlock doesn't allow recursion on rm_rlock(), so at this time fix this with RM_RECURSE flag. Later we need to change ipfw to avoid such recursions. PR: 216171 Reported by: Eugene Grosbein MFC after: 1 week	2017-01-17 10:50:28 +00:00
Marius Strobl	0ac43d9728	In dummynet(4), random chunks of memory are casted to struct dn_*, potentially leading to fatal unaligned accesses on architectures with strict alignment requirements. This change fixes dummynet(4) as far as accesses to 64-bit members of struct dn_* are concerned, tripping up on sparc64 with accesses to 32-bit members happening to be correctly aligned there. In other words, this only fixes the tip of the iceberg; larger parts of dummynet(4) still need to be rewritten in order to properly work on all of !x86. In principle, considering the amount of code in dummynet(4) that needs this erroneous pattern corrected, an acceptable workaround would be to declare all struct dn_* packed, forcing compilers to do byte-accesses as a side-effect. However, given that the structs in question aren't laid out well either, this would break ABI/KBI. While at it, replace all existing bcopy(9) calls with memcpy(9) for performance reasons, as there is no need to check for overlap in these cases. PR: 189219 MFC after: 5 days	2017-01-09 20:51:51 +00:00
Marcel Moolenaar	aa8c6a6dca	Improve upon r309394 Instead of taking an extra reference to deal with pfsync_q_ins() and pfsync_q_del() taken and dropping a reference (resp,) make it optional of those functions to take or drop a reference by passing an extra argument. Submitted by: glebius@	2016-12-10 03:31:38 +00:00
Gleb Smirnoff	296d65b7a9	Backout accidentially leaked in r309746 not yet reviewed patch :(	2016-12-09 18:00:45 +00:00
Gleb Smirnoff	3cbee8caa1	Use counter_ratecheck() in the ICMP rate limiting. Together with: rrs, jtl	2016-12-09 17:59:15 +00:00
Andrey V. Elsukov	02784f106e	Convert result of hash_packet6() into host byte order. For IPv4 similar function uses addresses and ports in host byte order, but for IPv6 it used network byte order. This led to very bad hash distribution for IPv6 flows. Now the result looks similar to IPv4. Reported by: olivier MFC after: 1 week Sponsored by: Yandex LLC	2016-12-06 23:52:56 +00:00
Kristof Provost	c3e14afc18	pflog: Correctly initialise subrulenr subrulenr is considered unset if it's set to -1, not if it's set to 1. See contrib/tcpdump/print-pflog.c pflog_print() for a user. This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0): rule 0..16777216(match) instead of the correct output of rule 0/0(match) PR: 214832 Submitted by: andywhite@gmail.com	2016-12-05 21:52:10 +00:00
Marcel Moolenaar	d6d35f1561	Fix use-after-free bugs in pfsync(4) Use after free happens for state that is deleted. The reference count is what prevents the state from being freed. When the state is dequeued, the reference count is dropped and the memory freed. We can't dereference the next pointer or re-queue the state. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D8671	2016-12-02 06:15:59 +00:00
Andrey V. Elsukov	c5f2dbb625	Fix ICMPv6 Time Exceeded error message translation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-11-26 10:04:05 +00:00
Luiz Otavio O Souza	e40145851b	Remove the mbuf tag after use (for reinjected packets). Fixes the packet processing in dummynet l2 rules. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-11-03 00:26:58 +00:00
Luiz Otavio O Souza	3e80a649fb	Stop abusing from struct ifnet presence to determine the packet direction for dummynet, use the correct argument for that, remove the false coment about the presence of struct ifnet. Fixes the input match of dummynet l2 rules. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-11-01 18:42:44 +00:00
Andrey V. Elsukov	308f2c6d56	Fix `ipfw table lookup` handler to return entry value, but not its index. Submitted by: loos MFC after: 1 week	2016-10-19 11:51:17 +00:00
Kristof Provost	1f4955785d	pf: port extended DSCP support from OpenBSD Ignore the ECN bits on 'tos' and 'set-tos' and allow to use DCSP names instead of having to embed their TOS equivalents as plain numbers. Obtained from: OpenBSD Sponsored by: OPNsense Differential Revision: https://reviews.freebsd.org/D8165	2016-10-13 20:34:44 +00:00

794 Commits