freebsd-dev

Author	SHA1	Message	Date
Kristof Provost	2f8fb3a868	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep. Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
Kristof Provost	08ef4ddb0f	pf: Fix rule evaluation after inet6 route-to In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out of a different interface. pf_test6() needs to know that the packet was forwarded (in case it needs to refragment so it knows whether to call ip6_output() or ip6_forward()). This led pf_test6() to try to evaluate rules against the PF_FWD direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT. Once fwdir is set correctly the correct output/forward function will be called. PR: 217883 Submitted by: Kajetan Staszkiewicz MFC after: 1 week Sponsored by: InnoGames GmbH	2017-03-19 03:06:09 +00:00
Don Lewis	46c8aadb6f	Change several constants used by the PIE algorithm from unsigned to signed. - PIE_MAX_PROB is compared to a variable of type int64_t and the type promotion rules can cause the value of that variable to be treated as unsigned. If the value is actually negative, then the result of the comparison is incorrect, causing the algorithm to perform poorly in some situations. Changing the constant to be signed causes the comparison to work correctly. - PIE_SCALE is also compared to signed values. Fortunately they are also compared to zero and negative values are discarded so this is more of a cosmetic fix. - PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small enough that the automatic promotion to unsigned is harmless. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 1 week	2017-03-18 23:00:13 +00:00
Kristof Provost	5c172e7059	pf: Fix memory leak on vnet shutdown or unload Rules are unlinked in shutdown_pf(), so we must call pf_unload_vnet_purge(), which frees unlinked rules, after that, not before. Reviewed by: eri, bz Differential Revision: https://reviews.freebsd.org/D10040	2017-03-18 01:37:20 +00:00
Andrey V. Elsukov	3667f39ea3	Use memset with structure size.	2017-03-14 07:57:33 +00:00
Conrad Meyer	49b6a5d60a	nat64lsn: Use memset() with structure, not pointer, size PR: 217738 Submitted by: Svyatoslav <razmyslov at viva64.com> Sponsored by: Viva64 (PVS-Studio)	2017-03-13 17:53:46 +00:00
Kristof Provost	2a57d24bd1	pf: Fix incorrect rw_sleep() in pf_unload() When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep() with it, because it would release a lock we do not hold. There's no need for the lock either, so we can just tsleep(). While here also make the same change in pf_purge_thread(), because it explicitly takes the lock before rw_sleep() and then immediately releases it afterwards.	2017-03-12 05:42:57 +00:00
Kristof Provost	f618201314	pf: Do not lose the VNET lock when ending the purge thread When the pf_purge_thread() exits it must make sure to release the VNET_LIST_RLOCK it still holds. kproc_exit() does not return.	2017-03-12 05:00:04 +00:00
Maxim Konovalov	f621c2cd39	o Typo in the comment fixed. PR: 217617 Submitted by: lutz	2017-03-09 09:54:23 +00:00
Kristof Provost	98a9874f7b	pf: Fix a crash in low-memory situations If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's no more memory for it) it frees skp. This is wrong, because skp is a pf_state_key **, so we need to free *skp, as is done later in the function. Getting it wrong means we try to free a stack variable of the calling pf_test_rule() function, and we panic.	2017-03-06 23:41:23 +00:00
Andrey V. Elsukov	53de37f8ca	Fix the build. Use new ipfw_lookup_table() in the nat64 too. Reported by: cy MFC after: 2 weeks	2017-03-06 00:41:59 +00:00
Andrey V. Elsukov	54e5669d8c	Add IPv6 support to O_IP_DST_LOOKUP opcode. o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of ipfw_insn_u32; o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and remove old ipfw_lookup_table(); o use args->f_id.flow_id6 that is in host byte order to get DSCP value; o add SCTP ports support to 'lookup src/dst-port' opcode; o add IPv6 support to 'lookup src/dst-ip' opcode. PR: 217292 Reviewed by: melifaro MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9873	2017-03-05 23:48:24 +00:00
Andrey V. Elsukov	c750a56914	Reject invalid object types that can not be used with specific opcodes. When doing reference counting of named objects in a new rule, check for existing objects that the opcode references the correct object type; otherwise return EINVAL. PR: 217391 MFC after: 1 week Sponsored by: Yandex LLC	2017-03-05 22:19:43 +00:00
Andrey V. Elsukov	43b294a4db	Fix matching table entry value. Use real table value instead of its index in valuestate array. When opcode has size equal to ipfw_insn_u32, this means that it should additionally match value specified in d[0] with table entry value. ipfw_table_lookup() returns table value index, use TARG_VAL() macro to convert it to its value. The actual 32-bit value stored in the tag field of table_value structure, where all unspecified u32 values are kept. PR: 217262 Reviewed by: melifaro MFC after: 1 week Sponsored by: Yandex LLC	2017-03-03 20:22:42 +00:00
Andrey V. Elsukov	576429f04b	Fix NPTv6 rule counters when one_pass is not enabled. Consider the rule matching when both @done and @retval values returned from ipfw_run_eaction() are zero. And modify ipfw_nptv6() to return IP_FW_DENY and @done=0 when addresses do not match. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2017-03-01 20:00:19 +00:00
Pedro F. Giffuni	e099b90b80	sys: Replace zero with NULL for pointers. Found with: devel/coccinelle MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9694	2017-02-22 02:35:59 +00:00
Eric van Gyzen	8144690af4	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
Eric van Gyzen	643faabe0d	pf: use inet_ntoa_r() instead of inet_ntoa(); maybe fix IPv6 OS fingerprinting inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. This code had an INET6 conditional before this commit, but opt_inet6.h was not included, so INET6 was never defined. Apparently, pf's OS fingerprinting hasn't worked with IPv6 for quite some time. This commit might fix it, but I didn't test that. Reviewed by: gnn, kp MFC after: 2 weeks Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6) Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:44:44 +00:00
Enji Cooper	bc64f428ad	Fix typos in comments (returing -> returning) MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-02-07 00:09:48 +00:00
Gleb Smirnoff	164aa3ce5e	Fix indentation in pf_purge_thread(). No functional change.	2017-01-30 22:47:48 +00:00
Luiz Otavio O Souza	a5c1a50a26	Do not run the pf purge thread while the VNET variables are not initialized, this can cause a divide by zero (if the VNET initialization takes too long to complete). Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-29 02:17:52 +00:00
Andrey V. Elsukov	ce3a6cf06a	Initialize IPFW static rules rmlock with RM_RECURSE flag. This lock was replaced from rwlock in r272840. But unlike rwlock, rmlock doesn't allow recursion on rm_rlock(), so at this time fix this with RM_RECURSE flag. Later we need to change ipfw to avoid such recursions. PR: 216171 Reported by: Eugene Grosbein MFC after: 1 week	2017-01-17 10:50:28 +00:00
Marius Strobl	0ac43d9728	In dummynet(4), random chunks of memory are cast to struct dn_*, potentially leading to fatal unaligned accesses on architectures with strict alignment requirements. This change fixes dummynet(4) as far as accesses to 64-bit members of struct dn_* are concerned, tripping up on sparc64 with accesses to 32-bit members happening to be correctly aligned there. In other words, this only fixes the tip of the iceberg; larger parts of dummynet(4) still need to be rewritten in order to properly work on all of !x86. In principle, considering the amount of code in dummynet(4) that needs this erroneous pattern corrected, an acceptable workaround would be to declare all struct dn_* packed, forcing compilers to do byte-accesses as a side-effect. However, given that the structs in question aren't laid out well either, this would break ABI/KBI. While at it, replace all existing bcopy(9) calls with memcpy(9) for performance reasons, as there is no need to check for overlap in these cases. PR: 189219 MFC after: 5 days	2017-01-09 20:51:51 +00:00
Marcel Moolenaar	aa8c6a6dca	Improve upon r309394 Instead of taking an extra reference to deal with pfsync_q_ins() and pfsync_q_del() taking and dropping a reference (resp.), make it optional for those functions to take or drop a reference by passing an extra argument. Submitted by: glebius@	2016-12-10 03:31:38 +00:00
Gleb Smirnoff	296d65b7a9	Back out the not-yet-reviewed patch that accidentally leaked in r309746 :(	2016-12-09 18:00:45 +00:00
Gleb Smirnoff	3cbee8caa1	Use counter_ratecheck() in the ICMP rate limiting. Together with: rrs, jtl	2016-12-09 17:59:15 +00:00
Andrey V. Elsukov	02784f106e	Convert result of hash_packet6() into host byte order. For IPv4 similar function uses addresses and ports in host byte order, but for IPv6 it used network byte order. This led to very bad hash distribution for IPv6 flows. Now the result looks similar to IPv4. Reported by: olivier MFC after: 1 week Sponsored by: Yandex LLC	2016-12-06 23:52:56 +00:00
Kristof Provost	c3e14afc18	pflog: Correctly initialise subrulenr subrulenr is considered unset if it's set to -1, not if it's set to 1. See contrib/tcpdump/print-pflog.c pflog_print() for a user. This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0): rule 0..16777216(match) instead of the correct output of rule 0/0(match) PR: 214832 Submitted by: andywhite@gmail.com	2016-12-05 21:52:10 +00:00
Marcel Moolenaar	d6d35f1561	Fix use-after-free bugs in pfsync(4) Use after free happens for state that is deleted. The reference count is what prevents the state from being freed. When the state is dequeued, the reference count is dropped and the memory freed. We can't dereference the next pointer or re-queue the state. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D8671	2016-12-02 06:15:59 +00:00
Andrey V. Elsukov	c5f2dbb625	Fix ICMPv6 Time Exceeded error message translation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-11-26 10:04:05 +00:00
Luiz Otavio O Souza	e40145851b	Remove the mbuf tag after use (for reinjected packets). Fixes the packet processing in dummynet l2 rules. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-11-03 00:26:58 +00:00
Luiz Otavio O Souza	3e80a649fb	Stop abusing the presence of struct ifnet to determine the packet direction for dummynet; use the correct argument for that, and remove the false comment about the presence of struct ifnet. Fixes the input match of dummynet l2 rules. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-11-01 18:42:44 +00:00
Andrey V. Elsukov	308f2c6d56	Fix `ipfw table lookup` handler to return entry value, but not its index. Submitted by: loos MFC after: 1 week	2016-10-19 11:51:17 +00:00
Kristof Provost	1f4955785d	pf: port extended DSCP support from OpenBSD Ignore the ECN bits on 'tos' and 'set-tos' and allow the use of DSCP names instead of having to embed their TOS equivalents as plain numbers. Obtained from: OpenBSD Sponsored by: OPNsense Differential Revision: https://reviews.freebsd.org/D8165	2016-10-13 20:34:44 +00:00
Kristof Provost	813196a11a	pf: remove fastroute tag The tag fastroute came from ipf and was removed in OpenBSD in 2011. The code allows to skip the in pfil hooks and completely removes the out pfil invoke, albeit looking up a route that the IP stack will likely find on its own. The code between IPv4 and IPv6 is also inconsistent and marked as "XXX" for years. Submitted by: Franco Fichtner <franco@opnsense.org> Differential Revision: https://reviews.freebsd.org/D8058	2016-10-04 19:35:14 +00:00
Kevin Lo	c7641cd18d	Remove ifa_list, use ifa_link (structure field) instead. While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in sctp_bsd_addr.c Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8051	2016-09-28 13:29:11 +00:00
Andrey V. Elsukov	0d9cbb874c	Move opcode rewriter init and destroy handlers into non-VNET code. PR: 212576,212649,212077 Submitted by: John Zielinski MFC after: 1 week	2016-09-18 17:35:17 +00:00
Andrey V. Elsukov	70c1466dad	Fix swap tables between sets when this functional is enabled. We have 6 opcode rewriters for table opcodes. When `set swap' command invoked, it is called for each rewriter, so at the end we get the same result, because opcode rewriter uses ETLV type to match opcode. And all tables opcodes have the same ETLV type. To solve this problem, use separate sets handler for one opcode rewriter. Use it to handle TEST_ALL, SWAP_ALL and MOVE_ALL commands. PR: 212630 MFC after: 1 week	2016-09-13 18:16:15 +00:00
Bjoern A. Zeeb	db68f7839f	Try to fix gcc compilation errors (which are right). nat64_getlasthdr() returns an int, which can be -1 in case of error, storing the result in an uint8_t and then comparing to < 0 is not helpful. Do what is done in the rest of the code and make proto an int here as well.	2016-08-18 10:26:15 +00:00
Oleg Bulyzhin	e7560c836f	Fix command: ipfw set (enable|disable) N (where N > 4). enable_sets() expects set bitmasks, not set numbers. MFC after: 3 days	2016-08-15 13:06:29 +00:00
Kristof Provost	0df377cbb8	pf: Add missing byte-order swap to pf_match_addr_range Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not match addresses correctly on little-endian systems. PR: 211796 Obtained from: OpenBSD (sthen) MFC after: 3 days	2016-08-15 12:13:14 +00:00
Andrey V. Elsukov	ecd3637584	Use %ju to print unsigned 64-bit value. Reported by: kib	2016-08-13 22:14:16 +00:00
Andrey V. Elsukov	57fb3b7a78	Add `stats reset` command implementation to NPTv6 module to be able to reset statistics counters. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-08-13 16:45:14 +00:00
Andrey V. Elsukov	c402a01b03	Replace __noinline with special debug macro NAT64NOINLINE.	2016-08-13 16:26:15 +00:00
Andrey V. Elsukov	d8caf56e9e	Add ipfw_nat64 module that implements stateless and stateful NAT64. The module works together with ipfw(4) and is implemented as its external action module. Stateless NAT64 registers external action with name nat64stl. This keyword should be used to create NAT64 instance and to address this instance in rules. Stateless NAT64 uses two lookup tables with mapped IPv4->IPv6 and IPv6->IPv4 addresses to perform translation. A configuration of instance should look like this: 1. Create lookup tables: # ipfw table T46 create type addr valtype ipv6 # ipfw table T64 create type addr valtype ipv4 2. Fill T46 and T64 tables. 3. Add rule to allow neighbor solicitation and advertisement: # ipfw add allow icmp6 from any to any icmp6types 135,136 4. Create NAT64 instance: # ipfw nat64stl NAT create table4 T46 table6 T64 5. Add rules that match the traffic: # ipfw add nat64stl NAT ip from any to table(T46) # ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96 6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96 via NAT64 host. Stateful NAT64 registers external action with name nat64lsn. The only option required to create a nat64lsn instance is prefix4. It defines the pool of IPv4 addresses used for translation. A configuration of instance should look like this: 1. Add rule to allow neighbor solicitation and advertisement: # ipfw add allow icmp6 from any to any icmp6types 135,136 2. Create NAT64 instance: # ipfw nat64lsn NAT create prefix4 A.B.C.D/28 3. Add rules that match the traffic: # ipfw add nat64lsn NAT ip from any to A.B.C.D/28 # ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96 4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96 via NAT64 host. Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D6434	2016-08-13 16:09:49 +00:00
Andrey V. Elsukov	6951cecf71	Add three helper functions to manage tables from external modules. ipfw_objhash_lookup_table_kidx looks up the kernel index of a table; ipfw_ref_table/ipfw_unref_table take and release a reference to a table. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-08-13 15:48:56 +00:00
Andrey V. Elsukov	56132dcc0d	Move logging via BPF support into separate file. * make interface cloner VNET-aware; * simplify cloner code and use if_clone_simple(); * migrate LOGIF_LOCK() to rmlock; * add ipfw_bpf_mtap2() function to pass mbuf to BPF; * introduce new additional ipfwlog0 pseudo interface. It differs from ipfw0 by DLT type used in bpfattach. This interface is intended to be used by ipfw modules to dump packets with additional info attached. Currently pflog format is used. ipfw_bpf_mtap2() function uses second argument to determine which interface to use for dumping. If dlen is equal to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to PFLOG_HDRLEN - ipfwlog0 will be used. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-08-13 15:41:04 +00:00
Andrey V. Elsukov	d6eb9b0249	Restore "nat global" support. Now zero value of arg1 used to specify "tablearg", use the old "tablearg" value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace hardcoded magic number to specify "nat global". Also replace 65535 magic number with corresponding macro. Fix typo in comments. PR: 211256 Tested by: Victor Chernov MFC after: 3 days	2016-08-11 10:10:10 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and boottimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Andrey V. Elsukov	ed22e564b8	Add named dynamic states support to ipfw(4). The keep-state, limit and check-state now will have additional argument flowname. This flowname will be assigned to dynamic rule by keep-state or limit opcode. And then can be matched by check-state opcode or O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize compatibility with old rulesets default flowname introduced. It will be assigned to the rules when user has omitted state name in keep-state and check-state opcodes. Also if name is ambiguous (can be evaluated as rule opcode) it will be replaced to default. Reviewed by: julian Obtained from: Yandex LLC MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D6674	2016-07-19 04:56:59 +00:00
Andrey V. Elsukov	b867e84e95	Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6 as defined in RFC 6296. The module works together with ipfw(4) and implemented as its external action module. When it is loaded, it registers as eaction and can be used in rules. The usage pattern is similar to ipfw_nat(4). All matched by rule traffic goes to the NPT module. Reviewed by: hrs Obtained from: Yandex LLC MFC after: 1 month Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D6420	2016-07-18 19:46:31 +00:00
Don Lewis	98e82c02e5	Fix problems in the FQ-PIE AQM cleanup code that could leak memory or cause a crash. Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup() is not able to use callout_drain() to make sure that all callouts are finished before it returns, and callout_stop() is not sufficient to make that guarantee. After pie_cleanup() returns, dummynet will free a structure that any remaining callouts will want to access. Fix these problems by allocating a separate structure to contain the data used by the callouts. In pie_cleanup(), call callout_reset_sbt() to replace the normal callout with a cleanup callout that does the cleanup work for each sub-queue. The instance of the cleanup callout that destroys the last flow will also free the extra allocated block of memory. Protect the reference count manipulation in the cleanup callout with DN_BH_WLOCK() to be consistent with all of the other usage of the reference count where this lock is held by the dummynet code. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7174	2016-07-12 17:32:40 +00:00
Kristof Provost	aa7cac58c6	pf: Map hook returns onto the correct error values pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers expect to get E<foo> error codes. Map the return values. A pass is 0 (everything is OK), anything else means pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP error message. PR: 207598	2016-07-09 12:17:01 +00:00
Don Lewis	12be18c7d5	Fix a race condition between the main thread in aqm_pie_cleanup() and the callout thread that can cause a kernel panic. Always do the final cleanup in the callout thread by passing a separate callout function for that task to callout_reset_sbt(). Protect the ref_count decrement in the callout with DN_BH_WLOCK(). All other ref_count manipulation is protected with this lock. There is still a tiny window between ref_count reaching zero and the end of the callout function where it is unsafe to unload the module. Fixing this would require the use of callout_drain(), but this can't be done because dummynet holds a mutex and callout_drain() might sleep. Remove the callout_pending(), callout_active(), and callout_deactivate() calls from calculate_drop_prob(). They are not needed because this callout uses callout_init_mtx(). Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> Approved by: re (gjb) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D6928	2016-07-05 00:53:01 +00:00
Bjoern A. Zeeb	9ac51e7911	In case of the global eventhandler make sure the current VNET is still operational before doing any work; otherwise we might run into, e.g., destroyed locks. PR: 210724 Reported by: olevole olevole.ru Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Obtained from: projects/vnet Approved by: re (gjb)	2016-06-30 19:32:45 +00:00
Bjoern A. Zeeb	31fe4e62fa	Move the ipfw_log_bpf() calls from global module initialisation to per-VNET initialisation and virtualise the interface cloning to allow a dedicated ipfw log interface per VNET. Approved by: re (gjb) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-30 01:33:14 +00:00
Bjoern A. Zeeb	a8fc1b786d	The void isn't void. Unbreak sparc64 and powerpc builds. Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 12 days	2016-06-24 11:53:12 +00:00
Bjoern A. Zeeb	66c00e9efb	Properly virtualize pfsync for bringup after pf is initialized and teardown of VNETs once pf(4) has been shut down. Properly split resources into VNET_SYS(UN)INITs and one time module loading. While here cover the INET parts in the uninit callpath with proper #ifdefs. Approved by: re (gjb) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-23 22:31:44 +00:00
Bjoern A. Zeeb	7d7751a071	Make sure pflog is attached after pf is initialized so we can borrow pf's lock, and also make sure pflog goes after pf is gone in order to avoid callouts in VNETs to an already freed instance. Reported by: Ivan Klymenko, Johan Hendriks on current@ today Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation MFC after: 13 days Approved by: re (gjb)	2016-06-23 22:31:10 +00:00
Bjoern A. Zeeb	a8e8c57443	PFSTATE_NOSYNC goes onto state_flags, not sync_state; this prevents: panic: pfsync_delete_state: unexpected sync state 8 Reviewed by: kp Approved by: re (gjb) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6942	2016-06-23 21:42:43 +00:00
Bjoern A. Zeeb	a0429b5459	Update pf(4) and pflog(4) to survive basic VNET testing, which includes proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use. Reviewed by: kp Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6924	2016-06-23 21:34:38 +00:00
Bjoern A. Zeeb	8147948e19	Import a fix for an old security issue (CVE-2010-3830) in pf which was not relevant to FreeBSD as only root could open /dev/pf by default. With VIMAGE this will no longer be the case. As pf(4) starts to be supported with VNETs 3rd party users may open /dev/pf inside the virtual jail instance; thus we need to address this issue after all. While OpenBSD largely rewrote code parts for the fix [1], and it's unclear what Apple [3] did, import the minimal fix from NetBSD [2]. [1] http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_ioctl.c.diff?r1=1.235&r2=1.236 [2] http://mail-index.netbsd.org/source-changes/2011/01/19/msg017518.html [3] https://support.apple.com/en-gb/HT202154 Obtained from: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dist/pf/net/pf_ioctl.c.diff?r1=1.42&r2=1.43&only_with_tag=MAIN MFC After: 2 weeks Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Security: CVE-2010-3830	2016-06-23 05:41:46 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather large and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Kristof Provost	3e248e0fb4	pf: Filter on and set vlan PCP values Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it. Reviewed by: allanjude, araujo Approved by: re (gjb) Obtained from: OpenBSD (mostly) Differential Revision: https://reviews.freebsd.org/D6786	2016-06-17 18:21:55 +00:00
Alexander V. Chernikov	37aefa2ad1	Fix 4-byte overflow in ipv6_writemask. This bug could cause some IPv6 table prefix delete requests to fail. Obtained from: Yandex LLC	2016-06-05 10:33:53 +00:00
Don Lewis	d673654796	Replace constant expressions that contain multiplications by fractional floating point values with integer divides. This will eliminate any chance that the compiler will generate code to evaluate the expression using floating point at runtime. Suggested by: bde Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 8 days (with r300779 and r300949)	2016-06-01 20:04:24 +00:00
Don Lewis	fe4b5f6659	Cast some expressions that multiply a long long constant by a floating point constant to int64_t. This avoids the runtime conversion of the other operand in a set of comparisons from int64_t to floating point and doing the comparisons in floating point. Suggested by: lidl Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> MFC after: 2 weeks (with r300779)	2016-05-29 07:23:56 +00:00
Don Lewis	248c72bfb8	Correct a typo in a comment. MFC after: 2 weeks (with r300779)	2016-05-26 22:03:28 +00:00
Don Lewis	4e59799e1b	Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak its expression to work on powerpc and sparc64 (gcc compatibility). Correct a typo in a nearby comment. MFC after: 2 weeks (with r300779)	2016-05-26 21:44:52 +00:00
Don Lewis	91336b403a	Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE). Centre for Advanced Internet Architectures Implementing AQM in FreeBSD * Overview <http://caia.swin.edu.au/freebsd/aqm/index.html> * Articles, Papers and Presentations <http://caia.swin.edu.au/freebsd/aqm/papers.html> * Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html> Overview Recent years have seen a resurgence of interest in better managing the depth of bottleneck queues in routers, switches and other places that get congested. Solutions include transport protocol enhancements at the end-hosts (such as delay-based or hybrid congestion control schemes) and active queue management (AQM) schemes applied within bottleneck queues. The notion of AQM has been around since at least the late 1990s (e.g. RFC 2309). In recent years the proliferation of oversized buffers in all sorts of network devices (aka bufferbloat) has stimulated keen community interest in four new AQM schemes -- CoDel, FQ-CoDel, PIE and FQ-PIE. The IETF AQM working group is looking to document these schemes, and independent implementations are a corner-stone of the IETF's process for confirming the clarity of publicly available protocol descriptions. While significant development work on all three schemes has occurred in the Linux kernel, there is very little in FreeBSD. Project Goals This project began in late 2015, and aims to design and implement functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ-PIE in FreeBSD (with code BSD-licensed as much as practical). We have chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall and traffic shaper. Implementation of these AQM schemes in FreeBSD will: * Demonstrate whether the publicly available documentation is sufficient to enable independent, functionally equivalent implementations * Provide a broader suite of AQM options for sections of the networking community that rely on FreeBSD platforms Program Members: * Rasool Al Saadi (developer) * Grenville Armitage (project lead) Acknowledgements: This project has been made possible in part by a gift from the Comcast Innovation Fund. Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au> X-No objection: core MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6388	2016-05-26 21:40:13 +00:00
Kristof Provost	b599e8dc59	pf: Fix more ICMP mistranslation In the default case fix the substitution of the destination address. PR: 201519 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2016-05-23 13:59:48 +00:00
Kristof Provost	c0c82715b8	pf: Fix ICMP translation Fix ICMP source address rewriting in rdr scenarios. PR: 201519 Submitted by: Max <maximos@als.nnov.ru> MFC after: 1 week	2016-05-23 12:41:29 +00:00
Kristof Provost	d9f4fce5a7	pf: Fix fragment timeout We were inconsistent about the use of time_second vs. time_uptime. Always use time_uptime so the value can be meaningfully compared. Submitted by: "Max" <maximos@als.nnov.ru> MFC after: 4 days	2016-05-20 15:41:05 +00:00
Andrey V. Elsukov	d16f495cad	Fix the regression introduced in r300143. When we are creating new dynamic state use MATCH_FORWARD direction to correctly initialize protocol's state.	2016-05-20 15:00:12 +00:00
Andrey V. Elsukov	96e84c57e1	Move protocol state handling code from lookup_dyn_rule_locked() function into dyn_update_proto_state(). This allows eliminating the second state lookup in ipfw_install_state(). Also remove MATCH_* macros, they are defined in ip_fw_private.h as enum. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-18 12:53:21 +00:00
Andrey V. Elsukov	2685841b38	Make named objects set-aware. Now it is possible to create named objects with the same name in different sets. Add optional manage_sets() callback to objects rewriting framework. It is intended to implement handler for moving and swapping named object's sets. Add ipfw_obj_manage_sets() function that implements generic sets handler. Use new callback to implement sets support for lookup tables. External actions objects are global and they don't support sets. Modify eaction_findbyname() to reflect this. ipfw(8) now may fail to move rules or sets, because some named objects in target set may have conflicting names. Note that ipfw_obj_ntlv type was changed, but since lookup tables actually didn't support sets, this change is harmless. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-17 07:47:23 +00:00
Andrey V. Elsukov	9f2e5ed3cc	Fix memory leak possible in error case. Use free_rule() instead of free(), it will also release memory allocated for rule counters. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-11 10:04:32 +00:00
Andrey V. Elsukov	b309f085e0	Change the type of objhash_cb_t callback function to be able return an error code. Use it to interrupt the loop in ipfw_objhash_foreach(). Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-06 03:18:51 +00:00
Andrey V. Elsukov	2df1a11ffa	Rename find_name_tlv_type() to ipfw_find_name_tlv_type() and make it global. Use it in ip_fw_table.c instead of find_name_tlv() to reduce duplicated code. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-05-05 20:15:46 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Andrey V. Elsukov	9a5be809ab	Make create_object callback optional and return EOPNOTSUPP when it isn't defined. Remove eaction_create_compat() and use designated initializers to initialize eaction_opcodes structure. Obtained from: Yandex LLC	2016-04-27 15:28:25 +00:00
Pedro F. Giffuni	7a6ab8f19e	netpfil: for pointers replace 0 with NULL. These are mostly cosmetic, no functional change. Found with devel/coccinelle. Reviewed by: ae	2016-04-15 12:24:01 +00:00
Andrey V. Elsukov	2acdf79f53	Add External Actions KPI to ipfw(9). It allows implementing loadable kernel modules with new actions and without needing to modify kernel headers and ipfw(8). The module registers its action handler and keyword string, that will be used as action name. Using generic syntax user can add rules with this action. Also ipfw(8) can be easily modified to extend basic syntax for external actions, that become a part of the base system. Sample modules will be coming soon. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-04-14 22:51:23 +00:00
Andrey V. Elsukov	4bd916567e	Change the type of 'etlv' field in struct named_object to uint16_t. It should match with the type field in struct ipfw_obj_tlv. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-04-14 21:52:31 +00:00
Andrey V. Elsukov	f8e26ca319	Adjust some comments and make ref_opcode_object() static.	2016-04-14 21:45:18 +00:00
Andrey V. Elsukov	b2df1f7ea1	o Teach opcode rewriting framework to handle several rewriters for the same opcode. o Reduce number of times classifier callback is called. It is redundant to call it just after find_op_rw(), since the latter already calls it and can have all results. o Do immediately opcode rewrite in the ref_opcode_object(). This eliminates additional classifier lookup later on bulk update. For unresolved opcodes the behavior is still the same, we save information from classifier callback in the obj_idx array, then perform automatic objects creation, then perform rewriting for opcodes using indices from created objects. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2016-04-14 21:31:16 +00:00
Andrey V. Elsukov	f976a4edc0	Move several functions related to opcode rewriting framework from ip_fw_table.c into ip_fw_sockopt.c and make them static. Obtained from: Yandex LLC	2016-04-14 20:49:27 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Kristof Provost	0d8c93313e	pf: Improve forwarding detection When we guess the nature of the outbound packet (output vs. forwarding) we need to take bridges into account. When bridging the input interface does not match the output interface, but we're not forwarding. Similarly, it's possible for the interface to actually be the bridge interface itself (and not a member interface). PR: 202351 MFC after: 2 weeks	2016-03-16 06:42:15 +00:00
Andrey V. Elsukov	657592fd65	Use correct size for malloc. Obtained from: Yandex LLC MFC after: 1 week	2016-03-03 13:07:59 +00:00
John Baldwin	cbc4d2db75	Remove taskqueue_enqueue_fast(). taskqueue_enqueue() was changed to support both fast and non-fast taskqueues 10 years ago in r154167. It has been a compat shim ever since. It's time for the compat shim to go. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: sephe Differential Revision: https://reviews.freebsd.org/D5131	2016-03-01 17:47:32 +00:00
Kristof Provost	14b5e85b18	pf: Fix possible out-of-bounds write In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs, which is processed in pfr_set_addrs(). At the user's request we also provide feedback on the deleted addresses, by storing them after the new list ('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()). This means we write outside the bounds of the buffer we've just allocated. We need to look at pfrio_size2 instead (i.e. the size the user reserved for our feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than pfrio_size though, in which case we'd still read outside of the allocated buffer. Instead we allocate the largest of the two values. Reported By: Paul J Murphy <paul@inetstat.net> PR: 207463 MFC after: 5 days Differential Revision: https://reviews.freebsd.org/D5426	2016-02-25 07:33:59 +00:00
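The sizing rule from the commit above (allocate for the larger of the two user-supplied counts) can be sketched as follows. This is an illustration only, not the pf code: `struct addr_entry`, `table_entries()` and `alloc_addr_table()` are hypothetical names, and the multiplication overflow check is an extra precaution that the commit message does not mention.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pfr_addr; the real layout differs. */
struct addr_entry {
	uint8_t bytes[16];
	uint8_t prefix_len;
};

/* Number of entries the buffer must hold: the larger of the two counts. */
static size_t
table_entries(size_t size, size_t size2)
{
	return (size > size2 ? size : size2);
}

/*
 * Allocate a zeroed table large enough for both the input list (size)
 * and the feedback list (size2).  Returns NULL for an empty request,
 * on multiplication overflow, or on allocation failure.
 */
static struct addr_entry *
alloc_addr_table(size_t size, size_t size2)
{
	size_t n = table_entries(size, size2);

	if (n == 0 || n > SIZE_MAX / sizeof(struct addr_entry))
		return (NULL);
	return (calloc(n, sizeof(struct addr_entry)));
}
```

Sizing by the maximum means neither count, taken alone, can describe more entries than the buffer holds, whichever of the two the caller made smaller.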
Andrey V. Elsukov	23a6c7330c	Fix bug in filling and handling ipfw's O_DSCP opcode. Due to integer overflow CS4 token was handled as BE. PR: 207459 MFC after: 1 week	2016-02-24 13:16:03 +00:00
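The commit above does not show the faulty code, so the following is only a sketch of how this class of bug arises, assuming the set of DSCP values is kept as a 64-bit bitmap split across two 32-bit words. CS4 is code point 32 and BE is code point 0; shifting a 32-bit value by 32 is undefined in C and on common hardware wraps to a shift by 0, which would make CS4 look like BE. The names `struct dscp_set`, `dscp_set_add()` and `dscp_set_has()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_BE		0	/* best effort */
#define DSCP_CS4	32	/* class selector 4 */

/* 64 DSCP code points stored as two 32-bit words (assumed layout). */
struct dscp_set {
	uint32_t words[2];
};

static void
dscp_set_add(struct dscp_set *s, unsigned int dscp)
{
	assert(dscp < 64);
	/* Pick the word with dscp / 32 so the shift count stays in 0..31. */
	s->words[dscp / 32] |= UINT32_C(1) << (dscp % 32);
}

static int
dscp_set_has(const struct dscp_set *s, unsigned int dscp)
{
	assert(dscp < 64);
	return ((s->words[dscp / 32] >> (dscp % 32)) & 1);
}
```

Splitting the index into a word number and an in-word bit keeps code point 32 in the second word instead of letting it alias bit 0 of the first.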
Kristof Provost	c90369f880	in pf_print_state_parts, do not use skw->proto to print the protocol but our local copy proto that we very carefully set beforehand. skw being NULL is perfectly valid there. Obtained from: OpenBSD (henning)	2016-02-20 12:53:53 +00:00
Gleb Smirnoff	cd82d21b2e	Fix obvious typo, that led to incorrect sorting. Found by: PVS-Studio	2016-02-18 19:05:30 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Luigi Rizzo	1cdc5f0b87	cleanup and document in some detail the internals of the testing code for dummynet schedulers	2016-01-27 02:22:31 +00:00
Luigi Rizzo	ff8d60ab4d	the _Static_assert was not supposed to be in the commit.	2016-01-27 02:14:08 +00:00
Luigi Rizzo	788c0c66ab	bugfix: the scheduler template (dn_schk) for the round robin scheduler is followed by another structure (rr_schk) whose size must be set in the schk_datalen field of the descriptor. Not allocating the memory may cause other memory to be overwritten (though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky and end up in the padding after the dn_schk). This is a merge candidate for stable and 10.3 MFC after: 3 days	2016-01-27 02:08:30 +00:00
Luigi Rizzo	10d72ffc7d	fix various warnings to compile the test code with -Wextra	2016-01-26 23:37:07 +00:00
Luigi Rizzo	fa57c83c70	fix various warnings (signed/unsigned, printf types, unused arguments)	2016-01-26 23:36:18 +00:00
Luigi Rizzo	f6a5c66400	prevent warnings for signed/unsigned comparisons and unused arguments. Add checks for parameters overflowing 32 bit.	2016-01-26 22:46:58 +00:00
Luigi Rizzo	e72cd9a70d	prevent warning for unused argument	2016-01-26 22:45:45 +00:00
Luigi Rizzo	4d85bfeb07	avoid warnings for signed/unsigned comparison and unused arguments	2016-01-26 22:45:05 +00:00
Luigi Rizzo	f51b072d4c	Revert one chunk from commit 285362, which introduced an off-by-one error in computing a shift index. The error was due to the use of mixed fls() / __fls() functions in another implementation of qfq. To keep the problem from recurring, properly document which incarnation of the function we need. Note that the bug only affects QFQ in FreeBSD head from last July, as the patch was not merged to other versions.	2016-01-26 04:48:24 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are a number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 do not use radix locking. On the other hand, routing structures do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
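The off-by-one in the QFQ commit above comes from two "find last set bit" conventions. As a sketch, assume the two are the 1-based form used by FreeBSD's fls() (which returns 0 for a zero argument) and the 0-based form used by Linux's __fls(). The portable helpers below, `fls_one_based()` and `fls_zero_based()`, are hypothetical names written for illustration and are not the kernel routines.

```c
#include <assert.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int
fls_one_based(unsigned long x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return (n);
}

/* 0-based index of the most significant set bit; x must be non-zero. */
static int
fls_zero_based(unsigned long x)
{
	assert(x != 0);
	return (fls_one_based(x) - 1);
}
```

For the same non-zero input the two results always differ by exactly one, so code ported between the conventions without adjustment computes a shift index that is off by one.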
Alexander V. Chernikov	fa7c058bf8	Fix panic on table/table entry delete. The panic could have happened if more than 64 distinct values had been used. Table value code uses internal objhash API which requires unique key for each object. For value code, pointer to the actual value data is used. The actual problem arises from the fact that 'actual' e.g. runtime data is stored in array and that array is auto-growing. There is special hook (update_tvalue() function) which is used to update the pointers after the change. For some reason, object 'key' was not updated. Fix this by adding update code to the update_tvalue(). Sponsored by: Yandex LLC	2016-01-21 18:20:40 +00:00
Alexander V. Chernikov	89fc126add	Initialize error value ta_lookup_kfib() by default to please compiler.	2016-01-10 08:37:00 +00:00
Bjoern A. Zeeb	60c274aaf8	Initialize error after r293626 in case neither INET nor INET6 is compiled into the kernel. Ideally lots more code would just not be called (or compiled in) in that case but that requires a lot more surgery. For now try to make IP-less kernels compile again.	2016-01-10 08:14:25 +00:00
Alexander V. Chernikov	004d3e30a7	Make ipfw addr:kfib lookup algo use new routing KPI.	2016-01-10 06:43:43 +00:00
Alexander V. Chernikov	3673828490	Use already pre-calculated number of entries instead of tc->count.	2016-01-10 00:28:44 +00:00
Alexander V. Chernikov	ea8d14925c	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
Alexander V. Chernikov	460a5b502f	Convert pf(4) to the new routing API. Differential Revision: https://reviews.freebsd.org/D4763	2016-01-07 10:20:03 +00:00
Hans Petter Selasky	c8cfbc066f	Properly drain callouts in the IPFW subsystem to avoid use after free panics when unloading the dummynet and IPFW modules: - The callout drain function can sleep and should not be called having a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)". - Add a new "dn_gone" variable to prevent asynchronous restart of dummynet callouts when unloading the dummynet kernel module. - Call "dn_reschedule()" locked so that "dn_gone" can be set and checked atomically with regard to starting a new callout. Reviewed by: hiren MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3855	2015-12-15 09:02:05 +00:00
Alexander V. Chernikov	65ff3638df	Merge helper fib* functions used for basic lookups. Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?", "what is the MTU?", "Give me the IPv4 source address to use", etc.). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure. This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionally, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happen under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC	2015-12-08 10:50:03 +00:00
Andrey V. Elsukov	1cf09efe5d	Add destroy_object callback to object rewriting framework. It is called when last reference to named object is going to be released and allows to do additional cleanup for implementation of named objects. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-11-23 22:06:55 +00:00
Bryan Drewery	7143303723	Fix dynamic IPv6 rules showing junk for non-specified address masks. For example: 00002 0 0 (19s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0 00002 4 412 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22 00002 10 777 (1s) LIMIT tcp 2001:894:5a24:653::503:1 52023 <-> 2001:894:5a24:653:ca0a:a9ff:fe04:3978 22 00002 0 0 (17s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> 80f3:70d:23fe:ffff:1005:: 0 Fix this by zeroing the unused address, as is done for IPv4: 00002 0 0 (18s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0 00002 36 14952 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22 00002 0 0 (0s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> :: 0 00002 4 345 (274s) LIMIT tcp 2001:894:5a24:653::503:1 34131 <-> 2001:470:1f11:262:ca0a:a9ff:fe04:3978 22 MFC after: 2 weeks	2015-11-17 20:42:08 +00:00
Alexander V. Chernikov	637670e77e	Bring back the ability of passing cached route via nd6_output_ifp().	2015-11-15 16:02:22 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stop's return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Alexander V. Chernikov	91e93daf9c	Print proper setfib values in ipfw log. Submitted by: Denis Schneider <v1ne2go at gmail>	2015-11-08 13:44:21 +00:00
Alexander V. Chernikov	b554a27822	Fix setfib target. Problem was introduced in r272840 when converting tablearg value to 0. Submitted by: Denis Schneider <v1ne2go at gmail>	2015-11-08 12:24:19 +00:00
Kristof Provost	5a505b317a	pf: Fix broken rule skip calculation r289932 accidentally broke the rule skip calculation. The address family argument to PF_ANEQ() is now important, and because it was set to 0 the macro always evaluated to false. This resulted in incorrect skip values, which in turn broke the rule evaluations.	2015-11-07 23:51:42 +00:00
Andrey V. Elsukov	ee09cb0bfb	Remove now obsolete KASSERT. Actually, object classify callbacks can skip some opcodes, that could be rewritten. We will determine the real number of rewritten opcodes a bit later in this function. Reported by: David H. Wolfskill <david at catwhisker dot org>	2015-11-03 22:23:09 +00:00
Andrey V. Elsukov	748c9559ee	Eliminate any conditional increments of object_opcodes in the check_ipfw_rule_body() function. This function is intended to just determine that rule has some opcodes that can be rewritten. Then the ref_rule_objects() function will determine real number of rewritten opcodes using classify callback. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-11-03 10:34:26 +00:00
Andrey V. Elsukov	f81431cca1	Add ipfw_check_object_name_generic() function to do basic checks for an object name correctness. Each type of object can do more strict checking in own implementation. Do such checks for tables in check_table_name(). Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-11-03 10:29:46 +00:00
Andrey V. Elsukov	5dc5a0e0aa	Implement `ipfw internal olist` command to list named objects. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-11-03 10:21:53 +00:00
Kristof Provost	679e3c77b7	pf: Fix IPv6 checksums with route-to. When using route-to (or reply-to) pf sends the packet directly to the output interface. If that interface doesn't support checksum offloading the checksum has to be calculated in software. That was already done in the IPv4 case, but not for the IPv6 case. As a result we'd emit packets with pseudo-header checksums (i.e. incorrect checksums). This issue was exposed by the changes in r289316 when pf stopped performing full checksum calculations for all packets. Submitted by: Luoqi Chen MFC after: 1 week	2015-10-29 20:45:53 +00:00
Alexander V. Chernikov	78546dad4e	Eliminate last rtalloc_ign() caller. Differential Revision: https://reviews.freebsd.org/D3927	2015-10-27 21:25:40 +00:00
Kristof Provost	c110fc49da	pf: Fix TSO issues In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum. The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums. To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum. PR: 154428, 193579, 198868 Reviewed by: sbruno MFC after: 1 week Relnotes: yes Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D3779	2015-10-14 16:21:41 +00:00
Alexander V. Chernikov	c6fb65b1df	Bump number of prefixes in O_IP_<SRC\|DST> from 15 to 31 (max possible). PR: 203459 Submitted by: groos at xiplink.com MFC after: 2 weeks	2015-10-03 05:42:25 +00:00
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_output() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performs lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output_lle() which returns 0. 
In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Kristof Provost	2f6c345adf	pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in pf_test6() because the rcvif and the ifp (output interface) are different. In that case we're bridging though, and the rcvif is the bridge member on which the packet was received and ifp is the bridge itself. If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect. Instead check if the rcvif is a member of the ifp bridge. (In other words, the if_bridge is the ifp's softc). If that's the case we're not forwarding but bridging. PR: 202351 Reviewed by: eri Differential Revision: https://reviews.freebsd.org/D3534	2015-09-01 19:04:04 +00:00
Kristof Provost	64b3b4d611	pf: Remove support for 'scrub fragment crop\|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop\|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
Alexander V. Chernikov	3535eac433	Fix packets/bytes accounting on i386. Spotted by: julian	2015-08-27 07:53:58 +00:00
Luiz Otavio O Souza	22932fc9be	Reapply r196551 which was accidentally reverted by r223637 (update to OpenBSD pf 4.5). Fix argument ordering to memcpy as well as the size of the copy in the (theoretical) case that pfi_buffer_cnt should be greater than ~_max. This fixes the failure when you hit the self table size and force it to be resized. MFC after: 3 days Sponsored by: Rubicon Communications (Netgate)	2015-08-24 21:41:05 +00:00
Luiz Otavio O Souza	0a70aaf8f5	Add ALTQ(9) support for the CoDel algorithm. CoDel is a parameterless queue discipline that handles variable bandwidth and RTT. It can be used as the single queue discipline on an interface or as a sub discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ. Differential Revision: https://reviews.freebsd.org/D3272 Reviewed by: rpaulo, gnn (previous version) Obtained from: pfSense Sponsored by: Rubicon Communications (Netgate)	2015-08-21 22:02:22 +00:00
Luiz Otavio O Souza	f2fc809dcd	Fix the copy of addresses passed from userland in table replace command. The size2 is the maximum userland buffer size (used when the addresses are copied back to userland). Obtained from: pfSense MFC after: 3 days Sponsored by: Rubicon Communications (Netgate)	2015-08-17 23:03:54 +00:00
Mariusz Zaborski	643ef281cd	Use correct src/dst ports when removing states. Submitted by: Milosz Kaniewski <m.kaniewski@wheelsystems.com>, UMEZAWA Takeshi <umezawa@iij.ad.jp> (original) Reviewed by: glebius Approved by: pjd (mentor) Obtained from: OpenBSD MFC after: 3 days	2015-08-11 17:24:34 +00:00
Andrey V. Elsukov	b13653baf9	Reduce overhead of ipfw's me6 opcode. Skip checks for IPv6 multicast addresses. Use in6_localip() for global unicast. And for IPv6 link-local addresses do search in the IPv6 addresses list. Since LLA are stored in the kernel internal form, use IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison. lla_mask has zero bits in the second word, where we keep sin6_scope_id. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 10:53:42 +00:00
Kristof Provost	48c29b118e	pf: Always initialise pf_fragment.fr_flags When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to initialise the fr_flags field. As a result we sometimes mistakenly thought the fragment to not be a buffered fragment. This resulted in panics because we'd end up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it to be part of V_pf_cachequeue). The next time we iterated V_pf_fragqueue we'd use a freed object and panic. While here also fix a pf_fragment use after free in pf_normalize_ip(). pf_reassemble() frees the pf_fragment, so we can't use it any more. PR: 201879, 201932 MFC after: 5 days	2015-07-29 06:35:36 +00:00
Renato Botelho	299c819a75	Simplify logic added in r285945 as suggested by glebius Approved by: glebius MFC after: 3 days Sponsored by: Netgate	2015-07-28 14:59:29 +00:00
Renato Botelho	b1b98a2db7	Respect pf rule log option before logging dropped packets with IP options or dangerous v6 headers Reviewed by: gnn, eri Approved by: gnn Obtained from: pfSense MFC after: 3 days Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D3222	2015-07-28 10:31:34 +00:00
Gleb Smirnoff	3e437fd2c6	Fix a typo in r280169. Of course we are interested in deleting nsn only if we have just created it and we were the last reference. Submitted by: dhartmei	2015-07-28 09:36:26 +00:00
Andrey V. Elsukov	af9aa0a837	Add helper functions for IP checksum adjusting. Use these functions in dummynet code and for setdscp. This fixes wrong checksums in some cases. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-07-20 07:26:31 +00:00
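The commit above does not say how its checksum helpers work. One standard technique for adjusting an IP header checksum after a single 16-bit field changes (for example the word holding the DSCP bits) is the incremental update from RFC 1624, HC' = ~(~HC + ~m + m'). The sketch below assumes that approach; `cksum_full()` and `cksum_adjust()` are hypothetical names, not the functions added by the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Full ones-complement checksum over 16-bit words (reference version). */
static uint16_t
cksum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_adjust(uint16_t cksum, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum;

	sum = (uint16_t)~cksum;
	sum += (uint16_t)~old_word;
	sum += new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Updating incrementally touches only the old and new values of the changed word, so the result can be checked against a full recomputation without rereading the rest of the header on the fast path.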
Luigi Rizzo	4af7aed7c6	assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors): - use 1ULL to avoid shift truncations - recompute the sum of weight dynamically to provide better fairness - fix an erroneous constant in the computation of the slot - preserve timestamp correctness when the old timestamp is stale.	2015-07-10 19:24:36 +00:00
Luigi Rizzo	e38e277fc4	one more warning suppression when compiling the test code in userspace.	2015-07-10 19:18:49 +00:00
Luigi Rizzo	e25716b7cc	add code to compute fairness indexes; cleanups to remove compile warnings.	2015-07-10 18:10:40 +00:00
Ermal Luçi	a5b789f65a	ALTQ FAIRQ discipline import from DragonFLY Differential Revision: https://reviews.freebsd.org/D2847 Reviewed by: glebius, wblock(manpage) Approved by: gnn(mentor) Obtained from: pfSense Sponsored by: Netgate	2015-06-24 19:16:41 +00:00
Kristof Provost	06ba348d27	pf: Remove frc_direction We don't use the direction of the fragments for anything. The frc_direction field is assigned, but never read. Just remove it. Differential Revision: https://reviews.freebsd.org/D2773 Approved by: philip (mentor)	2015-06-11 17:57:47 +00:00
Kristof Provost	837b925aba	pf: Save the protocol number in the pf_fragment When we try to look up a pf_fragment with pf_find_fragment() we compare (see pf_frag_compare()) addresses (and family), id but also protocol. We failed to save the protocol to the pf_fragment in pf_fragcache(), resulting in failing reassembly. Differential Revision: https://reviews.freebsd.org/D2772	2015-06-11 13:26:16 +00:00
Kristof Provost	0b7eba6ad4	pf: address family must be set when creating a pf_fragment Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set. In that scenario we take a different branch in pf_normalize_ip(), taking us to pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a pf_fragment, but do not set the address family. This leads to a panic when we try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to compare the pf_fragments doesn't know what to do if the address family is not set. Simply ensure that the address family is set correctly (always AF_INET in this path). PR: 200330 Differential Revision: https://reviews.freebsd.org/D2769 Approved by: philip (mentor), gnn (mentor)	2015-06-10 13:44:04 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Luigi Rizzo	62f42cf8ee	use proper types to represent function pointers	2015-05-19 16:51:30 +00:00
Luigi Rizzo	352bc63d72	remove a redundant ; at the end of a function MFC after: 1 week	2015-05-19 15:29:00 +00:00
Luigi Rizzo	bebf3c825f	remove an extra ; after MODULE_DEPEND (would otherwise generate a warning with more verbose compiler flags) MFC after: 1 week	2015-05-19 14:49:31 +00:00
Gleb Smirnoff	3dd01a884c	Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization from associated structures initialization. The mutexes are global, while the structures are per-vnet. Submitted by: Nikos Vassiliadis <nvass gmx.com>	2015-05-19 14:04:21 +00:00
Gleb Smirnoff	30fe681e44	During module unload unlock rules before destroying UMA zones, which may sleep in uma_drain(). It is safe to unlock here, since we are already dehooked from pfil(9) and all pf threads had quit. Sponsored by: Nginx, Inc.	2015-05-19 14:02:40 +00:00
Gleb Smirnoff	78680d05d1	A miss from r283061: don't dereference NULL if pf_get_mtag() fails. PR: 200222 Submitted by: Franco Fichtner <franco opnsense.org>	2015-05-18 15:51:27 +00:00
Gleb Smirnoff	b7f69c506d	Don't dereference NULL if pf_get_mtag() fails. PR: 200222 Submitted by: Franco Fichtner <franco opnsense.org>	2015-05-18 15:05:12 +00:00
Luigi Rizzo	8ff71b031e	bugfix (only affecting the "lookup" option in the userspace version of ipfw): the conditional block should not include the 'else' otherwise the code does a 'break;' without completing the check	2015-05-13 11:53:25 +00:00
Alexander V. Chernikov	e09c1944a3	Remove ptei->value check from ipfw_link_table_values(): even if there was non-zero number of restarts, we would unref/clear all value references and start ipfw_link_table_values() once again with (mostly) cleared "tei" buffer. Additionally, ptei->ptv stores only to-be-added values, not existing ones. This is a forgotten piece of previous value refcounting implementation, and now it is simply incorrect.	2015-05-12 20:42:42 +00:00
Alexander V. Chernikov	b45fa3fad6	Fix panic when prepare_batch_buffer() returns error.	2015-05-06 07:53:43 +00:00
Alexander V. Chernikov	caf993912e	Fix KASSERT introduced in r282155. Found by: dhw	2015-04-30 21:51:12 +00:00
Alexander V. Chernikov	e948489558	Fix panic introduced by r282070. Arm friendly KASSERT() to ease debug of similar crashes. Submitted by: Olivier Cochard-Labbé	2015-04-28 17:05:55 +00:00
Alexander V. Chernikov	a1bddc75b4	Fix 'may be used uninitialized' warning not caught by clang.	2015-04-27 10:01:22 +00:00
Alexander V. Chernikov	1a458088ff	Use free_nat_instance() for nat instance deletion. Sponsored by: Yandex LLC	2015-04-27 09:16:22 +00:00
Alexander V. Chernikov	74b22066b0	Make rule table kernel-index rewriting support any kind of objects. Currently we have tables identified by their names in userland with internal kernel-assigned indices. This works the following way: When userland wishes to communicate with kernel to add or change rule(s), it makes indexed sorted array of table names (internally ipfw_obj_ntlv entries), and refer to indices in that array in rule manipulation. Prior to committing new rule to the ruleset kernel a) finds all referenced tables, bump their refcounts and change values inside the opcodes to be real kernel indices b) auto-creates all referenced but not existing tables and then do a) for them. Kernel does almost the same when exporting rules to userland: prepares array of used tables in all rules in range, and prepends it before the actual ruleset retaining actual in-kernel indexes for that. There is also special translation layer for legacy clients which is able to provide 'real' indices for table names (basically doing atoi()). While it is arguable that every subsystem really needs names instead of numbers, there are several things that should be noted: 1) every non-singleton subsystem needs to store its runtime state somewhere inside ipfw chain (and be able to get it fast) 2) we can't assume object numbers provided by humans will be dense. Existing nat implementation (O(n) access and LIST inside chain) is a good example. Hence the following: * Convert table-centric rewrite code to be more generic, callback-based * Move most of the code from ip_fw_table.c to ip_fw_sockopt.c * Provide abstract API to permit subsystems convert their objects between userland string identifier and in-kernel index. (See struct opcode_obj_rewrite) for more details * Create another per-chain index (in next commit) shared among all subsystems * Convert current NAT44 implementation to use new API, O(1) lookups, shared index and names instead of numbers (in next commit). 
Sponsored by: Yandex LLC	2015-04-27 08:29:39 +00:00
Gleb Smirnoff	fdf6290ea9	Fix memory leak. PR: 199670 Reviewed by: ae	2015-04-27 05:44:09 +00:00
Gleb Smirnoff	772e66a6fc	Move ALTQ from contrib to net/altq. The ALTQ code is for many years discontinued by its initial authors. In FreeBSD the code was already slightly edited during the pf(4) SMP project. It is about to be edited more in the projects/ifnet. Moving out of contrib also allows to remove several hacks to the make glue. Reviewed by: net@	2015-04-16 20:22:40 +00:00
Kristof Provost	3d1bbe5fa0	pf: Fix forwarding detection If the direction is not PF_OUT we can never be forwarding. Some input packets have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound packets, causing panics. Equally, we need to ensure that packets were really received and not locally generated before trying to ip6_forward() them. Differential Revision: https://reviews.freebsd.org/D2286 Approved by: gnn(mentor)	2015-04-14 19:07:37 +00:00
George V. Neville-Neil	916e17fd56	I can find no reason to allow packets with both SYN and FIN bits set past this point in the code. The packet should be dropped and not massaged as it is here. Differential Revision: https://reviews.freebsd.org/D2266 Submitted by: eri Sponsored by: Rubicon Communications (Netgate)	2015-04-14 14:43:42 +00:00
Kristof Provost	1873dcc8c9	pf: Skip firewall for refragmented ip6 packets In cases where we scrub (fragment reassemble) on both input and output we risk ending up in infinite loops when forwarding packets. Fragmented packets come in and get collected until we can defragment. At that point the defragmented packet is handed back to the ip stack (at the pfil point in ip6_input()). Normal processing continues. Eventually we figure out that the packet has to be forwarded and we end up at the pfil hook in ip6_forward(). After doing the inspection on the defragmented packet we see that the packet has been defragmented and because we're forwarding we have to refragment it. In pf_refragment6() we split the packet up again and then ip6_forward() the individual fragments. Those fragments hit the pfil hook on the way out, so they're collected until we can reconstruct the full packet, at which point we're right back where we left off and things continue until we run out of stack. Break that loop by marking the fragments generated by pf_refragment6() as M_SKIP_FIREWALL. There's no point in processing those packets in the firewall anyway. We've already filtered on the full packet. Differential Revision: https://reviews.freebsd.org/D2197 Reviewed by: glebius, gnn Approved by: gnn (mentor)	2015-04-06 19:05:00 +00:00
Gleb Smirnoff	6d947416cc	o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes	2015-04-01 22:26:39 +00:00
Kristof Provost	7dce9b515b	pf: Deal with runt packets On Ethernet packets have a minimal length, so very short packets get padding appended to them. This padding is not stripped off in ip6_input() (due to support for IPv6 Jumbograms, RFC2675). That means PF needs to be careful when reassembling fragmented packets to not include the padding in the reassembled packet. While here also remove the 'Magic from ip_input.' bits. Splitting up and re-joining an mbuf chain here doesn't make any sense. Differential Revision: https://reviews.freebsd.org/D2189 Approved by: gnn (mentor)	2015-04-01 12:16:56 +00:00
Kristof Provost	798318490e	Preserve IPv6 fragment IDs across reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
Andrey V. Elsukov	bf55a0034d	The offset variable has been cleared all bits except IP6F_OFF_MASK. Use ip6f_mf variable instead of checking its bits.	2015-03-31 14:41:29 +00:00
Sergey Kandaurov	a4879be402	Static'ize pf_fillup_fragment body to match its declaration. Missed in 278925.	2015-03-26 13:31:04 +00:00
Gleb Smirnoff	3e8c6d74bb	Always lock the hash row of a source node when updating its 'states' counter. PR: 182401 Sponsored by: Nginx, Inc.	2015-03-17 12:19:28 +00:00
Andrey V. Elsukov	2530ed9e70	Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value to obtain IPv4 next hop address in tablearg case. Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we still use this opcode, but when packet identified as IPv6 packet, we obtain next hop address from dedicated field nh6 in struct table_value. Replace hopstore field in struct ip_fw_args with anonymous union and add hopstore6 field. Use this field to copy tablearg value for IPv6. Replace spare1 field in struct table_value with zoneid. Use it to keep scope zone id for link-local IPv6 addresses. Since spare1 was used internally, replace spare0 array with two variables spare0 and spare1. Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting IPv6 addresses in table_value. Use zoneid field in struct table_value to store sin6_scope_id value. Since the kernel still uses embedded scope zone id to represent link-local addresses, convert next_hop6 address into this form before return from pfil processing. This also fixes in6_localip() check for link-local addresses. Differential Revision: https://reviews.freebsd.org/D2015 Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-03-13 09:03:25 +00:00
Andrey V. Elsukov	998fbd14b8	Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was consumed by filter. This fixes several panics due to accessing to mbuf after free. Submitted by: Kristof Provost MFC after: 1 week	2015-03-12 08:57:24 +00:00
Gleb Smirnoff	4ac6485cc6	Even more fixes to !INET and !INET6 kernels. In collaboration with: pluknet	2015-02-17 22:33:22 +00:00
Gleb Smirnoff	0324938a0f	- Improve INET/INET6 scope. - style(9) declarations. - Make couple of local functions static.	2015-02-16 23:50:53 +00:00
Gleb Smirnoff	8dc98c2a36	Toss declarations to fix regular build and NO_INET6 build.	2015-02-16 21:52:28 +00:00
Gleb Smirnoff	39a58828ef	In the forwarding case refragment the reassembled packets with the same size as they arrived in. This allows the sender to determine the optimal fragment size by Path MTU Discovery. Roughly based on the OpenBSD work by Alexander Bluhm. Submitted by: Kristof Provost Differential Revision: D1767	2015-02-16 07:01:02 +00:00
Gleb Smirnoff	f5ceb22b78	Update the pf fragment handling code to closer match recent OpenBSD. That partially fixes IPv6 fragment handling. Thanks to Kristof for working on that. Submitted by: Kristof Provost Tested by: peter Differential Revision: D1765	2015-02-16 03:38:27 +00:00
Alexander V. Chernikov	9f925e8a92	Fix IP_FW_NAT44_LIST_NAT size calculation. Found by: lev Sponsored by: Yandex LLC	2015-02-05 14:54:53 +00:00
Alexander V. Chernikov	0caab00959	* Make sure table algorithm destroy hook is always called without locks * Explicitly lock freeing interface references in ta_destroy_ifidx * Change ipfw_iface_unref() to require UH lock * Add forgotten ipfw_iface_unref() to destroy_ifidx_locked() PR: kern/197276 Submitted by: lev Sponsored by: Yandex LLC	2015-02-05 13:49:04 +00:00
Gleb Smirnoff	efc6c51ffa	Back out r276841, r276756, r276747, r276746. The change in r276747 is very very questionable, since it makes vimages more dependent on each other. But the reason for the backout is that it screwed up shutting down the pf purge threads, and now kernel immediately panics on pf module unload. Although module unloading isn't an advertised feature of pf, it is very important for development process. I'd like to not backout r276746, since in general it is good. But since it has introduced numerous build breakages, that later were addressed in r276841, r276756, r276747, I need to back it out as well. Better replay it in clean fashion from scratch.	2015-01-22 01:23:16 +00:00
Alexander V. Chernikov	0b47e42b49	Use ipfw runtime lock only when real modification is required.	2015-01-16 10:49:27 +00:00
Craig Rodrigues	7259906eb0	Do not initialize pfi_unlnkdkifs_mtx and pf_frag_mtx. They are already initialized by MTX_SYSINIT. Submitted by: Nikos Vassiliadis <nvass@gmx.com>	2015-01-08 17:49:07 +00:00
Craig Rodrigues	8d665c6ba8	Reapply previous patch to fix build. PR: 194515	2015-01-06 16:47:02 +00:00
Craig Rodrigues	4de985af0b	Instead of creating a purge thread for every vnet, create a single purge thread and clean up all vnets from this thread. PR: 194515 Differential Revision: D1315 Submitted by: Nikos Vassiliadis <nvass@gmx.com>	2015-01-06 09:03:03 +00:00
Craig Rodrigues	c75820c756	Merge: r258322 from projects/pf branch Split functions that initialize various pf parts into their vimage parts and global parts. Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT(). PR: 194515 Differential Revision: D1309 Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com> Reviewed by: trociny, zec, gnn	2015-01-06 08:39:06 +00:00
Ermal Luçi	7b56cc430a	pf(4) needs to have a correct checksum during its processing. Calculate checksums for the IPv6 path when needed before delving into pf(4) code as required. PR: 172648, 179392 Reviewed by: glebius@ Approved by: gnn@ Obtained from: pfSense MFC after: 1 week Sponsored by: Netgate	2014-11-19 13:31:08 +00:00
Alexander V. Chernikov	5b07fc31cc	Finish r274315: remove union 'u' from struct pf_send_entry. Suggested by: kib	2014-11-09 17:01:54 +00:00
Alexander V. Chernikov	a458ad86ee	Remove unused 'struct route' fields.	2014-11-09 16:15:28 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Alexander V. Chernikov	038263c36a	Remove unused variable. Found by: Coverity CID: 1245739	2014-11-04 10:25:52 +00:00
Alexander V. Chernikov	552eb491ab	Bump default dynamic limit to 16k entries. Print better log message when limit is hit. PR: 193300 Submitted by: me at nileshgr.com	2014-10-24 13:57:15 +00:00
Alexander V. Chernikov	9e3a53fd35	Rename log2 to tal_log2. Submitted by: luigi	2014-10-22 21:20:37 +00:00
Luigi Rizzo	03be41e6a4	remove/fix old code for building ipfw and dummynet in userspace	2014-10-22 05:21:36 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Alexander V. Chernikov	54b38fcf03	Use copyout() directly instead of updating various fields before/after each sooptcopyout() call. Found by: luigi Sponsored by: Yandex LLC	2014-10-20 11:21:07 +00:00
Alexander V. Chernikov	4040f4ecd6	Perform more checks on the number of tables supplied by user.	2014-10-19 11:15:19 +00:00
Dag-Erling Smørgrav	99e9de871a	Add a complete implementation of MurmurHash3. Tweak both implementations so they match the established idiom. Document them in hash(9). MFC after: 1 month MFC with: r272906	2014-10-18 22:15:11 +00:00
Alexander V. Chernikov	0d90989bef	Use IPFW_RULE_CNTR_SIZE macro instead of non-relevant ip_fw_cntr structure. Found by: luigi	2014-10-18 17:23:41 +00:00
Alexander V. Chernikov	2930362fb1	Fix matching default rule on clear/show commands. Found by: Oleg Ginzburg	2014-10-13 13:49:28 +00:00
Alexander V. Chernikov	956f6d3a3c	Fix KASSERT typo.	2014-10-11 15:04:50 +00:00
Alexander V. Chernikov	3fd16a3a72	Remove redundant if_notifier declaration.	2014-10-10 20:37:06 +00:00
George V. Neville-Neil	1d2baefc13	Change the PF hash from Jenkins to Murmur3. In forwarding tests this showed a conservative 3% increase in PPS. Differential Revision: https://reviews.freebsd.org/D461 Submitted by: des Reviewed by: emaste MFC after: 1 month	2014-10-10 19:26:26 +00:00
Alexander V. Chernikov	5f8ad2bd82	Fix KASSERT argument type.	2014-10-10 18:57:12 +00:00
Alexander V. Chernikov	d699ee2dc9	Fix NOINET6 build for ipfw.	2014-10-10 18:31:35 +00:00
Alexander V. Chernikov	9fe15d0612	Partially fix build on !amd64 Pointed by: bz	2014-10-10 17:24:56 +00:00
Alexander V. Chernikov	a13a821641	Merge projects/ipfw to HEAD. Main user-visible changes are related to tables: * Tables are now identified by names, not numbers. There can be up to 65k tables with up to 63-byte long names. * Tables are now set-aware (default off), so you can switch/move them atomically with rules. * More functionality is supported (swap, lock, limits, user-level lookup, batched add/del) by generic table code. * New table types are added (flow) so you can match multiple packet fields at once. * Ability to add different type of lookup algorithms for particular table type has been added. * New table algorithms are added (cidr:hash, iface:array, number:array and flow:hash) to make certain types of lookup more effective. * Table value are now capable of holding multiple data fields for different tablearg users Performance changes: * Main ipfw lock was converted to rmlock * Rule counters were separated from rule itself and made per-cpu. * Radix table entries fits into 128 bytes * struct ip_fw is now more compact so more rules will fit into 64 bytes * interface tables uses array of existing ifindexes for faster match ABI changes: All functionality supported by old ipfw(8) remains functional. Old & new binaries can work together with the following restrictions: * Tables named other than ^\d+$ are shown as table(65535) in ruleset in old binaries Internal changes:. Changing table ids to numbers resulted in format modification for most sockopt codes. Old sopt format was compact, but very hard to extend (no versioning, inability to add more opcodes), so * All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes. 
* The remaining opcodes were also converted to be able to eliminate all older opcodes at once * All IP_FW3 handlers uses special API instead of calling sooptcopy* directly to ease adding another communication methods * struct ip_fw is now different for kernel and userland * tablearg value has been changed to 0 to ease future extensions * table "values" are now indexes in special value array which holds extended data for given index * Batched add/delete has been added to tables code * Most changes has been done to permit batched rule addition. * interface tracking API has been added (started on demand) to permit effective interface tables operations * O(1) skipto cache, currently turned off by default at compile-time (eats 512K). * Several steps has been made towards making libipfw: * most of new functions were separated into "parse/prepare/show and actually-do-stuff" pieces (already merged). * there are separate functions for parsing text string into "struct ip_fw" and printing "struct ip_fw" to supplied buffer (already merged). * Probably some more less significant/forgotten features MFC after: 1 month Sponsored by: Yandex LLC	2014-10-09 19:32:35 +00:00
Alexander V. Chernikov	f9ab623bf2	Bump ipfw module version.	2014-10-09 16:12:01 +00:00
Alexander V. Chernikov	779b53d008	Sync to HEAD@r272825.	2014-10-09 15:35:28 +00:00
Alexander V. Chernikov	4c060d851c	Fix core on table destroy introduced by table values code. Rename @ti array copy to 'ti_copy'.	2014-10-09 14:33:20 +00:00
Alexander V. Chernikov	ce575f539f	* Wire large user buffer before processing GET request. * Fix incorrect size calculation for IP_FW_XGET request.	2014-10-09 12:37:53 +00:00
Alexander V. Chernikov	be8bc45790	Add IP_FW_DUMP_SOPTCODES sopt to be able to determine which opcodes are currently available in kernel.	2014-10-08 11:12:14 +00:00
Alexander V. Chernikov	eadf3b965c	Fix possible crash when old value pointer is not updated after array resize.	2014-10-07 18:22:05 +00:00
Alexander V. Chernikov	79e86902e9	Notify table algo about runtime data change on table flush.	2014-10-07 16:46:11 +00:00
Alexander V. Chernikov	8ebca97f5e	* Fix crash in interface tracker due to using old "linked" field. * Ensure we're flushing entries without any locks held. * Free memory in (rare) case when interface tracker fails to register ifp. * Add KASSERT on table values refcounts.	2014-10-07 10:54:53 +00:00
Alexander V. Chernikov	bbd5a84297	Improve r272609 (O_TCPOPTS). MFC after: 3 days	2014-10-06 12:29:06 +00:00
Alexander V. Chernikov	a5fedf11fc	Sync to HEAD@r272609.	2014-10-06 11:29:50 +00:00
Alexander V. Chernikov	3615981425	Fix O_TCPOPTS processing. Obtained from: luigi	2014-10-06 11:15:11 +00:00
Alexander V. Chernikov	d4e1b51578	Fix build with gcc.	2014-10-04 13:57:14 +00:00
Alexander V. Chernikov	e530ca7333	Please GCC by specifying proper cast.	2014-10-04 13:46:10 +00:00
Alexander V. Chernikov	e3cadfdb32	Bump max rule size to 512 opcodes.	2014-10-04 12:46:26 +00:00
Alexander V. Chernikov	1ce4b35740	Sync to HEAD@r272516.	2014-10-04 12:42:37 +00:00
Alexander V. Chernikov	60805b89df	Add "ipfw_ctl3" FEATURE to indicate presence of new ipfw interface.	2014-10-04 12:10:32 +00:00
Alexander V. Chernikov	ccba94b8fc	Switch ipfw to use rmlock for runtime locking.	2014-10-04 11:40:35 +00:00
Alexander V. Chernikov	be3cc1b567	Bump max rule size to 512 opcodes.	2014-10-04 10:15:49 +00:00
Alexander V. Chernikov	f8350f3a23	Make linear_skipto turned off by default.	2014-10-03 15:54:51 +00:00
Alexander V. Chernikov	31f0d081d8	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embedded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
Gleb Smirnoff	495a22b595	Use rn_detachhead() instead of direct free(9) for radix tables. Sponsored by: Nginx, Inc.	2014-10-01 13:35:41 +00:00
Sean Bruno	488c0a7ca8	Fix NULL pointer deref in ipfw when using dummynet at layer 2. Drop packet if pkg->ifp is NULL, which is the case here. ref. https://github.com/HardenedBSD/hardenedBSD commit 4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7 PR 193861 -- DUMMYNET LAYER2: kernel panic in this case a kernel panic occurs. Hence, when we do not get an interface, we just drop the packet in question. PR: 193681 Submitted by: David Carlier <david.carlier@hardenedbsd.org> Obtained from: Hardened BSD MFC after: 2 weeks Relnotes: yes	2014-09-25 02:26:05 +00:00
Alexander V. Chernikov	b1d105bc68	Add pre-alpha version of DXR lookup module. It does build but (currently) does not work. This change is not intended to be merged along with other ipfw changes.	2014-09-21 18:15:09 +00:00
Gleb Smirnoff	2a6009bfa6	Mechanically convert to if_inc_counter().	2014-09-19 09:19:29 +00:00
Gleb Smirnoff	56b61ca27a	Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops. Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There were no API to read this statistic. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-19 09:01:19 +00:00
Gleb Smirnoff	450cecf0a0	- Provide a sleepable lock to protect against ioctl() vs ioctl() races. - Use the new lock to protect against simultaneous DIOCSTART and/or DIOCSTOP ioctls. Reported & tested by: jmallett Sponsored by: Nginx, Inc.	2014-09-12 08:39:15 +00:00
Alexander V. Chernikov	d6164b77f8	Make ipfw_nat module use IP_FW3 codes. Kernel changes: * Split kernel/userland nat structures eliminating IPFW_INTERNAL hack. * Add IP_FW_NAT44_* codes resembling old ones. * Assume that instances can be named (no kernel support currently). * Use both UH+WLOCK locks for all configuration changes. * Provide full ABI support for old sockopts. Userland changes: * Use IP_FW_NAT44_* codes for nat operations. * Remove undocumented ability to show ranges of nat "log" entries.	2014-09-07 18:30:29 +00:00
Alexander V. Chernikov	1a33e79969	Change copyrights to the proper one.	2014-09-05 14:19:02 +00:00
Alexander V. Chernikov	c9daea0b86	Sync to HEAD@r271160.	2014-09-05 13:52:39 +00:00
Alexander V. Chernikov	6b988f3a27	* Use modular opcode handling inside ipfw_ctl3() instead of static switch. * Provide hints for subsystem initializers if they are called for the first/last time. * Convert every IP_FW3 opcode user to use new sopt API.	2014-09-05 11:11:15 +00:00
Alexander V. Chernikov	e822d9364e	Be consistent and use same arguments for ctl3 opcodes. Move legacy IP_FW_TABLE_XGETSIZE handling to separate function.	2014-09-03 21:57:06 +00:00
Gleb Smirnoff	bf7dcda366	Clean up unused CSUM_FRAGMENT. Sponsored by: Nginx, Inc.	2014-09-03 08:30:18 +00:00
Alexander V. Chernikov	fb4b37a357	* Fix crash due to forgotten value refcounting in ipfw_link_table_values() * Fix argument order in rollback_toperation_state() * Make flush_table() use operation state API to ease checks.	2014-09-02 20:46:18 +00:00
Alexander V. Chernikov	71af39bf34	Add more comments on newly-added functions. Add back opstate handler function.	2014-09-02 14:27:12 +00:00
Gleb Smirnoff	b616ae250c	Explicitly free packet on PF_DROP, otherwise a "quick" rule with "route-to" may still forward it. PR: 177808 Submitted by: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH	2014-09-01 13:00:45 +00:00
Alexander V. Chernikov	0cba2b2802	Add support for multi-field values inside ipfw tables. This is the last major change in given branch. Kernel changes: * Use 64-bytes structures to hold multi-value variables. * Use shared array to hold values from all tables (assume each table algo is capable of holding 32-byte variables). * Add some placeholders to support per-table value arrays in future. * Use simple eventhandler-style API to ease the process of adding new table items. Currently table addition may require multiple UH drops/acquires which is quite tricky due to atomic table modification/swap support, shared array resize, etc. Deal with it by calling special notifier capable of rolling back state before actually performing swap/resize operations. Original operation then restarts itself after acquiring UH lock. * Bump all objhash users default values to at least 64 * Fix custom hashing inside objhash. Userland changes: * Add support for dumping shared value array via "vlist" internal cmd. * Some small print/fill_flags fixes to support u32 values. * valtype is now bitmask of <skipto\|pipe\|fib\|nat\|dscp\|tag\|divert\|netgraph\|limit\|ipv4\|ipv6>. New values can hold distinct values for each of these types. * Provide special "legacy" type which assumes all values are the same. * More helpers/docs following.. Some examples: 3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6 3:41 [1] zfscurr0# ipfw table mimimi info +++ table(mimimi), set(0) +++ kindex: 2, type: addr references: 0, valtype: skipto,limit,ipv4,ipv6 algorithm: addr:radix items: 0, size: 296 3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1 added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1 3:42 [1] zfscurr0# ipfw table mimimi list +++ table(mimimi), set(0) +++ 10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1	2014-08-31 23:51:09 +00:00

678 Commits