freebsd-dev

Author	SHA1	Message	Date
Martin Matuska	dc64d6b7e1	Revert r262196 I am going to split this into two individual patches and test it with the projects/pf branch that may get merged later.	2014-02-19 17:06:04 +00:00
Martin Matuska	a93b9a64fe	De-virtualize pf_mtag_z [1] Process V_pf_overloadqueue in vnet context [2] This fixes two VIMAGE kernel panics and allows to simultaneously run host-pf and vnet jails. pf inside jails remains broken. PR: kern/182964 Submitted by: glebius@FreeBSD.org [2], myself [1] Tested by: rodrigc@FreeBSD.org, myself MFC after: 2 weeks	2014-02-18 22:17:12 +00:00
George V. Neville-Neil	f850d57232	Summary: Two quick edits to the implementation notes as they're no longer stored in netinet but in netpfil.	2014-02-15 18:36:31 +00:00
Dimitry Andric	a3043eeef2	Under sys/netpfil/ipfw, surround two IPv6-specific static functions with #ifdef INET6, since they are unused when INET6 is disabled. MFC after: 3 days	2014-02-15 12:25:01 +00:00
Gleb Smirnoff	48278b8846	Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. Thanks to Dennis for providing me shell access to problematic box and his help with reproducing, debugging and investigating the problem. Thanks to: Dennis Yusupoff <dyr smartspb.net> Also reported by: dumbbell, pgj, Rambler Sponsored by: Nginx, Inc.	2014-02-14 10:05:21 +00:00
Alexander V. Chernikov	5fa3fdd3d9	Reorder struct ip_fw_chain: * move rarely-used fields down * move uh_lock to different cacheline * remove some usused fields Sponsored by: Yandex LLC	2014-01-24 09:13:30 +00:00
Gleb Smirnoff	be3d21a2cf	Remove NULL pointer dereference. CID: 1009118	2014-01-22 15:58:43 +00:00
Gleb Smirnoff	d26bbeb948	Fix resource leak and simplify code for DIOCCHANGEADDR. CID: 1007035	2014-01-22 15:44:38 +00:00
Alexander V. Chernikov	c254a2091c	Revert r260548. We really should not use IPFW_WLOCK() here but this requires some more playing with IPFW_UH_WLOCK(). Leave till later.	2014-01-11 18:27:34 +00:00
Alexander V. Chernikov	ac5863eed8	We don't need chain write lock since we're not modifying its contents. LibAliasSetAddress() uses its own mutex to serialize changes. While here, convert ifp->if_xname access to if_name() function. MFC after: 2 weeks Sponsored by: Yandex LLC	2014-01-11 16:50:41 +00:00
Gleb Smirnoff	a830c4524d	When pf_get_translation() fails, it should leave *sn pointer pristine, otherwise we will panic in pf_test_rule(). PR: 182557	2014-01-06 19:05:04 +00:00
Alexander V. Chernikov	d28d2aa46d	Use rnh_matchaddr instead of rnh_lookup for longest-prefix match. rnh_lookup is effectively the same as rnh_matchaddr if called with empy network mask. MFC after: 2 weeks	2014-01-03 23:11:26 +00:00
Dimitry Andric	85838e48bd	Fix incorrect header guard define in sys/netpfil/pf/pf.h, which snuck in in r257186. Found by clang 3.4.	2013-12-22 19:47:22 +00:00
Gleb Smirnoff	0b5d46ce4d	Fix fallout from r258479: in pf_free_src_node() the node must already be unlinked. Reported by: Konstantin Kukushkin <dark rambler-co.ru> Sponsored by: Nginx, Inc.	2013-12-22 12:10:36 +00:00
Alexander V. Chernikov	fb2b51fab1	Add net.inet.ip.fw.dyn_keep_states sysctl which re-links dynamic states to default rule instead of flushing on rule deletion. This can be useful while performing ruleset reload (think about `atomic` reload via changing sets). Currently it is turned off by default. MFC after: 2 weeks Sponsored by: Yandex LLC	2013-12-18 20:17:05 +00:00
Alexander V. Chernikov	a19b3f74af	Simplify O_NAT opcode handling. MFC after: 2 weeks Sponsored by: Yandex LLC	2013-11-28 15:28:51 +00:00
Alexander V. Chernikov	1058f17749	Check ipfw table numbers in both user and kernel space before rule addition. Found by: Saychik Pavel <umka@localka.net> MFC after: 2 weeks Sponsored by: Yandex LLC	2013-11-28 10:28:28 +00:00
Craig Rodrigues	d04dc94cac	In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);". This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit(). Therefore, vnet_ipfw_nat_uninit() must be called before vnet_ipfw_uninit(), but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions. In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c, IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN and IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER Consequently, if VIMAGE is enabled, and jails are created and destroyed, the system sometimes crashes, because we are trying to use a deleted lock. To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html ) Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL, so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().	2013-11-25 20:20:34 +00:00
Gleb Smirnoff	19acaecac3	The DIOCKILLSRCNODES operation was implemented with O(m*n) complexity, where "m" is number of source nodes and "n" is number of states. Thus, on heavy loaded router its processing consumed a lot of CPU time. Reimplement it with O(m+n) complexity. We first scan through source nodes and disconnect matching ones, putting them on the freelist and marking with a cookie value in their expire field. Then we scan through the states, detecting references to source nodes with a cookie, and disconnect them as well. Then the freelist is passed to pf_free_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> PR: kern/176763 Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:22:26 +00:00
Gleb Smirnoff	d77c1b3269	To support upcoming changes change internal API for source node handling: - Removed pf_remove_src_node(). - Introduce pf_unlink_src_node() and pf_unlink_src_node_locked(). These function do not proceed with freeing of a node, just disconnect it from storage. - New function pf_free_src_nodes() works on a list of previously disconnected nodes and frees them. - Utilize new API in pf_purge_expired_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:16:34 +00:00
Gleb Smirnoff	1320f8c0d5	Fix off by ones when scanning source nodes hash. Sponsored by: Nginx, Inc.	2013-11-22 18:57:27 +00:00
Gleb Smirnoff	4280d14d2b	Style: don't compare unsigned <= 0. Sponsored by: Nginx, Inc.	2013-11-22 18:54:06 +00:00
Luigi Rizzo	41d10f903d	add a counter on the struct mq (a queue of mbufs), and add a block for userspace compiling.	2013-11-22 05:02:37 +00:00
Luigi Rizzo	f783a35ced	disable some ipfw match options when compiling in userspace	2013-11-22 05:01:38 +00:00
Luigi Rizzo	d0f65d47ec	make this code compile in userspace on OSX	2013-11-22 05:00:18 +00:00
Luigi Rizzo	413c8aaa87	more support for userspace compiling of this code: emulate the uma_zone for dynamic rules.	2013-11-22 04:59:17 +00:00
Luigi Rizzo	77024cbc3e	make ipfw_check_packet() and ipfw_check_frame() public, so they can be used in the userspace version of ipfw/dummynet (normally using netmap for the I/O path). This is the first of a few commits to ease compiling the ipfw kernel code in userspace.	2013-11-22 04:57:50 +00:00
Gleb Smirnoff	e4e01d9cec	Some fixups to pf_get_sport after r257223: - Do not return blindly if proto isn't ICMP. - The dport is in network order, so fix comparisons. - Remove ridiculous htonl(arc4random()). - Push local variable to a narrower block.	2013-11-14 14:20:35 +00:00
Gleb Smirnoff	6c71335c62	Fix fallout from r257223. Since pf_test_state_icmp() can call pf_icmp_state_lookup() twice, we need to unlock previously found state. Reported & tested by: gavin	2013-11-05 16:54:25 +00:00
Gleb Smirnoff	b1b9dcae46	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in wrong namespace net.link.ether, originally it had been at another wrong namespace. It was incorrectly documented at incorrect manual page arp(8). Since new-ARP commit, the tunable have been consulted only on route addition, and ignored on route deletion. Behaviour of a system with tunable turned off is not fully correct, and has no advantages comparing to normal behavior.	2013-11-05 07:32:09 +00:00
Gleb Smirnoff	e1b58d2cff	Code logic of handling PFTM_PURGE into pf_find_state().	2013-11-04 08:20:06 +00:00
Gleb Smirnoff	7710f9f14a	Remove unused PFTM_UNTIL_PACKET const.	2013-11-04 08:15:59 +00:00
Gleb Smirnoff	1ce5620d32	- Fix VIMAGE build. - Fix build with gcc.	2013-10-28 10:12:19 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Baptiste Daroussin	0664b03c16	Import pf.c 1.638 from OpenBSD Original log: Some ICMP types that also have icmp_id, pointed out by markus@ Obtained from: OpenBSD	2013-10-27 20:56:23 +00:00
Baptiste Daroussin	5fff3f1010	Improt pf.c 1.636 from OpenBSD Original log: Make sure pd2 has a pointer to the icmp header in the payload; fixes panic seen with some some icmp types in icmp error message payloads. Obtained from: OpenBSD	2013-10-27 20:52:09 +00:00
Baptiste Daroussin	44df0d9356	Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type in one port of the state key, using the type to determine which side should be the id, and which should be the type. Also: - Handle ICMP6 messages which are typically sent to multicast addresses but recieve unicast replies, by doing fallthrough lookups against the correct multicast address. - Clear up some mistaken assumptions in the PF code: - Not all ICMP packets have an icmp_id, so simulate one based on other data if we can, otherwise set it to 0. - Don't modify the icmp id field in NAT unless it's echo - Use the full range of possible id's when NATing icmp6 echoy Difference with OpenBSD version: - C99ify the new code - WITHOUT_INET6 safe Reviewed by: glebius Obtained from: OpenBSD	2013-10-27 20:44:42 +00:00
Gleb Smirnoff	75bf2db380	Move new pf includes to the pf directory. The pfvar.h remain in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz	2013-10-27 16:25:57 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Philip Paeps	b49bf73f75	Use the correct EtherType for logging IPv6 packets. Reviewed by: melifaro Approved by: re (kib, glebius) MFC after: 3 days	2013-09-28 15:49:36 +00:00
Gleb Smirnoff	8fc6e19c2c	Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit: date: 2010/02/04 14:10:12; author: sthen; state: Exp; lines: +24 -19; pf_get_sport() picks a random port from the port range specified in a nat rule. It should check to see if it's in-use (i.e. matches an existing PF state), if it is, it cycles sequentially through other ports until it finds a free one. However the check was being done with the state keys the wrong way round so it was never actually finding the state to be in-use. - switch the keys to correct this, avoiding random state collisions with nat. Fixes PR 6300 and problems reported by robert@ and viq. - check pf_get_sport() return code in pf_test(); if port allocation fails the packet should be dropped rather than sent out untranslated. Help/ok claudio@. Some additional changes to 1.12: - We also need to bzero() the key to zero padding, otherwise key won't match. - Collapse two if blocks into one with \|\|, since both conditions lead to the same processing. - Only naddr changes in the cycle, so move initialization of other fields above the cycle. - s/u_intXX_t/uintXX_t/g PR: kern/181690 Submitted by: Olivier Cochard-Labbé <olivier cochard.me> Sponsored by: Nginx, Inc.	2013-09-02 10:14:25 +00:00
Alexander Motin	5f4fc3dbcb	Make dummynet use new direct callout(9) execution mechanism. Since the only thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't need extra switch to the clock SWI context. On idle system this change in half reduces number of active CPU cycles and wakes up only one CPU from sleep instead of two. I was going to make this change much earlier as part of calloutng project, but waited for better solution with skipping idle ticks to be implemented. Unfortunately with 10.0 release coming it is better get at least this.	2013-08-24 13:34:36 +00:00
Mikolaj Golub	8856400bcb	Make ipfw nat init/unint work correctly for VIMAGE: * Do per vnet instance cleanup (previously it was only for vnet0 on module unload, and led to libalias leaks and possible panics due to stale pointer dereferences). * Instead of protecting ipfw hooks registering/deregistering by only vnet0 lock (which does not prevent pointers access from another vnets), introduce per vnet ipfw_nat_loaded variable. The variable is set after hooks are registered and unset before they are deregistered. * Devirtualize ifaddr_event_tag as we run only one event handler for all vnets. * It is supposed that ifaddr_change event handler is called in the interface vnet context, so add an assertion. Reviewed by: zec MFC after: 2 weeks	2013-08-24 11:59:51 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Andrey V. Elsukov	415077bad9	Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration. Reported by: Eugene M. Zheganin	2013-07-29 13:17:18 +00:00
Gleb Smirnoff	93ecffe50b	Improve locking strategy between keys hash and ID hash. Before this change state creating sequence was: 1) lock wire key hash 2) link state's wire key 3) unlock wire key hash 4) lock stack key hash 5) link state's stack key 6) unlock stack key hash 7) lock ID hash 8) link into ID hash 9) unlock ID hash What could happen here is that other thread finds the state via key hash lookup after 6), locks ID hash and does some processing of the state. When the thread creating state unblocks, it finds the state it was inserting already non-virgin. Now we perform proper interlocking between key hash locks and ID hash lock: 1) lock wire & stack hashes 2) link state's keys 3) lock ID hash 4) unlock wire & stack hashes 5) link into ID hash 6) unlock ID hash To achieve that, the following hacking was performed in pf_state_key_attach(): - Key hash mutex is marked with MTX_DUPOK. - To avoid deadlock on 2 key hash mutexes, we lock them in order determined by their address value. - pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked the conflicting state synchronously. In theory this could require locking a third key hash, which we can't do now. Now we do not remove the state immediately, instead we leave this task to the purge thread. To avoid conflicts in a short period before state is purged, we push to the very end of the TAILQ. - On success, before dropping key hash locks, pf_state_key_attach() locks ID hash and returns. Tested by: Ian FREISLICH <ianf clue.co.za>	2013-06-13 06:07:19 +00:00
Gleb Smirnoff	5af77b3ebd	Return meaningful error code from pf_state_key_attach() and pf_state_insert().	2013-05-11 18:06:51 +00:00
Gleb Smirnoff	03911dec5b	Better debug message.	2013-05-11 18:03:36 +00:00
Gleb Smirnoff	048c95417d	Fix DIOCADDSTATE operation.	2013-05-11 17:58:26 +00:00

1 2 3

115 Commits