revision 1.122
date: 2009/05/13 01:01:34; author: dlg; state: Exp; lines: +6 -4
only keep track of the number of updates on tcp connections. state sync on
all the other protocols is simply pushing the timeouts along, which has a
resolution of 1 second, so it isn't going to be hurt by pfsync taking up
to a second to send it over.
still keep track of updates on tcp though; their windows need constant
attention.
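a minimal sketch of the idea, with pf/pfsync field names assumed rather
than quoted from the diff:

    /* only tcp states count queued updates; everything else just
     * rides along on its timeout, which has 1 second resolution */
    if (st->key[PF_SK_WIRE]->proto == IPPROTO_TCP &&
        ++st->sync_updates >= sc->sc_maxupdates)
            pfsync_sendout();   /* tcp windows need prompt sync */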
revision 1.120
date: 2009/04/04 13:09:29; author: dlg; state: Exp; lines: +5 -5
use time_uptime instead of time_second internally. time_uptime isn't
affected by adjusting the clock.
revision 1.175
date: 2011/11/25 12:52:10; author: dlg; state: Exp; lines: +3 -3
use time_uptime to set state creation values as time_second can be
skewed at runtime by things like date(1) and ntpd. time_uptime is
monotonic and therefore more useful to compare against.
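both changes amount to stamping states with the monotonic clock; a sketch
(field and timeout names illustrative):

    /* time_second can jump when date(1) or ntpd(8) steps the clock;
     * time_uptime is monotonic, so ages computed from it stay sane */
    st->creation = time_uptime;         /* was: time_second */

    /* later, when checking expiry: */
    if (time_uptime - st->creation > timeout)
        pf_unlink_state(st);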
revision 1.118
date: 2009/03/23 06:19:59; author: dlg; state: Exp; lines: +8 -6
wait an appropriate amount of time before giving up on a bulk update,
rather than giving up after a hardcoded 5 seconds (which is generally much
too short an interval for a bulk update).
pointed out by david@, eyeballed by mcbride@
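a sketch in the spirit of the commit, deriving the deadline from how many
states fit into one pfsync packet (constants and names assumed, not quoted
from the diff):

    timeout_add(&sc->sc_bulkfail_tmo, 4 * hz +
        pf_pool_limits[PF_LIMIT_STATES].limit /
        ((sc->sc_if.if_mtu - PFSYNC_MINPKT) /
        sizeof(struct pfsync_state)));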
revision 1.171
date: 2011/10/31 22:02:52; author: mikeb; state: Exp; lines: +2 -1
Don't forget to cancel bulk update failure timeout when destroying an
interface. Problem report and fix from Erik Lax, thanks!
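the fix presumably amounts to one line in the destroy path (softc field
name assumed):

    /* pfsync_clone_destroy(): the pending failure timeout must not
     * outlive the interface */
    timeout_del(&sc->sc_bulkfail_tmo);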
Start a brief note of revisions merged from OpenBSD.
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption while it is running its bulk update.
However, reimplement the feature in a more elegant manner, partially
inspired by newer OpenBSD:
- Rename the term "suppression" to "demotion", to match OpenBSD.
- Keep a global demotion factor that can be raised by several
conditions; for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD, the demotion factor isn't a counter, but an
actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are the maximum advskew value, so a single failure bumps
demotion to the maximum. This is for POLA compatibility and should
satisfy most users.
- The demotion factor is a writable sysctl, so the user can shoot
themselves in the foot if they so desire. (A sketch of the scheme
follows below.)
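a minimal sketch of the demotion scheme described above; all names here
are illustrative, not the committed interface:

    static int carp_demotion;   /* global factor, writable sysctl */

    /* raised when an interface goes down, carp's ip_output()/
     * ip6_output() fails, or pfsync runs a bulk update */
    static void
    carp_demote_adj(int adj)
    {
            carp_demotion += adj;
    }

    /* the factor feeds straight into advskew; with the default
     * adjustment equal to the maximum advskew, a single failure
     * demotes the node completely */
    static int
    carp_effective_advskew(struct carp_softc *sc)
    {
            int skew = sc->sc_advskew + carp_demotion;

            return (skew > CARP_MAXSKEW ? CARP_MAXSKEW : skew);
    }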
- Instead of scheduling the next run of pfsync_bulk_update(),
pfsync_bulk_fail() was scheduled.
This led to an instant 100% state leak after the first bulk update
request.
- After the above fix, it appeared that pfsync_bulk_update() lacked
locking. To fix this, the sc_bulk_tmo callout was converted to an
mtx one (see the sketch after this list). Eventually, all pf/pfsync
callouts should be converted to the mtx version, since it isn't
possible to stop or drain a non-mtx callout without risk of a race.
- Add a comment that callout_stop() in pfsync_clone_destroy() lacks
locking. Since pfsync0 can't be destroyed (yet), let it be here.
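a sketch of the conversion, with the lock name assumed:

    /* an mtx callout is bound to a lock, so callout_stop() can
     * synchronize against the running handler instead of racing it */
    callout_init_mtx(&sc->sc_bulk_tmo, &pf_mtx, 0);

    /* the handler then runs with pf_mtx held, and rescheduling
     * itself from within pfsync_bulk_update() stays race-free */
    callout_reset(&sc->sc_bulk_tmo, 1 * hz, pfsync_bulk_update, sc);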
The root of the problem is the re-locking at the end of pfsync_sendout().
Several functions call pfsync_sendout() holding pointers
to pf data on their stacks, and these functions expect this data to be
consistent.
To fix this, the following approach was taken:
- pfsync_sendout() doesn't call ip_output() directly, but
enqueues the mbuf on sc->sc_ifp's interface queue, which is
currently unused. Then the pfsync netisr is scheduled. PF_LOCK
isn't dropped in pfsync_sendout().
- The netisr runs through the queue and ip_output()s the packets
on it.
Apart from fixing the race, this also decouples the stack, fixing
potential issues that may happen when sending pfsync(4)
packets on the input path.
Reviewed by: eri (a quick review)
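a compressed sketch of the deferral; the softc layout, swi cookie name,
and helper names are assumed:

    /* called from pfsync_sendout() with PF_LOCK still held */
    static void
    pfsync_defer_output(struct pfsync_softc *sc, struct mbuf *m)
    {
            IF_ENQUEUE(&sc->sc_ifp->if_snd, m); /* queue otherwise unused */
            swi_sched(pfsync_swi_cookie, 0);    /* wake the pfsync netisr */
    }

    /* the handler drains the queue later, with no pf data on its stack */
    static void
    pfsyncintr(void *arg)
    {
            struct pfsync_softc *sc = arg;
            struct mbuf *m;

            for (;;) {
                    IF_DEQUEUE(&sc->sc_ifp->if_snd, m);
                    if (m == NULL)
                            break;
                    ip_output(m, NULL, NULL, IP_RAWOUTPUT, NULL, NULL);
            }
    }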
A number of packets can be queued on sc while we are in ip_output(), and then
we wipe the accumulated sc_len. On the next pfsync_sendout() that would lead to
writing beyond our mbuf cluster.
to document where we expect to be called with a lock held, to
more easily catch unnoticed code paths.
This does not necessarily improve locking in pfsync; it just tries
to avoid the panics reported.
PR: kern/159390, kern/158873
Submitted by: pluknet (at least something that partly resembles
my patch ignoring other cleanup, which I only saw
too late on the 2nd PR)
MFC After: 3 days
with virtualization it is not helpful but complicates things.
Current state of the art is to not virtualize these kinds of locks -
inp_group/hash/info/.. are all not virtualized either.
MFC after: 3 days
pfsync also depends on pf being initialized already, so pf goes at
FIRST and the interfaces go at ANY.
Then the (VNET_)SYSINIT startups for pf stay at SI_SUB_PROTO_BEGIN
and for pfsync we move to the later SI_SUB_PROTO_IF.
This is not ideal either, but at least an order that should work for
the moment and can be refined with the VIMAGE merge, once that
actually works with more than one network stack.
MFC after: 3 days
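a sketch of the resulting ordering (function names assumed):

    /* pf must be fully initialized before any interface attaches */
    VNET_SYSINIT(pf, SI_SUB_PROTO_BEGIN, SI_ORDER_FIRST, pf_load, NULL);
    VNET_SYSINIT(pfsync, SI_SUB_PROTO_IF, SI_ORDER_ANY, pfsync_init, NULL);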
and never remove state.
This fixes the problem some people are seeing where state is removed when pf
is loaded as a module, but not when it is compiled into the kernel.
Reported by: many on freebsd-pf
Tested by: flo
MFC after: 3 days
hash install, etc. For now, these arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expense of the
software version of Toeplitz (and similar hashes).
Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.
(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI. However, it won't be useful without
other previous possibly less MFCable changes.)
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
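the shape this suggests, assuming the mbuf-taking variant is named
in_pcblookup_mbuf() and is used on the TCP input path:

    /* an mbuf-aware lookup can reuse the NIC's RSS hash carried in
     * the mbuf instead of re-running Toeplitz in software */
    inp = in_pcblookup_mbuf(&V_tcbinfo, ip->ip_src, th->th_sport,
        ip->ip_dst, th->th_dport,
        INPLOOKUP_WILDCARD | INPLOOKUP_WLOCKPCB, ifp, m);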
to not only compile but load as well for testing with IPv6-only kernels.
For the moment we ignore the csum change in pf_ioctl.c given the
pending update to pf45.
Reported by: dru
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 20 days
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. The new lookup flags,
supplementing the existing INPLOOKUP_WILDCARD flag, are:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
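For example, a caller that only needs to read the connection might now
look like this (a sketch, not a quote from the tree):

    struct inpcb *inp;

    /* in_pcblookup() takes ipi_hash_lock itself; the flag asks it
     * to return the inpcb read-locked */
    inp = in_pcblookup(&V_tcbinfo, faddr, fport, laddr, lport,
        INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB, ifp);
    if (inp != NULL) {
            /* ... inspect the connection ... */
            INP_RUNLOCK(inp);
    }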
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
for interfaces with TSO enabled, otherwise one would see an extra
ICMP unreach, frag needed per matching packet on lo0.
This syncs pf code to ip_output.c r162084.
PR: kern/144311
Submitted by: yongari via mlaier
Reviewed by: eri
Tested by: kib
MFC after: 8 days
Server Return mode, where not all packets would be visible to the load
balancer or gateway.
This commit should be reverted when we merge future pf versions. The
benefit it would provide is that this version does not break any existing
public interface and thus won't be a problem if we want to MFC it to
earlier FreeBSD releases.
Discussed with: mlaier
Obtained from: OpenBSD
Sponsored by: iXsystems, Inc.
MFC after: 1 month
restart the scan after acquiring the lock the hard way. Otherwise we might
end up with a dead reference.
Reported by: pfsense
Reviewed by: eri
Initial patch by: eri
Tested by: pfsense
Approved by: re (kib)
all pertinent statistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
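For instance, a module that previously bumped ipstat fields directly
would now go through an accessor, perhaps along these lines (assuming
the kmod_*()/KMOD_*() naming):

    KMOD_IPSTAT_INC(ips_total);     /* was: ipstat.ips_total++; */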
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too aggressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
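The resulting idiom for a virtualized global (variable name
hypothetical):

    VNET_DEFINE(int, foo_enable);   /* reference copy + per-vnet copies */
    #define V_foo_enable    VNET(foo_enable)

    static int
    foo_enabled(void)
    {
            /* under "options VIMAGE" this resolves via curvnet;
             * otherwise it is a plain global access */
            return (V_foo_enable);
    }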
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
the ROUTETABLES kernel option, so there is no need to include opt_route.h
anymore in all consumers of vnet.h, nor to depend on it for module
builds.
Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel. This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.
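Concretely, counter updates move from direct field access to the
macros, e.g.:

    ICMPSTAT_INC(icps_tooshort);    /* was: icmpstat.icps_tooshort++; */
    ICMP6STAT_ADD(icp6s_badlen, 1); /* was: icmp6stat.icp6s_badlen += 1; */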
In one case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.
MFC after: 3 days