freebsd-nq

Author	SHA1	Message	Date
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Bjoern A. Zeeb	13a6cf24ac	Announce both IPsec and UDP Encap (NAT-T) if available for feature_present(3) checks. This will help to run-time detect and conditionally handle specific options of either feature in user space (i.e. in libipsec). Descriptions read by: rwatson MFC after: 2 weeks	2010-10-30 18:52:44 +00:00
Thomas Quinot	94294cada5	Fix typo in comment.	2010-10-25 16:11:37 +00:00
Bjoern A. Zeeb	4a85b5e2ea	Make the IPsec SADB embedded route cache a union to be able to hold both the legacy and IPv6 route destination address. Previously in case of IPv6, there was a memory overwrite due to not enough space for the IPv6 address. PR: kern/122565 MFC After: 2 weeks	2010-10-23 20:35:40 +00:00
Bjoern A. Zeeb	acf456a04a	Remove dead code: assignment to a local variable not used anywhere after that. MFC after: 3 days	2010-10-14 15:15:22 +00:00
Bjoern A. Zeeb	e046b77ee1	Style: make the asterisk go with the variable name, not the type. MFC after: 3 days	2010-10-14 14:49:49 +00:00
Bjoern A. Zeeb	3abaa08643	MFp4 @178283: Improve IPsec flow distribution for better netisr parallelism. Instead of using the pointer that would have the last bits masked in a % statement in netisr_select_cpuid() to select the queue, use the SPI. Reviewed by: rwatson MFC after: 4 weeks	2010-05-24 16:27:47 +00:00
VANHULLEBUS Yvan	2e8d55c4e8	Set SA's natt_type before calling key_mature() in key_add(), as the SA may be used as soon as key_mature() has been done. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:58:58 +00:00
VANHULLEBUS Yvan	2d2a2083f7	Update SA's NAT-T stuff before calling key_mature() in key_update(), as SA may be used as soon as key_mature() has been called. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:55:26 +00:00
Bjoern A. Zeeb	82cea7e6f3	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
VANHULLEBUS Yvan	61f73308d4	Locks SPTREE when setting some SP entries to state DEAD. This can prevent kernel panics when updating SPs while there is some traffic for them. Obtained from: NETASQ MFC after: 1m	2010-04-15 12:40:33 +00:00
Ermal Luçi	87a25418ac	Fix a logic error in ipsec code that extracts information from the packets. Reviewed by: bz, mlaier Approved by: mlaier(mentor) MFC after: 1 month	2010-04-02 18:15:23 +00:00
Bjoern A. Zeeb	8b7893b056	When tearing down IPsec as part of a (virtual) network stack, do not try to free the same list twice but free both the acquiring list and the security policy acquiring list. Reviewed by: anchie MFC after: 3 days	2010-03-28 06:51:50 +00:00
Pawel Jakub Dawidek	d0d6567d4a	Correct typo in comment.	2010-02-18 22:34:29 +00:00
Bjoern A. Zeeb	a77cb332ee	Enable IPcomp by default. PR: kern/123587 MFC after: 5 days	2009-11-29 20:47:43 +00:00
Bjoern A. Zeeb	90b4c081d0	Add more statistics variables for IPcomp. Try to version the struct in a backward compatible way. People asked for the versioning of the stats structs in general before. MFC after: 5 days	2009-11-29 20:37:30 +00:00
Bjoern A. Zeeb	10229cd109	Assimilate very similar input and output code paths (no real functional change). MFC after: 5 days	2009-11-29 17:47:49 +00:00
Bjoern A. Zeeb	afa47e51aa	Only add the IPcomp header if crypto reported success and we have a lower payload size. Before we had always added the header, no matter if we actually send out compressed data or not. With this, after the opencrypto/deflate changes, IPcomp starts to work apart from edge cases. Leave it disabled by default until those are fixed as well. PR: kern/123587 MFC after: 5 days	2009-11-29 10:53:34 +00:00
Bjoern A. Zeeb	3d34d241be	Remove whitespace. MFC after: 6 days	2009-11-28 21:42:39 +00:00
Bjoern A. Zeeb	4ff9852103	Directly send data uncompressed if the packet payload size is lower than the compression algorithm threshold. MFC after: 6 days	2009-11-28 21:40:57 +00:00
Bjoern A. Zeeb	023795f033	Correct a typo. MFC after: 6 days	2009-11-28 21:01:26 +00:00
VANHULLEBUS Yvan	3e6265f14d	fixed two race conditions when inserting/removing SAs via PFKey, which can both lead to a kernel panic when adding/removing quickly a lot of SAs. Obtained from: NETASQ MFC after: 2w (MFC on 8 before 8.0 release ???)	2009-11-17 16:00:41 +00:00
VANHULLEBUS Yvan	a45bff047c	Changed an IPSEC_ASSERT to a simple test, as such invalid packets may come from outside without being discarded before. Submitted by: aurelien.ansel@netasq.com Reviewed by: bz (secteam) Obtained from: NETASQ MFC after: 1m	2009-10-01 15:33:53 +00:00
VANHULLEBUS Yvan	22c125a1b6	When checking traffic endpoint's address families in key_spdadd(), compare them together instead of comparing each one with respective tunnel endpoint. PR: kern/138439 Submitted by: aurelien.ansel@netasq.com Obtained from: NETASQ MFC after: 1 m	2009-09-16 11:56:44 +00:00
Pawel Jakub Dawidek	fc79063e66	Silent gcc? Yeah, you wish. What I meant was to silence gcc. Spotted by: julian	2009-09-06 19:05:03 +00:00
Pawel Jakub Dawidek	3b02c4a3d3	Initialize state_valid and arraysize variable so gcc won't complain. Reported by: bz	2009-09-06 18:09:25 +00:00
Pawel Jakub Dawidek	950ab2f81e	Improve code a bit by eliminating goto and having one unlock per lock.	2009-09-06 07:32:16 +00:00
Pawel Jakub Dawidek	cee0fa809b	Correct typo in comment.	2009-09-06 07:30:21 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Robert Watson	d1da0a0672	Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks	2009-06-25 16:35:28 +00:00
Robert Watson	2d9cfabad4	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
Robert Watson	80af0152f3	Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)	2009-06-24 21:00:25 +00:00
Bjoern A. Zeeb	57700c9e4d	Move setting of ports from NAT-T below key_getsah() and actually below key_setsaval(). Without that, the lookup for the SA had failed as we were looking for a SA with the new, updated port numbers instead of the old ones and were comparing the ports in key_cmpsaidx(). This makes updating the remote -> local SA on the initiator work again. Problem introduced with: p4 changeset 152114	2009-06-19 21:01:55 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
VANHULLEBUS Yvan	7b495c4494	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
Bjoern A. Zeeb	fc228fbf49	Properly hide IPv4 only variables and functions under #ifdef INET.	2009-06-10 19:25:46 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementally fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processing its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifiers (m2flow). Falls back on NETISR_POLICY_SOURCE if no flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver-generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
VANHULLEBUS Yvan	aa1faa5fc6	Lock SPTREE before parsing it in key_spddump() Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 2 weeks	2009-05-27 09:44:14 +00:00
VANHULLEBUS Yvan	cff5821a61	Only decrease refcnt once when flushing SPD entries, to avoid flushing entries which are still used. Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 1 month	2009-05-27 09:31:50 +00:00
Bjoern A. Zeeb	db2e47925e	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform across vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00


217 Commits