freebsd-skq

Author	SHA1	Message	Date
ae	a77bf6c232	key_spdget uses key_setdumpsp() without SPTREE_RLOCK held (it uses referenced pointer to sp). Remove SPTREE_RLOCK_ASSERT from key_setdumpsp() to fix wrong assertion. Reported by: Emeric POUPON Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-01-27 17:46:55 +00:00
rwatson	60909669f0	In order to reduce use of M_EXT outside of the mbuf allocator and socket-buffer implementations, introduce a return value for MCLGET() (and m_cljget() that underlies it) to allow the caller to avoid testing M_EXT itself. Update all callers to use the return value. With this change, very few network device drivers remain aware of M_EXT; the primary exceptions lie in mbuf-chain pretty printers for debugging, and in a few cases, custom mbuf and cluster allocation implementations. NB: This is a difficult-to-test change as it touches many drivers for which I don't have physical devices. Instead we've gone for intensive review, but further post-commit review would definitely be appreciated to spot errors where changes could not easily be made mechanically, but were largely mechanical in nature. Differential Revision: https://reviews.freebsd.org/D1440 Reviewed by: adrian, bz, gnn Sponsored by: EMC / Isilon Storage Division	2015-01-06 12:59:37 +00:00
ae	7d079a2d8e	Fix VIMAGE build.	2014-12-25 13:38:51 +00:00
ae	84b82e8873	Rename ip4_def_policy variable to def_policy. It is used by both IPv4 and IPv6. Initialize it only once in def_policy_init(). Remove its initialization from key_init() and make it static. Remove several fields from struct secpolicy: * lock - it isn't so useful having mutex in the structure, but the only thing we do with it is initialization and destroying. * state - it has only two values - DEAD and ALIVE. Instead of take a lock and change the state to DEAD, then take lock again in GC function and delete policy from the chain - keep in the chain only ALIVE policies. * scangen - it was used in GC function to protect from sending several SADB_SPDEXPIRE messages for one SPD entry. Now we don't keep DEAD entries in the chain and there is no need to have scangen variable. Use TAILQ to implement SPD entries chain. Use rmlock to protect access to SPD entries chain. Protect all SP lookup with RLOCK, and use WLOCK when we are inserting (or removing) SP entry in the chain. Instead of using pattern "LOCK(); refcnt++; UNLOCK();", use refcount(9) API to implement refcounting in SPD. Merge code from key_delsp() and _key_delsp() into _key_freesp(). And use KEY_FREESP() macro in all cases when we want to release reference or just delete SP entry. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-24 18:34:56 +00:00
ae	19098cfc00	Treat errors when retrieving security policy as policy violation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:46:11 +00:00
ae	3f424f0f24	Initialize error variable. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:40:56 +00:00
ae	3665df88dc	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:35:34 +00:00
ae	8eff9f6e5d	Remove flags and tunalready arguments from ipsec4_process_packet() and make its prototype similar to ipsec6_process_packet. The flags argument isn't used here, tunalready is always zero. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:34:49 +00:00
ae	409532973d	Remove now unused mtag argument from ipsec*_common_input_cb. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:14:49 +00:00
ae	d1182925a1	Remove code related to PACKET_TAG_IPSEC_IN_CRYPTO_DONE mbuf tag. It isn't used in FreeBSD. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:07:21 +00:00
ae	4d4039104f	Remove unused mtag variable. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:01:53 +00:00
ae	9b3ccf0ab3	key_getspacq() returns holding the spacq_lock. Unlock it in all cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-12-07 06:47:00 +00:00
ae	6cf2e00820	Fix style(9) and remove m_freem(NULL). Add XXX comment, it looks incorrect, because m_pkthdr.len is already incremented by M_PREPEND(). Sponsored by: Yandex LLC	2014-12-04 05:02:12 +00:00
ae	90acc68352	Remove __P() macro. Suggested by: kevlo Sponsored by: Yandex LLC	2014-12-03 04:08:41 +00:00
ae	ef3a17b83c	ANSIfy function declarations. Sponsored by: Yandex LLC	2014-12-03 03:50:54 +00:00
ae	4473ed457d	Remove unneded check. No need to do m_pullup to the size that we prepended. Sponsored by: Yandex LLC	2014-12-02 05:28:40 +00:00
ae	b82eb2f5d9	Remove route chaching support from ipsec code. It isn't used for some time. * remove sa_route_union declaration and route_cache member from struct secashead; * remove key_sa_routechange() call from ICMP and ICMPv6 code; * simplify ip_ipsec_mtu(); * remove #include <net/route.h>; Sponsored by: Yandex LLC	2014-12-02 04:20:50 +00:00
ae	cac7b140a6	Remove unused structure declarations. Sponsored by: Yandex LLC	2014-12-02 02:41:44 +00:00
ae	3bcf1e15f7	Remove unused declartations. Sponsored by: Yandex LLC	2014-12-02 02:32:28 +00:00
ae	6ee47c6705	Remove ip4_input() declaration. It was removed in r275133. MFC after: 1 month	2014-11-27 00:27:39 +00:00
ae	f9ef15aae9	Do not use xform_ipip as decapsulation fallback. xform_ipip was used as fallback with low priority for IPIP encapsulated packets that were decrypted. In some cases it can decapsulate packets, that it shouldn't. This leads to situations, when wrong configurations are magically working. Also it can propagate wrong ingress interface and this can break security. Now we redesigned the IPSEC code and IPIP encapsulation is called directly from ipsec_output, and decapsulation is done in the ipsec_input with m_striphdr. Differential Revision: https://reviews.freebsd.org/D1220 MFC after: 1 month Sponsored by: Yandex LLC	2014-11-26 17:44:49 +00:00
ae	bbeee5ebc9	Count statistics for the specific address family. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 12:58:33 +00:00
ae	afe36fb422	Strip IP header only when we act in tunnel mode. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 10:48:59 +00:00
ae	d32d19ece5	Remove redundant ip6_plen initialization. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 10:47:24 +00:00
ae	2188ffe3d0	ipsec6_process_packet is called before ip6_output fixes ip6_plen. Update ip6_plen before bpf processing to be able see correct value. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 22:51:30 +00:00
ae	bc6c58f45f	Fix ips_out_nosa errors accounting. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 14:00:49 +00:00
ae	c075106d39	Pass mbuf to pfil processing before stripping outer IP header as it is described in if_enc(4). MFC after: 2 week Sponsored by: Yandex LLC	2014-11-07 12:05:20 +00:00
glebius	99f4ec50e8	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
ae	192cfad02f	When mode isn't explicitly specified (wildcard) and inner protocol isn't IPv4 or IPv6, assume it is the transport mode. Reported by: jmg MFC after: 1 week Sponsored by: Yandex LLC	2014-11-06 20:23:57 +00:00
ae	2495e4b948	Use in_localip() instead of handmade implementation. MFC after: 1 week Sponsored by: Yandex LLC	2014-10-31 12:19:22 +00:00
jhb	93ccee215c	Use a static callout to drive key_timehandler() instead of timeout(). While here, make key_timehandler() private to key.c. Submitted by: bz (2) Tested by: bz	2014-10-23 20:43:16 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
ae	8adffba139	Do not strip outer header when operating in transport mode. Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP encapsulation, it will be handled with tunneling interface. And thus proper interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling interface will see packets in both directions. Sponsored by: Yandex LLC	2014-10-02 02:00:21 +00:00
glebius	56e9d80329	Mechanically convert to if_inc_counter().	2014-09-19 10:18:14 +00:00
kevlo	dd40fa7e62	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
kevlo	7727a3c215	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
glebius	75acbf068a	Fix style bug: rename the refcount field of m_ext to ext_cnt, to match other members. Sponsored by: Nginx, Inc.	2014-07-11 14:34:29 +00:00
zec	4aaabb881a	The assumption in ipsec4_process_packet() that the payload may be only IPv4 is wrong, so check the IP version before mangling the payload header.	2014-07-01 08:02:25 +00:00
bz	8cfb727def	Use IPv4 statistics in ipsec4_process_packet() rather than the IPv6 version. This also unbreaks the NOINET6 builds after r266800.	2014-05-28 23:01:20 +00:00
vanhu	451f0d7511	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
bz	7d2507a09d	Only do a ports check if this is a NAT-T SA. Otherwise other lookups providing ports may get unexpected results. MFC After: 2 weeks	2014-05-24 09:29:23 +00:00
ae	721d16d187	Remove _IP_VHL* macros and related ifdefs. MFC after: 1 week	2014-04-16 05:31:54 +00:00
ae	7afb4f39b4	The check for local address spoofing lacks ifaddr locking. Remove these loops and use in_localip() and in6_localip() functions instead. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 16:58:32 +00:00
ae	a503000e26	Remove unused variable. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 15:57:27 +00:00
ae	11ab69a2c3	Remove dead code. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 15:55:38 +00:00
jhb	d9d6b88f18	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
ae	b449f4079d	Initialize prot variable. PR: 177417 MFC after: 1 week	2013-11-11 13:19:55 +00:00
glebius	2c1ec831c9	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
jhb	a437be7257	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
ae	afd48faca0	Remove the large part of struct ipsecstat. Only few fields of this structure is used, but they already have equal fields in the struct newipsecstat, that was introduced with FAST_IPSEC and then was merged together with old ipsecstat structure. This fixes kernel stack overflow on some architectures after migration ipsecstat to PCPU counters. Reported by: Taku YAMAMOTO, Maciej Milewski	2013-07-23 14:14:24 +00:00
ae	d467a4169a	Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat, ipsec4stat, ipsec6stat to PCPU counters.	2013-07-09 10:08:13 +00:00
ae	1a36dfcc87	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
ae	b05df49af6	Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP, PFKEY. MFC after: 2 weeks	2013-06-20 11:44:16 +00:00
ae	1e4c88cc8b	Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. MFC after: 2 weeks	2013-06-20 09:55:53 +00:00
ae	844d612b2a	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
glebius	f1574e6b22	Use m_get2() + m_align() instead of hand made key_alloc_mbuf(). Code examination shows, that although key_alloc_mbuf() could return chains, the callers never use chains, so m_get2() should suffice. Sponsored by: Nginx, Inc.	2013-03-15 10:20:15 +00:00
glebius	8e20fa5ae9	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
glebius	fea857f2a8	Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this mean that now protocol needs to do subtraction itself, and for another half this means that we do not need to add header length back to the packet. Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.	2012-10-23 08:33:13 +00:00
glebius	6a485e417a	- Fix one more miss from r241913. - Add XXX comment about necessity of the entire block, that "fixes up" the IP header.	2012-10-23 08:22:01 +00:00
glebius	95d300ced4	Couple of changes missed from r241913, which converted IPv4 stack to network byte order.	2012-10-22 22:42:28 +00:00
glebius	5cc3ac5902	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
andre	34a9a386cb	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
kevlo	2e8dd1a520	Add missing break	2012-09-18 08:00:43 +00:00
vanhu	05d54cfa61	In NAT-T transport mode, allow a client to open a new connection just after closing another. It worked only in tunnel mode before. Submitted by: Andreas Longwitz <longwitz@incore.de> MFC after: 1M	2012-09-12 12:14:50 +00:00
glebius	5190d38ee3	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
jhb	8f6ba08bcc	Unexpand a couple of TAILQ_FOREACH()s.	2012-08-17 16:01:24 +00:00
bz	cd3a3d4b7a	Fix a bug introduced in r221129 that leads to a panic wen using bundled SAs. For now allow same address family bundles. While discovered with ESP and AH, which does not make a lot of sense, IPcomp could be a possible problematic candidate. PR: kern/164400 MFC after: 3 days	2012-07-22 17:46:05 +00:00
bz	dcdb23291f	Merge multi-FIB IPv6 support from projects/multi-fibv6/head/: Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days	2012-02-17 02:39:58 +00:00
bz	a8d3ef905d	Clean up some #endif comments removing from short sections. Add #endif comments to longer, also refining strange ones. Properly use #ifdef rather than #if defined() where possible. Four #if defined(PCBGROUP) occurances (netinet and netinet6) were ignored to avoid conflicts with eventually upcoming changes for RSS. Reported by: bde (most) Reviewed by: bde MFC after: 3 days	2012-01-22 02:13:19 +00:00
pjd	318f633177	Remove unused 'plen' variable.	2011-11-26 23:57:03 +00:00
pjd	a634fb6b1e	The esp_max_ivlen global variable is not needed, we can just use EALG_MAX_BLOCK_LEN.	2011-11-26 23:27:41 +00:00
pjd	cab7b40bd0	malloc(M_WAITOK) never fails, so there is no need to check for NULL.	2011-11-26 23:18:19 +00:00
pjd	14f91c0b66	Eliminate 'err' variable and just use existing 'error'.	2011-11-26 23:15:28 +00:00
pjd	06a6aab6cc	Simplify code a bit.	2011-11-26 23:13:30 +00:00
pjd	2a876c6d0a	There is no need to virtualize esp_max_ivlen.	2011-11-26 23:11:41 +00:00
brueffer	8905963622	Add missing va_end() in an error case to clean up after va_start() (already done in the non-error case). CID: 4726 Found with: Coverity Prevent(tm) MFC after: 1 week	2011-10-07 21:00:26 +00:00
bz	e15f804c7b	Update packet filter (pf) code to OpenBSD 4.5. You need to update userland (world and ports) tools to be in sync with the kernel. Submitted by: mlaier Submitted by: eri	2011-06-28 11:57:25 +00:00
vanhu	684e2951a0	Release SP's refcount in key_get_spdbyid(). PR: 156676 Submitted by: Tobias Brunner (tobias@strongswan.org) MFC after: 1 week	2011-05-09 13:16:21 +00:00
bz	d28e675043	Make IPsec compile without INET adding appropriate #ifdef checks. Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c to not need three different versions depending on INET, INET6 or both. Mark two places preparing for not yet supported functionality with IPv6. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:28:42 +00:00
bz	9a8bd81a84	Do not allow recursive RFC3173 IPComp payload. Reviewed by: Tavis Ormandy (taviso cmpxchg8b.com) MFC after: 5 days Security: CVE-2011-1547	2011-04-01 14:13:49 +00:00
fabient	d56170701e	Optimisation in IPSEC(4): - Remove contention on ISR during the crypto operation by using rwlock(9). - Remove a second lookup of the SA in the callback. Gain on 6 cores CPU with SHA1/AES128 can be up to 30%. Reviewed by: vanhu MFC after: 1 month	2011-03-31 15:23:32 +00:00
fabient	9f3325218f	Fix two SA refcount: - AH does not release the SA like in ESP/IPCOMP when handling EAGAIN - ipsec_process_done incorrectly release the SA. Reviewed by: vanhu MFC after: 1 week	2011-03-31 13:14:24 +00:00
vanhu	b5386e15c1	Fixed IPsec's HMAC_SHA256-512 support to be RFC4868 compliant. This will break interoperability with all older versions of FreeBSD for those algorithms. Reviewed by: bz, gnn Obtained from: NETASQ MFC after: 1w	2011-02-18 09:40:13 +00:00
dim	fb307d7d1d	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
dim	fda4020a88	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
bz	d198c87916	Announce both IPsec and UDP Encap (NAT-T) if available for feature_present(3) checks. This will help to run-time detect and conditionally handle specific optionas of either feature in user space (i.e. in libipsec). Descriptions read by: rwatson MFC after: 2 weeks	2010-10-30 18:52:44 +00:00
thomas	92a9dc30a2	Fix typo in comment.	2010-10-25 16:11:37 +00:00
bz	de9392f9e0	Make the IPsec SADB embedded route cache a union to be able to hold both the legacy and IPv6 route destination address. Previously in case of IPv6, there was a memory overwrite due to not enough space for the IPv6 address. PR: kern/122565 MFC After: 2 weeks	2010-10-23 20:35:40 +00:00
bz	7ff23ffda7	Remove dead code: assignment to a local variable not used anywhere after that. MFC after: 3 days	2010-10-14 15:15:22 +00:00
bz	e31270089e	Style: make the asterisk go with the variable name, not the type. MFC after: 3 days	2010-10-14 14:49:49 +00:00
bz	36eb388782	MFp4 @178283: Improve IPsec flow distribution for better netisr parallelism. Instead of using the pointer that would have the last bits masked in a % statement in netisr_select_cpuid() to select the queue, use the SPI. Reviewed by: rwatson MFC after: 4 weeks	2010-05-24 16:27:47 +00:00
vanhu	b9358a210e	Set SA's natt_type before calling key_mature() in key_add(), as the SA may be used as soon as key_mature() has been done. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:58:58 +00:00
vanhu	33dc72ec8c	Update SA's NAT-T stuff before calling key_mature() in key_update(), as SA may be used as soon as key_mature() has been called. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:55:26 +00:00
bz	0a90ef1728	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
vanhu	1f00f9ada8	Locks SPTREE when setting some SP entries to state DEAD. This can prevent kernel panics when updating SPs while there is some traffic for them. Obtained from: NETASQ MFC after: 1m	2010-04-15 12:40:33 +00:00
eri	e8f045c8eb	Fix a logic error in ipsec code that extracts information from the packets. Reviewed by: bz, mlaier Approved by: mlaier(mentor) MFC after: 1 month	2010-04-02 18:15:23 +00:00
bz	cb7afff0b8	When tearing down IPsec as part of a (virtual) network stack, do not try to free the same list twice but free both the acquiring list and the security policy acquiring list. Reviewed by: anchie MFC after: 3 days	2010-03-28 06:51:50 +00:00
pjd	c08e2cb276	Correct typo in comment.	2010-02-18 22:34:29 +00:00
bz	68cb475233	Enable IPcomp by default. PR: kern/123587 MFC after: 5 days	2009-11-29 20:47:43 +00:00
bz	21f84f921a	Add more statistics variables for IPcomp. Try to version the struct in a backward compatible way. People asked for the versioning of the stats structs in general before. MFC after: 5 days	2009-11-29 20:37:30 +00:00
bz	a8256824b8	Assimilate very similar input and output code paths (no real functional change). MFC after: 5 days	2009-11-29 17:47:49 +00:00
bz	89a2398493	Only add the IPcomp header if crypto reported success and we have a lower payload size. Before we had always added the header, no matter if we actually send out compressed data or not. With this, after the opencrypto/deflate changes, IPcomp starts to work apart from edge cases. Leave it disabled by default until those are fixed as well. PR: kern/123587 MFC after: 5 days	2009-11-29 10:53:34 +00:00
bz	90cf239f86	Remove whitespace. MFC after: 6 days	2009-11-28 21:42:39 +00:00
bz	8bebf9e60f	Directly send data uncompressed if the packet payload size is lower than the compression algorithm threshold. MFC after: 6 days	2009-11-28 21:40:57 +00:00
bz	b1928221d2	Correct a typo. MFC after: 6 days	2009-11-28 21:01:26 +00:00
vanhu	7b642517df	fixed two race conditions when inserting/removing SAs via PFKey, which can both lead to a kernel panic when adding/removing quickly a lot of SAs. Obtained from: NETASQ MFC after: 2w (MFC on 8 before 8.0 release ???)	2009-11-17 16:00:41 +00:00
vanhu	4f56708582	Changed an IPSEC_ASSERT to a simple test, as such invalid packets may come from outside without being discarded before. Submitted by: aurelien.ansel@netasq.com Reviewed by: bz (secteam) Obtained from: NETASQ MFC after: 1m	2009-10-01 15:33:53 +00:00
vanhu	550a925d5c	When checking traffic endpoint's adresses families in key_spdadd(), compare them together instead of comparing each one with respective tunnel endpoint. PR: kern/138439 Submitted by: aurelien.ansel@netasq.com Obtained from: NETASQ MFC after: 1 m	2009-09-16 11:56:44 +00:00
pjd	dfd5ed3d44	Silent gcc? Yeah, you wish. What I ment was to silence gcc. Spotted by: julian	2009-09-06 19:05:03 +00:00
pjd	fe152cceaf	Initialize state_valid and arraysize variable so gcc won't complain. Reported by: bz	2009-09-06 18:09:25 +00:00
pjd	aa12b3e9c9	Improve code a bit by eliminating goto and having one unlock per lock.	2009-09-06 07:32:16 +00:00
pjd	78cb45a022	Correct typo in comment.	2009-09-06 07:30:21 +00:00
rwatson	ef8d755d4d	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
rwatson	fb9ffed650	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
rwatson	b3be1c6e3b	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
rwatson	d32e9e9901	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
rwatson	6955067932	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
rwatson	88f8de4d40	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
rwatson	57ca4583e7	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
rwatson	bd6eb7be79	Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks	2009-06-25 16:35:28 +00:00
rwatson	ea70a3542d	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
rwatson	9c4380a8ee	Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)	2009-06-24 21:00:25 +00:00
bz	55f6868044	Move setting of ports from NAT-T below key_getsah() and actually below key_setsaval(). Without that, the lookup for the SA had failed as we were looking for a SA with the new, updated port numbers instead of the old ones and were comparing the ports in key_cmpsaidx(). This makes updating the remote -> local SA on the initiator work again. Problem introduced with: p4 changeset 152114	2009-06-19 21:01:55 +00:00
bz	b017972f11	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
vanhu	16c1346b9a	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
bz	86d0c6ee3d	Properly hide IPv4 only variables and functions under #ifdef INET.	2009-06-10 19:25:46 +00:00
bz	b7ff2bdc20	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
zec	8b1f38241a	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
rwatson	2bab695560	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
vanhu	48cef84e5f	Lock SPTREE before parsing it in key_spddump() Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 2 weeks	2009-05-27 09:44:14 +00:00
vanhu	6e1cb07c00	Only decrease refcnt once when flushing SPD entries, to avoid flushing entries which are still used. Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 1 month	2009-05-27 09:31:50 +00:00
bz	9642ff6e28	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
zec	d78a1b1a82	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
zec	7a56f17240	Make indentation more uniform accross vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
zec	39b6dc8ba2	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
bms	0915b81c76	Stub out IN6_LOOKUP_MULTI() for GETSPI requests, for now. This has the effect that IPv6 multicast traffic won't trigger an SPI allocation when IPSEC is in use, however, this obviously needs to stomp on locks, and IN6_LOOKUP_MULTI() is about to go away. This definitely needs to be revisited before 8.x is branched as a release branch.	2009-04-29 11:15:58 +00:00
bz	a12cc82f1a	key_gettunnel() has been unsued with FAST_IPSEC (now IPSEC). KAME had explicit checks at one point using it, so just hide it behind #if 0 for now until we are sure if we can completely dump it or not. MFC after: 1 month	2009-04-27 21:04:16 +00:00
zec	b39b54e6de	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
zec	c85551e0bc	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
vanhu	7e0f7398ba	Fixed comments so it stays in 80 chars by line with hard tabs of 8 chars.... Approved by: gnn(mentor)	2009-03-23 16:20:39 +00:00
vanhu	21967caaf2	Spelling fix in a comment Approved by: gnn(mentor)	2009-03-20 09:12:01 +00:00
vanhu	cea6d30cdc	Fixed style for some comments Approved by: gnn(mentor)	2009-03-19 15:50:45 +00:00
vanhu	72aca0d947	Fixed style for some comments Approved by: gnn(mentor)	2009-03-19 15:44:13 +00:00
vanhu	e33d6fbff6	Fixed deletion of sav entries in key_delsah() Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 1 month	2009-03-18 14:01:41 +00:00
vanhu	a5f4a55744	SAs are valid (but dying) when they reached soft lifetime, even if they have never been used. Approved by: gnn(mentor) MFC after: 2 weeks	2009-03-05 16:22:32 +00:00
bz	4321e2a8f4	Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif. Single kernel compiles and universe will fail if the size of the struct changes. Th expected values are given in sys/vimage.h. See the comments where how to handle this. Requested by: peter	2009-03-01 11:01:00 +00:00
bz	df2be82cec	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
bz	710220924b	Shuffle the vimage.h includes or add where missing.	2009-02-27 13:22:26 +00:00
rdivacky	e5bfcba080	Change the functions to ANSI in those cases where it breaks promotion to int rule. See ISO C Standard: SS6.7.5.3:15. Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current	2009-02-24 18:09:31 +00:00

1 2 3 4 5 ...

402 Commits