freebsd-dev

Author	SHA1	Message	Date
Andrey V. Elsukov	574fde00be	Since PFIL can change mbuf pointer, we should update pointers after calling ipsec_filter(). Sponsored by: Yandex LLC	2015-04-28 09:29:28 +00:00
Andrey V. Elsukov	a9b9f6b6c6	Make ipsec_in_reject() static. We use ipsec[46]_in_reject() instead. Sponsored by: Yandex LLC	2015-04-27 01:12:51 +00:00
Andrey V. Elsukov	3d80e82d60	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deferred and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
Andrey V. Elsukov	962ac6c727	Change ipsec_address() and ipsec_logsastr() functions to take two additional arguments - buffer and size of this buffer. ipsec_address() is used to convert sockaddr structure to presentation format. The IPv6 part of this function returns pointer to the on-stack buffer and at the moment when it will be used by caller, it becomes invalid. IPv4 version uses 4 static buffers and returns pointer to new buffer each time when it called. But anyway it is still possible to get corrupted data when several threads will use this function. ipsec_logsastr() is used to format string about SA entry. It also uses static buffer and has the same problem with concurrent threads. To fix these problems add the buffer pointer and size of this buffer to arguments. Now each caller will pass buffer and its size to these functions. Also convert all places where these functions are used (except disabled code). And now ipsec_address() uses inet_ntop() function from libkern. PR: 185996 Differential Revision: https://reviews.freebsd.org/D2321 Reviewed by: gnn Sponsored by: Yandex LLC	2015-04-18 16:58:33 +00:00
Andrey V. Elsukov	1d3b268c04	Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6. ipsec6_common_input_cb() uses partial copy of ip6_input() to parse headers. But this isn't correct, when we use tunnel mode IPSec. When we stripped outer IPv6 header from the decrypted packet, it can become IPv4 packet and should be handled by ip_input. Also when we use tunnel mode IPSec with IPv6 traffic, we should pass decrypted packet with inner IPv6 header to ip6_input, it will correctly handle it and also can decide to forward it. The "skip" variable points to offset where payload starts. In tunnel mode we reset it to zero after stripping the outer header. So, when it is zero, we should requeue mbuf via netisr. Differential Revision: https://reviews.freebsd.org/D2306 Reviewed by: adrian, gnn Sponsored by: Yandex LLC	2015-04-18 16:51:24 +00:00
Andrey V. Elsukov	1ae800e7a6	Fix handling of scoped IPv6 addresses in IPSec code. * in ipsec_encap() embed scope zone ids into link-local addresses in the new IPv6 header, this helps ip6_output() disambiguate the scope; * teach key_ismyaddr6() use in6_localip(). in6_localip() is less strict than key_sockaddrcmp(). It doesn't compare all fields of struct sockaddr_in6, but it is faster and it should be safe, because all SA's data was checked for correctness. Also, since IPv6 link-local addresses in the &V_in6_ifaddrhead are stored in kernel-internal form, we need to embed scope zone id from SA into the address before calling in6_localip. * in ipsec_common_input() take scope zone id embedded in the address and use it to initialize sin6_scope_id, then use this sockaddr structure to lookup SA, because we keep addresses in the SADB without embedded scope zone id. Differential Revision: https://reviews.freebsd.org/D2304 Reviewed by: gnn Sponsored by: Yandex LLC	2015-04-18 16:46:31 +00:00
Andrey V. Elsukov	61f376155d	Remove xform_ipip.c and code related to XF_IP4. The only thing is used from this code is ipip_output() function, that does IPIP encapsulation. Other parts of XF_IP4 code were removed in r275133. Also it isn't possible to configure the use of XF_IP4, nor from userland via setkey(8), nor from the kernel. Simplify the ipip_output() function and rename it to ipsec_encap(). * move IP_DF handling from ipsec4_process_packet() into ipsec_encap(); * since ipsec_encap() called from ipsec[64]_process_packet(), it is safe to assume that mbuf is contiguous at least to IP header for used IP version. Remove all unneeded m_pullup(), m_copydata and related checks. * use V_ip_defttl and V_ip6_defhlim for outer headers; * use V_ip4_ipsec_ecn and V_ip6_ipsec_ecn for outer headers; * move all diagnostic messages to the ipsec_encap() callers; * simplify handling of ipsec_encap() results: if it returns non zero value, print diagnostic message and free mbuf. * some style(9) fixes. Differential Revision: https://reviews.freebsd.org/D2303 Reviewed by: glebius Sponsored by: Yandex LLC	2015-04-18 16:38:45 +00:00
Gleb Smirnoff	6d947416cc	o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes	2015-04-01 22:26:39 +00:00
Andrey V. Elsukov	ba76ce40b4	Remove extra '&'. sin6 is already a pointer. PR: 195011 MFC after: 1 week	2015-03-07 18:44:52 +00:00
Andrey V. Elsukov	47568136c5	Fix possible memory leak and several races in the IPsec policy management code. Resurrect the state field in the struct secpolicy, it has IPSEC_SPSTATE_ALIVE value when security policy linked in the chain, and IPSEC_SPSTATE_DEAD value in all other cases. This field protects from trying to unlink one security policy several times from the different threads. Take additional reference in the key_flush_spd() to be sure that policy won't be freed from the different thread while we are sending SPDEXPIRE message. Add KEY_FREESP() call to the key_unlink() to release additional reference that we take when use key_getsp*() functions. Differential Revision: https://reviews.freebsd.org/D1914 Tested by: Emeric POUPON <emeric.poupon at stormshield dot eu> Reviewed by: hrs Sponsored by: Yandex LLC	2015-02-24 10:35:07 +00:00
Andrey V. Elsukov	b489a49fc0	key_spdget uses key_setdumpsp() without SPTREE_RLOCK held (it uses referenced pointer to sp). Remove SPTREE_RLOCK_ASSERT from key_setdumpsp() to fix wrong assertion. Reported by: Emeric POUPON Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-01-27 17:46:55 +00:00
Robert Watson	2a8c860fe3	In order to reduce use of M_EXT outside of the mbuf allocator and socket-buffer implementations, introduce a return value for MCLGET() (and m_cljget() that underlies it) to allow the caller to avoid testing M_EXT itself. Update all callers to use the return value. With this change, very few network device drivers remain aware of M_EXT; the primary exceptions lie in mbuf-chain pretty printers for debugging, and in a few cases, custom mbuf and cluster allocation implementations. NB: This is a difficult-to-test change as it touches many drivers for which I don't have physical devices. Instead we've gone for intensive review, but further post-commit review would definitely be appreciated to spot errors where changes could not easily be made mechanically, but were largely mechanical in nature. Differential Revision: https://reviews.freebsd.org/D1440 Reviewed by: adrian, bz, gnn Sponsored by: EMC / Isilon Storage Division	2015-01-06 12:59:37 +00:00
Andrey V. Elsukov	fe07a9d08f	Fix VIMAGE build.	2014-12-25 13:38:51 +00:00
Andrey V. Elsukov	93201211e9	Rename ip4_def_policy variable to def_policy. It is used by both IPv4 and IPv6. Initialize it only once in def_policy_init(). Remove its initialization from key_init() and make it static. Remove several fields from struct secpolicy: * lock - it isn't so useful having mutex in the structure, but the only thing we do with it is initialization and destroying. * state - it has only two values - DEAD and ALIVE. Instead of take a lock and change the state to DEAD, then take lock again in GC function and delete policy from the chain - keep in the chain only ALIVE policies. * scangen - it was used in GC function to protect from sending several SADB_SPDEXPIRE messages for one SPD entry. Now we don't keep DEAD entries in the chain and there is no need to have scangen variable. Use TAILQ to implement SPD entries chain. Use rmlock to protect access to SPD entries chain. Protect all SP lookup with RLOCK, and use WLOCK when we are inserting (or removing) SP entry in the chain. Instead of using pattern "LOCK(); refcnt++; UNLOCK();", use refcount(9) API to implement refcounting in SPD. Merge code from key_delsp() and _key_delsp() into _key_freesp(). And use KEY_FREESP() macro in all cases when we want to release reference or just delete SP entry. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-24 18:34:56 +00:00
Andrey V. Elsukov	a91150da31	Treat errors when retrieving security policy as policy violation. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:46:11 +00:00
Andrey V. Elsukov	e65ada3e3c	Initialize error variable. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:40:56 +00:00
Andrey V. Elsukov	0275b2e369	Remove flag/flags argument from the following functions: ipsec_getpolicybyaddr() ipsec4_checkpolicy() ip_ipsec_output() ip6_ipsec_output() The only flag used here was IP_FORWARDING. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 18:35:34 +00:00
Andrey V. Elsukov	619764beab	Remove flags and tunalready arguments from ipsec4_process_packet() and make its prototype similar to ipsec6_process_packet. The flags argument isn't used here, tunalready is always zero. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:34:49 +00:00
Andrey V. Elsukov	f0514a8b8a	Remove now unused mtag argument from ipsec*_common_input_cb. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:14:49 +00:00
Andrey V. Elsukov	08537f4526	Remove code related to PACKET_TAG_IPSEC_IN_CRYPTO_DONE mbuf tag. It isn't used in FreeBSD. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:07:21 +00:00
Andrey V. Elsukov	566cbcc82a	Remove unused mtag variable. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2014-12-11 17:01:53 +00:00
Andrey V. Elsukov	4268212124	key_getspacq() returns holding the spacq_lock. Unlock it in all cases. MFC after: 1 week Sponsored by: Yandex LLC	2014-12-07 06:47:00 +00:00
Andrey V. Elsukov	3d6aff5615	Fix style(9) and remove m_freem(NULL). Add XXX comment, it looks incorrect, because m_pkthdr.len is already incremented by M_PREPEND(). Sponsored by: Yandex LLC	2014-12-04 05:02:12 +00:00
Andrey V. Elsukov	18961126cb	Remove __P() macro. Suggested by: kevlo Sponsored by: Yandex LLC	2014-12-03 04:08:41 +00:00
Andrey V. Elsukov	2e84e6eac9	ANSIfy function declarations. Sponsored by: Yandex LLC	2014-12-03 03:50:54 +00:00
Andrey V. Elsukov	bd766f3425	Remove unneeded check. No need to do m_pullup to the size that we prepended. Sponsored by: Yandex LLC	2014-12-02 05:28:40 +00:00
Andrey V. Elsukov	2d957916ef	Remove route caching support from ipsec code. It isn't used for some time. * remove sa_route_union declaration and route_cache member from struct secashead; * remove key_sa_routechange() call from ICMP and ICMPv6 code; * simplify ip_ipsec_mtu(); * remove #include <net/route.h>; Sponsored by: Yandex LLC	2014-12-02 04:20:50 +00:00
Andrey V. Elsukov	1fea1b0889	Remove unused structure declarations. Sponsored by: Yandex LLC	2014-12-02 02:41:44 +00:00
Andrey V. Elsukov	0e23cc372d	Remove unused declarations. Sponsored by: Yandex LLC	2014-12-02 02:32:28 +00:00
Andrey V. Elsukov	ffbf9cdeb6	Remove ip4_input() declaration. It was removed in r275133. MFC after: 1 month	2014-11-27 00:27:39 +00:00
Andrey V. Elsukov	b05765d75f	Do not use xform_ipip as decapsulation fallback. xform_ipip was used as fallback with low priority for IPIP encapsulated packets that were decrypted. In some cases it can decapsulate packets, that it shouldn't. This leads to situations, when wrong configurations are magically working. Also it can propagate wrong ingress interface and this can break security. Now we redesigned the IPSEC code and IPIP encapsulation is called directly from ipsec_output, and decapsulation is done in the ipsec_input with m_striphdr. Differential Revision: https://reviews.freebsd.org/D1220 MFC after: 1 month Sponsored by: Yandex LLC	2014-11-26 17:44:49 +00:00
Andrey V. Elsukov	f9d8f66552	Count statistics for the specific address family. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 12:58:33 +00:00
Andrey V. Elsukov	612faae7a2	Strip IP header only when we act in tunnel mode. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 10:48:59 +00:00
Andrey V. Elsukov	ab2164e0b5	Remove redundant ip6_plen initialization. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-13 10:47:24 +00:00
Andrey V. Elsukov	67fd172767	ipsec6_process_packet is called before ip6_output fixes ip6_plen. Update ip6_plen before bpf processing to be able see correct value. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 22:51:30 +00:00
Andrey V. Elsukov	f3c93842bf	Fix ips_out_nosa errors accounting. MFC after: 1 week Sponsored by: Yandex LLC	2014-11-12 14:00:49 +00:00
Andrey V. Elsukov	b6e1ad3a3a	Pass mbuf to pfil processing before stripping outer IP header as it is described in if_enc(4). MFC after: 2 week Sponsored by: Yandex LLC	2014-11-07 12:05:20 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Andrey V. Elsukov	1f194d8ae1	When mode isn't explicitly specified (wildcard) and inner protocol isn't IPv4 or IPv6, assume it is the transport mode. Reported by: jmg MFC after: 1 week Sponsored by: Yandex LLC	2014-11-06 20:23:57 +00:00
Andrey V. Elsukov	f5196a58a0	Use in_localip() instead of handmade implementation. MFC after: 1 week Sponsored by: Yandex LLC	2014-10-31 12:19:22 +00:00
John Baldwin	a4432e6bf7	Use a static callout to drive key_timehandler() instead of timeout(). While here, make key_timehandler() private to key.c. Submitted by: bz (2) Tested by: bz	2014-10-23 20:43:16 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Andrey V. Elsukov	a28b277a9f	Do not strip outer header when operating in transport mode. Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP encapsulation, it will be handled with tunneling interface. And thus proper interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling interface will see packets in both directions. Sponsored by: Yandex LLC	2014-10-02 02:00:21 +00:00
Gleb Smirnoff	6ff8af1ca5	Mechanically convert to if_inc_counter().	2014-09-19 10:18:14 +00:00
Kevin Lo	73d76e77b6	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
Kevin Lo	8f5a8818f5	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
Gleb Smirnoff	fcc34a238c	Fix style bug: rename the refcount field of m_ext to ext_cnt, to match other members. Sponsored by: Nginx, Inc.	2014-07-11 14:34:29 +00:00
Marko Zec	b01e3d0802	The assumption in ipsec4_process_packet() that the payload may be only IPv4 is wrong, so check the IP version before mangling the payload header.	2014-07-01 08:02:25 +00:00
Bjoern A. Zeeb	6d120f909c	Use IPv4 statistics in ipsec4_process_packet() rather than the IPv6 version. This also unbreaks the NOINET6 builds after r266800.	2014-05-28 23:01:20 +00:00
VANHULLEBUS Yvan	aaf2cfc0d6	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
Bjoern A. Zeeb	799653be1c	Only do a ports check if this is a NAT-T SA. Otherwise other lookups providing ports may get unexpected results. MFC After: 2 weeks	2014-05-24 09:29:23 +00:00
Andrey V. Elsukov	cd92c2e1ec	Remove _IP_VHL* macros and related ifdefs. MFC after: 1 week	2014-04-16 05:31:54 +00:00
Andrey V. Elsukov	e064642c9a	The check for local address spoofing lacks ifaddr locking. Remove these loops and use in_localip() and in6_localip() functions instead. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 16:58:32 +00:00
Andrey V. Elsukov	b48f1835a7	Remove unused variable. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 15:57:27 +00:00
Andrey V. Elsukov	c91a8e0ebe	Remove dead code. MFC after: 1 week Sponsored by: Yandex LLC	2014-04-04 15:55:38 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Andrey V. Elsukov	00a689c438	Initialize prot variable. PR: 177417 MFC after: 1 week	2013-11-11 13:19:55 +00:00
Gleb Smirnoff	eedc7fd9e8	Provide includes that are needed in these files, and before were read in implicitly via if.h -> if_var.h pollution. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:18:50 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Andrey V. Elsukov	6794f46021	Remove the large part of struct ipsecstat. Only few fields of this structure is used, but they already have equal fields in the struct newipsecstat, that was introduced with FAST_IPSEC and then was merged together with old ipsecstat structure. This fixes kernel stack overflow on some architectures after migration ipsecstat to PCPU counters. Reported by: Taku YAMAMOTO, Maciej Milewski	2013-07-23 14:14:24 +00:00
Andrey V. Elsukov	db8c087944	Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat, ipsec4stat, ipsec6stat to PCPU counters.	2013-07-09 10:08:13 +00:00
Andrey V. Elsukov	c80211e3cf	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
Andrey V. Elsukov	a04d64d875	Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP, PFKEY. MFC after: 2 weeks	2013-06-20 11:44:16 +00:00
Andrey V. Elsukov	6659296cb0	Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. MFC after: 2 weeks	2013-06-20 09:55:53 +00:00
Andrey V. Elsukov	9cb8d207af	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
Gleb Smirnoff	dcba52a5f1	Use m_get2() + m_align() instead of hand made key_alloc_mbuf(). Code examination shows, that although key_alloc_mbuf() could return chains, the callers never use chains, so m_get2() should suffice. Sponsored by: Nginx, Inc.	2013-03-15 10:20:15 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Gleb Smirnoff	8ad458a471	Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this mean that now protocol needs to do subtraction itself, and for another half this means that we do not need to add header length back to the packet. Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.	2012-10-23 08:33:13 +00:00
Gleb Smirnoff	d2bffb140e	- Fix one more miss from r241913. - Add XXX comment about necessity of the entire block, that "fixes up" the IP header.	2012-10-23 08:22:01 +00:00
Gleb Smirnoff	20472bce45	Couple of changes missed from r241913, which converted IPv4 stack to network byte order.	2012-10-22 22:42:28 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
Kevin Lo	9f614af4cf	Add missing break	2012-09-18 08:00:43 +00:00
VANHULLEBUS Yvan	d1b835208a	In NAT-T transport mode, allow a client to open a new connection just after closing another. It worked only in tunnel mode before. Submitted by: Andreas Longwitz <longwitz@incore.de> MFC after: 1M	2012-09-12 12:14:50 +00:00
Gleb Smirnoff	d6d3f01e0a	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
John Baldwin	2541fcd953	Unexpand a couple of TAILQ_FOREACH()s.	2012-08-17 16:01:24 +00:00
Bjoern A. Zeeb	174b0d419b	Fix a bug introduced in r221129 that leads to a panic when using bundled SAs. For now allow same address family bundles. While discovered with ESP and AH, which does not make a lot of sense, IPcomp could be a possible problematic candidate. PR: kern/164400 MFC after: 3 days	2012-07-22 17:46:05 +00:00
Bjoern A. Zeeb	81d5d46b3c	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.	2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb	83e521ec73	Clean up some #endif comments removing from short sections. Add #endif comments to longer, also refining strange ones. Properly use #ifdef rather than #if defined() where possible. Four #if defined(PCBGROUP) occurrences (netinet and netinet6) were ignored to avoid conflicts with eventually upcoming changes for RSS. Reported by: bde (most) Reviewed by: bde MFC after: 3 days	2012-01-22 02:13:19 +00:00
Pawel Jakub Dawidek	d3e8e66d75	Remove unused 'plen' variable.	2011-11-26 23:57:03 +00:00
Pawel Jakub Dawidek	cdb7ebe38c	The esp_max_ivlen global variable is not needed, we can just use EALG_MAX_BLOCK_LEN.	2011-11-26 23:27:41 +00:00
Pawel Jakub Dawidek	5be4c9b9e6	malloc(M_WAITOK) never fails, so there is no need to check for NULL.	2011-11-26 23:18:19 +00:00
Pawel Jakub Dawidek	0e4fb1db44	Eliminate 'err' variable and just use existing 'error'.	2011-11-26 23:15:28 +00:00
Pawel Jakub Dawidek	0a95a08ecb	Simplify code a bit.	2011-11-26 23:13:30 +00:00
Pawel Jakub Dawidek	b6a4c9acdb	There is no need to virtualize esp_max_ivlen.	2011-11-26 23:11:41 +00:00
Christian Brueffer	4795003bd2	Add missing va_end() in an error case to clean up after va_start() (already done in the non-error case). CID: 4726 Found with: Coverity Prevent(tm) MFC after: 1 week	2011-10-07 21:00:26 +00:00
Bjoern A. Zeeb	e0bfbfce79	Update packet filter (pf) code to OpenBSD 4.5. You need to update userland (world and ports) tools to be in sync with the kernel. Submitted by: mlaier Submitted by: eri	2011-06-28 11:57:25 +00:00
VANHULLEBUS Yvan	568fac6f2e	Release SP's refcount in key_get_spdbyid(). PR: 156676 Submitted by: Tobias Brunner (tobias@strongswan.org) MFC after: 1 week	2011-05-09 13:16:21 +00:00
Bjoern A. Zeeb	db178eb816	Make IPsec compile without INET adding appropriate #ifdef checks. Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c to not need three different versions depending on INET, INET6 or both. Mark two places preparing for not yet supported functionality with IPv6. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:28:42 +00:00
Bjoern A. Zeeb	dc49da9761	Do not allow recursive RFC3173 IPComp payload. Reviewed by: Tavis Ormandy (taviso cmpxchg8b.com) MFC after: 5 days Security: CVE-2011-1547	2011-04-01 14:13:49 +00:00
Fabien Thomas	73171433c5	Optimisation in IPSEC(4): - Remove contention on ISR during the crypto operation by using rwlock(9). - Remove a second lookup of the SA in the callback. Gain on 6 cores CPU with SHA1/AES128 can be up to 30%. Reviewed by: vanhu MFC after: 1 month	2011-03-31 15:23:32 +00:00
Fabien Thomas	11d2f4df50	Fix two SA refcount: - AH does not release the SA like in ESP/IPCOMP when handling EAGAIN - ipsec_process_done incorrectly release the SA. Reviewed by: vanhu MFC after: 1 week	2011-03-31 13:14:24 +00:00
VANHULLEBUS Yvan	442da28aeb	Fixed IPsec's HMAC_SHA256-512 support to be RFC4868 compliant. This will break interoperability with all older versions of FreeBSD for those algorithms. Reviewed by: bz, gnn Obtained from: NETASQ MFC after: 1w	2011-02-18 09:40:13 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Bjoern A. Zeeb	13a6cf24ac	Announce both IPsec and UDP Encap (NAT-T) if available for feature_present(3) checks. This will help to run-time detect and conditionally handle specific options of either feature in user space (i.e. in libipsec). Descriptions read by: rwatson MFC after: 2 weeks	2010-10-30 18:52:44 +00:00
Thomas Quinot	94294cada5	Fix typo in comment.	2010-10-25 16:11:37 +00:00
Bjoern A. Zeeb	4a85b5e2ea	Make the IPsec SADB embedded route cache a union to be able to hold both the legacy and IPv6 route destination address. Previously in case of IPv6, there was a memory overwrite due to not enough space for the IPv6 address. PR: kern/122565 MFC After: 2 weeks	2010-10-23 20:35:40 +00:00
Bjoern A. Zeeb	acf456a04a	Remove dead code: assignment to a local variable not used anywhere after that. MFC after: 3 days	2010-10-14 15:15:22 +00:00
Bjoern A. Zeeb	e046b77ee1	Style: make the asterisk go with the variable name, not the type. MFC after: 3 days	2010-10-14 14:49:49 +00:00
Bjoern A. Zeeb	3abaa08643	MFp4 @178283: Improve IPsec flow distribution for better netisr parallelism. Instead of using the pointer that would have the last bits masked in a % statement in netisr_select_cpuid() to select the queue, use the SPI. Reviewed by: rwatson MFC after: 4 weeks	2010-05-24 16:27:47 +00:00
VANHULLEBUS Yvan	2e8d55c4e8	Set SA's natt_type before calling key_mature() in key_add(), as the SA may be used as soon as key_mature() has been done. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:58:58 +00:00
VANHULLEBUS Yvan	2d2a2083f7	Update SA's NAT-T stuff before calling key_mature() in key_update(), as SA may be used as soon as key_mature() has been called. Obtained from: NETASQ MFC after: 1 week	2010-05-05 08:55:26 +00:00
Bjoern A. Zeeb	82cea7e6f3	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitespace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
VANHULLEBUS Yvan	61f73308d4	Locks SPTREE when setting some SP entries to state DEAD. This can prevent kernel panics when updating SPs while there is some traffic for them. Obtained from: NETASQ MFC after: 1m	2010-04-15 12:40:33 +00:00
Ermal Luçi	87a25418ac	Fix a logic error in ipsec code that extracts information from the packets. Reviewed by: bz, mlaier Approved by: mlaier(mentor) MFC after: 1 month	2010-04-02 18:15:23 +00:00
Bjoern A. Zeeb	8b7893b056	When tearing down IPsec as part of a (virtual) network stack, do not try to free the same list twice but free both the acquiring list and the security policy acquiring list. Reviewed by: anchie MFC after: 3 days	2010-03-28 06:51:50 +00:00
Pawel Jakub Dawidek	d0d6567d4a	Correct typo in comment.	2010-02-18 22:34:29 +00:00
Bjoern A. Zeeb	a77cb332ee	Enable IPcomp by default. PR: kern/123587 MFC after: 5 days	2009-11-29 20:47:43 +00:00
Bjoern A. Zeeb	90b4c081d0	Add more statistics variables for IPcomp. Try to version the struct in a backward compatible way. People asked for the versioning of the stats structs in general before. MFC after: 5 days	2009-11-29 20:37:30 +00:00
Bjoern A. Zeeb	10229cd109	Assimilate very similar input and output code paths (no real functional change). MFC after: 5 days	2009-11-29 17:47:49 +00:00
Bjoern A. Zeeb	afa47e51aa	Only add the IPcomp header if crypto reported success and we have a lower payload size. Before we had always added the header, no matter if we actually send out compressed data or not. With this, after the opencrypto/deflate changes, IPcomp starts to work apart from edge cases. Leave it disabled by default until those are fixed as well. PR: kern/123587 MFC after: 5 days	2009-11-29 10:53:34 +00:00
Bjoern A. Zeeb	3d34d241be	Remove whitespace. MFC after: 6 days	2009-11-28 21:42:39 +00:00
Bjoern A. Zeeb	4ff9852103	Directly send data uncompressed if the packet payload size is lower than the compression algorithm threshold. MFC after: 6 days	2009-11-28 21:40:57 +00:00
Bjoern A. Zeeb	023795f033	Correct a typo. MFC after: 6 days	2009-11-28 21:01:26 +00:00
VANHULLEBUS Yvan	3e6265f14d	fixed two race conditions when inserting/removing SAs via PFKey, which can both lead to a kernel panic when adding/removing quickly a lot of SAs. Obtained from: NETASQ MFC after: 2w (MFC on 8 before 8.0 release ???)	2009-11-17 16:00:41 +00:00
VANHULLEBUS Yvan	a45bff047c	Changed an IPSEC_ASSERT to a simple test, as such invalid packets may come from outside without being discarded before. Submitted by: aurelien.ansel@netasq.com Reviewed by: bz (secteam) Obtained from: NETASQ MFC after: 1m	2009-10-01 15:33:53 +00:00
VANHULLEBUS Yvan	22c125a1b6	When checking traffic endpoint's address families in key_spdadd(), compare them together instead of comparing each one with respective tunnel endpoint. PR: kern/138439 Submitted by: aurelien.ansel@netasq.com Obtained from: NETASQ MFC after: 1 m	2009-09-16 11:56:44 +00:00
Pawel Jakub Dawidek	fc79063e66	Silent gcc? Yeah, you wish. What I meant was to silence gcc. Spotted by: julian	2009-09-06 19:05:03 +00:00
Pawel Jakub Dawidek	3b02c4a3d3	Initialize state_valid and arraysize variable so gcc won't complain. Reported by: bz	2009-09-06 18:09:25 +00:00
Pawel Jakub Dawidek	950ab2f81e	Improve code a bit by eliminating goto and having one unlock per lock.	2009-09-06 07:32:16 +00:00
Pawel Jakub Dawidek	cee0fa809b	Correct typo in comment.	2009-09-06 07:30:21 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Robert Watson	d1da0a0672	Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks	2009-06-25 16:35:28 +00:00
Robert Watson	2d9cfabad4	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
Robert Watson	80af0152f3	Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)	2009-06-24 21:00:25 +00:00
Bjoern A. Zeeb	57700c9e4d	Move setting of ports from NAT-T below key_getsah() and actually below key_setsaval(). Without that, the lookup for the SA had failed as we were looking for a SA with the new, updated port numbers instead of the old ones and were comparing the ports in key_cmpsaidx(). This makes updating the remote -> local SA on the initiator work again. Problem introduced with: p4 changeset 152114	2009-06-19 21:01:55 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
VANHULLEBUS Yvan	7b495c4494	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
Bjoern A. Zeeb	fc228fbf49	Properly hide IPv4 only variables and functions under #ifdef INET.	2009-06-10 19:25:46 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementally fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. 
Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
VANHULLEBUS Yvan	aa1faa5fc6	Lock SPTREE before parsing it in key_spddump() Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 2 weeks	2009-05-27 09:44:14 +00:00
VANHULLEBUS Yvan	cff5821a61	Only decrease refcnt once when flushing SPD entries, to avoid flushing entries which are still used. Approved by: gnn(mentor) Obtained from: NETASQ MFC after: 1 month	2009-05-27 09:31:50 +00:00
Bjoern A. Zeeb	db2e47925e	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform across vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Marko Zec	f6dfe47a14	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. 
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Bruce M Simpson	d9cc2ca298	Stub out IN6_LOOKUP_MULTI() for GETSPI requests, for now. This has the effect that IPv6 multicast traffic won't trigger an SPI allocation when IPSEC is in use, however, this obviously needs to stomp on locks, and IN6_LOOKUP_MULTI() is about to go away. This definitely needs to be revisited before 8.x is branched as a release branch.	2009-04-29 11:15:58 +00:00
Bjoern A. Zeeb	f4ad3139bb	key_gettunnel() has been unused with FAST_IPSEC (now IPSEC). KAME had explicit checks at one point using it, so just hide it behind #if 0 for now until we are sure if we can completely dump it or not. MFC after: 1 month	2009-04-27 21:04:16 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. 
Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00

412 Commits