freebsd-dev

Author	SHA1	Message	Date
Hiroki Sato	dead19563c	Allow to configure net.inet6.ip6.{accept_rtadv,no_radr} by the loader tunables as well because they have to be configured before interface initialization for AF_INET6.	2012-03-02 07:23:28 +00:00
Hiroki Sato	a1875676ca	Remove a redundant check.	2012-03-02 07:22:04 +00:00
Bjoern A. Zeeb	5aa7e8edc5	In selectroute() add a missing fibnum argument to an in6_rtalloc() call in an #if 0 section. In in6_selecthlim() optimize a case where in6p cannot be NULL due to an earlier check. More consistently use u_int instead of int for fibnum function arguments. Sponsored by: Cisco Systems, Inc. MFC after: 3 days	2012-02-24 20:06:04 +00:00
Kip Macy	a93cda789a	When using flowtable, llentrys can outlive the interface with which they're associated, at which point the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid. Move the free pointer into the llentry itself and update the initialization sites. MFC after: 2 weeks	2012-02-23 18:21:37 +00:00
Michael Tuexen	310a027788	Remove two clang warnings. MFC after: 1 month.	2012-02-18 16:06:15 +00:00
Bjoern A. Zeeb	1b46c7f832	Allow to provide a hint to in6_selectsrc() for the interface using the return ifnet double pointer. Pass that hint down to in6_selectif() to be used when i) the default FIB is queried and ii) route lookup fails because the network is not present (i.e. someone deleted the connected subnet). This hint should not be generally used from anywhere outside the neighbor discovery code. We just make use of it from nd6_ns_output(). Extend the nd6_na_output() interface by a nd6_na_output_fib() version and pass the FIB number from the NS mbuf on to NA to allow the new mbuf to inherit the FIB tag and a later lookup from ip6_output() to succeed in the aforementioned example case. Provide a wrapper function for the old public interface also used from CARP but mark it with BURN_BRIDGES to clean up in HEAD after MFC. Sponsored by: Cisco Systems, Inc.	2012-02-14 11:51:32 +00:00
Bjoern A. Zeeb	81d5d46b3c	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.	2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb	ee799639e8	Add SO_SETFIB option support on PF_INET6 sockets and allow inheriting the FIB number from the process, as set by setfib(2), on socket creation. Sponsored by: Cisco Systems, Inc.	2012-02-03 11:00:53 +00:00
Bjoern A. Zeeb	db566a23b6	Provide the IPv6 counterpart to the extended IPv4 rtalloc(9) KPI. Sponsored by: Cisco Systems, Inc.	2012-02-03 09:33:58 +00:00
Bjoern A. Zeeb	5490110cf5	In preparation for multi-FIB IPv6 support, factor the code for joining and leaving multicast groups out from in6_update_ifa() and in6_purgeaddr(). Sponsored by: Cisco Systems, Inc.	2012-02-03 08:50:19 +00:00
Gleb Smirnoff	9c2ae3b1c8	Remove casts from inet6 address testing macros, thus preserving qualifier from original argument. Obtained from: NetBSD, r. 1.67 Submitted by: maxim	2012-01-26 12:04:19 +00:00
Sergey Kandaurov	8e4609a4a3	Remove unused variable. The actual ia6->ia6_lifetime access is hidden in IFA6_IS_INVALID/IFA6_IS_DEPRECATED macros since a long time ago (see netinet6/nd6.c, r1.104 of KAME for the reference). MFC after: 3 days	2012-01-25 08:53:42 +00:00
Bjoern A. Zeeb	4aa7588c8f	Plug a possible ifa_ref leak in case of premature return from in6_purgeaddr(). Reviewed by: rwatson MFC after: 3 days	2012-01-24 13:57:30 +00:00
Sergey Kandaurov	373de5d88b	Remove the stale XXX rt_newaddrmsg comment. A routing socket message is generated since r192282. Reviewed by: bz MFC after: 3 days	2012-01-24 09:51:42 +00:00
Bjoern A. Zeeb	e0f1891c48	Remove unnecessary line break. MFC after: 3 days	2012-01-24 06:21:38 +00:00
Bjoern A. Zeeb	83e521ec73	Clean up some #endif comments, removing them from short sections. Add #endif comments to longer sections, also refining strange ones. Properly use #ifdef rather than #if defined() where possible. Four #if defined(PCBGROUP) occurrences (netinet and netinet6) were ignored to avoid conflicts with eventually upcoming changes for RSS. Reported by: bde (most) Reviewed by: bde MFC after: 3 days	2012-01-22 02:13:19 +00:00
Michael Tuexen	f7f2907a7e	Small cleanup, no functional change.	2012-01-15 14:03:05 +00:00
Michael Tuexen	c58e60be43	Add an SCTP sysctl "blackhole", similar to the one for TCP. If set to 1, no ABORT is sent back in response to an incoming INIT. If set to 2, no ABORT is sent back in response to an out of the blue packet. If set to 0 (the default), ABORTs are sent. Discussed with rrs@. MFC after: 1 month.	2012-01-08 09:56:24 +00:00
John Baldwin	137f91e80f	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
Bjoern A. Zeeb	9d9b92f299	Mark a couple of file local functions static and stop exporting them. MFC after: 1 week	2012-01-05 01:14:35 +00:00
Bjoern A. Zeeb	f67e13d66d	Convert an #ifdef DIAGNOSTIC if/panic to a KASSERT. MFC after: 1 week	2012-01-05 01:13:25 +00:00
John Baldwin	19b0c9b246	Use the mli_relinmhead list normally used to defer calls to in6m_release_locked() to defer calls to mld_v1_transmit_report() until after the IF_ADDR_LOCK is dropped. This removes a race where the lock is dropped and reacquired while attempting to walk an interface's address list. Reviewed by: bz MFC after: 1 week	2012-01-04 13:35:20 +00:00
Gleb Smirnoff	1331bbc33f	Use correct locking when traversing interface address list. Reviewed by: bz	2012-01-04 07:01:23 +00:00
John Baldwin	c6f4ea8062	When cancelling multicast timers on an interface, don't release the reference on a group in the leaving state while iterating over the loop. Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach() of placing the groups to free on pending release list and then releasing the references after dropping the IF_ADDR_LOCK. This closes an ugly race where the code was dropping the lock in the middle of iterating over the list. It also fixes some additional potential use-after-free bugs since the cancellation routine also applied other changes to the group after dropping the reference. Now those changes are performed before the reference is dropped and the group is potentially freed. Prodded to fix by: glebius Reviewed by: bz MFC after: 1 week	2012-01-03 20:34:52 +00:00
John Baldwin	9f745f61b8	Grab a reference on the matching interface address (ifa) in the handling of the SIOC[DG]LIFADDR ioctls before dropping the IF_ADDR_LOCK() and release the reference after using it. This prevents the address from being potentially freed out from under the ioctl handler. Reviewed by: bz MFC after: 1 week	2012-01-03 19:44:36 +00:00
John Baldwin	f5b50e25ec	Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that do not modify the queues they iterate over. Submitted by: glebius	2012-01-03 16:22:29 +00:00
Bjoern A. Zeeb	ad05fc1d2d	Remove an unneeded inpcb forward declaration and align the function declaration following to match the style in the rest of the file. MFC after: 3 days	2012-01-02 13:03:13 +00:00
Bjoern A. Zeeb	1a69707f40	Remove a declaration to a non-existent function. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2011-12-31 16:19:22 +00:00
John Baldwin	3b0b2840be	Use queue(3) macros instead of home-rolled versions in several places in the INET6 code. This includes retiring the 'ndpr_next' and 'pfr_next' macros. Submitted by: pluknet (earlier version) Reviewed by: pluknet	2011-12-29 18:25:18 +00:00
Michael Tuexen	60990c0c06	Address issues found by clang. While there, fix also some style issues. MFC after: 3 months.	2011-12-27 10:16:24 +00:00
John Baldwin	94e8313349	Fix a bug where TAILQ_FIRST(&V_ifnet) was accessed without holding the proper lock. Reviewed by: bz MFC after: 1 week	2011-12-24 18:11:54 +00:00
Gleb Smirnoff	7121247312	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
Maxim Konovalov	d96ea877a7	o Convert IPv6 read-only stats sysctls to the read-write ones. o Teach netstat(1) -z to reset these stats sysctls. PR: bin/153206 Reviewed by: glebuis Sponsored by: NGINX, Inc. MFC after: 1 month	2011-12-19 05:50:34 +00:00
Michael Tuexen	7215cc1b74	Fix unused parameter warnings. While there, fix some whitespace issues. MFC after: 3 months.	2011-12-17 19:21:40 +00:00
Gleb Smirnoff	08b68b0e4c	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that the interface layer has been removed from CARP. Now redundant addresses are configured exactly on the interfaces they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics have been changed too: now one doesn't need to clone a carpXX interface, he/she should directly configure a vhid on an Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows running a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with the inet6 part of the patch, for the idea of using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
Gleb Smirnoff	6d18ea8ff9	Fix double free. PR: kern/163089 Submitted by: Herbie Robinson <Herbie.Robinson stratus.com>	2011-12-07 13:37:42 +00:00
Bjoern A. Zeeb	08907004c6	Return the correct value for the IPV6_MULTICAST_HOPS getsockopt() call. Submitted by: rpaulo MFC after: 3 days	2011-11-13 02:32:10 +00:00
Qing Li	0f1aca6519	A default route learned from the RAs could be deleted manually after its installation. This removal may be accidental and can prevent the default route from being installed in the future if the associated default router has the best preference. The cause is the lack of status update in the default router on the state of its route installation in the kernel FIB. This patch fixes the described problem. Reviewed by: hrs, discussed with hrs MFC after: 5 days	2011-11-11 23:22:38 +00:00
Mikolaj Golub	040ee1ec95	Fix false positive EADDRINUSE that could be returned by bind, due to the typo made in r227207. Reported by: kib Tested by: kib	2011-11-11 14:09:09 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Gleb Smirnoff	37c1ff48a9	In icmp6_redirect_input: - Assert that we got a valid mbuf with rcvif pointer. [1] - Use __func__ in logging. Submitted by: prabhakar lakhera <prabhakar.lakhera gmail.com> [1] Submitted by: Kristof Provost <kristof sigsegv.be> [1]	2011-11-07 14:22:18 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Mikolaj Golub	fc06cd427e	Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid inp_socket->so_options dereference when we may not acquire the lock on the inpcb. This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked. Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>	2011-11-06 10:47:20 +00:00
Mikolaj Golub	29381b363b	Before dereferencing intotw() check for NULL, the same way as it is done for in_pcb (see r157474). MFC after: 1 week	2011-11-06 09:29:52 +00:00
Sergey Kandaurov	6ba404cab8	Remove a couple of write-only variables.	2011-11-03 09:09:05 +00:00
Qing Li	14417253d8	The code change made in r226040 was incomplete and resulted in routes such as fe80::1%lo0 not being installed. This patch completes the original intended fix. Reviewed by: hrs, bz MFC after: 3 days	2011-10-16 22:24:04 +00:00
Qing Li	e74eb12c68	The IPv6 code was in flux at the time of r196865 due to the L2/L3 separation rewrite changes. r196865 was committed to fix a scope violation problem in the following test scenario: box-1# ifconfig em0 inet6 2001:db8:1:: prefixlen 64 anycast box-1# ifconfig em1 inet6 2001:db8:2::1 prefixlen 64 box-2# ifconfig re0 inet6 2001:db8:1::6 prefixlen 64 em0 and re0 are on the same link. box-2# ping6 2001:db8:1:: PING6(56=40+8+8 bytes) 2001:db8:1::6 --> 2001:db8:1:: the ICMPv6 response should have a source address of em1, which is 2001:db8:2::1, not the link-local address of em0. That code is no longer necessary and breaks the IPv6-Ready logo testing, so revert it now. Reviewed by: hrs MFC after: 3 days	2011-10-16 22:15:13 +00:00
Hiroki Sato	154d5f7321	Fix a problem where an interface unexpectedly becomes IFF_UP by just doing "ifconfig inet6 -ifdisabled" when the interface has the ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.	2011-10-16 19:46:52 +00:00
Gleb Smirnoff	d5378bb633	Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation.	2011-10-13 13:33:23 +00:00
Gleb Smirnoff	b590b6ae80	Restore functions in6_ifaddloop() and in6_ifremloop() that were inlined by Qing Li in his big new-ARP commit. I am going to utilize them in my newcarp work, and these functions were also left declared in in6_var.h for all the time they were absent. Reviewed by: bz	2011-10-13 13:05:36 +00:00
Qing Li	6c6aa80c9d	The IFA_RTSELF flag, instead of the IFA_ROUTE flag, should be checked to determine if a loopback route should be installed for an interface IPv6 address. Another condition is that the address must not belong to a loopback interface. Reviewed by: hrs MFC after: 3 days	2011-10-05 16:27:11 +00:00
Bjoern A. Zeeb	d7ae37140a	Fix an obvious bug from r186196 shadowing a variable, not correctly appending the new mbuf to the chain reference but possibly causing an mbuf nextpkt loop leading to a memory used after handoff (or having been freed) and leaking an mbuf here. Reviewed by: rwatson, brooks MFC after: 3 days	2011-09-30 18:20:16 +00:00
Kip Macy	1eeb6d97d0	Make KBI changes required for future MFCing of inpcb rtentry / llentry caching. Reviewed by: rwatson, bz Approved by: re (kib)	2011-09-20 20:27:26 +00:00
Hiroki Sato	6090ab8bd6	Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes inconsistency when options are specified by both setsockopt() and ancillary data types. PR: kern/158307 Approved by: re (bz)	2011-09-20 00:29:17 +00:00
Hiroki Sato	049087a0f3	Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE (r225485). When setting an interface name to it, the following configurations will be enabled: 1. "no_radr" is set to all IPv6 interfaces automatically. 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif. This is done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this means you can manually supersede this configuration if necessary). 3. The node will add RA-sending routers to the default router list even if net.inet6.ip6.forwarding=1. This mode is added to conform to RFC 6204 (a router which connects the end-user network to a service provider network). To enable packet forwarding, you still need to set ipv6_gateway_enable=YES. Note that accepting router entries into the default router list when packet forwarding capability and a routing daemon are enabled can result in messing up the routing table. To minimize such unexpected behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif. Approved by: re (bz)	2011-09-13 00:06:11 +00:00
Sergey Kandaurov	0ad2addc9d	Fix if_addr_mtx recursion in mld6. mld_set_version() is called only from mld_v1_input_query() and mld_v2_input_query() both holding the if_addr_mtx lock, and then calling into mld_v2_cancel_link_timers() acquires it the second time, which results in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after mld_set_version() is called; while here, further reduce locking scope to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked(). PR: kern/158426 Reported by: Thomas <tps vr-web.de>, Tom Vijlbrief <tom.vijlbrief xs4all.nl> Tested by: Tom Vijlbrief Reviewed by: bz Approved by: re (kib)	2011-08-22 23:39:40 +00:00
Bjoern A. Zeeb	8a006adb24	Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regression test utilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)	2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb	90bc35de38	Add an in6_localip() helper function as in6_localaddr() is not doing what people think: returning true for an address in any connected subnet, not necessarily on the local machine. Sponsored by: Sandvine Incorporated MFC after: 2 weeks Approved by: re (kib)	2011-08-20 16:43:47 +00:00
Michael Tuexen	ca85e9482a	The result of a joint work between rrs@ and myself at the IETF: * Decouple the path supervision using a separate HB timer per path. * Add support for potentially failed state. * Bring back RTO.min to 1 second. * Accept packets on IP-addresses already announced via an ASCONF * While there: do some cleanups. Approved by: re@ MFC after: 2 months.	2011-08-03 20:21:00 +00:00
Michael Tuexen	78d9a31d3a	The socket API only specifies SCTP for SOCK_SEQPACKET and SOCK_STREAM, but not SOCK_DGRAM. So don't register it for SOCK_DGRAM. While there, fix some indentation.	2011-07-12 19:29:29 +00:00
Marko Zec	13e255fab7	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
Bjoern A. Zeeb	e0bfbfce79	Update packet filter (pf) code to OpenBSD 4.5. You need to update userland (world and ports) tools to be in sync with the kernel. Submitted by: mlaier Submitted by: eri	2011-06-28 11:57:25 +00:00
Bjoern A. Zeeb	869052041d	Add the missing call to ip6_ipsec_filtertunnel() to be able to control whether decapsulated IPsec packets will be passed to pfil again depending on the setting of the net.inet6.ipsec6.filtertunnel sysctl. PR: kern/157670 Submitted by: Manuel Kasper (mk neon1.net) MFC after: 2 weeks	2011-06-08 10:59:36 +00:00
Bjoern A. Zeeb	ffe8cd7b10	Correct comments and debug logging in ipsec to better match reality. MFC after: 3 days	2011-06-08 03:02:11 +00:00
Robert Watson	52cd27cb58	Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table. Connections are assigned to connection groups based on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow). Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state. Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture. Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. 
In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz). Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default. Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-06 12:55:02 +00:00
Hiroki Sato	23be782526	Do not activate automatic LL addr configuration when 0/1->1 transition of ND6_IFF_IFDISABLED flag.	2011-06-06 04:12:57 +00:00
Hiroki Sato	77bc49858c	- Make the code more proactively clear an ND6_IFF_IFDISABLED flag when an explicit action for INET6 configuration happens. The changes are: 1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl, setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers an attempt to clear the ND6_IFF_IFDISABLED flag. 2. When an AF_INET6 address is added successfully to an interface and it is marked as ND6_IFF_IFDISABLED, an attempt to clear the ND6_IFF_IFDISABLED happens. This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8); in most cases manual configuration is no longer needed. - When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure a link-local address. This change ensures link-local address configuration when "ifconfig IF inet6" command is invoked. For example, "ifconfig IF inet6 auto_linklocal" now always tries to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already set to 1 (i.e. down/up cycle is no longer needed). Reviewed by: bz	2011-06-06 02:37:38 +00:00
Hiroki Sato	e7fa8d0ada	- Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1. - A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR. This controls whether a route in an RA message is accepted as the default route. The default value for each interface can be set by net.inet6.ip6.no_radr. The system wide default value is 0. - A new sysctl: net.inet6.ip6.norbit_raif. This controls whether the R-bit is set in NAs on RA-accepting interfaces. The default is 0 (R-bit is set based on net.inet6.ip6.forwarding). Background: IPv6 host/router model suggests a router sends an RA and a host accepts it for router discovery. Because of that, KAME implementation does not allow accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can make the routing table confused since it can change the default router unintentionally. However, in practice there are cases where we cannot distinguish a host from a router clearly. For example, a customer edge router often works as a host against the ISP, and as a router against the LAN at the same time. Another example is a complex network configuration like an L2TP tunnel for IPv6 connection to Internet over an Ethernet link with another native IPv6 subnet. In this case, the physical interface for the native IPv6 subnet works as a host, and the pseudo-interface for L2TP works as the default IP forwarding route. Problem: Disabling processing RA messages when net.inet6.ip6.forwarding=1 and accepting them when net.inet6.ip6.forwarding=0 causes the following practical issues: - A router cannot perform SLAAC. It becomes a problem if a box has multiple interfaces and you want to use SLAAC on some of them, for example. A customer edge router for IPv6 Internet access service using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the physical interface for administration purpose; updating firmware and so on (link-local addresses can be used there, but GUAs by SLAAC are often used for scalability). 
- When a host has multiple IPv6 interfaces and it receives multiple RAs on them, controlling the default route is difficult. Router preferences defined in RFC 4191 work only when the routers on the links are under your control. Details of Implementation Changes: Router Advertisement messages will be accepted even when net.inet6.ip6.forwarding=1. More precisely, the conditions are as follows: (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding) => Normal RA processing on that interface. (as IPv6 host) (ACCEPT_RTADV && (NO_RADR || ip6.forwarding)) => Accept RA but add the router to the defroute list with rtlifetime=0 unconditionally. This effectively prevents the received router address from being set as the box's default route. (!ACCEPT_RTADV) => No RA processing on that interface. ACCEPT_RTADV and NO_RADR are per-interface knobs. In short, all interfaces are classified as "RA-accepting" or not. An RA-accepting interface always processes RA messages regardless of ip6.forwarding. The difference caused by NO_RADR or ip6.forwarding is whether the RA source address is considered as the default router or not. R-bit in NA on the RA accepting interfaces is set based on net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests a router should disable the R-bit completely even when the box has net.inet6.ip6.forwarding=1, I believe there is no technical reason for doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif (the default is 0). Usage: # ifconfig fxp0 inet6 accept_rtadv => accept RA on fxp0 # ifconfig fxp0 inet6 accept_rtadv no_radr => accept RA on fxp0 but ignore default route information in it. # sysctl net.inet6.ip6.norbit_raif=1 => R-bit in NAs on RA accepting interfaces will always be set to 0.	2011-06-06 02:14:23 +00:00
Hiroki Sato	7de7a90404	Use uint8_t for sockaddr sa_len. Reviewed by: bz	2011-06-05 11:40:30 +00:00
Robert Watson	d3c1f00350	Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expense of the software version of Toeplitz (and similar hashes). Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently. (In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.) Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-04 16:33:06 +00:00
Robert Watson	fa046d8774	Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. 
- The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-30 09:43:55 +00:00
Bjoern A. Zeeb	8d5a3ca77b	Add FEATURE() definitions for IPv4 and IPv6 so that we can use feature_present(3) to dynamically decide whether to use one or the other family. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 10 days	2011-05-25 00:34:25 +00:00
Robert Watson	68e0d7e06a	Move from passing a wildcard boolean to a general set of lookup flags into in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly for IPv6 functions. In the future, we would like to support other flags relating to locking strategy. This change doesn't appear to modify the KBI in practice, as callers already passed in INPLOOKUP_WILDCARD rather than a simple boolean. MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-23 15:23:18 +00:00
Qing Li	5b84dc789a	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
Michael Tuexen	274b0bd51d	Remove code without any effect.	2011-05-03 20:34:02 +00:00
Michael Tuexen	e6194c2ed4	Improve compilation of SCTP code without INET support. Some bugs were fixed while doing this: * ASCONF-ACK messages might use wrong port number when using IPv6. * Checking for additional addresses takes the correct address into account and also does not do more comparisons than necessary. This patch is based on one received from bz@ who was sponsored by The FreeBSD Foundation and iXsystems. MFC after: 1 week	2011-04-30 11:18:16 +00:00
Bjoern A. Zeeb	79288c112c	Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only as well compiling out most functions adding or extending #ifdef INET coverage. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:17:00 +00:00
Bjoern A. Zeeb	67107f4594	Make the PCB code compile without INET support by adding #ifdef INETs and correcting few #includes. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:04:34 +00:00
Bjoern A. Zeeb	db178eb816	Make IPsec compile without INET adding appropriate #ifdef checks. Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c to not need three different versions depending on INET, INET6 or both. Mark two places preparing for not yet supported functionality with IPv6. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:28:42 +00:00
Bernd Walter	cae54c668c	correct variable type name in comment	2011-04-25 09:00:52 +00:00
Bjoern A. Zeeb	1024547144	MFp4 CH=191760,191770: Not compiling in and not initializing from inetsw from in_proto.c for IPv6 only, we need to initialize upper layer protocols from inet6sw. Make sure to not initialize them twice in a Dual-Stack environment but only conditionally on no INET as we have done for TCP for a long time. Otherwise we would leak resources. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days	2011-04-20 08:05:23 +00:00
Bjoern A. Zeeb	0105c5eb47	Fix IPv6 ND. After r219562 we in nd6_ns_input() were erroneously always passing the cached proxydl reference (sockaddr_dl initialized or not) to nd6_na_output(). nd6_na_output() will thus assume a proxy NA. Revert to conditionally passing either &proxydl or NULL if no proxy case desired. Tested by: ipv6gw and ref9-i386 Reported by: Pete French (petefrench ingresso.co.uk on stable) Reported by: bz, simon on Y! cluster Reported by: kib PR: kern/151908 MFC after: 3 days	2011-04-17 16:07:08 +00:00
Bjoern A. Zeeb	e2a4005dcc	Remove a check in udp6_send() that prevented v4-mapped v6 addresses from working. We store v4 and v6 addresses as a union but for v4-mapped addresses only store the 32bits w/o the ::ffff: word. That failed the check as for example 127.0.0.1 would be ::7f00:1 rather than ::ffff:7f00:1 and the IN6_IS_ADDR_V4MAPPED() never worked here. Given we can hardly get here with an unbound local address or invalid inp_vflags remove the check. Reported by: tuexen Reviewed by: tuexen MFC after: 3 days	2011-04-09 02:22:49 +00:00
Bjoern A. Zeeb	9537bb47b7	After r219579 and r219779 unbreak v4-mapped v6 sockets for UDP some more. Similar to what we do for TCP check for v4-mapped addresses and then handle them or the normal v6 address case. For either set inp_vflags before calling into the pcb connect function so that we have an unambiguous view in case we need to set the local address or port. Looked at: tuexen (as part of more) MFC after: 3 days	2011-04-09 01:29:46 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	efc76f729a	Merge the two identical implementations for local port selections from in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport(). MFC after: 2 weeks	2011-03-12 21:46:37 +00:00
Bjoern A. Zeeb	4a2b25621f	Push a possible "unbind" in some situation from in6_pcbsetport() to callers. This also fixes a problem when the prison call could set the inp->in6p_laddr (laddr) and a following priv_check_cred() call would return an error and will allow us to merge the IPv4 and IPv6 implementation. MFC after: 2 weeks	2011-03-12 16:45:15 +00:00
Bjoern A. Zeeb	8b529ca61e	Make sure the locally cached value of rt->rt_gateway stays stable, even after dropping the reference and unlocking. Previously we have dereferenced a NULL pointer (after r121765). Simply unlocking after the block does not work either because of lock ordering (see r121765) and in addition we would still hold a pointer to something that might be gone by the time we access it. Thus take a copy of the value rather than just caching the pointer. PR: kern/151908 Submitted by: chenyl (netstar2008 126.com) (initial version) MFC after: 2 weeks	2011-03-12 09:41:25 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Michael Tuexen	4c97400f86	Fix bugs related to M_FLOWID: * Store the flowid when receiving an SCTP/IPv6 packet. * Store the flowid when receiving an SCTP packet with wrong CRC. * Initialize flowid correctly. * Put test code under INVARIANTS. MFC after: 3 months.	2011-02-07 15:04:23 +00:00
Randall Stewart	5d40cf5d23	1) Typo correction in comments and one spacing change. 2) Mass update to all copyrights. MFC after: 3 Months	2011-02-05 12:12:51 +00:00
Michael Tuexen	7c99d56fdf	Improve plausibility check in sctp_handle_sack(). Allow cmt_on_off to support values 0 (no CMT), 1 (CMT), and 2 (CMT/RP). MFC after: 3 months.	2010-12-22 17:59:38 +00:00
John Hay	e9a23b5585	Add IFT_L2VLAN to the list that is capable of supplying the ingredients of the EUI64 part of an IPv6 address. Otherwise vlans will all use the MAC address of the first ethernet interface of the system. MFC after: 1 week	2010-12-22 11:58:31 +00:00
Bjoern A. Zeeb	1d5089c2c2	Loosen the locking in nd6_free() again after r216022 to avoid a LOR and a recursed lock. Reported by: delphij Tested by: delphij PR: kern/148857 MFC After: 3 days	2010-12-07 22:43:29 +00:00
Bjoern A. Zeeb	e6950476b9	Plug well observed races on la_hold entries with the callout handler. Call the handler function with the lock held, return unlocked as we might free the entry. Rework functions later in the call graph to be either called with the lock held or, only if needed, unlocked. Place asserts to document and tighten assumptions on various lle locking, which were not always true before. We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was decentralized to minimize possible complexity introduced with the formerly missing locking there. This also resulted in a push down of local variable scopes into smaller blocks. Reported by: many PR: kern/148857 Submitted by: Dmitrij Tejblum (tejblum yandex-team.ru) (original version) MFC After: 4 days	2010-11-29 00:04:08 +00:00
Rebecca Cran	6d79f3f6ae	Fix more continuous/contiguous typos (cf. r215955)	2010-11-27 21:51:39 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb	8987b01ea9	In case of an early return from the function there is no need to zero the route upfront, so defer as long as we can. MFC after: 3 days	2010-11-20 12:27:40 +00:00
Bjoern A. Zeeb	683525038b	Do not initialize flag variables before needed. Consistently use the LLE_ prefix for lla_lookup() and the ND6_ prefix for nd6_lookup() even though both are defined the same. Use the right flag variable when checking each. No real functional change. MFC after: 4 days	2010-11-17 10:43:20 +00:00
Bjoern A. Zeeb	20723e34e3	No need to re-initialize the callout. We initially do it in in6_lltable_new() right after allocation. Worse, we are losing the right flags here. MFC after: 4 days	2010-11-17 09:25:08 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Bjoern A. Zeeb	4a85b5e2ea	Make the IPsec SADB embedded route cache a union to be able to hold both the legacy and IPv6 route destination address. Previously in case of IPv6, there was a memory overwrite due to not enough space for the IPv6 address. PR: kern/122565 MFC After: 2 weeks	2010-10-23 20:35:40 +00:00
Rui Paulo	4c88812572	Purposely tell the compiler that we ignore the return value of ADDCARRY() in the REDUCE macro. Reviewed by: dim, rdivacky	2010-10-13 10:45:22 +00:00
Xin LI	64e0f48e7c	Add a bandaid for a long-standing race condition during route entry un-expiring. The previous version of the code has no locking when testing rt_refcnt. The lack of locking may result in a condition where a routing entry has a reference count but at the same time has the RTPRF_OURS bit set and an expiration timer. This would eventually lead to a panic (panic: rtqkill route really not free), for instance when the system accepts ICMP redirects from the local gateway at a moderate frequency. Commit this workaround for now until we have some better solution. PR: kern/149804 Reviewed by: bz Tested by: Zhao Xin, Pete French MFC after: 2 weeks	2010-09-27 19:26:56 +00:00
Attilio Rao	5f6bf4518d	IP_BINDANY is not correctly handled in getsockopt() case. Fix it by specifying the correct bits. Sponsored by: Sandvine Incorporated Reviewed by: bz, emaste, rstone Obtained from: Sandvine Incorporated MFC after: 10 days	2010-09-24 14:38:54 +00:00
Michael Tuexen	15537f41b4	Remove unused variables. MFC after: 2 weeks.	2010-09-15 20:41:20 +00:00
Bjoern A. Zeeb	1b48d24533	MFp4 CH=183052 183053 183258: In protosw we define pr_protocol as short, while on the wire it is an uint8_t. That way we can have "internal" protocols like DIVERT, SEND or gaps for modules (PROTO_SPACER). Switch ipproto_{un,}register to accept a short protocol number(*) and do an upfront check for valid boundaries. With this we also consistently report EPROTONOSUPPORT for out of bounds protocols, as we did for proto == 0. This allows a caller to not error for this case, which is especially important if we want to automatically call these from domain handling. (*) the functions have been without any in-tree consumer since the initial introduction, so this is considered safe. Implement ip6proto_{un,}register() similarly to their legacy IP counterparts to allow modules to hook up dynamically. Reviewed by: philip, will MFC after: 1 week	2010-09-02 17:43:44 +00:00
Michael Tuexen	9c7635e18b	Fix the SCTP_WITH_NO_CSUM option when used in combination with interface supporting CRC offload. While at it, make use of the feature that the loopback interface provides CRC offloading. MFC after: 4 weeks	2010-08-29 18:50:30 +00:00
Michael Tuexen	20083c2eb1	Fix the switching on/off of CMT using sysctl and socket option. Fix the switching on/off of PF and NR-SACKs using sysctl. Add minor improvement in handling malloc failures. Improve the address checks when sending. MFC after: 4 weeks	2010-08-28 17:59:51 +00:00
Hajimu UMEMOTO	365ccde0fb	optp may be NULL.	2010-08-20 17:52:49 +00:00
Ana Kukec	e7a6db7467	Fix mbuf leakages and remove unnecessary duplicate mbuf frees. Use the right copy of an mbuf for the IP6_EXTHDR_CHECK. Reported by: zec, hrs Approved by: bz (mentor)	2010-08-19 23:16:44 +00:00
Ana Kukec	1db8d1f843	MFp4: anchie_soc2009 branch: Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971. The implementation consists of a kernel module that gets packets from the nd6 code, sends them to user space on a dedicated socket and reinjects them back for further processing. Hooks are used from nd6 code paths to divert relevant packets to the send implementation for processing in user space. The hooks are only triggered if the send module is loaded. In case no user space application is connected to the send socket, processing continues normally as if the module would not be loaded. Unloading the module is not possible at this time due to missing nd6 locking. The native SeND socket is similar to a raw IPv6 socket but with its own, internal pseudo-protocol. Approved by: bz (mentor)	2010-08-19 11:31:03 +00:00
Hajimu UMEMOTO	388288b202	Make `ping6 -I' work with net.inet6.ip6.use_defaultzone=1. MFC after: 2 weeks	2010-08-17 17:30:56 +00:00
Bjoern A. Zeeb	8c09aa57d9	In rip6_input(), in case of multicast, we might skip the normal processing and go to the next iteration early if multicast filtering would decide that this socket shall not receive the data. Unlock the pcb in that case or we leak the read lock and next time trying to get a write lock, would hang forever. PR: kern/149608 Submitted by: Chris Luke (chrisy flirble.org) MFC after: 3 days	2010-08-14 14:13:44 +00:00
Will Andrews	9963e8a52c	Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)	2010-08-11 20:18:19 +00:00
Will Andrews	54bfbd5153	Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks	2010-08-11 00:51:50 +00:00
Bjoern A. Zeeb	4f7495d32a	MFp4 CH180235: Add proto spacers to inet6sw like we have for legacy IP. This allows us to dynamically pf_proto_register() for INET6 from modules, needed by upcoming CARP changes and SeND. MC and SCTP could make use of it as well in theory in the future after upcoming VIMAGE vnet teardown work. Discussed with: will, anchie MFC after: 10 days	2010-08-09 19:53:24 +00:00
Bjoern A. Zeeb	19291ab3de	Document the mandatory argument to the arptimer() and nd6_llinfo_timer() functions with a KASSERT(). Note: there is no need to return after panic. In the legacy IP case, only assign the arg after the check, in the IPv6 case, remove the extra checks for the table and interface as they have to be there unless we freed and forgot to cancel the timer. It doesn't matter anyway as we would panic on the NULL pointer deref immediately and the bug is elsewhere. This unifies the code of both address families to some extent. Reviewed by: rwatson MFC after: 6 days	2010-07-31 21:33:18 +00:00
Bjoern A. Zeeb	101235dcb3	Since r186119 IP6 input counters for octets and packets were not working anymore. In addition more checks and operations were missing. In case lla_lookup results in a match, get the ifaddr to update the statistics counters, and check that the address is neither tentative, duplicate or otherwise invalid before accepting the packet. If ok, record the address information in the mbuf. [ as is done in case lla_lookup does not return a result and we go through the FIB ]. Reported by: remko Tested by: remko MFC after: 2 weeks	2010-07-21 13:01:21 +00:00
Alfred Perlstein	8e96292d91	Fix our version of IPv6 address representation. We do not respect rules 3 and 4 in the required list: 1. omit leading zeros 2. "::" used to their maximum extent whenever possible 3. "::" used where shortens address the most 4. "::" used in the former part in case of a tie breaker 5. do not shorten one 16 bit 0 field 6. use lower case http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04.html Submitted by: Kalluru Abhiram @ Juniper Networks Obtained from: Juniper Networks Reviewed by: hrs, dougb	2010-05-19 00:35:47 +00:00
Kip Macy	83e711ec14	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
Kip Macy	94162961c6	do a proper fix Pointed out by: np@ MFC after: 3 days	2010-05-13 19:47:36 +00:00
Kip Macy	fc21c49a0f	fix compile error on some builds by doing the equivalent of an "extern VNET_DEFINE" without "__used" MFC after: 3 days	2010-05-13 19:36:13 +00:00
Kip Macy	1f93b77267	try working around panic by validating rt and lle MFC after: 3 days	2010-05-12 03:29:11 +00:00
Kip Macy	693810835d	boot time size the flowtable MFC after: 3 days	2010-05-10 21:31:20 +00:00
Kip Macy	77931dd513	Add flowtable support to IPv6 Tested by: qingli@ Reviewed by: qingli@ MFC after: 3 days	2010-05-09 20:32:00 +00:00
Bjoern A. Zeeb	82cea7e6f3	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitespace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb	7a657e630d	Enhance the historic behaviour of raw sockets and jails in a way that we allow all possible jail IPs as source address rather than forcing the "primary". While IPv6 naturally has source address selection, for legacy IP we do not go through the pain in case IP_HDRINCL was not set. People should bind(2) for that. This will, for example, allow ping(|6) -S to work correctly for non-primary addresses. Reported by: (ten 211.ru) Tested by: (ten 211.ru) MFC after: 4 days	2010-04-27 15:07:08 +00:00
Bjoern A. Zeeb	877fc3b64b	Make sure IPv6 source address selection does not change interface addresses while walking the IPv6 address list if in the jail case something is connecting to ::1. Reported by: Pieter de Boer (pieter thedarkside.nl) Tested by: Pieter de Boer (pieter thedarkside.nl) MFC after: 4 days	2010-04-27 15:05:03 +00:00
Konstantin Belousov	99c750a814	Provide 32bit compat for SIOCGDEFIFACE_IN6. Based on submission by: pluknet gmail com Reviewed by: emaste MFC after: 2 weeks	2010-04-27 09:47:14 +00:00
Bjoern A. Zeeb	becba438d2	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
Bruce M Simpson	f1014c074d	When embedding the scope ID in MLDv1 output, check if the scope of the address being embedded is in fact link-local, before attempting to embed it. Note that this operation is a side-effect of trying to avoid recursion on the IN6 scope lock. PR: 144560 Submitted by: Petr Lampa MFC after: 3 days	2010-04-10 12:24:21 +00:00
Michael Tuexen	b5c164935e	* Fix some race condition in SACK/NR-SACK processing. * Fix handling of mapping arrays when draining mbufs or processing FORWARD-TSN chunks. * Cleanup code (no duplicate code anymore for SACKs and NR-SACKs). Part of this code was developed together with rrs. MFC after: 2 weeks.	2010-04-03 15:40:14 +00:00
Bjoern A. Zeeb	d715e397f0	We are holding a write lock here so avoid acquiring it twice calling the "locked" version rather than the wrapper function. MFC after: 6 days	2010-03-25 10:29:00 +00:00
Randall Stewart	1966e5b5a1	The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks	2010-03-12 22:58:52 +00:00
Randall Stewart	9b03990a13	With the recent change of the sctp checksum to support offload, no delayed checksum was added to the ip6 output code. This causes cards that do not support SCTP checksum offload to have SCTP packets that are IPv6 NOT have the sctp checksum performed. Thus you could not communicate with a peer. This adds the missing bits to make the checksum happen for these cards. PR: 144529 MFC after: 2 weeks	2010-03-12 08:10:30 +00:00
Qing Li	c1752bcd65	Use reference counting instead of locking to secure an address while that address is being used to generate temporary IPv6 address. This approach is sufficient and avoids recursive locking. MFC after: 3 days	2010-02-27 07:12:25 +00:00
Pawel Jakub Dawidek	ceda73974b	No need to include security/mac/mac_framework.h here.	2010-02-18 22:30:37 +00:00
Bjoern A. Zeeb	681ffdf935	Correct a typo. Submitted by: kensmith MFC after: 3 days	2010-01-24 10:22:39 +00:00
Bjoern A. Zeeb	4dcc55a363	Garbage collect references to the no longer implemented tcp_fasttimo(). Discussed with: rwatson MFC after: 5 days	2010-01-17 13:07:52 +00:00
Bjoern A. Zeeb	592bcae802	Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control whether to use source address selection (default) or the primary jail address for unbound outgoing connections. This is intended to be used by people upgrading from single-IP jails to multi-IP jails but not having to change firewall rules, application ACLs, ... but to force their connections (unless otherwise changed) to the primary jail IP they had been using for years, as well as for people preferring to implement similar policies. Note that for IPv6, if configured incorrectly, this might lead to scope violations, which single-IPv6 jails could as well, as by the design of jails. [1] Reviewed by: jamie, hrs (ipv6 part) Pointed out by: hrs [1] MFC After: 2 weeks Asked for by: Jase Thew (bazerka beardz.net)	2010-01-17 12:57:11 +00:00
Edward Tomasz Napierala	3745cc73d0	Replace several instances of 'if (!a & b)' with 'if (!(a &b))' in order to silence newer GCC versions.	2010-01-08 15:44:49 +00:00
Bjoern A. Zeeb	1767c52079	Correct a typo. Submitted by: sn_ (sn_ gmx.net) on hackers@ MFC after: 3 days	2010-01-06 23:05:00 +00:00
Qing Li	6f1828763e	The IFA_RTSELF address flag marks that a loopback route has been installed for the interface address. This marker is necessary to properly support PPP types of links where multiple links can have the same local end IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which was combined into the route flag bits during prefix installation in IPv6. This inclusion caused the prefix route to be unusable. This patch fixes this bug by excluding the IFA_RTSELF flag during route installation. MFC after: 5 days	2010-01-04 23:39:53 +00:00
Qing Li	baf7c37373	Multiple IPv6 addresses of the same prefix can be installed on the same interface. The first address will install the prefix route into the kernel routing table and that prefix will be marked as on-link. Without RADIX_MPATH enabled, the other address aliases of the same prefix will update the prefix reference count but no other routes will be installed. Consequently the prefixes associated with these addresses would not be marked as on-link. As such, incoming packets destined to these address aliases will fail the ND6 on-link check on input. This patch fixes the above problem by searching the kernel routing table and trying to find an on-link prefix on the given interface. MFC after: 5 days	2009-12-30 21:51:23 +00:00
Qing Li	c7ab66020f	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
Bruce M Simpson	aa16623133	Use ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES to signal a join or leave with SSM MLDv2 by default. This is current practice and complies with RFC 4604, as well as being required by production IPv6 networks in Japan. The behaviour may be disabled by setting the net.inet6.mld.use_allow sysctl/tunable to 0. Requested by: Hideki Yamamoto MFC after: 1 week	2009-12-22 20:40:22 +00:00
Bruce M Simpson	977ff62485	Add missing #include <sys/ktr.h>. Submitted by: Hideki Yamamoto MFC after: 1 week	2009-12-15 10:40:40 +00:00
Bjoern A. Zeeb	de0bd6f76b	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
Bruce M Simpson	1f81c2b6ff	Adapt r197136 to IPv6 stack: Comment some flawed assumptions in in6p_join_group() about mixing SSM full-state and delta-based APIs. MFC after: 1 day	2009-11-19 13:39:07 +00:00
Bruce M Simpson	604a60d1f0	Adapt r197135 to IPv6 stack: Don't allow joins w/o source on an existing group. This is almost always pilot error. We don't need to check for group filter UNDEFINED state at t1, because we only ever allocate filters with their groups, so we unconditionally reject such calls with EINVAL. Trying to change the active filter mode w/o going through IPV6_MSFILTER is also disallowed. MFC after: 1 day	2009-11-19 13:33:23 +00:00
Bruce M Simpson	1ee6b058a2	Adapt r197132 to IPv6 stack: Tighten input checking in in6p_join_group(): * Don't try to use the source address, when its family is unspecified. * If we get a join without a source, on an existing inclusive mode group, this is an error, as it would change the filter mode. Fix a problem with the handling of in6_mfilter for new memberships: * Do not rely on im6f being NULL; it is explicitly initialized to a non-NULL pointer when constructing a membership. * Explicitly initialize *im6f to EX mode when the source address is unspecified. This fixes a problem with in_mfilter slot recycling in the join path. MFC after: 1 day	2009-11-19 13:30:06 +00:00
Bruce M Simpson	0dc5893ef3	Adapt r197314 to IPv6 stack: Return ENOBUFS consistently if user attempts to exceed in_mcast_maxsocksrc resource limit. MFC after: 1 day	2009-11-19 12:21:20 +00:00
Bruce M Simpson	57a9feeaad	Adapt r197130 to IPv6 stack: Fix an obvious logic error in the IPv4 multicast leave processing, where the filter mode vector was not updated correctly after the leave. MFC after: 1 day	2009-11-19 12:18:30 +00:00
Bruce M Simpson	7ab5a5cd1a	Adapt the fix for IGMPv2 in r199287 for the IPv6 stack. Only multicast routing is affected by the issue. MFC after: 1 day	2009-11-19 11:55:19 +00:00
Hajimu UMEMOTO	ef8d671cca	- We are not guaranteed that we're not dropping a reference that we did not add. Call LLE_REMREF() only when callout_stop() actually canceled a pending callout. - callout_reset() may cancel a pending callout. When callout_reset() canceled a pending callout, call LLE_REMREF() to drop a reference for the canceled callout. MFC after: 1 week	2009-11-12 14:48:36 +00:00
Hajimu UMEMOTO	f0c0b1430c	CURVNET_RESTORE() was not called in certain cases. MFC after: 3 days	2009-11-11 08:28:18 +00:00
Hajimu UMEMOTO	287e3cb475	Make nd6_llinfo_timer() do its job again. ln->la_expire was greater than time_second, in most cases. MFC after: 3 days	2009-11-06 17:34:26 +00:00
Hajimu UMEMOTO	2eb10edccb	Don't call LLE_FREE() after nd6_free(). MFC after: 3 days	2009-11-06 10:07:38 +00:00
Qing Li	6cb2b4e7a8	Use the correct option name in the preprocessor command to enable or disable diagnostic messages. Reviewed by: ru MFC after: 3 days	2009-10-23 18:27:34 +00:00
Bjoern A. Zeeb	14c129fc3e	Explicitly compare to a return code. Discussed with: philip (after we both misread the logic there the 1st time) MFC after: 6 weeks	2009-10-14 12:01:11 +00:00
Hiroki Sato	27f13d5d0f	- Do not assign a link-local address when ND6_IFF_IFDISABLED. Adding a tentative address is useless. - Comment out a confused warning message when in6_ifattach_linklocal() fails. This can occur when the interface does not support ioctl(SIOCAIFADDR) (interfaces associated with 802.11 wireless network device drivers, for example).	2009-10-12 18:54:02 +00:00
Julian Elischer	0b4b0b0fee	Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting. Sitting around in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
Hiroki Sato	d7caaef2e5	Enable adding a link-local address even if ND6_IFF_IFDISABLED. Note that when the interface has ND6_IFF_IFDISABLED, a newly-added address is always marked as IN6_IFF_TENTATIVE so that the interface can perform DAD after the ND6_IFF_IFDISABLED is cleared.	2009-10-02 07:00:20 +00:00
Randall Stewart	482444b4a5	Support for VNET in SCTP (hopefully)	2009-09-17 15:11:12 +00:00
Qing Li	9bb7d0f47a	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
Hiroki Sato	a283298ce3	Improve flexibility of receiving Router Advertisement and automatic link-local address configuration: - Convert a sysctl net.inet6.ip6.accept_rtadv to one for the default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a global knob. The default value of the sysctl is 0. - Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a sysctl net.inet6.ip6.auto_linklocal to one for its default value. The default value of the sysctl is 1. - Make ND6_IFF_IFDISABLED more robust. It can be used to disable IPv6 functionality of an interface now. - Receiving RA is allowed if ip6_forwarding==0 and ND6_IFF_ACCEPT_RTADV is set on that interface. The former condition will be revisited later to support a "host + router" box like IPv6 CPE router. The current behavior is compatible with the older releases of FreeBSD. - The ifconfig(8) now supports these ND6 flags as well as "nud", "prefer_source", and "disabled" in ndp(8). The ndp(8) now supports "auto_linklocal". Discussed with: bz and jinmei Reviewed by: bz MFC after: 3 days	2009-09-12 22:08:20 +00:00
Qing Li	d134008aa0	The addresses that are assigned to the loopback interface should be part of the kernel routing table. Reviewed by: bz MFC after: immediately	2009-09-05 20:24:37 +00:00
Qing Li	7dcdecb107	This patch fixes an address scope violation. Consider the scenario where an anycast address is assigned on one interface, and a global address with the same scope is assigned on another interface. In other words, the interface that owns the anycast address has only the link-local address as its one other address. Without this patch, running "ping6" against the anycast address from another station shows that the source address of the returned ICMP6 echo reply is the link-local address, not the global address that exists on the other interface in the same node. Reviewed by: bz MFC after: immediately	2009-09-05 16:50:55 +00:00
Qing Li	9452b0d2de	This patch fixes the following issues: - Interface link-local address is not reachable within the node that owns the interface, this is due to the mismatch in address scope as the result of the installed interface address loopback route. Therefore for each interface address loopback route, the rt_gateway field (of AF_LINK type) will be used to track which interface a given address belongs to. This will aid the address source to use the proper interface for address scope/zone validation. - The loopback address is not reachable. The root cause is the same as the above. - Empty nd6 entries are created for the IPv6 loopback addresses only for validation reason. Doing so will eliminate as much of the special case (loopback addresses) handling code as possible, however, these empty nd6 entries should not be returned to the userland applications such as the "ndp" command. Since both of the above issues contain common files, these files are committed together. Reviewed by: bz MFC after: immediately	2009-09-05 16:43:16 +00:00
Qing Li	42cb3aa492	Prefix on-link verification is being performed on statically configured prefixes. Since these statically configured prefixes do not have any associated advertising routers, these prefixes are treated as unreachable and those prefix routes are deleted from the routing table. Therefore bypass prefixes that are not learned from router advertisements during prefix on-link check. Reviewed by: hrs	2009-08-30 02:07:23 +00:00
Qing Li	7bcee7f336	When multiple interfaces exist in the system, with each interface having an IPv6 address assigned to it, and if an incoming packet received on one interface has a packet destination address that belongs to another interface, the routing table is consulted to determine how to reach this packet destination. Since the packet destination is an interface address, the route table will return a host route with the loopback interface as rt_ifp. The input code must recognize this fact, instead of using the loopback interface, the input code performs a search to find the right interface that owns the given IPv6 address. Reviewed by: bz, gnn, kmacy MFC after: immediately	2009-08-26 21:32:50 +00:00
Robert Watson	dc56e98f0d	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Qing Li	09b0354839	A piece of code was added to install a host route when an IPv6 interface address is configured with a /128 prefix. This is no longer necessary due to r192011. In fact that code conflicts with r192011. This patch removes the host route installation when detecting the /128 prefix, and instead lets the code added by r192011 install the loopback route for that IPv6 interface address. Reviewed by: bz Approved by: re	2009-08-12 19:15:26 +00:00
Robert Watson	315e3e38fa	Many network stack subsystems use a single global data structure to hold all pertinent statistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge if_cxgb if_gif ip_mroute ipdivert pf In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too aggressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Qing Li	df813b7ea2	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface address when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Bjoern A. Zeeb	a08362ce46	sysctl_msec_to_ticks is used with both virtualized and non-virtualized sysctls, so we cannot use one common function. Add a macro to vnet.h that converts arg1 in the virtualized case, so the maths is not exposed all over the code. Add a wrapper for the single virtualized call that properly handles arg1 and calls the default implementation from there. Convert the two other places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Bruce M Simpson	b36c89e55f	Fix a problem, whereby misbehaving IPv6 applications, which don't include a valid zone ID or interface identifier in a v6 multicast leave, would trigger a fairly paranoid KASSERT(). Observed with Boost++ regression tests on ref8.freebsd.org. Approved by: re (kib)	2009-07-18 17:38:18 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Qing Li	05b262e264	This patch adds a host route to an interface address (that is assigned to a non loopback/ppp link type) through the loopback interface. Prior to the new L2/L3 rewrite, this host route was explicitly created when processing the IPv6 address assignment. This loopback host route is deleted when that IPv6 address is removed from the interface. Reviewed by: bz, gnn Approved by: re	2009-07-12 19:20:55 +00:00
Robert Watson	5f06a81ae9	Fix "options VIMAGE_GLOBALS" build following introduction of in6_ifaddrhead. Approved by: re (kib)	2009-06-29 15:23:50 +00:00
Robert Watson	f291b9cd38	In in6_update_ifa(), jump to 'cleanup' rather than returning directly in one additional case, avoiding an ifaddr reference leak. Defer releasing the in6_ifaddr's in6_ifaddrhead reference until the end of in6_unlink_ifa(), as callers are inconsistent regarding whether or not they hold a reference across the call. This avoids using the ifaddr after it may have been freed. Reported by: tegge Reviewed by: tegge Approved by: re (blanket) MFC after: 6 weeks	2009-06-27 11:05:53 +00:00
Robert Watson	d1da0a0672	Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks	2009-06-25 16:35:28 +00:00
Robert Watson	3cfed08d1d	Clean up reference management in in6_update_ifa and in6_unlink_ifa, and in particular, add a reference for in6_ifaddrhead since we do remove a reference for it when an IPv6 address is removed. This fixes ifconfig delete of an IPv6 alias. Reported by: tegge MFC after: 6 weeks	2009-06-25 08:37:38 +00:00
Robert Watson	80af0152f3	Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)	2009-06-24 21:00:25 +00:00
Bjoern A. Zeeb	88d166bf19	Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful. Asked for by: rwatson Reviewed by: rwatson	2009-06-23 22:08:55 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Bjoern A. Zeeb	0c88be0499	In r194702 I meant to remove vnet.h which is no longer needed, not route.h.	2009-06-23 14:54:42 +00:00
Bjoern A. Zeeb	14a20db534	in6_rtqdrain() has been unused. Cleanup. As this was the only consumer of net/route.h left remove that as well.	2009-06-23 13:22:19 +00:00
Robert Watson	1099f828b3	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Replace the u_int protected by a mutex with refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
Roman Divacky	e40bae9a45	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Jamie Gritton	c1f192193d	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
Marko Zec	878a6d7dff	Remove unnecessary #ifdef lines and code. Approved by: julian (mentor)	2009-06-12 09:31:14 +00:00
Colin Percival	9a1bde1808	Prevent integer overflow in direct pipe write code from circumventing virtual-to-physical page lookups. [09:09] Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10] Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11] Approved by: so (cperciva) Approved by: re (not really, but SVN wants this...) Security: FreeBSD-SA-09:09.pipe Security: FreeBSD-SA-09:10.ipv6 Security: FreeBSD-SA-09:11.ntpd	2009-06-10 10:31:11 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be fixed incrementally over the next weeks in smaller commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Hiroki Sato	dbe5926046	Fix and add a workaround on an issue of EtherIP packet with reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent the incorrect EtherIP packets and discarded the correct ones. This change introduces the following two flags to gif(4): accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default. send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default. These flags are stored in struct gif_softc and can be set by ifconfig(8) on per-interface basis. Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate older FreeBSD boxes and new versions after this commit, setting "send_rev_ethip_ver" is needed. Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks	2009-06-07 23:00:40 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Marko Zec	d825c7936c	V_loif is not an array but a pure pointer, so treat it as such. Reviewed by: bz Approved by: julian (mentor)	2009-06-01 21:29:54 +00:00
Marko Zec	0733f6a615	Remove an #undef MIN that slipped under the radar and led me to hastily introduce an #define MIN() a few lines below in r191816. Approved by: julian (mentor) Discussed with: bz	2009-06-01 20:59:40 +00:00
Bjoern A. Zeeb	c2c2a7c11e	Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible	2009-06-01 15:49:42 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processing its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifiers (m2flow). Falls back on NETISR_POLICY_SOURCE if no flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits.
Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
Pawel Jakub Dawidek	f44270e764	- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easily do it as a SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems. - Implement this functionality also for IPv6 and RAW IP sockets. - Always compile it in (don't use additional kernel options). - Remove sysctl to turn this functionality on and off. - Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this functionality (currently only unjailed root can use it). Discussed with: julian, adrian, jhb, rwatson, kmacy	2009-06-01 10:30:00 +00:00
Jamie Gritton	76ca6f88da	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
Bruce M Simpson	29dc7bc636	Merge final round of MLD changes from p4: ip6_input.c, in6.h: * Add netinet6-specific mbuf flag M_RTALERT_MLD, shadowing M_PROTO6. * Always set this flag if HBH Router Alert option is present for MLD, even when not forwarding. icmp6.c: * In icmp6_input(), spell m->m_pkthdr.rcvif as ifp to be consistent. * Use scope ID for verifying input. Do not apply SSM filters here, no inpcb. * Check for M_RTALERT_MLD when validating MLD traffic, as we can't see IPv6 hop options outside of ip6_input(). in6_mcast.c: * Use KAME scope/zone ID in in6_multi. * Update net.inet6.ip6.mcast.filters implementation to use scope IDs for comparisons. * Fix scope ID treatment in multicast socket option processing. Scope IDs passed in from userland will be ignored as other less ambiguous APIs exist for specifying the link. * Tighten userland input checks in IPv6 SSM delta and full-state ops. * Source filter embedded scope IDs need to be revisited, for now just clear them and ignore them on input. * Adapt KAME behaviour of looking up the scope ID in the default zone for multicast leaves, when the interface is ambiguous. mld6.c: * Tighten origin checks on MLD traffic as per RFC3810 Section 6.2: * ip6_src MAY be the unspecified address for MLDv1 reports. * ip6_src MAY have link-local address scope for MLDv1 reports, MLDv1 queries, and MLDv2 queries. * Perform address field validation before accepting queries. * Use KAME scope/zone ID in query/report processing. * Break const correctness for mld_v1_input_report(), mld_v1_input_query() as we temporarily modify the input mbuf chain. * Clear the scope ID before handoff to userland MLD daemon. * Fix MLDv1 old querier present timer processing. With the protocol defaults, hosts should revert to MLDv2 after 260s. * Add net.inet6.mld.v1enable sysctl, default to on. ifmcstat.c: * Use sysctl by default; -K requests kvm(3) if so compiled. mld.4: * Connect man page to build. Tested using PCS.	2009-05-27 18:57:13 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Bjoern A. Zeeb	6a9148fe92	Implement UDP control block support. So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2. Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes. Submitted by: jhb (7.x version) Reviewed by: rwatson	2009-05-23 16:51:13 +00:00
Bjoern A. Zeeb	db2e47925e	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
Bruce M Simpson	05e5bb311b	Pullup from p4 tip: * Fix MLDv2 general query timer (fallout from automated refactoring). * Refactor MLDv1 timer. MLDv2 query processing is now working.	2009-05-21 18:05:17 +00:00
Bruce M Simpson	0ed39d3ec4	Pullup svn source to p4 top of tree: * Fix LOR in MLDv2 query input path. * Strip embedded KAME scope IDs for on-wire IPv6 address comparisons.	2009-05-21 17:01:38 +00:00
Qing Li	c9d763bf41	When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix. Reviewed by: kmacy	2009-05-20 21:07:15 +00:00
Bjoern A. Zeeb	97ea741513	Add two missing INIT_VNET_INET6(curvnet) to make VIMAGE kernels happier.	2009-05-18 17:48:46 +00:00
Qing Li	511e8a5343	This patch resolves the following issues: -- A routing socket message is not generated when an IPv6 address is either inserted or deleted from an interface. The missing routing message problem was discovered by Randall Stewart and Michael Tuxen during SCTP testing. -- Previously when an IPv6 address is configured on an interface, if the prefix length is /128, then a host route is installed in the kernel for this address. But this host route is not deleted when that IPv6 address is removed from the interface. -- Routes to the link-local all-nodes multicast address and the interface-local all-nodes multicast address are not removed when the last IPv6 address is removed from an interface. Reviewed by: bz, gnn	2009-05-18 02:25:45 +00:00
Warner Losh	71ce264c94	Implement RFC 5095 more fully. Rather than marking this no-op code as BURN_BRIDGES, just remove it. Adjust comments. Reviewed by: dwhite, emaste, battlez	2009-05-09 18:25:58 +00:00
Marko Zec	94e9f5a1c2	Remove unnecessary CURVNET_SET() calls where curvnet context is (i.e. seems to be) already set. This should reduce console noise due to curvnet recursion reports. This change has no impact on nooptions VIMAGE builds. Approved by: julian (mentor)	2009-05-06 13:30:46 +00:00
Alexander Kabaev	5b65b8bc5b	Silence unsolicited spam printed out when KTR_MLD happens to be in KTR_COMPILE mask. Compiling KTR trace points in does not necessarily mean enabling them, use proper check against ktr_mask instead.	2009-05-05 16:27:45 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform across vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Bruce M Simpson	a3d3b633a9	Limit scope of acquisition of INP_RLOCK for multicast input filter to the scope of its use, even though this may thrash the lock if the INP is referenced for other purposes. Tested by: David Wolfskill	2009-05-01 11:05:24 +00:00
Marko Zec	f6dfe47a14	Permit building kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_* accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_* macros will declare and set up base pointers to be used by V_* accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Bruce M Simpson	33cde13046	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
Bruce M Simpson	0279cfbe91	Add MLDv2 protocol header, but do not connect it to the build.	2009-04-29 11:31:23 +00:00
Bruce M Simpson	8f002c6ce7	Import IPv6 SSM module but do not connect it to the build.	2009-04-29 11:26:45 +00:00
Bruce M Simpson	ba970783a9	Add IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT, in6addr_linklocal_allv2routers for use by MLDv2. Add IPv6 SSM socket layer membership vector size constants and tree bounds. Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes straight to the RFC 3678 socket options.	2009-04-29 10:22:44 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Bjoern A. Zeeb	3f795dd3c7	Compare protosw pointer with NULL. MFC after: 1 month	2009-04-23 17:41:54 +00:00
Robert Watson	93c83dd8bf	Assert the interface address list lock in IFP_TO_IA6(), as it will iterate the interface address list. Marginally expand IF_ADDR_LOCK() coverage in mld6.c to make sure it's held when IFP_TO_IA6() is called. MFC after: 2 weeks	2009-04-20 22:56:34 +00:00
Robert Watson	c4dd3fe108	Prefer structure fields (ifa_link) to macro aliases for them (ifa_list). MFC after: 2 weeks	2009-04-20 22:45:21 +00:00
Robert Watson	1e6a41398c	Acquire interface address list lock around access to if_addrhead, closing several writer-writer races, and some read-write races. MFC after: 2 weeks	2009-04-20 21:37:46 +00:00
Robert Watson	f68ffa034b	Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually accessing queue(9) structure fields for if_addrhead. Prefer FreeBSD field name if_addrhead to compatibility macro if_addrlist. MFC after: 2 weeks	2009-04-20 21:05:37 +00:00
Robert Watson	ac6ba96269	Close some but not all writer-writer races when maintaining IPv6 interface address lists by locking the interface address list lock. MFC after: 2 weeks	2009-04-20 16:05:16 +00:00
Robert Watson	1e1d603e2f	Lock interface address lists before iterating over them in nd6. MFC after: 2 weeks	2009-04-20 14:41:23 +00:00
Kip Macy	279aa3d419	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
Kip Macy	de4ab55e43	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry	2009-04-15 20:34:19 +00:00
Robert Watson	e27b0c8775	Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. In one case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats. MFC after: 3 days	2009-04-12 13:22:33 +00:00
Robert Watson	f68f9f77fe	Commit file omitted in r190962: Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 11:53:12 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
Bruce M Simpson	443fc3176d	Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved. Summary: * Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod. * Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do. * Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels. * bandwidth meters however still require libkvm. * Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256). * Remove unused members from struct vif and struct mfc. * Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c. * Don't share locks or initialization between IPv4 and IPv6. * Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being. * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c. v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and have not been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here. Reviewed by: Pavlin Radoslavov	2009-03-19 01:43:03 +00:00
Robert Watson	e5adda3d51	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
Robert Watson	ad71fe3c35	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz	2009-03-15 09:58:31 +00:00
Marius Strobl	c89c8a1029	On architectures with strict alignment requirements compensate the misalignment of the IP header that prepending the EtherIP header might have caused. PR: 131921 MFC after: 1 week	2009-03-07 19:08:58 +00:00
Bjoern A. Zeeb	1263305f0c	Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095. While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets. Reviewed by: gnn (a bit back, earlier version)	2009-03-03 13:12:12 +00:00
Bjoern A. Zeeb	2bebb49117	Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif. Single kernel compiles and universe will fail if the size of the struct changes. The expected values are given in sys/vimage.h. See the comments there for how to handle this. Requested by: peter	2009-03-01 11:01:00 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
Bjoern A. Zeeb	61cab5d638	Shuffle the vimage.h includes or add where missing.	2009-02-27 13:22:26 +00:00
Robert Watson	a714e55f73	Assert the radix head lock in in6_rtqkill(). MFC after: 3 days	2009-02-23 22:58:59 +00:00
Bjoern A. Zeeb	97aa4a517a	Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions. The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate. Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted. Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers. Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments. Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject(). Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?	2009-02-08 09:27:07 +00:00
Jamie Gritton	67c19233f1	Don't bother null-checking the thread pointer before the prison checks in udp6_connect (td is already dereferenced elsewhere without such a check). This makes the conversion from a sockaddr to a sockaddr_in6 always happen, so convert once at the beginning of the function rather than twice in the middle. Approved by: bz (mentor)	2009-02-05 15:04:23 +00:00
Jamie Gritton	7c2f3cb964	Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind. Approved by: bz (mentor)	2009-02-05 14:25:53 +00:00
Jamie Gritton	b89e82dd87	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
Bjoern A. Zeeb	5f16e341d4	When iterating through the list trying to find a router in defrouter_select(), NULL the cached llentry after unlocking as we are no longer interested in it and with the second iteration would try to unlock it again resulting in panic: Lock (rw) lle not locked @ ... Reported by: Mark Atkinson <m.atkinson@f5.com> Tested by: Mark Atkinson <m.atkinson@f5.com> PR: kern/128247 (in follow-up, unrelated to original report)	2009-02-04 10:35:27 +00:00
Randall Stewart	a99b67833a	- Cleanup checksum code. - Prepare for CRC offloading, add MIB counters (RS/MT). - Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT). - Bugfix: Handle close() with SO_LINGER correctly when notifications are generated during the close() call(MT). - Bugfix: Generate DRY event when sender is dry during subscription. Only for 1-to-1 style sockets (RS/MT) - Bugfix: Put vtags for the correct amount of time into time-wait (MT). - Bugfix: Clear vtag entries correctly on expiration (MT). - Bugfix: shutdown() indicates ENOTCONN when called for unconnected 1-to-1 style sockets (MT). - Bugfix: In sctp Auth code (PL). - Add support for devices that support SCTP csum offload (igb). - Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS) Obtained from: With help from Peter Lei and Michael Tuexen	2009-02-03 11:04:03 +00:00
Bjoern A. Zeeb	09f8c3ff36	Remove the single global unlocked route cache ip6_forward_rt from the inet6 stack along with statistics and make sure we properly free the rt in all cases. While the current situation is not better performance wise it prevents panics seen more often these days. After more inet6 and ipsec cleanup we should be able to improve the situation again passing the rt to ip6_forward directly. Leave the ip6_forward_rt entry in struct vinet6 but mark it for removal. PR: kern/128247, kern/131038 MFC after: 25 days Committed from: Bugathon #6 Tested by: Denis Ahrens <denis@h3q.com> (different initial version)	2009-02-01 21:11:08 +00:00
Bjoern A. Zeeb	959e14c15e	Remove unused local MACROs. Submitted by: Christoph Mallon christoph.mallon@gmx.de MFC after: 2 weeks	2009-01-31 17:35:44 +00:00
Bjoern A. Zeeb	39f046dac2	Coalesce two consecutive #ifdef IPSEC blocks. Move the skip_ipsec: label below the goto as we can never have ipsecrt set if we get to that label so there is no need to check. MFC after: 2 weeks	2009-01-31 12:24:53 +00:00
Bjoern A. Zeeb	e173d3df0c	Remove dead code from #if 0: we do not have an ipsrcchk_rt anywhere else. MFC after: 2 weeks	2009-01-31 11:19:20 +00:00
Bjoern A. Zeeb	2e730bea0a	Like with r185713 make sure to not leak a lock as rtalloc1(9) returns a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked and rtfree(9)d rather than just rtfree(9)d. Since the PR was filed, new places with the same problem were added with new code. Also check that the rt is valid before freeing it either way there. PR: kern/129793 Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu> MFC after: 2 weeks Committed from: Bugathon #6	2009-01-31 10:48:02 +00:00
Bjoern A. Zeeb	351c4745f1	Remove 4 entirely unused ip6 variables. Leave them in struct vinet6 to not break the ABI with kernel modules but mark them for removal so we can do it in one batch when the time is right. MFC after: 1 month	2009-01-30 23:40:24 +00:00
Bjoern A. Zeeb	1cecba0fcd	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
Sam Leffler	cbd1844537	remove too noisy DIAGNOSTIC code Reviewed by: qingli	2009-01-18 07:20:02 +00:00
Qing Li	14981d8057	Revive the RTF_LLINFO flag in route.h. The kernel code is guarded by the new kernel option COMPAT_ROUTE_FLAGS for binary backward compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO. RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS is defined.	2009-01-12 11:24:32 +00:00
Bjoern A. Zeeb	813dd6ae5e	Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well. While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2] PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 21:57:49 +00:00
Bjoern A. Zeeb	5ce0eb7f08	Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 13:06:56 +00:00
Randall Stewart	bbb0e3d9d5	Addresses Robert's comments on comments. Also adds the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"	2009-01-06 13:27:56 +00:00
Randall Stewart	c7c7ea4b5a	Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.	2009-01-06 12:13:40 +00:00
Bjoern A. Zeeb	4b5c098fdf	Switch the last protosw* structs to C99 initializers. Reviewed by: ed, julian, Christoph Mallon <christoph.mallon@gmx.de> MFC after: 2 weeks	2009-01-05 20:29:01 +00:00
Robert Watson	5e48a30d2e	Unlike with struct protosw, several instances of struct ip6protosw did not use C99-style sparse structure initialization, so remove NULL assignments for now-removed pr_usrreq function pointers. Reported by: Chris Ruiz <yr.retarded at gmail.com>	2009-01-04 21:53:42 +00:00
Robert Watson	cba318dc12	struct ip6protosw is a copy of struct protosw, so remove pr_usrreq there to reflect removal from struct protosw. Spotted by: ed	2009-01-04 21:13:51 +00:00
Qing Li	dc49549713	Some modules such as SCTP supply a valid route entry as an input argument to ip_output(). The destination is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create an L2 entry. Since there exists an L2 table per address family, the L2 lookup function can make an address-family-specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure. Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.	2009-01-03 00:27:28 +00:00
Qing Li	8eca593c5a	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), a sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" commands follow a convention where an RTM_GET is issued first, followed by an RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
Kip Macy	ee6326a30b	avoid lock recursion by deferring the link check until after LLE lock is dropped	2008-12-24 01:08:18 +00:00
Bjoern A. Zeeb	f5d35259fe	Correct variable name in comment. MFC after: 4 weeks	2008-12-22 12:54:52 +00:00
Qing Li	ebf1c74403	Similar to the INET case, do not destroy the nd6 entries for interface addresses until those addresses are removed. I already made the patch in INET but forgot to bring the code over for INET6.	2008-12-22 07:11:15 +00:00
Bjoern A. Zeeb	099d0bd34b	Only unlock the llentry if it is actually valid. Reported by: ed	2008-12-18 19:09:14 +00:00
Bjoern A. Zeeb	97590249ad	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equivalent INP_* counterpart. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 13:00:18 +00:00
Bjoern A. Zeeb	dcdb4371ca	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 12:52:34 +00:00
Qing Li	9928dafbb8	Remove the rt argument from nd6_storelladdr() because rt is no longer accessed.	2008-12-17 10:27:34 +00:00
Qing Li	f16e1269b4	A couple of files were not meant to be committed.	2008-12-17 10:19:53 +00:00
Qing Li	bbd8aebaba	in6_clsroute() was applied to prefix routes causing some of them to expire. in6_clsroute() was only applied to cloned routes that are no longer applicable after the arp-v2 commit.	2008-12-17 10:03:49 +00:00
Kip Macy	a614678035	* Compare pointer with NULL * Remove trailing whitespace (added in r186162) * Reduce indentation by rephrasing test Submitted by: Christopher Mallon (christoph dot mallon at gmx dot de)	2008-12-16 23:56:24 +00:00
Kip Macy	fd14c50bbb	- Simplify handling of the deferring of mbuf transmit until after lle lock drop - add a couple of comments to clarify intent	2008-12-16 23:06:36 +00:00
Kip Macy	75bab8b81d	check pointers against NULL	2008-12-16 06:01:08 +00:00
Kip Macy	aba53ef0a6	convert more pointer validation checks to checking against NULL	2008-12-16 03:12:44 +00:00
Kip Macy	d78be3a909	simplify locking in find_pfxlist_reachable_router	2008-12-16 03:05:18 +00:00
Kip Macy	23ee1bfa82	explicitly check return of lla_lookup against NULL	2008-12-16 02:47:22 +00:00
Kip Macy	15209fb6e8	advance tail pointer in nd6_output_lle and check lla_output return against NULL	2008-12-16 02:33:53 +00:00
Kip Macy	688d079b2d	check return from lla_lookup against NULL not zero	2008-12-16 02:30:42 +00:00
Kip Macy	56c423b065	make sure redirect doesn't return without dropping the lock	2008-12-16 02:06:26 +00:00
Kip Macy	83904f7116	need to check that lle is not null before unlock if the break condition is not met; also fix the break condition to explicitly check against NULL	2008-12-16 02:05:11 +00:00
Kip Macy	6289115121	unlock the llentry after use in find_pfxlist_reachable_router	2008-12-16 01:58:30 +00:00
Qing Li	3d3728e9f8	Initialize the variable "router", and apply "static_route" flag across the entire nd6_cache_lladdr() function.	2008-12-16 01:21:19 +00:00

1352 Commits