freebsd-dev

Author	SHA1	Message	Date
Robert Watson	f68ffa034b	Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually accessing queue(9) structure fields for if_addrhead. Prefer FreeBSD field name if_addrhead to compatibility macro if_addrlist. MFC after: 2 weeks	2009-04-20 21:05:37 +00:00
Robert Watson	ac6ba96269	Close some but not all writer-writer races when maintaining IPv6 interface address lists by locking the interface address list lock. MFC after: 2 weeks	2009-04-20 16:05:16 +00:00
Robert Watson	1e1d603e2f	Lock interface address lists before iterating over them in nd6. MFC after: 2 weeks	2009-04-20 14:41:23 +00:00
Kip Macy	279aa3d419	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
Kip Macy	de4ab55e43	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry	2009-04-15 20:34:19 +00:00
Robert Watson	e27b0c8775	Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. In on case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats. MFC after: 3 days	2009-04-12 13:22:33 +00:00
Robert Watson	f68f9f77fe	Commit file omitted in r190962: Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 11:53:12 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
Bruce M Simpson	443fc3176d	Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved. Summary: * Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod. * Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do. * Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels. * bandwidth meters however still require libkvm. * Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256). * Remove unused members from struct vif and struct mfc. * Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c. * Don't share locks or initialization between IPv4 and IPv6. * Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being. * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c. v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and have not been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here. Reviewed by: Pavlin Radoslavov	2009-03-19 01:43:03 +00:00
Robert Watson	e5adda3d51	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
Robert Watson	ad71fe3c35	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz	2009-03-15 09:58:31 +00:00
Marius Strobl	c89c8a1029	On architectures with strict alignment requirements compensate the misalignment of the IP header that prepending the EtherIP header might have caused. PR: 131921 MFC after: 1 week	2009-03-07 19:08:58 +00:00
Bjoern A. Zeeb	1263305f0c	Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095. While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets. Reviewed by: gnn (a bit back, earlier version)	2009-03-03 13:12:12 +00:00
Bjoern A. Zeeb	2bebb49117	Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif. Single kernel compiles and universe will fail if the size of the struct changes. Th expected values are given in sys/vimage.h. See the comments where how to handle this. Requested by: peter	2009-03-01 11:01:00 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
Bjoern A. Zeeb	61cab5d638	Shuffle the vimage.h includes or add where missing.	2009-02-27 13:22:26 +00:00
Robert Watson	a714e55f73	Assert the radix head lock in in6_rtqkill(). MFC after: 3 days	2009-02-23 22:58:59 +00:00
Bjoern A. Zeeb	97aa4a517a	Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions. The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate. Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted. Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers. Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments. Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject(). Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?	2009-02-08 09:27:07 +00:00
Jamie Gritton	67c19233f1	Don't bother null-checking the thread pointer before the prison checks in udp6_connect (td is already dereferenced elsewhere without such a check). This makes the conversion from a sockaddr to a sockaddr_in6 always happen, so convert once at the beginning of the function rather than twice in the middle. Approved by: bz (mentor)	2009-02-05 15:04:23 +00:00
Jamie Gritton	7c2f3cb964	Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind. Approved by: bz (mentor)	2009-02-05 14:25:53 +00:00
Jamie Gritton	b89e82dd87	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
Bjoern A. Zeeb	5f16e341d4	When iterating through the list trying to find a router in defrouter_select(), NULL the cached llentry after unlocking as we are no longer interested in it and with the second iteration would try to unlock it again resulting in panic: Lock (rw) lle not locked @ ... Reported by: Mark Atkinson <m.atkinson@f5.com> Tested by: Mark Atkinson <m.atkinson@f5.com> PR: kern/128247 (in follow-up, unrelated to original report)	2009-02-04 10:35:27 +00:00
Randall Stewart	a99b67833a	- Cleanup checksum code. - Prepare for CRC offloading, add MIB counters (RS/MT). - Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT). - Bugfix: Handle close() with SO_LINGER correctly when notifications are generated during the close() call(MT). - Bugfix: Generate DRY event when sender is dry during subscription. Only for 1-to-1 style sockets (RS/MT) - Bugfix: Put vtags for the correct amount of time into time-wait (MT). - Bugfix: Clear vtag entries correctly on expiration (MT). - Bugfix: shutdown() indicates ENOTCONN when called for unconnected 1-to-1 style sockets (MT). - Bugfix: In sctp Auth code (PL). - Add support for devices that support SCTP csum offload (igb). - Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS) Obtained from: With help from Peter Lei and Michael Tuexen	2009-02-03 11:04:03 +00:00
Bjoern A. Zeeb	09f8c3ff36	Remove the single global unlocked route cache ip6_forward_rt from the inet6 stack along with statistics and make sure we properly free the rt in all cases. While the current situation is not better performance wise it prevents panics seen more often these days. After more inet6 and ipsec cleanup we should be able to improve the situation again passing the rt to ip6_forward directly. Leave the ip6_forward_rt entry in struct vinet6 but mark it for removal. PR: kern/128247, kern/131038 MFC after: 25 days Committed from: Bugathon #6 Tested by: Denis Ahrens <denis@h3q.com> (different initial version)	2009-02-01 21:11:08 +00:00
Bjoern A. Zeeb	959e14c15e	Remove unused local MACROs. Submitted by: Christoph Mallon christoph.mallon@gmx.de MFC after: 2 weeks	2009-01-31 17:35:44 +00:00
Bjoern A. Zeeb	39f046dac2	Coalesce two consecutive #ifdef IPSEC blocks. Move the skip_ipsec: label below the goto as we can never have ipsecrt set if we get to that label so there is no need to check. MFC after: 2 weeks	2009-01-31 12:24:53 +00:00
Bjoern A. Zeeb	e173d3df0c	Remove dead code from #if 0: we do not have an ipsrcchk_rt anywhere else. MFC after: 2 weeks	2009-01-31 11:19:20 +00:00
Bjoern A. Zeeb	2e730bea0a	Like with r185713 make sure to not leak a lock as rtalloc1(9) returns a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked and rtfree(9)d rather than just rtfree(9)d. Since the PR was filed, new places with the same problem were added with new code. Also check that the rt is valid before freeing it either way there. PR: kern/129793 Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu> MFC after: 2 weeks Committed from: Bugathon #6	2009-01-31 10:48:02 +00:00
Bjoern A. Zeeb	351c4745f1	Remove 4 entirely unsued ip6 variables. Leave then in struct vinet6 to not break the ABI with kernel modules but mark them for removal so we can do it in one batch when the time is right. MFC after: 1 month	2009-01-30 23:40:24 +00:00
Bjoern A. Zeeb	1cecba0fcd	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
Sam Leffler	cbd1844537	remove too noisy DIAGNOSTIC code Reviewed by: qingli	2009-01-18 07:20:02 +00:00
Qing Li	14981d8057	Revive the RTF_LLINFO flag in route.h. The kernel code is guarded by the new kernel option COMPAT_ROUTE_FLAGS for binary backward compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO. RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS is defined.	2009-01-12 11:24:32 +00:00
Bjoern A. Zeeb	813dd6ae5e	Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well. While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2] PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 21:57:49 +00:00
Bjoern A. Zeeb	5ce0eb7f08	Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 13:06:56 +00:00
Randall Stewart	bbb0e3d9d5	Addresses Roberts comments on comments. Also adds the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"	2009-01-06 13:27:56 +00:00
Randall Stewart	c7c7ea4b5a	Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.	2009-01-06 12:13:40 +00:00
Bjoern A. Zeeb	4b5c098fdf	Switch the last protosw* structs to C99 initializers. Reviewed by: ed, julian, Christoph Mallon <christoph.mallon@gmx.de> MFC after: 2 weeks	2009-01-05 20:29:01 +00:00
Robert Watson	5e48a30d2e	Unlike with struct protosw, several instances of struct ip6protosw did not use C99-style sparse structure initialization, so remove NULL assignments for now-removed pr_usrreq function pointers. Reported by: Chris Ruiz <yr.retarded at gmail.com>	2009-01-04 21:53:42 +00:00
Robert Watson	cba318dc12	struct ip6protosw is a copy of struct protosw, so remove pr_usrreq there to reflect removal from struct protosw. Spotted by: ed	2009-01-04 21:13:51 +00:00
Qing Li	dc49549713	Some modules such as SCTP supplies a valid route entry as an input argument to ip_output(). The destionation is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create a L2 entry. Since there exists a L2 table per address family, the L2 lookup function can make address family specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure. Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.	2009-01-03 00:27:28 +00:00
Qing Li	8eca593c5a	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), an sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" command follows a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
Kip Macy	ee6326a30b	avoid lock recursion by deferring the link check until after LLE lock is dropped	2008-12-24 01:08:18 +00:00
Bjoern A. Zeeb	f5d35259fe	Correct variable name in comment. MFC after: 4 weeks	2008-12-22 12:54:52 +00:00
Qing Li	ebf1c74403	Similar to the INET case, do not destroy the nd6 entries for interface addresses until those addresses are removed. I already made the patch in INET but forgot to bring the code over for INET6.	2008-12-22 07:11:15 +00:00
Bjoern A. Zeeb	099d0bd34b	Only unlock the llentry if it is actually valid. Reported by: ed	2008-12-18 19:09:14 +00:00
Bjoern A. Zeeb	97590249ad	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equialent INP_* counterpart. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 13:00:18 +00:00
Bjoern A. Zeeb	dcdb4371ca	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 12:52:34 +00:00
Qing Li	9928dafbb8	Remove the rt argument from nd6_storelladdr() because rt is no longer accessed.	2008-12-17 10:27:34 +00:00
Qing Li	f16e1269b4	A couple of files were not meant to be committed.	2008-12-17 10:19:53 +00:00
Qing Li	bbd8aebaba	in6_clsroute() was applied to prefix routes causing some of them to expire. in6_clsroute() was only applied to cloned routes that are no longer applicable after the arp-v2 commit.	2008-12-17 10:03:49 +00:00
Kip Macy	a614678035	* Compare pointer with NULL * Remove trailing whitespace (added in r186162) * Reduce indentation by rephrasing test Submitted by: Christopher Mallon (christoph dot mallon at gmx dot de)	2008-12-16 23:56:24 +00:00
Kip Macy	fd14c50bbb	- Simplify handling of the deferring of mbuf transmit until after lle lock drop - add a couple of comments to clarify intent	2008-12-16 23:06:36 +00:00
Kip Macy	75bab8b81d	check pointers against NULL	2008-12-16 06:01:08 +00:00
Kip Macy	aba53ef0a6	convert more pointer validation checks to checking against NULL	2008-12-16 03:12:44 +00:00
Kip Macy	d78be3a909	simplify locking in find_pfxlist_reachable_router	2008-12-16 03:05:18 +00:00
Kip Macy	23ee1bfa82	explicitly check return of lla_lookup against NULL	2008-12-16 02:47:22 +00:00
Kip Macy	15209fb6e8	advance tail pointer in nd6_output_lle and check lla_output return against NULL	2008-12-16 02:33:53 +00:00
Kip Macy	688d079b2d	check return from lla_lookup against NULL not zero	2008-12-16 02:30:42 +00:00
Kip Macy	56c423b065	make sure redirect doesn't return without dropping the lock	2008-12-16 02:06:26 +00:00
Kip Macy	83904f7116	need to check that lle is not null before unlock if the break condition is not met also fix the break condition to explicitly check against NULL	2008-12-16 02:05:11 +00:00
Kip Macy	6289115121	unlock the llentry after use in find_pfxlist_reachable_router	2008-12-16 01:58:30 +00:00
Qing Li	3d3728e9f8	Initialize the variable "router", and apply "static_route" flag across the entire nd6_cache_lladdr() function.	2008-12-16 01:21:19 +00:00
Kip Macy	fbc2ca1bef	unlock and destroy an llentry's lock before freeing Found by: sam	2008-12-16 00:20:49 +00:00
Kip Macy	c0641cc03b	unlock looked up llentrys in defrouter_select	2008-12-16 00:18:04 +00:00
Kip Macy	c2b3a02b38	fix two use after frees in nd6_cache_lladdr caused by last minute unlock shuffling	2008-12-16 00:16:51 +00:00
Bjoern A. Zeeb	fc384fa5d6	Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change. Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)	2008-12-15 21:50:54 +00:00
Qing Li	6e6b3f7cbc	This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion	2008-12-15 06:10:57 +00:00
Kip Macy	41c6def2d1	in6_addroute is called through rnh_addadr which is always called with the radix node head lock held exclusively. Pass RTF_RNH_LOCKED to rtalloc so that rtalloc1_fib will not try to re-acquire the lock.	2008-12-13 20:15:42 +00:00
Bjoern A. Zeeb	1b193af610	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
Kip Macy	7b5ba4e7f0	RTF_RNH_LOCKED needs to be passed in the flags arg not report, apologies to thompsa	2008-12-12 02:07:45 +00:00
Andrew Thompson	84b5117529	Pass RTF_RNH_LOCKED to rtalloc1 sunce the node head is locked, this avoids a recursive lock panic on inet6 detach. Reviewed by: kmacy	2008-12-12 01:46:59 +00:00
Bjoern A. Zeeb	86413abf5f	Put a global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely. While there garbage collect a few dead externs from ip6_var.h. Sponsored by: The FreeBSD Foundation	2008-12-11 16:26:38 +00:00
Marko Zec	385195c062	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
Warner Losh	609ff41f16	Add missing include to sys/lock.h before sys/rwlock.h	2008-12-08 00:28:21 +00:00
Kip Macy	3120b9d428	- convert radix node head lock from mutex to rwlock - make radix node head lock not recursive - fix LOR in rtexpunge - fix LOR in rtredirect Reviewed by: sam	2008-12-07 21:15:43 +00:00
Randall Stewart	830d754d52	Code from the hack-session known as the IETF (and a bit of debugging afterwards): - Fix protection code for notification generation. - Decouple associd from vtag - Allow vtags to have less strigent requirements in non-uniqueness. o don't pre-hash them when you issue one in a cookie. o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup. - Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes. - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon). - Update the support of SCTP-AUTH by Peter Lei. - Use macros for refcounting. - Fix MTU for UDP encapsulation. - Fix reporting back of unsent data. - Update assoc send counter handling to be consistent with endpoint sent counter. - Fix a bug in PR-SCTP. - Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point. - Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet. With the help of: Michael Teuxen and Peter Lei :-) MFC after: 4 weeks	2008-12-06 13:19:54 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Marko Zec	f02493cbbd	Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-28 23:30:51 +00:00
Bjoern A. Zeeb	6aee2fc550	Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additonal IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place. Reviewed by: rwatson (as part of a larger changset) MFC after: 6 weeks () () possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-27 12:04:35 +00:00
Bjoern A. Zeeb	6974bd9e75	Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks () () possibly need to leave a stub wrappers in 7 to keep the symbols.	2008-11-27 10:43:08 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
Bjoern A. Zeeb	0206cdb846	Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice. Reviewed by: rwatson MFC after: 6 weeks () () possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-26 20:52:26 +00:00
Bjoern A. Zeeb	a7df09e8c9	Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable. No functional changes. Reviewed by: rwatson MFC after: 6 weeks	2008-11-26 12:54:31 +00:00
Bjoern A. Zeeb	b0fab0344f	Plug a credential leak in case the inpcb is freed by in6_pcbfree() instead of in_pcbfree(); missed in r183606. Reviewed by: rwatson MFC after: 3 days (instantly for 7.1-RC?)	2008-11-26 12:24:18 +00:00
Marko Zec	44e33a0758	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
Robert Watson	4b908c8bb4	Add a MAC label, MAC Framework, and MAC policy entry points for IPv6 fragment reassembly queues. This allows policies to label reassembly queues, perform access control checks when matching fragments to a queue, update a queue label when fragments are matched, and label the resulting reassembled datagram. Obtained from: TrustedBSD Project	2008-10-26 22:45:18 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Bjoern A. Zeeb	dc3c09c89f	Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199. Reviewed by: silby, rpaulo, mlaier MFC after: 2 months	2008-10-20 18:43:59 +00:00
Bjoern A. Zeeb	6f4da20196	Check that the mbuf len is positive (like we do in the v4 case). Read the other way round this means that even with the checks the m_len turned negative in some cases which led to panics. The reason to my understanding seems to be that the checks are wrong (also for v4) ignoring possible padding when checking cmsg_len or padding after data when adjusting the mbuf. Doing proper cheks seems to break applications like named so further investigation and regression tests are needed. PR: kern/119123 Tested by: Ashish Shukla wahjava gmail.com MFC after: 3 days	2008-10-15 19:24:18 +00:00
Robert Watson	a6a6167314	When disconnecting a UDPv6 socket, acquire the socket lock around the changing of the so_state field, as is done in UDPv4. Remove XXX locking comment. MFC after: 3 days	2008-10-12 20:01:32 +00:00
Bjoern A. Zeeb	55fd3bafdb	Style changes: compare pointer to NULL and move a }. MFC after: 6 weeks	2008-10-04 17:07:58 +00:00
Bjoern A. Zeeb	86d02c5c63	Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred	2008-10-04 15:06:34 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Colin Percival	29a6d781af	Default to ignoring potentially evil IPv6 Neighbor Solicitation messages. Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-08:10.nd6 Thanks to: jinmei, bz	2008-10-02 00:32:59 +00:00
Robert Watson	4a0a13971e	When invoking the udp_send() from udp6_send() due to use of a v6-mapped IPv4 address, first drop the udbinfo and inpcb locks, which will otherwise be recursed. This leads to a potential minor race, but is preferable to a deadlock when acquiring a read lock after a write lock on the inpcb. MFC after: 3 days Reported by: Norbert Papke <fbsd-ml@scrapper.ca>, lioux	2008-09-22 06:44:03 +00:00
Bjoern A. Zeeb	9de45b2780	mld_timerresid() returns ms so instead of doing the maths in usec and then dividing down to ms, do the maths in ms. Obtained from: NetBSD mld6.c rev. 1.47 MFC after: 2 months	2008-09-10 19:42:13 +00:00
Simon L. B. Nielsen	59ca51adba	- Fix amd64 local privilege escalation. [08:07] - Fix nmount(2) local privilege escalation. [08:08] - Fix IPv6 remote kernel panics. [08:09] Fix for [08:07] is merge of r181823. Submitted by: kib [08:07], csjp [08:08], bz [08:09] Reviewed by: peter [08:07], jhb [08:07] Reviewed by: jinmei [08:09], rwatson [08:09] Approved by: re (SA blanket) Approved by: so (simon) Security: FreeBSD-SA-08:07.amd64 Security: FreeBSD-SA-08:08.nmount Security: FreeBSD-SA-08:09.icmp6	2008-09-03 19:09:47 +00:00
Bjoern A. Zeeb	bf0d5f8e16	Fix a bug, when a specially crafted ICMPV6 MLD packet could lead to an integer divide by zero panic in the kernel, if the kernel was run with hz<1000. Neither i386, pc98, amd64 or sparc64 are affected in the currently supported branches and default configuration. Submitted by: Miikka Saukko, Ossi Herrala and Jukka Taimisto from the CROSS project at Codenomicon Ltd. via CERT-FI. Reviewed by: bz, rwatson Security: CVE-2008-2464 MFC after: 8 hours	2008-09-03 08:13:58 +00:00
Robert Watson	7a0a0eecf0	In UDPv6, reduce scope of global udbinfo lock during append to last matching socket by dropping it before udp6_append(), and remove duplicate unlocks of udbinfo and inpcb in sysctl return path. MFC after: 3 days	2008-08-31 13:16:45 +00:00
Julian Elischer	5e5d5c6f17	another missed V_	2008-08-25 06:09:32 +00:00
Julian Elischer	5ed3800e41	Fix some of the formatting fixes.. It's amazing how some thing stand out in a commit message.	2008-08-20 01:24:55 +00:00
Julian Elischer	ac957cd271	A bunch of formatting fixes brough to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
Bjoern A. Zeeb	f125044552	As part of step 1.5 of the vimage framework resolve conflicts with file local static globals which would be folded onto the same name with the V_ macros. Reviewed by: kris, brooks, simon	2008-08-18 13:16:19 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb	48d48eb980	Fix a regression introduced in r179289 splitting up ip6_savecontrol() into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the mp thus the information will be missing in userland. Istead of going with a * as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument. PR: kern/126349 Reviewed by: rwatson, dwmalone	2008-08-16 06:39:18 +00:00
Robert Watson	2209e8f159	Adopt the slightly weaker consistency locking approach used in IPv4 raw sockets for IPv6 raw sockets: separately lock the inpcb for determining the destination address for a connect()'d raw socket at the rip6_send() layer, and then re-acquire the inpcb lock in the rip6_output() layer to query other options on the socket. Previously, the global raw IP socket lock was used, which while correct and marginally more consistent, could add significantly to global raw IP socket lock contention. MFC after: 1 week	2008-07-30 09:26:27 +00:00
Robert Watson	ae89d5a389	When copying in and out current ICMPv6 filters on a raw IPv6 socket, lock the inpcb and use a local stack variable to copy to/from userspace so that sooptcopyin()/sooptcopyout() aren't called while holding an rwlock. While here, fix a bug in which a failed sooptcopyin() might lead to partially consistent ICMPv6 filters on the socket by not ignoring the error returned by sooptcopyin(). MFC after: 2 weeks	2008-07-29 19:37:16 +00:00
Robert Watson	2f1ff0cd80	Since we fail IPv6 raw socket allocation if inp->in6p_icmp6filt can't be allocated, there's no need to conditionize use and freeing of it later. MFC after: 1 week	2008-07-29 18:09:46 +00:00
Robert Watson	cc29ac7d22	Marginally decomplicate set/getsockopt code in ip6_output.c by simply using the passed arguments explicitly and unconditionally rather than testing them and calling panic(). The result is the same but easier to read. MFC after: 3 days	2008-07-29 09:31:03 +00:00
Alexander Motin	6c5bbf5ce1	Move inpcb lock higher to protect some nonbinding fields reading. It fixes nothing at this time, but decided to be more correct.	2008-07-28 19:32:18 +00:00
Alexander Motin	b11e21ae80	According to in_pcb.h protocol binding information has double locking. It allows access it while list travercing holding only global pcbinfo lock.	2008-07-27 20:30:34 +00:00
Bjoern A. Zeeb	078b704233	Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks. Reviewed by: rwatson	2008-07-10 13:31:11 +00:00
Bjoern A. Zeeb	cdcb11b92c	For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in an u_short or u_int16_t. Reviewed by: rwatson	2008-07-10 13:23:22 +00:00
Randall Stewart	fc14de76f4	1) Adds the rest of the VIMAGE change macros 2) Adds some __UserSpace__ on some of the common defines that the user space code needs 3) Fixes a bug when we send up data to a user that failed. We need to a) trim off the data chunk headers, if present, and b) make sure the frag bit is communicated properly for the msgs coming off the stream queues... i.e. we see if some of the msg has been taken. Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julain!	2008-07-09 16:45:30 +00:00
Bjoern A. Zeeb	a55b8b2068	Document required locking in in6_sleectsrc() in case an inp is passed in by adding an assert. Requested by: rwatson Reviewed by: rwatson	2008-07-09 16:33:21 +00:00
Bjoern A. Zeeb	f2f877d38c	Change the parameters to in6_selectsrc(): - pass in the inp instead of both in6p_moptions and laddr. - pass in cred for upcoming prison checks. Reviewed by: rwatson	2008-07-08 18:41:36 +00:00
Robert Watson	963e491243	Use soreceive_dgram() and sosend_dgram() with UDPv6, as we do with UDPv4. Tested by: ps MFC after: 3 months	2008-07-08 10:15:23 +00:00
Robert Watson	65c577c01d	Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp6_append(). MFC after: 1 month	2008-07-07 10:11:17 +00:00
Robert Watson	0ae76120da	Improve approximation of style(9) in raw socket code.	2008-07-05 18:03:39 +00:00
Robert Watson	4f7d1876d5	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
Robert Watson	59dd72d040	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acqusition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
Robert Watson	aaa37a7e4e	Remove GIANT_REQUIRED from IPv6 input, forward, and frag6 code. The frag6 code is believed to be MPSAFE, and leaving aside the IPv6 route cache in forwarding, Giant appears not to adequately synchronize the data structures in the input or forwarding paths.	2008-07-03 10:55:13 +00:00
Robert Watson	0a2fe17365	Set the IPv6 netisr handler as NETISR_MPSAFE on the basis that, despite there still being some well-known races in mld6 and nd6, running with Giant over the netisr handler provides little or not additional synchronization that might cause mld6 and nd6 to behave better.	2008-07-02 23:12:40 +00:00
Bjoern A. Zeeb	2d8bba43bd	Try to fix errors introduced in svn180085/cvs rev. 1.10: * Include ip6_var.h for ip6stat. * Use the correct name under ip6stat: `ip6s_cantforward' instead of its IPv4 counterpart. MFC after: 10 days	2008-06-29 07:34:21 +00:00
Alexander Kabaev	2ce7b410dc	Repair botched variable rename. Pointy hat to: julian	2008-06-29 04:33:45 +00:00
Julian Elischer	b3fb530c76	Oops, we've been incrementing the wrong cantforward variable. Obtained from: vimage tree	2008-06-29 00:25:16 +00:00
Julian Elischer	5f9a5768d2	Rename two vars so that they are different from the same vars in ipv4. They are static so it was not a problem 'per se' but it was confusing to the reader. Obtained from: vimage tree	2008-06-29 00:17:45 +00:00
Randall Stewart	b3f1ea41fd	- Macro-izes the packed declaration in all headers. - Vimage prep - these are major restructures to move all global variables to be accessed via a macro or two. The variables all go into a single structure. - Asconf address addition tweaks (add_or_del Interfaces) - Fix rwnd calcualtion to be more conservative. - Support SACK_IMMEDIATE flag to skip delayed sack by demand of peer. - Comment updates in the sack mapping calculations - Invarients panic added. - Pre-support for UDP tunneling (we can do this on MAC but will need added support from UDP to get a "pipe" of UDP packets in. - clear trace buffer sysctl added when local tracing on. Note the majority of this huge patch is all the vimage prep stuff :-)	2008-06-14 07:58:05 +00:00
Robert Watson	9622e84fcf	Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week	2008-05-29 08:27:14 +00:00
Bjoern A. Zeeb	9a38ba8101	Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need an WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days	2008-05-24 15:20:48 +00:00
Randall Stewart	c54a18d26b	- Adds support for the multi-asconf (From Kozuka-san) - Adds some prepwork (Not all yet) for vimage in particular support the delete the sctppcbinfo.xx structs. There is still a leak in here if it were to be called plus we stil need the regrouping (From Me and Michael Tuexen) - Adds support for UDP tunneling. For BSD there is no socket yet setup so its disabled, but major argument changes are in here to emcompass the passing of the port number (zero when you don't have a udp tunnel, the default for BSD). Will add some hooks in UDP here shortly (discussed with Robert) that will allow easy tunneling. (Mainly from Peter Lei and Michael Tuexen with some BSD work from me :-D) - Some ease for windows, evidently leave is reserved by their compile move label leave: -> out: MFC after: 1 week	2008-05-20 13:47:46 +00:00
Julian Elischer	8b07e49a00	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
Robert Watson	c7bc5dc1f5	Acquire a read lock, rather than a write lock, on a UDPv6 inpcb when delivering to the socket or extracting socket details for monitoring purposes. MFC after: 3 months	2008-04-22 12:20:33 +00:00
Robert Watson	bb145f600c	In ICMPv6, read lock rather than write lock the inpcb on receive. MFC after: 3 months	2008-04-21 12:08:40 +00:00
Robert Watson	9ad11dd8a4	With IPv4 raw sockets, read lock rather than write lock the inpcb when receiving or transmitting. With IPv6 raw sockets, read lock rather than write lock the inpcb when receiving. Unfortunately, IPv6 source address selection appears to require a write lock on the inpcb for the time being. MFC after: 3 months	2008-04-21 12:06:41 +00:00
Robert Watson	8328afb791	When querying a local or remote address on an IPv6 socket, use only a read lock on the inpcb. MFC after: 3 months	2008-04-19 14:36:19 +00:00
Robert Watson	8501a69cc9	Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)	2008-04-17 21:38:18 +00:00
Randall Stewart	276ca5012c	- Have SCTP use the new pru_flush functionality PR: 122710 MFC after: 1 week	2008-04-14 18:12:37 +00:00
Qing Li	e440aed958	This patch provides the back end support for equal-cost multi-path (ECMP) for both IPv4 and IPv6. Previously, multipath route insertion is disallowed. For example, route add -net 192.103.54.0/24 10.9.44.1 route add -net 192.103.54.0/24 10.9.44.2 The second route insertion will trigger an error message of "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table" Multiple default routes can also be inserted. Here is the netstat output: default 10.2.5.1 UGS 0 3074 bge0 => default 10.2.5.2 UGS 0 0 bge0 When multipath routes exist, the "route delete" command requires a specific gateway to be specified or else an error message would be displayed. For example, route delete default would fail and trigger the following error message: "route: writing to routing socket: No such process" "delete net default: not in table" On the other hand, route delete default 10.2.5.2 would be successful: "delete net default: gateway 10.2.5.2" One does not have to specify a gateway if there is only a single route for a particular destination. I need to perform more testings on address aliases and multiple interfaces that have the same IP prefixes. This patch as it stands today is not yet ready for prime time. Therefore, the ECMP code fragments are fully guarded by the RADIX_MPATH macro. Include the "options RADIX_MPATH" in the kernel configuration to enable this feature. Reviewed by: robert, sam, gnn, julian, kmacy	2008-04-13 05:45:14 +00:00
Robert Watson	f457d58098	In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and eliminate unnecessary local variable caching of the list head pointer, making the code a bit easier to read. MFC after: 3 weeks	2008-04-06 21:20:56 +00:00
Ruslan Ermilov	ea26d58729	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
Bjoern A. Zeeb	9e3bdede0f	Correct IPsec behaviour with a 'use' level in SP but no SA available. In that case return an continue processing the packet without IPsec. PR: 121384 MFC after: 5 days Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]	2008-03-14 16:38:11 +00:00
Bjoern A. Zeeb	8cfbd2995b	Correct reference counting on the SP for outgoing IPv6 IPsec connections. PR: 121374 Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) MFC after: 5 days	2008-03-14 11:55:04 +00:00
Bjoern A. Zeeb	39d8cf90cb	#if 0 out a currently unsued (and incomplete) function: ip6_ipsec_mtu(). No need to compile 'dead' code. I am leaving it in because we will have to review the concept and should use the common function in various places. MFC after: 5 days	2008-03-14 11:44:30 +00:00
Bjoern A. Zeeb	41aa71dd3e	Replace the function name in two identical printfs by __func__, __LINE__ so we can distinguish them when people report a problem. PR: 121373 MFC after: 5 days	2008-03-14 11:09:11 +00:00
Bjoern A. Zeeb	c26fe973a3	Rather than passing around a cached 'priv', pass in an ucred to ipsec_set_policy and do the privilege check only if needed. Try to assimilate both ip_ctloutput code blocks calling ipsec*_set_policy. Reviewed by: rwatson	2008-02-02 14:11:31 +00:00
Bjoern A. Zeeb	79ba395267	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)	2008-01-24 08:25:59 +00:00

1 2 3 4 5 ...

965 Commits